#Data19 has come and gone, but there are still seven weeks left of 2019, so it’s time to finish strong. This week’s #MakeoverMonday data set, ‘Smartphone Ownership Among Youth Is on the Rise,’ comes to us from Common Sense. Below is a look at the viz we made over this week.
What works with the original viz
Labeling the years directly to the left of each line chart (although not needed as we will discuss later).
The line charts do make it easy to compare 2015 vs. 2019, for each age group. However…
What could be improved
Even though the viz labels age on the x-axis, my brain still wants to read the line charts as change over time. Therefore, I would shy away from using a line chart in this situation, as it can cause confusion.
My go-to for this type of analysis would typically be a dumbbell chart, like the image below, as I feel it’s one of the best ways to show change between two periods. However, I felt the need to try something new, so I saved the dumbbells for another day.
It’s unnecessary to label every mark on the view, as it distracts the reader from focusing on the visualization.
There’s also no need for dots and grid lines at every age increment. A better approach would be to swap the x-axis (age) grid lines for y-axis (ownership) ones instead.
Changing the title to a shade of gray and color coding the years in the title (2015 blue and 2019 yellow) would remove the need for the year labels in the view.
I wanted the focus to be on the change from 2015 to 2019, so I called that out directly in the title.
As I mentioned earlier, it’s really easy in a situation like this to just go with a dumbbell chart. However, I wanted to try a variation of Jeffrey Shaffer’s progress bars.
Since the values for 2019 are greater, I set 2019 as thin lines in the background of the thick, 2015 gray bars. I then labeled the 2019 bars as the difference in percentage points from 2015 to 2019.
For instance, in 2019, 53% of 11-year-old children owned a smartphone vs. just 32% in 2015. That’s a difference of 21 percentage points.
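For anyone who wants to double-check a label like that, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration. It also shows the relative change, to make clear why the labels say "percentage points" rather than "percent".

```python
def percentage_point_change(before_pct, after_pct):
    """Plain difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before_pct, after_pct):
    """Change expressed as a percent of the starting value."""
    return (after_pct - before_pct) / before_pct * 100


# Values quoted in the post for 11-year-olds: 32% in 2015, 53% in 2019.
points = percentage_point_change(32, 53)
relative = relative_change_pct(32, 53)

print(f"{points} percentage points")
print(f"{relative:.1f}% relative increase")
```

The two numbers answer different questions, so labeling the bars with the percentage-point difference keeps them directly comparable to the ownership axis.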
For week 2019-30 of #MakeoverMonday, Andy published his 800th Tableau Public viz!! Congratulations Andy, what an unbelievably incredible feat. For week 2019-31, we were given the opportunity to make over one of Andy’s first ever Tableau Public vizzes, a dashboard examining STD Infection Rates in the United States, from 1996-2014. For this challenge, I wanted to produce a dashboard with a similar layout and design to one I might create in a business setting. It’s worth noting that this post will focus on design and won’t go into what worked/didn’t work with the original dashboard, what the data looked like, etc. Heading into the makeover, my focus was to create an exploratory dashboard while achieving five goals: communicate clearly, keep it simple, effective design, effective use of color and effective use of text. Let’s take a look at how the dashboard came together.
Goal 1. Communicate Clearly
With nineteen years of data on three diseases, sliced by gender, as well as seven different age groups, covering fifty states + the District of Columbia, I saw a dataset that had the potential of quickly getting away from me if I wasn’t careful. How could a user consume all of this information, in an easy to use format, while not being overwhelmed? That was the question I needed to answer. To communicate clearly, I chose to use interactivity that allowed the user to select whatever mattered to them. I then chose simple chart types and the use of color/highlighting to help focus the user’s attention.
Goal 2. Keep It Simple
Everything placed on a dashboard should add value to the user. Sure, six KPIs felt like a lot. But, with the addition of the bar chart trend, set in the background (a trick learned from Tableau Zen Master Ryan Sleeper), I felt breaking out the disease rate trends by gender, which the dashboard otherwise did not contain, added value. As mentioned above, I kept it simple with common, easy to understand chart types: callout numbers (or BANs), bar charts, hex maps, line chart and dot plots. That’s it! When exploring the data, several other chart types were tested, but ultimately, the others did not communicate the data as clearly as the ones chosen.
Goal 3. Effective Design
As mentioned earlier, my aim was to create an exploratory dashboard which achieved five goals. Thus far, in my experience in the business world, I have mostly designed for consumption on either a laptop or desktop computer, so chose to go that route with this dashboard as well. I felt the dashboard didn’t need to be very big, so went with a 900px by 850px layout. Tableau’s recent addition of collapsible containers will be huge for filter placement on dashboards, so I’m really looking forward to the time when my current client updates their version of Tableau!! That said, when designing a dashboard with just a few filters, my preference is to create a bar (horizontal container) along the top, that separates the title from the viz and then drop the filters/parameters into the bar. This makes them easily accessible to the user, without taking up much real estate.
I then dropped the KPIs just below the filters bar to ensure they were one of the first things the user saw. The decision was also made to leverage a hex map to drive interactivity to other parts of the dashboard. Because the map was vital to the interactivity, it seemed that the only logical place for it was in the upper left-hand corner, where our eyes are drawn to first. Here’s what the dashboard looked like with those three components in place. At this point, it was beginning to feel like we were on to something. With the addition of three other sheets to show trends and comparisons, I felt we now had a dashboard that contained a ton of great information, in an easy to consume format. Remember that it is very important to give the components of any visualization a chance to breathe, to allow for flow and ensure the viz is not crammed together. Therefore, I always use padding in my vizzes. Here’s a great blog post by Tableau Zen Master, Adam Crahen, on the use of padding.
Goal 4. Effective Use of Color
While the dashboard was coming together, the important pieces of data could not stand out for the user without leveraging preattentive attributes. I chose to use color to help the data stand out and after walking through many variations, landed on black dots for the dot plots, indicating the state that had been selected from the map, as well as a black line on the trend line, indicating the disease that had been selected from the parameter at the top of the dashboard. Now, when the user selected a disease from the parameter and a state from the map, they would see the following highlights, in the dot plots and line chart. The black dots allow for a nice, clean comparison of the selected state vs. all other states, for each age group, broken out by both male and female. And the highlighted line chart allows the user to quickly see how the trend of the selected disease compares to the others. And remember, to get a zoomed in trend for both male and female, we just need to look at the bar charts in the background on the KPIs.
Goal 5. Effective Use of Text
While it has already been shown in the above screenshots, the last piece added was the dynamic titles, which help the user identify which state and disease are being analyzed, as well as which year has been selected, as this would impact the KPIs, hex map and dot plot views. Finally, making this dynamic text bold would signal to the user that these pieces of text were dynamic and would update with the interactivity of the dashboard. Here’s a link to the interactive version on my Tableau Public profile and below is a view of the final product. Thanks for reading and have a wonderful day!
This week’s #MakeoverMonday data set examines the twenty-five countries in the world with the highest consumption of pure alcohol per capita. Below is a picture of the original viz; let’s see what we can do to improve it.
What Does Not Work and Why?
In looking over the original visualization, it became clear quickly that a few small tweaks could drastically improve our audience’s ability to consume the data. So, what doesn’t work and how can it be improved?
The Title – it’s misleading and could have us believe we’re looking at actual rates (percentages) of consumption when in fact the data displayed are liters of alcohol consumed. To improve this, we grabbed the title from the y-axis and made it our main title. While exploring the data, I noticed a majority of the countries were European countries, so decided this would be the focus of our viz. To call out the fact that only three of the countries in the Top 25 were non-European countries, we leveraged a light gray/dark red color combination, to bring attention to those three non-European countries. The subtitle coloring ties into the coloring within the viz (which we’ll see shortly), grabbing the reader’s attention.
There are several issues with the chart itself, so instead of showing a before/after snapshot for each individual issue, we’ll first cover what doesn’t work and then provide one before/after that captures all of the updates made.
The Truncated Y-Axis – this is a HUGE no-no when working with bar charts. Truncating the axis of a bar chart will ALWAYS result in an inaccurate representation of the data!! My favorite quote on this topic is from Curtis Harris and his Pluralsight course, “Data Visualization: Best Practices.”
Check out the two charts below, the top one has the same truncated axis as the original, while the bottom has a zero baseline. Just look at how the truncated axis distorts the data!! It looks as though the value of Belarus (the top country) is nearly 5x the value of Slovenia (the bottom country) when in reality, the value of Belarus (17.5 liters) is only 1.5x that of Slovenia (11.6 liters). Again, repeat after Curtis…I cannot stress this enough.
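To put numbers on that distortion, here is a small Python sketch comparing the drawn bar lengths under a zero baseline and a truncated one. The two values come from the text above. The axis start of 10 liters is purely an assumption for illustration, since the exact starting point of the original axis isn't stated here.

```python
def drawn_length_ratio(value_a, value_b, baseline=0.0):
    """Ratio of two bar lengths as drawn when the axis starts at `baseline`."""
    return (value_a - baseline) / (value_b - baseline)


belarus, slovenia = 17.5, 11.6  # liters, as quoted in the post

# Zero baseline: bar lengths match the actual values.
honest = drawn_length_ratio(belarus, slovenia)

# Truncated baseline (assumed to be 10 liters for this sketch).
truncated = drawn_length_ratio(belarus, slovenia, baseline=10.0)

print(f"zero baseline:      {honest:.2f}x")
print(f"truncated baseline: {truncated:.2f}x")
```

With that assumed baseline, the bars for the same two values look several times further apart than they really are, which is exactly the effect described above.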
The Country Labels – it takes our brains longer to read text that is presented vertically or at an angle, so avoid this whenever possible. A simple flip of the chart allows us to display the country names horizontally, which is much easier to read.
The Grid Lines – I’m a big fan of labeling my bar charts directly when the situation allows for it and felt this was an instance where we could remove the grid lines and simply label the ends of the bars instead.
The Color – Nothing in the original viz grabs the reader’s attention. This is where we can leverage the color mentioned earlier to guide the reader’s focus to whatever our particular insights may be; in this example, we wanted the reader to quickly see that out of a list of 25 countries, just three were non-European.
Now that we’ve covered a few items from the original viz that don’t quite work out, let’s take a look back at it, as well as the updates we’ve made, below. Here’s what changed:
By flipping the viz we are now able to display the country labels horizontally, thus eliminating the strain on our audience.
By removing the truncated axis and setting a zero baseline, we’re able to accurately display the data.
We’ve removed the grid lines and labeled the bars directly. What this does is remove any distraction that may be caused by the grid lines and turns our focus to the labeled ends of the bars, instead. Also worth noting, since the bars are labeled directly, we can remove the y-axis (x-axis in my viz), as it no longer provides value.
Lastly, we color the three non-European countries to match the red coloring in the title. Notice how quickly your eyes are drawn to those three countries; Grenada, South Korea and Australia.
So there you have it, just a few small changes to the original visualization and we’ve transformed a difficult to read chart with inaccurately displayed data into a clean, crisp looking chart, that leverages color to guide our audience. Thanks, I hope you enjoyed reading this and were able to take away something useful. Have a great day!!
In celebration of pride month, this week’s #MakeoverMonday looks at the question, “Is it wrong for same-sex adults to have sexual relations?” The original visualization by GSS Data Explorer (below), tracks the progress over time of the percentage of the population to answer that it is “Not wrong at all,” broken down by four different age groups. It’s going to be a shorter post this week, so let’s get right to it.
Step 1. What Works and What Does Not Work?
Since we’re trending the percentages over time, the original line chart is a logical decision. However, there are a few things that don’t quite work for me. After downloading the data, I noticed there are several years missing and that doesn’t appear to be called out anywhere on the visualization. With data missing between the starting and ending points, a slope chart would be another way to effectively show the change over time. A slope chart would also prevent the lines from overlapping and crossing one another so often. Slope chart or line chart, the colors in the original viz could also be improved upon and I know somewhere where you can find a ton of awesome color palettes…thanks Neil!! Lastly, I would have labeled the ends of the lines…either with the value or with the age group. Labeling the ends of the lines with the age group would allow us to get rid of the color legend that is forcing us to look back and forth between the legend and the graph, to see which color represents which age group.
Step 2. Know and Understand the Data
The data set this week was super clean, with the exception of some missing years like I mentioned earlier. Once opened in Tableau, a quick pivot brought the years and their values into rows as opposed to columns. So after pivoting, we end up with a tall data set instead of the original wide data set. Now we’re ready to head into Tableau to begin building our visualization.
Step 3. Choosing the Right Chart Type
Earlier I mentioned that a slope chart would be a good way to visualize this data set, given the fact that several years were missing in the data. I wanted to show the difference from the first year (1973) to the last year (2018), without showing any of the data in between those two years. But, I also wanted to show that, despite considerable growth over this current 45-year period, each age group was still very far away from 100%. So, with this in mind, I began by building a dot plot that looked like the below chart. This was a good start, but now I needed to show the gaps in each age group. For instance, for the 18-34 year old age group, I wanted to highlight the 71% to 100% section.
So, I changed the colors of the dots in the dot plot, as I would later tie their gray color into my title. Next, I thickened the line connecting the two dots, which, if you recall, represent 1973 and 2018 and ended up with this. I liked how simple the visualization was to read: each age group has increased its percentage of the population answering our question “Not wrong at all” by quite a lot, over the years. However, those are still huge gaps to reach 100% and it is quite disappointing to think that such a large portion of our society is this closed-minded. So, I wanted to make sure to capture the gaps that still remain.
To do this, I would leverage Tableau’s transparent sheets as well as a video from Andy Kriebel. I would use Andy’s tip to create a rounded bar chart, but instead of starting the bars at 0, I wanted mine to start at the 1973 value for each age group, to ensure they didn’t extend to the left of the gray dot plot, shown above. Here’s how my worksheet was set up to achieve this, you can view Andy’s video above to master the steps required to get there. Ok, so we had two worksheets, now we just needed to build the dashboard and layer one worksheet on top of the other.
Step 4. Finishing Touches
We didn’t need a big dashboard for this, so I just set mine to a fixed size of 1200px by 500px and probably could have gone 800px wide to be honest. I tiled my title text box, the sheet with the blue rounded bar charts and my footer text boxes and then laid the gray, thickened dot plot on top of the blue rounded bars. If you’ve never used transparent sheets before, the key is to float the top sheet on top of the bottom sheet and set the size and position to exactly match the bottom sheet. Also, in order for the sheet to be transparent, the background must be set to None. Here’s how my floating, transparent sheet was set up.
The last thing to do was add the title, where I tied in the colors to match those in the visualization. Then the information in the footer and some tooltips and we’re done! I was short on time this week, but still feel this quick visualization provides a good look into not only how far each age group has come, but also how far there still is to go, on this subject. Thanks for reading, I hope you enjoyed and were able to take away something useful. Have a great day!!
The data set for this week’s #MakeoverMonday is CO2 emissions per capita, per country, with the original visualization (below) showing the trends of nine selected countries, from 1960 through 2014. So, what works and what doesn’t work with this chart? I don’t mind a line chart displaying the trends of CO2 emissions by country. However, here are a few things I don’t like about the original. The colors are difficult to deal with and I would prefer a solid line vs. the dashed line in the original viz. The country labels block the last 5-10 years of the viz, depending on what line you’re following, so that’s not ideal either. I see in the original, the user has the option of toggling the labels on or off. But, if you turn the labels on and they end up covering part of the viz, I would have gone for an alternative approach to labeling the lines. Alright, let’s get down to business.
Step 1. Understanding the Data
The data set is a nice and easy one to work with, giving us Country Name, Country Code and the CO2 emissions for each year, from 1960 to 2014. However, in looking through the data set, the first thing that caught my attention was there are several additional rows of aggregated data, such as ‘Arab World’ and ‘Caribbean small states’ below. Depending on your analysis, you may want to use these, so just be aware that they are there. If not interested in using them, consider throwing a data source filter on Country Name, before jumping into Tableau, and filtering these out, so you don’t have to deal with them.
The only other thing with the data set is that when pulling it into Tableau, you’ll likely need to take a few small steps to reshape the data:
You’ll notice the field names are in row 1 and the headers read F1, F2, F3, etc. To fix this, from the Data Source pane, click on the drop down of the sheet you pulled onto the canvas and select ‘Field Names are in first row.’
Next, the years 1960 to 2018 are in columns and we want those in rows instead, so we’ll pivot our data, giving us a tall data set as opposed to the current wide data set.
To do this click on the header of the year 1960, hold shift and scroll to 2014, click on that as well. This will select all years from 1960 to 2014. Next, right-click and select pivot.
Since there’s no data in the years 2015 to 2018, feel free to hide them.
Now, rename your new columns:
Change ‘Pivot Field Names’ to ‘Year’
Change ‘Pivot Field Values’ to ‘Value’
That should leave us with four columns; Country Name, Country Code, Year and Value. Alright, now we’re ready to jump into Sheet 1.
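If it helps to see the reshape outside of Tableau, here is a minimal Python sketch of the same wide-to-tall pivot, including dropping aggregate rows. The sample rows, their values and the two-letter codes are made-up placeholders, not real emissions data, and only two year columns are shown.

```python
# Made-up sample rows in the wide layout: one column per year.
wide_rows = [
    {"Country Name": "Country A", "Country Code": "AA", "1960": 1.0, "1961": 1.5},
    {"Country Name": "Country B", "Country Code": "BB", "1960": 2.0, "1961": 2.5},
    {"Country Name": "Arab World", "Country Code": "ZZ", "1960": 3.0, "1961": 3.5},
]

ID_FIELDS = ("Country Name", "Country Code")
AGGREGATES = {"Arab World", "Caribbean small states"}  # aggregate rows to drop


def pivot_years(rows):
    """Turn one row per country into one row per country and year."""
    tall = []
    for row in rows:
        if row["Country Name"] in AGGREGATES:
            continue  # same idea as the data source filter on Country Name
        for field, value in row.items():
            if field in ID_FIELDS:
                continue  # identifier columns are repeated, not pivoted
            tall.append({
                "Country Name": row["Country Name"],
                "Country Code": row["Country Code"],
                "Year": int(field),
                "Value": value,
            })
    return tall


tall_rows = pivot_years(wide_rows)
for r in tall_rows:
    print(r)
```

Each wide row becomes one tall row per year column, leaving the same four columns described above: Country Name, Country Code, Year and Value.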
Step 2. Recreating the Original
After taking some time to explore the data, I decided to try something I’m not sure I’ve ever actually done as part of a #MakeoverMonday and that is to make a recreation of the original visualization with the exact same chart types. So, I’ll make a replica of the line chart and look to incorporate the bar charts into my viz as well. With this approach, I’ve defined three goals;
Make the viz cleaner
Better solution for the labels
Improve the interactivity
Usually my goal for #MakeoverMonday is to come up with a better way to visualize the data through the use of a different chart type. However, with this visualization, I feel the line chart and bar chart are good choices, the line chart just needs to be cleaned up and the bar chart is a little blah. What better time to try out Tableau’s BRAND NEW Parameter Actions, featured in the recent 2019.2 release?!!
original line chart
original bar chart
Step 3. Effective Use of Color
I took to Tableau and built exact replicas of the original line chart and bar chart. While I changed a few things formatting-wise, the only thing different with the charts themselves is the use of color. Instead of several different colors on the line chart, I used parameter actions to highlight, in red, the country being hovered on. Likewise, I followed this coloring through to the bar chart, which will end up in a viz in tooltip. Here they are below, as stand alone charts. Using color to highlight a certain country helps the audience to see how that country differs from the others.
color to highlight the line
color to highlight the bar
Tableau’s parameter actions are so easy to use. If you’ve used Set Actions before, the set up is very similar. Here’s all that is required for the three parameter actions in my viz.
1. Create a parameter using the Country Name field. I called it Country Parameter. This one parameter will be used in all the parameter actions.
2. Create a Boolean calculation called Country T/F and drag it to both the size card and the color card. Then simply adjust the size to your liking for both the True and False values and do the same for coloring. I adjusted my sizing and color, so when a Country was selected, the line thickened and turned red in color, while the other countries are thin gray lines, pushing them to the background but keeping them plenty visible for comparisons. Quick note: I also dropped this calculation on the color card of my viz in tooltip bar chart, allowing it to highlight the country being hovered on, just like the line chart.
3. Create a calculation that checks to see if the Country Name = the Country Parameter. If True, then it displays the Country Name, if False then it is blank. I dragged this to the label card to label the country being highlighted via the parameter actions. All other countries will receive no label.
Here are the calculations as well as the sheet. To get the label to fit at the end of the line chart, I both fixed the Year axis to add a few additional years and also added 25 pixels of right outer padding to this sheet, once it was dropped onto the dashboard. I could have just done more padding without fixing the axis and got the same result.
4. Once the sheet was pulled onto the dashboard it was time to set up the Parameter Actions. This is literally all there is to it; from the Menu go to Dashboard –> Actions –> Add Action –> Change Parameter. An Edit Parameter Action dialogue box will pop up. Simply name your action if you’d like, select your Source Sheets, Target Parameter and Field and then set the action to run on either Hover, Select or Menu. I chose to run the action on Hover, as it made the most sense for the interactivity in this viz.
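The logic behind steps 2 and 3 is simple enough to sketch in a few lines of Python. This is only an illustration of the comparisons involved, not Tableau calculation syntax, and the function names and style values are hypothetical.

```python
def country_tf(country_name, country_parameter):
    """Boolean test: is this mark the country held in the parameter?"""
    return country_name == country_parameter


def country_label(country_name, country_parameter):
    """Label only the highlighted country; everything else stays blank."""
    return country_name if country_name == country_parameter else ""


def line_style(country_name, country_parameter):
    """Thick red line for the highlighted country, thin gray otherwise."""
    if country_tf(country_name, country_parameter):
        return {"color": "red", "size": "thick"}
    return {"color": "gray", "size": "thin"}


# Hovering on a line acts like writing its Country Name into the parameter.
country_parameter = "Country B"  # placeholder country names
for name in ["Country A", "Country B", "Country C"]:
    print(name, line_style(name, country_parameter),
          repr(country_label(name, country_parameter)))
```

Because all three pieces key off the same parameter, a single hover updates the line color, the line thickness and the label at once.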
Step 4. Formatting
Alright, with the Parameter Actions set up, it was time to finish this thing off with a little formatting. Here are some formatting steps I took to clean up the viz from its original version.
Changed the Y-axis tick marks to an interval of 5 instead of 2
Changed the X-axis tick marks to an interval of 10 instead of 5
Removed the grid lines
Changed the default axis font to Tableau Book 8pt, bold and gave it a darker color to help push it to the background
Replaced the default tooltips with the viz in tooltip, featuring the bar chart with selected country highlight
Added a small message letting the user know to hover for interactivity
Changed the background to a darker color…just personal preference
There we go, that’s it. Nothing crazy, but I feel like we gave the original viz a nice makeover, cleaning it up and making it more user friendly. The final product is below and you can play around with the interactive version right here. Thanks for reading and have a wonderful day!!
The following blog post takes the reader through the process of building my March Madness Bracket of Champions viz, in Tableau. However, this project involved quite a bit of pre-Tableau work, which I would also like to share, so if you came strictly for the Tableau part, please scroll down to the ‘Building the Viz in Tableau’ section.
Prepping the Viz for Tableau
I first saw data portraits being used in Tableau by Zen Master, Neil Richards, in November of 2018, with his TUG data portraits viz. At the time, I was unaware of their origination, but on Neil’s viz he included that the idea was inspired by Giorgia Lupi, so I did a little research to become more familiar with the concept. It appears Giorgia introduced the idea at TED 2017 in Vancouver, through the creation of buttons for conference attendees, as a way to create connections with other conference goers. Prior to the conference, attendees filled out a series of non-invasive questions that revealed fun facts about them. A design system then turned the answer to each question into a unique set of shapes, colors and symbols. About a month after Neil’s viz, I saw Josh Tapley create a viz of badges as well, his for the Philadelphia Tableau User Group. I loved how creative and beautiful they were, so knew I wanted to try it out, the only question was what to do?
Inspiration from Giorgia Lupi
Inspiration from Neil Richards
I didn’t want to copy Neil and Josh…although it does seem like a really cool thing for the Twin Cities Tableau User Group to try one of these months!! Instead I wanted to try something a little different. Being the sports fan I am, it was only natural that my version of data portraits would somehow tie in sports. My initial thought was to make a data portrait for each of the top players in the upcoming NBA Draft. I thought the data from each player’s scouting report could work perfectly for a data portrait, as you would essentially be answering questions, just like on Giorgia’s buttons. What is the player’s position? How tall is the player? What is their biggest strength, etc? However, it was still only December and with the draft still six months away, I simply could not wait that long! So, sticking with the basketball theme, my next thought was to create a bracket, where each team is represented by a data portrait. So, I filed away the idea and a few months later, with NCAA March Madness looming, tried creating my first badge. The North Carolina Tar Heels are my favorite college basketball team, so I created the below (left), badge, which displayed the following information; the team (logo), the year they won the national championship (1993), their tournament seed that year (#1 seed), the conference they played in (bottom coloring), their win/loss record (34-4), win/loss margin by game (step line chart), and the number of players who would go on to reach the NBA (one star per player). I chose to create a bracket of past champions, as I felt it could be a fun lead up to the actual tournament and because fans are always debating which past teams were better, etc. Why not create an interactive bracket, where people could fill out their bracket of past March Madness champions and share it with others?!
I had an idea, but what did the data look like, that would support the idea? To be honest, I didn’t need much to get started. My initial data set included only the Year, the Champion, their Seed, their win/loss record and their conference. I grabbed it from sports-reference.com/cbb and dumped it into Google sheets. It looked like this.
From here, I could start building out the team data portraits. Where else would I turn for this step, other than PowerPoint?! For more on combining the powers of Tableau and PowerPoint, be sure to check out this great post from Kevin Flerlage. In his post, Kevin recommends blog posts by Josh Tapley and one by Kevin’s brother and Tableau Zen Master, Ken Flerlage, that introduced him to the concept of mixing Tableau with PowerPoint. The only other data I would end up including was game by game margins of victory/defeat for each team (for the step line chart), as well as statistical leaders for each team, which was a late addition to the tooltips.
The Data Portraits
With the initial data in hand, it was off to PowerPoint to create 32 more data portraits, one for each NCAA Men’s Basketball champion, from 1985 through 2018. Basically, all I did here was make copies of the original North Carolina data portrait and then swap out the elements for each of the other teams. For example, to create this Michigan data portrait, I copied the North Carolina one, switched the year, added/removed the appropriate number of stars, changed the seed number and conference color accordingly and finally swapped the logo and line graph and adjusted the win/loss record. The line graphs were made in Tableau, saved as images and brought into PowerPoint. The logos were saved as images from ESPN.com and brought into PowerPoint and then I added an artistic effect under the formatting tab, to give them a little colored pencil look.
Dean Smith’s 2nd title
The Glen Rice Wolverines
It took some patience, but after several hours, over the course of a few late nights, I had finally completed all 33 of the data portraits and was ready to start building the bracket! One quick note; the 2013 championship won by the Louisville Cardinals was vacated due to team violations, so I omitted them from the viz.
After taking a stab at ranking the teams myself, it dawned on me that maybe someone else, much more qualified, had already done this work. A quick google search and I was delighted to see that, indeed, this had been done and fairly recently. In April 2018, ESPN Insider, John Gasaway had ranked all champions from 1939 to 2018. I compared my rankings against his and although many of mine were within one or two spots of his, a few, most notably 1995 UCLA, were way off. I had that Bruins squad much higher than Gasaway’s ranking of sixteenth. So, to ensure the seedings in the bracket were legitimate, I decided to follow Gasaway’s rankings, with a few very small tweaks, in order to balance out the bracket and avoid having the same school play another version of itself, early on.
Of the 33 teams, there were five instances of Duke, four North Carolina’s, four Connecticut’s, three Kentucky’s and three Villanova’s. So, those five schools accounted for 19 of the 33 teams. With far too much time spent jockeying the teams around, I was finally able to produce a bracket in which none of the above schools would meet until at least the third round. So, with the rankings set, it was time to build the viz.
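A check like this can also be automated. The sketch below is a hypothetical helper, not what was used for the viz: it assumes teams are listed in bracket slot order (slots 0 and 1 play in round one, the winners of slots 0-1 and 2-3 meet in round two, and so on) and it ignores play-in games. The "Team A" style entries are placeholders.

```python
from itertools import combinations


def earliest_meeting_round(slot_a, slot_b):
    """Round (1-based) in which teams in two bracket slots could first meet."""
    if slot_a == slot_b:
        raise ValueError("slots must be different")
    round_number = 0
    # Halving a slot index moves it up one level of the bracket tree.
    while slot_a != slot_b:
        slot_a //= 2
        slot_b //= 2
        round_number += 1
    return round_number


def early_rematches(bracket, min_round=3):
    """Pairs of same-school teams that could meet before `min_round`."""
    clashes = []
    for (i, team_a), (j, team_b) in combinations(enumerate(bracket), 2):
        same_school = team_a[1] == team_b[1]
        if same_school and earliest_meeting_round(i, j) < min_round:
            clashes.append((team_a, team_b))
    return clashes


# Each entry is (year, school); only the North Carolina years come from the post.
bad_draw = [
    (1993, "North Carolina"), (None, "Team A"),
    (2005, "North Carolina"), (None, "Team B"),
    (None, "Team C"), (None, "Team D"),
    (None, "Team E"), (None, "Team F"),
]
good_draw = list(bad_draw)
good_draw[2], good_draw[4] = good_draw[4], good_draw[2]  # move '05 UNC away

print(early_rematches(bad_draw))
print(early_rematches(good_draw))
```

In the first draw the two North Carolina teams could meet in round two, so the pair is reported. After the swap they cannot meet before round three and the list comes back empty.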
Building the Viz in Tableau
The Set Up
I wanted the viz to have the look of an actual bracket that you might fill out by hand or online, in your local bracket challenge pool. So, in Tableau, once I had the team data portraits placed on the dashboard, I would leverage ninety-two text boxes to draw out the bracket. Each text box was filled with navy blue and set to be 3 pixels tall or wide, depending on its position. Looking back, this part was pretty tedious, but it allowed me to design the bracket exactly the way I wanted it to look, which was nice. Ok, back to the data portraits.
My goal in building this viz was to create a fun March Madness bracket, that would become interactive through the use of Tableau Set Actions. If you remember from above, the placement of the teams into the bracket had been determined, so Step 1 was to essentially create a bracket that had not yet been filled out. To place each team into their respective position in the bracket, I created a worksheet, that looked like the one below, for each of the sixteen first round match-ups and then floated (don’t hate me Team Tiled!!) each worksheet on the dashboard. Side note: this dashboard is literally a Team Tiled member’s worst nightmare, as there are somewhere in the neighborhood of 150 floating objects on the dashboard.
Setting up the bracket
Calc to separate out ’93 UNC from ’05 UNC, etc
I used the ‘Bracket’ field to filter each worksheet to its appropriate bracket and then the ‘Seed 1’ field to filter to the correct match-up. To account for schools with multiple championships, I then created a calculated field called ‘Year+Team’ which combined the ‘Year’ and ‘Champion’ fields. Pulled onto the shapes card, this would allow me to assign one data portrait per champion. Once this part was complete, I was left with eighteen sheets (originally seventeen) to float onto the dashboard. Why eighteen and originally seventeen? The original viz was built prior to the 2019 tournament and featured one “play-in” game. The play-in game was built using two sheets instead of one, so that’s how we get to seventeen sheets. I later updated the viz after the 2019 tournament to include the 2019 champion Virginia Cavaliers, after their miracle run to the title, the last two games of which I was fortunate enough to have seen in person at the Final Four in Minneapolis. What an amazing sports experience!! Anyway, adding Virginia led to the need for another play-in game, thus adding another sheet and getting us to eighteen. Alright, the bracket was set up, next up was to add the interactivity.
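As a quick aside on the ‘Year+Team’ field: the idea is simply to glue the year and the school together so that every champion gets its own unique label. Here’s a minimal Python sketch of that idea. The function name and the space separator are my assumptions, and the actual Tableau calculation may format the label differently.

```python
def year_team(year: int, champion: str) -> str:
    """Combine year and champion into one label (separator is assumed)."""
    return f"{year} {champion}"


# Two North Carolina champions plus UCLA, as mentioned in the post.
champions = [(1993, "North Carolina"), (2005, "North Carolina"), (1995, "UCLA")]
labels = [year_team(year, school) for year, school in champions]

print(labels)
```

Because the two North Carolina teams now have different labels, each one can be mapped to its own data portrait.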
The interactivity was set up with a few simple steps, which were repeated for each game throughout the tournament.
1. Create a Set for each game in the bracket. Each Set looked identical to the one pictured below. The set was created using the Year+Team field and I left all boxes unchecked to ensure the worksheets that would later be dropped onto the dashboard were blank until the addition of the Set Actions.
2. I then created a Boolean (T/F) calculation for each game like the one shown below, created a sheet for each game in the tournament and dragged the Boolean calculations for each game onto the Filters shelf of their respective sheets, setting them all to True. This would ensure that once the Set Actions were in place, the blank sheets would populate with the expected data portrait.
3. Next, the sheets needed to be placed (floated) onto the dashboard, into their positions within the bracket. I floated them on the bracket as shown in the picture below.
4. Lastly, we needed to add in the Set Actions. Once again, there are 31 games, so we need 31 Set Actions. In the example below, we’re using the Source Sheet 2.1, which contains the 1995 UCLA Bruins and the 2016 Villanova Wildcats. We tell the Set Action to target the Game 2 Set, which was set to True on the blank sheet named 2.2. Then we click OK and, back on the dashboard, if we click the UCLA data portrait on Sheet 2.1, we see them advance into the second round of the tournament, onto Sheet 2.2. Every other Set Action is set up just like this and together, they provide the dashboard interactivity.
Source Sheet 2.1 | Target Set Game 2 Set which is on Sheet 2.2
The Viz in Tooltips
Lastly, while I felt the data portraits provided great high level information about each champion, what they lacked was any type of information regarding the players. So, I pulled some more data from sports-reference.com/cbb and added a tooltip that, on the left-hand side, provided a zoomed in view of the data portrait and on the right-hand side, provided the user with each team’s statistical leaders in three main categories; points, rebounds and assists. The ’94 Arkansas Razorbacks were one of my all-time favorite college teams…and it didn’t hurt that they also beat Duke in the title game!!
Before wrapping up, I want to give a shout out to Kevin Flerlage for some fantastic feedback throughout the whole process of building this viz. Kevin helped me with some decisions regarding the tooltips and a nice clean way of executing a “clear bracket” option, among other great input. Also, when I was in the early stages of building out the viz, I thought it was a pretty cool idea. But after getting it to a point where it could be shared with others, for feedback, Kevin’s reaction and genuine excitement for the viz made me that much more motivated to get this thing across the finish line. Also, a big thanks to my co-workers Jim Van Sistine and Tom Coyer for providing their feedback as well and last, but not least, my friend Jason Underdahl, who said of the initial data portrait “why do you have the logo grayed out? You can barely even see it!” That’s tough love, but he had a good point! Adding the color back to the logos really made them pop!!
Thanks for reading, I hope you enjoyed this post and found it useful.
For this week’s #MakeoverMonday, we’re looking into cost effectiveness in Major League Baseball. More specifically, how does a player/team salary translate into productivity on the field, across a variety of statistical categories. For instance, if Player X made $20 million in 2015 and hit 20 home runs that year, you could say the team is paying $1 million per home run hit by Player X. Alright, let’s get started.
Step 1. Understanding the Data
Since I’m a lifelong baseball fan, this is a data scenario that is familiar to me. However, Andy and Eva did a great job of including links on the data sets page for those who may be less familiar with the sport. If this were, say, rugby data, I would absolutely be diving into those resources, so if you ever feel uncomfortable with the data set, be sure to do a little bit of research. One thing I will mention is that it looks like the data set is focused on hitting stats only and does not include pitching stats. However, pitchers are included in the data set, because they do compile hitting stats in certain situations. If you’re unfamiliar with the rules and why this could potentially matter for this data set, here are a few notes:
Pitchers ONLY hit in games that are played in National League ballparks
Starting pitchers ONLY start every 4th or 5th game
MOST (not all) pitchers are not very good at hitting the ball
MOST (not all) good pitchers have high salaries
So, if the average National League pitcher starts every 5th game (32 starts) and gets three plate appearances per game, that comes to 96 plate appearances for the season. So why does any of this matter? Like we mentioned, pitchers typically aren’t great at hitting the ball, so their hitting stats could look very poor when compared to the average position player (all players on the field other than the pitcher are referred to as position players). So, if we’re analyzing the cost a team pays players per home run, for instance, let’s look at an example of what it could look like when comparing a pitcher vs. a good position player.
Pitcher X
makes $25 million and hits 1 home run in 96 plate appearances
This would suggest we pay Pitcher X $25 million per home run hit
Position Player Y
makes $25 million and hits 25 home runs in 432 plate appearances
We pay Position Player Y $1 million per home run hit
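For anyone who wants to check the math, here’s the same comparison as a tiny Python sketch. The function name is made up for illustration, and treating a season with zero home runs as an infinite cost is my own assumption, not something from the data set.

```python
def cost_per_home_run(salary: float, home_runs: int) -> float:
    """Salary divided by home runs hit in the same season."""
    if home_runs == 0:
        # Assumption: no home runs means the cost per home run is unbounded.
        return float("inf")
    return salary / home_runs


# 32 starts at three plate appearances per start
pitcher_plate_appearances = 32 * 3

pitcher_x = cost_per_home_run(25_000_000, 1)
player_y = cost_per_home_run(25_000_000, 25)

print(pitcher_plate_appearances)             # 96
print(f"Pitcher X: ${pitcher_x:,.0f}")        # Pitcher X: $25,000,000
print(f"Position Player Y: ${player_y:,.0f}")  # Position Player Y: $1,000,000
```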
This scenario would lead you to believe that Position Player Y is a much more cost effective player, when the reality is simply that he is paid to hit the ball, while Pitcher X is paid primarily to pitch the ball. And since the data set does not include a field for each player’s “position,” we’re unable to simply filter pitchers out of the data set. Therefore, it may make sense to set a filter on plate appearances and set it to a minimum of 200 per season. This would filter the pitchers out, as in my opinion, it does not make sense to include them in this analysis. I apologize for the long-winded explanation, but in my first glance at the data, I saw this as potentially slipping up some participants who may not be familiar with the game. Ok, what about the original viz?
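Outside of Tableau, the same plate appearance filter could be sketched in a few lines of Python. The field names and the two sample rows below are hypothetical stand-ins for the real data set, which may name its columns differently.

```python
MIN_PLATE_APPEARANCES = 200  # the per-season threshold suggested above


def qualified_hitters(rows, min_pa=MIN_PLATE_APPEARANCES):
    """Keep only player-seasons with at least min_pa plate appearances."""
    return [row for row in rows if row["plate_appearances"] >= min_pa]


# Hypothetical rows based on the Pitcher X / Position Player Y example.
rows = [
    {"player": "Pitcher X", "plate_appearances": 96, "home_runs": 1},
    {"player": "Position Player Y", "plate_appearances": 432, "home_runs": 25},
]

kept = [row["player"] for row in qualified_hitters(rows)]
print(kept)  # ['Position Player Y']
```

Keep in mind that a threshold like this will also drop position players who missed most of a season, so it’s a blunt tool rather than a true position filter.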
Step 2. The Original Viz
The scatter plots on the original viz are easy enough to understand, but to be honest the way in which it was labeled, made it difficult for me to follow, especially the bottom, team section. Also, I didn’t find the team section all that interesting, because basically it was just showing us what teams have the lowest payrolls (Houston Astros) and which teams have the highest payrolls (New York Yankees and Los Angeles Dodgers). I guess the most interesting part of the team section was it tells us just how unbelievably bad the Miami Marlins were, offensively, in 2013. Wow!! Last in the league in all five categories.
Step 3. Try New Things
A while back, I saw this really good video by Andy on how to build a no-whisker box plot and have been waiting for the right opportunity to try creating something similar. I was hopeful this data set would provide that opportunity, but after working through a few different scenarios, I was unhappy with the results. So, we’ll continue to file that chart type away for a different day and move on to something else. Another recent viz I really liked was this beautiful viz by Lindsey Poulter, which used a stepped line and dot combo chart to capture the magical 2018 season of Kansas City Chiefs QB Patrick Mahomes. In his #WorkoutWednesday challenge for Week 4 of 2019, Curtis Harris built a similar chart that tracked headcount. I really loved not only the look of these vizzes, but also the ease of understanding them. So, I decided to go with this chart type, but the question was what data would it work well with? It’s probably worth mentioning that in a business setting, choosing your chart type first is probably not going to be the best approach. However, one of the great things about Tableau Public and community projects like #MakeoverMonday is that they offer us great opportunities to try new things and approach data visualization in different ways, in a safe environment.
Step 4. Finding the Story
The next step was to begin playing around with the data to find a story that fit the vision I had in my head. Early on, I had ruled out looking at team data, as I wanted to focus on players instead. Looking at hits, runs, RBI and home runs, I worked through some different ideas before landing on a viz that would feature the most recent members of the 500 home run club. Leveraging the stepped line dot combo chart, I felt it would be fun to visualize each player’s home runs by season, along with their team’s cost per home run (or the player’s salary per home run, whichever way you prefer looking at it). What I expected to see was as a player’s salary increased throughout their career, their salary per home run would increase fairly closely along with it. While this was true in a majority of cases, it certainly was not the case for all players on the list and in other cases, the increase in cost per home run was not as steep as I had guessed. Now that I had found a story, the next step was communicating it with a clean, engaging design.
Step 5. Simplicity in Design
I used just two colors in the viz, with a third for non-data related text. My colors were shades of red and blue, the colors of the MLB logo. For the chart, I set home runs as the stepped line chart and salary per home run as the sized dots. Here’s what it looked like.
The chart looked nice, but something very important was missing; salary by season. I immediately thought back to a fantastic blog post by Ryan Sleeper, in which he shares creative ways to use transparent sheets to add context to your vizzes. This was exactly what I needed, context. The moment I saw it, I fell in love with Ryan’s bar chart trend pushed to the background. So, I implemented this strategy, with player salary set as the trend in the background, with an opacity of 20%. This way it would be there for context, but not draw attention away from the other chart. With home runs set to running total for the player’s career, this worked out well, because as home runs increased, a player’s salary typically increased as well. So, for the most part, they increased along with one another. Here’s what it looked like after floating the stepped line on top of the bar chart.
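In case the “running total” part is unfamiliar, here’s a minimal Python sketch of what that table calculation is doing. The season-by-season home run numbers are made up for illustration and don’t belong to any player in the viz.

```python
from itertools import accumulate

# Made-up home runs per season for a hypothetical player.
season_home_runs = [12, 30, 41, 38]

# Running total: each value is the career total through that season.
career_totals = list(accumulate(season_home_runs))

print(career_totals)  # [12, 42, 83, 121]
```

Since a running total never goes down, the stepped line only ever climbs, which is part of why it pairs nicely with a salary trend that tends to rise over a career.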
To add more context, I included text for each player’s career salary, home runs and salary per home run, as well as the years they played. Lastly, I wanted the reader to be able to see the differences in all three measures, so I fixed the y-axis for the home runs stepped chart from 0 to 800, fixed the salary bar chart from 0 to $40 million and fixed the salary per home run dot size from $0 to $5 million. Below is a view after adding the text. Since I didn’t show any axes, I included an explanation through the use of an info button.
Step 6. Sense Checking the Data
It was not until after building the entire viz that I really took the time to look closely at the numbers to make sure everything made sense. Guess what? It didn’t!! For a while, I couldn’t figure out why a few players had extreme spikes in their salaries. Take a look at the below comparison of before and after I found the issue. Look at those spikes!! Why on earth would Gary Sheffield randomly make nearly $30 million in one season and then go back to making $9-10 million for the next several years? Answer? He wouldn’t, so there was clearly something wrong with either the data or one of my calculations.
incorrect salary figures
corrected salary figures
After digging around, here’s what I found. My ‘FIXED Player Salary’ calculation had originally been set up as SUM([Salary]), as I had not taken into account the fact that if a player had played for more than one team during the same season, they would have more than one row of data for that season. Here’s what the incorrect calculation and the result looked like.
I was certain $29.9 million was incorrect, but I also wanted to be sure that the $14.9 million figure was correct, so I checked the trusty old baseball-reference.com and saw that the numbers matched and Sheffield did indeed make that amount in 1998. So, I needed to change my calculation to pull in the MIN([Salary]) as opposed to SUM.
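To show why SUM goes wrong here and MIN doesn’t, here’s a small Python sketch of the same situation. It assumes, as described above, that the full-season salary is repeated on each team’s row. The player, teams, field names and $10 million figure are all made up, and the helper function is hypothetical rather than a reproduction of the Tableau calculation.

```python
from collections import defaultdict


def salary_by_season(rows, agg):
    """Group salaries by (player, season) and apply agg to each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["player"], row["season"])].append(row["salary"])
    return {key: agg(values) for key, values in grouped.items()}


# Hypothetical player traded mid-season: two rows, same full-season salary.
rows = [
    {"player": "Player Z", "season": 1998, "team": "Team A", "salary": 10_000_000},
    {"player": "Player Z", "season": 1998, "team": "Team B", "salary": 10_000_000},
]

summed = salary_by_season(rows, sum)  # double counts the salary
lowest = salary_by_season(rows, min)  # returns the salary once

print(summed[("Player Z", 1998)])  # 20000000
print(lowest[("Player Z", 1998)])  # 10000000
```

MIN only works because the salary is identical on both rows. If the data split a salary between teams instead, a different fix would be needed.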
Overall, I enjoyed working with the data set this week and wound up spending a lot more time on this viz than on a typical #MakeoverMonday, mostly due to just playing around with exploring the data. Below is a look at my final viz, the interactive version can be found here. Thanks for reading, I hope you enjoyed!!