This week’s #MakeoverMonday data set examines the twenty-five countries in the world with the highest consumption of pure alcohol per capita. Below is a picture of the original viz; let’s see what we can do to improve it.
What Does Not Work and Why?
In looking over the original visualization, it became clear quickly that a few small tweaks could drastically improve our audience’s ability to consume the data. So, what doesn’t work and how can it be improved?
The Title – it’s misleading and could lead us to believe we’re looking at actual rates (percentages) of consumption, when in fact the data displayed are liters of alcohol consumed. To improve this, we took the title from the y-axis and made it our main title. While exploring the data, I noticed that a majority of the countries were European, so I decided this would be the focus of our viz. To call out the fact that only three of the countries in the Top 25 were non-European, we used a light gray/dark red color combination to bring attention to those three countries. The subtitle coloring ties into the coloring within the viz (which we’ll see shortly), grabbing the reader’s attention.
There are several issues with the chart itself, so instead of showing a before/after snapshot for each individual issue, we’ll first cover what doesn’t work and then provide one before/after that captures all of the updates made.
The Truncated Y-Axis – this is a HUGE no-no when working with bar charts. Truncating the axis of a bar chart will ALWAYS result in an inaccurate representation of the data!! My favorite quote on this topic is from Curtis Harris and his Pluralsight course, “Data Visualization: Best Practices.”
Check out the two charts below: the top one has the same truncated axis as the original, while the bottom one has a zero baseline. Just look at how the truncated axis distorts the data!! It looks as though the value for Belarus (the top country) is nearly 5x the value for Slovenia (the bottom country), when in reality Belarus (17.5 liters) is only about 1.5x Slovenia (11.6 liters). Again, repeat after Curtis…I cannot stress this enough.
The Country Labels – it takes our brains longer to read text that is presented vertically or at an angle, so avoid this whenever possible. A simple flip of the chart allows us to display the country names horizontally, which is much easier to read.
The Grid Lines – I’m a big fan of labeling my bar charts directly when the situation allows for it and felt this was an instance where we could remove the grid lines and simply label the ends of the bars instead.
The Color – Nothing in the original viz grabs the reader’s attention. This is where we can leverage the color mentioned earlier to guide the reader’s focus to whatever our particular insights may be; in this example, we wanted the reader to quickly see that out of a list of 25 countries, just three were non-European.
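To put numbers on the truncated-axis problem from the Belarus vs. Slovenia comparison, here is a small sketch of how bar lengths compare when the value axis starts somewhere other than zero. The post does not say where the original axis began; the 10-liter baseline below is an assumption, chosen because it reproduces the “nearly 5x” impression described above.

```python
def apparent_ratio(larger: float, smaller: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths when the value axis starts at `baseline`."""
    if not baseline < smaller <= larger:
        raise ValueError("baseline must sit below both values")
    return (larger - baseline) / (smaller - baseline)

belarus, slovenia = 17.5, 11.6  # liters of pure alcohol per capita, from the post

true_ratio = apparent_ratio(belarus, slovenia)               # zero baseline
distorted = apparent_ratio(belarus, slovenia, baseline=10)   # ASSUMED truncated axis at 10 liters

print(f"zero baseline:       {true_ratio:.2f}x")
print(f"axis starting at 10: {distorted:.2f}x")
```

With a zero baseline the bars differ by about 1.5x, matching the real values; once the axis starts at 10 liters, the same data draws bars that differ by almost 4.7x.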
Now that we’ve covered a few items from the original viz that don’t quite work, let’s take a look back at it, along with the updates we’ve made, below. Here’s what changed:
By flipping the viz we are now able to display the country labels horizontally, thus eliminating the strain on our audience.
By removing the truncated axis and setting a zero baseline, we’re able to accurately display the data.
We’ve removed the grid lines and labeled the bars directly. This removes any distraction caused by the grid lines and turns our focus to the labeled ends of the bars instead. Also worth noting: since the bars are labeled directly, we can remove the y-axis (x-axis in my viz), as it no longer provides value.
Lastly, we color the three non-European countries to match the red coloring in the title. Notice how quickly your eyes are drawn to those three countries: Grenada, South Korea and Australia.
So there you have it: with just a few small changes to the original visualization, we’ve transformed a difficult-to-read chart with inaccurately displayed data into a clean, crisp-looking chart that leverages color to guide our audience. Thanks, I hope you enjoyed reading this and were able to take away something useful. Have a great day!!
In celebration of Pride Month, this week’s #MakeoverMonday looks at the question, “Is it wrong for same-sex adults to have sexual relations?” The original visualization by GSS Data Explorer (below) tracks, over time, the percentage of the population answering that it is “Not wrong at all,” broken down by four age groups. It’s going to be a shorter post this week, so let’s get right to it.
Step 1. What Works and What Does Not Work?
Since we’re trending the percentages over time, the original line chart is a logical decision. However, there are a few things that don’t quite work for me. After downloading the data, I noticed there are several years missing and that doesn’t appear to be called out anywhere on the visualization. With data missing between the starting and ending points, a slope chart would be another way to effectively show the change over time. A slope chart would also prevent the lines from overlapping and crossing one another so often. Slope chart or line chart, the colors in the original viz could also be improved upon and I know somewhere where you can find a ton of awesome color palettes…thanks Neil!! Lastly, I would have labeled the ends of the lines…either with the value or with the age group. Labeling the ends of the lines with the age group would allow us to get rid of the color legend that is forcing us to look back and forth between the legend and the graph, to see which color represents which age group.
Step 2. Know and Understand the Data
The data set this week was super clean, with the exception of some missing years, as I mentioned earlier. Once the data was opened in Tableau’s Data Source pane, a quick pivot moved the years and their values from columns into rows, so we end up with a tall data set instead of the original wide one. Now we’re ready to head into a worksheet and begin building our visualization.
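For anyone who prefers to see the pivot outside of Tableau, here is a minimal wide-to-tall sketch in plain Python. It assumes one row per age group with one column per year; the “Age Group” column name and the 1973 value are placeholders, and only the 71% for the 18-34 group in 2018 comes from the post.

```python
def pivot_years(wide_rows: list, id_field: str) -> list:
    """Turn one-column-per-year rows into one row per (id, year) pair."""
    tall = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the identifier stays as a regular column
            tall.append({id_field: row[id_field], "Year": int(column), "Value": value})
    return tall

# Hypothetical wide input: "Age Group" and the 1973 value are placeholders.
wide = [{"Age Group": "18-34", "1973": 20.0, "2018": 71.0}]

for record in pivot_years(wide, "Age Group"):
    print(record)
```

Each year column becomes its own row, which is the tall shape Tableau’s pivot produces.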
Step 3. Choosing the Right Chart Type
Earlier I mentioned that a slope chart would be a good way to visualize this data set, given that several years were missing. I wanted to show the difference from the first year (1973) to the last year (2018) without showing any of the data in between. But I also wanted to show that, despite considerable growth over this 45-year period, each age group was still very far from 100%. With this in mind, I began by building a dot plot that looked like the chart below. This was a good start, but I still needed to show the remaining gap for each age group. For instance, for the 18-34 age group, I wanted to highlight the section from 71% to 100%.
So, I changed the color of the dots in the dot plot to gray, as I would later tie that gray into my title. Next, I thickened the line connecting the two dots (which, if you recall, represent 1973 and 2018) and ended up with this. I liked how simple the visualization was to read: each age group has greatly increased its share of the population answering our question “Not wrong at all” over the years. However, there are still huge gaps before reaching 100%, and it is quite disappointing to think that such a large portion of our society is this close-minded. So, I wanted to make sure to capture the gaps that still remain.
To do this, I would leverage Tableau’s transparent sheets, as well as a tip from a video by Andy Kriebel. I used Andy’s tip to create a rounded bar chart, but instead of starting the bars at 0, I started each one at the 1973 value for its age group, to ensure the bars didn’t extend to the left of the gray dot plot shown above. Here’s how my worksheet was set up to achieve this; you can view Andy’s video above to master the steps required to get there. So, we had two worksheets; now we just needed to build the dashboard and layer one worksheet on top of the other.
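The layering idea can be summarized as two intervals per age group: a blue bar underneath and a gray dumbbell on top, with the uncovered part of the bar being the gap still to close. The sketch below assumes the blue bar runs from the 1973 value to 100% (the post only states where it starts); the 1973 value is a placeholder, while 71% for 18-34 in 2018 comes from the post.

```python
from dataclasses import dataclass

@dataclass
class AgeGroupChange:
    group: str
    start: float  # percent answering "Not wrong at all" in 1973
    end: float    # percent answering "Not wrong at all" in 2018

def layer_segments(row: AgeGroupChange, target: float = 100.0) -> dict:
    """Extents of the two stacked layers for one age group."""
    return {
        "group": row.group,
        # bottom sheet: rounded blue bar from the 1973 value out to the ASSUMED 100% end
        "bar": (row.start, target),
        # top (transparent) sheet: thick gray dumbbell from 1973 to 2018
        "dumbbell": (row.start, row.end),
        # portion of the blue bar left uncovered by the dumbbell
        "remaining_gap": target - row.end,
    }

# 71.0 is from the post; the 1973 value of 15.0 is a placeholder.
example = AgeGroupChange("18-34", start=15.0, end=71.0)
print(layer_segments(example))
```

Because both layers share the same starting point, the only blue left visible is the stretch from the 2018 value to 100%.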
Step 4. Finishing Touches
We didn’t need a big dashboard for this, so I just set mine to a fixed size of 1200px by 500px, and to be honest, I probably could have gone 800px wide. I tiled my title text box, the sheet with the blue rounded bar charts and my footer text boxes, and then laid the gray, thickened dot plot on top of the blue rounded bars. If you’ve never used transparent sheets before, the key is to float the top sheet over the bottom sheet and set its size and position to exactly match the bottom sheet. Also, for the sheet to be transparent, its background must be set to None. Here’s how my floating, transparent sheet was set up.
The last thing to do was add the title, where I tied in the colors to match those in the visualization. Then I added the footer information and some tooltips, and we’re done! I was short on time this week, but I still feel this quick visualization provides a good look at not only how far each age group has come, but also how far there still is to go on this subject. Thanks for reading, I hope you enjoyed it and were able to take away something useful. Have a great day!!
The data set for this week’s #MakeoverMonday is CO2 emissions per capita, per country, with the original visualization (below) showing the trends of nine selected countries from 1960 through 2014. So, what works and what doesn’t work with this chart? I don’t mind a line chart displaying the trends of CO2 emissions by country. However, here are a few things I don’t like about the original. The colors are difficult to deal with, and I would prefer a solid line vs. the dashed line in the original viz. The country labels block the last 5-10 years of the viz, depending on which line you’re following, so that’s not ideal either. I see that, in the original, the user has the option of toggling the labels on or off. But if turning the labels on means they cover part of the viz, I would have gone for an alternative approach to labeling the lines. Alright, let’s get down to business.
Step 1. Understanding the Data
The data set is a nice and easy one to work with, giving us Country Name, Country Code and the CO2 emissions for each year from 1960 to 2014. However, in looking through the data set, the first thing that caught my attention was that there are several additional rows of aggregated data, such as ‘Arab World’ and ‘Caribbean small states’ below. Depending on your analysis, you may want to use these, so just be aware that they are there. If you’re not interested in using them, consider adding a data source filter on Country Name to filter them out before you start building worksheets, so you don’t have to deal with them.
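The data source filter amounts to an exclusion list on Country Name. Here is a minimal sketch of the same idea in Python; the exclusion set contains only the two aggregates named above and is an assumption you would extend after inspecting the file.

```python
# Only the two aggregates named in the post; extend after inspecting the data.
AGGREGATE_NAMES = {"Arab World", "Caribbean small states"}

def drop_aggregates(rows, name_field="Country Name", exclude=AGGREGATE_NAMES):
    """Keep only rows whose name is not in the aggregate exclusion list."""
    return [row for row in rows if row[name_field] not in exclude]

rows = [
    {"Country Name": "Arab World"},
    {"Country Name": "Canada"},
    {"Country Name": "Caribbean small states"},
]
print([r["Country Name"] for r in drop_aggregates(rows)])  # ['Canada']
```

An explicit exclusion list is easy to audit, but any aggregate you forget to add will slip through. That is why it is worth scanning the full list of Country Name values first.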
The only other thing to note is that, when pulling the data set into Tableau, you’ll likely need to take a few small steps to reshape it:
You’ll notice the field names are in row 1 and the headers read F1, F2, F3, etc. To fix this, from the Data Source pane, click on the drop down of the sheet you pulled onto the canvas and select ‘Field Names are in first row.’
Next, the years 1960 to 2018 are in columns and we want those in rows instead, so we’ll pivot our data, giving us a tall data set as opposed to the current wide data set.
To do this, click the 1960 column header, hold Shift, scroll to 2014 and click that header as well. This selects all years from 1960 to 2014. Next, right-click and select Pivot.
Since there’s no data in the years 2015 to 2018, feel free to hide them.
Now, rename your new columns:
Change ‘Pivot Field Names’ to ‘Year’
Change ‘Pivot Field Values’ to ‘Value’
That should leave us with four columns: Country Name, Country Code, Year and Value. Alright, now we’re ready to jump into Sheet 1.
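The reshaping steps above (promote the first row to field names, pivot 1960 to 2014, leave out the empty 2015-2018 columns, rename to Year and Value) can be mirrored in a short Python sketch. It assumes Country Name and Country Code are the first two columns of the raw sheet, and the sample values are placeholders, not real emissions figures.

```python
def reshape(raw: list, first_year: int = 1960, last_year: int = 2014) -> list:
    """Mirror the Data Source pane steps on raw spreadsheet rows."""
    header, *body = raw  # step 1: field names are in the first row
    # steps 2-3: pivot only 1960-2014; the empty 2015-2018 columns are left out
    year_columns = [
        i for i, name in enumerate(header)
        if str(name).isdigit() and first_year <= int(name) <= last_year
    ]
    tall = []
    for row in body:
        for i in year_columns:
            tall.append({
                "Country Name": row[0],   # ASSUMED column position
                "Country Code": row[1],   # ASSUMED column position
                "Year": int(header[i]),   # was 'Pivot Field Names'
                "Value": row[i],          # was 'Pivot Field Values'
            })
    return tall

# Placeholder values, not real CO2 figures.
raw = [
    ["Country Name", "Country Code", "1960", "1961", "2015"],
    ["Canada", "CAN", 1.0, 2.0, None],
]
for record in reshape(raw):
    print(record)
```

The result has the same four columns as the Tableau version: Country Name, Country Code, Year and Value.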
Step 2. Recreating the Original
After taking some time to explore the data, I decided to try something I’m not sure I’ve ever done as part of a #MakeoverMonday: recreate the original visualization using the exact same chart types. So, I’ll make a replica of the line chart and look to incorporate the bar charts into my viz as well. With this approach, I’ve defined three goals:
Make the viz cleaner
Better solution for the labels
Improve the interactivity
Usually my goal for #MakeoverMonday is to come up with a better way to visualize the data through the use of a different chart type. However, with this visualization, I feel the line chart and bar chart are good choices; the line chart just needs to be cleaned up, and the bar chart is a little blah. What better time to try out Tableau’s BRAND NEW Parameter Actions, featured in the recent 2019.2 release?!!
original line chart
original bar chart
Step 3. Effective Use of Color
I took to Tableau and built exact replicas of the original line chart and bar chart. While I changed a few things formatting-wise, the only thing different about the charts themselves is the use of color. Instead of several different colors on the line chart, I used parameter actions to highlight, in red, the country being hovered on. Likewise, I carried this coloring through to the bar chart, which will end up in a viz in tooltip. Here they are below as standalone charts. Using color to highlight a certain country helps the audience see how that country differs from the others.
color to highlight the line
color to highlight the bar
Tableau’s parameter actions are so easy to use. If you’ve used Set Actions before, the setup is very similar. Here’s all that is required for the three parameter actions in my viz.
1. Create a parameter using the Country Name field. I called it Country Parameter. This one parameter will be used in all the parameter actions.
2. Create a Boolean calculation called Country T/F and drag it to both the Size card and the Color card. Then simply adjust the size to your liking for both the True and False values, and do the same for the coloring. I adjusted my sizing and color so that when a country was selected, its line thickened and turned red, while the other countries remained thin gray lines, pushing them to the background but keeping them plenty visible for comparisons. Quick note: I also dropped this calculation on the Color card of my viz in tooltip bar chart, allowing it to highlight the country being hovered on, just like the line chart.
3. Create a calculation that checks whether the Country Name = the Country Parameter. If True, it displays the Country Name; if False, it is blank. I dragged this to the Label card to label the country being highlighted via the parameter actions. All other countries receive no label.
Here are the calculations, as well as the sheet. To get the label to fit at the end of the line chart, I both fixed the Year axis to add a few additional years and added 25 pixels of right outer padding to this sheet once it was dropped onto the dashboard. I could have just added more padding without fixing the axis and gotten the same result.
4. Once the sheet was pulled onto the dashboard it was time to set up the Parameter Actions. This is literally all there is to it; from the Menu go to Dashboard –> Actions –> Add Action –> Change Parameter. An Edit Parameter Action dialogue box will pop up. Simply name your action if you’d like, select your Source Sheets, Target Parameter and Field and then set the action to run on either Hover, Select or Menu. I chose to run the action on Hover, as it made the most sense for the interactivity in this viz.
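To make the logic of steps 1 to 4 concrete, here is a Python stand-in for the two calculations and the hover action. It is not Tableau syntax. It assumes Country T/F is the same comparison the label calculation uses, Country Name = Country Parameter (the post doesn’t spell that formula out). The line widths and country names are illustrative only.

```python
from typing import Optional

def country_tf(country_name: str, country_parameter: Optional[str]) -> bool:
    """Stand-in for the Boolean 'Country T/F' calc (ASSUMED to be a name comparison)."""
    return country_name == country_parameter

def country_label(country_name: str, country_parameter: Optional[str]) -> str:
    """Label calc: show the name only for the highlighted country, else blank."""
    return country_name if country_tf(country_name, country_parameter) else ""

def style(country_name: str, country_parameter: Optional[str]) -> dict:
    """What the Color, Size and Label cards end up showing for one line."""
    highlighted = country_tf(country_name, country_parameter)
    return {
        "color": "red" if highlighted else "gray",
        "size": 3 if highlighted else 1,  # hypothetical line widths
        "label": country_label(country_name, country_parameter),
    }

# A hover parameter action just overwrites the parameter's current value.
country_parameter = "Canada"  # hypothetical: the user hovers on Canada's line
for name in ["Canada", "India"]:
    print(name, style(name, country_parameter))
```

Because every mark reads the same single parameter, one hover changes the color, size and label of every line at once. This is why one parameter can drive all three actions.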
Step 4. Formatting
Alright, with the Parameter Actions set up, it was time to finish this thing off with a little formatting. Here are some formatting steps I took to clean up the viz from its original version.
Changed the Y-axis tick marks to an interval of 5 instead of 2
Changed the X-axis tick marks to an interval of 10 instead of 5
Removed the grid lines
Changed the default axis font to Tableau Book 8pt, bold and gave it a darker color to help push it to the background
Replaced the default tooltips with the viz in tooltip, featuring the bar chart with selected country highlight
Added a small message letting the user know to hover for interactivity
Changed the background to a darker color…just personal preference
There we go, that’s it. Nothing crazy, but I feel like we gave the original viz a nice makeover, cleaning it up and making it more user friendly. The final product is below and you can play around with the interactive version right here. Thanks for reading and have a wonderful day!!
The following blog post takes the reader through the process of building my March Madness Bracket of Champions viz, in Tableau. However, this project involved quite a bit of pre-Tableau work, which I would also like to share, so if you came strictly for the Tableau part, please scroll down to the ‘Building the Viz in Tableau’ section.
Prepping the Viz for Tableau
I first saw data portraits being used in Tableau by Zen Master Neil Richards, in November of 2018, with his TUG data portraits viz. At the time, I was unaware of their origin, but on Neil’s viz he noted that the idea was inspired by Giorgia Lupi, so I did a little research to become more familiar with the concept. It appears Giorgia introduced the idea at TED 2017 in Vancouver, through the creation of buttons for conference attendees, as a way to create connections with other conference goers. Prior to the conference, attendees filled out a series of non-invasive questions that revealed fun facts about them. A design system then turned the answer to each question into a unique set of shapes, colors and symbols. About a month after Neil’s viz, I saw Josh Tapley create a viz of badges as well, his for the Philadelphia Tableau User Group. I loved how creative and beautiful they were, so I knew I wanted to try it out; the only question was what to do.
Inspiration from Giorgia Lupi
Inspiration from Neil Richards
I didn’t want to copy Neil and Josh…although it does seem like a really cool thing for the Twin Cities Tableau User Group to try one of these months!! Instead, I wanted to try something a little different. Being the sports fan I am, it was only natural that my version of data portraits would somehow tie in sports. My initial thought was to make a data portrait for each of the top players in the upcoming NBA Draft. I thought the data from each player’s scouting report could work perfectly for a data portrait, as you would essentially be answering questions, just like on Giorgia’s buttons. What is the player’s position? How tall is the player? What is their biggest strength? However, it was still only December, and with the draft six months away, I simply could not wait that long! So, sticking with the basketball theme, my next thought was to create a bracket where each team is represented by a data portrait. I filed the idea away, and a few months later, with NCAA March Madness looming, I tried creating my first badge. The North Carolina Tar Heels are my favorite college basketball team, so I created the badge below (left), which displayed the following information: the team (logo), the year they won the national championship (1993), their tournament seed that year (#1 seed), the conference they played in (bottom coloring), their win/loss record (34-4), win/loss margin by game (step line chart), and the number of players who would go on to reach the NBA (one star per player). I chose to create a bracket of past champions, as I felt it could be a fun lead-up to the actual tournament, and because fans are always debating which past teams were better. Why not create an interactive bracket, where people could fill out their bracket of past March Madness champions and share it with others?!
I had an idea, but what data would support it? To be honest, I didn’t need much to get started. My initial data set included only the Year, the Champion, their Seed, their win/loss record and their conference. I grabbed it from sports-reference.com/cbb and dumped it into Google Sheets. It looked like this.
From here, I could start building out the team data portraits. Where else would I turn for this step, other than PowerPoint?! For more on combining the powers of Tableau and PowerPoint, be sure to check out this great post from Kevin Flerlage. In his post, Kevin recommends blog posts by Josh Tapley, as well as one by his brother, Tableau Zen Master Ken Flerlage, that introduced him to the concept of mixing Tableau with PowerPoint. The only other data I would end up including was game-by-game margins of victory/defeat for each team (for the step line chart), as well as statistical leaders for each team, which were a late addition to the tooltips.
The Data Portraits
With the initial data in hand, it was off to PowerPoint to create 32 more data portraits, one for each NCAA Men’s Basketball champion, from 1985 through 2018. Basically, all I did here was make copies of the original North Carolina data portrait and then swap out the elements for each of the other teams. For example, to create this Michigan data portrait, I copied the North Carolina one, switched the year, added/removed the appropriate number of stars, changed the seed number and conference color accordingly and finally swapped the logo and line graph and adjusted the win/loss record. The line graphs were made in Tableau, saved as images and brought into PowerPoint. The logos were saved as images from ESPN.com and brought into PowerPoint and then I added an artistic effect under the formatting tab, to give them a little colored pencil look.
Dean Smith’s 2nd title
The Glen Rice Wolverines
It took some patience, but after several hours over the course of a few late nights, I had finally completed all 33 of the data portraits and was ready to start building the bracket! One quick note: the 2013 championship won by the Louisville Cardinals was vacated due to team violations, so I omitted them from the viz.
After taking a stab at ranking the teams myself, it dawned on me that maybe someone else, much more qualified, had already done this work. A quick Google search, and I was delighted to see that, indeed, this had been done, and fairly recently. In April 2018, ESPN Insider John Gasaway had ranked all champions from 1939 to 2018. I compared my rankings against his, and although many of mine were within one or two spots of his, a few, most notably 1995 UCLA, were way off. I had that Bruins squad much higher than Gasaway’s ranking of sixteenth. So, to ensure the seedings in the bracket were legitimate, I decided to follow Gasaway’s rankings, with a few very small tweaks to balance out the bracket and avoid having the same school play another version of itself early on.
Of the 33 teams, there were five instances of Duke, four of North Carolina, four of Connecticut, three of Kentucky and three of Villanova. So, those five schools accounted for 19 of the 33 teams. After far too much time spent jockeying the teams around, I was finally able to produce a bracket in which none of these schools would meet itself until at least the third round. So, with the rankings set, it was time to build the viz.
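The “no rematch before round 3” rule can be checked mechanically instead of by eye. In a single-elimination bracket with slots numbered 0 to N-1 from top to bottom, two slots first share a game in round r when they fall in the same block of 2^r slots. That round equals the bit length of their XOR. The sketch below assumes a 32-slot bracket with each play-in winner occupying a single slot, and the placements shown are hypothetical, not the real bracket.

```python
from collections import defaultdict
from itertools import combinations

def first_meeting_round(slot_a: int, slot_b: int) -> int:
    """Earliest round two bracket slots can meet (round 1 = opening games)."""
    return (slot_a ^ slot_b).bit_length()

def early_rematches(slots: dict, min_round: int = 3) -> list:
    """List same-school pairs that could meet before `min_round`."""
    by_school = defaultdict(list)
    for slot, entry in slots.items():
        school = entry.split(" ", 1)[1]  # entries look like "1993 North Carolina"
        by_school[school].append(slot)
    problems = []
    for school, positions in by_school.items():
        for a, b in combinations(sorted(positions), 2):
            if first_meeting_round(a, b) < min_round:
                problems.append((school, a, b))
    return problems

# Hypothetical placements, not the real bracket.
slots = {0: "1992 Duke", 1: "1993 North Carolina", 3: "2001 Duke", 4: "2010 Duke"}
print(early_rematches(slots))  # [('Duke', 0, 3)]
```

Here, Duke in slots 0 and 3 would meet in round 2, so that placement breaks the rule. Slot 4 sits in the other block of four and could not meet slot 0 or slot 3 until round 3.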
Building the Viz in Tableau
The Set Up
I wanted the viz to have the look of an actual bracket that you might fill out by hand or online, in your local bracket challenge pool. So, in Tableau, once I had the team data portraits placed on the dashboard, I would leverage ninety-two text boxes to draw out the bracket. Each text box was filled with navy blue and set to be 3 pixels tall or wide, depending on its position. Looking back, this part was pretty tedious, but it allowed me to design the bracket exactly the way I wanted it to look, which was nice. Ok, back to the data portraits.
My goal in building this viz was to create a fun March Madness bracket that would become interactive through the use of Tableau Set Actions. If you remember from above, the placement of the teams in the bracket had already been determined, so Step 1 was essentially to create a bracket that had not yet been filled out. To place each team in its position in the bracket, I created a worksheet like the one below for each of the sixteen first-round match-ups and then floated (don’t hate me, Team Tiled!!) each worksheet on the dashboard. Side note: this dashboard is literally a Team Tiled member’s worst nightmare, as there are somewhere in the neighborhood of 150 floating objects on it.
Setting up the bracket
Calc to separate out ’93 UNC from ’05 UNC, etc
I used the ‘Bracket’ field to filter each worksheet to its appropriate bracket and then the ‘Seed 1’ field to filter to the correct match-up. To account for schools with multiple championships, I then created a calculated field called ‘Year+Team’, which combined the ‘Year’ and ‘Champion’ fields. Pulled onto the Shapes card, this allowed me to assign one data portrait per champion. Once this part was complete, I was left with eighteen sheets (originally seventeen) to float onto the dashboard. Why eighteen, and originally seventeen? The original viz was built prior to the 2019 tournament and featured one “play-in” game. The play-in game was built using two sheets instead of one, so that’s how we get to seventeen sheets. I later updated the viz after the 2019 tournament to include the 2019 champion Virginia Cavaliers, following their miracle run to the title; I was fortunate enough to see the last two games of that run in person, at the Final Four in Minneapolis. What an amazing sports experience!! Anyway, adding Virginia led to the need for another play-in game, thus adding another sheet and getting us to eighteen. Alright, with the bracket set up, next up was adding the interactivity.
The interactivity was set up with a few simple steps, which were repeated for each game throughout the tournament.
1. Create a Set for each game in the bracket. Each Set looked identical to the one pictured below. The set was created using the Year+Team field, and I left all boxes unchecked to ensure the worksheets that would later be dropped onto the dashboard were blank until the Set Actions were added.
2. I then created a Boolean (T/F) calculation for each game like the one shown below, created a sheet for each game in the tournament and dragged the Boolean calculations for each game onto the Filters shelf of their respective sheets, setting them all to True. This would ensure that once the Set Actions were in place, the blank sheets would populate with the expected data portrait.
3. Next, the sheets needed to be placed (floated) onto the dashboard, into their positions within the bracket. I floated them on the bracket as shown in the picture below.
4. Lastly, we needed to add in the Set Actions. Once again, there are 31 games, so we need 31 Set Actions. In the example below, we’re using Source Sheet 2.1, which contains the 1995 UCLA Bruins and the 2016 Villanova Wildcats. We tell the Set Action to target the Game 2 Set, which was set to True on the blank sheet named 2.2. Then we click OK, and back on the dashboard, if we click the UCLA data portrait on Sheet 2.1, we see them advance into the second round of the tournament, onto Sheet 2.2. Every other Set Action is set up just like this, and together they provide the dashboard interactivity.
Source Sheet 2.1 | Target Set Game 2 Set which is on Sheet 2.2
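Here is one way to picture what each game’s set, Boolean filter and Set Action do together, modeled in Python rather than Tableau. It assumes the action assigns the clicked value to the set, replacing whatever was there. The `Game` class and its method names are hypothetical, made up for illustration.

```python
def year_team(year: int, champion: str) -> str:
    """Stand-in for the 'Year+Team' calc, so '93 UNC and '05 UNC stay distinct."""
    return f"{year} {champion}"

class Game:
    """Hypothetical model of one bracket game: a set plus the blank sheet it drives."""

    def __init__(self, name: str):
        self.name = name
        self.members = set()  # like a set created with every box left unchecked

    def set_action(self, clicked_mark: str) -> None:
        # ASSUMED behavior: clicking a mark on the source sheet replaces the set's values
        self.members = {clicked_mark}

    def target_sheet(self, marks: list) -> list:
        # Boolean filter kept at True: show only marks that are in the set
        return [m for m in marks if m in self.members]

marks = [year_team(1995, "UCLA"), year_team(2016, "Villanova")]
game2 = Game("Game 2")
print(game2.target_sheet(marks))  # [] because Sheet 2.2 starts blank
game2.set_action(year_team(1995, "UCLA"))  # click UCLA on Sheet 2.1
print(game2.target_sheet(marks))  # ['1995 UCLA']
```

Using Year+Team as the set’s field is what lets two different North Carolina champions advance independently. A plain school name would treat them as the same member.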
The Viz in Tooltips
Lastly, while I felt the data portraits provided great high-level information about each champion, what they lacked was any information about the players. So, I pulled some more data from sports-reference.com/cbb and added a tooltip that provided a zoomed-in view of the data portrait on the left-hand side and, on the right-hand side, each team’s statistical leaders in three main categories: points, rebounds and assists. The ’94 Arkansas Razorbacks were one of my all-time favorite college teams…and it didn’t hurt that they also beat Duke in the title game!!
Before wrapping up, I want to give a shout out to Kevin Flerlage for some fantastic feedback throughout the whole process of building this viz. Kevin helped me with some decisions regarding the tooltips and a nice clean way of executing a “clear bracket” option, among other great input. Also, when I was in the early stages of building out the viz, I thought it was a pretty cool idea. But after getting it to a point where it could be shared with others, for feedback, Kevin’s reaction and genuine excitement for the viz made me that much more motivated to get this thing across the finish line. Also, a big thanks to my co-workers Jim Van Sistine and Tom Coyer for providing their feedback as well and last, but not least, my friend Jason Underdahl, who said of the initial data portrait “why do you have the logo grayed out? You can barely even see it!” That’s tough love, but he had a good point! Adding the color back to the logos really made them pop!!
Thanks for reading, I hope you enjoyed this post and found it useful.
For this week’s #MakeoverMonday, we’re looking into cost effectiveness in Major League Baseball. More specifically, how does a player/team salary translate into productivity on the field, across a variety of statistical categories. For instance, if Player X made $20 million in 2015 and hit 20 home runs that year, you could say the team is paying $1 million per home run hit by Player X. Alright, let’s get started.
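The cost-effectiveness metric is just salary divided by a counting stat. Here is a minimal sketch; returning infinity when the stat is zero is my own choice for illustration, and you may prefer to drop such rows instead.

```python
def cost_per_stat(salary: float, stat_total: int) -> float:
    """Salary paid per unit of a counting stat (home runs, hits, RBI, ...)."""
    if stat_total <= 0:
        return float("inf")  # no production: cost per unit is unbounded
    return salary / stat_total

# The post's example: $20 million and 20 home runs.
print(cost_per_stat(20_000_000, 20))  # 1000000.0
```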
Step 1. Understanding the Data
Since I’m a lifelong baseball fan, this is a data scenario that is familiar to me. However, Andy and Eva did a great job of including links on the data sets page for those who may be less familiar with the sport. If this were, say, rugby data, I would absolutely be diving into those resources, so if you ever feel uncomfortable with a data set, be sure to do a little research. One thing I will mention is that the data set appears to focus on hitting stats only and does not include pitching stats. However, pitchers are included in the data set, because they do compile hitting stats in certain situations. If you’re unfamiliar with the rules and why this could matter for this data set, here are a few notes:
Pitchers ONLY hit in games that are played in National League ballparks
Starting pitchers ONLY start every 4th or 5th game
MOST (not all) pitchers are not very good at hitting the ball
MOST (not all) good pitchers have high salaries
So, if the average National League pitcher starts every 5th game (32 starts) and gets three plate appearances per game, that comes to 96 plate appearances for the season. So why does any of this matter? As we mentioned, pitchers typically aren’t great at hitting the ball, so their hitting stats could look very poor when compared to the average position player (all players on the field other than the pitcher are referred to as position players). So, if we’re analyzing the cost a team pays players per home run, for instance, let’s look at an example of what it could look like when comparing a pitcher vs. a good position player.
Pitcher X
makes $25 million and hits 1 home run in 96 plate appearances
This would suggest we pay Pitcher X $25 million per home run hit
Position Player Y
makes $25 million and hits 25 home runs in 432 plate appearances
We pay Position Player Y $1 million per home run hit
This scenario would lead you to believe that Position Player Y is a much more cost-effective player, when the reality is simply that he is paid to hit the ball, while Pitcher X is paid primarily to pitch it. And since the data set does not include a field for each player’s position, we’re unable to simply filter pitchers out of the data set. Therefore, it may make sense to filter on plate appearances with a minimum of 200 per season. This would filter the pitchers out, and in my opinion, it does not make sense to include them in this analysis. I apologize for the long-winded explanation, but at first glance at the data, I saw this as something that could trip up participants who may not be familiar with the game. Ok, what about the original viz?
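The Pitcher X vs. Position Player Y comparison and the suggested 200 plate appearance minimum can be checked in a few lines. The two players are the post’s hypotheticals; the threshold is the one suggested above.

```python
MIN_PLATE_APPEARANCES = 200  # minimum suggested in the post

players = [
    # the post's two hypothetical players; 32 starts x 3 PA = 96 for the pitcher
    {"name": "Pitcher X", "salary": 25_000_000, "hr": 1, "pa": 32 * 3},
    {"name": "Position Player Y", "salary": 25_000_000, "hr": 25, "pa": 432},
]

def cost_per_hr(p: dict) -> float:
    """Salary per home run; infinite when the player hit none."""
    return p["salary"] / p["hr"] if p["hr"] else float("inf")

qualified = [p for p in players if p["pa"] >= MIN_PLATE_APPEARANCES]
for p in players:
    status = "kept" if p in qualified else "filtered out"
    print(f'{p["name"]}: ${cost_per_hr(p):,.0f} per HR, {p["pa"]} PA -> {status}')
```

The pitcher’s $25 million per home run is the kind of outlier that would distort the analysis, and the plate appearance filter removes him without needing a position field.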
Step 2. The Original Viz
The scatter plots on the original viz are easy enough to understand, but to be honest, the way they were labeled made them difficult for me to follow, especially the bottom (team) section. Also, I didn’t find the team section all that interesting, because it was basically just showing us which team had the lowest payroll (Houston Astros) and which teams had the highest payrolls (New York Yankees and Los Angeles Dodgers). I guess the most interesting part of the team section was that it tells us just how unbelievably bad the Miami Marlins were, offensively, in 2013. Wow!! Last in the league in all five categories.
Step 3. Try New Things
A while back, I saw this really good video by Andy on how to build a no-whisker box plot, and I have been waiting for the right opportunity to try creating something similar. I was hopeful this data set would provide that opportunity, but after working through a few different scenarios, I was unhappy with the results. So, we’ll file that chart type away for another day and move on to something else. Another recent viz I really liked was this beautiful viz by Lindsey Poulter, which used a stepped line and dot combo chart to capture the magical 2018 season of Kansas City Chiefs QB Patrick Mahomes. In his #WorkoutWednesday challenge for Week 4 of 2019, Curtis Harris built a similar chart that tracked headcount. I really loved not only the look of these vizzes, but also how easy they are to understand. So, I decided to go with this chart type, but the question was: what data would it work well with? It’s worth mentioning that in a business setting, choosing your chart type first is probably not the best approach. However, one of the great things about Tableau Public and community projects like #MakeoverMonday is that they give us opportunities to try new things and approach data visualization in different ways, in a safe environment.
Step 4. Finding the Story
The next step was to begin playing around with the data to find a story that fit the vision I had in my head. Early on, I ruled out looking at team data, as I wanted to focus on players instead. Looking at hits, runs, RBI and home runs, I worked through some different ideas before landing on a viz that would feature the most recent members of the 500 home run club. Leveraging the stepped line and dot combo chart, I felt it would be fun to visualize each player’s home runs by season, along with their team’s cost per home run (or the player’s salary per home run, whichever way you prefer to look at it). I expected that as a player’s salary increased throughout their career, their salary per home run would rise fairly closely along with it. While this was true in a majority of cases, it certainly was not the case for all players on the list, and in other cases, the increase in cost per home run was not as steep as I had guessed. Now that I had found a story, the next step was communicating it with a clean, engaging design.
Step 5. Simplicity in Design
I used just two colors in the viz, with a third for non-data-related text. My colors were shades of red and blue, the colors of the MLB logo. For the chart, I set home runs as the stepped line and salary per home run as the sized dots. Here’s what it looked like.
The chart looked nice, but something very important was missing: salary by season. I immediately thought back to a fantastic blog post by Ryan Sleeper, in which he shares creative ways to use transparent sheets to add context to your vizzes. Context was exactly what I needed. The moment I saw Ryan’s bar chart trend pushed to the background, I fell in love with it. So, I implemented this strategy, with player salary set as the trend in the background at an opacity of 20%. This way it would be there for context without drawing attention away from the main chart. With home runs set to a running total over each player’s career, this worked out well, because as home runs increased, a player’s salary typically increased as well. So, for the most part, the two rose together. Here’s what it looked like after floating the stepped line on top of the bar chart.
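For anyone following along outside Tableau, a career running total simply accumulates each season’s home runs in season order, similar in spirit to Tableau’s RUNNING_SUM table calculation. A minimal Python sketch, where `career_running_total` is a hypothetical helper and the player’s numbers are made up rather than taken from the data set:

```python
from itertools import accumulate


def career_running_total(home_runs_by_season):
    """Given (season, home_runs) pairs, return (season, cumulative_home_runs)
    pairs in season order."""
    ordered = sorted(home_runs_by_season)            # sort by season
    seasons = [season for season, _ in ordered]
    totals = accumulate(hr for _, hr in ordered)     # 20, 20+25, 20+25+30, ...
    return list(zip(seasons, totals))


# Hypothetical player, not real data; input deliberately out of order
print(career_running_total([(2003, 30), (2001, 20), (2002, 25)]))
# [(2001, 20), (2002, 45), (2003, 75)]
```

Because a running total can only go up, the stepped line climbs steadily across a career, which is why it pairs naturally with a salary trend that also tends to rise.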
To add more context, I included text for each player’s career salary, home runs and salary per home run, as well as the years they played. Lastly, I wanted the reader to be able to compare all three measures across players, so I fixed the y-axis for the home runs stepped chart from 0 to 800, fixed the salary bar chart from $0 to $40 million and fixed the salary per home run dot size from $0 to $5 million. Below is a view after adding the text. Since I didn’t show any axes, I included an explanation through an info button.
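Why fix the axes? If each player’s chart auto-scaled to its own maximum, every career would fill the same height and the differences between players would disappear. The sketch below is a simplified, hypothetical model of mark height on a zero-based axis; the `bar_height` helper and the two players’ totals are made up for illustration, and only the 0 to 800 home run range comes from the viz.

```python
def bar_height(value, axis_max, plot_height=100.0):
    """Height of a mark on an axis that starts at zero and ends at axis_max."""
    return plot_height * value / axis_max


# Hypothetical career home run totals for two players
careers = {"Player A": 700, "Player B": 500}

# Auto-scaled: each chart's axis ends at that player's own maximum
auto = {p: bar_height(hr, axis_max=hr) for p, hr in careers.items()}

# Fixed: every chart shares the same 0 to 800 axis
fixed = {p: bar_height(hr, axis_max=800) for p, hr in careers.items()}

print(auto)   # {'Player A': 100.0, 'Player B': 100.0}
print(fixed)  # {'Player A': 87.5, 'Player B': 62.5}
```

With auto-scaling, both players look identical; with the shared axis, Player A’s line visibly ends higher, which is the comparison the fixed axes are meant to preserve.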
Step 6. Sense Checking the Data
It was not until after building the entire viz that I really took the time to look closely at the numbers to make sure everything made sense. Guess what? It didn’t!! For a while, I couldn’t figure out why a few players had extreme spikes in their salaries. Take a look at the comparison below, from before and after I found the issue. Look at those spikes!! Why on earth would Gary Sheffield randomly make nearly $30 million in one season and then go back to making $9-10 million for the next several years? Answer? He wouldn’t, so there was clearly something wrong with either the data or one of my calculations.
incorrect salary figures
corrected salary figures
After digging around, here’s what I found. My ‘FIXED Player Salary’ calculation had originally been set up as SUM([Salary]), because I had not taken into account that a player who played for more than one team during the same season would have more than one row of data for that season. Here’s what the incorrect calculation and its result looked like.
I was certain $29.9 million was incorrect, but I also wanted to be sure that the $14.9 million figure was correct, so I checked the trusty old baseball-reference.com and saw that the numbers matched: Sheffield did indeed make that amount in 1998. So, I needed to change my calculation to pull in MIN([Salary]) instead of SUM([Salary]).
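The double-counting is easy to reproduce outside Tableau. This Python sketch groups rows by player and year, the way a per-player, per-season calculation would, and compares summing against taking the minimum. It assumes, as the Sheffield numbers suggest, that each team row repeats the player’s full season salary; the player, teams, salaries and the `season_salary` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: a player traded mid-season appears once per team,
# and (assumed here) each row repeats the full season salary.
rows = [
    {"player": "Player Z", "year": 2001, "team": "AAA", "salary": 10_000_000},
    {"player": "Player Z", "year": 2001, "team": "BBB", "salary": 10_000_000},
    {"player": "Player Z", "year": 2002, "team": "BBB", "salary": 11_000_000},
]


def season_salary(rows, agg):
    """Aggregate salary per (player, year) with the given function."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["player"], r["year"])].append(r["salary"])
    return {key: agg(values) for key, values in groups.items()}


print(season_salary(rows, sum))  # 2001 is double-counted as 20,000,000
print(season_salary(rows, min))  # 2001 is reported as 10,000,000
```

MIN (or MAX) works only because the duplicated rows carry the same value; if a data set instead split a traded player’s salary across teams, SUM would be the right aggregation.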
Overall, I enjoyed working with the data set this week and wound up spending a lot more time on this viz than on a typical #MakeoverMonday, mostly from just playing around and exploring the data. Below is a look at my final viz; the interactive version can be found here. Thanks for reading, and I hope you enjoyed it!!
This week for #MakeoverMonday, the data set is ISS Spacewalks, which looks at the 216 spacewalks at the International Space Station from December 7, 1998 through April 8, 2019. The original visualization (pictured below) shows the difference between spacewalks in U.S. spacesuits and those in Russian spacesuits. Let’s get started.
Step 1. Understand the Data
There were two data sets this week. For my analysis, I’ll be sticking to the ISS Spacewalks data set, which is super simple to understand. However, if you choose to build a visualization using the Spacewalk Detail data set, you may want to do a little research to get a better idea of the history of the various missions over the years, as that data set contains more information about each individual mission. Some light reading could help you tell a more compelling story, should you choose to go that route. Ok, back to the ISS Spacewalks data set. Here’s what it looks like: we have Year, Nation and Number of Spacewalks. Yep, that’s it. So, what did we like about the original viz?
Step 2. The Original Viz
I liked the original viz, as it made it easy to see that since 1998, many more spacewalks have been performed by American astronauts than by Russian astronauts. However, stacked bar charts can be difficult to read, so my goal was to move away from that chart type in favor of an easier-to-read one that would highlight the fact that the United States has had more spacewalks than Russia in all but three years.
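Finding those exceptions is a simple pivot of the long-format data (one row per year and nation) into a per-year comparison. A minimal Python sketch; the `years_russia_ahead` helper is hypothetical, the nation labels "United States" and "Russia" are assumptions about how the Nation column is spelled, and the counts are made up for illustration.

```python
from collections import defaultdict


def years_russia_ahead(rows, us="United States", russia="Russia"):
    """rows: (year, nation, spacewalks) tuples, mirroring the Year / Nation /
    Number of Spacewalks columns. Returns the sorted years in which
    Russia's count exceeds the United States' count."""
    by_year = defaultdict(lambda: defaultdict(int))  # year -> nation -> count
    for year, nation, count in rows:
        by_year[year][nation] += count
    return sorted(y for y, counts in by_year.items() if counts[russia] > counts[us])


# Hypothetical counts, not the real ISS figures
sample = [
    (2004, "United States", 1), (2004, "Russia", 3),
    (2005, "United States", 5), (2005, "Russia", 2),
]
print(years_russia_ahead(sample))  # [2004]
```

A nation missing from a given year counts as zero, so a year with only U.S. spacewalks is never flagged.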
Step 3. Keeping it Simple
My main goal here was to display the data in a way that is easy to understand. When highlighting the variance between American and Russian spacewalks from year to year, I feel the bar-in-bar chart does a better job than the stacked bar. Of course, other chart types would work as well, but we’ll go with the bar-in-bar this time. Now, I’m also a big fan of work that is visually appealing, and bar charts aren’t the most elegant, especially when you put one inside another. So, after starting out with a regular old bar-in-bar, I quickly decided I needed something a little more pleasing to the eyes…enter the rounded bar chart. This video by Andy Kriebel is a great resource if you’re not sure how to make them in Tableau. Below are the final charts; my eyes are much more drawn to the rounded version. How about yours?
Step 4. Finishing Touches
So, with the chart built, you can quickly see that 2004, 2013 and 2014 are the only three years in which Russia had more spacewalks than the United States. You’ll also notice I stuck with the colors from the original viz (though I softened them a bit). I used them because they are familiar to anyone who has seen the original, and I wanted them to be the only colors used, other than a shade of gray for text not related to the U.S. or Russia. A few other changes I made were adding a y-axis for the number of spacewalks, removing the grid lines and rotating the year labels so they are easier to read. I went with a shade of black for my background, added a tooltip, title and some info in the footer area, and was ready to publish. Here’s what the final viz looks like. An interactive version can be found on my Tableau Public page, here. Thanks for reading!!
In October of 2017, I wrote up some Data and Analytics related goals that I wanted to achieve by the end of the 2018 calendar year. My list contained six goals, all of which I felt were achievable but would take dedication and hard work. At the end of 2018, I sat down to review this list and realized for the first time that while the fifth item on my list was checked complete, it hadn’t really been FULLY completed. The item read as follows: ‘Improve and Learn Every Week.’ While I felt 2018 had seen me take some big steps forward in my journey, I didn’t feel I had improved and learned EVERY week throughout the year. So, in 2019 I wanted to do better. Realizing there is a ton of room for me to grow in this space and endless resources available, I decided that tracking my growth would not only accelerate my learning and hold me accountable, but would also allow me to eventually share the idea with others!! And so, #PlanToGrow was born.
I started by compiling lists of blogs, vlogs, videos, podcasts, books, community projects, anything and everything I wanted to learn around the topic of Data and Analytics in 2019. With lists good enough to get me through at least the first few months of the year, I felt ready to get started. Each week throughout 2019, I would Tweet out what I planned to learn for that week. Then, after checking an item off the list, I would track it in a Google Sheets document, because I eventually wanted to visualize the results in Tableau. But, thanks to Sarah Bartlett and #IronQuest, I was able to make the visual part of the project a reality much sooner than it otherwise would have happened. #IronQuest’s February theme was “business dashboards,” so I thought: what better time than now to put my data into a visual form, start tracking my progress in Tableau and share the templates with the community?
Learning and Tracking
To be honest, I put no thought whatsoever into where I would track my data, but instead went right to Google Sheets, where I felt comfortable and knew it could easily be shared once the time came. So, my tracker began, and it has evolved slightly over the past month or so. It may not be elegant, but it works. Here’s how. As mentioned earlier, at the beginning of each week, typically on Sunday night or Monday morning, I’ll Tweet out my plan for the week. That Tweet will look something like…
…then, throughout the week, as items are completed, they are tracked in the Google Sheets document. I shared the project at the February Twin Cities Tableau User Group meeting and was asked how I stay disciplined in my tracking. The answer is that Google Sheets is almost always open on my computer, so on good days, as soon as I finish reading a blog post or watching a video, I’ll turn right to Google Sheets and enter the data. Otherwise, the app is on my phone, which has helped tremendously as well. With blogs/vlogs being the easiest resources to consume, after a couple of weeks I found that rather than bounce around, it was a good idea to focus on learning from just one individual each week. That way, in a worst-case scenario, if I dropped the ball on tracking for that week, quickly pulling up that person’s blog would let me scroll through the content and know whether or not I had read/watched each item. And with so many amazing people in our community having more than enough content on their blogs for me to get through in a week, this just seemed to make sense.
Sharing With Others
For me, the most exciting part of this project is the opportunity to share the idea and materials with others. Having been engaged in the Tableau community for just over 18 months now, I know firsthand just how overwhelming it can be. There is SO MUCH amazing information out there, but sometimes it can be difficult to even know where to begin. In 2018, on many occasions my free/learning time was unproductive, because too much of it was wasted trying to figure out which blog post to read, which video to watch or which viz to download and reverse-engineer. Thanks to #PlanToGrow, my learning experience in 2019 has been much more relaxing and enjoyable. It really is amazing what a difference being prepared can make.
From beginning to end, the #PlanToGrow project has helped me in several ways. First and foremost, planning ahead helps me stay focused and on track each week…no more wasted time! Staying disciplined in tracking has kept that part of the project from falling completely off the rails, and without consistent tracking, there would be no final piece of the project: the visual component in Tableau. The visuals allow me to quickly and easily see how I’m tracking against my overall goals and what each week looks like. Additionally, I can see how active my community participation has been and on which days of the week I need to be more productive. Shocker…I’m least productive on the weekends! All that is great, but my favorite part of the Tableau viz is the combination of Set Actions and URL Actions that allows lightning-quick access to every piece of material I’ve covered in 2019. So many times in the past, I would read a great blog post or watch a useful video and then, several weeks later, come across a situation where that very blog post or video would come in handy. But in some cases, I would be left asking myself “what was that video called?” or “who wrote that blog post, again?” or “I swear I saw that on Twitter, right?” only to either fail to find it again or waste too much time finding it. There will be no more of that…every great learning resource will now be a section of the treemap, easily accessible either directly within the treemap or by first selecting a category from the bar chart to narrow down the options within the treemap. I love this!!
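The day-of-week view boils down to counting completed items by weekday. A minimal Python sketch, assuming a hypothetical tracker export with one row per completed item (date completed, resource type); that two-column layout and the `items_per_weekday` helper are illustrative assumptions, not the template’s actual structure, and the dates are made up.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def items_per_weekday(log):
    """Count completed items by weekday; date.weekday() returns 0 for Monday."""
    return Counter(WEEKDAYS[d.weekday()] for d, _resource in log)


# Hypothetical tracker rows: (date completed, resource type)
log = [
    (date(2019, 2, 4), "Blog"),     # a Monday
    (date(2019, 2, 5), "Video"),    # a Tuesday
    (date(2019, 2, 5), "Blog"),
    (date(2019, 2, 9), "Podcast"),  # a Saturday
]

counts = items_per_weekday(log)
print(counts.most_common())  # [('Tuesday', 2), ('Monday', 1), ('Saturday', 1)]
```

Days with no items, such as Sunday here, simply read as zero from a Counter, which makes a weekend gap easy to spot.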
If you would like to create your own #PlanToGrow Tracker, all you need to do is follow a few simple steps. First, download a copy of the 2019 #PlanToGrow Tracker Template from my Google Drive, which can be found here. This document includes a Data tab as well as a Directions tab that explains the process of making it your own. Second, download the Tableau workbook and replace my data with yours. The workbook can be found here. That’s it!! If you’d like, feel free to share your workbook on Twitter and tag me (@JtothaVizzo) so I can see it.