#MakeoverMonday Week 2019-22 Diary

The data set for this week’s #MakeoverMonday is CO2 emissions per capita, per country, with the original visualization (below) showing the trends of nine selected countries, from 1960 through 2014. So, what works and what doesn’t work with this chart? I don’t mind a line chart displaying the trends of CO2 emissions by country. However, here are a few things I don’t like about the original. The colors are difficult to deal with and I would prefer a solid line vs. the dashed line in the original viz. The country labels block the last 5-10 years of the viz, depending on what line you’re following, so that’s not ideal either. I see in the original, the user has the option of toggling the labels on or off. But, if you turn the labels on and they end up covering part of the viz, I would have gone for an alternative approach to labeling the lines. Alright, let’s get down to business.

 

Step 1. Understanding the Data

The data set is a nice and easy one to work with, giving us Country Name, Country Code and the CO2 emissions for each year, from 1960 to 2014. However, in looking through the data set, the first thing that caught my attention was there are several additional rows of aggregated data, such as ‘Arab World’ and ‘Caribbean small states’ below. Depending on your analysis, you may want to use these, so just be aware that they are there. If not interested in using them, consider throwing a data source filter on Country Name, before jumping into Tableau, and filtering these out, so you don’t have to deal with them.

data

The only other thing with the data set is when pulling it into Tableau, you’ll likely need to take a few small steps to reshape the data;

  • You’ll notice the field names are in row 1 and the headers read F1, F2, F3, etc. To fix this, from the Data Source pane, click on the drop down of the sheet you pulled onto the canvas and select ‘Field Names are in first row.’
  • Next, the years 1960 to 2018 are in columns and we want those in rows instead, so we’ll pivot our data, giving us a tall data set as opposed to the current wide data set.
    • To do this click on the header of the year 1960, hold shift and scroll to 2014, click on that as well. This will select all years from 1960 to 2014. Next, right-click and select pivot.
    • Since there’s no data in the years 2015 to 2018, feel free to hide them.
    • Now, rename your new columns;
      • Change ‘Pivot Field Names’ to ‘Year’
      • Change ‘Pivot Field Values’ to ‘Value’
  • That should leave us with four columns; Country Name, Country Code, Year and Value. Alright, now we’re ready to jump into Sheet 1.

Step 2. Recreating the Original

After taking some time to explore the data, I decided to try something I’m not sure I’ve ever actually done as part of a #MakeoverMonday and that is to make a recreation of the original visualization with the exact same chart types. So, I’ll make a replica of the line chart and look to incorporate the bar charts into my viz as well. With this approach, I’ve defined three goals;

  • Make the viz cleaner
  • Better solution for the labels
  • Improve the interactivity

Usually my goal for #MakeoverMonday is to come up with a better way to visualize the data through the use of a different chart type. However, with this visualization, I feel the line chart and bar chart are good choices, the line chart just needs to be cleaned up and the bar chart is a little blah. What better time to try out Tableau’s BRAND NEW Parameter Actions, featured in the recent 2019.2 release?!!

 

Step 3. Effective Use of Color

I took to Tableau and built exact replicas of the original line chart and bar chart. While I changed a few things formatting-wise, the only thing different with the charts themselves is the use of color. Instead of several different colors on the line chart, I used parameter actions to highlight, in red, the country being hovered on. Likewise, I followed this coloring through to the bar chart, which will end up in a viz in tooltip. Here they are below, as stand alone charts. Using color to highlight a certain country helps the audience to see how that country differs from the others.

 

Tableau’s parameter actions are so easy to use. If you’ve used Set Actions before, the set up is very similar. Here’s all that is required for the three parameter actions in my viz.

  1. Create a parameter using the Country Name field. I called it Country Parameter. This one parameter will be used in all the parameter actions.

countryparam

 

 

 

 

 

 

 

 

2.  Create a Boolean calculation called Country T/F and drag it to both the size card and the color card. Then simply adjust the size to your liking for both the True and False values and do the same for coloring. I adjusted my sizing and color, so when a Country was selected, the line thickened and turned red in color, while the other countries are thin gray lines, pushing them to the background but keeping them plenty visible for comparisons. Quick note: I also dropped this calculation on the color card of my viz in tooltip bar chart, allowing it to highlight the country being hovered on, just like the line chart.

3. Create a calculation that checks to see if the Country Name = the Country Parameter. If True, then it displays the Country Name, if False then it is blank. I dragged this to the label card to label the country being highlighted via the parameter actions. All other countries will receive no label.

Here are the calculations as well as the sheet. To get the label to fit at the end of the line chart, I both fixed the Year axis to add a few additional years and also added 25 pixels of right outer padding to this sheet, once it was dropped onto the dashboard. I could have just done more padding without fixing the axis and got the same result.

 

linesheet

4. Once the sheet was pulled onto the dashboard it was time to set up the Parameter Actions. This is literally all there is to it; from the Menu go to Dashboard –> Actions –> Add Action –> Change Parameter. An Edit Parameter Action dialogue box will pop up. Simply name your action if you’d like, select your Source Sheets, Target Parameter and Field and then set the action to run on either Hover, Select or Menu. I chose to run the action on Hover, as it made the most sense for the interactivity in this viz.

paramaction

Step 4. Formatting

Alright, with the Parameter Actions set up, it was time to finish this thing off with a little formatting. Here are some formatting steps I took to clean up the viz from its original version.

  • Changed the Y-axis tick marks to an interval of 5 instead of 2
  • Changed the X-axis tick marks to an interval of 10 instead of 5
  • Removed the grid lines
  • Changed the default axis font to Tableau Book 8pt, bold and gave it a darker color to help push it to the background
  • Replaced the default tooltips with the viz in tooltip, featuring the bar chart with selected country highlight
  • Added a small message letting the user know to hover for interactivity
  • Changed the background to a darker color…just personal preference

There we go, that’s it. Nothing crazy, but I feel like we gave the original viz a nice makeover, cleaning it up and making it more user friendly. The final product is below and you can play around with the interactive version right here. Thanks for reading and have a wonderful day!!

co2snip

Tableau Set Actions – creating an interactive March Madness bracket

topphoto

The following blog post takes the reader through the process of building my March Madness Bracket of Champions viz, in Tableau. However, this project involved quite a bit of pre-Tableau work, which I would also like to share, so if you came strictly for the Tableau part, please scroll down to the ‘Building the Viz in Tableau’ section.

Prepping the Viz for Tableau

The Inspiration

I first saw data portraits being used in Tableau by Zen Master, Neil Richards, in November of 2018, with his TUG data portraits viz. At the time, I was unaware of their origination, but on Neil’s viz he included that the idea was inspired by Giorgia Lupi, so I did a little research to become more familiar with the concept. It appears Giorgia introduced the idea at TED 2017 in Vancouver, through the creation of buttons for conference attendees, as a way to create connections with other conference goers. Prior to the conference, attendees filled out a series of non-invasive questions that revealed fun facts about them. A design system then turned the answer to each question into a unique set of shapes, colors and symbols. About a month after Neil’s viz, I saw Josh Tapley create a viz of badges as well, his for the Philadelphia Tableau User Group. I loved how creative and beautiful they were, so knew I wanted to try it out, the only question was what to do?

The Idea

I didn’t want to copy Neil and Josh…although it does seem like a really cool thing for the Twin Cities Tableau User Group to try one of these months!! Instead I wanted to try something a little different. Being the sports fan I am, it was only natural that my version of data portraits would somehow tie in sports. My initial thought was to make a data portrait for each of the top players in the upcoming NBA Draft. I thought the data from each player’s scouting report could work perfectly for a data portrait, as you would essentially be answering questions, just like on Giorgia’s buttons. What is the player’s position? How tall is the player? What is their biggest strength, etc? However, it was still only December and with the draft still six months away, I simply could not wait that long! So, sticking with the basketball theme, my next thought was to create a bracket, where each team is represented by a data portrait. So, I filed away the idea and a few months later, with NCAA March Madness looming, tried creating my first badge. The North Carolina Tar Heels are my favorite college basketball team, so I created the below (left), badge, which displayed the following information; the team (logo), the year they won the national championship (1993), their tournament seed that year (#1 seed), the conference they played in (bottom coloring), their win/loss record (34-4), win/loss margin by game (step line chart), and the number of players who would go on to reach the NBA (one star per player). I chose to create a bracket of past champions, as I felt it could be a fun lead up to the actual tournament and because fans are always debating which past teams were better, etc. Why not create an interactive bracket, where people could fill out their bracket of past March Madness champions and share it with others?!

Screen Shot 2019-04-16 at 2.13.20 PM

The Data

I had an idea, but what did the data look like, that would support the idea? To be honest, I didn’t need much to get started. My initial data set included only the Year, the Champion, their Seed, their win/loss record and their conference. I grabbed it from sports-reference.com/cbb and dumped it into Google sheets. It looked like this.

Screen Shot 2019-04-16 at 2.34.29 PM

From here, I could start building out the team data portraits. Where else would I turn for this step, other than PowerPoint?! For more on combining the powers of Tableau and PowerPoint, be sure to check out this great post from Kevin Flerlage. In his post, Kevin recommends blog posts by Josh Tapley and one by Kevin’s brother and Tableau Zen Master, Ken Flerlage, that introduced him to the concept of mixing Tableau with PowerPoint. The only other data I would end up including was game by game margins of victory/defeat for each team (for the step line chart), as well as statistical leaders for each team, which was a late addition to the tooltips.

The Data Portraits

With the initial data in hand, it was off to PowerPoint to create 32 more data portraits, one for each NCAA Men’s Basketball champion, from 1985 through 2018. Basically, all I did here was make copies of the original North Carolina data portrait and then swap out the elements for each of the other teams. For example, to create this Michigan data portrait, I copied the North Carolina one, switched the year, added/removed the appropriate number of stars, changed the seed number and conference color accordingly and finally swapped the logo and line graph and adjusted the win/loss record. The line graphs were made in Tableau, saved as images and brought into PowerPoint. The logos were saved as images from ESPN.com and brought into PowerPoint and then I added an artistic effect under the formatting tab, to give them a little colored pencil look.

It took some patience, but after several hours, over the course of a few late nights, I had finally completed all 33 of the data portraits and was ready to start building the bracket! One quick note; the 2013 championship won by the Louisville Cardinals was vacated due to team violations, so I omitted them from the viz.

The Rankings

After taking a stab at ranking the teams myself, it dawned on me that maybe someone else, much more qualified, had already done this work. A quick google search and I was delighted to see that, indeed, this had been done and fairly recently. In April 2018, ESPN Insider, John Gasaway had ranked all champions from 1939 to 2018. I compared my rankings against his and although many of mine were within one or two spots of his, a few, most notably 1995 UCLA, were way off. I had that Bruins squad much higher than Gasaway’s ranking of sixteenth. So, to ensure the seedings in the bracket were legitimate, I decided to follow Gasaway’s rankings, with a few very small tweaks, in order to balance out the bracket and avoid having the same school play another version of itself, early on.

image1.jpeg
The original sketch

Of the 33 teams, there were five instances of Duke, four North Carolina’s, four Connecticut’s, three Kentucky’s and three Villanova’s. So, those five schools accounted for 19 of the 33 teams. With far too much time spent jockeying the teams around, I was finally able to produce a bracket in which none of the above schools would meet until at least the third round. So, with the rankings set, it was time to build the viz.  

Building the Viz in Tableau

The Set Up

I wanted the viz to have the look of an actual bracket that you might fill out by hand or online, in your local bracket challenge pool. So, in Tableau, once I had the team data portraits placed on the dashboard, I would leverage ninety-two text boxes to draw out the bracket. Each text box was filled with navy blue and set to be 3 pixels tall or wide, depending on its position. Looking back, this part was pretty tedious, but it allowed me to design the bracket exactly the way I wanted it to look, which was nice. Ok, back to the data portraits.

My goal in building this viz was to create a fun March Madness bracket, that would become interactive through the use of Tableau Set Actions. If you remember from above, the placement of the teams into the bracket had been determined, so Step 1 was to essentially create a bracket that had not yet been filled out. To place each team into their respective position in the bracket, I created a worksheet, that looked like the one below, for each of the sixteen first round match-ups and then floated (don’t hate me Team Tiled!!) each worksheet on the dashboard. Side note: this dashboard is literally a Team Tiled member’s worst nightmare, as there are somewhere in the neighborhood of 150 floating objects on the dashboard.

I used the ‘Bracket’ field to filter each worksheet to its appropriate bracket and then the ‘Seed 1’ field to filter to the correct match-up. To account for schools with multiple championships, I then created a calculated field called ‘Year+Team’ which combined the ‘Year’ and ‘Champion’ fields. Pulled onto the shapes card, this would allow me to assign one data portrait per champion. Once this part was complete, I was left with eighteen sheets (originally seventeen) to float onto the dashboard. Why eighteen and originally eighteen? The original viz was built prior to the 2019 tournament and featured one “play-in” game. The play-in game was built using two sheets instead of one, so that’s how we get to seventeen sheets. Also, I updated the viz after the 2019 tournament, to include the 2019 champion Virginia Cavaliers, after their miracle run to the title; the last two games of which I was fortunate enough to have seen in person, at the Final Four in Minneapolis. What an amazing sports experience!! Anyway, adding Virginia led to the need for another play-in game, thus adding another sheet and getting us to eighteen. Alright, the bracket was set up, next up was to add the interactivity.

The Interactivity

The interactivity was set up with a few simple steps, which were repeated for each game throughout the tournament.

  1. Create a Set for each game in the bracket. Each Set looked identical to the one pictured below. The set was created using the Year+Team field and I left all boxes unchecked to ensure the worksheets that would later be dropped onto the dashboard were blank until the addition of the Set Actions.
sets
31 Sets, one for each game

2. I then created a Boolean (T/F) calculation for each game like the one shown below, created a sheet for each game in the tournament and dragged the Boolean calculations for each game onto the Filters shelf of their respective sheets, setting them all to True. This would ensure that once the Set Actions were in place, the blank sheets would populate with the expected data portrait.

tf

3. Next, the sheets needed to be placed (floated) onto the dashboard, into their positions within the bracket. I floated them on the bracket as shown in the picture below.

floatingsheets
Floating more objects onto the dashboard!

4. Lastly, we needed to add in the Set Actions. Once again, there are 31 game so we need 31 Set Actions. In the example below, we’re using the Source Sheet 2.1, which contains the 1995 UCLA Bruins and the 2016 Villanova Wildcats. We tell the Set Action to target the Game 2 Set, which was set to True on the blank sheet named 2.2. And then we click ok and back on the dashboard, if we click the UCLA data portrait on Sheet 2.1, we see them advance into the second round of the tournament, onto Sheet 2.2. Every other Set Action is set up just like this and together, they provide the dashboard interactivity.

The Viz in Tooltips

Lastly, while I felt the data portraits provided great high level information about each champion, what they lacked was any type of information regarding the players. So, I pulled some more data from sports-reference.com/cbb and added a tooltip that, on the left-hand side, provided a zoomed in view of the data portrait and on the right-hand side, provided the user with each team’s statistical leaders in three main categories; points, rebounds and assists. The ’94 Arkansas Razorbacks were one of my all-time favorite college teams…and it didn’t hurt that they also beat Duke in the title game!!

ark

The Feedback

Before wrapping up, I want to give a shout out to Kevin Flerlage for some fantastic feedback throughout the whole process of building this viz. Kevin helped me with some decisions regarding the tooltips and a nice clean way of executing a “clear bracket” option, among other great input. Also, when I was in the early stages of building out the viz, I thought it was a pretty cool idea. But after getting it to a point where it could be shared with others, for feedback, Kevin’s reaction and genuine excitement for the viz made me that much more motivated to get this thing across the finish line. Also, a big thanks to my co-workers Jim Van Sistine and Tom Coyer for providing their feedback as well and last, but not least, my friend Jason Underdahl, who said of the initial data portrait “why do you have the logo grayed out? You can barely even see it!” That’s tough love, but he had a good point! Adding the color back to the logos really made them pop!!

Thanks for reading, I hope you enjoyed this post and found it useful.

The Final Product

Interactive Version here

Bracket of Champions (4)

#MakeoverMonday Week 2019-19 Diary

For this week’s #MakeoverMonday, we’re looking into cost effectiveness in Major League Baseball. More specifically, how does a player/team salary translate into productivity on the field, across a variety of statistical categories. For instance, if Player X made $20 million in 2015 and hit 20 home runs that year, you could say the team is paying $1 million per home run hit by Player X. Alright, let’s get started.

original

Step 1. Understanding the Data

Since I’m a lifelong baseball fan, this is a data scenario that is familiar to me. However, Andy and Eva did a great job of including links on the data sets page for those who may be less familiar with the sport. If this were, say, Rugby data, I would absolutely be diving into those resources, so if you ever feel uncomfortable with the data set, be sure to do a little bit of research. One thing I will mention is that it looks like the data set is focused on hitting stats only and does not include pitching stats. However, pitchers are included in the data set, because they do compile hitting stats in certain situations. If you’re unfamiliar with the rules and why this could potentially matter for this data set, here are a few notes;

  • Pitchers ONLY hit in games that are played in National League ballparks
  • Starting pitchers ONLY start every 4th or 5th game
  • MOST (not all) pitchers are not very good at hitting the ball
  • MOST (not all) good pitchers have high salaries

So, if the average National League pitcher starts every 5th game (32 starts) and gets three plate appearances per game, that comes to 96 at bats, for the season. So why does any of this matter? Like we mentioned, pitchers typically aren’t great at hitting the ball, so their hitting stats could look very poor when compared to the average position player (all other players on the field, other than the pitcher are referred to as position players). So, if we’re analyzing the cost a team pays players per home run, for instance, let’s look at an example of what it could look like when comparing a pitcher vs. a good position player.

  • Pitcher X
    • makes $25 million and hits 1 home run in 96 plate appearances
      • This would suggest we pay Pitcher X $25 million per home run hit
  • Position Player Y
    • makes $25 million and hits 25 home runs in 432 plate appearances
      • We pay Position Player Y $1 million per home run hit

This scenario would lead you to believe that Position Player Y is a much more cost effective player, when the reality is simply that he is paid to hit the ball, while Pitcher X is paid primarily to pitch the ball. And since the data set does not include a field for each player’s “position,” we’re unable to simply filter pitchers out of the data set. Therefore, it may make sense to set a filter on plate appearances and set it to a minimum of 200 per season. This would filter the pitchers out, as in my opinion, it does not make sense to include them in this analysis. I apologize for the long-winded explanation, but in my first glance at the data, I saw this as potentially slipping up some participants who may not be familiar with the game. Ok, what about the original viz?

Step 2. The Original Viz

The scatter plots on the original viz are easy enough to understand, but to be honest the way in which it was labeled, made it difficult for me to follow, especially the bottom, team section. Also, I didn’t find the team section all that interesting, because basically it was just showing us what teams have the lowest payrolls (Houston Astros) and which teams have the highest payrolls (New York Yankees and Los Angeles Dodgers). I guess the most interesting part of the team section was it tells us just how unbelievably bad the Miami Marlins were, offensively, in 2013. Wow!! Last in the league in all five categories.

Step 3. Try New Things

Awhile back, I saw this really good video by Andy, on how to build a no-whisker box plot and have been waiting for the right opportunity to try creating something similar. I was hopeful this data set would provide that opportunity, but after working through a few different scenarios, I was unhappy with the results. So, we’ll continue to file that chart type away for a different day and move onto something else. Another recent viz I really liked was this beautiful viz by Lindsey Poulter, which used a stepped line and dot combo chart to capture the magical 2018 season of Kansas City Chiefs QB, Patrick Mahomes. In his #WorkoutWednesday challenge for Week 4 of 2019, Curtis Harris built a similar chart that tracked headcount. I really loved not only the look of these vizzes, but also the ease of understanding them. So, I decided to go with this chart type, but the question was what data would it work well with? It’s probably worth mentioning that in a business setting, choosing your chart type first is probably not going to be the best approach. However, one of the great things about Tableau Public and community projects like #MakeoverMonday are that they offer us great opportunities to try new things and approach data visualization in different ways, in a safe environment.

Step 4. Finding the Story

The next step was to begin playing around with the data to find a story that fit the vision I had in my head. Early on, I had ruled out looking at team data, as I wanted to focus on players instead. Looking at hits, runs, RBI and home runs, I worked through some different ideas before landing on a viz that would feature the most recent members of the 500 home run club. Leveraging the stepped line dot combo chart, I felt it would be fun to visualize each player’s home runs by season, along with their team’s cost per home run (or the player’s salary per home run, whichever way you prefer looking at it). What I expected to see was as a player’s salary increased throughout their career, their salary per home run would increase fairly closely along with it. While this was true in a majority of cases, it certainly was not the case for all players on the list and in other cases, the increase in cost per home run was not as steep as I had guessed. Now that I had found a story, the next step was communicating it with a clean, engaging design.

Step 5. Simplicity in Design

I used just two colors in the viz, with a third for non-data related text. My colors were shades of red and blue, the colors of the MLB logo. For the chart, I set home runs as the stepped line chart and salary per home run as the sized dots. Here’s what it looked like.

step

The chart looked nice, but something very important was missing; salary by season. I immediately thought back to a fantastic blog post by Ryan Sleeper, in which he shares creative ways to use transparent sheets to add context to your vizzes. This was exactly what I needed, context. The moment I saw it, I fell in love with Ryan’s bar chart trend pushed to the background. So, I implemented this strategy, with player salary set as the trend in the background, with an opacity of 20%. This way it would be there for context, but not draw attention away from the other chart. With home runs set to running total for the player’s career, this worked out well, because as home runs increased, a player’s salary typically increased as well. So, for the most part, they increased along with one another. Here’s what it looked like after floating the stepped line on top of the bar chart.

step2

To add more context, I included text for each player’s career salary, home runs and salary per home run, as well as the years they played. Lastly, I wanted the reader to be able to see the differences in all three measures, so I fixed the y-axis for the home runs stepped chart from 0 to 800, fixed the salary bar chart from 0 to $40 million and fixed the salary per home run dot size from $0 to $5 million. Below is a view after adding the text. Since I didn’t show any axes, I included an explanation through the use of an info button.

step3

Step 6. Sense Checking the Data

It was not until after building the entire viz that I really took the time to look closely at the numbers to make sure everything made sense. Guess what? It didn’t!! For a while, I couldn’t figure out why a few players had extreme spikes in their salaries. Take a look at the below comparison of before and after I found the issue. Look at those spikes!! Why on earth would Gary Sheffield randomly make nearly $30 million dollars in one season and then go back to making $9-10 for the next several years? Answer? He wouldn’t, so there was clearly something wrong with either the data or one of my calculations.

After digging around, here’s what I found. My ‘FIXED Player Salary’ calculation had originally been set up as SUM([Salary]), as I had not taken into account the fact that if a player had played for more than one team during the same season, they would have more than one row of data for that season. Here’s what the incorrect calculation and the result looked like.

calcs

sheffsal

I was certain $29.9 million was incorrect, but I also wanted to be sure that the $14.9 million figure was correct, so I checked the trusty old baseball-reference.com and saw that the numbers matched and Sheffield did indeed make that amount in 1998. So, I needed to change my calculation to pull in the MIN([Salary]) as opposed to SUM.

bbref

calc

Overall, I enjoyed working with the data set this week and wound up spending a lot more time on this viz than on a typical #MakeoverMonday, mostly due to just playing around with exploring the data. Below is a look at my final viz, the interactive version can be found here. Thanks for reading, I hope you enjoyed!!

Home Runs