Category Archives: Infographics

Infographics and Data Visualization (Week 2)

For week two of the course, we’ve been asked to take a look at this interactive graphic from the New York Times, which compares the different words that Democratic and Republican speakers used during their respective conventions.

Overall, I thought that the graphic was pretty good but there were a few things that I might consider redesigning. The first problem I noticed was that, when you click on the word bubbles, the political quotes below the chart change based on your selection. Unfortunately, most of this interactivity occurs “below the fold” or off-screen so you don’t necessarily see it right away. I would need to be presented with more cues to know that this was going on. It seems like tightening up the top part of the chart and shrinking some of the ad space or menu heights might help here.

It also took me a while to figure out that you could type in your own words and add them to the graphic. This feature is pretty cool but I don’t think it is necessarily obvious to first-time visitors. I liked how the new word bubbles kind of migrated around to find a spot in the crowd but they sometimes got stuck in the middle of the pack if the words around them were too big.

The bubble sizes are difficult to interpret directly but I don’t think that is necessary for this graphic. I do have a problem with the way the bubbles indicate the % of word usage by political party. I would expect either a pie chart with the % in a slice or maybe a color difference along a spectrum (blue to red).

My first redesign attempt:

Although this “sketch” is not interactive, you can kind of see where I was headed. The first issue I tackled was trying to make it more obvious that the individual words or phrases could be shown in context. I did this by moving the quotes up from the bottom and placing them in cartoon speech bubbles along the sides of the graphic. The directional arrow for each speech bubble points to the word being examined and also indicates a slider that can be moved up and down from word to word. The speech bubbles could expand to include multiple quotes or maybe there could be some other form of gallery navigation within the bubble itself.

The individual words are displayed in a standard bar chart that clearly shows the word itself but doesn’t play with the font size at all. I let all comparisons between the words be shown using the red and blue bars, with relative usage rates represented by the length of the bars. This allows direct comparison of usage rates between the two parties as well as relative comparison between words.

I imagined that typing a word or phrase in the box would add that word or phrase to the top of the “stack” of bar charts, moving the rest of the words down one slot. This way the user could add as many words as they want and scroll down the length of the chart to look at their entire list and make comparisons.

Despite these adjustments, it’s still hard to see how the average user would pull a compelling narrative out of this presentation without some assistance. To me, the story of this graphic is about the language that the different parties use to craft their messages. The use of certain words over others reflects each party’s priorities and their understanding of the intended audience.

Since we know word choice is designed to influence the audience in some way, it might be interesting to include examples of how the two parties have used language in the past. On the Republican side, Newt Gingrich’s 1990 GOPAC memo titled “Language: A Key Mechanism of Control” is a famous example. It contains a list of “optimistic positive governing words” that Gingrich recommended for use in describing Republican politicians and “contrasting words” that he suggested using to describe Democrats.

On the other side of the aisle, people like George Lakoff and Elisabeth Wehling at The Little Blue Blog use concepts like “frames” to describe how the use of particular words triggers associations with either conservative or progressive moral systems. (Another interesting look at the use of language in politics can be found at Sasha Issenberg’s Victory Lab site.)

Either of these resources might be a good starting point for an analysis of word usage by politicians. In fact, one member of the class posted a quick graphic using Gingrich’s positive words here and I found it fascinating that the top three positive words used by Democrats (fair, building and reform) demonstrated a far different focus than those used by Republicans (liberty, freedom and lead).

Modifying the NYT graphic to accommodate these investigations might involve the addition of “starter lists” of words such as the top 10 words for each party by word count, top 10 words by uniqueness to each party, or Gingrich’s positive word list. I also like the idea of a word association feature which could suggest related topics via a word cloud or a “you might also try this word” feature.
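Here is a rough sketch of how the first two starter lists might be built, assuming the convention speeches are available as plain text. The function names and the sample strings are made up for illustration, and "uniqueness" is my own guess at a measure (the share of a word's total uses that belong to one party); none of this comes from the NYT graphic itself.

```python
from collections import Counter
import re


def word_counts(text):
    """Count lowercase word occurrences in a block of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def top_words(counts, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    return counts.most_common(n)


def most_distinctive(own, other, n=10):
    """Rank words by the share of total uses that belong to `own`.

    Ties are broken by raw count so common words rank above rare ones.
    """
    def share(word):
        return own[word] / (own[word] + other[word])

    return sorted(own, key=lambda w: (share(w), own[w]), reverse=True)[:n]


# Hypothetical sample text, not real convention transcripts.
dem = word_counts("fair reform building fair jobs reform fair")
rep = word_counts("liberty freedom lead jobs liberty freedom liberty")

print(top_words(dem, 3))
print(most_distinctive(rep, dem, 3))
```

A word like "jobs" that both sides use drops down the distinctive list even though it appears in both top-by-count lists, which is the kind of contrast a starter list could surface.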

 

Infographics and Data Visualization (Week 1)

The Introduction to Infographics and Data Visualization course begins Sunday so I’m starting to receive emails from the instructor. The first thing I need to do is tackle the reading list and then take a look at the first assignment, which involves the review of this graphic, which was based on a survey of 32,000 Internet users from 16 different countries. The survey asked these users about the kind of online services they used on a regular basis.

The online class discussion was pretty good and very thorough. My own thoughts began with the graphic “building block” that the designer used to organize and convey information. This consisted of a nested group of overlapping doughnut charts that used color, size and fractional divisions to represent the data for each country (see below).

I think that the arcs of the doughnut are meant to be interpreted in two dimensions: 1) the sweep of the arc represents the % of the category population that is engaged in the activity (similar to a regular pie chart) and 2) the radius from the center represents the overall size of the category population (similar to a regular bubble chart). Both pie charts and bubble charts can work in certain circumstances but they make direct comparisons difficult. Throw in the fact that the arcs overlap and it is almost impossible to understand the meaning associated with different variables. For example, the predominant color in graphs for countries like the U.S. or Canada is pink, which downplays the larger population of social profile users.

My first instinct for adjusting this infographic was to “unpack” the doughnut chart and place the data in a regular bar chart. By using standard bars, it is fairly easy to make comparisons between the different categories. The bar chart also shows percentages naturally if I include a gray bar that represents the total population of internet users. (The value of the gray bar is an assumption on my part, calculated by dividing the user value by the access percentage. This works for almost every country except the U.S. and the U.K.)
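The gray-bar calculation is just a division, sketched below. I am assuming the "user value" is a count of users (in millions) and the "access percentage" is the share of the total that count represents; the function name and the sample figures are made up for illustration and are not taken from the survey.

```python
def estimated_total_users(users_millions, access_pct):
    """Estimate the total population implied by a user count and its share.

    users_millions: number of users in a category, in millions
    access_pct: percentage of the total population those users represent
    """
    if not 0 < access_pct <= 100:
        raise ValueError("access_pct must be greater than 0 and at most 100")
    return users_millions * 100 / access_pct


# Hypothetical figures: 30 million users who make up 60% of the total.
print(estimated_total_users(30, 60))
```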

The real power of this approach comes with side-by-side comparisons of the data. After swapping the axes and adding in the other countries, the resulting chart allows for relatively easy comparison of both overall Internet usage and individual social media involvement. Both the U.S. and U.K. totals are fudged.

One problem I have with this chart is the huge amount of white space in the upper right quadrant. This is caused by the great disparity in size between the Internet populations of the largest and smallest countries. Adjustments like the use of logarithmic scales or scatterplots might be able to fill out the canvas a bit but they also make direct comparisons more difficult. I’m also not too sure about the color scheme, which I find somewhat distracting.

Tackling both of these issues at once, I’ve removed the separate colors for the social media categories and added in an overlay that uses a radar chart to show the relative differences between social media usage within countries.

The radar charts are kind of fun and they make it pretty easy to see different patterns of Internet usage among the 16 countries. The higher social profile participation (and lower blog usage) of Western countries creates a distinctive shape when compared to Asian countries like Japan and South Korea. The two-color scheme also makes it easier to see patterns in the column charts. However, I’m not sure that depending on the order of the columns is enough to compare social categories across countries.

I’m going to let my solution stand for now. Meanwhile, here are some other solutions from the class and around the web:

 

 

Infographics and Data Visualization (Sign Up)

Despite a crazy schedule, I’ve decided to sign up for a free online course offered by the Knight Center for Journalism called Introduction to Infographics and Data Visualization. It runs from October 28 – December 8 and will be taught by Alberto Cairo, the author of The Functional Art: An Introduction to Information Graphics and Visualization, published by Peachpit Press. I will be sure to post my completed assignments here. It should be fun!

Family Pool Trends

With the start of a rather warm, dry summer here in Wisconsin, we’ve decided to take the plunge (literally) and purchase a new above-ground pool. We seem to outgrow these things every few years and I’ve become intrigued with the idea that we just keep buying larger and larger cylinders of water. After some exhaustive research (which mostly involved looking through a lot of old photographs and estimating pool sizes), I present you with a timeline of our family’s pool history. The bubbles represent surface area and allow for relative size comparison.


Pool History (1997-2012)

Bubble Size = Pool Size

It is interesting to see that — with the exception of a few strays — we seem to buy a new pool every three years. It is also interesting to see the exponential growth in water volume that began about the time my daughter was born (when my son was four). If we keep up that pace, our next pool will be over 10,000 cubic feet — about the size of two 18-wheelers full of water.
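For anyone who wants to run the same kind of numbers, here is a minimal sketch of the cylinder arithmetic behind the surface area and water volume figures. The diameters and depths below are made-up examples, not my actual pool estimates, and the function names are mine.

```python
import math


def pool_surface_area(diameter_ft):
    """Surface area of a round pool in square feet."""
    return math.pi * (diameter_ft / 2) ** 2


def pool_volume(diameter_ft, depth_ft):
    """Water volume of a round pool in cubic feet (area times depth)."""
    return pool_surface_area(diameter_ft) * depth_ft


# Hypothetical pool sizes as (diameter in feet, depth in feet).
for diameter, depth in [(8, 2), (12, 3), (18, 4)]:
    area = pool_surface_area(diameter)
    volume = pool_volume(diameter, depth)
    print(f"{diameter} ft x {depth} ft: {area:.0f} sq ft, {volume:.0f} cu ft")
```

Because area grows with the square of the diameter and the depth usually grows too, volume climbs much faster than the diameter on the box suggests.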

You Are What You Watch

Experian-Simmons released some survey data in December that looked at the relative popularity of major television shows for three different political groups: liberal Democrats; conservative Republicans; and middle-of-the-road voters. Each show was given an index based on the concentration of specific voters and this information was used to create lists of the top programs for each political group in both entertainment and news categories.

Although these top ten lists were interesting on their own, the fact that each individual TV program actually had an index rating for all three groups offers an opportunity for more complex analysis. The most obvious next step involves comparing pairs of groups in a 2D scatterplot chart. The Tableau visualization below shows the results.

A few notes:

  • Entertainment shows are in blue, news shows are in orange.
  • Shows without enough data for a particular group were still plotted as a zero index.
  • Hovering over each data point reveals the show and its indices.

 

The first thing I noticed was that news shows were much more partisan than entertainment shows. In fact, almost all of the shows with the most extreme scores were either news shows (primarily FOX and MSNBC) or fake news shows (Comedy Central’s Daily Show and Colbert Report). PBS gets a few high scores on the liberal side but the standard television networks are all fairly evenly watched.

Another thing that strikes me is how similar the watching habits of middle-of-the-road voters are to those of conservative Republicans. The only noticeable exception occurs with news programs, but it is a pretty big exception: FOX News. All of the top ten conservative news programs were on FOX while none of the top middle-of-the-road news programs were on that network. It might be encouraging for conservative politicians to see the similarities in entertainment interests between conservative voters and independents but I suspect that the gulf in news sources would be hard to overcome.

Many of the other differences have been noted elsewhere but are worth repeating: liberal Democrats tend to favor funnier shows and stories involving morally complex characters while conservative Republicans favor shows where people are doing stuff — either real work or reality competitions.

Of course, having complained about the lack of 2D analysis for this data in the major online outlets, I would be remiss if I didn’t point out the fact that each show has three indices. Logically, we should be trying to show the data in a 3D scatterplot.

This isn’t as easy as it sounds since most of the major charting applications aren’t very good in 3D and they don’t provide any interactive option for the web that I could find. The best options seemed to be R or something called CanvasXpress — neither of which I had worked with before. I chose R, which allowed me to create both static and interactive 3D plots. However, only screenshots of the interactive plot are available at the moment. Several hours later …

Much Ado About Coughin’

Whether you know it as whooping cough or the 100 days’ cough, pertussis — a bacterial infection that causes severe coughing fits — is no fun. According to Wikipedia, it affects nearly 50 million people annually and causes almost 300,000 deaths worldwide. Although most of these deaths occur in developing nations, pertussis is the only vaccine-preventable disease that is associated with increasing deaths in the U.S.

Pertussis can be particularly dangerous for young children, so health departments keep a pretty close eye on local outbreaks and ask parents to keep their kids home from school while undergoing treatment. Unfortunately, the infection is very contagious and early symptoms are pretty mild. Combine this with some parental fears surrounding the vaccine and you’ve got a pretty good recipe for the occasional quasi-epidemic.

This year’s “winner” in the whooping cough stakes is apparently Wisconsin. As of April 21, 2012, the CDC estimates that the Badger State has had over 1,000 cases of pertussis, which is about as many cases as all of the Pacific Coastal states combined. Among these unlucky cheeseheads were the two fully-vaccinated kids that currently live under my roof. (My wife speculates that they picked it up at an extremely packed showing of The Hunger Games.)

Now that the quarantine period is over and my two little data points are on the mend, I thought it would be interesting to use some of the CDC data to experiment with Google Charts. I was especially interested to note that Google had a treemap feature. In the chart below, the size of the rectangles represents the current number of whooping cough cases, while the colors represent the increase or decrease over the same period in 2011. (Note: in the revised treemap option, the size of the rectangles represents the current number of whooping cough cases per million in population.)

Pretty simple example, no drill downs or tooltips for now.

U.S. Cases of Whooping Cough (April 21, 2012)

Toggle Between Cases and Cases per Million



Oh, and if you’re looking for Minnesota or Oklahoma, neither state has any current cases.
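The per-million figure used in the revised treemap option is a simple normalization by population. Here is a minimal sketch; the state names, case counts, and populations are placeholders I made up rather than CDC data, and the function name is mine.

```python
def cases_per_million(cases, population):
    """Convert a raw case count into cases per million residents."""
    return cases * 1_000_000 / population


# Hypothetical (cases, population) pairs, not CDC figures.
states = {
    "State A": (1000, 5_000_000),
    "State B": (1000, 40_000_000),
}

rates = {name: cases_per_million(c, p) for name, (c, p) in states.items()}

# List the states from highest to lowest rate.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.0f} cases per million")
```

Two states with the same raw count can look very different once population is taken into account, which is why the toggle changes the picture so much.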

My favorite online example of a treemap is the Map of the Market on SmartMoney.com. The navigation is very robust and you can nest groups of categories on the primary display. Google’s product allows you to drill down several levels but I couldn’t figure out a way to combine them in one view. I also like the way SmartMoney’s chart allows you to display additional information about each element when you hover over it with your mouse. I suspect that this is possible with the Google version but it is not explicitly called out in the documentation.

Does it work? For comparison, here is the same data in a standard Google bar chart:

U.S. Cases of Whooping Cough (April 21, 2012)

The bar chart results in a lot of whitespace and it needs to be much bigger in order for all the bars to fit. I tried a bubble chart as well (below) but there are limitations for this format, too. In particular, clumps of bubbles are difficult to read. I had to transform the data using a logarithmic scale to spread the shapes out a bit.

U.S. Cases of Whooping Cough (April 21, 2012)

2012 Cases vs. Cases per Million (Size=Population)
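The logarithmic transform mentioned above can be sketched in a couple of lines. Adding 1 before taking the log is my own assumption here, so that states with zero cases still get a value; the sample counts are made up.

```python
import math


def log_scale(values):
    """Compress a wide range of non-negative counts using log10(v + 1)."""
    return [math.log10(v + 1) for v in values]


# Hypothetical case counts spanning several orders of magnitude.
cases = [0, 9, 99, 999]
print([round(x, 2) for x in log_scale(cases)])
```

The gaps between the transformed values are even, so the bubbles spread out instead of clumping near the origin, at the cost of making the axis harder to read directly.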

Geographic References in Local Business Names

This little exercise came about after I read an article on the old Northwest Territory in the U.S., which basically consisted of all the land west of Pennsylvania, northwest of the Ohio River, and east of the Mississippi River. As the country expanded westward, this geographic area gradually became known as the “Midwest” (or the East North Central States region) but not before the older name left its mark on the local culture. Organizations like Northwestern Mutual Life (Milwaukee) and Northwestern University (Chicago) still refer back to the days when these places were located on the fringe of the country, not at its center.

It occurred to me that researching such place names would be a good way to see if there was still a residual “shadow” of the old Northwest Territory so I downloaded a sample list of company headquarters with the phrase “Northwest” or “Northwestern” in their names and plotted them on a map. Alas, this attempt failed to find anything significant (there was too much competition with the Pacific Northwest in name usage). However, I did look up some other regional terms with more positive results.

 

The geographic patterns for most of these terms are fairly distinct but there are also some areas of overlap. It was especially interesting to see regions that had local businesses in three or more categories. The old Northwest Territory fits this mold with a combination of Midwest, Great Lakes, and Prairie.