Author Archives: mkinde

Infographics and Data Visualization (Week 2)

For week two of the course, we’ve been asked to take a look at this interactive graphic from the New York Times, which compares the different words that Democratic and Republican speakers used during their respective conventions.

Overall, I thought that the graphic was pretty good but there were a few things that I might consider redesigning. The first problem I noticed was that, when you click on the word bubbles, the political quotes below the chart change based on your selection. Unfortunately, most of this interactivity occurs “below the fold” or off-screen so you don’t necessarily see it right away. I would need to be presented with more cues to know that this was going on. It seems like tightening up the top part of the chart and shrinking some of the ad space or menu heights might help here.

It also took me a while to figure out that you could type in your own words and add them to the graphic. This feature is pretty cool but I don’t think it is necessarily obvious to first-time visitors. I liked how the new word bubbles kind of migrated around to find a spot in the crowd but they sometimes got stuck in the middle of the pack if the words around them were too big.

The bubble sizes are difficult to interpret directly but I don’t think that is necessary for this graphic. I do have a problem with the way the bubbles indicate the % of word usage by political party. I would expect either a pie chart with the % in a slice or maybe a color difference along a spectrum (blue to red).

My first redesign attempt:

Although this “sketch” is not interactive, you can kind of see where I was headed. The first issue I tackled was trying to make it more obvious that the individual words or phrases could be shown in context. I did this by moving the quotes up from the bottom and placing them in cartoon speech bubbles along the sides of the graphic. The directional arrow for each speech bubble points to the word being examined and also indicates a slider that can be moved up and down from word to word. The speech bubbles could expand to include multiple quotes or maybe there could be some other form of gallery navigation within the bubble itself.

The individual words are displayed in a standard bar chart that clearly shows the word itself but doesn’t play with the font size at all. I let all comparisons between the words be shown using the red and blue bars, with relative usage rates represented by the length of the bars. This allows direct comparison of usage rates between the two parties as well as relative comparison between words.

I imagined that typing a word or phrase in the box would add that word or phrase to the top of the “stack” of bar charts, moving the rest of the words down one slot. This way the user could add as many words as they want and scroll down the length of the chart to look at their entire list and make comparisons.

Despite these adjustments, it’s still hard to see how the average user would pull a compelling narrative out of this presentation without some assistance. To me, the story of this graphic is about the language that the different parties use to craft their messages. The use of certain words over others reflects each party’s priorities and their understanding of the intended audience.

Since we know word choice is designed to influence the audience in some way, it might be interesting to include examples of how the two parties have used language in the past. On the Republican side, Newt Gingrich’s 1994 memo to the GOPAC titled “Language: A Key Mechanism of Control” is a famous example. It contains a list of “optimistic positive governing words” that Gingrich recommended for use in describing Republican politicians and “contrasting words” that he suggested using to describe Democrats.

On the other side of the aisle, people like George Lakoff and Elisabeth Wehling at The Little Blue Blog use concepts like “frames” to describe how the use of particular words triggers associations with either conservative or progressive moral systems. (Another interesting look at the use of language in politics can be found at Sasha Issenberg’s Victory Lab site.)

Either of these resources might be a good starting point for an analysis of word usage by politicians. In fact, one member of the class posted a quick graphic using Gingrich’s positive words here and I found it fascinating that the top three positive words used by Democrats (fair, building and reform) demonstrated a far different focus than those used by Republicans (liberty, freedom and lead).

Modifying the NYT graphic to accommodate these investigations might involve the addition of “starter lists” of words such as the top 10 words for each party by word count, top 10 words by uniqueness to each party, or Gingrich’s positive word list. I also like the idea of a word association feature which could suggest related topics via a word cloud or a “you might also try this word” feature.

 

Trends in NFL Football Scores (Part 1)

One of the goals I set for myself this summer was to learn a bit about D3, a visualization toolkit that can be used to manipulate and display data on the web. Considering that the trees are bare and we’ve already had our first frost here in Wisconsin, you can safely assume that I am behind schedule. Nevertheless, I feel that I’ve finally reached a point where I have something to publish, so here goes.

First of all, a little background. D3 is a JavaScript library that allows you to bind data to any of the elements (text, lines and shapes) you might normally find on a web page. These objects can be styled using CSS and animated using simple dynamic functions. These features make D3 a perfect tool for creating interactive charts and graphs without having to depend on third-party programs like Google Charts, Many Eyes or Tableau.

I wanted to start out with something simple so I elected to go with a basic line chart using data I pulled from Pro-Football-Reference.com. This site contains a ton of great information and statistics from the past 90+ years of the National Football League but, for now, I just looked at the final scores of all the games played from 1920 to 2011. My first D3-powered chart is below. It shows the average combined scores of winning and losing teams for each year of the NFL’s existence.

Although this chart looks pretty simple, every element (including titles, subtitles, axes, labels, grids and data lines) has been created manually using the D3 code. The payoff is pretty nice. All of the elements can be reused and you have tremendous control over what is shown onscreen. To demonstrate some of these capabilities, I’ve added interactive overlays that show a few of the major eras in NFL football (derived from the work of David Neft and this discussion thread). If you move your mouse over the graph, you will see these different eras highlighted:

Early NFL (1920-1933) – The formation of the American Professional Football Association (APFA) in 1920 marked the official start of what was to become the National Football League. This era was marked by rapid formation (and dissolution) of small town franchises, vast differences in team capabilities and a focus on a relatively low-scoring running game. At this time, the pass was considered more of an emergency option than a reliable standard. The rapid growth in popularity of the NFL during this era culminated with the introduction of a championship game in 1932.

Introduction of the Forward Pass (1933-1945) – The NFL discontinued the use of collegiate football rules in 1933 and began to develop its own set of rules designed around a faster-paced, higher-scoring style of play. These innovations included the legalization of the forward pass from anywhere behind the line of scrimmage, a change that is often called the “Bronko Nagurski Rule” after his controversial touchdown in the 1932 NFL Playoff Game.

Post-War Era (1945-1959) – The end of WWII saw the expansion of the NFL beyond its East Coast and Midwestern roots with the move of the Cleveland Rams to Los Angeles — the first big-league sports franchise on the West Coast. This period also saw the end of racial segregation (enacted in the 30s) and the start of nationally televised games.

Introduction of the AFL (1959-1966) – Professional football’s surge in popularity led to the formation of a rival organization — the American Football League — in 1960. The growth of the flashy AFL was balanced by a more conservative style of play in the NFL. This style was epitomized by coach Vince Lombardi and the Green Bay Packers, who would win five championships in the 1960s. In 1966, the two leagues agreed to merge as of the 1970 season.

Dead Ball Era (1966-1977) – Driven in part by stringent restrictions on the offensive line, this period is marked by low scores and tough defensive play. Teams that thrived in this environment include some of the most famous defenses in modern NFL history: Pittsburgh’s Steel Curtain, Dallas’ Doomsday Defense, Minnesota’s Purple People Eaters and the Rams’ Fearsome Foursome.

Live Ball Era (1978-present) – Frustrated by the decreasing ability of offenses to score points in the 70s, the NFL began to add rules and make other changes to the structure of the game in an attempt to boost scoring. The most famous of these initiatives was the so-called “Mel Blount Rule” (introduced in 1978), which severely restricted the defense’s ability to interfere with passing routes. With the subsequent introduction of the West Coast Offense in 1979 (an offense based on precise, short passes), this period became marked by a major focus on the passing game.

Having created this first chart, I decided to build a second chart based on the ratio of average winning scores to average losing scores to see if there were any patterns.

The chart above shows how, after a period of incredibly lopsided victories, the average scoring differential settled into a very steady pattern by the late 1940s and stayed at that level (roughly 2:1) for the next 30 years. Despite many changes in rules, coaching techniques, technology and other factors, only the pass interference rules of the late 1970s seemed to have any significant effect on this ratio, shifting it to just under 1.8:1 for the next 30 years.
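For anyone who wants to reproduce the numbers behind a chart like this, here is a minimal sketch of the aggregation step in Python. It is not the code behind my charts; the `games` rows and the `yearly_ratios` helper are hypothetical and the scores are made up for illustration. It assumes each game is available as a (season, winning score, losing score) row.

```python
from collections import defaultdict

# Hypothetical sample rows: (season, winning score, losing score).
# These values are invented for illustration, not real NFL results.
games = [
    (1950, 28, 14),
    (1950, 21, 10),
    (1980, 27, 17),
    (1980, 24, 13),
]

def yearly_ratios(rows):
    """Return {season: (avg_win, avg_loss, ratio)} for the given game rows."""
    # Per season: total winning points, total losing points, number of games.
    totals = defaultdict(lambda: [0, 0, 0])
    for season, win, loss in rows:
        totals[season][0] += win
        totals[season][1] += loss
        totals[season][2] += 1

    result = {}
    for season, (win_pts, loss_pts, n) in totals.items():
        avg_win = win_pts / n
        avg_loss = loss_pts / n
        # Assumes the losing teams scored at least once during the season.
        result[season] = (avg_win, avg_loss, avg_win / avg_loss)
    return result

ratios = yearly_ratios(games)
for season in sorted(ratios):
    avg_win, avg_loss, ratio = ratios[season]
    print(f"{season}: {avg_win:.1f} vs {avg_loss:.1f} -> {ratio:.2f}:1")
```

The same grouping works for home and away scores by swapping in those two columns.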

While I had the data available, I also decided to look at the differences in average scores between home teams and away teams. The chart below plots this data along with the same overlay I used in the first chart.

A look at the ratio of average home team scores to average away team scores follows:

What’s fascinating about this chart is how quickly a form of parity was achieved among all the NFL teams. By the mid-30s, a measurable home field advantage can be seen at roughly 15%, a rate that has remained essentially constant for over 70 years. Factors for this boost could include the psychological support of fans, familiar weather conditions, unique features of local facilities, lack of travel fatigue, referee bias and/or increased levels of motivation in hometown players.

Thanks to Charles Martin Reid for his solution to getting D3 and WordPress to play nice.

Infographics and Data Visualization (Week 1)

The Introduction to Infographics and Data Visualization course begins Sunday so I’m starting to receive emails from the instructor. The first thing I need to do is tackle the reading list and then take a look at the first assignment, which involves the review of this graphic, which was based on a survey of 32,000 Internet users from 16 different countries. The survey asked these users about the kind of online services they used on a regular basis.

The online class discussion was pretty good and very thorough. My own thoughts began with the graphic “building block” that the designer used to organize and convey information. This consisted of a nested group of overlapping doughnut charts that used color, size and fractional divisions to represent the data for each country (see below).

I think that the arcs of the doughnut are meant to be interpreted in two dimensions: 1) the sweep of the arc represents the % of the category population that is engaged in the activity (similar to a regular pie chart) and 2) the radius from the center represents the overall size of the category population (similar to a regular bubble chart). Both pie charts and bubble charts can work in certain circumstances but they make direct comparisons difficult. Throw in the fact that the arcs overlap and it is almost impossible to understand the meaning associated with different variables. For example, the predominant color in graphs for countries like the U.S. or Canada is pink, which downplays the larger population of social profile users.

My first instinct for adjusting this infographic was to “unpack” the doughnut chart and place the data in a regular bar chart. By using standard bars, it is fairly easy to make comparisons between the different categories. The bar chart also shows percentages naturally if I include a gray bar that represents the total population of internet users. (The value of the gray bar is an assumption on my part, calculated by dividing the user value by the access percentage. This works for almost every country except the U.S. and the U.K.)
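The gray-bar estimate is simple arithmetic, sketched below in Python. The function name and the figures are hypothetical (they are not values from the survey), and the sketch assumes the access percentage is expressed as a fraction between 0 and 1.

```python
def total_internet_users(category_users, access_share):
    """Estimate the total online population from a user count and the share it represents."""
    if not 0 < access_share <= 1:
        raise ValueError("access_share must be a fraction in (0, 1]")
    return category_users / access_share

# Hypothetical figures: 20 million users who make up 50% of the online population.
estimate = total_internet_users(20_000_000, 0.50)
print(f"{estimate:,.0f}")
```

If a country's published figures do not fit this relationship, the estimate will be off, which is presumably what happens with the U.S. and U.K. numbers.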

The real power of this approach comes with side-by-side comparisons of the data. After swapping the axes and adding in the other countries, the resulting chart allows for relatively easy comparison of both overall Internet usage and individual social media involvement. Both the U.S. and U.K. totals are fudged.

One problem I have with this chart is the huge amount of white space in the upper right quadrant. This is caused by the great disparity in size between the Internet populations of the largest and smallest countries. Adjustments like the use of logarithmic scales or scatterplots might be able to fill out the canvas a bit but they also make direct comparisons more difficult. I’m also not too sure about the color scheme, which I find somewhat distracting.

Tackling both of these issues at once, I’ve removed the separate colors for the social media categories and added in an overlay that uses a radar chart to show the relative differences between social media usage within countries.

The radar charts are kind of fun and they make it pretty easy to see different patterns of Internet usage among the 16 countries. The higher social profile participation (and lower blog usage) of Western countries creates a distinctive shape when compared to Asian countries like Japan and South Korea. The two-color scheme also makes it easier to see patterns in the column charts. However, I’m not sure that depending on the order of the columns is enough to compare social categories across countries.

I’m going to let my solution stand for now. Meanwhile, here are some other solutions from the class and around the web:

 

 

Infographics and Data Visualization (Sign Up)

Despite a crazy schedule, I’ve decided to sign up for a free online course offered by the Knight Center for Journalism called Introduction to Infographics and Data Visualization. It runs from October 28 – December 8 and will be taught by Alberto Cairo, the author of The Functional Art: an Introduction to Information Graphics and Visualization, published by PeachPit Press. I will be sure to post my completed assignments here. It should be fun!

How Louis C.K. Could Help Improve Street Sign Design

The opening of last week’s season premiere of Louie offered up a hilarious scene of Louie and a fellow New Yorker trying to decipher an odd assortment of street signs to see if it was safe to park their cars. This must be a fairly common problem in bigger cities because I came across these weird signage clusters a lot back when I was trying to eke out a living as an urban planner. The picture below was taken in front of my sister’s old house in Evanston, IL (a Chicago suburb) back in the 90s. Despite all of the warnings, it was actually okay to park in this spot at the time we were there.

In any given city I suppose there are a bunch of different parking rules and each one has its own sign associated with it. Every now and then you get a situation where overlapping rules apply and the result is a bit of a jumble. It is a classic — albeit minor — case of the law of unintended consequences. There is simply nothing in a city worker’s toolkit that would allow them to provide an appropriate solution to such a complex problem.

What is needed is a more flexible approach — something that clearly outlines the rules of a given situation but can also be easily adjusted to meet slightly different circumstances.

After returning home, I put together a quick idea that involved more of a calendar-like design, with circumstances in rows and time-of-day in columns. My idea was to have a standard sign to which workers could affix a series of universal “no” stickers at the right points. The design is based primarily on the given situation (and does nothing to address the driveway warning) but I thought it was a good start.

The new sign would save taxpayer money by reducing both the number and variety of signs that needed to be made. It would also simplify the interpretation of complex situations for the average citizen and it could be easily modified by city traffic workers if parking circumstances changed. You could even use it to block off areas temporarily by adding a removable (magnetic?) marker during construction or special events.

The one big drawback that I could see for this design is that it leaves a lot of white space open for minor vandalism. Even I might be tempted to play a few games of tic-tac-toe on such a sign. Overall, though, I think it is a step in the right direction. I hope Louie would be proud.

Have a happy Fourth of July, everyone! Make sure you interpret those parking signs along the parade route carefully.

Revised Parking Sign System

The Days Keep Getting Longer … Literally

Keeping track of time is never easy without an accurate clock and so people have come up with a number of different folk methods to keep themselves on pace. One of the most common techniques is to introduce a multi-syllable word as you count seconds so that you don’t count too fast. The most familiar phrase is probably something like “A thousand one, a thousand two …” but there are several others. My Dad actually had a teacher in school who used the phrase “steam engine” and I’ve heard others use words like “Mississippi” or even “alligator.” Basically, any four or five syllable phrase will serve as a good placeholder. Whatever phrase you favor, be prepared to dust it off tonight as the world is officially given an extra leap second at the end of the day.

The reason for this extra second is rather complicated. A normal “day” was officially defined back in 1967 as 86,400 seconds in the International System of Units (SI) and it is tracked by a very precise atomic clock. This is the Coordinated Universal Time (UTC) that we all know and love. The actual solar day is pretty much the same length but not quite. There are several different events that can speed up or slow down the Earth’s rotation by a few thousandths of a second. These events can include earthquakes, changes in the jet stream, the tidal pull of the Moon, the position of the Earth in its orbit, fluid motion at the Earth’s core, and the gradual slowing of the Earth’s rotation.

Whenever these forces cause the solar day and the UTC to get too far out of whack, the Sub-bureau for Rapid Service and Predictions of Earth Orientation Parameters of the International Earth Rotation and Reference System Service — let me pause here while I catch my breath — calls for a leap second. This manifests itself as an additional second tacked on to the normal clock reading around midnight (11:59:59 –> 11:59:60 –> 12:00:00). This whole process is essentially designed to keep the sun directly above you at high noon.

Pretty cool, eh?

What’s really interesting about this issue is that, in the long run, it doesn’t really matter because the Earth’s rotation is slowing by a few fractions of a second each year and the standard Earth day continues to get longer. This is one of those weird facts that kind of blew my mind when I first heard it. I guess I had read too many science fiction stories where the hero hops into his time machine and goes back to some ridiculously precise date like 10:24 AM on Tuesday, August 13, 250,000,000 B.C. In reality, our current concepts of days and dates are firmly based on the Earth’s current circumstances. Back in dinosaur times, the typical Earth day was an hour or two shorter and there were an extra 10-20 days in the year (the length of the year was the same overall). When the Earth was really young, days were only six hours long and there were over 1,000 of them per year.
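The relationship between day length and days per year can be checked with some quick arithmetic. The sketch below is not the model from any paper; it simply assumes the year has always been about 365.25 modern days (8,766 hours) long and divides by the length of the day. The `days_per_year` helper is a hypothetical name.

```python
# Assumed constant length of the year: 365.25 modern days of 24 hours each.
HOURS_PER_YEAR = 365.25 * 24

def days_per_year(day_length_hours):
    """Number of days that fit in a fixed-length year for a given day length."""
    return HOURS_PER_YEAR / day_length_hours

for hours in (24, 23, 6):
    print(f"{hours}-hour day: about {days_per_year(hours):.0f} days per year")
```

Under that assumption, a 23-hour day gives roughly 16 extra days per year and a six-hour day gives well over 1,000 days per year.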

In order to visualize this, I found a paper online which provided me with a model that estimates the length of a day and the number of days per year for any time period. I’m not sure how official these calculations are but they do appear to correlate with data obtained from fossil corals and radiometric dating methods. I’ve included information on each geologic period from Wikipedia, so use with the appropriate amount of caution.

Anyway, enjoy your leap second! One steam engine …

Update:

Family Pool Trends

With the start of a rather warm, dry summer here in Wisconsin, we’ve decided to take the plunge (literally) and purchase a new above-ground pool. We seem to outgrow these things every few years and I’ve become intrigued with the idea that we just keep buying larger and larger cylinders of water. After some exhaustive research (which mostly involved looking through a lot of old photographs and estimating pool sizes), I present you with a timeline of our family’s pool history. The bubbles represent surface area and allow for relative size comparison.


Pool History (1997-2012)

Bubble Size = Pool Size

It is interesting to see that — with the exception of a few strays — we seem to buy a new pool every three years. It is also interesting to see the exponential growth in water volume that began about the time my daughter was born (when my son was four). If we keep up that pace, our next pool will be over 10,000 cubic feet — about the size of two 18-wheelers full of water.
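For reference, here is a minimal sketch of the cylinder math behind the bubble sizes and the volume estimates. The dimensions are hypothetical (they are not measurements of any of our pools) and the function names are made up for illustration.

```python
import math

def pool_surface_area(diameter_ft):
    """Surface area of a round pool in square feet."""
    return math.pi * (diameter_ft / 2) ** 2

def pool_volume(diameter_ft, depth_ft):
    """Water volume of a cylindrical pool in cubic feet."""
    return pool_surface_area(diameter_ft) * depth_ft

# Hypothetical pool: 18 feet across, filled to a depth of 4 feet.
area = pool_surface_area(18)
volume = pool_volume(18, 4)
print(f"{area:.0f} sq ft, {volume:.0f} cu ft")
```

Because area grows with the square of the diameter, doubling the width of a pool quadruples its surface area, and its volume too if the depth stays the same.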