Tag Archives: Infographics

Most Popular Word Roots in U.S. Place Names

My family visited Washington, D.C. last year for Spring Break and, during our 12-hour drive, I remember noticing a subtle change in the names of the cities and towns we were passing through. In the beginning, the place names had a familiar Midwestern flavor, one that mixed Native American origins (e.g. Milwaukee, Chicago) with bits of French missionary and 19th-century European settler. The names slowly took on a more Anglo-Saxon bent as we moved east, traveling through spots like Wexford, PA, Pittsburgh, PA, Gaithersburg, MD, Boonsboro, MD, Hagerstown, MD, and Reston, VA.

We have English-sounding place names in Wisconsin, of course, including highfalutin towns like Brighton, Kingston, and New London, but they seem to get overwhelmed by the sheer number of places with syllables like “wau”, “kee”, and “sha” (or all three combined). Many of these town names can be difficult for “outsiders” to pronounce and the spelling is all over the place since they were often coined by non-native speakers who’d misheard the original words. (The Native American word for “firefly”, for example, is linked to variations like Wauwatosa (WI), Wawatasso (MN), and Wahwahtaysee Way (a street in MI).)

I thought it would be interesting to see if there were any patterns to these U.S. place names, or toponyms, so I pulled a list of Census Places and extracted the most frequent letter combinations from the names of the country’s cities, towns, and villages. I tried to isolate true prefixes and suffixes by removing any letter pairings that were simply common to the English language. I then counted up the number of times each word root appeared and ranked them by state.
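For readers who want to try something similar, here is a minimal Python sketch of the counting step described above. It is not the code behind the chart: the sample place names, the `COMMON_ENGLISH` stoplist, the choice of 3- to 5-letter runs and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample of (place name, state) pairs; the real input was a
# much longer list of Census Places.
PLACES = [
    ("Appleton", "WI"), ("Kingston", "WI"), ("Brighton", "WI"),
    ("Waukesha", "WI"), ("Wauwatosa", "WI"), ("Waupaca", "WI"),
    ("Nashville", "TN"), ("Knoxville", "TN"), ("Clarksville", "TN"),
    ("Kingston", "TN"),
]

# Assumed stoplist of letter runs that are simply common in English.
COMMON_ENGLISH = {"ing", "the", "and", "ion"}


def candidate_roots(name, sizes=(3, 4, 5)):
    """Return the leading and trailing letter runs of a name, each once."""
    word = name.lower().replace(" ", "")
    roots = set()
    for n in sizes:
        if len(word) > n:  # skip runs that would swallow the whole name
            roots.add(word[:n] + "-")   # prefix, e.g. "wau-"
            roots.add("-" + word[-n:])  # suffix, e.g. "-ton"
    return roots


def top_roots_by_state(places, top_n=10):
    """Count candidate roots per state and keep the most frequent ones."""
    counts = defaultdict(Counter)
    for name, state in places:
        for root in candidate_roots(name):
            if root.strip("-") not in COMMON_ENGLISH:
                counts[state][root] += 1
    return {state: c.most_common(top_n) for state, c in counts.items()}


for state, roots in top_roots_by_state(PLACES, top_n=3).items():
    print(state, roots)
```

With this toy sample, "-ton" and "wau-" each appear three times in Wisconsin. A real run would need a far better filter for ordinary English letter runs than the four-item stoplist used here.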

Top 10 Word Roots by State

After looking over the top word roots by state, I was interested in seeing more detail so I calculated a location quotient for some of the most common word roots and plotted these out by county. Click on the maps for a larger D3 map.
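A location quotient is usually defined as a local share divided by a national share. Assuming that standard definition, the calculation for one county and one word root can be sketched as follows. The function name and the counts are made up for illustration.

```python
def location_quotient(local_hits, local_total, national_hits, national_total):
    """Local share of names containing a root, divided by the national share."""
    if local_total == 0 or national_hits == 0 or national_total == 0:
        return 0.0  # nothing to compare against
    local_share = local_hits / local_total
    national_share = national_hits / national_total
    return local_share / national_share


# Made-up counts: 4 of 20 place names in a county contain the root,
# compared with 50 of 10,000 place names nationwide.
lq = location_quotient(local_hits=4, local_total=20,
                       national_hits=50, national_total=10_000)
print(f"LQ = {lq:.1f}")
```

A value of 1 means the root is exactly as common locally as it is nationally. Values above 1 mean it is over-represented in that county.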

Location Quotient for “ton”
The word town derives from the Germanic word for “fence” or “fenced settlement.” In the U.S., the use of -ton/-town to honor important landowners or political leaders began before the American Revolution (think Jamestown, VA or Charleston, SC) and continued throughout the settlement of the country. (Interestingly, my hometown of Appleton, WI was named for philanthropist Samuel Appleton and is not a true town-based word root.)

Location Quotient for “boro/borough”
The word borough originates from the Germanic word for “fort” and has many common variations, including suffixes like -borough/-boro and -burgh/-burg. Like -ton/-town, these place name suffixes became popular in the 18th century and were used extensively throughout New England and the Atlantic coastal colonies. You can see how dominant the -boro/-borough suffix is in the upper Northeast.

Location Quotient for “ville”
The suffix “ville” comes from the French word for “farm” and is the basis for common words like “villa” and “village”. The use of the suffix -ville for the names of cities and towns in the U.S. didn’t really begin until after the Revolution, when pro-French sentiment spread throughout the country, particularly in the South and Western Appalachian regions. The popularity of this suffix began to decline in the middle of the 19th century, but you can still see its strong influence in the southern states.

Location Quotient for “san/santa”
The Spanish colonial period in the Americas left a large legacy of Spanish place names, particularly in the American West and Southwest. Many of the Californian coastal cities were named after saints by early Spanish explorers, while other cities in New Spain simply included the definite article (“la”, “el”, “las” and “los”) in what was often a very long description (e.g. “El Pueblo de Nuestra Señora la Reina de los Ángeles del Río de Porciúncula” … now known simply as Los Angeles or LA). The map shows the pattern for the San/Santa prefix, which is strong on the West Coast and weaker inland, where it may actually be an artifact of some Native American word roots.

Location Quotient for “Lake/Lakes”
The practice of associating a town with a nearby body of water puts a wrinkle into the process of tracking place names (the history of “hydronyms” being an entirely different area of study), but it was common in parts of the country that were mapped by explorers first and settled later. This can be seen in the prevalence of town names with word roots like Spring, Lake, Bay, River, and Creek.

Location Quotient for “Beach”
There is a similar process for other prominent features of the landscape such as fields, woods, hills, mountains, and — in Florida’s case — beaches.

Location Quotient for “wau”
Here is the word root that started this whole line of inquiry. It is apparently a very iconic Wisconsin toponym, with even some of the outlying place names having Wisconsin roots (the city of Milwaukie in Clackamas County, Oregon was named after Milwaukee, Wisconsin in the 1840s).

D3 Notes:

Trends in NFL Football Scores (Part 1)

One of the goals I set for myself this summer was to learn a bit about D3, a visualization toolkit that can be used to manipulate and display data on the web. Considering that the trees are bare and we’ve already had our first frost here in Wisconsin, you can safely assume that I am behind schedule. Nevertheless, I feel that I’ve finally reached a point where I have something to publish, so here goes.

First of all, a little background. D3 is a JavaScript library that allows you to bind data to any of the elements (text, lines and shapes) you might normally find on a web page. These objects can be styled using CSS and animated using simple dynamic functions. These features make D3 a perfect tool for creating interactive charts and graphs without having to depend on third-party programs like Google Charts, Many Eyes or Tableau.

I wanted to start out with something simple, so I elected to go with a basic line chart using data I pulled from Pro-Football-Reference.com. This site contains a ton of great information and statistics from the past 90+ years of the National Football League but, for now, I just looked at the final scores of all the games played from 1920 to 2011. My first D3-powered chart is below. It shows the average combined scores of winning and losing teams for each year of the NFL’s existence.
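The data preparation behind a chart like this amounts to grouping final scores by season and averaging them. Here is a minimal Python sketch of that step, separate from the D3 code that draws the chart. The game rows are invented, and the tuple layout and function name are assumptions rather than the format used by the source site.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (season, winning score, losing score).
# Tied games have no winner and would need separate handling.
GAMES = [
    (1920, 14, 0), (1920, 7, 0), (1920, 21, 6),
    (2011, 31, 24), (2011, 27, 17), (2011, 41, 34),
]


def average_scores_by_season(games):
    """Return {season: (average winning score, average losing score)}."""
    winners = defaultdict(list)
    losers = defaultdict(list)
    for season, win_pts, lose_pts in games:
        winners[season].append(win_pts)
        losers[season].append(lose_pts)
    return {
        season: (mean(winners[season]), mean(losers[season]))
        for season in sorted(winners)
    }


for season, (avg_win, avg_lose) in average_scores_by_season(GAMES).items():
    print(season, avg_win, avg_lose)
```

Each season then becomes one point on the winners' line and one point on the losers' line.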

Although this chart looks pretty simple, every element (including titles, subtitles, axes, labels, grids and data lines) has been created manually using the D3 code. The payoff is pretty nice. All of the elements can be reused and you have tremendous control over what is shown onscreen. To demonstrate some of these capabilities, I’ve added interactive overlays that show a few of the major eras in NFL football (derived from the work of David Neft and this discussion thread). If you move your mouse over the graph, you will see these different eras highlighted:

Early NFL (1920-1933) – The formation of the American Professional Football Association (APFA) in 1920 marked the official start of what was to become the National Football League. This era was marked by rapid formation (and dissolution) of small town franchises, vast differences in team capabilities and a focus on a relatively low-scoring running game. At this time, the pass was considered more of an emergency option than a reliable standard. The rapid growth in popularity of the NFL during this era culminated with the introduction of a championship game in 1932.

Introduction of the Forward Pass (1933-1945) – The NFL discontinued the use of collegiate football rules in 1933 and began to develop its own set of rules designed around a faster-paced, higher-scoring style of play. These innovations included the legalization of the forward pass from anywhere behind the line of scrimmage, a change that is often called the “Bronko Nagurski Rule” after his controversial touchdown in the 1932 NFL Playoff Game.

Post-War Era (1945-1959) – The end of WWII saw the expansion of the NFL beyond its East Coast and Midwestern roots with the move of the Cleveland Rams to Los Angeles — the first big-league sports franchise on the West Coast. This period also saw the end of racial segregation (enacted in the 30s) and the start of nationally televised games.

Introduction of the AFL (1959-1966) – Professional football’s surge in popularity led to the formation of a rival organization — the American Football League — in 1960. The growth of the flashy AFL was balanced by a more conservative style of play in the NFL. This style was epitomized by coach Vince Lombardi and the Green Bay Packers, who would win five championships in the 1960s. In 1966, the two leagues agreed to merge as of the 1970 season.

Dead Ball Era (1966-1977) – Driven in part by stringent restrictions on the offensive line, this period is marked by low scores and tough defensive play. Teams that thrived in this environment include some of the most famous defenses in modern NFL history: Pittsburgh’s Steel Curtain, Dallas’ Doomsday Defense, Minnesota’s Purple People Eaters and the Rams’ Fearsome Foursome.

Live Ball Era (1978-present) – Frustrated by the decreasing ability of offenses to score points in the 70s, the NFL began to add rules and make other changes to the structure of the game in an attempt to boost scoring. The most famous of these initiatives was the so-called “Mel Blount Rule” (introduced in 1978), which severely restricted the defense’s ability to interfere with passing routes. With the subsequent introduction of the West Coast Offense in 1979 (an offense based on precise, short passes), this period became marked by a major focus on the passing game.

Having created this first chart, I decided to build a second chart based on the ratio of average winning scores to average losing scores to see if there were any patterns.

The chart above shows how, after a period of incredibly lopsided victories, the average scoring differential settled into a very steady pattern by the late 1940s and stayed at that level (roughly 2:1) for the next 30 years. Despite many changes in rules, coaching techniques, technology and other factors, only the pass interference rules of the late 1970s seemed to have any significant effect on this ratio, shifting it to just under 1.8:1 for the next 30 years.
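The ratio itself is a single division per season. A small Python sketch, using invented season averages picked only to land near the 2:1 and 1.8:1 levels mentioned above (they are not values from the real data set):

```python
def win_loss_ratio(avg_winning, avg_losing):
    """Ratio of the average winning score to the average losing score."""
    if avg_losing == 0:
        return float("inf")  # every losing team was shut out
    return avg_winning / avg_losing


# Hypothetical per-season averages: (average winning, average losing).
SEASON_AVERAGES = {1950: (28.0, 14.0), 1990: (27.0, 15.0)}

ratios = {
    season: round(win_loss_ratio(win, lose), 2)
    for season, (win, lose) in SEASON_AVERAGES.items()
}
print(ratios)
```

Guarding against a zero denominator matters mainly for the earliest seasons, where shutouts were common and a small sample could average out to zero.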

While I had the data available, I also decided to look at the differences in average scores between home teams and away teams. The chart below plots this data along with the same overlay I used in the first chart.

A look at the ratio of average home team scores to average away team scores follows:

What’s fascinating about this chart is how quickly a form of parity was achieved among all the NFL teams. By the mid-30s, a measurable home field advantage can be seen at roughly 15%, a rate that has remained essentially constant for over 70 years. Factors for this boost could include the psychological support of fans, familiar weather conditions, unique features of local facilities, lack of travel fatigue, referee bias and/or increased levels of motivation in home town players.

Thanks to Charles Martin Reid for his solution to getting D3 and WordPress to play nice.

Earnings and Unemployment by College Major

The Wall Street Journal recently published a table of income and unemployment data that presented pay and employment rates for various college majors. The original study by Georgetown University’s Center on Education and the Workforce contained enough additional details that I thought it might be worth trying to incorporate the information into a Tableau visualization.

After a little data massaging, I created charts for both the high-level fields of study and the more detailed individual majors. Each level contains unemployment rates, income levels, and popularity of major measured by number of enrollees.

One of the first things you notice is that, despite frequent claims to the contrary, college graduates with a degree in Education have the lowest median earnings overall. The Education field also has the narrowest range of income and includes four of the ten majors with the lowest median earnings. On the plus side, fifteen of the sixteen Education majors have (or had at the time of the study) unemployment rates below 5.5% — the weighted average rate of unemployment for all majors in the study.
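A weighted average like the 5.5% figure counts each major in proportion to the number of people holding that degree, so large majors pull the overall rate harder than small ones. A minimal Python sketch follows. The two example majors and their numbers are invented, and the function name is an assumption.

```python
def weighted_unemployment_rate(majors):
    """Average unemployment rate weighted by the number of graduates.

    `majors` is a list of (graduate count, unemployment rate in percent).
    """
    total_grads = sum(count for count, _ in majors)
    if total_grads == 0:
        raise ValueError("no graduates to average over")
    return sum(count * rate for count, rate in majors) / total_grads


# Made-up example: a small major at 4% and a large major at 6%.
example = [(1000, 4.0), (3000, 6.0)]
print(weighted_unemployment_rate(example))
```

A plain average of the two rates would give 5%. The weighted version lands closer to the larger major's 6%.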

Graduates with an Engineering degree have the highest median earnings overall and a relatively low unemployment rate compared to other disciplines. In addition, seven of the ten majors with the highest median earnings were found in Engineering.

Other majors with good earnings potential included the usual suspects (Computers & Mathematics, Health, and Business) while the best employment prospects were found in Education, Health, Physical Sciences, and Agriculture & Natural Resources.

As for individual majors, the winners in my completely fictitious categories are as follows:

  • Most Popular – Business Management & Administration takes this category with nearly 2.8 million grads holding this degree. The next two majors in line (also in the Business field) weren’t even close, trailing by over a million people.
  • Best Prospects – Actuarial Science beat out four other fully-employed competitors by coming in with a median income of over $80K.
  • Worst Prospects – Clinical Psychology tops this category with an estimated unemployment rate of nearly 20%. Yikes! I also noticed that a number of other majors in the Psychology field had unemployment rates above 10%, which means that intra-discipline career changes for people with this major would be difficult.
  • Most Deceptive – The “winner” here is Architecture, an outlier with the lowest median earnings and the highest unemployment rate of all of the Engineering majors. For this category, I wanted a relatively popular major with an uncommonly high unemployment rate … the kind of major that churns out grads and then strands them in the unemployment line. An educational Judas, if you will. (Full disclosure: I have an Architecture degree, but I can’t say I wasn’t warned.)
  • Hidden Gem – I’m going to call this one a tie between Petroleum Engineering and Pharmacy, Pharmaceutical Sciences & Administration. Petroleum Engineering has a slight edge on median earnings ($127K vs. $105K) but the Pharma major has a lower overall unemployment rate (3.2% vs. 4.4%). You probably can’t go wrong with either one but keep an eye on the horizon … Petroleum Engineering is notoriously dependent on the boom/bust cycles of the oil and gas industry while workers in the pharmaceutical industry are facing major changes as companies try to adjust to globalization and increasing costs of product development.