Category Archives: Information

How Louis C.K. Could Help Improve Street Sign Design

The opening of last week’s season premiere of Louie offered up a hilarious scene of Louie and a fellow New Yorker trying to decipher an odd assortment of street signs to see if it was safe to park their cars. This must be a fairly common problem in bigger cities because I came across these weird signage clusters a lot back when I was trying to eke out a living as an urban planner. The picture below was taken in front of my sister’s old house in Evanston, IL (a Chicago suburb) back in the 90s. Despite all of the warnings, it was actually okay to park in this spot at the time we were there.

In any given city I suppose there are a bunch of different parking rules and each one has its own sign associated with it. Every now and then you get a situation where overlapping rules apply and the result is a bit of a jumble. It is a classic — albeit minor — case of the law of unintended consequences. There is simply nothing in a city worker’s toolkit that would allow them to provide an appropriate solution to such a complex problem.

What is needed is a more flexible approach — something that clearly outlines the rules of a given situation but can also be easily adjusted to meet slightly different circumstances.

After returning home, I put together a quick idea that involved more of a calendar-like design, with circumstances in rows and time-of-day in columns. My idea was to have a standard sign to which workers could affix a series of universal “no” stickers at the right points. The design is based primarily on the given situation (and does nothing to address the driveway warning) but I thought it was a good start.

The new sign would save taxpayer money by reducing both the number and variety of signs that needed to be made. It would also simplify the interpretation of complex situations for the average citizen and it could be easily modified by city traffic workers if parking circumstances changed. You could even use it to block off areas temporarily by adding a removable (magnetic?) marker during construction or special events.
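For the programmers in the audience, the calendar-style layout can be sketched as a simple grid. This is only a rough illustration: the row and column labels below are made up, and the set of "no" stickers stands in for whatever a city worker would actually affix.

```python
# Hypothetical labels; a real sign would use the city's own rules.
ROWS = ["Weekdays", "Weekends", "Street cleaning day"]
COLS = ["6am-9am", "9am-4pm", "4pm-6pm", "6pm-6am"]


def make_sign(no_stickers):
    """Return a grid of booleans: True where a 'no' sticker forbids parking."""
    return [[(row, col) in no_stickers for col in COLS] for row in ROWS]


def can_park(sign, row, col):
    """Look up one cell of the sign by its row and column label."""
    return not sign[ROWS.index(row)][COLS.index(col)]


def render(sign):
    """Draw the sign as plain text, with NO marking the stickered cells."""
    width = max(len(row) for row in ROWS)
    lines = [" " * width + " | " + " | ".join(COLS)]
    for label, cells in zip(ROWS, sign):
        marks = [("NO" if cell else "").center(len(col))
                 for cell, col in zip(cells, COLS)]
        lines.append(label.ljust(width) + " | " + " | ".join(marks))
    return "\n".join(lines)


# Made-up sticker placement for illustration only.
stickers = {
    ("Weekdays", "6am-9am"),
    ("Weekdays", "4pm-6pm"),
    ("Street cleaning day", "9am-4pm"),
}
sign = make_sign(stickers)
print(render(sign))
```

Changing the rules for a spot then amounts to adding or removing one entry in `stickers`, which is the same flexibility the stickers are meant to give the physical sign.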

The one big drawback that I could see for this design is that it leaves a lot of white space open for minor vandalism. Even I might be tempted to play a few games of tic-tac-toe on such a sign. Overall, though, I think it is a step in the right direction. I hope Louie would be proud.

Have a happy Fourth of July, everyone! Make sure you interpret those parking signs along the parade route carefully.

Revised Parking Sign System

The Days Keep Getting Longer … Literally

Keeping track of time is never easy without an accurate clock and so people have come up with a number of different folk methods to keep themselves on pace. One of the most common techniques is to introduce a multi-syllable word as you count seconds so that you don’t count too fast. The most familiar phrase is probably something like “A thousand one, a thousand two …” but there are several others. My Dad actually had a teacher in school who used the phrase “steam engine” and I’ve heard others use words like “Mississippi” or even “alligator.” Basically, any four or five syllable phrase will serve as a good placeholder. Whatever phrase you favor, be prepared to dust it off tonight as the world is officially given an extra leap second at the end of the day.

The reason for this extra second is rather complicated. A normal “day” was officially defined back in 1967 as 86,400 seconds in the International System of Units (SI) and it is tracked by a very precise atomic clock. This is the Coordinated Universal Time (UTC) that we all know and love. The actual solar day is pretty much the same length but not quite. There are several different events that can speed up or slow down the Earth’s rotation by a few thousandths of a second. These events can include earthquakes, changes in the jet stream, the tidal pull of the Moon, the position of the Earth in its orbit, fluid motion at the Earth’s core, and the gradual slowing of the Earth’s rotation.

Whenever these forces cause the solar day and the UTC to get too far out of whack, the Sub-bureau for Rapid Service and Predictions of Earth Orientation Parameters of the International Earth Rotation and Reference System Service — let me pause here while I catch my breath — calls for a leap second. This manifests itself as an additional second tacked on to the normal clock reading around midnight (11:59:59 –> 11:59:60 –> 12:00:00). This whole process is essentially designed to keep the sun directly above you at high noon.

Pretty cool, eh?

What’s really interesting about this issue is that, in the long run, it doesn’t really matter because the Earth’s rotation is slowing by a few fractions of a second each year and the standard Earth day continues to get longer. This is one of those weird facts that kind of blew my mind when I first heard it. I guess I had read too many science fiction stories where the hero hops into his time machine and goes back to some ridiculously precise date like 10:24 AM on Tuesday, August 13, 250,000,000 B.C. In reality, our current concept of days and dates is firmly based on the Earth’s current circumstances. Back in dinosaur times, the typical Earth day was an hour or two shorter and there were an extra 10-20 days in the year (the length of the year was the same overall). When the Earth was really young, days were only six hours long and there were over 1,000 of them per year.

In order to visualize this, I found a paper online which provided me with a model that estimates the length of a day and the number of days per year for any time period. I’m not sure how official these calculations are but they do appear to correlate with data obtained from fossil corals and radiometric dating methods. I’ve included information on each geologic period from Wikipedia, so use with the appropriate amount of caution.
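The basic arithmetic behind those day counts is easy to check. The sketch below is not the model from the paper; it simply assumes the year has always been about 365.25 of today's 24-hour days (roughly 8,766 hours) and divides by the length of the day.

```python
# Assumption: the length of the year in hours has stayed constant.
HOURS_PER_YEAR = 365.25 * 24  # about 8,766 hours


def days_per_year(day_length_hours):
    """Number of days that fit in one year for a given day length."""
    return HOURS_PER_YEAR / day_length_hours


for hours in (24, 23, 6):
    print(f"{hours:>2}-hour day: about {days_per_year(hours):.0f} days per year")
```

Under that assumption, a 23-hour day works out to roughly 16 more days per year than we have now, and a six-hour day gives well over 1,000.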

Anyway, enjoy your leap second! One steam engine …


Visualizing English Word Origins

I have been reading a book on the development of the English language recently and I’ve become fascinated with the idea of word etymology — the study of words and their origins. It’s no secret that English is a great borrower of foreign words but I’m not enough of an expert to really understand what that means for my day-to-day use of the language. Simply reading about word history didn’t help me, so I decided that I really needed to see some examples.

Using Douglas Harper’s online dictionary of etymology, I paired up words from various passages I found online with entries in the dictionary. For each word, I pulled out the first listed language of origin and then re-constructed the text with some additional HTML infrastructure. The HTML would allow me to associate each word (or word fragment) with a color, title, and hyperlink to a definition.

The results look like this:

The quick brown fox jumps over the lazy dog.

This simple sentence is constructed of eight distinct words and one word suffix. Six of the words are from Old English (colored in pink) while the others are from Gallo-Roman and Middle Low German (both colored in gray). Hovering over each word provides the exact source and clicking the word takes you to the full origin description.

A second example shows more variety:

Supreme executive power derives from a mandate from the masses, not from some farcical aquatic ceremony.

This is a surprisingly complex Monty Python quote where the colors represent Old English (pink), Middle English (red), Anglo-French (orange), Old French (light orange), Middle French (pale orange), and Classical and Medieval Latin (both yellow). I suspect that both the complexity and variety of word sources is intentional — standing in humorous contrast to the appearance of the speaker.

What follows are five excerpts taken from a spectrum of written sources. The intent was to investigate each passage and see if word origin varied significantly based on the intended purpose of the passage.

(This process was pretty involved and my initial dream of creating an app that would allow me to convert any paragraph to this format faded when I realized that much of the word matching process needed manual intervention. I definitely suggest digging in to the full etymology site to explore the full history of each word. I have probably made plenty of translation mistakes as I developed my paragraphs but I certainly had fun.)
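For anyone curious about the HTML side, here is a minimal sketch of the markup step. Everything in it is an assumption for illustration: the tiny lookup table was typed in by hand, its three entries are only my reading of the fox sentence, and the base URL is a placeholder rather than the real dictionary address.

```python
import html

# Hypothetical lookup table of word -> (first listed origin, display color).
# The real matching needed manual work for most words.
ORIGINS = {
    "quick": ("Old English", "pink"),
    "fox": ("Old English", "pink"),
    "lazy": ("Middle Low German", "gray"),
}
BASE_URL = "https://example.com/word/"  # placeholder, not the real site


def tag_word(word):
    """Wrap a word in a colored link if its origin is known."""
    key = word.lower().strip(".,;:!?")
    if key not in ORIGINS:
        return html.escape(word)
    origin, color = ORIGINS[key]
    return (
        f'<a href="{BASE_URL}{html.escape(key)}" '
        f'title="{html.escape(origin)}" '
        f'style="color: {color}">{html.escape(word)}</a>'
    )


def tag_sentence(sentence):
    """Tag each whitespace-separated word in a sentence."""
    return " ".join(tag_word(word) for word in sentence.split())


print(tag_sentence("The quick brown fox jumps over the lazy dog."))
```

Words missing from the table pass through untouched, which is roughly where the manual intervention comes in: suffixes, compound words, and ambiguous entries all need a human to decide.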

Passage #1: American Literature

The first paragraph I looked at was an excerpt from Mark Twain’s The Adventures of Tom Sawyer. I chose this text because I thought it would have a good mix of English and American words.

Tom gave up the brush with reluctance in his face, but alacrity in his heart. And while the late steamer Big Missouri worked and sweated in the sun, the retired artist sat on a barrel in the shade close by, dangled his legs, munched his apple, and planned the slaughter of more innocents. There was no lack of material; boys happened along every little while; they came to jeer, but remained to whitewash. By the time Ben was fagged out, Tom had traded the next chance to Billy Fisher for a kite, in good repair; and when he played out, Johnny Miller bought in for a dead rat and a string to swing it with – and so on, and so on, hour after hour. And when the middle of the afternoon came, from being a poor poverty-stricken boy in the morning, Tom was literally rolling in wealth. He had beside the things before mentioned, twelve marbles, part of a jews-harp, a piece of blue bottle-glass to look through, a spool cannon, a key that wouldn’t unlock anything, a fragment of chalk, a glass stopper of a decanter, a tin soldier, a couple of tadpoles, six fire-crackers, a kitten with only one eye, a brass door-knob, a dog-collar – but no dog – the handle of a knife, four pieces of orange-peel, and a dilapidated old window sash.


The passage has a solid base of Old English words mixed with a variety of French, Latin and Old Norse terms. Middle English makes an appearance in the form of a few words and suffixes while American English is found solely in the list of items Tom Sawyer collects from his friends. Two of these American terms (“fire-crackers” and “door-knob”) are hyphenated words built from Old English and Scandinavian components. (Several of Twain’s other hyphenated words apparently didn’t make it over the hump into full-fledged Americanisms. However, it should be noted that Twain was often the first author to record usage of U.S. slang of the era.)

I found it interesting that Middle English had such a poor showing in this text but it may be due to the fact that the defining elements of Middle English have more to do with sentence structure and grammatical elements than specific words. I was also surprised at the frequent use of longer, Latin-based words in an adventure novel, but the average word length comes in at about 4.4 characters — still fairly short and simple.

Although 73% of the word fragments are Old English, Twain uses words from over a dozen different sources in this short passage alone. Overall, the wide variety of word sources adds a pleasing “flavor” to the passage. The mix seems well-balanced and interesting.
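The two statistics quoted for each passage (average word length and the share of word fragments from each source) are simple to compute once the tagging is done. The sketch below is a simplified version: it counts only runs of letters as words, so hyphenated words and contractions get split, and the list of origins passed to `source_shares` is a made-up stand-in for the hand-tagged fragments.

```python
import re
from collections import Counter


def average_word_length(text):
    """Mean number of letters per word, counting only runs of letters."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(word) for word in words) / len(words)


def source_shares(origins):
    """Percentage of fragments attributed to each source language."""
    counts = Counter(origins)
    total = sum(counts.values())
    return {source: 100 * n / total for source, n in counts.items()}


sample = "Tom gave up the brush with reluctance in his face, but alacrity in his heart."
print(round(average_word_length(sample), 1))

# Hypothetical tags for four fragments, for illustration only.
print(source_shares(["Old English", "Old English", "Latin", "Old French"]))
```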

Passage #2: British Literature

For my second test, I wanted to look at text from a non-American author. I chose a paragraph from Charles Dickens’ Great Expectations out of respect for my 7th-grade English teacher.

My sister had a trenchant way of cutting our bread-and-butter for us, that never varied. First, with her left hand she jammed the loaf hard and fast against her bib – where it sometimes got a pin into it, and sometimes a needle, which we afterwards got into our mouths. Then she took some butter (not too much) on a knife and spread it on the loaf, in an apothecary kind of way, as if she were making a plaister – using both sides of the knife with a slapping dexterity, and trimming and moulding the butter off round the crust. Then, she gave the knife a final smart wipe on the edge of the plaister, and then sawed a very thick round off the loaf: which she finally, before separating from the loaf, hewed into two halves, of which Joe got one, and I the other.

The relative simplicity of this passage surprised me a little. The average word length is about 4.2 and over 84% of the word fragments are basic Old English. No other source comes in over 5% and the variety of sources is half that of the Twain passage. Hebrew makes an appearance in the form of the name “Joe” but most of the other borrowed words are French in origin. Still, I found the text appealing in a way: basic words for a basic task.

Passage #3: Legal

The third paragraph comes from a United Nations document on maritime territories. I selected this passage because it seemed to contain more jargon and I suspected that much of this jargon was borrowed. This hunch proved to be correct.

Where the coasts of two States are opposite or adjacent to each other, neither of the two States is entitled, failing agreement between them to the contrary, to extend its territorial sea beyond the median line every point of which is equidistant from the nearest points on the baselines from which the breadth of the territorial seas of each of the two States is measured. The above provision does not apply, however, where it is necessary by reason of historic title or other special circumstances to delimit the territorial seas of the two States in a way which is at variance therewith.

This text had a much higher ratio of French and Latin word fragments (16.9% and 9.3%) and a longer average word length — nearly 4.8 characters — than both previous passages. With 64.4% of the word fragments, Old English still serves as a major binding agent in this text but there is less variety overall. Middle English makes its appearance only as a suffix and there is only one word outside of the English/French/Latin triumvirate. After the visual and poetic excitement of the two literature entries, this paragraph seems very bland.

Passage #4: Medicine
Note: This passage has been revised (see thread)

My dad suggested that I take a look at a healthcare-related passage to see if the use of specific medical terminology would tilt the word usage even farther away from “native” English words. Boy, was he right.

The anatomic axis of the lower extremity is defined by the femorotibial angle, which averages 5° of valgus; the mechanical axis of the lower extremity is defined by a plumb line connecting the center of the femoral head to the mid ankle on a standing anteroposterior weight-bearing radiograph. The mechanical axis averages 1.2° of varus, and it is more accurate than the anatomic axis in demonstrating load transmission across the knee joint, especially if femoral or tibial deformities contribute to limb malalignment. A study by Khan et al in patients with early symptomatic knee osteoarthritis showed a clear relationship between local knee alignment (as determined from short fluoroscopically guided standing anteroposterior knee radiographs) and the compartmental pattern and severity of knee osteoarthritis. In this study, each degree of increase in the local varus angle was associated with a significantly increased risk of having predominantly medial compartment osteoarthritis, and a similar association was found between the valgus angulation and lateral compartment osteoarthritis in 47 knees.

The medical paragraph has only 51.9% Old English word fragments and the average number of characters per word is 5.7, much higher than even the legal text. French, Latin, and Greek were used more frequently in this passage and, despite U.S. prowess in the healthcare field, there were no American English terms. This is a paragraph that is doing a lot of heavy lifting and it uses a lot of dense, muscular words to get the job done.

Passage #5: Sports

This last passage was an attempt to stack the deck in favor of some home grown words. It doesn’t get more American than baseball, but the only American word in this article about a spring training rainout between the Milwaukee Brewers and the Texas Rangers is the word “baseball” itself. Everything else is either Old English or borrowed. Still, I have to assume that phrases like “at-bats” and “suicide squeeze bunt” are not exactly common constructions and my guess is that the entire article would be a mystery to someone who didn’t know the game.

It was a wild, windy day at Maryvale Baseball Park before the rains came with the Brewers ahead, 6-4. The Brewers scored their runs on a throwing error, a delayed double steal, a wind-blown popup that fell in shallow center field, a fielding error on that same play, a wind-aided triple and a suicide squeeze bunt, all in three innings of at-bats.

The triple belonged to Caleb Gindl, who motored to third after Rangers center fielder Craig Gentry crashed into the wall, forcing open a large gate. Gentry and left fielder Conor Jackson worked together to close it so play could continue.

“It was crazy out there,” Gindl said. “It was scary in the outfield. After a while we were all just playing deep, knowing the ball would either get to us or blow out.”

Play didn’t last long after the Brewers’ four-run third inning. Brewers reliever Manny Parra pitched a scoreless fourth, then the grounds crew covered the field before the bottom of the inning could begin.

After a delay of just 12 minutes, the game was called.

First of all, I absolutely LOVE the fact that Caleb Gindl uses two Old Norse words to describe the weather conditions during the game. It provides a certain primal, unhinged quality to the situation and adds a third element — nature — to the contest. I also like the use of the onomatopoeic terms “pop” and “crash” because they serve to underscore the action.

The passage itself is a little lighter on the French and Latin roots than some of the earlier paragraphs and many of the terms are fairly short; the average word length comes in at about 4.6 characters. Some of this may be due to the fact that it is an online article (and attention spans are short) but it may also be related to the simple concepts at the core of the game itself. Words like “bat” and “ball” are very similar to their Proto-Indo-European roots (*bhat- and *bhel- respectively), suggesting that any associated activities are pretty basic to the language. Also, the sheer number and variety of numeric references (e.g. “three”, “third”, and “triple”) bring in many simple terms.

Wisconsin Voters Banished to NULL Island

The top headline in my local paper this morning was “Glitch puts some Wisconsin voters in Africa” … an interesting thing to ponder over a bowl of Quaker Oatmeal Squares. I suppose this problem merits at least some attention given the heated political climate surrounding the state’s voter redistricting process. But headline news? Above the fold? Sounds like a slow news day to me.

Online, of course, the debate has already devolved into the standard round of mudslinging and name-calling so good luck trying to find out what’s going on from that crowd. The reporters themselves focused on the political fallout of the issue rather than an explanation so no help there either. I guess it’s up to the humble folks at Ideas Illustrated to offer up some insight!

The first clue to the problem can be found in the article’s pullout quote, which describes the voters’ location as the “coast of Africa” and not a specific country in Africa. The second clue can be found deep within the article when it is mentioned that clerks have recently made changes to the way voters are being entered into the voter registration database:

“ … voters are [now] being entered into different districts by the physical location of their address in computerized maps. Previously, they were entered into different districts in the state voter database according to where their address fell in certain address ranges.”

These two hints point to a very common problem associated with geocoding, which is the process of converting a postal address to a set of map coordinates. Let’s backtrack. An online mapping tool like Google Maps uses specific geographic coordinates (latitude and longitude) to place a location on a map. However, because none of these physical locations are actually stored in a database anywhere, the tool needs to interpolate the coordinates from a vector database of the road network (i.e. a mathematically represented set of lines).

For example, if you look up the address for Trump Tower, you find that it is located at 725 Fifth Avenue in Midtown Manhattan. When you enter this address into Google Maps, the tool finds 5th Avenue on the underlying road grid and then uses an algorithm to determine that the “725” address is somewhere between 56th and 57th streets. It will also determine which side of the street the address is located based on stored knowledge of the “odd” and “even” numbering pattern. In other words, it’s guessing.
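The guessing step can be sketched as straight-line interpolation along one block. This is a simplified illustration, not how any particular mapping tool works: the function name, the address range, and the coordinates in the example are all made up, and a real geocoder would also offset the point to the odd or even side of the street.

```python
def interpolate_address(number, low, high, start, end):
    """Estimate (lat, lon) for a house number on one street segment.

    low/high are the address numbers at the two ends of the block;
    start/end are the (lat, lon) coordinates of those ends.
    """
    if not low <= number <= high:
        raise ValueError("address is outside this segment's range")
    # Fraction of the way along the block, assuming evenly spaced numbers.
    fraction = 0.5 if high == low else (number - low) / (high - low)
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon


# Hypothetical block running from number 701 to 799 (coordinates invented).
print(interpolate_address(725, 701, 799, (40.000, -74.000), (40.002, -73.998)))
```

The evenly spaced assumption is where the error creeps in: a block with one large building and several small ones will not have its numbers spread uniformly along the curb.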

Google Map detail of the area around Trump Tower


TIGER/Line® Shapefile detail of the same area

These guesstimates work pretty well in dense urban environments where there are a lot of cross streets to serve as reference points. In rural areas, the curvilinear streets and widely-spaced buildings make things a little more difficult. When the situation gets really muddled, some mapping tools essentially “punt” and enter a default set of coordinates. In the case of the Wisconsin voter addresses, these default coordinates are 0.00 degrees latitude and 0.00 degrees longitude. Where is this exactly? It is the intersection of the Prime Meridian and the Equator … which occurs just off the coast of Africa.

Geographers have actually given this place a rather fanciful name: NULL Island (it is not, in fact, a real island). It even has its own web site and unofficial flag (below right).

So there are no nefarious schemes behind this situation … just normal, everyday data problems. The state clerks need to tell their IT guys to flag the errant voter addresses and then they can assign them to the appropriate districts by hand. Problem solved. However, they should be aware that interpolation is an imperfect process and, in addition to assigning blocks of voters to NULL Island, the geocoding process may also assign voters to the wrong districts. This could be particularly true for people who live close to a district boundary. It might actually make sense to keep the old method around for backup.
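Flagging the errant records is about as simple as data cleanup gets. Here is a rough sketch, assuming each voter record is a dictionary with hypothetical `lat` and `lon` fields; the actual database layout is not described in the article.

```python
def split_by_geocode(records):
    """Separate records that landed on (0, 0) from those with real coordinates."""
    located, flagged = [], []
    for record in records:
        if record["lat"] == 0.0 and record["lon"] == 0.0:
            flagged.append(record)  # needs a manual district assignment
        else:
            located.append(record)
    return located, flagged


# Made-up sample records for illustration.
voters = [
    {"name": "Voter A", "lat": 43.07, "lon": -89.40},
    {"name": "Voter B", "lat": 0.0, "lon": 0.0},
]
located, flagged = split_by_geocode(voters)
print([record["name"] for record in flagged])
```

Note that this only catches the failures that fall back to the default coordinates. Voters who were geocoded successfully but into the wrong district will sail right through a check like this.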

Canine Cop in Constitutional Crisis

[There have been some interesting topics flitting about the blogosphere in the past weeks but I’ve been too busy with other stuff to comment. To eliminate some of the backlog, I’ve decided to try and do a few quick takes. First up: Franky the chocolate lab.]

There is a Florida legal case winding its way through the court system that pits the skills of a drug sniffing police dog named Franky against the Fourth Amendment rights of alleged marijuana grower Joelis Jardines. Back in 2006, Franky’s keen nose detected the smell of $700,000 in marijuana plants wafting out of a Dade County home while he and his handlers were standing on the front porch. Franky signaled the police who subsequently obtained a warrant, searched the home, found the pot, and arrested Jardines. The question is: does a dog sniffing the air outside a privately owned house constitute an illegal search?

The variables make it interesting. Use of a thermal imaging device to look into the interior of someone’s home constitutes a search and is not legal without a warrant. However, the use of dogs in airports and other public places is allowed under the law because people in those locations do not necessarily have an expectation of privacy. (This is similar to recent arguments favoring the warrantless use of a GPS tracking device on a private vehicle.) Complicating matters is the fact that a dog is trained to detect only one thing (drugs) while a mechanical device like thermal imaging might show other things (like you sitting naked on the toilet).

All of this becomes more interesting if you start to think about the trends in miniaturization and computing power that could be applied to today’s surveillance drones. How soon before these things migrate from the skies of Afghanistan to the air space above your own neighborhood?

Other questions that spring to mind:

  • Does it matter that Franky’s talent is a natural one? Would the case be any different if this involved a drug sniffing machine?
  • What would happen if the police themselves were augmented in some way (genetically? cybernetically?) that would allow them to detect drugs without the aid of anything else? Will robocops get more legal leeway when it comes to searches just because of their “innate” talents?
  • How fast will police be able to get search warrants in the future? Will judges allow instantaneous decisions on these matters?
  • Does the fact that people are willing to provide detailed personal information to their social networks change our society’s expectations of privacy?

The U.S. Supreme Court will hear the case later in the year. Given the pace of development of modern technology, constitutional scholars could be reading about the exploits of Franky the dog for years to come.


Earnings and Unemployment by College Major

The Wall Street Journal recently published a table of income and unemployment data that presented pay and employment rates for various college majors. The original study by Georgetown University’s Center on Education and the Workforce contained enough additional details that I thought it might be worth trying to incorporate the information into a Tableau visualization.

After a little data massaging, I created charts for both the high-level fields of study and the more detailed individual majors. Each level contains unemployment rates, income levels, and popularity of major measured by number of enrollees.

One of the first things you notice is that, despite frequent claims to the contrary, college graduates with a degree in Education have the lowest median earnings overall. The Education field also has the narrowest range of income and includes four of the ten majors with the lowest median earnings. On the plus side, fifteen of the sixteen Education majors have (or had at the time of the study) unemployment rates below 5.5% — the weighted average rate of unemployment for all majors in the study.
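A quick aside on that weighted average: each major's unemployment rate is weighted by the number of graduates holding the degree, so the big majors count for more than the small ones. A minimal sketch with made-up numbers (not figures from the study):

```python
def weighted_unemployment(majors):
    """Average unemployment rate weighted by the number of graduates.

    majors is a list of (graduate_count, unemployment_rate_percent) pairs.
    """
    total = sum(count for count, _ in majors)
    return sum(count * rate for count, rate in majors) / total


# Hypothetical example: a small major at 4% and a large one at 8%.
print(weighted_unemployment([(100, 4.0), (300, 8.0)]))
```

A plain average of those two rates would be 6%, but because the larger major carries three times the weight, the weighted figure lands closer to its rate.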

Graduates with an Engineering degree have the highest median earnings overall and a relatively low unemployment rate compared to other disciplines. In addition, seven of the ten majors with the highest median earnings were found in Engineering.

Other majors with good earnings potential included the usual suspects (Computers & Mathematics, Health, and Business) while the best employment prospects were found in Education, Health, Physical Sciences, and Agriculture & Natural Resources.

As for individual majors, the winners in my completely fictitious categories are as follows:

  • Most Popular – Business Management & Administration takes this category with nearly 2.8 million grads holding this degree. The next two majors in line (also in the Business field) weren’t even close, trailing by over a million people.
  • Best Prospects – Actuarial Science beat out four other fully-employed competitors by coming in with a median income of over $80K.
  • Worst Prospects – Clinical Psychology tops this category with an estimated unemployment rate of nearly 20%. Yikes! I also noticed that a number of other majors in the Psychology field had unemployment rates above 10%, which means that intra-discipline career changes for people with this major would be difficult.
  • Most Deceptive – The “winner” here is Architecture, an outlier with the lowest median earnings and the highest unemployment rate of all of the Engineering majors. For this category, I wanted a relatively popular major with an uncommonly high unemployment rate … the kind of major that churns out grads and then strands them in the unemployment line. An educational Judas, if you will. (Full disclosure: I have an Architecture degree, but I can’t say I wasn’t warned.)
  • Hidden Gem – I’m going to call this one a tie between Petroleum Engineering and Pharmacy Pharmaceutical Sciences & Administration. Petroleum Engineering has a slight edge on median earnings ($127K vs. $105K) but the Pharma major has a lower overall unemployment rate (3.2% vs. 4.4%). You probably can’t go wrong with either one but keep an eye on the horizon … Petroleum Engineering is notoriously dependent on the boom/bust cycles of the oil and gas industry while workers in the pharmaceutical industry are facing major changes as companies try to adjust to globalization and increasing costs of product development.

Have the Mainstream Media Jumped the Shark?

There was a recent article in Slate that asked why the mainstream media was having such a tough time figuring out the message of the Occupy Wall Street protestors. Now, I wouldn’t call myself a full-throated supporter of #OWS, but I do think that it’s pretty easy to understand why they’re PO’d. You know the drill: 14 million unemployed, crony capitalism, income inequality, and rising costs for just about everything, including health care and education.

So what’s the great mystery? Things are bad … and they haven’t been good for a while. People are concerned about the future and they’re upset because the country’s leaders are so busy fighting each other that they aren’t even trying to find a comprehensive solution. I can only assume that it is the very complexity of the issues that is causing so much angst among the pundits and political commentators.

The mainstream media thrives on simple solutions. It has no idea whatsoever of how to report on a story that isn’t about easy fixes so much as it is about anguished human frustration and fear. The media prides itself on its ability to tell you how to clear your clutter, regrout your shower, or purge your closet of anything that makes you look fat—in 24 minutes or less. It is bound to be flummoxed by a protest that offers up no happy endings.

People on the right side of the political spectrum have never been happy with the liberal bias they perceive in the mainstream media. If the political left is also starting to tune out these news outlets because of their inability to explore and explain serious issues, how long is it before these sources are abandoned in favor of something more thoughtful and informative?