I’m Not Quitting, I’m Failing Fast

Well, that didn’t last long. After only seven weeks of coursework in my Data Science program, I have decided that it’s not for me. I can rationalize all I want, but it basically boils down to the fact that I value my free time much more than I realized.

This really came to light during the six-part midterm, which took me over twelve hours to complete. Although it is somewhat comforting to know that I was not alone (over 50% of the class fell into this category), it was still a wake-up call. Did I want to spend the next few years of my life doing homework every weekend?

It was a tough decision. The good thing is that I get my evenings and weekends back for more personal and family activities.

I did get some interesting data out of it. See if you can tell when I hit the proverbial wall:

Donald Trump and the Truth Bubble, Part 1 – The Misinformed

“Wherever the people are well-informed they can be trusted with their own government.” — Thomas Jefferson to Richard Price, 1789

Thomas Jefferson’s support of a free press and education for the common people — including entry to the highest levels of instruction (i.e. a college or university) — was based on the belief that a knowledgeable, well-educated citizenry was necessary for the preservation of democracy. One of his greatest fears was that the people would cede power to the government through sheer ignorance and lack of understanding.

So what happens when the process of educating and informing American citizens starts to break down? Can the checks and balances put in place by the Founding Fathers hold up in the face of a full-blown idiocratic meltdown?

These types of questions were pretty far from my mind last summer when I started tracking the progress of Republican and Democratic presidential candidates using data from Politifact. I thought it would be an interesting exercise to see if there was a way to use this data to better understand the path to a successful nomination.

As a refresher, the 2016 presidential campaign officially began in March of 2015 when Ted Cruz announced his intention to run for office. He was eventually joined by sixteen other Republicans, six Democrats and a miscellany of Libertarians, Socialists and Green Party candidates – the largest presidential primary field in American history. At the time, Hillary Clinton was considered the leading candidate for the Democratic nomination but there was no clear frontrunner on the Republican side. Pundits couldn’t decide if the large Republican field reflected that party’s depth of talent or its lack of cohesiveness.

With the participants off and running, I was interested in seeing whether or not a politician’s truthfulness would be reflected in their strength as a candidate or whether there were other factors involved. My first analysis consisted of looking at Politifact’s “truth-o-meter” and seeing if I could tease out any meaningful differences between candidates. The following chart shows each candidate’s average rating (where 5 = True, 4 = Mostly True, 3 = Half True, 2 = Mostly False, 1 = False, 0 = “Pants on Fire”) and their “skewness,” which was my attempt at capturing the asymmetry of their ratings (leaning more true or leaning more false). Size represents the number of times Politifact checked statements by each candidate. I included a few outside sources of information (Facebook, emails, blogs) and politicians (Biden, Obama) for reference. (Responses as of September 18, 2015.)
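For readers who want to reproduce numbers like these, here is a minimal sketch of the two summary statistics. I never spelled out my exact skewness formula, so this assumes the standard (population, Fisher-Pearson) moment coefficient applied to per-statement ratings on the 0-5 scale above; the function name `rating_stats` is hypothetical, and the counts you feed it would come from a candidate’s Politifact page.

```python
# Truth-o-meter labels mapped to the 0-5 scale used in the chart.
SCALE = {
    "True": 5,
    "Mostly True": 4,
    "Half True": 3,
    "Mostly False": 2,
    "False": 1,
    "Pants on Fire": 0,
}


def rating_stats(counts):
    """Return (n, mean, skewness) for a dict of label -> number of rulings.

    Skewness here is the population moment coefficient m3 / m2**1.5
    (an assumption, not necessarily the formula used for the chart).
    Negative values mean ratings bunch at the "true" end with a tail
    toward falsehood; positive values mean the reverse.
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no rulings to summarize")
    mean = sum(SCALE[label] * c for label, c in counts.items()) / n
    # Second and third central moments, weighted by ruling counts.
    m2 = sum(c * (SCALE[label] - mean) ** 2 for label, c in counts.items()) / n
    m3 = sum(c * (SCALE[label] - mean) ** 3 for label, c in counts.items()) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return n, mean, skew
```

The count `n` doubles as the marker size, so one function supplies all three chart dimensions.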


Here’s the same chart (same time period) with just the two current candidates:


I had two takeaways from this exercise. The first was that candidates like Donald Trump and Ben Carson had about as much credibility as a chain letter from your cranky uncle. The second was that, based purely on my evaluations of truthfulness and believability, the most likely pick for the Republican nomination was probably going to be someone like Marco Rubio or Jeb Bush: two Republican candidates with relatively high levels of positive (true) Politifact ratings.

As we now know, pretty much everybody got this wrong. How did this happen? Throughout the campaign, commentator after commentator expressed concern about Trump’s rather dubious relationship with the truth. New York Times columnist David Brooks stated that Trump “is perhaps the most dishonest person to run for high office in our lifetimes,” while political writer David Frum said that Trump’s mendacity is “qualitatively different than anything before seen from a major-party nominee.” Politifact awarded Trump’s campaign misstatements its 2015 Lie of the Year and, in May of this year, Politico reporters analyzed a full week of his speeches and found that the orange one made nearly one false or misleading statement every five minutes.

Summarized from the Atlantic:

“PolitiFact recently calculated that only 2 percent of the claims made by Trump are true, 7 percent are mostly true, 15 percent are half true, 15 percent are mostly false, 42 percent are false, and 18 percent are “pants on fire.” Adding up the last three numbers (from mostly false to flagrantly so), Trump scores 75 percent. The corresponding figures for Ted Cruz, John Kasich, Bernie Sanders, and Hillary Clinton, respectively, are 66, 32, 31, and 29 percent.”
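The Atlantic’s arithmetic is easy to check, and the same percentages can be placed on the 0-5 truth-o-meter scale I used for the charts. A small sketch, assuming that scale mapping; note that the six quoted percentages add up to 99 rather than 100 (rounding), so the average divides by their actual total. The helper names are hypothetical.

```python
# Trump's Politifact breakdown as quoted from The Atlantic.
trump = {
    "True": 2, "Mostly True": 7, "Half True": 15,
    "Mostly False": 15, "False": 42, "Pants on Fire": 18,
}

# Assumed mapping onto the 0-5 scale (5 = True ... 0 = Pants on Fire).
SCALE = {"True": 5, "Mostly True": 4, "Half True": 3,
         "Mostly False": 2, "False": 1, "Pants on Fire": 0}


def false_share(pcts):
    """Add up the three falsehood categories, as The Atlantic did."""
    return pcts["Mostly False"] + pcts["False"] + pcts["Pants on Fire"]


def weighted_mean(pcts):
    """Average rating on the 0-5 scale, normalized by the actual total
    so that rounding in the published percentages does not bias it."""
    total = sum(pcts.values())
    return sum(SCALE[label] * p for label, p in pcts.items()) / total


print(false_share(trump))              # 75
print(round(weighted_mean(trump), 2))  # 1.57 (that is, 155 / 99)
```

An average near 1.6 sits between “False” and “Mostly False,” which is consistent with the 75 percent figure.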

Far from hurting him in the polls, however, Trump’s dishonesty is viewed as a positive feature by his supporters. An NBC, Telemundo, and Marist College poll taken last December suggests that more than seven in ten Republicans believe Trump “tells it like it is.” Since “telling it like it is” seems to be synonymous with lying out of one’s ass, many have speculated that the underlying cause of Trump’s success springs from his ability to give voice to the concerns of the typical conservative voter.


Or perhaps there is just a large swath of the American electorate who can no longer tell the difference between fact and fairytale.

Among Trump supporters …

Of course, the left has its own set of conspiracy theories, and Americans’ penchant for kooky ideas doesn’t seem to conform to any political boundaries. However, the statements above continue to be voiced by the candidate himself, and that is unusual.

I get it. People are angry and frustrated, and Trump gives them a voice. But instead of talking or compromising with their fellow citizens, they are willing to throw bombs in the hope that the country that rises out of the rubble is more suited to their tastes. Is that really what Jefferson and the Founding Fathers wanted for their republic? Mob rule?

Certainly, some people would say it is. In a letter to William Stephens Smith after Shays’ Rebellion in the 1780s, Jefferson famously stated that “the tree of liberty must be refreshed from time to time with the blood of patriots and tyrants.” Many people on the far right like to toss out this quote whenever they feel that American society needs a kick in the pants.

However, it should be noted that this particular quote is frequently taken out of context. Earlier in that same letter, Jefferson wrote that:

“the people cannot be all, and always, well informed. The part [of the population] which is wrong will be discontented in proportion to the importance of the facts they misconceive.”

Notice Jefferson’s use of the phrase “well-informed” in this situation (also used in the opening quote for this article). He is describing something that is absent from the discontented participants of the rebellion. They are not well-informed, and their misconceptions have fueled their anger, leading them down the path to revolt. He’s not surprised by the fighting because a free people are passionate about their liberty and will fight to maintain it (thus the “blood of patriots” line). But he is also saying that their actions spring from a place of ignorance.

This sounds eerily familiar to our current situation. Ignorance is no longer seen as a negative, but a sincere sign of authenticity. Many of Trump’s most ardent supporters seem unable to process basic information, preferring conspiracy theories and “satisfying stories” to expertise and careful deliberation. No wonder people are angry … they are both blind and deaf to the truth.

Roger Cohen of the New York Times puts it succinctly:

“A know-nothing tide is upon us. Tribal politics, anchored in tribal media, has made knowing nothing a badge of honor. Ignorance, loudly declaimed, is an attribute, especially if allied to celebrity. Facts are dispensable baggage. To display knowledge, the acquisition of which takes time, is tantamount to showing too much respect for the opposition tribe, who know nothing anyway.”

It would seem a simple thing to set these people straight. In fact, if we look at the full paragraph containing Jefferson’s “blood of patriots and tyrants” quote, we see that he plainly outlines a solution (“The remedy is to set them right as to facts, pardon and pacify them”):

“The people can not be all, and always, well informed. The part which is wrong will be discontented in proportion to the importance of the facts they misconceive. If they remain quiet under such misconceptions it is a lethargy, the forerunner of death to the public liberty … What country ever existed a century and a half without a rebellion? And what country can preserve it’s liberties if their rulers are not warned from time to time that their people preserve the spirit of resistance? Let them take arms. The remedy is to set them right as to facts, pardon and pacify them. What signify a few lives lost in a century or two? The tree of liberty must be refreshed from time to time with the blood of patriots and tyrants. It is it’s natural manure.”

Jefferson obviously felt that having a well-informed citizenry (via education and the free press) would eliminate or at least reduce the majority of these types of conflicts. But the path to enlightenment isn’t always so easy. A 2000 study by political scientists at the University of Illinois at Urbana-Champaign found that citizens who lack accurate information can be divided into two groups: the misinformed and the uninformed.

“The difference between the two is stark. Uninformed citizens don’t have any information at all, while those who are misinformed have information that conflicts with the best evidence and expert opinion … the most misinformed citizens tend to be the most confident in their views and are also the strongest partisans. These folks fill the gaps in their knowledge base by using their existing belief systems. Once these inferences are stored into memory, they become ‘indistinguishable from hard data.'”

In other words, you can’t simply “set them right as to facts” because they already have fake facts embedded in their heads. To make matters worse, another study found that attempts to correct people’s misconceptions often caused them to hold on to their opinions more tightly. This defensive processing (the “backfire effect”) allows politicians like Trump to fill people’s heads with nonsense while keeping them fully engaged and politically active. He is their friend and savior … the only person willing to tell them the truth.

Writing in FiveThirtyEight, Anne Pluta breaks down the incentive to deceive people:

“For most politicians, it doesn’t make sense to use precious resources to try to move or dissuade people from their incorrect positions — especially if this misinformation supports the political actor’s policy positions or legislative goals (as it does in Trump’s case).”

So if some politicians are actively working against the establishment of a well-informed citizenry, how can we apply Jefferson’s remedy? We will explore this remedy, and why it is struggling during this election, in the next sections.

Part 1 – The Misinformed
Part 2 – The Captive Press
Part 3 – The Politicization of Education
Part 4 – The Information Virus


Discovering New Opportunities for Urban Design in American Cities

“What a city has to say must find expression in its architecture.” Walter Wallmann, Lord Mayor of Frankfurt/Main

Here in the U.S., we tend to think of our built environment (our cities and towns) as mostly finished. Sure, we might tear down a few old buildings or add a few new subdivisions around the margins, but, in general, the urban fabric is completely “baked” and there are no more opportunities for growth or expansion. This attitude is particularly true when it comes to urban design, which attracts little attention from the average citizen and perhaps even less from the average politician. This is too bad because I think the best days of American architecture and urban planning are ahead of us.

In many ways our country is actually still in pioneering mode. Most of our buildings are strictly functional, made of simple materials and designed to last the length of the loan used to pay for their construction. We build fast and haphazardly. The foundations we dig rarely sit on anything other than virgin soil. We pave over what is essentially prime agricultural land. Our buildings are often the first structures on a particular site in the entire history of mankind. Contrast this with Europe, where relatively new buildings might share space with Roman ruins and parking lot renovations turn into archeological digs. This constant confrontation with history requires a different mindset and leads to different approaches to design.

A recent visit to Scandinavia brought these differences into sharp relief for me. I was walking down a pedestrian street in Stockholm, Sweden, when I noticed a pair of concrete barricades blocking cars from entering the area. These weren’t your typical Jersey barriers, however, because they were cleverly sculpted to look like lions: functional pieces of art incorporated into the streetscape instead of utilitarian eyesores. The Stockholm “lejon” barriers were created by artist Anders Årfelt in 1995 as a way to guard pedestrians from traffic while encouraging human interaction. They have since become popular local landmarks.

Lion Equation

What I liked about this solution is that it elegantly combines modern safety features with Swedish history (lions were added to the Swedish coat of arms in the 15th century) and a tradition of stone guardians that can be traced back to the Han Dynasty. Many of the castles we toured during our vacation had stone or marble lions (AKA “foo dogs”) at the gates and so these modern traffic barriers help enhance a strong local lineage.

Although the U.S. has historically made extensive use of similar icons, their popularity has yet to translate into more whimsical — yet practical — uses like the Stockholm lion barriers. This is because the current American approach to public design is limited by a strong cultural reluctance to invest in the community. To put it bluntly, no-frills design is considered a better use of taxpayer money. In these austere times, local governments can barely scrape together enough cash to collect the garbage, let alone pay artists to design something that taxpayers feel is unnecessary.

This line of thinking is unfortunate because there’s a lot of value in making cities more attractive to both residents and tourists. A recent study by the Knight Foundation found that physical beauty and opportunities for socializing helped strengthen the emotional bonds people had with their communities and that higher levels of community attachment were associated with increased GDP and stronger economies. This link between aesthetics and economics was made explicit in the Stockholm example when my wife and I discovered a small shop inside the main train station that sold miniature replicas of the concrete lion barriers as paperweights. I paid a few hundred Swedish kronor for the memento … money that circulated directly back into the local economy solely because somebody thought it would be a cool idea to make something special. Try doing that with a Jersey barrier!

Two Lions

Anatomy of an Analysis (Part 2) – The Enrichening

In the first part of this analysis, I turned a short list of movies into a database that could be used to answer basic questions about the list’s contents. Now I’d like to broaden this analysis by combining the original list with additional outside information — a process called data enrichment.

First, I needed to find and process a new set of data. In this case, I chose a list of the Best Movies of All Time compiled by the popular film review aggregator Rotten Tomatoes, because I thought it might include movies that were more popular with a general audience. The RT list ranks movies by their adjusted Tomatometer rating (as of mid-August 2015) and pulls out the top 100. I copied this list over to a spreadsheet and created fields for Rank, Film, Year, and Decade.

Once this information was ready, I used the name of the movie itself to join the RT list to the original BBC list. This approach, while perfectly reasonable, does come with a certain level of risk because the two sources do not always match perfectly. When that happens you have to match the information by hand. Can you spot the problems associated with each pair of names below?

Best Movies (RT) | Greatest American Films (BBC)
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb | Dr Strangelove
E.T. the Extra-Terrestrial | ET: The Extra-Terrestrial
It’s a Wonderful Life | It’s a Wonderful Life
One Flew Over the Cuckoo’s Nest | One Flew Over the Cuckoo’s Nest
Schindler’s List | Schindler’s List
The Godfather Part II | The Godfather Part II

The first mismatch is pretty obvious because Rotten Tomatoes includes the full subtitle of Dr. Strangelove in the title while the BBC does not. However, there are also subtle differences in punctuation (such as the period after the abbreviation of “doctor” in the first column) that would still cause problems during a join. These punctuation issues show up more clearly in the second pair, which differs in both the abbreviation of “E.T.” and the inclusion of a colon in the BBC version of the title. It gets more subtle from there!

The next three movie titles all contain a contraction or a possessive noun, but one source uses an apostrophe while the other uses a single quotation mark. (To make this problem even harder to spot, some web browsers render them both the same. Check the page source.) Finally, the last pair looks identical … except that the first listing of The Godfather Part II includes a trailing space. Pretty esoteric, I know, but that is life in the data world.
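I did this matching in a spreadsheet, but the same cleanup can be automated. Here is a minimal sketch of one approach (not the exact procedure I used): build a normalized join key that lowercases, strips punctuation (which also removes both apostrophe styles), and collapses stray whitespace, plus a hypothetical override table for titles like Dr. Strangelove that differ by more than punctuation. The `Film` field name matches the spreadsheet column; `normalize_title`, `OVERRIDES`, and `join_on_title` are illustrative names.

```python
import re
import unicodedata

# Hypothetical manual overrides for titles that differ in more than
# punctuation (here, a subtitle that only one source includes).
OVERRIDES = {
    "dr strangelove or how i learned to stop worrying and love the bomb":
        "dr strangelove",
}


def normalize_title(title):
    """Return a join key that ignores case, punctuation, quote style,
    and leading, trailing, or repeated whitespace."""
    text = unicodedata.normalize("NFKC", title).lower()
    # Drop anything that is not a word character or whitespace; this
    # removes periods, colons, hyphens, and both ' and the curly quote.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse whitespace runs and trim trailing spaces.
    text = " ".join(text.split())
    return OVERRIDES.get(text, text)


def join_on_title(left, right):
    """Inner-join two lists of dicts on their normalized 'Film' field,
    returning (left_row, right_row) pairs."""
    index = {normalize_title(row["Film"]): row for row in right}
    pairs = []
    for row in left:
        match = index.get(normalize_title(row["Film"]))
        if match is not None:
            pairs.append((row, match))
    return pairs
```

With this key, “E.T. the Extra-Terrestrial” and “ET: The Extra-Terrestrial” both become “et the extraterrestrial,” and the trailing space after The Godfather Part II disappears. Any title that still fails to match is a candidate for the override table.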

With the two data sources aligned, I then created my final enhanced database and explored the information using another pivot table.

The first thing I noticed when I compared the BBC list to the Rotten Tomatoes list is that they had only 22 films in common. This surprised me a little at first, but it makes sense when you realize that the RT list is not limited to American films. It also seemed to support my initial instinct that the RT database would contain many more recent films due to its online format.

TOP DECADES IN FILM (BBC vs. Rotten Tomatoes)

A quick look at the films by source and decade (above) shows a huge number of recent films in the RT listing (including one, Mad Max: Fury Road, that was still in theaters when I first downloaded the data). It is also interesting to note that the spike in “best” movies for Rotten Tomatoes occurs in the 1950s instead of the 1970s. The large number of foreign films in the RT list for the 1950s quickly leads to a discussion of Japan’s “Golden Age” of cinema during that period.
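The decade chart is a two-way count: source by decade. Outside a pivot table, a minimal sketch looks like this, assuming each film record carries hypothetical `Source` ("BBC" or "RT") and `Year` fields and that a decade is the year floored to a multiple of ten.

```python
from collections import Counter


def decade_of(year):
    """Map a release year to a decade label, e.g. 1975 -> '1970s'."""
    return f"{year // 10 * 10}s"


def films_by_source_and_decade(films):
    """Count films for each (source, decade) pair."""
    return Counter((f["Source"], decade_of(f["Year"])) for f in films)


def crosstab(counts, sources):
    """Turn the counts into rows of (decade, count per source), one row
    per decade in chronological order; missing pairs count as zero."""
    decades = sorted({decade for _, decade in counts})
    return [(d, *(counts[(s, d)] for s in sources)) for d in decades]
```

Each row of `crosstab(counts, ["BBC", "RT"])` corresponds to one pair of bars in the chart. Sorting the decade labels as strings works here because all years have four digits.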


Another interesting view of this information can be seen when you compare the two ranked lists side-by-side. The chart above shows the 22 films that appear on both lists with a line connecting their two ranks. This makes it easy to see where the sources agree and where they disagree. Several of the critical darlings (Citizen Kane, The Godfather, Singin’ in the Rain, and North by Northwest) also rank high on the RT list while others (many of them from the American New Wave period of the 1970s) show a drop in popularity. Meanwhile, other classically popular films like The Wizard of Oz and ET: The Extra-Terrestrial float upward.
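Under the hood, the slope chart pairs each shared film’s two ranks. A small sketch, assuming each list has already been reduced to a dict from normalized title to rank (the function name is hypothetical). A positive change means the film sits higher, meaning it has a smaller rank number, on the Rotten Tomatoes list than on the BBC list.

```python
def rank_changes(bbc_ranks, rt_ranks):
    """Return (title, bbc_rank, rt_rank, change) for every film on both
    lists, ordered by BBC rank. change = bbc_rank - rt_rank, so positive
    values are films that "float upward" on the RT list."""
    shared = bbc_ranks.keys() & rt_ranks.keys()
    rows = [
        (title, bbc_ranks[title], rt_ranks[title],
         bbc_ranks[title] - rt_ranks[title])
        for title in shared
    ]
    return sorted(rows, key=lambda row: row[1])
```

`len(rank_changes(...))` gives the number of shared films (22 in this comparison), and sorting the rows by `change` surfaces the biggest risers and fallers.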

Anatomy of an Analysis (Part 1)

A few weeks ago, BBC News published a list of the 100 greatest American films based on input from critics around the world.

Here are the top ten films presented in rank order:

  1. Citizen Kane (Orson Welles, 1941)
  2. The Godfather (Francis Ford Coppola, 1972)
  3. Vertigo (Alfred Hitchcock, 1958)
  4. 2001: A Space Odyssey (Stanley Kubrick, 1968)
  5. The Searchers (John Ford, 1956)
  6. Sunrise (FW Murnau, 1927)
  7. Singin’ in the Rain (Stanley Donen and Gene Kelly, 1952)
  8. Psycho (Alfred Hitchcock, 1960)
  9. Casablanca (Michael Curtiz, 1942)
  10. The Godfather Part II (Francis Ford Coppola, 1974)

There is really nothing too surprising here. Perennial favorite Citizen Kane tops the list followed by The Godfather and Vertigo, two of the most famous films (by two of the most famous directors) ever produced. Perusing the full list, you might recognize a few other titles and maybe think about adding some of them to your Netflix queue. But that’s about it. Aside from a handful of ancillary stories, there was little additional commentary to draw readers deeper into the data. Sensing an opportunity, I decided to use this list to demonstrate the steps involved in a quick and simple analysis of data found “in the wild.”

Here follows a demonstration of my 5-step program for data analysis:


The BBC asked each critic to submit a list of the ten films they felt were the greatest in American cinema (“… not necessarily the most important, just the best.”). For the project, an “American film” was defined as any movie that received funding from a U.S. source. This criterion included many films by foreign directors as well as films shot outside of the country. The top film on each critic’s list received ten points, the next film down received nine points, and so on, down to one point for the tenth pick. All the points were then tallied to produce the final list.
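This scoring rule is a simple positional (Borda-style) count. A sketch, assuming each critic’s ballot is an ordered list of up to ten titles. The BBC didn’t describe how it broke ties, so ties are broken alphabetically here purely as an assumption; `tally` is a hypothetical name.

```python
from collections import defaultdict


def tally(ballots, max_points=10):
    """Award max_points to each ballot's first pick, one fewer to the
    next pick, down to 1 point for the tenth; picks beyond the tenth are
    ignored. Returns (title, points) pairs sorted by points descending,
    then alphabetically (an assumed tie-breaker)."""
    scores = defaultdict(int)
    for ballot in ballots:
        for position, title in enumerate(ballot[:max_points]):
            scores[title] += max_points - position
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For three hypothetical ballots, `[["A", "B"], ["B", "A"], ["B"]]`, film B scores 9 + 10 + 10 = 29 and film A scores 10 + 9 = 19. A film named on many ballots can therefore beat one that tops only a few.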


Even though the resulting “listicle” is fairly simple, it contains a lot of interesting information just waiting to be freed from its icy confines. I pulled the list into Excel and used some simple string (text) manipulation to create four basic fields from each row of information:

Additional manipulation of the “Year” field yields a useful grouping category:

With the creation of these five fields, I now have a flexible database instead of a rigid list.
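I did the string manipulation in Excel, but the same split can be sketched in Python with a regular expression. This assumes every row follows the “Rank. Title (Director, Year)” pattern of the top-ten list above. The field names Rank, Film, Director, Year, and Decade are my illustrative guess at the five fields, and `parse_row` is a hypothetical name.

```python
import re

# "7. Singin' in the Rain (Stanley Donen and Gene Kelly, 1952)"
# rank . title ( director , four-digit year )
ROW = re.compile(r"^\s*(\d+)\.\s+(.+?)\s+\((.+),\s*(\d{4})\)\s*$")


def parse_row(line):
    """Split one list row into a record with five fields."""
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    rank, film, director, year = match.groups()
    year = int(year)
    return {
        "Rank": int(rank),
        "Film": film,
        "Director": director,
        "Year": year,
        # Derived grouping field: the year floored to its decade.
        "Decade": f"{year // 10 * 10}s",
    }
```

The title group is non-greedy, so it stops at the first “ (”, and titles with colons such as 2001: A Space Odyssey come through intact. A title that itself contained parentheses would need a different pattern.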


The final data set was “stored” as a table in a simple spreadsheet. Although I have many reservations about using Excel for data storage (more on that in a future post), it is a quick and easy way to organize small sets of data.


Once the data was in the format I wanted, I created a pivot table that allowed me to manipulate information in different ways. I was particularly interested in answering questions like “Who are the top directors?”, “When were most of these films made?”, and “Was there ever a ‘Golden Age’ of modern cinema?” Most of these questions can be answered through simple grouping and summarization.
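Each of these pivot-table questions reduces to grouping and counting. A minimal sketch with `collections.Counter`, assuming records with hypothetical `Director` and `Year` keys like the ones parsed earlier; `top_values` and `share_by_decade` are illustrative names.

```python
from collections import Counter


def top_values(rows, field, n=10):
    """Count how many rows share each value of `field` and return the
    n most common (value, count) pairs, like a one-field pivot table.
    Ties keep the order in which values were first seen."""
    return Counter(row[field] for row in rows).most_common(n)


def share_by_decade(rows):
    """Fraction of rows in each decade (keyed by the decade's first
    year, e.g. 1970), derived from the 'Year' field."""
    counts = Counter(row["Year"] // 10 * 10 for row in rows)
    total = sum(counts.values())
    return {decade: count / total for decade, count in sorted(counts.items())}
```

`top_values(rows, "Director")` answers “Who are the top directors?”, `top_values(rows, "Year")` answers “When were most of these films made?”, and `share_by_decade` is one way to check a claim like the 1970s share of the list.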


After all that work, it’s time to pull together the results and display them in some way. For this exercise, that means a few simple tables and charts:


Director | # of Films in the Top 100
Stanley Kubrick | 5
Steven Spielberg | 5
Alfred Hitchcock | 5
Billy Wilder | 5
Francis Ford Coppola | 4
Howard Hawks | 4
Martin Scorsese | 4
John Ford | 3
Orson Welles | 3
Charlie Chaplin | 3


Year | # of Films in the Top 100
1975 | 5
1980 | 4
1974 | 4
1959 | 4
1939 | 3
1941 | 3
1977 | 3
1946 | 3
1994 | 3


These simple presentation tools start to tell some interesting stories and, like all good analysis tools, start to hint at additional avenues of exploration. For example, while two of the directors with five films in the Top 100 (Kubrick, Hitchcock) also made it into the Top 10, the other two (Spielberg and Wilder) did not … why? The year with the most films on the list was 1975 … what were the films? The 1970s account for over 20% of the films on the list … what was going on in the culture that led to this flowering of expression?

It would have been really great if the BBC article had included some sort of interactive tool that allowed readers to explore the database themselves. I will see what I can do to tackle this in an upcoming post.

Jon Stewart on Misinformation

I just finished watching Jon Stewart’s final episode of The Daily Show, and I was glad to see that his parting speech addressed the topic of misinformation (aka bullshit) and how to recognize it. The following is a rough transcript:

TRANSCRIPT: Jon Stewart delivers speech on “Bulls–t” during his final episode hosting “The Daily Show” Wednesday night on Comedy Central

Welcome back! Anyway, about the debate. I don’t have anything for you.

We’ve seen the correspondents. We’ve met everyone who works here. And now I feel like I should probably say something. So maybe one last time, maybe a little — if you want to — maybe a little camera three.

Bullshit is everywhere.

Are the kids still here? We’ll deal with that later.

Bullshit is everywhere. There is very little you will encounter in life that has not been, in some ways, infused with bullshit, not all of it bad. General day-to-day free range is often necessary, or at least innocuous: “Oh, what a beautiful baby. I’m sure he’ll grow into that head.” That kind of bullshit in many ways provides important social-contract fertilizer and keeps people from making each other cry all day. But then there’s the more pernicious bullshit, your premeditated institutional bullshit designed to obscure and distract. Designed by whom? The bullshitacracy.

It comes in three basic flavors.

One, making bad things sound like good things. “Organic, all-natural cupcakes” … because factory made sugar oatmeal balls doesn’t sell. “Patriot Act” … because “Are You Scared Enough to Let Me Look at All Your Phone Records Act” doesn’t sell. Whenever something is titled freedom, fairness, family, health, and America, take a good long sniff. Chances are it’s been manufactured in a facility that may contain traces of bullshit.

Number two, the second way, hiding the bad things under mountains of bullshit. Complexity. You know, I would love to download Drizzy’s latest Meek Mill diss. (Everyone promised me that that made sense.) But I’m not really interested right now in reading Tolstoy’s iTunes agreement, so I’ll just click “agree” even if it grants Apple prima noctae with my spouse. Here’s another one: simply put, banks shouldn’t be able to bet your pension money on red. Bullshitly put, it’s … hey, this. Dodd-Frank. Hey, a handful of billionaires can’t buy our elections, right? Of course not. They can only pour unlimited anonymous cash into a 501(c)(4); otherwise they’d have to 501(c)(6) it, or funnel it openly through a non-campaign-coordinated Super PAC … “I think they’re asleep now. We can sneak out.”

And finally — finally, it’s the Bullshit of infinite possibility. These bullshitters cover their unwillingness to act under the guise of unending inquiry. We can’t do anything because we don’t yet know everything. We cannot take action on climate change until everyone in the world agrees gay marriage vaccines won’t cause our children to marry goats who are going to come for our guns. Until then, I say teach the controversy.

Now, the good news is this: bullshitters have gotten pretty lazy, and their work is easily detected. And looking for it is a pleasant way to pass the time, like an “I Spy” of bullshit. I say to you tonight, friends, the best defense against bullshit is vigilance. So if you smell something, say something.

Thanks for everything, Mr. Stewart. We couldn’t have made it through these last 16 years without you.