I was filling out the annual report forms for Ideas Illustrated LLC a few weeks ago and noticed that my original filing date was May 11, 2004… making today my 10th anniversary! It’s hard to believe that a full decade has passed since my wife and I sat around brainstorming ideas for a company. It’s been a fun ride so far, with several great side projects, a well-regarded blog, and a lot of new challenges. It hasn’t made me a millionaire but it has put some extra cash in my pocket and probably saved my sanity on more than one occasion. Here’s to ten more years!
“There are two ways to be fooled. One is to believe what isn’t true; the other is to refuse to believe what is true.” ~ Søren Kierkegaard
Back in 2010, I wrote a short post about some of the problems associated with getting all of your news and information from biased sources. It was essentially a call for people to hone their critical thinking skills and take steps toward establishing a more reality-based approach to decision-making.
Unfortunately, people don’t like challenging their existing beliefs very much because it can be pretty uncomfortable. They prefer sources of information that support their established worldviews and generally ignore or filter out those that don’t. In our modern society, this confirmation bias supports an entire ecosystem of publishers, news outlets, TV shows, bloggers, and radio announcers designed to serve up pre-filtered opinion disguised as fact.
For many people, the glossy veneer of the news entertainment complex is all they want or need. As David McRaney so succinctly states in his blog:
Whether or not pundits are telling the truth, or vetting their opinions, or thoroughly researching their topics is all beside the point. You watch them not for information, but for confirmation.
The problem with this approach is that — every now and then — fantasy runs into cold, hard reality and gets the sh*t kicked out of it.
This was what happened during the 2012 Presidential election cycle. Talking heads on both ends of the political spectrum had spent months trying to sway their audiences with confident declarations of victory and vicious denials of opposing statements. By the week of the election, the conservative media in particular had created such a self-reinforcing bubble of polls and opinions that any hints of trouble were shouted down and ignored. Pundits reserved particularly strong venom for statistician Nate Silver, whose FiveThirtyEight blog in the New York Times had upped the chances of an Obama win to a seemingly outrageous 91.4% the Monday before the election.
The furor reached its peak with Karl Rove’s famous on-air exchange with Fox News anchor Megyn Kelly and rippled through the conservative echo chamber after the polls closed. There was a lot of soul searching over the next few days, with many people taking direct aim at the conservative media for its failure to present accurate information to its audience. This frustration was summed up clearly by one commenter on RedState, a right-leaning blog:
“I can accept that my news is not really ‘news’ like news in Cronkite’s day, but a conservative take on the news. But it’s unacceptable that Rasmussen appears to have distinguished themselves from everyone else in their quest to shade the numbers to appease us, the base. I didn’t even look at other polls, to tell the truth, trusting that their methodology was more sound because it jived with what I was hearing on Fox and with people I talked to. It pains me to say this, but next time I want a dose of hard truth, I’m looking to Nate Silver, even if I don’t like the results.”
It was a teachable moment and Nate Silver — no fan of pundits — suggested that the fatal flaw in the approach taken by most of these political “experts” was that they based their forecasts less on evidence and more on a strong underlying ideology. Their core beliefs — “ideological priors” as Silver calls them — colored their views on everything and made it difficult to read such an uncertain situation correctly. It was time for something new.
In his book, The Signal and the Noise, Silver elaborates on the work of Philip Tetlock, who found that people with certain character traits typically made more accurate predictions than those without these traits. Tetlock identified these two different cognitive styles as either “fox” (someone who considers many approaches to a problem) or “hedgehog” (someone who believes in one Big Idea). There has been much debate about which one represents the best approach to forecasting but Tetlock’s research clearly favors the fox.
Tetlock’s ideas as summarized by Silver:
| Fox-Like Characteristics | Hedgehog-Like Characteristics |
|---|---|
| Multidisciplinary – Incorporates ideas from a range of disciplines | Specialized – Often dedicated to one or two big problems and skeptical of outsiders |
| Adaptable – Tries several approaches in parallel, or finds a new one if things aren’t working | Unshakable – New data is used only to refine the original model |
| Self-critical – Willing to accept mistakes and adapt or even replace a model based on new data | Stubborn – Mistakes are blamed on bad luck |
| Tolerant of complexity – Accepts that the world is complex, and that certain things cannot be reduced to a null hypothesis | Order seeking – Once patterns are detected, assumes relationships are relatively uniform |
| Cautious – Predictions are probabilistic and qualified | Confident – Rarely changes or hedges a position |
| Empirical – Observable data is always preferred over theory or anecdote | Ideological – Approach to predictive problems fits within a similar view of the wider world |
| **Better Forecasters** | **Weaker Forecasters** |
Nate Silver also prefers the fox-like approach to analysis and even chose a fox logo for the relaunch of his FiveThirtyEight blog. Befitting a fox’s multidisciplinary approach to problems, his manifesto for the site involves blending good old-fashioned journalism skills with statistical analysis, computer programming, and data visualization. (It is essentially a combination of everything we’ve been saying about data science + data-literate reporting.)
This approach is very similar to the standard data science process.
- Data Collection – Performing interviews, research, first-person observation, polls, experiments, or data scraping.
- Organization – Developing a storyline, running descriptive statistics, placing data in a relational database, or building a data visualization.
- Explanation – Performing traditional analysis or running statistical tests to look for relationships in the data.
- Generalization – Verifying hypotheses through predictions or repeated experiments.
Like data science, data journalism involves finding meaningful insights in a vast sea of information. And like data science, one of the biggest challenges for data-driven journalism is convincing people to actually listen to what the data is telling them. After FiveThirtyEight posted its prediction of a possible change in control of the Senate in 2014, Democrats reacted with the same bluster Republicans displayed back in 2012. At about the same time, economist Paul Krugman started a feud with Silver over — in my view — relatively minor journalistic differences. Meanwhile, conservatives, gleeful at this apparent Leftie infighting, continue to predict Silver’s ultimate failure because they still believe that politics is more art than science.
This seems to be a fundamental misunderstanding of what Silver and others like him are trying to do. Rather than look at how successful Silver’s forecasting methodology has been at predicting political results, most people seem to be treating him as just another pundit who has joined the political game. Lost in all of the fuss is his attempt to bring a little more scientific rigor to an arena that is dominated by people who generally operate on intuition and gut instinct. I’m certainly not trying to elevate statisticians and data journalists to god-like status here but it is my hope that people will start to recognize the value of unbiased evaluation and include it as one of their tools for gathering information. When it’s fantasy vs. reality, it is always better to be armed with the facts.
- November 14, 2016 – Whew! Less than a week after the 2016 election and predictive analytics has taken a beating. The discussion of salvaging the reputations of Nate Silver and other political scientists is just beginning: https://www.theatlantic.com/education/archive/2016/11/is-political-science-another-election-casualty/507515/.
- March 31, 2014 – Just came across another good article on media bubbles: http://talkingpointsmemo.com/livewire/nate-silver-politico-co-founders-lack-curiosity-for-the-world-outside-of-the-bubble.
My family visited Washington D.C. last year for Spring Break and, during our 12-hour drive, I remember noticing a subtle change in the names of the cities and towns we were passing through. In the beginning, the place names had a familiar mid-western flavor; one that mixed Native American origins (e.g. Milwaukee, Chicago) with bits of French missionary and 19th-century European settler. The names slowly took on a more Anglo-Saxon bent as we moved east, traveling through spots like Wexford, PA, Pittsburgh, PA, Gaithersburg, MD, Boonsboro, MD, Hagerstown, MD, and Reston, VA.
We have English-sounding place names in Wisconsin, of course, including highfalutin towns like Brighton, Kingston, and New London, but they seem to get overwhelmed by the sheer number of places with syllables like “wau”, “kee”, and “sha” (or all three combined). Many of these town names can be difficult for “outsiders” to pronounce and the spelling is all over the place since they were often coined by non-native speakers who’d misheard the original words. (The Native American word for “firefly”, for example, is linked to variations like Wauwatosa (WI), Wawatasso (MN), and Wahwahtaysee Way (a street in MI).)
I thought it would be interesting to see if there were any patterns to these U.S. place names, or toponyms, so I pulled a list of Census Places and extracted the most frequent letter combinations from the names of the country’s cities, towns, and villages. I tried to isolate true prefixes and suffixes by removing any letter pairings that were simply common to the English language, and then I counted up the number of times each word root appeared and ranked them by state.
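The extraction step can be sketched in a few lines of Python. The place names, suffix length, and stop list below are illustrative stand-ins, not the actual data or filters used in the analysis:

```python
from collections import Counter

# A tiny, hypothetical sample of Census place names (the real analysis
# used the full Census Places list).
places = ["Milwaukee", "Wauwatosa", "Waukesha", "Appleton",
          "Charleston", "Jamestown", "Hagerstown", "Greensboro"]

# Letter pairings common to ordinary English carry no naming signal;
# a real stop list would be far longer than this placeholder.
COMMON_ENGLISH = {"ing", "ers", "ion"}

def suffix_counts(names, length=3):
    """Count trailing letter combinations of a given length."""
    counts = Counter()
    for name in names:
        cleaned = name.lower().replace(" ", "")
        if len(cleaned) > length and cleaned[-length:] not in COMMON_ENGLISH:
            counts[cleaned[-length:]] += 1
    return counts

counts = suffix_counts(places)
print(counts.most_common(3))  # "ton" and "own" lead this tiny sample
```

The same idea extends to prefixes by slicing from the front, and to variable-length roots by running the counter at several lengths.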
Top 10 Word Roots by State
After looking over the top word roots by state, I wanted to see more detail, so I calculated a location quotient for some of the most common word roots and plotted these out by county. Click on a map for a larger D3 version.
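For readers unfamiliar with the measure, a location quotient compares a region’s share of something to the national share; a value above 1 means the word root is over-represented locally. A minimal sketch with made-up counts:

```python
def location_quotient(local_with_root, local_total,
                      national_with_root, national_total):
    """Ratio of the local share of a word root to the national share.
    Values above 1.0 indicate local over-representation."""
    local_share = local_with_root / local_total
    national_share = national_with_root / national_total
    return local_share / national_share

# Hypothetical county: 5 of its 20 place names carry the root,
# versus 2,500 of 20,000 nationwide.
lq = location_quotient(5, 20, 2500, 20000)
print(lq)  # prints 2.0: the root is twice as common here as nationally
```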
Location Quotient for “ton”
The word town derives from the Germanic word for “fence” or “fenced settlement.” In the U.S., the use of -ton/-town to honor important landowners or political leaders began before the American Revolution (think Jamestown, VA or Charleston, SC) and continued throughout the settlement of the country. (Interestingly, my hometown of Appleton, WI was named for philanthropist Samuel Appleton and is not a true town-based word root.)
Location Quotient for “boro/borough”
The word borough originates from the Germanic word for “fort” and has many common variations, including suffixes like -borough/-boro, and -burgh/-burg. Like -ton/-town, these place name suffixes became popular in the 18th century and were used extensively throughout New England and the Atlantic coastal colonies. You can see how dominant the -boro/-borough suffix is in the upper Northeast.
Location Quotient for “ville”
The suffix “ville” comes from the French word for “farm” and is the basis for common words like “villa” and “village”. The use of the suffix -ville for the names of cities and towns in the U.S. didn’t really begin until after the Revolution, when pro-French sentiment spread throughout the country — particularly in the South and Western Appalachian regions. The popularity of this suffix began to decline in the middle of the 19th century but you can still see its strong influence in the southern states.
Location Quotient for “san/santa”
The Spanish colonial period in the Americas left a large legacy of Spanish place names, particularly in the American West and Southwest. Many of the Californian coastal cities were named after saints by early Spanish explorers, while other cities in New Spain simply included the definite article (“la”, “el”, “las”, and “los”) in what was often a very long description (e.g. “El Pueblo de Nuestra Señora la Reina de los Ángeles del Río de Porciúncula” … now known simply as Los Angeles or LA). The map shows the pattern for the San/Santa prefix, which is strong on the West Coast and weaker inland, where it may actually be an artifact of some Native American word roots.
Location Quotient for “Lake/Lakes”
The practice of associating a town with a nearby body of water puts a wrinkle into the process of tracking place names (the history of “hydronyms” being an entirely different area of study), but it was common in parts of the country that were mapped by explorers first and settled later. This can be seen in the prevalence of town names with word roots like Spring, Lake, Bay, River, and Creek.
Location Quotient for “wau”
Here is the word root that started this whole line of inquiry. It is apparently an iconic Wisconsin toponym; even some of the outlying place names have Wisconsin roots (the city of Milwaukie in Clackamas County, Oregon, was named after Milwaukee, Wisconsin in the 1840s).
Although the fields of statistics, data analysis, and computer programming have been around for decades, the use of the term “data science” to describe the intersection of these disciplines has only become popular within the last few years.
The rise of this new specialty — which the Data Science Association defines as “the scientific study of the creation, validation and transformation of data to create meaning” — has been accompanied by a number of heated debates, including discussions about its role in business, the validity of specific tools and techniques, and whether or not it should even be considered a science. For those convinced of its significance, however, the most important deliberations revolve around finding people with the right skills to do the job.
On one side of this debate there are those purists who insist that data scientists are nothing more than statisticians with fancy new job titles. These folks are concerned that people are trying to horn in on a rather lucrative gig without formal statistics training. Their solution is to simply ignore the data science buzzword and hire a proper statistician.
At the other end of the spectrum are people who are convinced that making sense out of large data sets requires more than just number-crunching skills; it also requires the ability to manipulate the data and communicate insights to others. This view is perhaps best represented by Drew Conway’s data science Venn diagram and Mike Driscoll’s blog post on the three “sexy skills” of the data scientist. In Conway’s case, the components are computer programming (hacking), math and statistics, and specific domain expertise. For Driscoll, the key areas are statistics, data transformation — what he calls “data munging” — and data visualization.
The main problem with this multi-pronged approach is that finding a single individual with all of the right skills is nearly impossible. One solution to this dilemma is to create teams of two or three people that can collectively cover all of the necessary areas of expertise. However, this only leads to the next question, which is: What roles provide the best coverage?
In order to address this question, I decided to start with a more detailed definition of the process of finding meaning in data. In his PhD dissertation and later publication, Visualizing Data, Ben Fry broke down the process of understanding data into seven basic steps:
- Acquire – Find or obtain the data.
- Parse – Provide some structure or meaning to the data (e.g. ordering it into categories).
- Filter – Remove extraneous data and focus on key data elements.
- Mine – Use statistical methods or data mining techniques to find patterns or place the data in a mathematical context.
- Represent – Decide how to display the data effectively.
- Refine – Make the basic data representations clearer and more visually engaging.
- Interact – Add methods for manipulating the data so users can explore the results.
These steps can be roughly grouped into four broad areas: computer science (acquire and parse); mathematics, statistics, and data mining (filter and mine); graphic design (represent and refine); and information visualization and human-computer interaction (interact).
In order to translate these skills into jobs, I started by selecting a set of occupations from the Occupational Information Network (O*NET) that I thought were strong in at least one or two of the areas in Ben Fry’s outline. I then evaluated a subset of skills and abilities for each of these occupations using the O*NET Content Model, which allows you to compare different jobs based on their key attributes and characteristics. I mapped several O*NET skills to each of Fry’s seven steps (details below).
Acquire (Computer Science)
- Learning Strategies – Selecting and using training/instructional methods and procedures appropriate for the situation when learning or teaching new things.
- Active Listening – Giving full attention to what other people are saying, taking time to understand the points being made, asking questions as appropriate, and not interrupting at inappropriate times.
- Written Comprehension – The ability to read and understand information and ideas presented in writing.
- Systems Evaluation – Identifying measures or indicators of system performance and the actions needed to improve or correct performance, relative to the goals of the system.
- Selective Attention – The ability to concentrate on a task over a period of time without being distracted.
- Memorization – The ability to remember information such as words, numbers, pictures, and procedures.
- Oral Comprehension – The ability to listen to and understand information and ideas presented through spoken words and sentences.
- Technology Design – Generating or adapting equipment and technology to serve user needs.
Parse (Computer Science)
- Reading Comprehension – Understanding written sentences and paragraphs in work related documents.
- Category Flexibility – The ability to generate or use different sets of rules for combining or grouping things in different ways.
- Troubleshooting – Determining causes of operating errors and deciding what to do about it.
- English Language – Knowledge of the structure and content of the English language including the meaning and spelling of words, rules of composition, and grammar.
- Programming – Writing computer programs for various purposes.
Filter (Mathematics, Statistics, and Data Mining)
- Flexibility of Closure – The ability to identify or detect a known pattern (a figure, object, word, or sound) that is hidden in other distracting material.
- Judgment and Decision Making – Considering the relative costs and benefits of potential actions to choose the most appropriate one.
- Critical Thinking – Using logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions or approaches to problems.
- Active Learning – Understanding the implications of new information for both current and future problem-solving and decision-making.
- Problem Sensitivity – The ability to tell when something is wrong or is likely to go wrong. It does not involve solving the problem, only recognizing there is a problem.
- Deductive Reasoning – The ability to apply general rules to specific problems to produce answers that make sense.
- Perceptual Speed – The ability to quickly and accurately compare similarities and differences among sets of letters, numbers, objects, pictures, or patterns. The things to be compared may be presented at the same time or one after the other. This ability also includes comparing a presented object with a remembered object.
Mine (Mathematics, Statistics, and Data Mining)
- Mathematical Reasoning – The ability to choose the right mathematical methods or formulas to solve a problem.
- Complex Problem Solving – Identifying complex problems and reviewing related information to develop and evaluate options and implement solutions.
- Mathematics – Using mathematics to solve problems.
- Inductive Reasoning – The ability to combine pieces of information to form general rules or conclusions (includes finding a relationship among seemingly unrelated events).
- Science – Using scientific rules and methods to solve problems.
- Mathematics – Knowledge of arithmetic, algebra, geometry, calculus, statistics, and their applications.
Represent (Graphic Design)
- Design – Knowledge of design techniques, tools, and principles involved in production of precision technical plans, blueprints, drawings, and models.
- Visualization – The ability to imagine how something will look after it is moved around or when its parts are moved or rearranged.
- Visual Color Discrimination – The ability to match or detect differences between colors, including shades of color and brightness.
- Speed of Closure – The ability to quickly make sense of, combine, and organize information into meaningful patterns.
Refine (Graphic Design)
- Fluency of Ideas – The ability to come up with a number of ideas about a topic (the number of ideas is important, not their quality, correctness, or creativity).
- Information Ordering – The ability to arrange things or actions in a certain order or pattern according to a specific rule or set of rules (e.g., patterns of numbers, letters, words, pictures, mathematical operations).
- Communications and Media – Knowledge of media production, communication, and dissemination techniques and methods. This includes alternative ways to inform and entertain via written, oral, and visual media.
- Originality – The ability to come up with unusual or clever ideas about a given topic or situation, or to develop creative ways to solve a problem.
Interact (Information Visualization and Human-Computer Interaction)
- Engineering and Technology – Knowledge of the practical application of engineering science and technology. This includes applying principles, techniques, procedures, and equipment to the design and production of various goods and services.
- Education and Training – Knowledge of principles and methods for curriculum and training design, teaching and instruction for individuals and groups, and the measurement of training effects.
- Operations Analysis – Analyzing needs and product requirements to create a design.
- Psychology – Knowledge of human behavior and performance; individual differences in ability, personality, and interests; learning and motivation; psychological research methods; and the assessment and treatment of behavioral and affective disorders.
Using occupational scores for these individual O*NET skills and abilities, I was able to assign a weighted value to each of Ben Fry’s categories for several sample occupations. Visualizing these skills in a radar graph shows how different jobs (identified using standard SOC or O*NET codes) place different emphasis on the various skills. The three jobs below have strengths that could be cultivated and combined to meet the needs of a data science team.
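The weighting step might look something like the following sketch. The skill-to-category mapping follows the lists above, but the occupation scores are invented for illustration and are not actual O*NET data:

```python
# Map a couple of Fry's categories to the O*NET skills listed above.
fry_skill_map = {
    "Parse": ["Reading Comprehension", "Programming", "Troubleshooting"],
    "Mine": ["Mathematical Reasoning", "Mathematics", "Inductive Reasoning"],
}

# Made-up 0-100 importance scores for a single hypothetical occupation.
occupation_scores = {
    "Reading Comprehension": 72, "Programming": 85, "Troubleshooting": 60,
    "Mathematical Reasoning": 90, "Mathematics": 88, "Inductive Reasoning": 80,
}

def category_scores(skill_map, scores):
    """Average an occupation's skill scores within each Fry category."""
    return {category: sum(scores[s] for s in skills) / len(skills)
            for category, skills in skill_map.items()}

result = category_scores(fry_skill_map, occupation_scores)
print(result)  # this hypothetical occupation is stronger on Mine than Parse
```

Plotting one such dictionary per occupation on a shared radar chart is what makes the complementary strengths of a prospective team visible at a glance.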
Another example includes occupations that fall outside of the usual sources of data science talent. You can see how — taken together — these non-traditional jobs can combine to address each of Fry’s steps.
According to a recent study by McKinsey, the U.S. “faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions” based on data. Instead of fighting over these scarce resources, companies would do well to think outside of the box and build their data science teams from unique individuals in other fields. While such teams may require additional training, they bring a set of skills to the table that can boost creativity and spark innovative thinking — just the sort of edge companies need when trying to pull meaning from their data.
- May 2, 2014 – The folks over at DarkHorse Analytics put together a list of the “five faces” of analytics. Great article.
- Data Steward – Manages the data and uses tools like SQL Server, MySQL, Oracle, and maybe some more rarefied tools.
- Analytic Explorer – Explores the data using math, statistics, and modeling.
- Information Artist – Organizes and presents data in order to sell the results of data exploration to decision-makers.
- Automator – Puts the work of the Explorer and Visualizer into production.
- The Champion – Helps put all of the pieces in place to support an analytics environment.
A former co-worker of mine always used to joke about our company’s customer database by posing the deceptively simple question: “How many ways can you spell ‘IBM’?” In fact, the number of unique entries for that particular client was in the dozens. Here is a sample of possible iterations, with abbreviations alone counting for several of them:
- I B M
- I. B. M.
- IBM CORP
- IBM CORPORATION
- INTL BUS MACHINES
- INTERNATION BUSINESS MACHINES
- INTERNATIONAL BUSINESS MACHINES
- INTERNATIONAL BUSINESS MA
I thought of this anecdote recently while I was reading an article about the government’s Terrorist Identities Datamart Environment list (TIDE), an attempt to consolidate the terrorist watch lists of various intelligence organizations (CIA, FBI, NSA, etc.) into a single, centralized database. TIDE was coming under scrutiny because it had failed to flag Tamerlan Tsarnaev (the elder suspect in the Boston Marathon bombings) as a threat when he re-entered the country in July 2012 after a six-month trip to Russia. It turns out that Tsarnaev’s TIDE entry didn’t register with U.S. customs officials because his name was misspelled and his date of birth was incorrect.
These types of data entry errors are incredibly common. I keep a running list of direct marketers’ misspellings of my own last name, and it currently stands at 22 variations. In the data world, these variations can be described by their “edit distance” or Levenshtein distance — the number of single-character substitutions, insertions, or deletions required to correct the entry.
| Actual Name | Phonetic Misspellings | Dropped Letters | Inserted Letters | Converted Letters |
|---|---|---|---|---|
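For the curious, the Levenshtein distance can be computed with the classic dynamic-programming algorithm; a compact sketch:

```python
def levenshtein(a, b):
    """Minimum number of single-character substitutions, insertions,
    and deletions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # prints 3
print(levenshtein("SMITH", "SMYTHE"))    # prints 2
```

A distance of 1 or 2 between a five-letter surname and a database entry is usually a typo; the larger the distance relative to the name’s length, the less likely the two records refer to the same person.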
Many of these typographical mistakes are the result of my own poor handwriting, which I admit can be difficult to transcribe. However, if marketers have this much trouble with a basic, five-letter last name, you can imagine the problems the feds might have with a longer foreign name with extra vowels, umlauts, accents, and other flourishes thrown in for good measure. Add in a first name and a middle initial and the list of possible permutations grows quite large … and this doesn’t even begin to address the issue of people with the same or similar names. (My own sister gets pulled out of airport security lines on a regular basis because her name doppelgänger has caught the attention of the feds.)
The standard solutions for these types of problems typically involve techniques like fuzzy matching algorithms and other programmatic methods for eliminating duplicates and automatically merging associated records. The problem with this approach is that it either ignores or downplays the human element in developing and maintaining such databases.
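As a rough illustration of the fuzzy-matching idea, here is a threshold-based sketch using the standard library’s difflib similarity ratio (not true Levenshtein distance; production systems use dedicated record-linkage tools). The entries echo the IBM examples above:

```python
from difflib import SequenceMatcher

canonical = "INTERNATIONAL BUSINESS MACHINES"
entries = [
    "INTERNATIONAL BUSINESS MACHINES",
    "INTERNATION BUSINESS MACHINES",
    "INTERNATIONAL BUSINESS MA",
    "ACME WIDGETS",
]

def is_match(a, b, threshold=0.8):
    """Treat two entries as duplicates when their similarity ratio
    (0.0 to 1.0) meets the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

matches = [e for e in entries if is_match(e, canonical)]
print(matches)  # the three IBM variants match; ACME WIDGETS does not
```

Note how much judgment hides in that single threshold parameter; set it too low and distinct companies merge, set it too high and “I. B. M.” survives as a separate customer, which is exactly why the human element matters.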
My personal experience suggests that most people view data and databases as an advanced technological domain that is the exclusive purview of programmers, developers, and other IT professionals. In reality, the “high tech” aspect of data is limited to its storage and manipulation. The actual content of databases — the data itself — is most decidedly low tech … words and numbers. By focusing popular attention almost exclusively on the machinery and software involved in data processing, we miss the points in the data life-cycle where most errors start to creep in: the people who enter information and the people who interpret it.
I once worked at a company where we introduced a crude quality check to a manual double-entry process. If two pieces of information didn’t match, the program paused to let the person correct their mistake. The data entry folk were incensed! The automatic checks were bogging down the process and hurting their productivity. Never mind that the quality of the data had improved … what really mattered was speed!
On the other hand, I’ve also seen situations where perfectly capable people had difficulty pulling basic trends from their Business Intelligence (BI) software. The reporting deployments were so intimidating that people would often end up moving their data over to a copy of Microsoft Excel so they could work with a more familiar tool.
In both cases, the problem wasn’t the technology per se, but the way in which humans interacted with the technology. People make mistakes and take shortcuts … it is a natural part of our creativity and self-expression. We’re just not cut out to follow the exacting standards of some of these computerized environments.
In the case of databases like TIDE, as long as the focus remains on finding technical solutions to data problems, we miss out on what I think is the real opportunity — human solutions that focus on usability, making intuitive connections, and the ease of interpretation.
- July 7, 2013 – In a similar database failure, Interpol refused to issue a worldwide “Red Notice” for Edward Snowden recently because the U.S. paperwork didn’t include his passport number and listed his middle name incorrectly.
- January 2, 2014 – For a great article on fuzzy matching, check out the following: http://marketing.profisee.com/acton/attachment/2329/f-015e/1/-/-/-/-/file.pdf.
One of the topics that seemed to keep cropping up in the news this year was the growing power of the amateur in public life. This trend is not necessarily new but it has been gaining momentum as modern technologies make it easier for the average person to create things (i.e. books, music, videos or physical products) and deliver them to a wider audience. Combine this with an anemic economic recovery and you have the perfect environment for people striking out on their own.
American history is full of passionate amateurs who ignored societal rules or overcame an entrenched bureaucracy to introduce new and exciting ideas to our culture. We admire the business entrepreneur, the garage band, and the inventor working out of his basement. They are some of our most cherished icons and they speak to our desire to make it big on our own terms. This attitude finds its purest expression in the Do-It-Yourself (DIY) ethic, which encourages individuals to bypass specialists altogether and seek out knowledge and expertise on their own.
There are some problems with this relentless individualism, however. Taken to the extreme this skeptical attitude toward the professional “elite” can lead to the distrust — and perhaps even disdain — of true experts. People now diagnose their own medical conditions, create their own legal documents, homeschool their own children, and regularly deny the validity of scientifically accepted facts. In an article which discusses recent changes in the distribution of information, Larry Sanger talks about how the aggregation of public opinion on the Internet (what he calls the “politics of knowledge”) has eroded our very understanding and respect for reliable information:
“With the rejection of professionalism has come a widespread rejection of expertise — of the proper role in society of people who make it their life’s work to know stuff.”
Everybody’s an expert now, in the sense that we can all do our own research online and come to our own conclusions about any topic under the sun. It’s the perfect democratization of knowledge … except most of us aren’t really experts in the traditional sense. Experts typically possess a very deep understanding of a subject and are aware of its subtleties and nuances. The average person may only scratch the surface of a topic and can miss important details because they literally don’t know what they don’t know. Nobody’s seriously going to call in an amateur cardiac surgeon if they’ve got a heart problem, so why is it so easy to dismiss the work of professionals in other fields?
Before I’m accused of being elitist, let’s lay down a framework for discussing the differences between amateurs, experts, and professionals. In an article published by Wharton, Kendall Whitehouse draws the distinction between “knowledgeable enthusiasts” (amateurs) and professionals based on the editorial process (this is in a journalistic context):
“Carefully checked sources and consistent editorial guidelines are key differences between most professional and amateur content … The latter brings quickness and a personal viewpoint and the former provides analysis and consistent quality.”
While I certainly agree that results are important, there are plenty of situations where amateurs deliver results that are as good as those of professionals. In fact, the DIY community frequently uses the term amateur expert and notes that the word “amateur” stands in contrast to the commercial motivation (i.e. financial reward) of the professional, not their level of skill. Following this reasoning, a professional is not necessarily an expert; a professional is simply someone who happens to get paid for what they do. An amateur can still be an expert based on their skills and abilities; they just don’t get paid.
If the amateur/professional word pairing makes sense, we still need an antonym of “expert” to refer to deficiencies in skills. In this case, I would suggest the term “novice,” which is defined as someone who has very little training or experience. Essentially this means that a thorough discussion of experts and amateurs needs to account for both a financial dimension (amateur vs. professional) and a skill or experience dimension (novice vs. expert). I’ve created a quick quad chart to visualize these relationships:
If we return to our previous discussion, we can now see that the rejection of expertise does not necessarily represent support for the plucky amateur, it represents a shift toward glorification of the naive. Sure, there are times when novices can bring a fresh perspective to established practices (punk rockers and other creative outsiders come to mind). But in 2012, the growing regularity of this superficial approach led to a few very interesting — and very public — failures.
The first example is the unauthorized attempt by an elderly parishioner to restore a painting in a Spanish church over the summer. The tragi-comic results of Cecilia Gimenez’s fresco fiasco were all over the news in August and it was pretty clear to everyone that her work was a massive failure. Using our new definition, she is clearly a novice (unskilled) amateur (unpaid).
Ms. Gimenez later complained that, with all the attention that her botched restoration of Ecce Homo had gotten, she should have received some compensation for her work. This would have made her a quasi-professional, I guess, but I don’t suppose there are a lot of museums out there who’d be willing to hand over their cultural treasures to her care.
(To create your own Ecce Homo restoration, check out this site.)
The second example was the National Football League’s use of replacement referees during the early part of the 2012 season. With the regular officials locked out due to contract negotiations, NFL management brought in referees from semi-professional football leagues, lower college divisions, and even high schools in hopes that nobody would notice the difference. They noticed.
Throughout the preseason, a series of bad penalties, missed calls, and even blown coin tosses made it clear that the new guys were not ready for prime time. As the regular season progressed and the mistakes accumulated, demands for the return of the regular refs grew louder. Finally, two days after the outcome of a game between the Green Bay Packers and the Seattle Seahawks was decided by a controversial call, an agreement between the NFL and NFL Referees Association was reached. (Photo below from the Washington Post.)
NFL management clearly misjudged the level of skill needed to officiate a pro football game and how quickly the replacement refs would be exposed for what they were: novice professionals. This isn’t to say that some of these guys couldn’t have developed into perfectly good officials over time. But such a high-profile occupation doesn’t really lend itself to on-the-job training.
Not all skilled workers are lucky enough to have their expertise hit the bottom line so obviously. Writing in an article about the NFL lockout, Paul Weber noted that:
“Attitudes about expertise can … make it a risky hand to play in a negotiation, depending on who’s on the other side of the table. The idea that no one is irreplaceable and there’s always a guy next in line willing to do the job run deep in America. Professing expertise can also bring on suspicions of elitism and scratch an itch to knock someone down a peg.”
This inclination can be seen clearly in my third example of the year, which involves several high-profile political pundits who insisted that Mitt Romney would win the 2012 Presidential election. When statistician Nate Silver of the New York Times began predicting an Obama victory back in June, many conservative commentators questioned both his methodology and his masculinity (offending comments have since been removed).
Despite Silver’s clear statements regarding the laws of probability, conservatives just could not get past the fact that most of their favored polls (University of Colorado, Rasmussen) showed a neck-and-neck race. In the end, the elections validated the statistical approach that Silver used and forced many people to rethink their reliance on ‘unskewed’ polls or Karl Rove’s math skills.
Although the animosity toward Silver subsided after the election, I have my doubts that his success will lead to a sudden surge in respect for professional experts. There seems to be a natural tendency in our culture to distrust anyone who stakes a claim to the truth — especially if we don’t like what they’re saying.
The most vociferous of these amateur-versus-professional battles are fought between journalists and bloggers, but there are plenty of other pairings that set off fireworks. In a recent book review on Slate, professional writer Doree Shafrir openly wonders why anyone would be satisfied with being an amateur. To her, the only path to gratification and validation is through professional success:
“The idea of being an office drone by day and by night being, say, an amateur astronomer is completely bizarre to me. Why wouldn’t you just be an astronomer?”
To which a wise reader responds:
“The sad fact is that many of us simply aren’t good enough at what we really love to do it for a living … Or we were good, but unlucky. Or unwilling to sacrifice our families. Or we’re still living down the consequences of a previous failure.”
Amateur interests are a way for someone to gain new skills, test drive a new career, or just participate in a community despite the fact that they aren’t collecting a paycheck. The amateur/professional spectrum doesn’t just exist at the endpoints; it runs the gamut from hobbyists and tinkerers to semi-professionals and professionals. Back in 2004, a report titled The Pro-Am Revolution by Charles Leadbeater (a frequent contributor to TED) suggested that improved tools and new methods of collaboration are helping to create a breed of amateurs who hold themselves to professional standards and can even produce significant discoveries.
In the field of astronomy, these “demi-experts” had an amazing year. Recent developments in computer technology and digital imaging have allowed amateur astronomers to explore regions of the universe never before seen by non-scientists. Plus, the sky is so vast (and observation time so restricted) that serious amateurs can help professional astronomers simply by observing unrecorded (or underrecorded) stellar objects. Significant amateur finds in 2012 included: new comets; new exoplanets; explosions on Jupiter; a planet with four suns; a detailed map of Ganymede; mysterious clouds on Mars; and even previously undiscovered photos from the Hubble telescope.
While these examples make it clear that amateurs can contribute meaningfully to many fields, it is less obvious how society can avoid the pitfalls associated with the well-intended novice. The key, I think, is for everyone — from novice to expert, amateur to professional — to recognize their own limitations. Businesses want expertise but they don’t always want to pay for it. People want to do what they love but they don’t always have the time or skills to make it their career. A novice who tries to recreate the work of an expert will almost certainly fail but an amateur with passion and drive can spur innovations beyond the abilities of entrenched professionals.
These labels are fluid. All experts were once beginners and all professionals were once unpaid. People progress from novice to expert in distinct stages but they can also move from expert to novice if they change careers. In today’s job market, it even seems possible that some of us could apply all of these labels to ourselves at once. To paraphrase author Richard Bach, a professional is simply an amateur who didn’t quit.
- December 16, 2014 – io9 article on the “Beast Jesus” painting (link: http://io9.com/beast-jesus-has-become-a-bona-fide-tourist-attraction-1671951451).
One of the goals I set for myself this summer was to learn a bit about D3, a visualization toolkit that can be used to manipulate and display data on the web. Considering that the trees are bare and we’ve already had our first frost here in Wisconsin, you can safely assume that I am behind schedule. Nevertheless, I feel that I’ve finally reached a point where I have something to publish, so here goes.
I wanted to start out with something simple, so I elected to go with a basic line chart using data I pulled from Pro-Football-Reference.com. This site contains a ton of great information and statistics from the past 90+ years of the National Football League but — for now — I just looked at the final scores of all the games played from 1920 to 2011. My first D3-powered chart is below. It shows the average combined scores of winning and losing teams for each year of the NFL’s existence.
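The aggregation behind the chart is straightforward: group every game by season, then average the winning and losing scores within each group. Here is a minimal sketch in plain JavaScript (the field names `season`, `winnerPts`, and `loserPts` are my own illustration, not the actual schema of the scraped data):

```javascript
// Compute per-season averages of winning and losing scores from raw game
// records. Each record is assumed (for illustration) to look like:
//   { season: 1920, winnerPts: 14, loserPts: 0 }
function seasonAverages(games) {
  const bySeason = new Map();
  for (const g of games) {
    if (!bySeason.has(g.season)) {
      bySeason.set(g.season, { winSum: 0, loseSum: 0, n: 0 });
    }
    const s = bySeason.get(g.season);
    s.winSum += g.winnerPts;
    s.loseSum += g.loserPts;
    s.n += 1;
  }
  // One row per season: average winner score, average loser score, and the
  // average combined score plotted in the chart above.
  return [...bySeason.entries()].map(([season, s]) => ({
    season,
    avgWin: s.winSum / s.n,
    avgLose: s.loseSum / s.n,
    avgCombined: (s.winSum + s.loseSum) / s.n,
  }));
}
```

An array of rows shaped like this — one object per season — is exactly the kind of tidy input that D3's data join expects.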
Although this chart looks pretty simple, every element — including titles, subtitles, axes, labels, grids and data lines — has been created manually using the D3 code. The payoff is pretty nice. All of the elements can be reused and you have tremendous control over what is shown onscreen. To demonstrate some of these capabilities, I’ve added interactive overlays that show a few of the major eras in NFL football (derived from the work of David Neft and this discussion thread). If you move your mouse over the graph, you will see these different eras highlighted:
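Underneath all of those hand-built axes, grids, and lines is one core idea: a scale, i.e. a function that maps a data domain (years, points) onto a pixel range. This hand-rolled version is not D3 itself, just an illustration of the concept behind its linear scales:

```javascript
// A minimal linear scale: maps a data domain [d0, d1] onto a pixel range
// [r0, r1]. D3 provides a far more capable version of this (with ticks,
// clamping, inversion, etc.); this sketch only shows the underlying math.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Example dimensions, chosen purely for illustration:
const x = linearScale([1920, 2011], [0, 800]); // season → horizontal pixel
const y = linearScale([0, 50], [400, 0]);      // score → vertical pixel
// Note the flipped y-range: SVG's y-axis grows downward, so the largest
// data value maps to the top of the chart (pixel 0).
```

Every axis tick, grid line, and data point in the charts on this page ultimately goes through a pair of mappings like `x` and `y`.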
Early NFL (1920-1933) – The formation of the American Professional Football Association (APFA) in 1920 marked the official start of what was to become the National Football League. This era was marked by rapid formation (and dissolution) of small town franchises, vast differences in team capabilities and a focus on a relatively low-scoring running game. At this time, the pass was considered more of an emergency option than a reliable standard. The rapid growth in popularity of the NFL during this era culminated with the introduction of a championship game in 1932.
Introduction of the Forward Pass (1933-1945) – The NFL discontinued the use of collegiate football rules in 1933 and began to develop its own set of rules designed around a faster-paced, higher-scoring style of play. These innovations included the legalization of the forward pass from anywhere behind the line of scrimmage — a change that is often called the “Bronko Nagurski Rule” after his controversial touchdown in the 1932 NFL Playoff Game.
Post-War Era (1945-1959) – The end of WWII saw the expansion of the NFL beyond its East Coast and Midwestern roots with the move of the Cleveland Rams to Los Angeles — the first big-league sports franchise on the West Coast. This period also saw the end of racial segregation (enacted in the 30s) and the start of nationally televised games.
Introduction of the AFL (1959-1966) – Professional football’s surge in popularity led to the formation of a rival organization — the American Football League — in 1960. The growth of the flashy AFL was balanced by a more conservative style of play in the NFL. This style was epitomized by coach Vince Lombardi and the Green Bay Packers, who would win five championships in the 1960s. In 1966, the two leagues agreed to merge as of the 1970 season.
Dead Ball Era (1966-1977) – Driven in part by stringent restrictions on the offensive line, this period is marked by low scores and tough defensive play. Teams that thrived in this environment include some of the most famous defenses in modern NFL history: Pittsburgh’s Steel Curtain, Dallas’ Doomsday Defense, Minnesota’s Purple People Eaters and the Rams’ Fearsome Foursome.
Live Ball Era (1978-present) – Frustrated by the decreasing ability of offenses to score points in the ’70s, the NFL began to add rules and make other changes to the structure of the game in an attempt to boost scoring. The most famous of these initiatives was the so-called “Mel Blount Rule” (introduced in 1978), which severely restricted the defense’s ability to interfere with passing routes. With the subsequent introduction of the West Coast Offense in 1979 — an offense based on precise, short passes — this period became marked by a major focus on the passing game.
Having created this first chart, I decided to build a second chart based on the ratio of average winning scores to average losing scores to see if there were any patterns.
The chart above shows how — after a period of incredibly lopsided victories — the average scoring differential settled into a very steady pattern by the late 1940s and stayed at that level (roughly 2:1) for the next 30 years. Despite many changes in rules, coaching techniques, technology and other factors, only the pass interference rules of the late 1970s seemed to have any significant effect on this ratio, shifting it to just under 1.8:1 for the next 30 years.
While I had the data available, I also decided to look at the differences in average scores between home teams and away teams. The chart below plots this data along with the same overlay I used in the first chart.
A look at the ratio of average home team scores to average away team scores follows:
What’s fascinating about this chart is how quickly a form of parity was achieved among all the NFL teams. By the mid-30s, a measurable home field advantage can be seen at roughly 15%, a rate that has remained essentially constant for over 70 years. Factors for this boost could include the psychological support of fans, familiar weather conditions, unique features of local facilities, lack of travel fatigue, referee bias and/or increased levels of motivation in home town players.
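For concreteness, that 15% figure is just the ratio of total home points to total away points, expressed as a percentage above parity. A quick sketch (field names `homePts` and `awayPts` are again illustrative, not the source data's actual keys):

```javascript
// Home-field advantage for a set of games: average home score divided by
// average away score, minus one, as a percentage. A return value of 15
// means home teams outscored visitors by 15% on average.
function homeAdvantagePct(games) {
  let home = 0;
  let away = 0;
  for (const g of games) {
    home += g.homePts;
    away += g.awayPts;
  }
  return (home / away - 1) * 100;
}
```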
Thanks to Charles Martin Reid for his solution to getting D3 and WordPress to play nice.