The Data Uncertainty Principle

Every now and then I’ll finish a big reporting project and the project lead will send me a follow-up comment like:

“Here’s hoping everything’s perfect!”

For some reason this always strikes me as a very weird thing to say. Of course it isn’t going to be perfect … it’s data. Data is only a rough approximation of the real world, subject to all the vagaries of any human endeavor. By the time you account for sampling errors, data entry mistakes, programming gaffes, information degradation, and general compromises to the business process, you’ve got to expect at least a few issues to crop up. The key is to make sure these problems don’t overwhelm the main reason you’re gathering the information in the first place: to make a decision.

Experience has shown that this isn’t how most people look at data, though. They expect perfection and will dismiss almost anything that falls short of their ideal. (Pie charts that don’t add up to 100%? That’s bad data no matter how often you try to explain banker’s rounding!) Unfortunately, this rigid attitude doesn’t acknowledge that there is a cost associated with getting more precise information, and sometimes that cost just can’t be justified.
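The pie-chart complaint is easy to reproduce. As a minimal sketch (the category counts are made up for illustration): Python’s built-in `round()` happens to use banker’s rounding, and slices rounded to whole percents genuinely don’t have to sum to 100.

```python
# Banker's rounding (round half to even): ties go to the nearest even number,
# so two values ending in .5 can round in different directions.
print(round(12.5))  # 12
print(round(13.5))  # 14

# Hypothetical pie chart with three equal slices: each is 33.33...%,
# and the rounded whole-percent slices sum to 99, not 100.
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())
shares = {k: round(100 * v / total) for k, v in counts.items()}
print(shares)                # {'A': 33, 'B': 33, 'C': 33}
print(sum(shares.values()))  # 99
```

Nothing here is a data-quality problem; it is an unavoidable consequence of rounding, which is exactly the point.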

What you should strive for is a system that provides just enough certainty in the information to make a good decision. Controlling for errors should be part of the process, of course, with feedback loops to catch the big problems. This is analogous to how architects tackle moisture control when they design buildings. They don’t just try to prevent water from getting into the structure; they also plan for what happens when water does (inevitably) get inside.
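One way to picture that kind of feedback loop is a sanity check that tolerates small imperfections but flags the big problems. This is a minimal sketch, not anything from the original post; the record structure, thresholds, and function name are all hypothetical.

```python
def check_report(rows, expected_min=1, null_budget=0.05):
    """Flag only the big problems; small imperfections are tolerated.

    rows         -- hypothetical report records, each a dict with a "value" key
    expected_min -- fewer rows than this suggests a truncated report
    null_budget  -- the fraction of missing values we are willing to accept
    """
    problems = []
    if len(rows) < expected_min:
        problems.append("report is empty or truncated")
    null_rate = sum(1 for r in rows if r.get("value") is None) / max(len(rows), 1)
    if null_rate > null_budget:
        problems.append(f"null rate {null_rate:.0%} exceeds budget {null_budget:.0%}")
    return problems

# One missing value out of three blows the 5% budget, so it gets flagged;
# a report with a handful of nulls among thousands of rows would pass.
rows = [{"value": 10}, {"value": None}, {"value": 7}]
print(check_report(rows))  # ['null rate 33% exceeds budget 5%']
```

The design choice worth noting is the explicit budget: instead of demanding zero defects, the check encodes how much imperfection the decision at hand can absorb.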

