The Data Uncertainty Principle

Every now and then I’ll finish a big reporting project and the project lead will send me a follow-up comment like:

“Here’s hoping everything’s perfect!”

For some reason this always strikes me as a very weird thing to say. Of course it isn’t going to be perfect … it’s data. Data is only a rough approximation of the real world and is subject to all the erratic vagaries associated with any human endeavor. By the time you account for sampling errors, data entry mistakes, programming gaffes, information degradation and general compromises to the business process, you’ve got to expect at least a few issues to crop up. The key is to make sure these problems don’t overwhelm the main reason you’re gathering this information in the first place, which is to make a decision.

Experience has shown that this isn’t how most people look at data, though. They expect perfection and will dismiss almost anything that falls short of their ideal. (Pie charts that don’t add up to 100%? That’s bad data no matter how often you try to explain banker’s rounding!) Unfortunately, this rigid attitude doesn’t acknowledge the fact that there is a cost associated with getting more precise information and sometimes that cost just can’t be justified.
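For anyone who wants to see the pie chart problem in action, here is a minimal Python sketch. The function name `rounded_percentages` and the sample counts are made up for illustration. It leans on Python 3's built-in `round()`, which rounds exact halves to the nearest even number (the "banker's rounding" behavior).

```python
def rounded_percentages(counts):
    """Convert raw counts to whole-number percentages, as a chart label would."""
    total = sum(counts)
    # round() with one argument returns an int and rounds exact .5 values
    # to the nearest even integer (banker's rounding).
    return [round(100 * c / total) for c in counts]


# Three equal slices: each is 33.33...%, so every label rounds down.
thirds = rounded_percentages([1, 1, 1])
print(thirds, sum(thirds))  # [33, 33, 33] 99

# Two slices of exactly 2.5% each round to 2 (the even neighbor), not 3.
small = rounded_percentages([1, 1, 38])
print(small, sum(small))  # [2, 2, 95] 99
```

Nothing is wrong with the underlying counts in either case. The missing percentage point is purely a side effect of rounding each slice on its own.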

What you should strive for is a system that provides just enough certainty in information to make a good decision. Controlling for errors should be part of the process, of course, with feedback loops to catch the big problems. This is analogous to how architects tackle moisture control when they design buildings. They don’t just try to prevent water from getting into the structure; they also plan for the situation when water does (inevitably) get inside.

