The Data Uncertainty Principle

Every now and then I’ll finish a big reporting project and the project lead will send me a follow-up comment like:

“Here’s hoping everything’s perfect!”

For some reason this always strikes me as a very weird thing to say. Of course it isn’t going to be perfect … it’s data. Data is only a rough approximation of the real world and is subject to all the erratic vagaries associated with any human endeavor. By the time you account for sampling errors, data entry mistakes, programming gaffes, information degradation and general compromises to the business process, you’ve got to expect at least a few issues to crop up. The key is to make sure these problems don’t overwhelm the main reason you’re gathering this information in the first place, which is to make a decision.

Experience has shown that this isn’t how most people look at data, though. They expect perfection and will dismiss almost anything that falls short of their ideal. (Pie charts that don’t add up to 100%? That’s bad data no matter how often you try to explain banker’s rounding!) Unfortunately, this rigid attitude ignores the fact that more precise information comes at a cost, and sometimes that cost just can’t be justified.
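
To make the pie-chart gripe concrete, here’s a quick sketch in Python (the category names and counts are made up) showing how independently rounded percentages can legitimately total something other than 100%, with no bad data anywhere:

```python
# Hypothetical category counts: two small slices and one big one.
counts = {"A": 1, "B": 1, "C": 6}
total = sum(counts.values())

# Python's built-in round() uses banker's rounding (round half to even),
# so 12.5 rounds down to 12 while 75.0 stays 75.
percentages = {k: round(100 * v / total) for k, v in counts.items()}

print(percentages)                  # {'A': 12, 'B': 12, 'C': 75}
print(sum(percentages.values()))    # 99 -- the chart is "wrong", the data is fine
```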

What you should strive for is a system that provides just enough certainty to make a good decision. Controlling for errors should be part of the process, of course, with feedback loops to catch the big problems. This is analogous to how architects tackle moisture control when they design buildings. They don’t just try to prevent water from getting into the structure; they also plan for the situation when water does (inevitably) get inside.
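
As a rough illustration of “just enough certainty,” here’s a hypothetical reconciliation check in Python. The function name, the tolerance, and the numbers are all assumptions for the sake of the sketch, not a prescription for any particular reporting system:

```python
# Instead of demanding that two totals agree exactly, flag only the
# discrepancies large enough to matter for the decision at hand.

def reconcile(source_total: float, report_total: float,
              tolerance: float = 0.01) -> bool:
    """Return True if the report is close enough to the source to trust."""
    if source_total == 0:
        return report_total == 0
    relative_gap = abs(source_total - report_total) / abs(source_total)
    return relative_gap <= tolerance

# A 0.4% gap passes as acceptable noise; a 5% gap gets kicked back.
print(reconcile(10_000, 9_960))   # True
print(reconcile(10_000, 9_500))   # False -- big enough to investigate
```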
