The first to share a data horror story is Matt Stiles, data journalist at NPR, America’s National Public Radio. His story is about an accusation that was almost misplaced and the problems with government data.
“I once incorrectly identified a low-level government affairs official at a major U.S. utility as the top lobbyist in terms of spending on elected officials in Texas. She wasn’t.”
Too many zeros
“She incorrectly reported spending $2 million in one month rather than $2,000. When I aggregated spending totals over several years for all lobbyists, the $2 million figure put her at the top of the list — but not so much that I suspected anything was incorrect.
I should have checked each monthly figure reported by each of the top lobbyists, not just their aggregated totals. I would have noticed the $2 million in one month.”
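The check Stiles describes, looking at each monthly figure rather than only the totals, can be sketched in a few lines of Python. This is a minimal illustration, not his actual workflow: the function name `flag_suspicious_months`, the `ratio` threshold of 100, and all names and amounts in `reports` are hypothetical.

```python
from statistics import median


def flag_suspicious_months(reports, ratio=100):
    """Return (name, month, amount) entries far above that filer's own median.

    `reports` is an iterable of (name, month, amount) tuples.
    `ratio` is an assumed threshold: how many times larger than the
    filer's typical month an amount must be before it is flagged.
    """
    # Group the monthly figures by filer.
    by_name = {}
    for name, month, amount in reports:
        by_name.setdefault(name, []).append((month, amount))

    flagged = []
    for name, entries in by_name.items():
        typical = median(amount for _, amount in entries)
        for month, amount in entries:
            # Compare each month with the filer's own typical month.
            if typical > 0 and amount >= ratio * typical:
                flagged.append((name, month, amount))
    return flagged


# Hypothetical figures, invented for illustration only.
reports = [
    ("Lobbyist A", "2009-01", 1_800),
    ("Lobbyist A", "2009-02", 2_000_000),  # likely meant to be 2,000
    ("Lobbyist A", "2009-03", 2_200),
    ("Lobbyist B", "2009-01", 40_000),
    ("Lobbyist B", "2009-02", 35_000),
    ("Lobbyist B", "2009-03", 45_000),
]

suspicious = flag_suspicious_months(reports)
for name, month, amount in suspicious:
    print(f"Check with {name}: {month} reported ${amount:,}")
```

Comparing each month with the same filer's median, rather than with the overall ranking, is what surfaces the entry here: a consistently big spender is left alone, while one month that is out of line with the filer's other months gets flagged for a phone call.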
Check, double check
“We made a list for the paper of the top 10 lobbyists. Just before the story ran, I decided to call each lobbyist on that list. She alerted me to her mistake and filed a formal correction with the state’s ethics commission, so I removed her from the list.”
“The error taught me that you can’t trust government data — especially when it’s based on records submitted by people working outside of government. We work a lot with large data sets that mention many names, and it’s still OK to put raw data online if that’s useful to the audience. But anytime someone is mentioned in a story or highlighted in some way, we must contact them and give them the option to correct a mistake or give us context about their data.”
Failing fast seems to be a good option if you want to learn something quickly. But what about learning from mistakes others have made? In the Data Horror Story series, data journalists share a mistake for us to learn from. Share your own data horror story here, or read more horror stories from others.