How not to be wrong

At the intersection of data and journalism, a lot can go wrong, and taking precautions alone is sometimes not enough. “It’s very well possible that your story is true but wrong”, New York Times data journalist Robert Gebeloff explained at the DataHarvest+ conference.

TL;DR: skip to the ‘How not to be wrong’ checklist at the end of this article, based on Gebeloff’s presentation.

“When I work on a big story, I want to know everything about the topic.” To make sure he doesn’t miss anything, Gebeloff therefore gathers all relevant data sources, examines the data in all relevant ways, and publishes only what he believes to be true.

Half true is false

But this approach is not totally foolproof. “In data journalism, we cannot settle for ‘half-true’. Anything short of true is wrong, and we cannot afford to be wrong.” Unlike fact-checking websites such as Politifact, which invented scales of truthfulness running from false to true with everything in between, data journalism should always be true.

Scale from the Politifact Truth-o-Meter

True but wrong

Even when your story is true, according to Gebeloff you could still be wrong. “You can do the math correctly, but get the context wrong, fail to acknowledge uncertainties or not describe your findings correctly.”

Gebeloff mentions a story on asylum judges appointed by Bush under a political litmus test, who were far more likely to reject asylum seekers. The original premise was that 11 of the 16 judges said ‘no’ at a higher rate than their peers. While this thesis seemed true, it was also wrong, according to Gebeloff: “I thought we had to ditch the 11 of 16 being above average as our best number. Two of the eleven have differences that are not statistically significant, so we should not portray them as above average.” To calculate statistical significance you can use a chi-square test, which takes into account not only the size of the difference but also the sample size. “Because our premise was based on a small sample size that was accounted for, it was true but wrong.”
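
To see why sample size matters here, the sketch below runs a Pearson chi-square test on a 2x2 table (denied versus granted, one judge versus all peers) using only the Python standard library. It is an illustration, not the analysis the New York Times ran: the counts are hypothetical, the function name `chi_square_2x2` is made up for this example, and no continuity correction is applied. The same 70 percent denial rate is tested against a 60 percent peer rate, once with 20 cases and once with 200.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    No continuity correction. Returns (statistic, p_value), 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts, given as (denied, granted).
peers = (600, 400)          # 60% denial rate across all other judges
small_caseload = (14, 6)    # 70% denial rate over 20 cases
large_caseload = (140, 60)  # 70% denial rate over 200 cases

for label, judge in [("small", small_caseload), ("large", large_caseload)]:
    stat, p = chi_square_2x2(*judge, *peers)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label} caseload: chi2={stat:.2f}, p={p:.3f} -> {verdict} at 0.05")
```

With these made-up numbers, the judge with 20 cases does not clear the conventional 0.05 threshold while the judge with 200 cases does, even though both deny asylum at the same rate. That is the sense in which a judge can be above average on paper yet should not be portrayed as above average.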

Fancy math

When working on a story, journalists should consider whether to use ‘fancy math’ (think statistics) or ‘standard math’. “Using ‘fancy math’ you can explore complex relationships, but at the same time your story will be harder to explain.”

Using ‘fancy math’ might be necessary to find or report a story. But if you cannot explain to your readers what you’ve done, they will probably be skeptical. “Sometimes I don’t do a story because I could never explain the methods I’d need to my audience.”

Targets as a source

For the New York Times project ‘Race behind Bars’ (part II), Robert Gebeloff and colleagues methodically questioned racial bias using data. Since haters gonna hate, they did the calculations in every way they could think of. It shows in this paragraph:

In most prisons, blacks and Latinos were disciplined at higher rates than whites – in some cases twice as often, the analysis found. They were also sent to solitary confinement more frequently and for longer durations.

Screenshot of a New York Times Race Behind Bars production.

To make sure you’re not going to be wrong, you should share your findings. “Don’t just share findings with experts, share them with hostile experts too”, Gebeloff advises. “Use your targets as a source. If there’s a blowback, you want to know before publication, and include the blowback in the publication.”

How not to be wrong checklist

Why you want to use this: a half truth is false, and data journalism should always be true. But just being true is not enough: your story can be mathematically true but wrong in context or explanation. You should want your stories to be true and not wrong.

  1. Check your data carefully:
    •   pay attention to dates
    •   check for spelling and duplicates
    •   identify outliers
    •   remember that statistical significance alone is not news
    •   prevent base year abuse: if something is a trend, it should be true in general, not just if you cherry-pick a base year
    •   make sure your data represents reality
  2. As you work, keep a data diary that records what you’ve done and how you’ve done it. You should be able to reproduce your calculations.
  3. Make sure you explain the methods you used: your audience should be able to understand how you found the story.
  4. Play offense and defense simultaneously. Go for the maximum possible story, but at all times think of why you might be wrong, or what your target would say in response.
  5. Use your targets as a source to find blowback before publication.
  6. As part of the proofing process, create a ‘footnotes file’: ID each fact and give it a number. Then list for each fact which document it came from, how you know it, and what the proof is. Fix what needs to be fixed.
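
Some of the data checks in item 1 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not Gebeloff’s own tooling: the function names (`find_duplicates`, `find_outliers`, `change_by_base_year`) and all sample values are hypothetical, and the outlier rule used here (Tukey fences at 1.5 times the interquartile range) is just one common convention.

```python
import statistics
from collections import Counter

def find_duplicates(rows):
    """Return the rows that appear more than once (exact matches only)."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]

def find_outliers(values, k=1.5):
    """Return values outside the Tukey fences: quartiles -/+ k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def change_by_base_year(series):
    """Percent change from every possible base year to the final year."""
    years = sorted(series)
    last = years[-1]
    return {
        year: (series[last] - series[year]) / series[year] * 100
        for year in years[:-1]
    }

# Hypothetical records: (name, date) pairs, one entered twice.
records = [
    ("Smith", "2016-01-04"),
    ("Smith", "2016-01-04"),
    ("Jones", "2016-02-11"),
]
print("duplicates:", find_duplicates(records))

# Hypothetical measurements with one suspicious value.
print("outliers:", find_outliers([10, 12, 11, 13, 12, 11, 95]))

# Hypothetical yearly counts: does the 'increase' survive other base years?
yearly = {2010: 100, 2011: 120, 2012: 80, 2013: 105, 2014: 110}
for year, pct in change_by_base_year(yearly).items():
    print(f"since {year}: {pct:+.1f}%")
```

If the sign of the change flips depending on which base year you pick, as it does for these made-up counts, the ‘trend’ is an artifact of the starting point rather than something that is true in general.
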

Additional links of interest: the slides of Robert Gebeloff’s ‘How not to be wrong’ presentation, and the methodology notes and data from the series on discipline and parole in New York State.