Nieman Lab Predictions on Journalism 2019

Data journalism goes undercover

Journalism 05/03/2019

This essay was originally published by the Nieman Lab, as part of their annual predictions on journalism series.

It’s my hope more than my prediction that 2019 will be the year in which data journalism goes undercover. All journalists should become data literate and more journalists should learn basic data skills. And having these basic data skills should be as exciting as having what it takes to send an e-mail or make a phone call. Only then will data journalists lose their unicorn status, allowing the field of data journalism to simply disappear into the field of journalism.

Modest journalism

Despite all the beautiful data productions I’ve seen throughout the year, it’s my hope that data productions will eventually go undercover too. As Bill Kovach and Tom Rosenstiel describe in The Elements of Journalism (excerpt by the American Press Institute), the first task of the news journalist is “to verify what information is reliable and then order it so people can grasp it efficiently”. Thinking of all the beautiful but sometimes complex visual data journalism productions I’ve seen, I dare to ask whether all these forms of storytelling can be ‘grasped efficiently’. Even though I’m a fan of high-end visuals, technological innovation and new forms of storytelling, I feel that visually modest journalism can be just as efficient, if not more so.

About the story

Besides, isn’t the best data journalism invisible? Data-driven stories should center around the story, not around the data, analysis or technology that keeps the story afloat. When reading, listening or watching such productions, a possible data visualisation here and there excepted, the public should not be actively thinking about data. If they do, they’re not thinking about the story. Why not? Isn’t the story efficient to grasp? Since journalism creates the map for citizens to navigate society with, we should make sure our maps are readable for all and read by many.

Of course this might be a lot to ask for. But in a world filled with fake news and alternative facts, we can only welcome more fact-based, data-driven journalism. And I think common data knowledge within and outside of journalism would be a good start.

Winny de Jong is a data journalist at the Dutch national broadcaster NOS.

Python, R or SQL? The tribal ‘war’ of data journalism

Journalism 27/05/2018

Should I learn Python, R or SQL? Or all three? My house would be too small if I invited all the people to dinner that have recently asked me this question.

But if I could invite all these colleagues for a meal, this is what I would do: I would put a knife next to some plates, a fork next to others, and a spoon next to the last plates. But I would provide no-one with a knife and a fork and a spoon. Then I would serve a typical Dutch menu of vegetable soup, potatoes, and steak. Now guess who would complain about the soup, who about the potatoes and who about the steak?

Technically it is possible to eat soup with a knife or a fork. It just takes longer than with a spoon. Attacking a steak with a spoon, on the other hand, is not ideal either, but it can be done. You get the picture. Python, R and SQL are the cutlery we use to satisfy our data hunger. Obviously, having a fork, knife and spoon is the preferred option. But with only one out of three you can still go a long distance in analysing large datasets.

Python is the language of the new generation (at least from my mid-career point of view), wielded by savvy coders and hackers for whom sheer investigative reporting is not exciting enough. I, at the other end of the spectrum, am from the Generation Sequel. Dull, but solid. And only recently did I dive into R.

At Dataharvest last week in Mechelen, hands-on sessions in all three coding languages were on the programme. By all accounts the Python classes were packed, the R classes were full and for good old SQL four people turned up. But these numbers tell us more about the participants (craving the newest!) than about the languages that were taught.

At the NICAR conference in Chicago, in March this year, it dawned on me that the data journalism community is now divided into three tribes and that a tribal war is around the corner. Buttons were distributed with “SQL Team”, “R Team” and “Python Team” to distinguish the areas of expertise. Meant as a helpful tool, the buttons instead were worn as proud insignia. “I had never thought that you were an R-man”, I heard one colleague say to another, in a tone that seemed to herald the end of a friendship.

My comparison of the three languages is far from final and you may not agree on all points. But since we are all still in the discovery phase I dare risk saying a few words about the pros and cons of a knife, a fork and a spoon for data crunching:

With Jonathan Stoneman I taught R at Dataharvest and with Helena Bengtsson SQL. In both classes we used the same datasets and the same exercises. The code we typed covered exactly the same number of pages in both instances, meaning that R is not quicker or slower (to execute) than SQL.

Participants that went to both classes said they thought that SQL code is slightly simpler – and I agree. But then again: R can be extended into the realms of scraping and visualizing, where SQL doesn’t go. People who went to learn Python and R were surprised to find that R offers a more visual experience, with code and tables in the same screen space. R beats both SQL and Python when it comes to workflow management.

From a teacher’s point of view, Jonathan and I discovered that R functions can be taught in the same sequence as we have always done with SQL functions: from ‘select’ all the way down to ‘order by’. SQL and R are very similar, not just regarding the names of the functions, but also regarding the structure of the queries (‘scripts’ in R).
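The ‘select’ to ‘order by’ sequence can be sketched in a few lines. This is only an illustration, written in Python so that it runs without extra installs: the table, its column names and the sample rows are made up, and the R verbs named in the comments (dplyr’s filter, summarise and arrange) are an assumption, since the classes’ actual exercises are not reproduced here.

```python
import sqlite3

# Hypothetical sample rows (city, year, amount); not the Dataharvest datasets.
ROWS = [
    ("Utrecht", 2017, 120),
    ("Utrecht", 2018, 80),
    ("Mechelen", 2018, 200),
    ("Mechelen", 2017, 50),
    ("Chicago", 2018, 30),
]


def totals_with_sql(rows):
    """Run one query that walks the clauses from SELECT down to ORDER BY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE spending (city TEXT, year INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    query = """
        SELECT city, SUM(amount) AS total
        FROM spending
        WHERE year = 2018
        GROUP BY city
        ORDER BY total DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


def totals_step_by_step(rows):
    """The same steps as a script, one step per line, as one might in R."""
    filtered = [r for r in rows if r[1] == 2018]       # WHERE    ~ filter()
    totals = {}
    for city, _year, amount in filtered:               # GROUP BY ~ summarise()
        totals[city] = totals.get(city, 0) + amount
    # ORDER BY ~ arrange(): largest total first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(totals_with_sql(ROWS))
    print(totals_step_by_step(ROWS))
```

Both functions return the same list of (city, total) pairs, which is the point of the comparison: the steps and their order are the same, only the notation differs.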

DataHarvest+ 2018: presentations, tools and datasets

Resources 27/05/2018

A list of resources – presentations, datasets and tools etc. – mentioned during the DataHarvest+ 2018 European investigative journalism conference. This list is not complete, and will probably be updated later on.

Investigative reporting

  • Strategies to find personal information, by Marcus Lindemann, slides.
  • Gadgets for investigative reporting, by Marcus Lindemann, slides.
  • Can journalism networks help investigations under authoritarian regimes? The case of Turkey, by Craig Shaw, Sebnem Arsu, and Efe Kerem Sozeri, slides.
  • Don’t fear the robots: 5 reasons to welcome automation, by Leila Haddou and Max Harlow, slides.

Data Journalism

Tools presentations

Python

All Python materials will be collected in this GitHub repo. For the time being, use these links:

Tools

  • CSVMatch, a command line tool to find (fuzzy) matches between two CSV files, by Max Harlow.
  • Reconcile, a command line tool to enrich data by doing batch lookups against online services, by Max Harlow.
  • Tabula, a tool for liberating data tables stored in PDFs.
  • Flourish for polished, beautiful data visualisations.
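
To give an idea of what fuzzy matching between two CSV files involves, here is a minimal sketch using only Python’s standard library. It is not how CSVMatch works internally and it does not use CSVMatch’s options; the function name `fuzzy_match`, the `name` column, the similarity cutoff and the sample rows are all hypothetical.

```python
import csv
import difflib
import io

# Hypothetical sample data: two small CSV files with a shared "name" column.
LEFT_CSV = "name\nAcme Holdings BV\nNorthwind Ltd\n"
RIGHT_CSV = "name\nACME Holding B.V.\nGlobex Corp\n"


def fuzzy_match(left_csv, right_csv, column, cutoff=0.8):
    """Pair each left-hand value with its closest right-hand value, if any.

    Values are compared in lower case; `cutoff` is the minimum
    difflib similarity ratio (0 to 1) for a pair to count as a match.
    """
    left = list(csv.DictReader(io.StringIO(left_csv)))
    right = list(csv.DictReader(io.StringIO(right_csv)))
    # Index right-hand rows by their lower-cased value for lookup after matching.
    candidates = {row[column].lower(): row for row in right}
    matches = []
    for row in left:
        close = difflib.get_close_matches(
            row[column].lower(), list(candidates), n=1, cutoff=cutoff
        )
        if close:
            matches.append((row[column], candidates[close[0]][column]))
    return matches


if __name__ == "__main__":
    for left_value, right_value in fuzzy_match(LEFT_CSV, RIGHT_CSV, "name"):
        print(left_value, "<->", right_value)
```

A higher cutoff gives fewer but safer matches; whatever the setting, fuzzy matches are leads to verify, not findings.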

Datasets

Elvis image by Matej Chudada

Elvis: tracking tax money and revealing corruption

(Open) Data 13/12/2017

Corruption in public spending is a problem in many countries around the world. Elvis – the data platform, not the singer – visualises European public spending data to help journalists find fishy relationships between governments and companies.

The platform makes tender data published by the European Commission easily searchable for journalists. Right now there are over 7.5 million rows of data, covering around ten years of public spending in several European countries. “We hope to add data on small tenders, as collected by Digiwhist, later to our platform”, data journalist Adriana Homolova explained at Elvis’ launch.

Tracking our tax money

Isabel Da Rosa, electronic public procurement expert at the European Commission, could not agree more: “It’s time for us to move into a data driven decision process. So many times we’re drawing legal frameworks on what we think is right, while we don’t have the facts or information to create good legislation.”

Revealing corruption

“If you see company owners who are related to politicians, you know something is wrong”, Hungarian investigative reporter Ágnes Czibik states. But in order to investigate wrongdoings like this, data needs to be open and accessible. Elvis’ superpower is not so much in opening up data that was not available – but in making data accessible that wasn’t accessible before. “In that sense journalists matter a lot”, Czibik states, “journalists need that data to build their cases and get publicity for these cases.”

Visualisation from the paper on public spending in Slovakia and The Netherlands by Adriana Homolova

Data difficulties

Currently tender data is published by the EU on the Tenders Electronic Daily webpage. Sounds good, but the differences in data published are quite big among countries. “On average 15 percent of the mandatory information is missing in Tender Daily”, Czibik explains. “Besides the missing data, there’s also the threshold problem. Some countries publish all tenders, whereas other countries only publish tenders above the European threshold of 134,000 euro. We cannot say anything about the public spending universe with so much data missing.”

Charlotte Waaijer, at the time an investigative reporter for the Dutch magazine De Groene, investigated procurements by the Dutch Ministry of Defence: “It’s good to see data being made more accessible by Elvis and other websites. But during my investigation I found that the Dutch government cooperates with other countries for missions in Mali or Afghanistan. A lot of the hiring for these missions is done by NATO or the coalition. Hence, a lot of the data I looked for was not available…”

Positive changes

According to Jonathan Huseman, advocacy officer open society at Hivos, the global universe of tender data is just as diverse. “In Malawi my colleagues deal with a lot of paperwork, while Indonesia is well advanced with e-procurement.” According to Huseman, health, education and infrastructure are topics close to people’s hearts. “These topics tend to be the first that are tracked by civil society. For instance, in Bolivia children weren’t getting their school meals. Open procurement data revealed which companies lowered the quality of the food or didn’t deliver at all. Even though the same companies are responsible for the school meals, nowadays the food is actually delivered.” The power of transparency.

Luckily, when procurement data is open, it can bring positive changes. “In Portugal the government is much more careful in how they use public procurements and which services and products they buy”, Isabel Da Rosa attests. “In Portugal procurement data is made public – even the conclusion on how the money was actually spent is open. Citizens and organisations can see the differences between contracts and reality. Government now needs to justify their costs, which has decreased the difference between contracts and actual money awarded.”

More Elvis

Elvis, the tool that helps you look into public spending, is free to use: create an account here to get started. In case you’re interested in the academic foundation the tool is built upon, you’ve come to the right place. First, there is the master’s thesis by Homolova on the application of social network analysis to a network of public institutions and the companies that deliver services to them. Or read her paper on the public procurement networks of both Slovakia and The Netherlands.

datajournalism.tools

New: data journalism toolbox for beginners

Journalism, Tools 21/11/2017

Where to start if you’re new to data journalism? Sure, you should probably start with a story, but then what? Nowadays there are so many tools available, it can be hard to see the wood for the trees. Fortunately Datajournalism.tools is there to help you.

Starting in data journalism can be hard enough by itself. That is why Datajournalism.tools offers a database with a small selection of beginner-friendly data journalism tools. This way the initiators – Hogeschool Utrecht, a Dutch journalism college, and data journalist Winny de Jong – tried to lower the threshold for you to find the tools that suit you. For the same reason each tool in the database includes links to tutorials where applicable.

Applications

The website recognizes four applications to choose from: find/collect data, clean data, analyse data, or visualise data. You just pick whether you’d like to scrape or share data (find/collect); make a messy dataset ready for analysis by editing mistakes and typos (clean data); analyse data to find your story (analyse); or create maps, charts or network visualisations (visualise data).

Experience levels

After you’ve picked which application you need, you can set your level of experience. All tools are categorised by application and experience level. This way, you’ll know for sure that you can immediately use the suggested tools. It’s good to know that datajournalism.tools was built for total data journalism beginners. Hence, ‘a lot of experience’ means a lot of experience for beginners in data journalism.

Sharing

Once you’ve set the application and experience level, the site generates a unique link for these settings. This makes it easier to share a specific selection of tools from datajournalism.tools.

Add a tool

In case you feel the team left out a great, beginner-friendly data journalism tool, you can always suggest adding it. Just fill out the form to let us know. Please be aware that the site is supposed to be feasible for beginners, and therefore the team adds tools sparingly.

‘Are you fucking kidding me?’, and other additions to the five W’s.

Journalism 06/02/2017

Referring back to the Five “W”s helps journalists address the fundamental questions that every story should be able to answer. Recent events, however, have shown that traditional journalistic practices might not be working as effectively as they used to. As such, here are a few additions to the Five “W”s that will surely come in handy for today’s journalists.

The new quantitative journalism

Journalism 08/09/2016

Personally I consider this a must-read for all in the field of data journalism. Andrew Gelman wonders why there were no skeptical, investigative, quantitative journalists decades ago. A growing list of professionals answers his question: because of the lack of tools; emerging technologies; or better education that leads to more data literacy. The true gold is to be found in the comments – about the new quantitative journalism.

The Ultimate Guide to Bad Data

Uncategorized 04/09/2016

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them. As a reporter your world is full of data. And those data are full of problems. This guide presents thorough descriptions and suggested solutions to many of the kinds of problems that you will encounter when working with data.