Python, R or SQL? The tribal ‘war’ of data journalism

Should I learn Python, R or SQL? Or all three? My house would be too small if I invited to dinner everyone who has recently asked me this question.

But if I could invite all these colleagues for a meal, this is what I would do: I would put a knife next to some plates, a fork next to others, and a spoon next to the last plates. But I would provide no-one with a knife and a fork and a spoon. Then I would serve a typical Dutch menu of vegetable soup, potatoes, and steak. Now guess who would complain about the soup, who about the potatoes and who about the steak?

Technically, it is possible to eat soup with a knife or a fork. It just takes longer than with a spoon. Attacking a steak with a spoon is not ideal either, but it can be done. You get the picture. Python, R and SQL are the cutlery we use to satisfy our data hunger. Obviously, having a fork, knife and spoon is the preferred option. But with only one of the three you can still go a long way in analysing large datasets.

Python is the language of the new generation (at least from my mid-career point of view), wielded by savvy coders and hackers for whom mere investigative reporting is not exciting enough. I, at the other end of the spectrum, am from Generation Sequel. Dull, but solid. And only recently did I dive into R.

At Dataharvest in Mechelen last week, hands-on sessions in all three languages were on the programme. By all accounts the Python classes were packed, the R classes were full, and for good old SQL four people turned up. But these numbers tell us more about the nature of the participants (craving the newest!) than about the languages that were taught.

At the NICAR conference in Chicago, in March this year, it dawned on me that the data journalism community is now divided into three tribes and that a tribal war is around the corner. Buttons reading “SQL Team”, “R Team” and “Python Team” were distributed to distinguish areas of expertise. Meant as a helpful tool, the buttons were instead worn as proud insignia. “I never thought you were an R-man,” I heard one colleague say to another, in a tone that seemed to herald the end of a friendship.

My comparison of the three languages is far from final, and you may not agree on all points. But since we are all still in the discovery phase, I dare say a few words about the pros and cons of a knife, a fork and a spoon for data crunching:

At Dataharvest I taught R with Jonathan Stoneman and SQL with Helena Bengtsson. In both classes we used the same datasets and the same exercises. The code we typed covered exactly the same number of pages in both cases, meaning that R is neither quicker nor slower (to execute) than SQL.

Participants who attended both classes said they found SQL code slightly simpler, and I agree. But then again, R can be extended into the realms of scraping and visualizing, where SQL doesn’t go. People who came to learn both Python and R were surprised to find that R offers a more visual experience, with code and tables in the same screen space. R beats both SQL and Python when it comes to workflow management.

From a teacher’s point of view, Jonathan and I discovered that R functions can be taught in the same sequence we have always used for SQL functions: from ‘select’ all the way down to ‘order by’. SQL and R are very similar, not just in the names of the functions, but also in the structure of the queries (‘scripts’ in R).
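
To make the ‘select’ to ‘order by’ sequence concrete, here is a minimal sketch of one question answered twice: once as an SQL query and once as the same steps in plain Python. The table name, the columns and the figures are invented for illustration and do not come from the Dataharvest classes; the SQL is run through Python's built-in sqlite3 module only so that the example is self-contained.

```python
import sqlite3

# Hypothetical sample data (invented for this sketch): city, party, amount.
rows = [
    ("Utrecht", "A", 500),
    ("Utrecht", "B", 1500),
    ("Mechelen", "A", 2500),
    ("Mechelen", "B", 700),
    ("Chicago", "A", 50),
]

# The spoon: SQL, with the clauses in the usual teaching order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (city TEXT, party TEXT, amount INTEGER)")
conn.executemany("INSERT INTO donations VALUES (?, ?, ?)", rows)

query = """
    SELECT city, SUM(amount) AS total   -- which columns do we want?
    FROM donations                      -- from which table?
    WHERE amount >= 100                 -- which rows do we keep?
    GROUP BY city                       -- how do we bundle them?
    ORDER BY total DESC                 -- how do we sort the answer?
"""
sql_result = conn.execute(query).fetchall()
conn.close()

# The fork: the same steps in plain Python.
totals = {}
for city, party, amount in rows:
    if amount >= 100:                                   # WHERE
        totals[city] = totals.get(city, 0) + amount     # GROUP BY + SUM
python_result = sorted(                                 # ORDER BY ... DESC
    totals.items(), key=lambda item: item[1], reverse=True
)

print(sql_result)
print(python_result)
```

Both routes arrive at the same ranking of cities. The SQL version states what is wanted, while the Python version spells out how to get there, which may be part of why class participants found SQL slightly simpler to read.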