A list of resources (presentations, datasets, tools and more) mentioned during the DataHarvest+ 2018 European investigative journalism conference. This list is incomplete and will probably be updated later.
- Strategies to find personal information, by Marcus Lindemann, slides.
- Gadgets for investigative reporting, by Marcus Lindemann, slides.
- Can journalism networks help investigations under authoritarian regimes? The case of Turkey, by Craig Shaw, Sebnem Arsu, and Efe Kerem Sozeri, slides.
- Don’t fear the robots: 5 reasons to welcome automation, by Leila Haddou and Max Harlow, slides.
- Trends in data journalism, by Marianne Bouchart, slides.
- How to find stories in public data, by Simon Woerpel, data + information.
- Statistical pitfalls in the news, by Maarten Lambrechts, slides.
- Everything you need to know to become a world class data journalist — in one fabulous session, by Robert Gebeloff, slides.
- Data tools for everyday journalism, by Maarten Lambrechts, slides.
- Finding needles in haystacks with fuzzy matching (command line tool), by Max Harlow, slides, GitHub.
- Command line magic, by Simon Woerpel, blogpost.
- Batch geocoder for journalists by LocalFocus, presented by Yordi Dam.
- Introduction to mapping with QGIS, by Robert Gebeloff, hand-out.
- Graph Databases 1, by Leila Haddou and Max Harlow, slides.
All Python materials will be collected in this GitHub repo. In the meantime, use these links:
- Learning the Python basics, by Adriana Homolova and Winny de Jong, notebook.
- How to load and manipulate files in Python, by Robert Gebeloff, empty notebook, notebook with answers.
- Scraping websites, by Barnaby Skinner, data + notebook.
- Analysing tens of thousands of files in one go, by Barnaby Skinner, data + notebook.
- Python Pandas: analysing tender data, by Adriana Homolova, notebook, cheatsheet + data.
- CSVMatch, a command line tool to find (fuzzy) matches between two CSV files, by Max Harlow.
- Reconcile, a command line tool to enrich data by doing batch lookups against online services, by Max Harlow.
- Tabula, a tool for liberating data tables stored in PDFs.
- Flourish, for polished, beautiful data visualisations.
- Publicly available data portals, a list by Simon Woerpel.
- Elvis, public procurement data visualised as a network, by Adriana Homolova and team.
- Lists of subsidiaries from SEC filings – great for researching multinational companies, via Helena Bengtsson.
- European regional data, presented by Peter Sherlock and John Walsh, slides and factsheet.