4 mistakes in data journalism – and how to avoid them

It may sometimes feel as though data journalism is inherently more objective than other types of reporting. Numbers can’t lie, right?


There are lots of ways of tricking your audience or even yourself when working with data. It needn’t even be malicious. Having spent the past year studying data journalism, I’ve had plenty of opportunities to discover first-hand that it’s all too easy to make mistakes that skew your results completely.

So without further ado, here are the four biggest problems I’ve encountered with bad data journalism over the past year.

1. A lack of context or proportion

Numbers are meaningless without some context. This is rarely more obvious than in news reports on spending, where the problem crops up on a regular basis.

“Taxpayers paying more than $1 billion for illegal immigrant children,” headlines yell out. “Benefits spending up £6.4 billion.” The figures sound outrageous, astronomical even. It’s tempting to want to splash on them. But public spending figures have a tendency to be, well, astronomical. Put them into context: break the total down per person and you may find that in fact they’re totally reasonable.

What’s the lesson here? Proportions tell us more than absolute numbers, to be sure. But they’re not always the right way to go, either. Think about your data and how to represent it most faithfully.
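To make the per-person breakdown concrete, here is a minimal Python sketch. The £6.4 billion comes from the headline quoted above. The population figure is my own round-number assumption and not an official statistic, so look up the current estimate before using it. The function name `per_person` is made up for illustration, and the headline doesn't say what period the figure covers, so the result is simply "per person", not per year or per week.

```python
# Rough per-person breakdown of a headline spending figure.

# From the headline quoted above: "Benefits spending up £6.4 billion".
headline_increase_gbp = 6.4e9

# ASSUMPTION: a round figure for the UK population, for illustration only.
# Replace it with the latest official estimate.
uk_population = 64e6


def per_person(total, population):
    """Spread a total evenly across a population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return total / population


increase_per_head = per_person(headline_increase_gbp, uk_population)
print(f"About £{increase_per_head:.0f} per person")
```

Whether that per-head figure is big or small is still a judgment call, and dividing by the whole population may not be the fairest denominator (people of working age or claimants might be better). But it is a lot easier to reason about than a number with nine zeros.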

Guardian data journalist James Ball recommended in a lecture that all data journalists put together some basic figures to avoid making stupid mistakes and have an easier time spotting what’s reasonable and what isn’t: How many people of working age are there in the UK? What’s the average salary? What’s the employment rate? Et cetera. Not a bad suggestion.

2. Correlation does not equal causation

If you know one thing about statistics, it’s likely to be this. Correlation and causation are two very different things.

However, newsrooms ignore this all the time. Just because two variables correlate, don’t automatically assume you’ve got a scoop. The correlation could equally be caused by some other, underlying variable. Or it could just be a total coincidence.

[Image: chart of the correlation between Internet Explorer's market share and the murder rate]

Seems legit. (Photo via Gizmodo)

The relationship between Internet Explorer’s market share and the murder rate is a personal favourite. Check out Spurious Correlations for more (don’t blame me when you realise you’ve wasted an afternoon there, though!).
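If you want to see how easily this happens, here is a small Python sketch. The two series are numbers I invented for illustration (they are not the Internet Explorer or murder-rate figures): each one simply falls over time with a bit of wiggle. One quick sanity check, though by no means the only one, is to correlate the year-on-year changes as well as the raw levels. If the two things were really linked, you would expect them to move together from one year to the next, not just drift in the same direction.

```python
from math import sqrt

# INVENTED data for illustration: two unrelated series that both
# happen to decline over eight consecutive years.
series_a = [101, 96, 95, 90, 89, 84, 83, 78]
series_b = [51, 49, 45, 43, 43, 41, 37, 35]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def changes(values):
    """Differences between consecutive values (year-on-year changes)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))

print(f"Correlation of raw levels:          {r_levels:.2f}")
print(f"Correlation of year-on-year changes: {r_changes:.2f}")
```

The raw levels correlate very strongly because both series share a downward trend. Once you look at the changes, the relationship all but disappears. Time is the underlying variable here, and it is one of the most common culprits.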

3. Not knowing how to visualise it

Okay, this really deserves a post of its own. Or several. But for now, this will have to do.

You’ve done your data analysis, you’ve got a cracking story. But a poor visualisation may leave viewers confused. Or worse, misled.

[Image: 3D pie chart]

Please don’t do this. (Photo via Business Insider)

Maybe you’re using line charts to show discrete data (don’t). Maybe you’re trying out some funky 3D pie charts (DON’T). Or maybe you’re just becoming part of that eternal debate on whether it’s ever, ever okay to truncate the y-axis.

Data visualisation’s both an art and a science, and there are many potential pitfalls. Here are some good guides on how to avoid them:

4. Forgetting the narrative

This is the most important point, in my opinion:

Data journalism gives us the power to explore topics quantitatively. But it’s still journalism, which means it’s still storytelling. If you’re just tossing out a bag of random figures, you’re not doing your job properly. The figures are just the starting point. From there, you need to guide your readers through the story. You need to make them understand why those figures are important and how.

As Tanveer Ali puts it in the Columbia Journalism Review:

“Numbers are a means of storytelling – not the story itself.”



