Epic Data Blog

Correlation vs Causality

Written by Geert De Laet | May 13, 2022 12:04:48 PM
In business, it is often hard to be 100% sure that the correlation you are seeing also reflects a causal relationship.

With summer coming to an end, we can all think back to a time when we were eating a delicious ice cream in the sun. We all know that it is not the healthiest snack there is, but according to the data, eating less ice cream might have some unexpected benefits. Between 2009 and 2018, ice cream consumption per capita in the US decreased from 6.30 kg (13.9 lbs) to 5.35 kg (11.8 lbs). During that same period, the unemployment rate also decreased, from 8.8% to 4.3%. That is good for a correlation of nearly 0.90. So let's all show some determination and eat even less ice cream to get unemployment to zero, right? Well, not exactly.

 
 

Source: https://coincidentalcorrels.wixsite.com/scotw

 

What exactly is correlation?

Let’s first dive a bit deeper into what correlation means. Correlation is a statistical measure that indicates how two quantitative variables move in relation to each other. Others exist, but Pearson’s correlation coefficient is the most widely used. It ranges from -1 to 1: the higher the absolute value of the coefficient, the stronger the linear relationship between the two variables, and the sign shows whether they move in the same or in opposite directions. The more ice cream you eat, the higher your blood sugar level, ergo: a high (positive) correlation.
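To make the definition a bit more tangible, here is a minimal sketch of how Pearson's coefficient can be computed by hand. The function name `pearson_r` and the ice cream and blood sugar numbers are made up purely for illustration; they are not measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much both variables deviate from their means together (co-movement)
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical data: ice creams eaten and blood sugar level afterwards
ice_creams = [0, 1, 2, 3, 4]
blood_sugar = [5.0, 5.6, 6.1, 6.9, 7.4]

r = pearson_r(ice_creams, blood_sugar)
print(f"Pearson's r: {r:.3f}")
```

Because the made-up blood sugar values rise almost in a straight line with the number of ice creams, the result lands close to 1. Flipping one of the lists around would give a value close to -1 instead.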

Why not to focus only on correlation

What correlation doesn’t do, however, is explain why the two variables move together the way they do. In other words, correlation measures association, but it doesn't show whether x causes y or vice versa. The association may even be caused by a third, unknown factor, and correlation does not take that into account either.

By just looking at the graph you could even turn the reasoning upside down. Because of a lower unemployment percentage, people are working harder and have no time to enjoy ice cream.

You could also argue that, as more people earn a salary and are therefore better able to afford luxuries such as ice cream, sales should actually have gone up.

Look for causes in correlation

In reality, the two variables are correlated by nothing more than coincidence.
For each variable, there are multiple underlying reasons why it decreased from 2009 to 2018. For ice cream consumption, the price of raw materials such as milk could have gone up, resulting in higher ice cream prices; the ice cream market might be under pressure from new products such as frozen yoghurt; or there may be a general trend towards healthier food among the population.

Regarding unemployment, on the other hand, additional jobs matching the skill set of the unemployed might have been created. Also remember that in 2009 we had just survived the financial crisis, which left us with a sharply elevated unemployment rate that gradually came down again during the Obama presidency.

As you can see, the underlying reasons for the two decreasing trends are (most probably) not related at all. There is, to the best of our knowledge, no causal relationship between ice cream consumption and the unemployment rate. You can thus comfortably keep eating ice cream, without worrying about single-handedly crashing the economy.

How to look for causality in data

We must, however, be careful in day-to-day activities not to see causality where there is none. Let's say the manager of a grocery store runs an advert for ice cream: 2 + 1 for free. At the end of the week, he asks you to do an analysis of the sales data, and he is over the moon with the result. Not only did you do a splendid job, but ice cream sales also increased by 400%!

The real question is: which part of this increase has a causal relation with the advert? The weather might have been a sunny 27 °C instead of the rainy 15 °C of last week. Competitors in the neighbourhood might have had delivery issues and were already out of ice cream in the middle of the week, leading all those ice-hungry consumers directly to your store.
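One common way to approach that question, which is not the only one and not something the manager's data gives you for free, is to compare your store with a similar store that had the same weather but did not run the advert. The sketch below uses that idea (a simple ratio-based variant of what is often called difference-in-differences). The function name `advert_lift`, the control store, and all sales numbers are hypothetical, and the result only holds under the assumption that both stores would have grown at the same rate without the advert.

```python
def advert_lift(own_before, own_after, control_before, control_after):
    """Growth of the advertised store relative to a comparable store without the advert.

    Assumption: without the advert, both stores would have grown by the same factor
    (same weather, same season). Local effects such as a competitor running out of
    stock next door are NOT removed by this comparison.
    """
    if min(own_before, own_after, control_before, control_after) <= 0:
        raise ValueError("sales figures must be positive")
    own_growth = own_after / own_before              # growth factor with the advert
    control_growth = control_after / control_before  # growth factor without it
    return own_growth / control_growth

# Hypothetical weekly ice cream sales (units sold)
own_before, own_after = 100, 500          # a 400% increase in the store with the advert
control_before, control_after = 100, 250  # a 150% increase in the store without it

lift = advert_lift(own_before, own_after, control_before, control_after)
print(f"growth attributable to the advert: x{lift:.1f}")
```

In this made-up case, sales went up fivefold, but the sunny week alone already explains a factor of 2.5 in the comparison store. Under the stated assumption, roughly a doubling is left for the advert, which is still a nice result but a very different story from 400%.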

In business, it is often hard to be 100% sure that the correlation you are seeing also reflects a causal relationship. This is where expert knowledge comes into play. It is paramount to present data analysis results to business people together with the necessary disclaimers and the assumptions made.

This is also why good decision-making lies somewhere in the middle between gut feeling and full automation. Some activities can definitely be automated by an intelligent agent such as AI, but the most value that you can get out of data as a company lies in empowering your people. By giving them the right information at the right time, you enable them to combine it with the expert knowledge they have built up over the years and make better decisions that reduce costs or increase revenue.

 

Do you want to empower your people to make the right decisions based on data?
👉 Get in touch with us.