How often do you receive a targeting brief with an audience like this? 

  • Markets: UK, France, Italy, US, Germany
  • Gender: 60% Male, 40% Female
  • Age Range: 18-35

Each variable in the brief above is specified on its own, with no breakdown of, say, gender by market, which suggests the insights came from an overall view of the data. Conclusions drawn this way are susceptible to the statistical fallacies that arise when data is not properly segmented.

As a marketing agency, we regularly receive briefs based on a client’s analysis of their customer data to inform our campaign targeting. However, we aren’t always certain that the audiences provided will produce the best results. Data can be strange, especially if the analysis is surface level or oversimplified.

Simpson’s Paradox

One of the most prevalent of these fallacies is Simpson’s Paradox (also known as the Yule–Simpson effect), a statistical phenomenon in which a trend seen in different groups of data reverses or disappears when the groups are combined.

This paradox is one of the oddest occurrences in statistics and highlights the need for scepticism when interpreting data for real-world applications. Failing to account for it can severely impact your marketing, waste money and negatively affect user experience.

You can read more about the specifics of Simpson’s Paradox here: https://arkeagency.com/news/simpsons-paradox-is-your-data-telling-the-truth/

Continuing with the example above, let’s take a closer look at the suggested gender targeting. Looking at total purchase income by gender, we can see how the client arrived at the proposed targeting split.

We see that just over 60% of all purchase income is from male customers, so it would be reasonable to focus a higher percentage of the budget on male prospects.

However, there’s a hidden story in this data. Is that split true for all audience segments? To find out, we break down purchase income by gender in each of the five markets.

In fact, when we look at the data at a more granular level, every market except the UK shows the opposite of the overall result, with more of the purchase income generated by women. If we had taken the insights at face value and focused on male prospects in these markets, it could have led to poor performance and increased costs.

In this case, Simpson’s Paradox has been caused by a large proportion of male customers in the UK market. The UK is the largest market and so the high number of purchases from males has skewed the overall result when the market segments are combined.
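To make the mechanism concrete, here is a minimal sketch of how one large market can flip the overall picture. The income figures are invented for illustration and are not the client data discussed above; the names `income`, `male_share` and `markets_where_women_lead` are likewise hypothetical.

```python
# Hypothetical purchase income (in thousands) by market and gender.
# These figures are invented for illustration only.
income = {
    "UK":      {"male": 860, "female": 340},
    "France":  {"male": 90,  "female": 110},
    "Italy":   {"male": 70,  "female": 80},
    "US":      {"male": 130, "female": 170},
    "Germany": {"male": 70,  "female": 80},
}


def male_share(male, female):
    """Fraction of purchase income that comes from male customers."""
    return male / (male + female)


# Overall view: combine all markets before computing the split.
total_male = sum(row["male"] for row in income.values())
total_female = sum(row["female"] for row in income.values())
overall_male_share = male_share(total_male, total_female)

# Segmented view: compute the split inside each market separately.
market_male_share = {
    market: male_share(row["male"], row["female"])
    for market, row in income.items()
}
markets_where_women_lead = [
    market for market, share in market_male_share.items() if share < 0.5
]

print(f"Overall male share: {overall_male_share:.0%}")
for market, share in market_male_share.items():
    print(f"{market}: {share:.0%} male")
print("Markets where women generate more income:", markets_where_women_lead)
```

With these made-up numbers the combined male share is 61%, yet women generate more income in France, Italy, the US and Germany. Only the UK, which accounts for more than half of all income in this sketch, leans male.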

We call the market variable a ‘confounding’ or ‘associated’ variable when considering gender. It is important to take as many of these associated variables as possible into account when reaching decisions from your data.

The impacts of Simpson’s Paradox

As demonstrated, Simpson’s Paradox is especially dangerous in commercial settings, or in any sector that relies on predicting a person’s intent or behaviour. Incorrect insights may be more readily accepted if the conclusion aligns with our own biases. For example, if the company in question sold boxing equipment, a heavier focus on male prospects could seem reasonable because of existing beliefs, and the evidence for this conclusion would be less likely to be interrogated.

In this instance, our bias was correct for the UK, but in all other markets women generated more purchase income than men.

Overcoming Simpson’s Paradox

It is vital to identify segments and variables for which an overall analysis produces incorrect results. Targeted variables must be broken out as much as possible, not just in performance reporting but in campaign setup and planning as well. Analysing granular segments is key to combating this common mistake.

Each variable you account for adds complexity to the campaign, and there’s a point where optimisations and setup become overly cumbersome. Too many different audiences can also stretch the budget and leave sample sizes too small for analysis. Discover the key associated variables and ensure that a campaign’s complexity is not disproportionate to its budget.
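As a rough illustration of this trade-off, the sketch below counts how many audiences result from each extra variable and how thin the budget becomes. The budget figure, the age bands and the even split across audiences are all assumptions made for the example, and `budget_per_audience` is a hypothetical helper.

```python
from itertools import product

markets = ["UK", "France", "Italy", "US", "Germany"]
genders = ["male", "female"]
age_bands = ["18-24", "25-35"]  # hypothetical split of the 18-35 range

total_budget = 50_000  # hypothetical campaign budget


def budget_per_audience(*variables):
    """Return (number of audiences, budget per audience) for an even split."""
    audiences = list(product(*variables))
    return len(audiences), total_budget / len(audiences)


print(budget_per_audience(markets))
print(budget_per_audience(markets, genders))
print(budget_per_audience(markets, genders, age_bands))
```

In this sketch, breaking out by market alone gives 5 audiences, adding gender gives 10, and adding two age bands gives 20, each with a quarter of the budget of the market-only setup.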

It’s also important to question the insights you are given. Make sure you understand where the data has come from, how the analysis was designed and how the conclusion was reached. Even an insight that an organisation has relied on for years may be incorrect if it hasn’t been investigated in depth.

It can be daunting to tackle this paradox, so keep the following in mind:

  • Too little or too much data can make it difficult to reach proper conclusions or fully uncover all the associated variables.
  • Start with key variables you know you can control and have enough data for statistically significant insights.
  • Going just one level deeper (breaking out one confounding variable) can yield surprising results and greatly impact your planning and performance, as shown in this example.
  • Keep reassessing your strategy but do not be too hasty.
  • Make sure you have a large enough sample size to prove your assertions and once you have that, don’t be afraid to act on new data.
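One simple way to sanity-check sample size before acting on a segment is sketched below. It assumes you are comparing purchase counts rather than income, and it uses a normal-approximation 95% confidence interval for a proportion; the function names and the example counts are hypothetical.

```python
import math


def share_interval(successes, total, z=1.96):
    """Approximate 95% confidence interval for a proportion.

    Uses the normal approximation, which is only reasonable when the
    sample is not tiny and the share is not close to 0 or 1.
    """
    p = successes / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p - margin, p + margin


def is_actionable(successes, total):
    """True if the interval excludes an even 50/50 split."""
    low, high = share_interval(successes, total)
    return low > 0.5 or high < 0.5


# Hypothetical segments where 55% of purchases come from women.
small_segment = is_actionable(22, 40)      # 22 of 40 purchases
large_segment = is_actionable(2200, 4000)  # 2,200 of 4,000 purchases

print("Small segment actionable:", small_segment)
print("Large segment actionable:", large_segment)
```

Both hypothetical segments show the same 55% share, but only the larger one has an interval that excludes an even split, so only that one would justify shifting budget under this check.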

Stay sceptical, interrogate your data and find the hidden patterns beneath the surface for truer, more powerful insights that you can apply to your marketing.

Get in touch with our data science & analytics experts to find out how to tell stories from your data and use data-based decision making to transform your organisation.

