‘There are lies, damned lies and statistics.’ You may have heard the phrase before. In this world of abundant data and the many applications that can analyse it, it might sound counterintuitive: shouldn't data provide us with new insights instead of lies? Yes, data can yield new insights, but it can also mislead if you interpret it the wrong way. The results of acting on misleading statistics can be catastrophic, as we saw during the financial crisis of 2008: the statistics financial institutions used to estimate risk turned out to be completely wrong, triggering an economic crisis with huge consequences.
Although being misled by statistics cannot be completely avoided, knowing how you can be misled definitely helps you identify and avoid common misinterpretations. Therefore I'll give some fictional examples of misleading statistics I encountered while analysing recruitment processes, along with ways to avoid these misinterpretations.
Misled by diagrams
In figures 1 and 2 we see the number of sessions on a recruitment website over the past years. Although both figures are based on the same data, the upward trend is much clearer in figure 2 than in figure 1, and the fluctuations look more extreme in figure 2. In other words, merely changing the shape of a chart changes our perception of the data. The flatter the chart, the more constant the data looks over time, and the less visible trends and outliers become, while tall charts exaggerate trends and outliers. Figure 3 shows an example of a Google Analytics chart that we use to analyse traffic over time on our websites. As you might notice, this chart is rather flat and therefore disguises trends and outliers in the data. To avoid being misled by the shape of a chart, change its width and height a few times to consider the plot from different perspectives. Even better is to compute numerical statistics, such as a regression or time series analysis, to identify possible trends in the data.
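As a sketch of that last suggestion, here is a minimal least-squares regression over monthly session counts (the numbers are made up for illustration). The fitted slope tells you whether there is an upward trend, regardless of how flat or tall the chart looks:

```python
# Fit a straight line through monthly session counts with ordinary
# least squares. A clearly positive slope indicates an upward trend,
# independent of the chart's aspect ratio.
def linear_trend(values):
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

sessions = [120, 135, 128, 150, 160, 155, 170, 182, 178, 195, 205, 210]
slope, intercept = linear_trend(sessions)
print(f"trend: {slope:+.1f} sessions per month")  # clearly positive here
```

The slope (roughly +8 sessions per month for this fictional data) is the same number no matter how the chart is stretched.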
Misled by luck
Let's say we run a Google Adwords campaign to stimulate traffic to the recruitment website. With the following keywords, these are the results so far:
Now we make the decision on whether to continue with this campaign and which keywords we should use. Especially the first three keywords show a high conversion rate, therefore we decide to start a more aggressive bidding strategy for these keywords to obtain more applicants. We end up with the following results:
The conversion percentages collapsed completely, leaving you with fewer applicants than during your first AdWords campaign and with higher costs. So what happened? We made a decision based on too few observations. This situation is similar to rolling three sixes in three rolls of a die and, based on that, assuming you will roll a hundred sixes in a hundred rolls. The more observations you have, the closer the observed conversion percentage gets to the actual average conversion percentage for a particular keyword, and it is this average that should guide bidding strategies.
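To make the convergence concrete, here is a small simulation. The true conversion rate of 4% is an assumption chosen purely for illustration; the point is how wildly the observed rate can swing with few clicks and how it settles down with many:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

TRUE_RATE = 0.04  # assumed true conversion rate for this keyword

def observed_rate(clicks):
    """Simulate `clicks` visits and return the observed conversion rate."""
    conversions = sum(random.random() < TRUE_RATE for _ in range(clicks))
    return conversions / clicks

for n in (10, 100, 10_000):
    print(f"{n:>6} clicks -> observed rate {observed_rate(n):.3f}")
```

With 10 clicks the observed rate is often 0% or an exaggerated 10–20%; with 10,000 clicks it sits close to the true 4%. This is the law of large numbers at work, and it is why three lucky conversions are no basis for an aggressive bidding strategy.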
The Bill Gates effect
A difficult question to answer is how many observations (in this case, how many clicks) you need. Nassim Taleb explains this problem in his book 'The Black Swan' with the following example. Suppose we fill a stadium with 100 people and measure their height, obtaining an average of 1.75 metres. Now we add the tallest person on the planet (who is 2.51 metres): what happens to the average height? It increases by only about 0.4%, to roughly 1.757 metres. So a sample size of 100 people is adequate to estimate average height.
Now we again gather 100 people in the stadium, but this time we measure their net wealth, and the calculated average is 35,000 US dollars. Now we add Bill Gates, with an estimated wealth of 73 billion dollars: what happens to the average wealth? It jumps to roughly 723 million dollars, an increase of over two million percent! So a sample size of 100 is definitely not enough to estimate average wealth.
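Both scenarios follow from the same one-line update of an average; the snippet below reproduces the two numbers:

```python
def new_average(old_avg, n, newcomer):
    """Mean after adding one newcomer to a group of n people with mean old_avg."""
    return (old_avg * n + newcomer) / (n + 1)

# Height: the tallest person on the planet barely moves the mean.
height = new_average(1.75, 100, 2.51)    # ~1.757 m, an increase of ~0.4%

# Wealth: a single outlier completely dominates the mean.
wealth = new_average(35_000, 100, 73e9)  # ~723 million dollars
print(f"height: {height:.3f} m, wealth: ${wealth:,.0f}")
```

The difference is that height is bounded (no one is 73 times taller than average), while wealth is not, so a single extreme observation can dwarf the other hundred combined.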
Let's compare this with the conversion percentage. On a particular day you could see exceptional conversion percentages. Given the number of clicks we have obtained so far, we have to ask ourselves: will this have the "Bill Gates effect" or not? If so, we cannot draw conclusions about the conversion percentages we can expect in the future.
What gets measured, gets done
In the past decade recruitment has become more and more similar to marketing. This is reflected in the tools commonly used to attract job-seekers, which are usually the same ones we use in marketing. The tools we use to track how our vacancies are doing are also largely the same as in marketing. There is, however, an important difference between selling products and recruitment: the employer-employee contract is much more demanding than the buyer-seller contract. This difference should be reflected in the tools we use to analyse our recruitment processes. In recruitment we should combine metrics of quantity (how many applicants?) with metrics of quality (how well does the applicant fit the job and the organisation?). However, because we use analysis tools copied from marketing, the quality metric is frequently omitted. And because of the 'what gets measured, gets done' principle, this means that recruitment efforts usually act on measurements of quantity alone.
Integrating quality and quantity measurements in recruitment analysis tools is vital to obtain a complete overview of the effectiveness of recruitment efforts and to find ways to improve the process. In particular, organisations should start recording how they predicted whether an applicant was suited for the job and how well those predictions turned out in practice. Combining this with quantity metrics, such as vacancy lead times, will provide much insight into how to improve recruitment efforts effectively.
As we move further into the information age we will spend more time analysing and interpreting data. Luckily, the tools we use to analyse this data are also becoming better at revealing insights. We should, however, be mindful of how data and statistics can fool us. This can be done by using common sense, by being aware of common misinterpretations such as the ones described here, or by asking an expert for a second opinion.
Finally, there are some great (non-mathematical) books that describe how we are sometimes fooled by statistics and how to avoid this:
- The Black Swan and Fooled By Randomness by Nassim Taleb
- The Goal by Eliyahu Goldratt
- Thinking, Fast and Slow by Daniel Kahneman