In statistics, an outlier is a data point that lies abnormally far from the other values in a random sample drawn from a population. You can often spot an outlier by plotting the whole data set. One common rule of thumb defines an outlier as a data point that falls more than 1.5 times the interquartile range above the third quartile or below the first quartile.
There are two steps to identifying outliers. First, review the overall shape of the data set using a graph to identify features such as symmetry and deviations from assumptions. Second, examine the data for abnormal values that lie far from the mass of the data. These data points are normally referred to as outliers.
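The 1.5 times interquartile range rule mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `iqr_outliers`, the `daily_visitors` figures and the choice of the "inclusive" quartile method are assumptions made for the example, and other quartile conventions can give slightly different cut-offs.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles.

    Hypothetical helper for illustration; k=1.5 is the usual rule of thumb.
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up daily visitor counts with one spike (for example, bot traffic).
daily_visitors = [120, 132, 128, 125, 131, 127, 950, 129]
print(iqr_outliers(daily_visitors))  # [950]
```

A rule like this only flags candidates. Whether a flagged value is a genuine error or a real but rare event still needs to be investigated.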
Where do outliers come from?
The main reason for outliers is measurement error. For example, web analytics may not have been configured correctly and may be including known bots in the data. This leads to spikes in the number of visitors that do not reflect real users.
Another problem might be that the A/B testing script has not been fully integrated and validated with an A/A test. This could result in some types of users (e.g. affiliates) not being measured, and so we might see a very low conversion rate for such users.
Other reasons for outliers include:
- Chance – In testing there is always the possibility of abnormal behaviour being observed.
- Human error – Errors can occur with data analysis, especially if there is any manual transfer or data entry involved in the process.
- Sampling error – Existing visitors might be included in a test by mistake and this could skew the results as some of these users will have already experienced the default design. It can also be down to insufficient data being collected. This is why it is important to ensure you have a large enough sample size before reporting results.
- Inaccurate reporting – It is not uncommon for A/B testing software not to include certain browsers in a test and so some users will be excluded from the experiment.
What should you do with outliers in conversion rate optimisation?
Outliers, such as bulk orders or other abnormally high transaction values, distort conversion rate optimisation test results because reporting is based upon averages. Outliers can skew the outcome because a relatively small number of users will have a disproportionate effect on the test result, and they will hide other differences in the conversion rate that could be valuable to understand. For this reason it is important to remove outliers from test results.
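To see how a single bulk order can distort an average, consider the minimal sketch below. The order values, the `drop_above` helper and the threshold of 500 are all made up for illustration; in practice the cut-off would come from your own data, for example from an interquartile range rule.

```python
from statistics import mean


def drop_above(order_values, threshold):
    """Return the orders at or below a threshold (hypothetical helper)."""
    return [v for v in order_values if v <= threshold]


# Made-up order values for two test variants.
control_orders = [40, 55, 60, 45, 50]
variant_orders = [42, 50, 58, 48, 2000]  # includes one bulk order

print(mean(control_orders))                   # 50
print(mean(variant_orders))                   # 439.6
print(mean(drop_above(variant_orders, 500)))  # 49.5
```

With the bulk order included, the variant appears to generate almost nine times the average order value of the control. Once it is removed, the two averages are nearly identical, and the variant is in fact marginally lower.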
Even in sectors such as gaming, where a small number of users generate a majority of revenue, it is a mistake not to remove outliers. As I explained in a post on whether to optimise your site for your best customers, such players have been through a process which has changed their characteristics. If you focus only on your most valuable customers, your results will suffer from survivorship bias.
It is important to look out for and investigate outliers as they may be a symptom of a problem with your optimisation tool set. Where possible, outliers should be removed because they result in biased reporting and can hide important variations in individual customer segments.