Multi-armed bandits seek to aggressively optimise content

Are Bandit Tests Superior To A/B Tests?

What is multi-armed bandit testing?

Multi-armed bandit (MAB) testing uses an algorithm to proactively seek out the best performing experience and aggressively optimise towards it, increasing the average conversion rate during a test. This means that you can earn and learn simultaneously, as traffic is automatically shifted to the variant with the highest conversion rate.

What is a bandit?

A bandit is another name for a slot machine. Imagine that you are in Vegas with a limited budget and limited time to play a selection of slot machines with different pay-outs. Multi-armed bandit (MAB) testing seeks to maximise your winnings by trying to work out which slot machine has the highest pay-out and automatically adjusting resources (i.e. traffic) to optimise revenues.

This is very different from A/B testing, where traffic is evenly split between each variant. In the example below there are two variants (a control and a challenger), and so each receives 50% of all traffic from the beginning to the end of the test.

In a multi-armed bandit test, traffic is split equally between the two variants for 10% of the time (the exploration phase). For the remaining 90% of the test, traffic is sent to the best performing variant (the exploitation phase). MABs also provide the option to weight traffic according to the estimated value of different variants from the beginning of the test. This is simply based upon a best guess approach. Below is an example of potential weights for a three variant experiment.

Image Source:
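
The 10% exploration and 90% exploitation split described above is commonly known as an epsilon-greedy strategy. Below is a minimal Python sketch of that idea, not a production implementation. The function names, the 4% and 5% conversion rates, and the choice to try any unseen variant first are all illustrative assumptions rather than details from a specific testing tool.

```python
import random


def choose_variant(conversions, impressions, epsilon=0.10, rng=random):
    """Pick a variant index using an epsilon-greedy rule."""
    n_variants = len(impressions)
    # Exploration: with probability epsilon, pick any variant at random.
    if rng.random() < epsilon:
        return rng.randrange(n_variants)
    # Exploitation: pick the variant with the highest observed conversion rate.
    # Assumption: a variant with no traffic yet is treated as unbeatable,
    # so that every variant is tried at least once.
    rates = [
        c / n if n > 0 else float("inf")
        for c, n in zip(conversions, impressions)
    ]
    return max(range(n_variants), key=lambda i: rates[i])


def simulate(true_rates, visitors=10_000, epsilon=0.10, seed=0):
    """Simulate a test where each visitor converts with a fixed probability."""
    rng = random.Random(seed)
    conversions = [0] * len(true_rates)
    impressions = [0] * len(true_rates)
    for _ in range(visitors):
        arm = choose_variant(conversions, impressions, epsilon, rng)
        impressions[arm] += 1
        if rng.random() < true_rates[arm]:
            conversions[arm] += 1
    return conversions, impressions


if __name__ == "__main__":
    # Hypothetical control (4%) and challenger (5%) conversion rates.
    conversions, impressions = simulate([0.04, 0.05])
    print("Impressions per variant:", impressions)
    print("Conversions per variant:", conversions)
```

Raising `epsilon` spends more traffic on exploration, while lowering it commits to the current leader sooner.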

 

What about statistical confidence?

MABs aggressively optimise for the best performing variant, so the worst performing variant receives little traffic beyond the exploration phase. However, this reallocation will usually occur before full statistical confidence is obtained, and so we may not be able to identify whether a variant is indeed the worst performing variant or whether its results are just down to chance. This means that an MAB test requires a lot more traffic to reach full statistical confidence for poorly performing variants and so takes longer to produce a conclusive result.
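
One way to see why this happens is to look at the standard error of the difference between two conversion rates, which depends on how many visitors each variant receives. The sketch below uses the usual formula for two independent proportions. The 5% and 4% conversion rates, the 10,000 visitors and the 95/5 traffic split are hypothetical figures chosen purely for illustration.

```python
import math


def se_difference(p_a, p_b, n_a, n_b):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)


total = 10_000          # hypothetical total number of visitors
p_a, p_b = 0.05, 0.04   # hypothetical true conversion rates

# A/B test: traffic is split evenly between the two variants.
even = se_difference(p_a, p_b, total // 2, total // 2)

# Bandit-style split: the weaker variant only receives 5% of the traffic.
skewed = se_difference(p_a, p_b, total * 95 // 100, total * 5 // 100)

print(f"Even split standard error:   {even:.4f}")
print(f"Skewed split standard error: {skewed:.4f}")
```

With these figures the skewed split gives a standard error a little more than twice that of the even split, so the same total traffic tells you much less about the weaker variant.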

What assumptions do MABs make?

Most multi-armed bandit algorithms make a number of assumptions about conversion rates.

  • Serving a variant and observing a conversion happen instantaneously. This means that MABs are not suitable for email marketing or where there is a significant time-lag between when a customer sees a variant and the conversion occurring.
  • Conversion rates are fairly constant and don’t significantly change over time. If your conversion rate is subject to substantial fluctuations over time due to factors such as the weather or other seasonal factors then MABs may not be appropriate.
  • Samples in MABs are independent of each other, so one visitor’s behaviour doesn’t influence whether another visitor converts.

What are the benefits of MABs?

  • Exploit winning variants: MABs generally achieve a higher average conversion rate during the test period. They reduce the opportunity cost of testing by allowing a smooth transition from exploration to exploitation, which increases revenues.
Image of AB test compared to multi-armed bandit
Image source:

 

  • Automate optimisation: MABs allow you to automate the optimisation process with machine learning so that low performing variants can be dropped and traffic can be channelled towards the best revenue generating variant.
  • Continuous optimisation: Where you are frequently adding or removing variants to be tested, an MAB provides a flexibility that A/B testing is not designed for. If you want to add new variants to replace low performing experiences during the testing process, MABs facilitate this. They also work well for targeting specific ads or content to customer segments.
  • Innovation tests: MABs perform best when there is a very large difference in the conversion rates of different variants. MABs are therefore best suited for optimisation when you have radically different experiences where you might expect to see big differences in the conversion rates of each variant.
  • Persuasive profiling: MABs are suitable for persuasive profiling, which lets you identify what content works best for a particular personality trait.
  • Time is not a priority: When you are not in any rush to identify the best performing variant and want to optimise the average conversion rate, an MAB can be a suitable tool.
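
The "exploit winning variants" benefit can be illustrated with some simple arithmetic. The sketch below compares the expected average conversion rate under an even A/B split with an idealised bandit that has already identified the better variant and keeps only 10% of traffic for exploration. The conversion rates and visitor numbers are hypothetical, and a real bandit would fall short of this figure while it is still learning.

```python
def expected_rate(true_rates, traffic_shares):
    """Average conversion rate given each variant's share of traffic."""
    return sum(rate * share for rate, share in zip(true_rates, traffic_shares))


true_rates = [0.04, 0.06]  # hypothetical control and challenger rates
visitors = 10_000          # hypothetical traffic during the test

# A/B test: each variant receives half of the traffic.
ab_rate = expected_rate(true_rates, [0.5, 0.5])

# Idealised bandit: 10% exploration is split evenly between the two
# variants, so the weaker variant gets 5% and the leader gets 95%.
bandit_rate = expected_rate(true_rates, [0.05, 0.95])

extra_conversions = (bandit_rate - ab_rate) * visitors

print(f"A/B average conversion rate:    {ab_rate:.3f}")
print(f"Bandit average conversion rate: {bandit_rate:.3f}")
print(f"Extra conversions:              {extra_conversions:.0f}")
```

Under these assumptions the bandit averages 5.9% against 5.0% for the A/B test, or about 90 extra conversions per 10,000 visitors. The gap shrinks as the two true rates get closer together.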

Disadvantages:

  • Traffic greedy: MABs require more traffic and more time to reach full statistical confidence. If you are not bothered about the average conversion rate during the test and need a speedy, but conclusive test result then A/B testing is probably the right methodology for you.
  • Needs large differences: When there is little difference between the conversion rates of each variant, the benefits of multi-armed bandits disappear. This is a concern, as we know from experience that it is almost impossible to predict how much of a difference a new design or heading will make to the conversion rate. The danger is that our own subjective opinions and biases come into play here, which is what experimentation is designed to avoid.
  • More room for error: As bandits begin switching traffic before full statistical confidence is reached, there is more danger that a variant that is performing better purely by chance will be selected as the winning experience. Conversely, a variant that is initially performing poorly due to chance is more likely to be dropped by the algorithm, and revenues lost.
  • Implementation is not easy: Setting up MABs is technically challenging as you may need a data scientist to advise on how to integrate and scale the code and a developer to program the test.
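
To give a feel for the "more room for error" point above, the sketch below calculates the probability that a genuinely weaker variant records more conversions than a stronger one purely by chance. It assumes each variant receives the same number of independent visitors and that conversions follow a binomial distribution. The 5% and 4% rates and the sample sizes are hypothetical.

```python
import math


def binomial_pmf(n, p):
    """Probability of each possible number of conversions from n visitors."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_weaker_leads(p_strong, p_weak, n):
    """Probability the weaker variant has strictly more conversions."""
    pmf_strong = binomial_pmf(n, p_strong)
    pmf_weak = binomial_pmf(n, p_weak)
    prob = 0.0
    strong_below_k = 0.0  # P(strong variant has fewer than k conversions)
    for k in range(n + 1):
        prob += pmf_weak[k] * strong_below_k
        strong_below_k += pmf_strong[k]
    return prob


if __name__ == "__main__":
    # Hypothetical true rates: 5% for the stronger variant, 4% for the weaker.
    for n in (100, 1000):
        chance = prob_weaker_leads(0.05, 0.04, n)
        print(f"{n} visitors per variant: {chance:.1%} chance the weaker variant leads")
```

The probability falls as the sample size grows, which is why an algorithm that reallocates traffic early is more exposed to chance results.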

So when should you use MABs?

Multi-armed bandits are best suited to the following campaigns:

  • When you want to simultaneously explore and exploit an optimisation opportunity.
  • Optimising radically different variants where there is a need to begin exploiting the best performing experience without delay.
  • Headlines and short-term campaigns, particularly if the content has a limited time span.
  • Automation for scale.
  • Targeting to understand how different customer segments respond to content.
  • Combining optimisation with attribution. By including a bandit algorithm on your website and in your call centre automated software you can seek to optimise across multiple touch points.

Conclusion:

Multi-armed bandit algorithms are not an alternative to A/B testing as they are designed for different roles in the optimisation toolkit. A/B testing is excellent for conducting online experiments to identify the best performing variant with a high degree of statistical confidence. MABs are more suited to continuous optimisation and short-term campaigns where the objective is to achieve a high average conversion rate. Ideally you would want to use both A/B and MAB testing as part of a comprehensive optimisation program.

  • Neal has had articles published on website optimisation on Usabilla.com and, as an ex-research and insight manager, on the GreenBook Blog research website. If you wish to contact Neal please send an email to neal.cole@outlook.com. You can follow Neal on Twitter @northresearch, check out the Conversion Uplift Facebook page or connect on LinkedIn.