Multi-armed bandits seek to aggressively optimise content
By - Neal Cole

Are Multi-Armed Bandit Tests Superior To A/B Tests?



What are multi-armed bandit tests?


Multi-armed bandit (MAB) tests use an algorithm to proactively seek out the best performing experience and aggressively optimise towards it to increase the average conversion rate during a test. This means that you can earn and learn simultaneously as traffic is automatically switched to the variant with the highest conversion rate.

What is a bandit?


A bandit is another name for a slot machine. Imagine that you were in Vegas with a limited budget and limited time to play a selection of slot machines with different pay-outs. Multi-armed bandit (MAB) tests seek to maximise your winnings by trying to work out which slot machine has the highest pay-out and automatically adjusting resources (i.e. traffic) to optimise revenues.


This is very different from A/B testing, where traffic is evenly split between each variant. In the example below there are two variants (a control and a challenger) and so each receives 50% of all traffic from the beginning to the end of the test.


Multi-armed bandit tests work so that for 10% of the time traffic is split equally between the two variants (the exploration phase). For the remaining 90% of the time, traffic is sent to the best performing variant (the exploitation phase). Multi-armed bandit tests also provide the option to weight traffic according to the estimated value of each variant from the beginning of the test. These initial weights are simply a best guess. Below is an example of potential weights for a three variant experiment.
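
Separately from the weights example, the 10% exploration and 90% exploitation split described above resembles what is often called an epsilon-greedy strategy. The following is a minimal sketch of that idea, not the method of any particular testing tool. The function name choose_variant, the epsilon parameter and the running totals are all assumptions made for illustration.

```python
import random

def choose_variant(conversions, visitors, epsilon=0.1, rng=random):
    """Return the index of the variant to show to the next visitor."""
    n_variants = len(visitors)
    # Exploration: a fraction epsilon of visitors is spread evenly over all variants.
    if rng.random() < epsilon:
        return rng.randrange(n_variants)
    # Exploitation: everyone else sees the variant with the best observed rate.
    rates = [c / v if v else 0.0 for c, v in zip(conversions, visitors)]
    return max(range(n_variants), key=lambda i: rates[i])

# Hypothetical running totals for a control (index 0) and a challenger (index 1).
conversions = [30, 45]
visitors = [1000, 1000]

# With epsilon set to 0 the function never explores and always picks the leader.
print(choose_variant(conversions, visitors, epsilon=0.0))
```

In practice the totals would be updated after every visitor, so the variant chosen in the exploitation branch can change as evidence accumulates.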




What about statistical confidence?


Multi-armed bandit tests aggressively optimise for the best performing variant, so the worst performing variant receives little traffic. However, traffic is usually shifted before full statistical confidence is obtained, and so we may not be able to identify whether a variant is indeed the worst performing variant or whether it’s just down to chance. This means that an MAB test will require a lot more traffic to reach full statistical confidence for poorly performing variants and will thus take longer to produce a conclusive result.
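
To see why a variant that receives little traffic is hard to judge, it can help to look at how the uncertainty around a conversion rate shrinks with sample size. The sketch below uses the standard normal approximation for a 95% confidence interval. The visitor and conversion counts are hypothetical and the function name conversion_interval is an assumption for illustration.

```python
import math

def conversion_interval(conversions, visitors, z=1.96):
    """Approximate 95% confidence interval for a conversion rate (normal approximation)."""
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin

# Hypothetical counts: the bandit has sent most of the traffic to the leader.
leader = conversion_interval(450, 9000)   # 5.0% observed conversion rate
laggard = conversion_interval(40, 1000)   # 4.0% observed conversion rate

print("leader:", leader)
print("laggard:", laggard)
```

With these made-up numbers the leader's interval runs from roughly 4.5% to 5.5%, while the laggard's runs from roughly 2.8% to 5.2%. The two intervals overlap, so the data cannot yet rule out that the laggard is actually as good as the leader.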


What Assumptions Do Multi-Armed Bandit Tests Make?


Most multi-armed bandit tests use algorithms which make a number of assumptions about conversion rates.

      • Serving a variant and observing a conversion happen instantaneously. This means that multi-armed bandit tests are not suitable for email marketing or where there is a significant time-lag between when a customer sees a variant and the conversion occurring.
      • Conversion rates are fairly constant and don’t significantly change over time. If your conversion rate is subject to substantial fluctuations over time due to factors such as the weather or other seasonal factors then MABs may not be appropriate.
      • Samples in MABs are independent of each other, so one visitor’s behaviour does not influence whether another visitor converts.

What Are The Benefits of Multi-Armed Bandit Tests?


      • Exploit winning variants: MABs generally achieve a higher average conversion rate during the test period. They reduce the opportunity cost of testing by allowing a smooth transition from exploration to exploitation to increase revenues.


      • Automate optimisation: MABs allow you to automate the optimisation process with machine learning so that low performing variants can be dropped and traffic can be channelled towards the best revenue generating variant.
      • Continuous optimisation: Where you are frequently adding or removing variants to be tested, MABs provide a flexibility that A/B testing is not designed for. If you want to add new variants to replace low performing experiences during the testing process, MABs facilitate this. They also work well with targeting specific ads or content to customer segments.
      • Innovation tests: MABs perform best when there is a very large difference in the conversion rates of different variants. MABs are therefore best suited for optimisation when you have radically different experiences, such as in an innovation test, where you might expect to see big differences in the conversion rates of each variant.
      • Persuasive profiling: MABs are suitable for persuasive profiling, where you identify what content works best for a particular personality trait.
      • Time is not a priority: When you are not in any rush to identify the best performing variant and want to optimise the average conversion rate, an MAB can be a suitable tool.
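
The claim that MABs earn a higher average conversion rate when variants differ widely can be illustrated with a small simulation. This is a sketch under stated assumptions: the true conversion rates of 5% and 15% are invented, the bandit is a simple epsilon-greedy one with 10% exploration, and the A/B test is approximated by assigning every visitor at random (epsilon of 1.0).

```python
import random

def simulate(true_rates, n_visitors, epsilon, seed=0):
    """Return total conversions; epsilon=1.0 behaves like a random even split."""
    rng = random.Random(seed)
    conversions = [0] * len(true_rates)
    visitors = [0] * len(true_rates)
    for _ in range(n_visitors):
        if rng.random() < epsilon:
            # Explore: pick any variant with equal probability.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the variant with the best observed rate so far.
            rates = [c / v if v else 0.0 for c, v in zip(conversions, visitors)]
            arm = max(range(len(true_rates)), key=lambda i: rates[i])
        visitors[arm] += 1
        # Simulate whether this visitor converts.
        if rng.random() < true_rates[arm]:
            conversions[arm] += 1
    return sum(conversions)

true_rates = [0.05, 0.15]  # hypothetical, radically different variants
ab_total = simulate(true_rates, 20000, epsilon=1.0)
mab_total = simulate(true_rates, 20000, epsilon=0.1)
print("A/B-style split:", ab_total, "conversions")
print("Bandit:", mab_total, "conversions")
```

Setting true_rates to two nearly identical values (for example 0.050 and 0.052) is a quick way to see the gap between the two totals shrink.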



What Are The Drawbacks of Multi-Armed Bandit Tests?


      • Traffic greedy: MABs require more traffic and more time to reach full statistical confidence. If you are not bothered about the average conversion rate during the test and need a speedy but conclusive test result, then A/B testing is probably the right methodology for you.
      • Needs large differences: When there is little difference between the conversion rates of each variant, the benefits of multi-armed bandits disappear. This is a concern as we know from experience that it is almost impossible to predict how much of a difference a new design or heading will make to the conversion rate. The danger is that our own subjective opinions and biases come into play here, which is what experimentation is designed to avoid.
      • More room for error: As bandits begin switching traffic before full statistical confidence is reached, there is more danger that a variant that is performing better purely by chance will be selected as the winning experience. Conversely, a variant that is initially performing poorly due to chance is more likely to be dropped by the algorithm, and revenues will be lost.
      • Implementation is not easy: Setting up MABs is technically challenging as you may need a data scientist to advise on how to integrate and scale the code and a developer to program the test.
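
The "more room for error" point can be made concrete by estimating how often the truly worse variant happens to be ahead early in a test. The sketch below is illustrative only: the 5% and 6% conversion rates, the 200 visitors per variant and the function name chance_leader_share are all assumptions.

```python
import random

def chance_leader_share(worse_rate, better_rate, visitors_each, trials, seed=0):
    """Estimate how often the truly worse variant has more conversions early on."""
    rng = random.Random(seed)
    worse_leads = 0
    for _ in range(trials):
        # Count conversions for each variant over the same number of visitors.
        worse = sum(rng.random() < worse_rate for _ in range(visitors_each))
        better = sum(rng.random() < better_rate for _ in range(visitors_each))
        if worse > better:
            worse_leads += 1
    return worse_leads / trials

# Hypothetical variants with a small true difference, checked after 200 visitors each.
share = chance_leader_share(0.05, 0.06, visitors_each=200, trials=2000)
print(f"Worse variant is ahead in {share:.0%} of simulated tests")
```

A bandit that started favouring the current leader at this point would, in a meaningful share of these simulated tests, be shifting traffic towards the weaker variant.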

When Should You Use Multi-Armed Bandit Tests?


Multi-armed bandit tests are best suited to the following campaigns:

      • When you want to simultaneously explore and exploit an optimisation opportunity.
      • Optimising radically different variants where there is a need to begin exploiting the best performing experience without delay.
      • Headlines and short-term campaigns, particularly if the content has a limited time span.
      • Automation for scale.
      • Targeting to understand how different customer segments respond to content.
      • Combining optimisation with attribution. By including a bandit algorithm on your website and in your call centre automated software you can seek to optimise across multiple touch points.



Multi-armed bandit tests are not an alternative to A/B testing as they are designed for different roles in the optimisation toolkit. A/B testing is excellent for conducting online experiments to identify the best performing variant with a high degree of statistical confidence. MABs are more suited to continuous optimisation and short-term campaigns where the objective is to achieve a high average conversion rate. Ideally you would want to use both A/B and MAB testing as part of a comprehensive optimisation program.


Thank you for reading my post. Please leave feedback below because it helps us improve the quality of our content.

  • About the author: Neal (@northresearch) provides web analytics and CRO consultancy services and has worked in many sectors including financial services, online gaming and retail. He has helped brands such as Hastings Direct, Manchester Airport Group Online and Assurant Solutions Ltd to improve their digital marketing measurement and performance.

