Are Bandit Tests Superior To A/B Tests?

What is multi-armed bandit testing?

Multi-armed bandit (MAB) testing uses an algorithm that seeks out the best performing experience and aggressively shifts traffic towards it to increase the average conversion rate during a test. This means that you can earn and learn simultaneously, as traffic is automatically switched to the variant with the highest observed conversion rate.

What is a bandit?

A bandit is another name for a slot machine. Imagine that you are in Vegas with a limited budget and limited time to play a selection of slot machines with different pay-outs. Multi-armed bandit (MAB) testing seeks to maximise your winnings by working out which slot machine has the highest pay-out and automatically adjusting resources (i.e. traffic) to optimise revenues.

This is very different from A/B testing, where traffic is evenly split between each variant. In the example below there are two variants (a control and a challenger) and so each receives 50% of all traffic from the beginning to the end of the test.

In a multi-armed bandit test, traffic is split equally between the two variants for 10% of the time (the exploration phase). For the remaining 90% of the time, traffic is sent to the best performing variant so far (the exploitation phase). MABs also provide the option to weight traffic according to the estimated value of different variants from the beginning of the test. This is simply based upon a best-guess approach. Below is an example of potential weights for a three variant experiment.
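
The 10%/90% split described above resembles what is often called an epsilon-greedy strategy. The following is a minimal Python sketch under that assumption, not the algorithm used by any particular testing tool. The function names, the 10,000 visitor count and the 5% and 10% conversion rates are all hypothetical values chosen for illustration.

```python
import random


def choose_variant(stats, epsilon=0.10, rng=random):
    """Pick a variant name from stats, a dict of name -> [conversions, visitors]."""
    # Exploration: for a fraction epsilon of visitors, pick any variant at random.
    if rng.random() < epsilon:
        return rng.choice(list(stats))

    # Exploitation: otherwise pick the variant with the best observed rate so far.
    def observed_rate(name):
        conversions, visitors = stats[name]
        return conversions / visitors if visitors else 0.0

    return max(stats, key=observed_rate)


def simulate(true_rates, visitors=10_000, epsilon=0.10, seed=42):
    """Run a simulated test; true_rates holds assumed (hypothetical) conversion rates."""
    rng = random.Random(seed)
    stats = {name: [0, 0] for name in true_rates}
    for _ in range(visitors):
        name = choose_variant(stats, epsilon, rng)
        stats[name][1] += 1                  # one more visitor served this variant
        if rng.random() < true_rates[name]:
            stats[name][0] += 1              # the visitor converted
    return stats


if __name__ == "__main__":
    result = simulate({"control": 0.05, "challenger": 0.10})
    for name, (conversions, served) in result.items():
        print(name, "visitors:", served, "conversions:", conversions)
```

Over a long enough run, most of the traffic would be expected to end up with the variant that has the higher true conversion rate, although an unlucky start can send the algorithm the wrong way for a while.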



What about statistical confidence?

MABs aggressively optimise for the best performing variant by sending little traffic to the worst performing variant during the exploitation phase. However, this usually happens before full statistical confidence is obtained, so we may not be able to tell whether a variant really is the worst performer or whether its results are just down to chance. This means that an MAB test requires a lot more traffic to reach full statistical confidence for poorly performing variants, and so takes longer to produce a conclusive result.
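
To illustrate why a variant that is starved of traffic takes longer to judge, here is a rough sketch using the standard normal-approximation margin of error for a conversion rate. The visitor numbers and the 5% conversion rate are hypothetical, and a real testing tool may calculate confidence differently.

```python
import math


def margin_of_error(conversions, visitors, z=1.96):
    """Approximate 95% margin of error for an observed conversion rate."""
    rate = conversions / visitors
    return z * math.sqrt(rate * (1 - rate) / visitors)


# Hypothetical 10,000-visitor test in which the weaker variant converts at 5%.
ab_margin = margin_of_error(250, 5_000)     # A/B test: weaker variant gets half the traffic
bandit_margin = margin_of_error(50, 1_000)  # bandit: weaker variant gets only 1,000 visitors

print(f"A/B test margin of error: +/-{ab_margin:.4f}")
print(f"Bandit margin of error:   +/-{bandit_margin:.4f}")
```

With a fifth of the traffic, the uncertainty around the weaker variant's conversion rate is wider, which is why the bandit needs more total traffic before that variant can be ruled out with confidence.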

What assumptions do MABs make?

Most multi-armed bandit algorithms make a number of assumptions about conversion rates.

  • Serving a variant and observing a conversion happen instantaneously. This means that MABs are not suitable for email marketing or where there is a significant time-lag between when a customer sees a variant and the conversion occurring.
  • Conversion rates are fairly constant and don’t significantly change over time. If your conversion rate is subject to substantial fluctuations over time due to factors such as the weather or other seasonal factors then MABs may not be appropriate.
  • Samples in MABs are independent of each other and so don’t influence the conversion rate.

What are the benefits of MABs?

  • Exploit winning variants: MABs generally achieve a higher average conversion rate during the test period. They reduce the opportunity cost of testing by allowing a smooth transition from exploration to exploitation to increase revenues.
Image of AB test compared to multi-armed bandit


  • Automate optimisation: MABs allow you to automate the optimisation process with machine learning so that low performing variants can be dropped and traffic can be channelled towards the best revenue generating variant.
  • Continuous optimisation: Where you are frequently adding or removing variants to be tested it provides the flexibility that A/B testing is not designed for. If you want to add new variants to replace low performing experiences during the testing process MABs facilitate this. They also work well with targeting specific ads or content to customer segments.
  • Innovation tests: MABs perform best when there is a very large difference in the conversion rates of different variants. MABs are therefore best suited for optimisation when you have radically different experiences where you might expect to see big differences in the conversion rates of each variant.
  • Persuasive profiling: MABs are suitable for persuasive profiling, where you identify what content works best for a particular personality trait.
  • Time is not a priority: When you are not in any rush to identify the best performing variant and want to optimise the average conversion rate, an MAB can be a suitable tool.

What are the drawbacks of MABs?


  • Traffic greedy: MABs require more traffic and more time to reach full statistical confidence. If you are not bothered about the average conversion rate during the test and need a speedy, but conclusive test result then A/B testing is probably the right methodology for you.
  • Needs large differences: When there is little difference between the conversion rates of each variant the benefit of multi-armed bandits disappears. This is a concern as we know from experience it is almost impossible to predict how much of a difference a new design or heading will make to the conversion rate. The danger is that our own subjective opinions and biases come into play here, which is what experimentation is designed to avoid.
  • More room for error: As bandits begin switching traffic before full statistical confidence is reached there is more danger that a variant that is performing better purely by chance will be selected as the winning experience. Conversely a variant that is initially performing poorly due to chance is more likely to be dropped by the algorithm and revenues lost.
  • Implementation is not easy: Setting up MABs is technically challenging as you may need a data scientist to advise on how to integrate and scale the code and a developer to program the test.

So when should you use MABs?

Multi-armed bandits are best suited to the following campaigns:

  • When you want to simultaneously explore and exploit an optimisation opportunity.
  • Optimising radically different variants where there is a need to begin exploiting the best performing experience without delay.
  • Headlines and short-term campaigns, particularly if the content has a limited time span.
  • Automation for scale.
  • Targeting to understand how different customer segments respond to content.
  • Combining optimisation with attribution. By including a bandit algorithm on your website and in your call centre automated software you can seek to optimise across multiple touch points.


Multi-armed bandit algorithms are not an alternative to A/B testing as they are designed for different roles in the optimisation toolkit. A/B testing is excellent for conducting online experiments to identify the best performing variant with a high degree of statistical confidence. MABs are more suited to continuous optimisation and short-term campaigns where the objective is to achieve a high average conversion rate. Ideally you would want to use both A/B and MAB testing as part of a comprehensive optimisation program.

Thank you for reading my post and if you found it useful please share using the social media icons on the page.

You can view my full Digital Marketing and Optimization Toolbox here.

To browse links to all my posts on one page please click here.

  • Neal has had articles published on website optimisation on  and as an ex-research and insight manager on the GreenBook Blog research website.  If you wish to contact Neal please send an email to You can follow Neal on Twitter @northresearch, check out the Conversion Uplift  Facebook page or connect on LinkedIn.


Secrets of Optimising Gambling Sites – Bonuses

Challenges and Opportunities:

Until I moved to Gibraltar, the self-proclaimed home of online gambling, I had not given much thought to the challenges of optimising online gambling sites. I previously worked in e-commerce and financial services so it was a bit of a change.

Once I had completed a year in the sun I moved to London to work on gaming sites for a further two and a half years. I now offer conversion rate optimisation consultancy services to a range of sectors, including gambling sites, and would like to share my thoughts on the challenges and opportunities for optimising these kinds of sites.

In this first post I outline my thoughts on the use of bonuses as an acquisition and retention tool.

Complexity turns customers off:


Behavioural psychologists have noticed that mental maths, complex language and reading rules in poor fonts trigger our slow, methodical System 2 decision making process. This alters our mood and makes us less impulsive as we focus our attention on the matter in hand. It can also often result in frustration and unhappiness. Even a simple frown has been found to negatively affect our mood.

As a result, gambling sites using dark and low contrast pages automatically ring alarm bells in our brains as we sense danger in such environments. This makes people especially cautious, conservative and risk averse.

Image of low contrast text on

Some gaming sites also suffer from this reaction due to the complexity and presentation of their sign-up and deposit bonus offers. This is compounded by designers who wrongly believe that displaying small print in grey text on dark backgrounds is less distracting for users. The opposite is true, as psychologists have discovered that this type of page design results in disfluency, which disrupts mental flow, increases perceived effort and leads to cognitive strain.


  • Use high-contrast designs unless you want visitors to take extra care with reading instructions. Psychologists have found that low-contrast text encourages people to think more carefully when reading content, and that people behave less honestly in such environments than on high-contrast sites.
  • Avoid difficult to pronounce words, as easily read words evoke positive feelings but the opposite is true for words that are not.
  • Use familiar words (e.g. avoid jargon), as we are more critical and suspicious of anything unfamiliar. We are also more accepting of familiar ideas and phrases.
  • Avoid multitasking (switching from one task to another) as our brains are not designed for this. Ideally pre-populate bonus code fields. Otherwise allow customers to copy bonus codes (i.e. don’t use images). For mobile customers, ask them to take a screenshot of the bonus code as our short-term memory has very limited capacity.
  • Don’t ask for too much information at once. Divide tasks into small steps and break up registration forms into a number of separate pages. Only ask for information that is absolutely necessary (e.g. gender can be inferred from a person’s name).
  • Ensure there are clear and compelling differences between the choices offered to customers. Asking people to make trade-offs between offers (e.g. welcome packages) which lack a clear reason to select either option creates conflict and makes decision making onerous. It forces us to think about opportunity costs and the losses inevitably involved. Introducing a third, obviously inferior option presents a comparison that simplifies the decision for customers (see decoy effect).

Gains are nice, but losses motivate more:

Loss aversion tells us that people are more concerned about avoiding a loss than making a gain of the same size. This means that if we frame a gain as a loss (e.g. “Don’t miss out on a free £10 welcome bonus”) it will be perceived to be more valuable than if it were expressed as a simple gain.

The Benefit of Segregating Gains –  Loss Aversion


Chart showing the benefits of segregating gains due to loss aversion


However, hedonic framing tells us that two individual gains are perceived to be more valuable than a single larger gain of the same total amount. This means you should always segregate gains, and especially small gains, as the gain curve is steepest near the origin (see diagram). This suggests that gaming companies would do better to focus on offering a series of small bonuses rather than a single large bonus.


  • Focus on offering a series of small bonuses rather than a single large bonus as this will be perceived to have significantly greater value to customers.
  • Smaller gains should also be segregated from larger losses, because the steepness of the gain curve means that the utility of a small gain is likely to exceed the utility of slightly reducing a large loss.

This is also called the silver lining effect and explains the appeal of cash-back or loss-back promotions such as this one from Paddy Power. Slot machines also benefit from the phenomenon as they show winnings separately from the amount wagered.

Image of cashback offer from


  • Loss aversion also indicates that people should add together losses because the loss function is convex. This means that when we make multiple small losses and look at them separately we feel more pain than if we combined them into a single loss. This explains why people get more concerned about a series of small losses than a single large loss of around the same size.
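
The claims in this section about segregating gains and combining losses can be checked numerically. The sketch below assumes the power-shaped value function from Tversky and Kahneman's prospect theory, with their commonly quoted estimates of 0.88 for the curvature and 2.25 for loss aversion. Neither the function name nor these parameters come from this post, and the £5 and £10 amounts are hypothetical.

```python
def perceived_value(amount, curvature=0.88, loss_aversion=2.25):
    """Perceived value of a gain (positive amount) or loss (negative amount)."""
    if amount >= 0:
        return amount ** curvature                      # gains: concave curve
    return -loss_aversion * ((-amount) ** curvature)    # losses: steeper, convex curve


# Two separate £5 bonuses compared with one £10 bonus.
two_small_gains = perceived_value(5) + perceived_value(5)
one_large_gain = perceived_value(10)

# Two separate £5 losses compared with one £10 loss.
two_small_losses = perceived_value(-5) + perceived_value(-5)
one_large_loss = perceived_value(-10)

print("Two small gains feel better than one large gain:", two_small_gains > one_large_gain)
print("Two small losses feel worse than one large loss:", two_small_losses < one_large_loss)
```

Under these assumed parameters the two small bonuses carry more perceived value than the single bonus of the same total, while the two small losses hurt more than the single combined loss.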


Rewards need to be achievable to motivate:

As I discussed in my post on the psychology of rewards, offering an incentive to complete a task can be a great way of motivating people, but for this to work effectively the goal needs to be achievable without too much effort. Otherwise people become despondent and lose interest. For some sites where there is also a time limit to release a bonus this is a concern as the level of commitment required can be unrealistic for most recreational players.

For example, to release the poker bonus shown below you need to earn 1,250 Status Points before you get your first £10, and you have to achieve this within 45 days. However, if you want to start off on beginners’ tables with micro-stakes, as I did, you will earn relatively few Status Points and will struggle to obtain a bonus despite playing a lot of poker. There appears to be little allowance for inexperienced players who want to play for low stakes, or for the fact that for £10 it’s just not worth the effort.

Image of poker deposit bonus terms and conditions from


The challenge here is to design bonuses that protect companies from potential fraud without penalising genuine new poker customers. The simplest way to deal with this problem is to keep the first-time deposit bonus to a relatively small sum (e.g. £10), as many new players only deposit the minimum amount when they first sign up.

PokerStars offers a £20 first deposit bonus for all customers who deposit £10 without any need to wager their own money to release the bonus.  This is a much better user experience than discovering you have to earn points within so many days as otherwise your bonus will expire.

Image of deposit bonus of £20 from


  • Rewards need to be perceived as achievable to be effective incentives to help attract and retain customers. Onerous rules and time limits for releasing bonuses reduce their appeal as they lead to anxiety and frustration among regular customers.
  • This often results in poor retention rates which marketing then responds to by offering additional bonuses as an incentive to reactivate customers. Keeping incentives simple and making them more achievable may help break this cycle for some customers and encourage greater loyalty.

It’s not all about bonuses:

Image of poker table

Although bonuses are a useful acquisition and retention tool, they are not the main reason why most genuine customers want to gamble online. As with any optimisation process, successful organisations need to begin by understanding customers and developing a strong value proposition that is aligned to customer expectations and goals. The Lift Model from WiderFunnel is my favourite optimisation tool as it’s a simple but effective way to visualise the optimisation process.

Image of lift model

The product is also important and the advertising man Dave Trott sums its influence up perfectly.

“The product creates the experience. The experience creates the reputation. The reputation creates the brand.” Dave Trott, One + One = Three.

Some gambling brands clearly understand this. Mr Green for example has created an outstanding online experience with a compelling proposition. This includes a quirky website design which definitely has the novelty factor.

They made responsible gambling prominent in their sign up process long before it became mandatory in the UK and employed account verification measures to prevent customers opening multiple accounts. This strategy of openness and responsibility helps build credibility and confidence among online players that the site is both reliable and trustworthy.


  • Your value proposition needs to be much more than just a bonus as otherwise you may only have price to differentiate between you and the competition. When a new visitor lands on a site they will often decide within a matter of seconds whether your proposition appeals to them and so it is essential that you get their attention with relevant imagery, headings, clear reasons to explore further and establish your credibility.
  • As Phil Barden explains in his book Decoded – The science behind why we buy, products work at an explicit level (e.g. we want to play a game of poker), but brands are perceived to have psychological benefits that help differentiate them from each other (e.g. they offer escapism, fun and recognition of success). A strong brand needs to deliver on both explicit and implicit (psychological) goals by communicating a compelling value proposition.
6 main implicit psychological goals
Source: Decode Marketing


  • Phil identified 6 core psychological goals that customers might expect brands to deliver on. These are based on the latest research from the fields of neuroscience and psychology. Check out his book Decoded – I strongly recommend it.


If a gaming brand is not strongly associated with relevant psychological goals then customers may take the free bonus, but they are unlikely to ever return once they have used it up. Psychological goals are especially important for products where there is little to differentiate between individual brands. Gambling sites are often perceived to have similar offerings and so understanding those deep psychological goals is key to acquisition and retention rates.


Why did the polls get it wrong again?

When will we learn?

The opinion polls were wrong again! Just like in 2015 with the UK General Election the US polls wrongly predicted the outcome.

Why are opinion polls wrong, wrong, wrong?

The most obvious reason is that we are inter-connected, emotionally volatile beings, with complex underlying psychological motivations that subconsciously drive our behaviour. We are not fully aware of why we might vote for Trump or Clinton.  If you try to over-simplify human decision making you stand no chance of predicting it. Dale Carnegie summed this up very well over 60 years ago:

“When dealing with people, let us remember we are not dealing with creatures of logic. We are dealing with creatures of emotion, creatures bristling with prejudices and motivated by pride and vanity”  – Dale Carnegie, How to win friends and influence people

Trump undoubtedly connected with many disillusioned voters at an emotional level. He engaged our fast, intuitive and impulsive System 1 brain by using highly emotive language in a simple but effective way. “Build a wall”, “deport illegal immigrants” and “lock her up” resonated with many white, working class voters at a deep emotional level.

Hillary Clinton was unable to do the same because she used more rational language which was aimed at System 2, our slow, logical brain. She was also strongly associated with the established political classes and Trump capitalised on this big time with attacks on the Washington elite. Ironically, Trump’s standing was probably helped by the lack of support, and the criticism, he received from many established members of the Republican party.

People don’t tell the whole truth!

With so much negativity about the campaign should we be surprised that some people may have lied about their intentions? We are social creatures and like to feel we fit in with the groups and networks that we associate with. We dislike being an outsider and may feel uncomfortable admitting to ourselves, never mind to other people, that we are considering voting for someone with highly divisive policies. Get real – people will tell you what they think you want to hear rather than what might be their true intentions.

Stereotypes and attitudes:

We also suffer from implicit biases which shape attitudes and stereotypes that influence our behaviour at a subconscious level. This means that we like or dislike certain kinds of people or cultures without being fully aware of the reasons. However much we might try to ignore such feelings they are fully integrated into our decision making machinery and insidiously influence our behaviour.

Hillary Clinton is of course a woman, a former First Lady and the 67th US Secretary of State, and she was strongly aligned to the first black president of the United States. Never mind her use of a private email server, which opened up a whole can of worms for her during the campaign. Trump played on all these points during the campaign and even called for Hillary to be jailed. He understood that people are not fully rational or isolated from one another.

What about the undecided voters?

Those people who had not made up their mind (or rather had it made up for them) are always a challenge for pollsters. The evidence suggests that these voters can be heavily influenced by what they think other people are going to do and opinion polls are part of this jigsaw. However, again some voters may just not want to admit how they plan to vote. Taking people literally when they give us an answer to a question is again just plain silly.


The investigation of the reasons behind the failure to predict the UK General Election in 2015 suggested that the pollsters did not allow enough time or energy to contact Conservative voters. The people they had most difficulty contacting were generally busier and more difficult to get hold of and were more likely to be Conservative voters. This partly comes down to money as taking the effort to knock on doors and actively select people using more scientific sampling methods is extremely time-consuming and expensive.


However, when it comes down to it, the reliance on asking direct questions of people is still a fundamentally flawed way of predicting future behaviour. Like much of the traditional market research that is pumped out each day, it is not worth the paper it is printed on. Instead we should be using more implicit and indirect questioning, similar to that carried out by Jon Puleston at Lightspeed Research, or text analytics as conducted by Tom Anderson. These methods appear better at tapping into the emotional response that each candidate evokes.


Related Posts:

Polls and the UK 2015 election  – Why did the opinion polls get it so wrong in 2015?

Influence of polls – Do opinion polls influence voters?

European Referendum – Why emotions won over logic?

Marketing lessons – 7 Marketing lessons from the Brexit campaigns.

Referendum – A device for dictators and demagogues?

