Earlier this year I published a blog post about a Bayesian decision rule (the link is now dead, but VWO now uses this rule for A/B testing and has published tech docs on it) for choosing between two variations, each with a potentially different conversion rate. The basic idea of the decision rule is as follows.
- Choose a "threshold of caring" - if A and B differ by less than this threshold, you don't care which one you choose.
- Choose a prior on the distribution of conversion rates of A and B.
- Compute a posterior, and use it to estimate whether the expected losses you'd make by choosing A (or B) are below the threshold of caring. If so, stop the test.
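The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prior is taken to be uniform, i.e. Beta(1, 1), the expected losses are estimated by Monte Carlo sampling rather than by the method of the original post, and the names `expected_losses` and `should_stop` are hypothetical.

```python
import random


def expected_losses(clicks_a, views_a, clicks_b, views_b,
                    prior=(1.0, 1.0), draws=100_000, seed=0):
    """Monte Carlo estimate of the expected loss from choosing A or B.

    Assumes a Beta prior (uniform by default) on each conversion rate.
    """
    rng = random.Random(seed)
    alpha0, beta0 = prior
    # A Beta prior combined with binomial data gives a Beta posterior.
    post_a = (alpha0 + clicks_a, beta0 + views_a - clicks_a)
    post_b = (alpha0 + clicks_b, beta0 + views_b - clicks_b)
    loss_a = loss_b = 0.0
    for _ in range(draws):
        x = rng.betavariate(*post_a)  # sampled conversion rate of A
        y = rng.betavariate(*post_b)  # sampled conversion rate of B
        loss_a += max(y - x, 0.0)     # what we lose by picking A if B is better
        loss_b += max(x - y, 0.0)     # what we lose by picking B if A is better
    return loss_a / draws, loss_b / draws


def should_stop(clicks_a, views_a, clicks_b, views_b, threshold):
    """Stop once the better-looking choice has expected loss below the threshold."""
    loss_a, loss_b = expected_losses(clicks_a, views_a, clicks_b, views_b)
    return min(loss_a, loss_b) < threshold


print(should_stop(200, 1000, 100, 1000, threshold=0.001))
print(should_stop(10, 100, 10, 100, threshold=0.001))
```

With a clear winner (20% versus 10% over 1000 displays each) the rule should say stop, while two identical small samples should leave the expected loss well above a threshold of 0.001.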
This A/B testing procedure has two main advantages over the standard Student's t-test. The first is that, unlike the Student's t-test, you can stop the test early if there is a clear winner or run it for longer if you need more samples. The second is that, as a Bayesian test, its outputs are easily interpreted quantities: for example, the probability that version A is better than version B, or your expected loss from choosing the wrong one.
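I won't repeat the details of the method, instead referring the reader to the original post. The crucial part of the test is determining when to stop. Suppose version A has a higher empirical mean than version B, i.e. $@ \textrm{clicks on A} / \textrm{displays of A} > \textrm{clicks on B} / \textrm{displays of B} $@. Then the test is stopped when the expected loss from choosing A falls below the threshold of caring $@ \varepsilon $@:
$@ E[\max(\textrm{CTR}_B - \textrm{CTR}_A, 0)] < \varepsilon $@
Or, to simplify, writing $@ x $@ for the conversion rate of A with posterior $@ \beta(a,b) $@, $@ y $@ for the conversion rate of B with posterior $@ \beta(c,d) $@, and $@ B $@ for the beta function:
$@ \int_0^1 \int_0^1 \max(y - x, 0) \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)} \frac{y^{c-1}(1-y)^{d-1}}{B(c,d)} \, dx \, dy < \varepsilon $@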
In the original source, the integral is calculated numerically. However, it turns out we can compute it exactly using the following formula. Define first the function:
$@ h(a,b,c,d) = 1 - \sum_{j=0}^{c-1} \frac{B(a+j, b+d)}{(d+j) B(1+j, d) B(a,b)} $@
Note that $@ h(a,b,c,d) = P(X > Y) $@ where $@ X \sim \beta(a,b) $@ and $@ Y \sim \beta(c,d) $@.
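Then, with A's posterior $@ \beta(a,b) $@ and B's posterior $@ \beta(c,d) $@, the expected loss from choosing A is:
$@ \frac{B(c+1,d)}{B(c,d)} h(c+1,d,a,b) - \frac{B(a+1,b)}{B(a,b)} h(c,d,a+1,b) $@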
The numerical computation in the original blog post can then be replaced by the above formula.
How to compute the formula
Recap: Evan Miller's Closed Form Solution for $@ P(X > Y) $@
In a very nice blog post, Evan Miller cooked up a closed-form formula for evaluating $@ P(X > Y) $@ when $@ X $@ is drawn from a [Beta distribution](http://en.wikipedia.org/wiki/Beta_distribution) with integer parameters $@ (a,b) $@ and $@ Y $@ is drawn from a Beta distribution with integer parameters $@ (c,d) $@.
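Expressed as an integral, we have that:
$@ P(X > Y) = \int_0^1 \int_y^1 \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)} \frac{y^{c-1}(1-y)^{d-1}}{B(c,d)} \, dx \, dy $@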
Evan Miller computed the integral and came up with a closed form solution for it:
$@ P(X > Y) = 1 - \sum_{j=0}^{c-1} \frac{B(a+j, b+d)}{(d+j) B(1+j, d) B(a,b)} $@
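As a rough sketch, the closed form can be evaluated with nothing but the standard library. The function names `log_beta` and `prob_x_beats_y` are hypothetical, and the code assumes positive integer parameters as in the text. Terms are assembled in log space via `math.lgamma` so that large parameters do not overflow.

```python
from math import exp, lgamma, log


def log_beta(p, q):
    """Natural log of the beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def prob_x_beats_y(a, b, c, d):
    """P(X > Y) for X ~ Beta(a, b) and Y ~ Beta(c, d), integer parameters."""
    prob_y_beats_x = 0.0
    for j in range(c):
        # One term of the finite sum, built in log space for stability.
        log_term = (log_beta(a + j, b + d) - log(d + j)
                    - log_beta(1 + j, d) - log_beta(a, b))
        prob_y_beats_x += exp(log_term)
    return 1.0 - prob_y_beats_x


print(prob_x_beats_y(1, 1, 1, 1))
print(prob_x_beats_y(2, 1, 1, 1))
```

Two easy sanity checks: for two uniform variables the answer should be close to 1/2, and for $@ X \sim \beta(2,1) $@ against a uniform $@ Y $@ it should be close to $@ E[X] = 2/3 $@.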
Computing the loss function
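We can bootstrap this analysis and compute the loss function. Write $@ f_{a,b}(x) = x^{a-1}(1-x)^{b-1}/B(a,b) $@ for the density of $@ \beta(a,b) $@. We start by distributing across $@ (y-x) $@:
$@ \int_0^1 \int_x^1 (y-x) f_{a,b}(x) f_{c,d}(y) \, dy \, dx = \int_0^1 \int_x^1 y f_{c,d}(y) f_{a,b}(x) \, dy \, dx - \int_0^1 \int_x^1 x f_{a,b}(x) f_{c,d}(y) \, dy \, dx $@
We then multiply by $@ B(c+1,d)/B(c+1,d) $@ and $@ B(a+1,b)/B(a+1,b) $@ and do simple arithmetic. Since $@ y f_{c,d}(y) = \frac{B(c+1,d)}{B(c,d)} f_{c+1,d}(y) $@ and $@ x f_{a,b}(x) = \frac{B(a+1,b)}{B(a,b)} f_{a+1,b}(x) $@, each remaining double integral is the probability that one Beta variable exceeds another, so the expression becomes:
$@ \frac{B(c+1,d)}{B(c,d)} h(c+1,d,a,b) - \frac{B(a+1,b)}{B(a,b)} h(c,d,a+1,b) $@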
This is what we wanted to show.
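A quick numerical check of the result, as a sketch rather than a definitive implementation. It assumes $@ x \sim \beta(a,b) $@ is the conversion rate of A and $@ y \sim \beta(c,d) $@ that of B, so the quantity computed is $@ E[\max(y - x, 0)] $@, the expected loss from choosing A. The function names are hypothetical, and the Monte Carlo estimate is there only for comparison.

```python
import random
from math import exp, lgamma, log


def log_beta(p, q):
    """Natural log of the beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)


def h(a, b, c, d):
    """P(X > Y) for X ~ Beta(a, b), Y ~ Beta(c, d), integer parameters."""
    return 1.0 - sum(
        exp(log_beta(a + j, b + d) - log(d + j)
            - log_beta(1 + j, d) - log_beta(a, b))
        for j in range(c)
    )


def loss_choosing_a(a, b, c, d):
    """Closed form for E[max(Y - X, 0)], X ~ Beta(a, b) (A), Y ~ Beta(c, d) (B)."""
    y_part = exp(log_beta(c + 1, d) - log_beta(c, d)) * h(c + 1, d, a, b)
    x_part = exp(log_beta(a + 1, b) - log_beta(a, b)) * h(c, d, a + 1, b)
    return y_part - x_part


def loss_choosing_a_mc(a, b, c, d, draws=200_000, seed=0):
    """Monte Carlo estimate of the same expectation, for comparison only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += max(rng.betavariate(c, d) - rng.betavariate(a, b), 0.0)
    return total / draws


# The two numbers printed should agree to a few decimal places.
print(loss_choosing_a(3, 5, 4, 2), loss_choosing_a_mc(3, 5, 4, 2))
```

For two uniform posteriors the closed form should come out close to 1/6, which is the expected positive part of the difference of two independent uniform variables.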
Note: The original version of this post had two sign errors. Thanks to Frank (unknown last name) for catching the mistake.