I’m Leo Pekelis, a Statistician at Optimizely. I just hosted an online workshop called “Take Action on Results with Statistics” as part of our hands-on Optimizely Workshop series. Today, we covered:
- Why Optimizely built Stats Engine
- How to tune Stats Engine to get the best performance for your unique needs for one goal and variation.
- Choosing the optimal number of goals and variations for your experiment (preview)
First, why did Optimizely build Stats Engine?
In short, traditional statistics (such as t-tests) were effective 100 years ago, but aren’t so effective in today’s landscape. Back then, results were looked at once and only once, at a pre-determined sample size.
Today, A/B tests are much more complicated; there are multiple goals, constant iterations, and a desire to check results early and often. Unfortunately, some businesses still use t-tests when they A/B test, which comes with a couple of pitfalls:
Pitfall 1: Peeking
With traditional statistics, you will likely want to check how your test is doing while it runs, rather than looking only once at the pre-determined sample size, and to act as soon as the results reach statistical significance (a p-value below 5%). This is called peeking. Why is peeking a problem? Because every time you peek, you increase the chance of a false positive: the results show the variation winning when in reality there is no difference, or even a loss.
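To make the cost of peeking concrete, here is a minimal simulation sketch. It runs A/A tests (both arms share the same true conversion rate, so every "winner" is a false positive) and compares looking once at the end with looking repeatedly using a classical two-proportion z-test. The 10% conversion rate, 1,000 visitors per arm, and a look every 50 visitors are made-up numbers for illustration, and the function names are hypothetical, not part of Optimizely.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_tests, visitors_per_arm, rate, peek_every, seed=0):
    """Return (false positive rate with one look, false positive rate with peeking).

    Both arms convert at the same `rate`, so any significant result is false.
    """
    rng = random.Random(seed)
    hits_once = 0
    hits_peeking = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        peeked_significant = False
        significant = False
        for n in range(1, visitors_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % peek_every == 0 or n == visitors_per_arm:
                significant = z_test_p_value(conv_a, n, conv_b, n) < 0.05
                peeked_significant = peeked_significant or significant
        hits_once += significant            # outcome of the final look only
        hits_peeking += peeked_significant  # significant at any look
    return hits_once / n_tests, hits_peeking / n_tests


once, peeking = simulate(n_tests=500, visitors_per_arm=1000, rate=0.10, peek_every=50)
print(f"false positives, one look at the end: {once:.1%}")
print(f"false positives, stopping at the first significant peek: {peeking:.1%}")
```

Because both strategies are scored on the same simulated data, the peeking rate can never be lower than the single-look rate; in practice it comes out several times higher.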
Pitfall 2: Mistaking “false positive rate” for “chance of wrong answer”
Even if you don’t peek, there’s still a chance of making the wrong call. This is because a t-test at 95% significance holds false positives to about 5% of the goals and variations that truly have no difference, not to 5% of the winners and losers it declares. So if you are testing multiple variations and goals, and many of them turn out to be inconclusive, there is a high likelihood that a sizable share of the conclusive results are incorrect.
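A quick back-of-the-envelope sketch shows how far the two numbers can drift apart. All inputs are hypothetical: 100 goal and variation combinations, of which 10 have a real effect, a 5% false positive rate, and an assumed 80% chance of detecting each real effect.

```python
def share_of_wrong_calls(n_tests, n_true_effects, false_positive_rate, power):
    """Expected share of conclusive results that are false positives."""
    # False positives can only come from combinations with no real effect.
    false_positives = false_positive_rate * (n_tests - n_true_effects)
    # True positives come from the real effects that the test manages to detect.
    true_positives = power * n_true_effects
    return false_positives / (false_positives + true_positives)


share = share_of_wrong_calls(
    n_tests=100, n_true_effects=10, false_positive_rate=0.05, power=0.8
)
print(f"{share:.0%} of conclusive results are expected to be wrong")
```

With these inputs, about 4.5 false positives sit alongside 8 true positives, so roughly 36% of the declared winners and losers are wrong even though the false positive rate is only 5%.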
Fortunately, with Optimizely you don’t have to worry about peeking or about mistaking the false positive rate for the chance of a wrong answer. Stats Engine takes care of these common pitfalls for you, but by knowing some statistics of your own, you can tune Stats Engine to get the best performance for your unique needs.
The Stats Engine and tradeoffs of A/B Testing
There are three tradeoffs of A/B testing that you should be particularly aware of when running a test:
- Error Rates
- Runtime
- Effect Size/Baseline Conversion Rate
Let’s start with Error Rates.
In your Optimizely settings, you can adjust the statistical significance of a test before you run it. For example, if you set your statistical significance to 85%, your error rate would be 15%. In other words, when a winner or loser is declared, there is a 15% chance that the call is false and the winner is actually a loser, or the loser is actually a winner.
Now let’s talk runtime.
Runtime is the length of your experiment. The longer your runtime, the more visitors your test collects on its way to statistical significance. The exact number of visitors for a length of time depends on your daily traffic, and is unique to your business.
Finally, there’s Effect Size (Improvement)/Baseline Conversion Rate
Improvement lets you know the percentage by which your variation is winning or losing, compared to the original. Your baseline is usually the original, but the Results page allows you to toggle between different experiments.
What’s the tradeoff between the three? They are all inversely related!
For example, at any number of visitors, the higher the error rate you are willing to accept, the smaller the effect sizes you can detect.
At any error rate threshold, stopping your test earlier means you can only detect larger effect sizes.
Finally, for any effect size, the lower the error rate you want, the longer you need to run your test.
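The three-way tradeoff can be sketched with the classical fixed-horizon sample size formula for comparing two conversion rates. This is a textbook approximation used only to illustrate the direction of the tradeoffs; it is not how Stats Engine computes results. The 80% power figure, the 10% baseline, and the 500 daily visitors per variation are assumptions, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_variation(baseline_rate, relative_improvement,
                           significance=0.95, power=0.8):
    """Approximate visitors needed per variation (two-sided z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_improvement)
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def runtime_days(visitors_needed, daily_visitors_per_variation):
    """Days needed to collect the visitors at a steady daily traffic level."""
    return math.ceil(visitors_needed / daily_visitors_per_variation)


for significance in (0.85, 0.95):
    for improvement in (0.05, 0.10, 0.20):
        n = visitors_per_variation(0.10, improvement, significance)
        days = runtime_days(n, daily_visitors_per_variation=500)
        print(f"significance {significance:.0%}, improvement {improvement:.0%}: "
              f"{n} visitors per variation, about {days} days")
```

Reading the output row by row shows each relationship: a stricter significance setting or a smaller improvement both push the required visitors, and therefore the runtime, up.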
Preview: How many goals and variations should I use
Stats Engine is more conservative when there are more goals that are low signal, meaning goals that aren't strongly or directly affected by the changes you made in your variation. Adding a lot of “random” goals will slow down your experiment.
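One way to see why low-signal goals slow things down is a false discovery rate correction. The sketch below uses the Benjamini-Hochberg procedure as a stand-in; this post does not spell out the exact procedure Stats Engine uses, so treat the choice of method, the 5% threshold, and the p-values as assumptions for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if it is declared significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value falls under its scaled threshold.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant


# The same primary-goal p-value (0.02) in three hypothetical experiments.
primary_only = benjamini_hochberg([0.02])
with_strong_goal = benjamini_hochberg([0.02, 0.001])
with_random_goals = benjamini_hochberg([0.02, 0.45, 0.60, 0.75, 0.90])

print("primary goal alone:", primary_only[0])
print("plus one high-signal goal:", with_strong_goal[0])
print("plus four low-signal goals:", with_random_goals[0])
```

In this sketch the goal with p = 0.02 is significant on its own and next to a high-signal goal, but not once four unrelated goals are added, so it would need more data to reach significance.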
Here are a few tips to keep in mind with multiple goals and variations:
- Ask: Which goal is most important to me?
- This should be the primary goal (not impacted by all other goals)
- Run large tests, or large multivariate tests, without fear of finding spurious results, but be prepared for the cost of exploration.
- For maximum velocity, only test goals and variations that you believe have the highest impact.
That’s it! Have any questions? Feel free to chat with me here in Optiverse.
If you missed the workshop and want more than the highlights, the recording is here:
- For the Workshop slides, see below
- To learn more from our Optimizely Workshop series, sign up here.