Early stopping rules for A/B testing
Ideally, you'd specify the size of your test sample before starting an A/B test. However, in some cases you might reach significance early in the experiment. Although it's not statistically sound to stop a test before the planned sample size has been reached, in some cases you might want to do so to optimize business value.
Do you have stopping rules for deciding when you're allowed to stop an experiment earlier than expected?
For example, you could decide to stop an experiment early if it reaches at least 99.9% significance at any stage, while at the planned end of the experiment you would draw your conclusions at 90% or 95% significance or more.
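A staged rule like that could be sketched as follows. This is only an illustration, not anything Optimizely provides: it uses a plain two-proportion z-test, and the function names, thresholds, and `planned_n` parameter are all my own invention.

```python
from math import sqrt
from statistics import NormalDist


def significance(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test on the difference of two conversion rates.
    Returns the confidence level (1 - p-value) as a fraction."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    z = abs(p_a - p_b) / se
    return 2 * NormalDist().cdf(z) - 1


def should_stop(conv_a, n_a, conv_b, n_b, planned_n,
                early_threshold=0.999, final_threshold=0.95):
    """Stop early only at the stricter threshold; fall back to the
    normal threshold once the planned sample size has been reached."""
    conf = significance(conv_a, n_a, conv_b, n_b)
    if n_a + n_b >= planned_n:
        return conf >= final_threshold
    return conf >= early_threshold
```

Note that peeking at every visitor still inflates the false-positive rate even with a stricter early threshold; formal sequential designs correct for this more carefully.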
Great question! You are correct that Optimizely could show that a variation has reached statistical significance before there is actually enough statistical power to know whether that determination holds. For this reason, we encourage customers not to act on the results until a certain amount of traffic has been reached. To determine the right amount of traffic, we provide a Sample Size Calculator (look under Resources on the Optimizely homepage). If you have not already seen it, take a look at this article:
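For intuition, the kind of calculation such a sample size calculator performs can be approximated with the standard two-proportion formula. This is a generic sketch, not Optimizely's actual implementation, and the function name and its relative-MDE convention are assumptions of mine:

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, min_detectable_effect,
                              alpha=0.05, power=0.8):
    """Approximate visitors needed per variation for a two-sided
    two-proportion z-test. `min_detectable_effect` is relative,
    e.g. 0.2 means detecting a 20% lift over the baseline rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_effect)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = variance * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    return ceil(n)
```

With a 10% baseline conversion rate and a 20% relative lift to detect, this lands in the high three-thousands of visitors per variation at the usual 95% significance and 80% power.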
Currently we do not have any logic that requires a higher significance level early on and relaxes it toward the end of the experiment, so it is important to take your sample size and statistical power into consideration. That said, our team has been working on a revamped statistical model that will let users make decisions based on the data they see without comparing it to the sample size calculator. I don't have an exact launch date, but it should be coming out in the next few months, so keep an eye out for it!
This can be a little harder to stick to further up the funnel, where your required sample size can be pretty huge (if you're doing a homepage test with conversion goals, for example), so personally I'd just make sure you're comfortable committing to run a potentially losing variant through to completion before setting anything live.
Really curious to see what others have to say though, as this is something I've wondered as well.