Testing with MDE 5% or less and new Stats Engine
Most of the tests that we run have a lift of 5% or less on the main KPI we are measuring. I should add that our starting conversion rate is 4%. With the new Stats Engine, we are running into an issue where it seems like it will take forever to reach statistical significance. Several weeks into the test, Stats Engine still needs more than 100,000 visitors to reach significance.
Is anyone else having similar issues? How are you handling it? I understand why this is happening, but I'm not quite sure how to go about making decisions without waiting a month or two to know results.
If the variation you're testing has very little or no impact whatsoever, then you'll never see a statistically significant result.
When a test results in neither a winner nor a loser, you should consider the following:
- What hypothesis, if any, does the neutral result invalidate? The problem might not be what you thought.
- Was the variation drastic enough to have an effect? You may need to be more creative or increase the scope of the experiment.
- Should the test be targeting specific user segments to find significant results? Maybe the A/B test affected a user segment, but that change isn’t visible in the average.
I wrote about this on the Optimizely blog: http://blog.optimizely.com/2014/10/30/the-problem-with-ab-testing-success-stories/
We’d also like to offer some general advice about using Stats Engine for very small lifts on small baseline rates.
One feature of Stats Engine is that you can now decide whether to wait longer to determine if a very small lift is a true difference or due to random fluctuation.
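To get a feel for why a small lift on a small baseline takes so long, here is a rough back-of-the-envelope sketch. It uses the classical fixed-horizon sample size formula for a two-sided two-proportion z-test, which is not what Stats Engine computes internally (Stats Engine is sequential), so treat the result only as an order-of-magnitude guide. The function name and the 80% power default are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline_rate, relative_mde,
                           significance=0.90, power=0.80):
    """Classical fixed-horizon sample size per variation.

    Assumption: two-sided two-proportion z-test. This is a textbook
    approximation, not Stats Engine's sequential calculation.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate if the lift is real
    alpha = 1 - significance
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# The numbers from the question: 4% baseline, 5% relative lift.
n_small_lift = visitors_per_variation(0.04, 0.05)
# Same baseline with a 10% relative lift, for comparison.
n_larger_lift = visitors_per_variation(0.04, 0.10)
print(n_small_lift, n_larger_lift)
```

Under these assumptions, the 4% baseline with a 5% relative lift comes out at roughly 120,000 visitors per variation, and doubling the lift cuts the requirement to about a quarter of that. That is why more drastic variations reach a conclusion so much faster.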
On the other hand, you are also free to conclude that the test could not show with confidence that the variation differs significantly from the baseline. There is still valuable information to glean from a test that has not reached significance: the difference interval. Regardless of the statistical significance of your experiment, the difference interval shows a range of lifts that we are 90% certain contains the true lift between your variation and baseline. For example, it is useful to know that a variation has between -1% and 3% lift, and you can use this information to guide you in selecting variations to iterate on and improve.
Note that the 90% number corresponds to the significance threshold set in your project level settings.
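If you want to sanity-check an interval like this by hand, the sketch below computes a simple fixed-horizon (Wald) interval for the relative lift. This is a stand-in for illustration only: Stats Engine's difference interval comes from a sequential method and will generally be wider. The function name and the visitor and conversion counts are made up, and dividing by the observed baseline rate treats that rate as fixed, which is a simplification.

```python
from math import sqrt
from statistics import NormalDist


def relative_lift_interval(control_conversions, control_visitors,
                           variant_conversions, variant_visitors,
                           significance=0.90):
    """Approximate interval for relative lift (variant vs. baseline).

    Assumption: classical Wald interval on the difference in rates,
    scaled by the observed baseline rate. Not Stats Engine's method.
    """
    p_c = control_conversions / control_visitors
    p_v = variant_conversions / variant_visitors
    std_err = sqrt(p_c * (1 - p_c) / control_visitors
                   + p_v * (1 - p_v) / variant_visitors)
    z = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    diff = p_v - p_c
    return (diff - z * std_err) / p_c, (diff + z * std_err) / p_c


# Hypothetical counts: 4.00% baseline vs. 4.08% variation.
low, high = relative_lift_interval(2000, 50000, 2040, 50000)
print(f"relative lift between {low:.1%} and {high:.1%}")
```

With these made-up counts the interval straddles zero, so the test is not significant, but it still bounds how large a gain or loss is plausible. Changing `significance` mirrors changing the threshold in your project settings.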
We created Stats Engine to fit the dynamic environment of a quickly moving business. Feel free to closely watch your test results, iterate on variations, and test lots and often!
Statistician at Optimizely