
Quantifying actual impact of a test

DarthB 03-11-15


I was told that if I run a test and see a 20% improvement over control at 90+% confidence, all it tells me is that the variant is better than control; it doesn't mean I would see the 20% lift if I ramped the variant to 100%.


Can someone please explain the technical reason behind this?  Thanks.  


JDahlinANF 03-11-15

Re: Quantifying actual impact of a test

The "technical reason" has to do with the semantics of language.


Suppose the weatherman predicts a 90% chance of getting 20 inches of snow.


Does this mean you will get 20 inches of snow?


No.  It only means you can be pretty sure (90% sure) that you will get snow and that the best estimate is that it will be around 20 inches.

robinp 03-12-15

Re: Quantifying actual impact of a test

Hi DarthB,
Nap0leon’s analogy is a good place to start in understanding what’s happening.
What’s going on here is that the 20% is the lift you observed during the experiment, while the statistical significance tells you how confident Optimizely is that the results you are seeing are due to a real difference between your control and variation. You can read it essentially as a prediction that the variation will still beat the control when you roll it out to all of your visitors, not as a promise of how large the lift will be. You’re not guaranteed to see exactly that lift after implementing the variation, because the visitors to your site on one day likely do not behave the same way as visitors on the next.
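To see why the observed lift is only an estimate, here is a minimal Python sketch (not how Optimizely computes anything). It assumes hypothetical true conversion rates of 10% for the baseline and 12% for the variation, i.e. a true 20% lift, assumes every visitor converts independently, and reruns the same experiment many times with 2,000 visitors per arm. The function name and all numbers are illustrative.

```python
import random
import statistics

def simulate_observed_lifts(p_control, p_variant, visitors_per_arm, runs, seed=0):
    """Rerun a hypothetical A/B test `runs` times; return each run's observed relative lift in %."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    lifts = []
    for _ in range(runs):
        # Assumption: each visitor converts independently at the arm's true rate.
        conv_c = sum(rng.random() < p_control for _ in range(visitors_per_arm))
        conv_v = sum(rng.random() < p_variant for _ in range(visitors_per_arm))
        rate_c = conv_c / visitors_per_arm
        rate_v = conv_v / visitors_per_arm
        lifts.append(100 * (rate_v - rate_c) / rate_c)  # observed relative lift
    return lifts

lifts = simulate_observed_lifts(0.10, 0.12, visitors_per_arm=2000, runs=200)
print(f"mean observed lift: {statistics.mean(lifts):.1f}%")
print(f"range: {min(lifts):.1f}% to {max(lifts):.1f}%")
```

The true lift never changes between runs, yet the observed lift in individual runs lands well above and below 20%, which is why a single test's 20% should be treated as an estimate rather than a forecast.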
That’s where difference intervals come in. The difference interval on Optimizely’s results page shows the range of values that likely contains the absolute difference in conversion rates between your variation and the baseline that would show up in a long-term implementation. So, if your baseline has a conversion rate of 10% and your variation has a conversion rate of 12% (a 20% relative lift), a difference interval might range from 1 percentage point at the low end to 3 percentage points at the high end. At 90% significance, this difference interval can be read as: “There is a 90% chance that the variation’s true conversion rate is between 1 and 3 percentage points higher than the baseline’s,” which, with the baseline at 10%, works out to roughly 11% to 13%.
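The arithmetic in this example can be sketched in Python. This is an approximation under a stated assumption: it uses a classic fixed-horizon normal-approximation (Wald) interval for the difference of two proportions, not Optimizely’s Stats Engine, so its numbers will differ from the results page. The function name and the visitor and conversion counts are hypothetical.

```python
from statistics import NormalDist

def difference_interval(conv_a, n_a, conv_b, n_b, confidence=0.90):
    """Return (relative lift, low, high) for the absolute difference p_b - p_a.

    Uses a fixed-horizon normal-approximation (Wald) interval; this is an
    illustrative assumption, not Optimizely's Stats Engine calculation.
    """
    p_a = conv_a / n_a  # baseline conversion rate
    p_b = conv_b / n_b  # variation conversion rate
    diff = p_b - p_a    # absolute difference in conversion rates
    lift = diff / p_a   # relative lift, e.g. 0.20 for 20%
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    return lift, diff - z * se, diff + z * se

# Hypothetical data: 10,000 visitors per arm, 10% vs. 12% conversion.
lift, low, high = difference_interval(1000, 10_000, 1200, 10_000)
print(f"lift {lift:.0%}, difference interval {low:.2%} to {high:.2%}")
# prints: lift 20%, difference interval 1.27% to 2.73%
```

The interval describes the difference, not the variation’s rate itself: read against a baseline held at 10%, it corresponds to a variation rate of about 11.3% to 12.7%.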
The longer you run your experiment and the more data it collects, the narrower the difference interval becomes, homing in on the true difference in conversion rates. If you’re comfortable knowing that there’s a 90% chance your variation’s true conversion rate is higher than the baseline’s, you may want to end the experiment as soon as the results reach significance. But if you need to know more precisely how much ROI to expect from implementing the variation, you can leave the test running longer to collect more data and sharpen your estimate.
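To illustrate why running longer narrows the interval, this sketch uses a fixed-horizon normal approximation (an assumption, not Stats Engine) with hypothetical rates of 10% and 12%. The helper name `half_width` is illustrative. Because the standard error shrinks with the square root of the sample size, quadrupling the traffic halves the interval’s half-width in this approximation.

```python
from statistics import NormalDist

def half_width(p_a, p_b, visitors_per_arm, confidence=0.90):
    """Half-width of a normal-approximation interval for p_b - p_a (illustrative assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    se = (p_a * (1 - p_a) / visitors_per_arm
          + p_b * (1 - p_b) / visitors_per_arm) ** 0.5
    return z * se

# Hypothetical rates of 10% and 12%; quadruple the traffic at each step.
sizes = [2_500, 10_000, 40_000, 160_000]
widths = [half_width(0.10, 0.12, n) for n in sizes]
for n, w in zip(sizes, widths):
    print(f"{n:>7} visitors per arm: difference 2.00% ± {w:.2%}")
```

The point estimate stays at 2 percentage points while the band around it tightens, which is the extra precision you buy by leaving the test on longer.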
We have many more resources for learning about statistics in Optimizely in our Knowledge Base, and I think our recent webinar about Stats Engine will answer a lot of questions about this as well.