How does the difference interval work in Stats Engine? It's not centered for conversion goals?

Leo 03-13-15

It may come as a bit of a surprise that even for conversion rate goals our confidence intervals with Stats Engine are not symmetric about the current difference in conversion rates. The reason for this comes back to how Stats Engine calculates statistical significance.


Statistical significance is achieved when Stats Engine finds strong enough evidence that the variation has a different conversion rate than baseline. Evidence accumulates in two main ways: conversion rate differences that are large enough, and conversion rate differences that persist over many visitors.


A second, completely equivalent way to describe this is that Stats Engine always considers the range of improvements (positive or negative) that is 90% likely to contain the actual average improvement after removing random fluctuation from the equation (the 90% level is set in your project-level settings). As Stats Engine sees more visitors and stronger evidence that the actual improvement is non-zero, it is able to discard more potential improvements from this range.


To illustrate, it will help to use the same example we used to explain the step-wise behavior of statistical significance in this post.



Early on in the experiment, there is not much evidence toward a non-zero effect and so the range of potential improvements, the combined evidence up to today, is wide around the observed difference (the orange dot).



Each day new evidence comes in corresponding to a different range of potential improvements. The improvements that are not in today’s range can be considered those improvements which Stats Engine conclusively discarded with today’s evidence. And so, the combined evidence, or statistical significance, corresponds to all the improvements that have not yet been discarded up to today. Today’s range of improvements is always symmetric around the observed difference, but the intersection of past and present ranges does not have to be.
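
To make the intersection idea concrete, here is a minimal Python sketch. It is only an illustration, not Stats Engine's actual computation: the observed differences and half-widths are made-up numbers, and `running_intersection` is a hypothetical helper name. Each day's range is symmetric around that day's observed difference, and the combined range is whatever survives in every range seen so far.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def running_intersection(daily: List[Interval]) -> List[Interval]:
    """Intersect each day's range with all earlier ranges."""
    lo, hi = float("-inf"), float("inf")
    combined = []
    for day_lo, day_hi in daily:
        lo = max(lo, day_lo)  # improvements below day_lo are discarded
        hi = min(hi, day_hi)  # improvements above day_hi are discarded
        combined.append((lo, hi))
    return combined


# Hypothetical (observed difference, half-width) pairs, one per day.
days = [(0.020, 0.050), (0.045, 0.030), (0.030, 0.020)]

# Each day's own range is symmetric around the observed difference.
daily = [(d - w, d + w) for d, w in days]
combined = running_intersection(daily)

for (d, _), (lo, hi) in zip(days, combined):
    # Distance from the observed difference to each end of the combined range.
    print(f"observed={d:.3f}  combined=({lo:.3f}, {hi:.3f})  "
          f"below={d - lo:.3f}  above={hi - d:.3f}")
```

On the second day of this made-up data the upper end of the combined range is still the one set on day one, while the lower end comes from day two, so the observed difference no longer sits in the middle.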


Eventually, if there is an actual difference between your variation and baseline, Stats Engine will see enough evidence to knock out all improvements below zero (if the actual improvement is positive) or all improvements above zero (if it is negative). This happens at exactly the moment statistical significance crosses your threshold and Stats Engine calls a winner or loser.
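
In code, the link between the remaining range and the winner/loser call could be sketched as follows. The function name `verdict` and the example ranges are hypothetical, and the check stands in for the idea only, not for how Stats Engine is implemented.

```python
from typing import Tuple


def verdict(combined: Tuple[float, float]) -> str:
    """Classify a not-yet-discarded range of improvements (lo, hi)."""
    lo, hi = combined
    if lo > 0:
        return "winner"        # every improvement at or below zero is discarded
    if hi < 0:
        return "loser"         # every improvement at or above zero is discarded
    return "inconclusive"      # zero is still a possible improvement


print(verdict((-0.030, 0.070)))   # range still contains zero
print(verdict((0.015, 0.070)))    # range lies entirely above zero
print(verdict((-0.050, -0.010)))  # range lies entirely below zero
```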


In statistical terms, the difference interval we report is a confidence interval on the absolute difference in conversion rates: it is the range of improvements that Stats Engine has not yet discarded.


In fact, we use this very result to detect when the actual improvement has shifted substantially since we started the test, often due to temporal variation. Observing a conversion rate that Stats Engine has previously discarded is a pretty good indication that something has changed between then and now.
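
A rough sketch of that check, under the same simplifying assumptions as before: the numbers are made up, `first_shift_day` is a hypothetical name, and the actual rule Stats Engine applies is not spelled out here. The sketch flags the first day on which the observed difference lands outside the range that had survived up to the previous day.

```python
from typing import List, Optional


def first_shift_day(observed: List[float],
                    half_widths: List[float]) -> Optional[int]:
    """Return the 1-based day on which the observed difference falls in
    territory already discarded, or None if that never happens."""
    lo, hi = float("-inf"), float("inf")
    for day, (d, w) in enumerate(zip(observed, half_widths), start=1):
        if not (lo <= d <= hi):
            return day  # today's observation was previously ruled out
        # Otherwise fold today's symmetric range into the combined range.
        lo, hi = max(lo, d - w), min(hi, d + w)
    return None


# Hypothetical data: the effect looks positive for three days, then flips.
observed = [0.020, 0.045, 0.030, -0.005]
half_widths = [0.050, 0.030, 0.020, 0.015]
print(first_shift_day(observed, half_widths))
```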


Leonid Pekelis
Statistician at Optimizely