Explaining step-wise increase of statistical significance with Stats Engine

Leo 03-04-15

Hello,

 

Since we've been getting a number of questions about it, here is a quick post that explains why Statistical Significance increases in a step-wise pattern instead of smoothly with Stats Engine. It has to do with our sequential testing procedure and how it accumulates evidence of a significant difference between variation and baseline over time. Read on for the full story!

 

Why does my significance increase in a stepwise manner?

 

The statistical significance calculation reflects the most conclusive evidence Stats Engine has seen towards a true difference between your variation and baseline. Conclusive evidence comes in two main forms: larger conversion rate differences and conversion rate differences that persist over more visitors.
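As a toy illustration of "the most conclusive evidence seen so far", the sketch below treats the reported value as a running maximum over a series of evidence readings. The daily numbers and the function name are made up for illustration; this is not how Stats Engine computes its evidence, only why a running maximum produces flat stretches and jumps.

```python
from itertools import accumulate

def stepwise_significance(evidence):
    """Running maximum: the reported value never drops below the best seen so far."""
    return list(accumulate(evidence, max))

# Hypothetical daily evidence readings (not real Stats Engine output).
daily_evidence = [0.0, 0.10, 0.35, 0.20, 0.25, 0.50, 0.45]
print(stepwise_significance(daily_evidence))
# [0.0, 0.1, 0.35, 0.35, 0.35, 0.5, 0.5]
```

Days where the reading falls below the earlier peak show up as flat segments; the line only moves when a new peak arrives.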

 

If you glance at the statistical significance chart on any results page, you’re likely to see a statistical significance line that starts flat for a while, increases sharply, then continues along a stepwise trajectory. When you see statistical significance increase sharply, you’re seeing the test accumulate more conclusive evidence than it had before. Conversely, during the flat periods, Stats Engine is not finding additional conclusive evidence beyond what it already knew about your test.

 

This is best explained by an example.

 

converion_rate_blur.png

stat_sig_over_time_blur1.png

 

In the first four days of the experiment above, the variation shows a large conversion rate improvement over the baseline. Yet to Stats Engine, this looks indistinguishable from chance fluctuation because there haven’t been many visitors. The large conversion rate difference has not persisted for enough visitors. Stats Engine shows 0% significance during this period because it has not yet seen any compelling evidence of a difference between variation and baseline.

 

converion_rate_blur.png

stat_sig_over_time_blur2.png

 

The big difference in conversion rates then persisted for another three days. With this additional evidence, statistical significance increased above 0%. Following this period, the improvement shrank. It turned out that the first seven days saw an abnormally high amount of traffic that preferred the variation. Stats Engine is built to protect you from concluding a variation is a winner too early in exactly these circumstances.

 

converion_rate_blur.png

stat_sig_over_time_blur3.png

 

Following the drop in improvement, from Friday the 13th (scary!) to Thursday the 19th, Statistical Significance remains flat. Stats Engine does not find any more compelling evidence than it had on the 12th because you now need relatively more visitors for the evidence in a smaller improvement to outweigh the earlier, larger improvement. Nevertheless, since the smaller improvement stays stable, eventually you attain a greater quantity of evidence than before, and statistical significance continues its climb.
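To get a feel for why a smaller improvement needs more visitors, here is a rough sketch using a classical two-proportion z-score as a stand-in for "amount of evidence". This is an assumption made purely for intuition: Stats Engine uses sequential testing rather than a z-score, and the rates, visitor counts, and function name below are hypothetical.

```python
import math

def z_score(diff, visitors_per_arm, base_rate):
    """Two-proportion z-score using a pooled-variance approximation."""
    std_err = math.sqrt(2 * base_rate * (1 - base_rate) / visitors_per_arm)
    return diff / std_err

# Hypothetical numbers: a 4-point lift early, shrinking to a 2-point lift later.
early = z_score(diff=0.04, visitors_per_arm=1000, base_rate=0.10)
later_same_traffic = z_score(diff=0.02, visitors_per_arm=1000, base_rate=0.10)
later_4x_traffic = z_score(diff=0.02, visitors_per_arm=4000, base_rate=0.10)

print(early, later_same_traffic, later_4x_traffic)
```

Under this approximation, evidence grows with the difference times the square root of the visitor count, so halving the improvement takes roughly four times the visitors to match the earlier evidence. Until that happens, the earlier peak stays the most conclusive evidence and the line stays flat.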

 

Every flat period of a statistical significance graph can be explained in exactly the same way - through the interplay of magnitude of improvement and persistence of the improvement - just on different time scales. Random fluctuations force this interplay into action on the level of days, hours and even minutes, depending on the volume of traffic to your site.  

 

The behavior of Statistical Significance in simulations.

 

Below is another example, this time using simulations, to provide intuition into how Stats Engine works.

 

the_behavior_of_statistical_significance.png

 

In the first set of charts we simulate a fixed conversion rate difference between two variations. The Statistical Significance over time increases in a progressive, regular stepwise manner. However, in the real world, observed conversion rates fluctuate, even if the true probability of converting remains the same. The charts to the right add in this natural variance and show Statistical Significance still increasing, but now in more widely spaced, irregular steps. This irregular stepwise behavior is expected of Statistical Significance calculated in the framework of Sequential Testing.
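If you would like to reproduce this kind of picture yourself, the sketch below simulates two variations with fixed true conversion rates and tracks a sequential significance value visitor by visitor. It is not Stats Engine's code. It assumes a mixture likelihood ratio with a normal prior on the difference (the `tau` parameter), a plug-in variance estimate, and equal traffic to both arms; the function and parameter names are invented for illustration.

```python
import math
import random

def simulate_significance(rate_a, rate_b, n_visitors, tau=0.05, seed=0):
    """Return 1 - (running minimum p-value) after each pair of visitors."""
    rng = random.Random(seed)
    conv_a = conv_b = 0
    p_value = 1.0
    significance = []
    for n in range(1, n_visitors + 1):
        # One new visitor per arm, each converting with its true rate.
        conv_a += int(rng.random() < rate_a)
        conv_b += int(rng.random() < rate_b)
        mean_a, mean_b = conv_a / n, conv_b / n
        diff = mean_b - mean_a
        var = mean_a * (1 - mean_a) + mean_b * (1 - mean_b)
        if var > 0:
            # Log of the assumed normal-mixture likelihood ratio against "no difference".
            shrink = var + n * tau ** 2
            log_lr = 0.5 * math.log(var / shrink) + (
                n ** 2 * tau ** 2 * diff ** 2
            ) / (2 * var * shrink)
            # Keep only the strongest evidence seen so far.
            p_value = min(p_value, math.exp(-log_lr))
        significance.append(1 - p_value)
    return significance

sig = simulate_significance(rate_a=0.10, rate_b=0.12, n_visitors=20000)
steps = sum(1 for prev, cur in zip(sig, sig[1:]) if cur > prev)
print(f"final significance: {sig[-1]:.3f}, upward steps: {steps}")
```

Because the p-value is a running minimum, the significance series can only stay flat or step up. Changing `seed` changes where the steps fall, which mirrors the irregular spacing described above.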


To learn more about the specific math behind Sequential Testing and how we utilize this framework in Stats Engine, check out our technical post.

 

Thanks for reading!

 

Leonid Pekelis
Statistician at Optimizely