Statistical Test Choice & Power Analysis

Status: Done
April 16, 2014 - edited April 16, 2014

Choosing Statistical Test:

It would be helpful if there were different statistical tests to choose from when looking at the results. We generally use two-tailed tests, so we take the number of visitors and conversions and run the results in R.
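The kind of test described here, run on visitor and conversion counts, would typically be a two-tailed two-proportion z-test. The thread mentions doing this in R; a minimal Python equivalent (using only the standard library, with illustrative function and variable names) looks like this:

```python
from math import sqrt, erfc

def two_tailed_z_test(conv_a, n_a, conv_b, n_b):
    """Two-tailed two-proportion z-test on conversion counts."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))                   # two-tailed p-value
    return z, p_value

# e.g. 1,000 visitors per variation; 100 vs. 130 conversions
z, p = two_tailed_z_test(100, 1000, 130, 1000)
```

In R this is roughly `prop.test(c(100, 130), c(1000, 1000))` without continuity correction.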

Power Analysis:

This would tell the user what kind of effect size he or she could expect to be able to observe given the traffic so far (or how many additional visitors would be needed to reach a certain confidence level).
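As a sketch of what such a power analysis could compute, here is the standard fixed-horizon sample-size formula for comparing two proportions (the function name and default parameters are illustrative, not anything Optimizely ships):

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variation(p_base, mde, alpha=0.05, power=0.8):
    """Visitors needed per variation to detect a relative lift `mde`
    over baseline conversion rate `p_base` at the given alpha and power."""
    p_var = p_base * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_var) ** 2)

# e.g. detect a 10% relative lift on a 5% baseline conversion rate
n = visitors_per_variation(0.05, 0.10)
```

Inverting the same formula for a fixed traffic level gives the smallest detectable effect, which is the other direction of the request above.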

Status: Done

With our new Stats Engine, Optimizely now uses sequential hypothesis testing with false discovery rate controls to calculate statistical significance. This means that the p-value that you see is a valid measure of statistical significance at all times that your experiment is running.
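The false discovery rate control mentioned here is, in spirit, what classical multiple-testing procedures such as Benjamini-Hochberg do. Optimizely's exact implementation is not shown in this thread; the textbook step-up procedure, for comparison, is:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at FDR level q
    (classic Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    ranked = sorted(range(m), key=lambda i: p_values[i])
    # find the largest rank k with p_(k) <= (k / m) * q,
    # then reject the k smallest p-values
    k = 0
    for rank, i in enumerate(ranked, start=1):
        if p_values[i] <= rank / m * q:
            k = rank
    return sorted(ranked[:k])

# e.g. four goals tested at once: two clear effects, two likely nulls
rejected = benjamini_hochberg([0.001, 0.009, 0.04, 0.6])
```

Controlling the false discovery rate rather than the per-test error rate matters when an experiment tracks many goals and variations at once.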

As for power, there is no longer a need to set statistical power explicitly. Instead, the specificity of your results depends only on how long you are willing to wait. The sequential test we implemented is a test of power one, which means it will always detect a non-zero effect size if you wait for enough samples. Waiting longer on any test gives you a better chance of detecting a winner or loser, if one exists. And instead of the effect size you can expect to detect with a given power at your traffic so far, we now give an in-product estimate of how much more traffic you need to call your test significant, assuming the currently observed effect size stays the same.
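The "visitors remaining" estimate described above can be roughly approximated under fixed-horizon assumptions (this is not Optimizely's sequential math, and the names are illustrative) by plugging the currently observed rates back into a sample-size formula:

```python
from math import ceil
from statistics import NormalDist

def extra_visitors_needed(conv_a, n_a, conv_b, n_b, alpha=0.05, power=0.8):
    """Rough estimate of additional visitors per variation needed for
    significance, assuming the observed effect size stays the same."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    n_target = ceil(z ** 2 * variance / (p_a - p_b) ** 2)
    return max(0, n_target - min(n_a, n_b))

# e.g. 5% vs. 6% observed so far on 1,000 visitors per variation
extra = extra_visitors_needed(50, 1000, 60, 1000)
```

A sequential test of power one would adjust this bound as data accumulates, but the fixed-horizon figure conveys the idea of the in-product estimate.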

Level 2

April 28, 2014
Status changed to: New

Community Manager
June 2, 2014
Status changed to: Great Idea!

Optimizely
June 17, 2014

I agree that adding power analysis would be great and would improve the conclusions drawn from tests.

Level 2
October 27, 2014

Choosing statistical tests: including Bayesian ...

Level 2
January 23, 2015

solved ... (new stat engine)

Level 2
January 28, 2015
Status changed to: Done

Optimizely