## Statistical Test Choice & Power Analysis

Status: Done

**Choosing Statistical Test:**

It would be helpful if there were different statistical tests to choose from when looking at the results. We generally use two-tailed tests, so today we take the visitor and conversion counts and run the numbers through R ourselves.
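For illustration, the kind of two-tailed test we run by hand in R can be sketched in Python with just the standard library. The function name and the example counts below are hypothetical, not anything Optimizely exposes:

```python
from statistics import NormalDist

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-tailed z-test for a difference in conversion rates
    between a baseline (a) and a variation (b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    # Two-tailed p-value from the standard normal CDF
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 100/5000 conversions vs. 135/5000
z, p = two_proportion_ztest(100, 5000, 135, 5000)
```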

**Power Analysis:**

This would tell users what effect size they could expect to be able to detect given the traffic so far (or how many additional visitors would be needed to reach a given confidence level).
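As a sketch of the request, the standard fixed-horizon sample-size formula for a two-proportion test answers the "how many visitors do I need" question. This is the textbook calculation, with hypothetical names and inputs, not a description of any product feature:

```python
import math
from statistics import NormalDist

def visitors_per_variation(p_base, mde, alpha=0.05, power=0.8):
    """Visitors needed per variation to detect an absolute lift
    of `mde` over baseline rate `p_base` with the given two-tailed
    significance level and power (classic normal-approximation
    sample-size formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power=0.8
    p_var = p_base + mde
    var_sum = p_base * (1 - p_base) + p_var * (1 - p_var)
    n = var_sum * (z_alpha + z_beta) ** 2 / mde ** 2
    return math.ceil(n)

# Hypothetical scenario: 2% baseline, detect an absolute lift of 0.5%
n = visitors_per_variation(0.02, 0.005)
```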

With our new Stats Engine, Optimizely now uses sequential hypothesis testing with false discovery rate controls to calculate statistical significance. This means that the p-value you see is a valid measure of statistical significance at any time while your experiment is running.

As for power, there is no longer a need to set statistical power explicitly. Instead, the specificity of your results depends only on how long you are willing to wait. The sequential test we implemented is a test of power one, meaning that it will always detect a non-zero effect if you wait for enough samples. Waiting longer on any test gives you a better chance of detecting a winner or loser, if one exists. And instead of the effect size you could expect to detect with a given power at your current traffic, we now give an in-product estimate of how much more traffic you need to call your test significant, assuming the currently observed effect size holds.
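To make the "valid at all times" idea concrete, here is a minimal sketch of the mixture sequential probability ratio test (mSPRT) for a normal mean, the textbook construction behind always-valid p-values. This is a generic illustration under simplifying assumptions (known variance, a normal mixing prior), not Optimizely's exact implementation:

```python
import math

def msprt_pvalue(diffs, sigma2, tau2=1e-4):
    """Always-valid p-value from a stream of per-visitor
    differences `diffs`, via an mSPRT with known observation
    variance sigma2 and mixture variance tau2 (a tuning
    parameter). The running minimum of 1/likelihood-ratio is a
    valid p-value at every sample size."""
    p, s, n = 1.0, 0.0, 0
    for d in diffs:
        n += 1
        s += d
        # Mixture likelihood ratio for H0: mean = 0 against a
        # N(0, tau2) prior over the alternative mean
        lam = math.sqrt(sigma2 / (sigma2 + n * tau2)) * math.exp(
            tau2 * s * s / (2 * sigma2 * (sigma2 + n * tau2)))
        p = min(p, 1.0 / lam)  # p-values may only shrink over time
    return p
```

Because the p-value only ever shrinks as evidence accumulates, peeking at it mid-experiment never invalidates it; with no effect it stays near 1, and with a real effect it eventually drops below any threshold, which is the "power one" property described above.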
