04-26-15

Statistical significance vs sample size


I understand that back in January a change was introduced to the Stats Engine, and that significance is now evaluated continuously while an experiment is running.

However, given that your calculator still shows a sample size figure, I'm not sure whether I can end a test as soon as statistical significance has been reached, regardless of how many users have run through the experiment.

On the other hand, what would be the best approach to test length for small and medium-sized sites?
Should I run an experiment until the sample size has been reached, or should I set a minimum of, say, 7 days and then stop the experiment regardless?

Thanks


Amanda 04-27-15

Re: Statistical significance vs sample size

Great question. Our statistician here at Optimizely posted a detailed response on statistical significance for lower-traffic sites here: https://community.optimizely.com/t5/Strategy-Culture/Testing-with-low-traffic-probability-of-no-diff...

I'd encourage you to take a look at this post. I'll also have my colleague follow up about the best way to use the calculator.
robinp 04-29-15

Re: Statistical significance vs sample size

Hi Moroandrea,

You are correct that Stats Engine now calculates statistical significance automatically while your test is running, without the need to set your sample size in advance. This means you can end your test as soon as you see statistically significant results at a level you're comfortable with (90% by default), and have confidence in the results.

The sample size calculator is meant to show you the longest you might have to wait given the effect you are looking for in your results. If a larger effect exists, Stats Engine will likely find it sooner. If Stats Engine is detecting either no effect or one that is smaller than the one you are looking for, your results will be inconclusive, and it might be time to move on to your next idea. That’s where the sample size calculator can really help you decide when to stop your test.

Here’s an example: Say your baseline conversion rate is 5%, the minimum detectable effect you’re willing to wait for is a 15% improvement, and you’re running your test at 90% significance. The sample size calculator in this case says you’ll need 10,500 visitors to detect that 15% improvement. If you run the test to 10,500 visitors and the results are still inconclusive, you can then decide whether to wait longer for the chance of seeing a smaller effect, or end your test wherever you are and move on to your next idea. On the flip side, if the effect is actually a 30% improvement, you might see the test reach significance as fast as a few thousand visitors.
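The stopping advice above boils down to three cases. Here is a minimal sketch of that logic; `next_step` and its parameters are hypothetical names for illustration, not part of any Optimizely API, and the visitor count is assumed to be measured the same way the sample size calculator measures it.

```python
def next_step(visitors: int, significance: float,
              calculator_sample_size: int, threshold: float = 0.90) -> str:
    """Hypothetical helper encoding the stopping advice described above.

    visitors: visitors seen so far (counted the way the calculator counts them)
    significance: current statistical significance reported for the test (0..1)
    calculator_sample_size: figure from the sample size calculator for your MDE
    threshold: significance level you are comfortable with (90% by default)
    """
    if significance >= threshold:
        # A significant result can be acted on whenever it appears.
        return "stop: significant result"
    if visitors >= calculator_sample_size:
        # Waited as long as the MDE requires and still inconclusive.
        return "inconclusive: wait longer for a smaller effect, or move on"
    return "keep running"


# Using the numbers from the example: 10,500 visitors at 90% significance.
print(next_step(3_000, 0.93, 10_500))   # large effect found early
print(next_step(10_500, 0.60, 10_500))  # reached the calculator figure, inconclusive
print(next_step(5_000, 0.70, 10_500))   # not there yet
```

The calculator figure acts only as an upper bound on how long to wait, never as a requirement before a significant result can be trusted.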

You can read more about how to use the sample size calculator with Stats Engine in the knowledge base.

claudiobellei 03-22-16

Re: Statistical significance vs sample size

Hi robinp,

Your explanation is clear, but I would like to understand better how the sample size calculator compares with classical statistics. I am used to calculating sample sizes after specifying a power (usually 80%) and a significance level (usually 5%). With the new Stats Engine, however, we no longer need to specify a power, and that's what confuses me.
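For reference, this is the classical fixed-horizon calculation I have in mind: the standard normal-approximation formula for a two-sided, two-proportion z-test, with the MDE taken as a relative lift. It is a textbook sketch, not Optimizely's calculator, which targets a sequential test and will give different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def classical_sample_size(baseline: float, relative_mde: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors per variation for a fixed-horizon, two-sided two-proportion z-test.

    baseline: control conversion rate, e.g. 0.05 for 5%
    relative_mde: relative lift to detect, e.g. 0.15 for a 15% improvement
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value under H0
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# 5% baseline, 15% relative MDE, 5% significance level, 80% power.
print(classical_sample_size(0.05, 0.15))
```

With these inputs the formula gives about 14,200 visitors per variation. Power is an explicit input here, which is exactly the parameter the Stats Engine calculator does not ask for.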

My question is: is there a way to quantify what the sample size calculator's figure actually means? For example, if we wait for the number of visitors suggested by the calculator (for a given baseline and MDE), should we expect to find an effect at least 50% of the time within that period?
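In the classical setting, "the fraction of the time we find the effect within N visitors" is simply the power, and it can be checked by simulation. The sketch below assumes a fixed-horizon pooled z-test and approximates each arm's conversion count with a normal distribution instead of drawing exact binomials; it does not model Stats Engine's sequential procedure, so it cannot answer the question for Optimizely's calculator directly.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_power(p1: float, p2: float, n: int, alpha: float = 0.05,
                    trials: int = 4000, seed: int = 1) -> float:
    """Share of simulated fixed-horizon tests (n visitors per arm) that reach significance."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        # Normal approximation to Binomial(n, p) for each arm's conversion count.
        c1 = rng.gauss(n * p1, sqrt(n * p1 * (1 - p1)))
        c2 = rng.gauss(n * p2, sqrt(n * p2 * (1 - p2)))
        pooled = (c1 + c2) / (2 * n)
        se = sqrt(2 * pooled * (1 - pooled) / n)
        z = (c2 - c1) / n / se  # difference in observed rates over its standard error
        if abs(z) >= z_crit:
            hits += 1
    return hits / trials


# 5% baseline, 15% relative lift (5.75%), ~14,193 visitors per arm.
print(simulated_power(0.05, 0.0575, 14_193))  # close to the 80% target
print(simulated_power(0.05, 0.05, 14_193))    # no true effect: close to alpha
```

If the calculator's figure were defined by a power target, running the same kind of simulation against Stats Engine's decision rule would give that percentage.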

Thanks!
Claudio