Statistical significance vs sample size
I understand that a change to the stats engine was introduced back in January, and that significance is now evaluated on an ongoing basis while an experiment runs.
However, since your calculator still shows a sample size figure, I'm not sure whether I can end a test once statistical significance has been reached, regardless of the number of users that have run through the experiment.
On the other hand, what would be the best approach for small and medium-sized sites in terms of testing length?
Should I run an experiment until the sample size has been reached, or should I set a minimum duration of, say, 7 days and then stop the experiment anyway?
I'd encourage you to take a look at this post. I'll also have my colleague follow up about the best way to use the calculator.
You are correct that Stats Engine now calculates statistical significance automatically while your test is running, without the need to set your sample size in advance. This means you can end your test as soon as you see statistically significant results at a level you're comfortable with (90% by default), and have confidence in the results.
Your explanation is clear, but I would like to better understand how the sample size calculator compares against classical statistics. I am used to calculating sample size after specifying a power (usually 80%) and a significance level (usually 5%); however, with the new stats engine we are no longer required to specify a power, and that's what confuses me.
My question is: what does the sample size calculator's figure actually mean? For example, if we wait for the number of visitors suggested by the calculator (for a given baseline and MDE), should we expect to detect a true effect at least 50% of the time within that period?
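For reference, here is a minimal sketch of the classical fixed-horizon calculation being compared against: the standard two-proportion sample size formula, with power and significance specified up front. This is an assumption about the textbook formula, not a claim about how Optimizely's calculator works internally; the function name and defaults are illustrative.

```python
from math import ceil
from statistics import NormalDist

def classical_sample_size(baseline, mde, alpha=0.05, power=0.80):
    """Classical per-variation sample size for a two-proportion test.

    baseline: control conversion rate, e.g. 0.10 for 10%
    mde: relative minimum detectable effect, e.g. 0.20 for a +20% lift
    alpha: two-sided significance level (classical default 5%)
    power: probability of detecting a true effect of size mde (default 80%)
    """
    p1 = baseline
    p2 = baseline * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# 10% baseline, 20% relative MDE -> roughly 3,800 visitors per variation
print(classical_sample_size(0.10, 0.20))
```

The power parameter answers exactly the "50% of the time" question above: with power = 0.80, a true effect of the given size is detected in about 80% of experiments of this length, not 50%.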