Testing for gross margins
At my company we A/B test for gross margins, not for conversion rates.
One of our tests has 7.8k uniques, and the B version shows a 35% improvement, yet the statistical significance level is only 65%. Why is that?
Another gross-margin experiment has only 4k uniques and just a 6% improvement, yet its significance level is 73%.
Why is it higher than in the first test?
Thank you for taking the time to answer these questions.
These are great questions, and I'll try to answer them succinctly.
The size of the observed improvement is separate from the statistical significance we attach to it. In other words, given whatever improvement we observe, we make a statement about how confident we are that it reflects a real difference between the original and the variation rather than random chance. That statement is statistical significance.
Optimizely can detect both very small and very large differences between variations. To declare a very small lift statistically significant, we need a large number of visitors per variation; conversely, a large difference may reach statistical significance with fewer visitors. The variability of the metric also matters: a noisy metric such as per-visitor gross margin needs more visitors to reach the same significance level than a steadier one.
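That variance effect is the most likely explanation for your two results. Here is an illustrative sketch (a classical fixed-horizon test, which is a simplification of Optimizely's Stats Engine, and all of these means, standard deviations, and visitor counts are made-up numbers, not your actual data) showing how a 35% lift on a very noisy margin metric can come out less significant than a 6% lift on a much steadier one:

```python
import math
from statistics import NormalDist

def welch_z_test(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Classical two-sided test for a difference in two means.

    Returns (relative_lift, p_value), using a normal approximation,
    which is reasonable at sample sizes in the thousands.
    """
    # Standard error of the difference between the two means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_b - mean_a) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return (mean_b - mean_a) / mean_a, p

# Hypothetical test 1: big lift (35%) but very noisy per-visitor margin.
lift1, p1 = welch_z_test(10.0, 13.5, 165.0, 165.0, 3900, 3900)

# Hypothetical test 2: small lift (6%) on a much steadier margin metric.
lift2, p2 = welch_z_test(10.0, 10.6, 17.0, 17.0, 2000, 2000)

print(f"test 1: lift {lift1:.0%}, confidence {1 - p1:.0%}")
print(f"test 2: lift {lift2:.0%}, confidence {1 - p2:.0%}")
```

Despite the much larger lift and visitor count, the first test is less significant because its per-visitor margins vary so widely that a 35% average lift is still within the range random noise could produce.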
While Stats Engine will always show you valid results (eliminating the need to lock in a sample size up front with a calculator), you can still use our sample size calculator as a guide to the number of visitors you might need for a given expected improvement. You'll see that, in general, a smaller lift requires more visitors.
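As a rough guide to what such a calculator is doing under the hood, here is a sketch of the standard fixed-horizon sample-size formula for a mean metric (again a simplification of Stats Engine, with a hypothetical baseline margin and standard deviation). It shows that required visitors grow with the square of the inverse of the lift:

```python
import math
from statistics import NormalDist

def visitors_per_variation(baseline_mean, baseline_sd, relative_lift,
                           alpha=0.05, power=0.8):
    """Classical sample size to detect a relative lift in a mean metric:

        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    delta = baseline_mean * relative_lift  # absolute difference to detect
    return math.ceil(2 * (z ** 2) * (baseline_sd ** 2) / delta ** 2)

# Hypothetical baseline: $10 average margin per visitor, sd of $40.
for lift in (0.35, 0.06):
    n = visitors_per_variation(10, 40, lift)
    print(f"{lift:.0%} lift needs ~{n} visitors per variation")
```

Halving the lift you want to detect roughly quadruples the visitors required, which is why small improvements take so much longer to reach significance.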
Please let us know if this helps clarify what your results are telling you, or if you'd like a deeper 1-on-1 review of your experiment. Thanks!
Solutions Architect | Optimizely, Inc.