At what point do you give up on your experiments?
The above is a screenshot taken from an experiment with 8,820 unique visitors, running for 48 days at a 90% significance level.
You can see that, despite a couple of swings, there appears to be a negligible difference between the variation and the original. With a suggestion of around 10 months of traffic before we hit statistical significance, I'm about to close it down.
At what point would you end the experiment?
When results are inconclusive, here are some of the other indicators I like to look at:
- Do conversion rates cross after the initial noise? If they don't (like in your case), that's a good indicator one variation might end up a winner (or loser).
- What do conversions and visitors over time look like? Are there steady patterns or irregular peaks and valleys? The former is a good indicator one variation might prove to be a winner, where the latter might lead us to let the test run longer to see if another full business week or weekend needs to be measured.
- How are the difference intervals trending relative to 0? If they're almost entirely to the left or right of 0, that's a good indicator a variation might win or lose. If 0 is just about in the middle, either let the test run longer or call it inconclusive.
- If you have a lot of goals, more time and more visitors is going to be required to get to statistical significance.
The number of visitors remaining is simply an indicator that, if nothing else changes about the experiment's current conditions, this is the number of visitors needed to reach statistical significance. This is somewhat impractical, as any number of things, such as moving into a weekend or your company launching a new email campaign that drives traffic to the site, could change the conditions of the experiment and cause this number to drop quite a bit.
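That "visitors remaining" figure can be roughed out with the standard normal-approximation sample-size formula for comparing two proportions. This is only a sketch of the idea, not Optimizely's actual calculation (its Stats Engine uses sequential statistics), and the function name and the 80% power assumption are mine:

```python
import math

def required_visitors_per_arm(p_base, p_var, z_alpha=1.645, z_power=0.84):
    # z_alpha = 1.645 -> two-sided 90% significance (the level this test used);
    # z_power = 0.84  -> a conventional 80% power. Both are normal quantiles.
    effect = abs(p_var - p_base)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# With the conversion rates reported later in this thread (1.69% original vs
# 1.62% variation), an absolute gap of 0.07 points needs a huge sample per arm:
n = required_visitors_per_arm(0.0169, 0.0162)
```

The takeaway is the same as the indicator above: for a difference this small relative to the baseline rate, the required sample dwarfs the traffic collected so far.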
The best indicator, in my opinion, is the difference interval. The gray bar tells you how much risk your business takes on if it rolls out the variation to the whole population. Without being able to see your exact interval endpoints, I'd roughly say the variation is about 3x as likely to perform worse than the baseline as to perform better (i.e. most of the interval sits below 0).
While there is a small chance you could see a positive lift from the variation, the ultimate question for your business is: what is worse, rolling out an inconclusive variation that turns out to be a loser, or NOT rolling out an inconclusive variation that turns out to be a winner? The latter could take the form of calling this test, making no changes to your site, and moving on to others; or it could take the form of letting this test run to statistical significance, but at the expense of moving on to other tests.
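The difference-interval idea can be approximated with a simple fixed-horizon Wald interval for the difference of two proportions. This is a sketch only (Optimizely's Stats Engine computes a sequential interval, so the real endpoints will differ); the visitor count and conversion rates are the thread's figures, assumed to be split evenly between arms:

```python
import math

# Thread figures: 8,820 visitors split evenly; 1.69% original vs 1.62% variation.
n = 8820 // 2
p_orig, p_var = 0.0169, 0.0162
diff = p_var - p_orig
se = math.sqrt(p_orig * (1 - p_orig) / n + p_var * (1 - p_var) / n)
z = 1.645  # two-sided 90% interval, matching the test's significance level
low, high = diff - z * se, diff + z * se
# If most of (low, high) sits below 0, rolling the variation out is more
# likely to hurt conversion than help it.
```

When the interval straddles 0 like this, the decision is a business judgment about downside risk rather than a statistical verdict.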
I would say it's time to end the test, but your business has to decide what next step is least risky.
Does that guidance help?
Solutions Architect | Optimizely, Inc.
As a general rule I would write off a test like that and not declare a winner and move focus to a new test.
Without knowing what you are testing it is difficult to give a clear piece of advice, but generally this would indicate that the variations in your test may be too similar and you should look to make bigger changes to your site for testing, or that the elements being tested might not actually be a big conversion driver.
Very interesting Harrison, thanks for your reply.
I've dived into learning about the difference interval, and it seems the variation is as bad as -0.93% or as good as +0.39%. So I guess this is where you got the rough 3x likelihood that rolling out the variation will lower the conversion rate?
How are the difference intervals trending relative to 0?
It's trending closer to 0, so inconclusive in that sense.
In terms of risk, we take on more risk by rolling the variation out, so I'm going to call it a day.
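The "roughly 3x" reading is just the ratio of the downside span of that interval to the upside span, using the endpoints quoted above:

```python
# Interval endpoints quoted above: worst case -0.93%, best case +0.39%.
low, high = -0.93, 0.39
downside = abs(low)        # span of the interval below zero
upside = high              # span of the interval above zero
ratio = downside / upside  # ~2.4:1, i.e. "roughly 3x" downside vs. upside
```

This is a heuristic, not a formal probability, but it makes the asymmetry of the risk easy to see at a glance.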
Thanks for your in-depth response,
Yes, it's a small change my manager requested, but it added friction on the page, as it took users away from the page goal, so I ran a test to decide.
Yup, you're exactly right on interpreting the difference interval. We're actually observing an absolute difference in conversion rates of -0.07% (1.62% - 1.69%), and given that this does fall within the difference interval, I'd say there's a pretty good chance the variation would be a loser if you rolled it out to everyone. I agree with your decision to call this test inconclusive and move on to other tests.
Further, @adzeds is also correct in that with a relatively small visitor size and small conversion rate, a great way to get closer to statistical significance is to go for bigger changes. The results of this test also lead us to believe that the added friction in taking a visitor away from the page will at best do a little bit better than the original experience, but has a 3x chance to do worse.
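As a rough illustration of why bigger changes reach significance faster: in the normal-approximation sample-size formula, the required sample scales roughly with 1/effect². A sketch (the function name is mine, and the z values assume this test's 90% significance level plus a conventional 80% power):

```python
import math

def n_per_arm(p_base, relative_lift, z_alpha=1.645, z_power=0.84):
    # z values: two-sided 90% significance and 80% power (normal quantiles).
    p_var = p_base * (1 + relative_lift)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_var - p_base) ** 2)

# Around a 1.69% baseline: quadrupling the relative lift you test for cuts
# the required visitors per arm by roughly a factor of 15.
small_change = n_per_arm(0.0169, 0.05)  # detect a 5% relative lift
big_change = n_per_arm(0.0169, 0.20)    # detect a 20% relative lift
```

So on a low-traffic, low-conversion-rate site, tests designed around bold changes are often the only ones that can conclude in a reasonable time.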
We have a lot of testing ideas for websites with small traffic here. Please let us know if you find these ideas helpful. Thanks!
Solutions Architect | Optimizely, Inc.