
When to declare a winner...

linda 06-13-14
Accepted Solution

When to declare a winner...

I'm currently running a test on an e-commerce website in Optimizely.  You can see a screenshot of the results here, http://take.ms/HbbRV.

 

According to Optimizely, my Goals are showing the Challenger winning the test.

 

Goal 1: Sale Completed is showing a 137% improvement at 100% confidence.

Goal 2: Revenue is showing a 118% lift at 100% confidence.

 

My feeling is that it's too early to stop the test as it's only been about a week of testing and the sample is 1,231 unique visitors. 

 

Should I let the test run another week, or should I cut and run with the Challenger right now?
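
For anyone who wants to gut-check numbers like these outside of Optimizely, here is a minimal sketch of a two-proportion z-test in Python using statsmodels. The per-variation visitor and conversion counts below are hypothetical placeholders, since the raw counts behind the screenshot aren't in the thread.

# Quick significance sanity check for an A/B split (hypothetical counts).
# Requires: pip install statsmodels
from statsmodels.stats.proportion import proportions_ztest

conversions = [12, 29]   # control, challenger conversions (placeholders)
visitors = [616, 615]    # ~1,231 unique visitors split roughly evenly (assumption)

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 roughly corresponds to 95% confidence

With samples this small, a handful of extra conversions in either arm can move the p-value a lot, which is part of why a single week can look more decisive than it is.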

Level 2

adzeds 06-13-14
 

Re: When to declare a winner...

Hi.

Looking purely at the numbers it is still too early to call a winner in this test from my point of view.

What are you testing? Is it a big change to the website or a relatively small change?

The stats are looking good, though, so I would expect the challenger to win in the end, but I would not be happy reporting those numbers to anyone unless there was a lot more confidence than there currently is.

With a few more details about the test I might be able to lean on some of my experience to give a guide as to whether you are onto a winner.



David Shaw
Level 11
linda 06-13-14
 

Re: When to declare a winner...

Thanks for your quick response. We're testing product page clarity. There have been a number of minor changes around sizing info, delivery, etc., as well as styling changes. You can see it at Waggle.com.au.

Level 2
adzeds 06-13-14
 

Re: When to declare a winner...

Hi,

Nice website. It looks well laid out.

I think I would give the test another week; if performance is still at the level it is now, you should be pretty safe to go for it.
David Shaw
Level 11

Re: When to declare a winner...

Hi Linda,

I would agree with adzeds about letting the test run for a bit longer. The disparity between the test and control pages is wide, and it certainly seems like the challenger is going to win.

Website traffic can have different behavior patterns throughout the week/year. On a macro scale this is referred to as seasonality, i.e. busy summer months, Christmas, etc.

On a smaller scale, for most sites the traffic on a Sunday will perform differently than a Tuesday.
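
A minimal sketch of how you might check this in your own data, assuming you can export a per-visitor log; the file name and column names (visit_date, converted) are made up for illustration:

# Check day-of-week conversion-rate swings from a raw visitor log.
# Assumes a CSV with hypothetical columns: visit_date, converted (0/1).
import pandas as pd

visits = pd.read_csv("visitor_log.csv", parse_dates=["visit_date"])
visits["day_of_week"] = visits["visit_date"].dt.day_name()

by_day = visits.groupby("day_of_week")["converted"].agg(["mean", "count"])
print(by_day)  # a Sunday vs. Tuesday gap here is exactly why full weeks matter

If the per-day rates swing noticeably, you want every day of the week represented in both variations before calling the test.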

A good analogy for explaining this to clients is that website traffic is like the weather. Monday might be sunny and 70 degrees and on Friday it might be windy, rainy, and 45 degrees.

On a very micro scale, we've seen major news events affect conversion rates. During the Malaysian Airlines disappearance we saw conversion rates rip across several sites. This can also happen for things like presidential elections, etc.

Hope this helps a bit. Kudos on your test, Linda. It looks like it's going very well.

Happy testing!

Keith
Keith Lovgren

Khattaab 06-13-14
 

Re: When to declare a winner...


Hi Linda, your question raises an essential point about testing strategy. It's important to have a standardized approach to test duration. While 2 full weeks (weekdays and weekends) is a frame of reference, there's no universal truth around how long to run experiments. Best practice is to use an A/B test sample size calculator before you run the experiment (there's a minimal sketch of that arithmetic after the list below). If you don't have baseline conversion metrics before starting the experiment, it's acceptable to run an experiment for 1-2 weeks with no variations to track the conversion rate on the Original variation.

 

1) Baseline Conversion Rate: CR of original before testing; if Original CR is not available, run a test with no variations for at least 1 week for a rough estimate.
 
2) Minimum Detectable Effect: Minimum lift that you determine is meaningful to your business (higher MDE numbers will require less traffic). MDE introduces the consideration of opportunity cost. If you're chasing 1-2% lift, Optimizely's statistical model will require substantially more traffic to identify a statistically significant difference in conversion rates. When revisiting past experiments to evaluate statistical power, enter the highest lift/loss you achieved to confirm if you had a large enough sample.
 
3) Statistical Power: as low as 80% is reliable, but when making major changes, err on the conservative side of 90%+
 
4) Statistical Significance Level: 95%
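
As a rough illustration of how those four inputs turn into a required sample size, here is a minimal Python sketch using statsmodels; the baseline rate and MDE values are placeholders, not numbers from Linda's test.

# Rough A/B sample-size estimate from the four inputs above.
# Requires: pip install statsmodels
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_cr = 0.03   # 1) baseline conversion rate (placeholder)
mde = 0.20           # 2) minimum detectable effect: +20% relative lift (placeholder)
power = 0.80         # 3) statistical power
alpha = 0.05         # 4) significance level (95%)

target_cr = baseline_cr * (1 + mde)                      # the rate you hope to detect
effect = proportion_effectsize(baseline_cr, target_cr)   # Cohen's h effect size

n_per_variation = NormalIndPower().solve_power(
    effect_size=effect, alpha=alpha, power=power, ratio=1.0,
    alternative="two-sided")

print(f"~{n_per_variation:.0f} visitors needed per variation")

Shrinking the MDE (chasing a 1-2% lift) blows up the required sample size quickly, which is the opportunity-cost trade-off described in point 2.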
 
What is statistical significance?
Statistical significance is the probability that the observed change in behavior is not due to chance alone. This is what Optimizely calls Chance to Beat Baseline.
 
What is statistical power?
Statistical power refers to the probability that your test will identify a statistically significant difference when such a difference actually exists. Power is the probability that you will reject the null hypothesis when you should, thereby avoiding a false negative.
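
To make the power idea concrete, here is a hedged sketch of a post-hoc power check, in the spirit of revisiting a past experiment as mentioned in point 2 above. The even visitor split and the control conversion rate are assumptions for illustration, not figures from Linda's results.

# Post-hoc power check: given the sample you actually collected and the
# lift you observed, how likely were you to detect a real effect?
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

n_per_variation = 615     # assumption: ~1,231 unique visitors split evenly
control_cr = 0.02         # assumption: hypothetical control conversion rate
observed_lift = 1.37      # +137% relative lift, as reported in the thread
variant_cr = control_cr * (1 + observed_lift)

effect = proportion_effectsize(control_cr, variant_cr)
achieved_power = NormalIndPower().solve_power(
    effect_size=effect, nobs1=n_per_variation, alpha=0.05,
    power=None, alternative="two-sided")

print(f"Achieved power: {achieved_power:.2f}")  # aim for 0.80, or 0.90+ for major changes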
 
The data Optimizely provides is actionable once the statistical significance and power conditions have been met.
Khattaab Khan
Director, Experience Optimization | BVAccel
Level 5