02-20-19

I've found some helpful blog posts in the Knowledge Base on how to interpret confidence intervals, but they all seem to tiptoe around how business users may or may not misuse them.

In this post, it says of the confidence interval in the image below: "When implementing this variation, you can say, 'We implemented a test result that we are 90% confident is better than -1.35% worse, but not more than 71.20% better.'"

I get that this description is factually accurate, but who could utter a sentence like that in a meeting without sounding like a dipshit?

Looking at these results, what I would really want to say is: although this is technically inconclusive, if you implemented it, it's much more likely to produce a positive result than a negative one.

My question is, would this be accurate? It's my understanding that in statistics, all values within a confidence interval are equally likely. So since there are more values to the right of zero, isn't a positive value more likely?

I suspect it's not that simple, but I'd like to know why. Thanks!

Level 2

JasonDahlin 02-21-19

[ Edited ]

The values inside the confidence interval are not equally likely to occur. With a normal distribution, the value in the middle is the most likely and the values at the edges are the least likely.

To get a confidence interval, you look at the distribution of all results and ask, "What is the range that captures 95% of the values?" At 95%, your range (interval) spans roughly +/-1.96 standard deviations, so it captures almost all of the first 2 standard deviations.

If the data has a normal distribution, it will look like a traditional bell curve and the expected improvement marker will be in the middle of the interval. If the data is skewed, the marker will be closer to one of the edges.
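To make the "not equally likely" point concrete, here's a minimal sketch with made-up numbers (1,000 visitors, ~12% conversion rate, and a normal approximation for the sampling distribution of a proportion; none of these figures come from the thread). It builds a 95% interval as the point estimate +/-1.96 standard errors and shows the normal density is highest at the center of the interval and lowest at its edge:

```python
import math
import random

random.seed(0)

# Hypothetical A/B test data: 1,000 visitors, ~12% true conversion rate.
n = 1000
conversions = sum(1 for _ in range(n) if random.random() < 0.12)
p_hat = conversions / n  # observed conversion rate (point estimate)

# Standard error of a proportion, and the 95% interval (+/-1.96 SE).
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution: peaks at mu, falls off toward the tails."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(f"point estimate: {p_hat:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
# The center of the interval is far more likely than its edges:
print("density at center:", normal_pdf(p_hat, p_hat, se))
print("density at edge:  ", normal_pdf(ci_high, p_hat, se))
```

The same logic is why, in a skewed (non-normal) case, the most likely value sits off-center rather than in the middle of the interval.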

With that said, a non-math non-tester would want to hear something like "Based on the data collected so far, it looks like we will see a 34% improvement, but the data set is still small, so that's not a guarantee."

If they are asking for a more detailed response, such that you need to mention the entire range, I would add "We need to run the test longer to create a smaller range - currently it could be as low as -1% or as high as +72%, but the expectation is around 34%."

edit: Based on your post, I must sound like a dipshit in every meeting. But I don't have meetings to review test results with people who are not interested in testing or project accountability. We expect everyone to be prepared to discuss results using the terms appropriate for the topic, so they need to know how all of this works; if they don't, they are expected to talk with me after the meeting or look it up on their own.

--Jason Dahlin
Analytics and Testing Guru

Experimentation Hero
Leandra_Kim 02-21-19

Hi Ahinton,

This is a great topic, and I agree with JasonDahlin's explanation: the confidence interval is designed to give you a range of potential uplift or drag in your conversions. This is especially insightful if you need to take action before a test has reached statistical significance.

In your example, if you were to roll this variation out into production now, you could see a potential conversion lift as high as 71.20% or a possible drag of -1.35%, based on the data gathered so far. The potential drag is marginal compared to the potential uplift, so you might decide to move forward, but the true improvement will lie somewhere in between, as Jason mentioned. As a test gains more data and the statistical significance percentage increases, the confidence interval will become narrower and move to the right of 0, giving you a more accurate range of lift.
Optimizely
ahinton 02-27-19