What are good ways to explain statistical significance to colleagues?
There are many people in my organization that run tests - on the website, for ads, emails, and so forth. They do a good job of coming up with good test ideas, but they don't always understand the math behind a test.
For example, a recent test had 4 variants, ~3,000 total views, and ~20 total conversions. The person running this test was excited to declare a winner because the percentages looked so different. However, the actual counts were so small that the results could not be statistically significant.
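To sanity-check a result like this yourself, you can run the counts through a chi-square test of independence. The per-variant split below is hypothetical (the post only gives totals), but it shows how percentages that look "very different" can still be far from significance at these volumes:

```python
# Hypothetical per-variant numbers: four variants, ~750 views each,
# ~20 conversions total (the real split isn't in the post).
views = [750, 750, 750, 750]
conversions = [3, 4, 6, 7]  # rates range from 0.4% to 0.9% - looks dramatic!

total_views = sum(views)
total_conv = sum(conversions)
overall_rate = total_conv / total_views

# Chi-square test of independence on the 4x2 table (converted / not converted)
chi_sq = 0.0
for v, c in zip(views, conversions):
    for observed, rate in ((c, overall_rate), (v - c, 1 - overall_rate)):
        expected = v * rate
        chi_sq += (observed - expected) ** 2 / expected

critical = 7.815  # chi-square critical value for df = 3, alpha = 0.05
print(f"chi-square = {chi_sq:.2f} vs critical value {critical}")
print("significant" if chi_sq > critical else "not significant")
```

With these numbers the statistic comes out around 2, nowhere near the 7.8 needed for 95% confidence, even though the best variant's rate is more than double the worst's.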
I'm trying to find the best way to communicate 1. what stat sig really means and 2. why more volume (views and conversions) is needed to reach it. My colleagues have been testing for a long time, and communicating this to them will basically show that most of their tests have not had stat sig. So I'm trying to find a tactful and effective way to explain this, so they will continue to work with me/my team on future tests.
Have any of you faced similar situations? How could I communicate stat sig in a way that makes sense to non-analytical people?
I usually use three pictures to explain what it means and which threshold is impressive:
- Statistical significance is the chance that the result was not generated by coincidence. That means even if 50% is reached, the odds are only even that it is a coincidence or not. At 66%, the chance that the result is not a coincidence is twice as high as the chance that it is. And so on...
- Coin-toss examples often work, especially with numbers like the ones you mentioned: toss a coin 3,000 times and get 1,520 heads and 1,480 tails... is there really something special or wrong with the coin? ;-)
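If anyone wants to check that coin with actual numbers, here is a quick sketch of the normal approximation to the binomial, in pure Python with no stats library:

```python
import math

# Two-sided test: is 1,520 heads out of 3,000 tosses evidence of a biased coin?
tosses, heads = 3000, 1520
expected = tosses * 0.5
std = math.sqrt(tosses * 0.5 * 0.5)  # binomial standard deviation for a fair coin
z = (heads - expected) / std         # the 20-toss surplus is only ~0.73 std devs

# Two-sided p-value from the normal approximation
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```

The p-value comes out around 0.47: a perfectly fair coin would show a gap at least this large almost half the time, so there is nothing suspicious about the coin at all.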
- To explain that 95% statistical significance is the minimum, I use Russian roulette: if 80% were enough to be sure, would you play Russian roulette with one bullet and four empty chambers? The chance of survival is 80%...
Transferring the issue to more familiar problems works best, in my experience...
Thanks @CouchPsycho for your response. Those are really good suggestions. I like the idea of using something they are familiar with to explain why statistical significance is important.
Do you ever have to explain how statistical significance is reached? I feel that my coworkers want to reach stat sig / 90-95% confidence, but they don't know what it takes to get there. Maybe the coin-toss analogy would help show that too. For example, after 10 tosses you may see a high enough percentage to declare a "winning" side, but not enough to have stat sig; if you keep flipping the coin 10,000+ times, the results should settle around ~50%.
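A tiny simulation makes that point vividly (my own sketch, not from the thread): at 10 tosses a fair coin often looks lopsided, while at 10,000 tosses it settles near 50%.

```python
import random

random.seed(42)  # fixed seed so the demo is reproducible

def heads_rate(n):
    """Flip a fair coin n times and return the fraction of heads."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# With 10 tosses, 70/30 splits happen all the time on a perfectly fair coin...
print(f"10 tosses:     {heads_rate(10):.0%} heads")
# ...but with 10,000 tosses the rate ends up very close to 50%.
print(f"10,000 tosses: {heads_rate(10000):.1%} heads")
```

Running it a few times with different seeds drives the point home: the small sample swings wildly while the large one barely moves.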
Any thoughts on how to explain what thresholds are needed, and what else is needed to have stat sig?
For statistical significance, I usually go with "it's the chance that the observed effect is not due to chance" and "if we ran this test 100 times, it's the number of times we would expect the test to beat the control".
For "how to reach significance" I usually speak in terms of "power": how powerful is the change? If the change is very powerful, you'll know much more quickly than if the impact is very small. To illustrate, contrast two experiments where one is very powerful and the other is very insignificant.
- Powerful: suppose you had a page without a call-to-action button and tested it against a version that has one. You would quickly see a statistically significant difference in the number of people clicking the call to action on the version that actually has the CTA.
- Insignificant: take the page that already has a CTA and change the button from square corners to rounded corners. You would expect this to have a very small impact, and if there were any impact at all, it would take a tremendous number of users before you could be confident the impact was in fact real.
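To put rough numbers on that contrast, here is the standard two-proportion sample-size formula (normal approximation). The baseline and lift figures are hypothetical, chosen only to illustrate the two bullets above:

```python
import math

def sample_size_per_variant(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Rough per-variant sample size for a two-proportion test.
    Defaults: two-sided alpha = 0.05, power = 0.80."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# "Powerful" change: adding a CTA lifts click rate from 1% to 5% (hypothetical)
n_powerful = sample_size_per_variant(0.01, 0.05)
print(f"CTA vs no CTA:       ~{n_powerful} visitors per variant")

# "Insignificant" change: rounded corners lift clicks from 5.0% to 5.2%
n_tiny = sample_size_per_variant(0.050, 0.052)
print(f"Rounded vs square:   ~{n_tiny} visitors per variant")
```

The powerful change needs only a few hundred visitors per variant, while the corner tweak needs hundreds of thousands - which is exactly why tiny cosmetic tests almost never reach significance on normal traffic.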
@JasonDahlin's "powerful" definition is great. I would like to illustrate it like this:
Do you know those puzzles in newspapers or for children where you have to find ten differences between two pictures that seem more or less identical? Transfer this to screenshots of your variations. The time it takes to find all the differences is an index of the contrast, and therefore of the probability of reaching significance quickly.
The quicker you find all the differences, the more powerful the change, which means a good chance of reaching significance fast.
If it takes you a long time to find them, the probability is quite low.
You may also ask colleagues familiar with the site: just show them the change and see if they say something like "Oh, X has changed" or "There is something new"... If they do not notice any change quickly, there is no "power", no contrast, and probably no effect at all.
One main problem is the "change the button color" myth: many people associate web A/B testing with the great power of button colors. One just has to "use the force" of it ;-)