Quantifying actual impact of a test
I was told that if I run a test and see a 20% improvement over control at 90+% confidence, all it tells me is that the variant is better than control; it doesn't mean I would see the full 20% lift if I ramped the variant to 100%.
Can someone please explain the technical reason behind this? Thanks.
The "technical reason" has to do with semantics of language.
Suppose the weatherman predicts a 90% chance of 20 inches of snow.
Does this mean you will get 20 inches of snow?
No. It only means you can be pretty sure (90% sure) that you will get snow, and that the best estimate is around 20 inches; the actual amount could easily be more or less. Likewise, the 20% lift in your test is a point estimate: the experiment gives you confidence that the variant beats control, but the true lift at a 100% ramp could land above or below 20%, anywhere within the confidence interval.
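To make that concrete, here is a minimal sketch of the confidence interval around an observed 20% lift. The counts (1,000 conversions out of 10,000 for control, 1,200 out of 10,000 for the variant), the function name `lift_ci`, and the choice of a delta-method interval on the log relative risk are all illustrative assumptions, not anything from your actual test.

```python
import math

def lift_ci(conv_c, n_c, conv_v, n_v, z=1.6449):
    """90% CI for relative lift, via a normal approximation on log(p_v/p_c).

    z = 1.6449 is the two-sided 90% normal quantile.
    """
    p_c, p_v = conv_c / n_c, conv_v / n_v
    ratio = p_v / p_c  # point estimate: observed lift is ratio - 1
    # Delta-method standard error of log(ratio) (log relative risk)
    se = math.sqrt((1 - p_c) / (n_c * p_c) + (1 - p_v) / (n_v * p_v))
    lo = math.exp(math.log(ratio) - z * se) - 1
    hi = math.exp(math.log(ratio) + z * se) - 1
    return ratio - 1, lo, hi

point, lo, hi = lift_ci(1_000, 10_000, 1_200, 10_000)
print(f"observed lift: {point:.1%}, 90% CI: [{lo:.1%}, {hi:.1%}]")
# observed lift: 20.0%, 90% CI: [12.3%, 28.2%]
```

With these made-up numbers, the test is "significant" (the interval excludes 0%), yet the true lift is plausibly anywhere from about 12% to 28%. That whole range, not the single 20% number, is what the test actually supports.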