False Discovery Rate and Audience Segments
This article on Optimizely's Stats Engine notes, "Because of false discovery rate control, you can now test as many goals and variations in an experiment as you want, without increasing your chance of statistical error."
Beyond goals and variations, are audience segments accounted for under false discovery rate control?
This is a great question. In creating Stats Engine for A/B Testing, we decided not to include audience segments in false discovery rate control. Here’s why:
On the one hand, it would be reasonable to include audiences in an experiment’s false discovery rate calculation. Each audience segment adds another layer of tests to your experiment: if you were testing 5 variations on 2 goals and looked at results on 2 audiences, you would be running 5 × 2 × 2 = 20 A/B tests. If you used a testing procedure with a 5% chance of a false positive on each one, and the tests were independent, your chance of seeing at least one spurious result would be a much higher 64%. Including all audiences, variations and goals in a false discovery rate calculation would protect you from this increased chance of a false discovery by instead directly reporting the chance of any winning / losing result being false among all audience segments.
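To make the 64% figure concrete, here is a minimal sketch of the arithmetic. It assumes the 20 tests are independent and each has exactly a 5% false positive rate; the function name is only illustrative and is not part of any Optimizely API.

```python
def chance_of_any_false_positive(num_tests: int, alpha: float) -> float:
    """Probability of at least one false positive across independent tests,
    each run with per-test false positive rate `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_tests


# 5 variations x 2 goals x 2 audience segments, as in the example above.
num_tests = 5 * 2 * 2
fwer = chance_of_any_false_positive(num_tests, alpha=0.05)

print(f"{num_tests} tests -> {fwer:.1%} chance of at least one false positive")
```

If the tests are positively correlated, as results across goals often are, the true figure would typically be lower, but the point stands: it grows quickly with the number of tests.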
On the other hand, adding a third dimension to the number of tests in false discovery rate calculations could make it take a long time to see conclusive results. With the same 5 variations and 2 goals, looking at 10 segments increases the number of A/B tests to 100, meaning it would take quite a few visitors to have enough information to be confident in a low false discovery rate. What’s more, since each segment is only a fraction of your overall site’s traffic, the more segments you have and the smaller they are, the longer it takes to reach the same number of unique visitors in each segment.
Since the A/B Testing platform does not currently expose this tradeoff in a clear way, we decided to silo false discovery rate calculations to goals and variations within each audience segment.
My advice for addressing the multiple testing problem associated with many audience segments is twofold. First, take a little time in advance to think about which segments will be most interesting to you for your experiment. This will help remove audience segments that you didn’t believe would be interesting in the first place. Second, be aware that the statistical significance calculations are siloed for each segment, and of what that means. It means that if you are calling winners at a 90% significance level, then you can expect less than 10% of your winners and losers for audience segment 1 to be false, less than 10% of your winners and losers for audience segment 2 to be false, and so forth. This does not mean that less than 10% of all your winners and losers, pooled across segments, are false. What this means practically is that you should try to interpret results from each audience segment separately.
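To illustrate the difference between siloed and pooled calculations, here is a rough sketch using the classic Benjamini-Hochberg procedure on fixed p-values. This is a stand-in for illustration only: it is not Stats Engine's actual calculation, and the p-values below are made up.

```python
def benjamini_hochberg(p_values, q):
    """Return the sorted indices of the tests declared significant by the
    Benjamini-Hochberg procedure at false discovery rate level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank k whose p-value is at most k * q / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical p-values for 4 goal/variation tests in each of 2 segments.
segment_1 = [0.004, 0.045, 0.20, 0.60]
segment_2 = [0.024, 0.30, 0.70, 0.90]
q = 0.10  # corresponds to calling winners at 90% significance

# Siloed: control the false discovery rate within each segment separately.
siloed_1 = benjamini_hochberg(segment_1, q)
siloed_2 = benjamini_hochberg(segment_2, q)

# Pooled: control the false discovery rate across both segments at once.
pooled = benjamini_hochberg(segment_1 + segment_2, q)

print("siloed:", len(siloed_1) + len(siloed_2), "winners/losers called")
print("pooled:", len(pooled), "winners/losers called")
```

With these made-up numbers, the siloed calculation calls three results across the two segments while the pooled calculation calls only two. The pooled version is stricter because it guarantees the rate across everything you looked at, which is exactly the tradeoff described above.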
As a final note, multiple audiences will be a big part of our new personalization product that was just announced at Opticon 2015, and we do plan to take audiences into account in false discovery rate calculations in a clear and transparent way, exactly as you suggest.
Hope this helps.
Statistician at Optimizely