
Risk of ruining one test's results by running several treatments on multiple pages of the site

mie 01-06-15


Has anyone developed a way, or a formula, to assess the risk of running different treatments on multiple pages of the site at the same time?

For example, if one treatment on the product page adds elements to the above-the-fold area while another treatment focused on structure/visual hierarchy is running on the same page, a risk-assessment formula would help us understand the impact.


Amanda 01-06-15

Re: Risk of ruining one test's results by running several treatments on multiple pages of the site

Hi @mie - 


Great question. I am sure the Optiverse can provide some best practices on whether there is a formula you can use. 


In the meantime, here are some other discussions that highlight a similar topic. Both provide some great insight.





Hudson 01-08-15

Re: Risk of ruining one test's results by running several treatments on multiple pages of the site


Hi @mie 


This is an interesting question, one with a short answer and a longer answer. 


The short answer, if I understand your example correctly, is no: to my knowledge there isn't a one-size-fits-all mathematical formula that can take as inputs


a. 'type of change' & 'type of page' for test A

b. 'type of change' & 'type of page' for test B


and generate a score or proxy for the 'risk' you'd be taking by running those tests simultaneously. 


The longer answer is that a formula or system like the one you're describing would require many assumptions and access to historical data, much of which would be unique to your operational circumstances. That is probably why no generalized risk formula for simultaneous tests has been developed. 


However, you can go a long way toward improving how your testing team operates by outlining some of these assumptions and what they mean to you:



  • By risk, do you mean the level of statistical 'noise' introduced into conversion rate results by running more than one test? If so, you can define risk as '% of incremental conversion rate that we believe is due to influence from the other test, not the current test'. This hypothetical value should normally be 'counted out' due to randomized bucketing of visitors bet...
  • Is it the risk of a conversion rate win in one test being offset by a loss from another test running simultaneously ('% conversion rate gain not realized due to contradictory results')? This, too, should be 'counted out' due to randomized bucketing of visitors between experiments running simultaneously.
  • Is it the risk of the user seeing an inconsistent experience, with one part of the site dramatically different from another? This is a real concern, but one that still isn't very quantifiable. If you have reason to believe that users may experience something so different across two simultaneous experiments that they could think they've landed on a different site (dramatic look-and-feel differences), then you can stagger your test experiences so they don't run at the same time, or use JavaScript to create a Mutually Exclusive experiment targeting condition in Optimizely Audiences...
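Why randomized bucketing 'counts out' the other test, and what a mutually exclusive setup changes, can be illustrated with a small sketch. It assumes a hypothetical hash-based bucketing scheme with a separate salt per experiment; the function names, salts, and visitor IDs are made up for illustration, and this is not Optimizely's actual bucketing code. With independent salts, each arm of test A receives roughly the same mix of test B's variations, so test B's effect lands about equally on both sides of test A's comparison (provided the two changes don't interact strongly). The exclusive variant instead routes each visitor into only one of the two tests.

```python
import hashlib
from collections import Counter

def bucket(visitor_id: str, salt: str, n_arms: int) -> int:
    """Deterministically map a visitor to an arm in [0, n_arms) via a salted hash.

    Hypothetical scheme for illustration only, not Optimizely's implementation.
    """
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_arms

def independent_assignment(visitor_id: str) -> tuple[int, int]:
    """Overlapping tests: each test hashes with its own salt, so arms are ~independent."""
    return bucket(visitor_id, "test_a", 2), bucket(visitor_id, "test_b", 2)

def mutually_exclusive_assignment(visitor_id: str) -> tuple[str, int]:
    """Exclusive tests: a shared 'layer' hash routes each visitor into exactly one test."""
    test = ("test_a", "test_b")[bucket(visitor_id, "exclusion_layer", 2)]
    return test, bucket(visitor_id, test, 2)

visitors = [f"visitor-{i}" for i in range(20_000)]  # hypothetical visitor IDs

# Count visitors in each (test A arm, test B arm) cell of the overlapping setup.
cells = Counter(independent_assignment(v) for v in visitors)
for a_arm in (0, 1):
    in_a_arm = cells[(a_arm, 0)] + cells[(a_arm, 1)]
    share_b_variant = cells[(a_arm, 1)] / in_a_arm
    print(f"Test A arm {a_arm}: {in_a_arm} visitors, "
          f"{share_b_variant:.1%} of them also see test B's variation")

# In the exclusive setup, count how many visitors each test receives.
exclusive = Counter(mutually_exclusive_assignment(v)[0] for v in visitors)
print(dict(exclusive))
```

Both arms of test A should show close to half of their visitors in test B's variation. The trade-off of the exclusive setup is that each test gets only about half the traffic, so each one takes longer to reach a conclusion.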


Opportunity Cost: The twin of risk is opportunity cost. If you don't run these tests simultaneously in order to reduce statistical noise, how much potential conversion rate improvement and insight do you give up by not running both? It's important to consider that you're just as likely, or more likely, to learn more by testing more things simultaneously rather than fewer. In my experience, the more prevalent hurdle to success in testing is the velocity and continuity of the tests being run, rather than confusion resulting from running more things at once. All else being equal, if the null hypothesis for both experiments is that the change has no effect on conversion rate, and we assume each experiment has an equal chance of improving conversion rate, then running two tests instead of one doubles the expected number of meaningful changes you discover. 
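The doubling argument in the opportunity-cost paragraph can be made concrete with a quick calculation. It assumes each test independently has the same probability p of producing a meaningful win; the value 0.2 below is a hypothetical, illustrative number, not a benchmark. Running two tests doubles the expected number of wins (2p), while the probability of getting at least one win rises to 1 - (1 - p)^2, which is slightly less than double.

```python
def expected_wins(p_win: float, n_tests: int) -> float:
    """Expected number of winning tests when each wins independently with probability p_win."""
    return p_win * n_tests

def prob_at_least_one_win(p_win: float, n_tests: int) -> float:
    """Probability that at least one of n_tests independent tests is a winner."""
    return 1 - (1 - p_win) ** n_tests

p = 0.2  # hypothetical per-test chance of a meaningful win
for n in (1, 2):
    print(f"{n} test(s): expected wins = {expected_wins(p, n):.2f}, "
          f"P(at least one win) = {prob_at_least_one_win(p, n):.2f}")
```

With p = 0.2, two tests give 0.40 expected wins versus 0.20 for one, and a 36% (rather than 40%) chance of at least one win; the gap between those two numbers shrinks as p gets smaller.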


Estimated Conversion Rate Improvements or Losses: This factor is highly specific to your circumstances. Not every 'addition of elements in the above-the-fold area' is created equal, so it is nearly impossible to accurately estimate what a given change will do to conversion rate. This is a big reason why it is difficult to estimate risk with a formula: a risk calculation must include some estimate of the behavioral change that will actually happen.


However, outlining the minimum conversion rate difference you're hoping to detect for a given test is essential to estimating sample size and how long to run the test. It's also a helpful thought exercise as you work through the impact of the different changes you're looking to make, relative to one another. 
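To show how a minimum detectable difference feeds into sample size and test duration, here is a sketch of the classic fixed-horizon sample size formula for comparing two conversion rates with a two-sided z-test. The baseline rate, lift, and daily traffic figures are hypothetical, and Optimizely's own calculator may use a different (for example, sequential) method, so treat the result as a rough planning estimate rather than what the tool will report.

```python
import math
from statistics import NormalDist

def sample_size_per_variation(baseline_rate: float, min_relative_lift: float,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variation to detect a relative lift with a two-sided z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

def days_to_run(n_per_variation: int, n_variations: int, daily_visitors: int) -> int:
    """Days needed if all daily_visitors are split evenly across the variations."""
    return math.ceil(n_per_variation * n_variations / daily_visitors)

# Hypothetical inputs: 5% baseline conversion, hoping to detect a 10% relative lift.
n = sample_size_per_variation(0.05, 0.10)
print(n, "visitors per variation")
print(days_to_run(n, n_variations=2, daily_visitors=5_000), "days at 5,000 visitors/day")
```

Halving the minimum lift you want to detect roughly quadruples the required sample, which is one concrete way the 'minimum difference' assumption shapes how long each test, and any test overlapping with it, has to run.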


So, to recap, there is no cut-and-dried formula for estimating the risk of running two simultaneous experiments. Yet, in the process of outlining the assumptions that would be critical to such a formula, I believe you'll go a long way toward making smart decisions about the right cadence of tests to run on your site. 


Does this help address the question you're thinking of, @mie ? Are there any other contributors in the community who would like to weigh in? I must say, the idea of a system or formula that incorporates historical data and test parameters to estimate test impact is very appealing! Has anyone heard or thought of how such a system would work, either in estimating risk or otherwise helping guide optimization planning?