Measure ANNUAL cumulative uplift from all A/B and MVT experiments
Hope you all are good.
We have been using Optimizely for about a year now and are finding it useful for getting uplifts in various site sections.
However, I want to put this question to all of you: HOW TO MEASURE AN ANNUAL CUMULATIVE UPLIFT from all the experiments we have done.
In summary, I want to be able to say that we ran 1,000 experiments and got a NET uplift of X% = # orders = £Y revenue.
Uplifts during individual tests can be calculated, and we can get the increased number of orders from each, but how can one extrapolate the results of all the tests that have gone LIVE into a (projected) cumulative incremental (delta) number of orders?
I ask because if one simply adds up the uplifts from all the experiments that have gone live on the site, the number could easily touch 500% increased conversion, but in actuality we do not observe anywhere near that much increased conversion at the end of the year.
My colleague and I already use a calculation, but I want to know what others are doing to communicate the ANNUAL Optimizely performance reconciliation.
If you have Excel files, please do share your approach.
I hope I have made clear what I am chasing here. If not, please feel free to ask.
“I ask because if one simply adds up the uplifts from all the experiments that have gone live on the site, the number could easily touch 500% increased conversion, but in actuality we do not observe anywhere near that much increased conversion at the end of the year.”
I’m not clear on this. Are you saying that, based on your calculations, the before/after comparison shows you’ve driven a relative increase in conversion rate of 500%, but in actuality you aren’t observing that?
“My colleague and I already use a calculation, but I want to know what others are doing to communicate the ANNUAL Optimizely performance reconciliation.”
It would be beneficial to the community if you shared how you and your colleague currently calculate annual uplift.
It's easy to show how your conversion rate increased within a year (using web analytics), but this includes many factors, like the pages and ads linking to your site, seasonality, changes in how your brand is perceived, etc. – your A/B experiments are only one factor.
Isolating this factor is technically possible, but requires that a cohort of your visitors / users continues to see the original version of your site for the full year, without ever being exposed to any of your experiment variations. So, you'd pick e.g. 10% of your visitors and exclude them from all experiments, but track their conversion rate. You'd then run experiments with the remaining 90% – and make sure that the winning variation of each experiment is only ever implemented for those 90% of your visitors, not the ones in the 10% cohort who remain untouched:
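One way to implement such a split is sketched below as a minimal example. The 10% holdout size, the function name and the use of SHA-256 are my assumptions for illustration, not an Optimizely feature; the idea is just that hashing a stable visitor ID gives a deterministic, stateless assignment:

```python
import hashlib

HOLDOUT_PERCENT = 10  # size of the untouched control cohort (assumed)

def cohort_for(visitor_id: str) -> str:
    """Deterministically assign a visitor to 'control' or 'experiments'.

    Hashing the visitor ID gives a stable bucket, so the same ID always
    lands in the same cohort without storing any server-side state.
    """
    digest = hashlib.sha256(visitor_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "control" if bucket < HOLDOUT_PERCENT else "experiments"
```

Visitors in the `control` cohort would then be excluded from every experiment and from the rollout of every winning variation, while their conversion rate is still tracked.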
After a year, you could then compare the conversion rates of the control cohort and the remaining visitors who participated in experiments. In the example above, the 1% improvement in the control cohort is due to other factors (e.g. your marketing team was more effective in bringing visitors to the site who were actually interested in the product). Hence, the 3% improvement (the difference between the control cohort and the experimentation cohort, not last year's numbers versus this year's) can actually be attributed to the A/B experiments alone.
As you can see, this is a rather complex setup; it typically causes some tech debt (you need to keep things running that could otherwise be turned off based on experiment findings), and there's a pretty hefty opportunity cost associated with not implementing the winning experiences from your A/B experiments for the control cohort.
I'm aware of some customers who are doing this (often only for part of their site), and some of the big players use this concept too. Just go ask some of your friends – you might find someone who is on the Facebook version from a year ago. I know first hand: my fiancé happens to be in a control cohort.
Hope this helps!
Thanks for pitching in.
A good approach indeed, but do you also know what the technical setup for the control cohort would be? For Facebook, people are logged in, but on most sites you don't have to be.
How would you address people switching from one device to another, cookie deletion and so on?
This doesn't sound doable for any site that doesn't require you to be logged in.
In this case, it seems to me that this is something that's doable for most sites, but shouldn't necessarily be done. For most companies, the effort you put into this far outweighs the benefit of producing a figure at the end of the year. I personally would only do this if there were an additional use case/benefit beyond just reporting.
It's doable for sites without login, too. Chances are that you don't link your users/visitors across browsers and devices in web analytics either – and since this inaccuracy would be present in the control as well as the experimentation cohort, the data remains comparable. You'd basically set a cookie to "remember" which cohort a visitor belongs to. This could mean that the same visitor is part of a treatment group on their mobile device but in the control cohort on desktop – or the other way around. It's not perfect, but it's as good as it gets compared to the other data you work with (e.g. web analytics, where the same issue exists).
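A minimal sketch of that cookie-based approach, using only the Python standard library. The cookie name `cohort`, the 10% holdout and the one-year lifetime are all illustrative choices, not anything Optimizely-specific:

```python
import random
from http.cookies import SimpleCookie

COOKIE_NAME = "cohort"  # assumed cookie name

def resolve_cohort(cookie_header: str):
    """Return (cohort, set_cookie_header_or_None).

    If the visitor already carries a cohort cookie, reuse it; otherwise
    draw a new assignment (10% control) and emit a Set-Cookie header so
    the browser remembers the cohort on subsequent visits.
    """
    jar = SimpleCookie(cookie_header)
    if COOKIE_NAME in jar:
        return jar[COOKIE_NAME].value, None
    cohort = "control" if random.random() < 0.10 else "experiments"
    new = SimpleCookie()
    new[COOKIE_NAME] = cohort
    new[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # persist for a year
    return cohort, new.output(header="Set-Cookie:")
```

As noted above, the assignment is per browser, so the same person can sit in different cohorts on different devices – an accepted inaccuracy, matching what web analytics already does.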
Thanks for the response.
Below is how we calculate extrapolated conversion. The in-test conversion is based on actuals; the steps below are used for extrapolating the uplifts from different cohorts into a blended/overall uplift.
- To calculate conversion growth, we refer back to a base conversion rate from month X (A)
- We then take the current traffic (B) and apply the base conversion rate (A) to get the number of orders (C) at current traffic levels and the base (A) conversion rate
- We then take all the uplifts from 'individual' experiments and get the TOTAL of incremental projected orders (D) over current traffic
- We add the total incremental orders (D) to (C) to get a total new order volume (E), from which we can calculate a relative conversion rate (F) based on the current traffic level (B)
- This therefore provides a cumulative uplift, but of course a bloated one
- However, there must be some cross-cannibalisation across all these initiatives, so we apply a -X% decelerator, based on something we call the net ratio of impact, i.e. a dampener
- The dampened rate then gives us our final conversion improvement (G), which more or less tallies with our current conversion rate
–A x B = C [orders at the base conversion rate and current traffic]
–C + D = E [adding incremental orders]
–E / B = F [new conversion rate over current traffic B]
–F x (100% - X%) = F′ [dampened conversion rate]
–(F′ – A) / A = G [uplift]
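The steps above can be sketched in a few lines of Python. All the numbers here (2% base rate, the traffic, the incremental orders and the 20% damping factor) are hypothetical, chosen only to illustrate the calculation:

```python
def blended_uplift(base_cr, traffic, incremental_orders, damping):
    """Blend individual experiment uplifts into one annual figure,
    following the A-G steps described in the post."""
    c = base_cr * traffic                  # (C) orders at base conversion rate
    e = c + incremental_orders             # (E) total projected order volume
    f = e / traffic                        # (F) projected conversion rate
    f_damped = f * (1 - damping)           # dampened for cross-cannibalisation
    return (f_damped - base_cr) / base_cr  # (G) relative uplift

A = 0.02        # base conversion rate (2%), assumed
B = 1_000_000   # current annual traffic, assumed
D = 8_000       # sum of projected incremental orders from experiments, assumed
X = 0.20        # 20% decelerator ("net ratio of impact"), assumed

print(f"{blended_uplift(A, B, D, X):.1%}")  # prints 12.0%
```

The damping factor X is the load-bearing assumption here: without it the naive sum of uplifts would claim a 40% improvement in this example, which is exactly the "bloated" figure described above.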
Thanks for your inputs.
We were doing exactly the same initially, but as Marie pointed out, we could not carry on keeping a control cohort active forever, due to continuous site improvement and the challenges it poses. Plus, we as a company do not want any users to be deprived of the better experience.
However, thanks to your input, I got an idea: put up a test with the OLD basic site and journeys for just a week instead of the whole year, so we can quickly assess the uplift. That might well be a litmus test.
The uplift evaluated this way may still be diluted, as we have made functional changes as well, so we won't be able to test the exact original journey/site.
Sounds like a good middle ground – the concept I was describing is sound, but not the most practical approach. Once you get this up and running and have some first experience with it, feel free to share it here; I'd be interested to see how it goes.
We have had the same problem in the last year. We've done lots of A/B tests on our landing page with good results, but after a year the conversion from visitors to signups is almost the same. One problem we found was testing group size: with 1,000 people tested, even with a very high significance level, the result could change when we tested the next 2 or 3 thousand people.
The next thing we found is that when we test simple things like background colour, even with good results it doesn't change anything in the long term; but once we started running fewer but higher-quality tests (what to test is picked based on research, and the testing group is bigger), our conversion started growing.
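On the sample-size point: a rough way to see why 1,000 visitors is often too few is the standard normal-approximation formula for comparing two proportions. This sketch is my own illustration (the function name and the defaults of 5% significance and 80% power are assumptions, not from any tool mentioned in the thread):

```python
import math
from statistics import NormalDist

def sample_size_per_variation(base_cr, mde_relative, alpha=0.05, power=0.8):
    """Rough per-variation sample size via the normal approximation
    for a two-sided test comparing two conversion rates."""
    p1 = base_cr
    p2 = base_cr * (1 + mde_relative)              # rate we hope to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detecting a 20% relative uplift on a 2% base rate takes on the order of
# tens of thousands of visitors per variation, far more than 1,000:
print(sample_size_per_variation(0.02, 0.20))
```

Smaller effects or lower base rates push the required sample size up quickly, which matches the observation that results on ~1,000 visitors kept flipping.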
This is a question we hear a lot, and Toby's advice is great.
I have seen that companies who use the 'test the old site against the new site' approach tend to feel very confident in their attribution. To do so, do something like this:
1. Save an instance of your status-quo at the beginning of a reporting period (yearly, quarterly)
2. At the end of the period, test the new site - the one that has all of your optimizations in the production code - and measure the two sites in a complete-site redirect a/b test.
Note: At this point you'll want to be sensitive to visitors who have seen your new site, so targeting new visitors may be an effective way to keep your data clean.
This approach doesn't cleanly account for per-experiment attribution (identifying which variations and variables had a direct impact on the overall performance), but it does provide a very rigorous analysis of the reporting period's collection of optimizations.
Hope this is helpful!