Culture: Hippo vs. Facts
I never thought this might happen, but on Friday it did:
We carried out several tests on a kind of personalisation: we took the user's location into account when calculating package tour offers for teasers. All tests showed the same result: the click rate went down. My interpretation is that the price level is the cause, because adding a parameter raises the price level of the offers. For bookings there were no conclusive results.
While discussing new personalisation ideas, we got to the point where I had to mention these results (3 different tests) so that we could focus on other ideas for personalisation. At this point the HiPPO entered "the stage" and said: "I cannot imagine that these results are valid. I am expecting a completely different result. Only explanation: the test must be wrong."
Did you ever experience something like this? What did you do? I mean, the tests were really simple and there was obviously not much room to get anything wrong ;-)
Have a nice weekend
Thanks for reaching out!
Sometimes the results that you get from your experiments can be surprising. You must be glad that you ran a test to find out what your visitors' behavior is like, so you can make data-driven decisions instead of decisions based on assumptions.
As for the HiPPO that entered the room, you can reassure him: Optimizely shows accurate results that are trusted every day by some of the largest companies.
Let me know if there is anything that I can do to help,
I agree with @DavidS - use quantitative data to build a stronger position against assumptions. In addition, if possible, I would suggest rerunning the experiment to see whether you get a different insight.
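To make the quantitative argument concrete, here is a minimal sketch of a classical two-proportion z-test on click rates. The visitor and click counts are invented for illustration (the thread does not give the real numbers), `two_proportion_z_test` is a hypothetical helper, and this is a plain fixed-horizon test, not a description of how any particular testing tool computes its results.

```python
import math

def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Return (z, two-sided p-value) for the click-rate difference B minus A."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Pooled click rate under the null hypothesis of "no difference"
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: (clicks, visitors) for control and personalised teaser
control = (500, 10000)
personalised = (430, 10000)

z, p = two_proportion_z_test(*control, *personalised)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z means the personalised variation had the lower click rate. If three independent runs all point in the same direction with small p-values, "the test must be wrong" becomes a much harder position to hold.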
That's a tricky situation!
My first step is to empathize with your executive; as you've probably experienced, data discrepancies in digital advertising, analytics, etc., are widespread (Avinash Kaushik's 2008 article 'Ultimate Web Analytics Data Reconciliation Checklist' is still, sadly, highly relevant 8 years later).
This is an industry-wide problem, and it puts undue pressure on development, PM, QA, and analyst stakeholders to constantly prove data validity while also proving or disproving test hypotheses and other data trends.
This (the data discrepancy issue) unfortunately gives those interpreting the data 'wiggle room' to selectively accept what the data is telling them, which is a limiting form of cognitive bias. (Which, of course, we all struggle with. In Wikipedia's list of cognitive biases, you could ascribe your 'HiPPO' to no fewer than 4 of them: Semmelweis reflex, observer-expectancy effect, conservatism (belief revision), and, most well known, confirmation bias. While to me this might mean we have some redundancy in our psychological models, it is also a fine indicator of just how prevalent this form of bias is.)
So, we may need to recognize that, to a degree, the 'cat may be out of the bag'. You might be unearthing a longer-term, bigger issue (and opportunity) to drive cultural change than this single result suggests. Bringing additional rigor to your process and socializing the surprising results of experimentation, for example via 'which test won' type games, may help, but cultural change takes time.
So, with all this being said, there are still some things you can do right now:
- I loved the suggestion to re-run the experiment. It will give you a chance to re-prove that the data is in fact correct, and to be highly explicit with all parties about the assumptions underlying the test (such as the segment definitions and goal criteria).
- Acknowledge that even really good strategy needs to be iterated upon to be successful. (This is basically why A/B testing works and is so important.) Let's assume your exec is correct, in that these segments and offers are strategically sound (it was a little unclear to me from your description exactly what you are working through). Let's remember that very small variables, such as the color of a button, the font size of prices, or the location of an image, can affect user behavior in surprising ways. Think about and document some of the variables that could be improved or edited in this personalization idea (such as the language of the offers, the visual presentation in terms of styling, colors, etc., and the price level itself), and use that list to propose iterations that can be tested further.
- Set a time to walk your executive through the ways the test 'may be wrong'. It is very important to be very clear, and not to suffer from bias yourself here, in order to understand:
  - Is the analytics data feeding the experiment result correct?
  - Are the segments defined correctly?
  - What precisely about the result is so surprising or unintuitive to the HiPPO?
  - Walk through the variations so that all parties are fully versed in the detail.
And as a word of advice, the more precise and streamlined you make this communication, the more quickly you may be able to come to an amicable next step.
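One of the checks above (was traffic split between the variations as intended?) can be sketched as a sample ratio mismatch test. This is only an illustration under stated assumptions: the visitor counts are invented, `sample_ratio_mismatch` is a hypothetical helper, and it assumes a two-variation test with a known intended split.

```python
import math

def sample_ratio_mismatch(visitors_a, visitors_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) on the traffic split.

    Returns (chi2, p_value). A very small p-value suggests the observed split
    differs from the intended one, so the setup deserves a closer look.
    """
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical visitor counts for an intended 50/50 split
chi2, p = sample_ratio_mismatch(10000, 10400)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

If the split looks healthy, that is one concrete 'the test is wrong' explanation you can rule out in front of your executive; if it does not, you have found something worth fixing before the re-run.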
So @CouchPsycho, there is certainly a lot at play here, and we as a group don't have all of the context necessary to figure it all out. I hope this response is helpful, and please continue to reach out to the community as you go through this process.
(To proactively fight this phenomenon, I'd recommend you document, socialize, and strictly adhere to a process that uses a QA checklist and strong QA methodology. Additionally, documenting and socializing your experiment development process will both ensure quality and demonstrate rigor to the broader organization. A consistent reporting framework that communicates results in a standard template, at a regular cadence, with agreed-upon expectations about how results will be acted on, should also help. However, this forward-thinking work won't solve your current problem; it will just set you up for long-term success.)