How to Define Success and Failure in Website Testing
It’s never enough to just run a website test. You have to set goals, set expectations, and understand the results of your test so you can optimize based on key learnings. All of this is fundamental to effective post-click marketing. Here Jonghee Jo from JPMorgan Chase details how to define success and failure when running tests.
It is critical to clearly define “Success” and “Failure” measures before running a test. Without them, you may be stumped, especially when different metrics show contradictory results. If you are new to testing, I recommend sticking to simple, easy-to-understand measures such as revenue per visit or download/registration rate. When you first promote the impact of your tests within the organization, simple measures like these will be more effective than complex ones. However, as you run more and more tests, you will need to define success and failure more precisely.
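The two simple measures suggested above can be computed directly from per-version totals. The following is a minimal sketch, assuming you can export visits, revenue, and registrations for each test version; the `VariantStats` record, its field names, and the sample figures are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-version totals collected during a test."""
    name: str
    visits: int
    revenue: float
    registrations: int


def revenue_per_visit(v: VariantStats) -> float:
    # Total revenue divided by total visits; 0.0 if the version had no traffic.
    return v.revenue / v.visits if v.visits else 0.0


def registration_rate(v: VariantStats) -> float:
    # Share of visits that ended in a registration (or download).
    return v.registrations / v.visits if v.visits else 0.0


# Illustrative numbers only, not real test results.
control = VariantStats("control", visits=10_000, revenue=25_000.0, registrations=300)
challenger = VariantStats("challenger", visits=10_000, revenue=27_500.0, registrations=330)

for v in (control, challenger):
    print(f"{v.name}: RPV=${revenue_per_visit(v):.2f}, "
          f"registration rate={registration_rate(v):.2%}")
```

Both measures are plain ratios, which is exactly what makes them easy to explain to stakeholders who are new to testing.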
Don’t be disappointed when the lifts get smaller
When you first start testing, there are usually many areas that can be optimized dramatically. Once your initial tests have covered the major high-impact areas, you will find that subsequent tests on the remaining, lower-impact areas do not produce lifts quite as big. Does this mean the later tests are less successful than the earlier ones? I don’t think so. I don’t regard a smaller lift as a failure, or even as a smaller success. Don’t get me wrong: I love bigger lifts and higher financial impact. However, the goal of testing is not just additional sales, profit, or conversions, but an understanding of your customers’ behavior. If you have identified which areas of the site have a stronger impact (and which do not), the test has definitely achieved some success.
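“Lift” here is usually the relative change of a test version over the control. A minimal sketch of that calculation, assuming the metric is revenue per visit; the function name and the figures for an “early” and a “later” test are hypothetical:

```python
def relative_lift(control_value: float, variant_value: float) -> float:
    """Relative lift of a variant over the control, e.g. 0.10 means +10%."""
    if control_value == 0:
        raise ValueError("control value must be non-zero to compute a relative lift")
    return (variant_value - control_value) / control_value


# Hypothetical revenue-per-visit figures from an early and a later test.
early_test = relative_lift(control_value=2.50, variant_value=3.00)
later_test = relative_lift(control_value=3.00, variant_value=3.06)

print(f"early test lift: {early_test:+.1%}")
print(f"later test lift: {later_test:+.1%}")
```

Note that the later test starts from a higher baseline, so even a small relative lift builds on the gains the earlier tests already captured.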
Don’t fall into the Statistical Significance “Trap”
When you run a test (especially a multivariate test), some test versions may not show a statistically significant difference from the control, even after quite a long test period. In this situation, you might be tempted to extend the test until it reaches statistical significance, because you want to announce the winner between two versions with confidence. My advice is not to extend a test just for the sake of reaching statistical significance. Sometimes a new version produces practically zero difference compared with the existing version, and in most cases extending the test does not change that conclusion. Just get over it and treat it as a valuable learning from the test. Your testing goal should not be “reaching significance for all test versions” but “learning as much and as quickly as you can from the test.”
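To see why extending a test rarely rescues a near-zero difference, consider the sketch below. It assumes a conversion-rate metric and a classical pooled two-proportion z-test with a standard sample-size approximation; your testing tool may use a different method, and the function names and the 3% baseline are illustrative assumptions. The visits required grow roughly with the inverse square of the difference you want to detect, so a lift ten times smaller needs on the order of a hundred times more traffic.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all (0% or 100% conversion): nothing to test
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def visits_per_version(base_rate: float, relative_lift: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visits needed per version to detect a given relative lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 3% baseline conversion rate.
print("visits for a 10% lift:", visits_per_version(0.03, 0.10))
print("visits for a 1% lift: ", visits_per_version(0.03, 0.01))
print("p-value, 300 vs 303 conversions:", two_proportion_p_value(300, 10_000, 303, 10_000))
```

If the sample-size estimate for the difference you are actually observing runs into millions of visits, that is usually a sign to record the result as “no practical difference” and move on.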
Consider advanced success measures
To understand your customers’ behavior deeply, it is always helpful to look at a few different metrics. You will have a more complete picture of your customer when you review these metrics:
- Engagement Metrics: Number of Visits in a Given Period, Average Visit Time, Return Visit Rate, etc.
- Advanced Financial Metrics: Profit per Visit, Customer Lifetime Value, etc.
- Segmented Metrics: by Geography, by Recency, by Frequency, etc.
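Segmented and advanced financial metrics can often be derived from the same visit-level data. The sketch below assumes a hypothetical export in which each visit carries its test version, a geography, the revenue it generated, and a cost attributed to it; the `Visit` record and `per_visit_by_segment` helper are illustrative, and recency or frequency buckets could replace geography as the segment key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical visit-level record exported from an analytics tool."""
    version: str    # e.g. "control" or "challenger"
    geography: str  # segment key; recency or frequency buckets work the same way
    revenue: float
    cost: float     # hypothetical cost attributed to the visit, for profit per visit


def per_visit_by_segment(visits):
    """Return {(geography, version): (revenue_per_visit, profit_per_visit)}."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])  # [visits, revenue, profit]
    for v in visits:
        t = totals[(v.geography, v.version)]
        t[0] += 1
        t[1] += v.revenue
        t[2] += v.revenue - v.cost
    return {key: (rev / n, profit / n) for key, (n, rev, profit) in totals.items()}


# Tiny illustrative sample, not real data.
visits = [
    Visit("control", "US", 10.0, 4.0),
    Visit("control", "US", 0.0, 0.0),
    Visit("challenger", "US", 12.0, 5.0),
    Visit("challenger", "EU", 0.0, 0.0),
    Visit("control", "EU", 8.0, 3.0),
]

for (geo, version), (rpv, ppv) in sorted(per_visit_by_segment(visits).items()):
    print(f"{geo} / {version}: revenue per visit={rpv:.2f}, profit per visit={ppv:.2f}")
```

Comparing versions within each segment can reveal a version that wins overall but loses in a particular geography, which a single site-wide number would hide.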
I hope this helps you define the success and failure of your tests more effectively. Do you have any other ideas, examples, or issues? Please share them in the comments.


