A/B tests are a widely used research technique that delivers results quickly and simply. That apparent simplicity can, however, lead to misleading conclusions which, once carried over into the final project, prove completely ineffective. Below we suggest how to run A/B tests so that you reach conclusions that matter and implement changes that significantly affect the behavior of your users.
Briefly defined, the classic A/B test is a test method in which only one variable is tested. By variable, we mean an element of the page that can be changed. A given variable occurs in several variants (variations), e.g. different colors of the same button.
But a test that is not quality assured is a real risk. The effort, compared to the other process steps of a testing project, is not that high, yet it can be the deciding factor for the success or failure of a test. It is therefore important that quality assurance, as an essential process step of an A/B test, is taken into account as early as project planning.
The following aspects may be responsible for the fact that quality assurance is not done properly:
As stated above, many visual and functional deviations can occur. As a result, an effect predicted by the hypothesis may not occur, or another, unwanted effect may take its place.
Of course, in addition to the visual and functional testing of the variants, you must also ensure that the goals defined in the test tool have been set up correctly and are triggered as intended.
Quality directly influences a company's costs. Using cheaper parts and devices can cut costs in the short term, but the long-term effects can be significantly more expensive. A blocked step in the checkout process or a call to action that is no longer displayed might seem like something small, but it might make your customer leave your shop forever. Quality has a direct impact on customer satisfaction. If the company produces a high-quality product, satisfied customers will come back and might even bring friends with them. Dissatisfied customers, in turn, are more vocal in their criticism of a company with quality problems.
Good quality assurance can prevent this from happening. It is much more than just looking for the five differences between two images. Proper quality assurance cannot be rushed, especially when it comes to device and browser testing after your A/B test has been set up. You also need to double-check that your goals and your tracking are set up and working properly. When it comes to A/B testing, rushing is not an option, because it can turn into a double-edged sword.
In A/B tests, you test a new hypothesis against your original page, nothing more, nothing less. Hypotheses should be clearly defined and clearly delimited. For example, you might want to test whether integrating a CEO’s quote significantly improves or degrades the landing page conversion rate. The null hypothesis would be: the integration of the quote has no significant impact on the conversion rate. As a (directed) alternative hypothesis you would formulate: the integration of the quote leads to a significantly higher conversion rate.
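To make the directed hypothesis from the CEO quote example concrete, here is a minimal sketch of a one-sided two-proportion z-test using only the Python standard library. The function name `one_sided_z_test` and all visitor and conversion counts are invented for illustration, and the z-test itself is one common choice for comparing conversion rates, not necessarily what your testing tool uses internally.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, p_value) for H1: conversion rate of B > conversion rate of A."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both pages share one conversion rate,
    # so the standard error is computed from the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # One-sided p-value: probability of a z at least this large under H0.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Made-up counts: 200 of 5,000 visitors convert on the original page (A),
# 250 of 5,000 convert on the page with the quote (B).
z, p_value = one_sided_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```

If the p-value falls below the significance level you fixed before the test (0.05 is a common convention), you would reject the null hypothesis in favor of the directed alternative.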
Too small a sample is one of the biggest problems with A/B testing, which is why the tests are often not meaningful, or not appropriate, for small online stores and websites. Small samples lead to high fluctuations and distortions in the results. The smaller the sample, the higher the probability of estimation error. It often happens that the test or the program used retains the null hypothesis although the alternative hypothesis actually applies (a type II error).
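The risk of wrongly retaining the null hypothesis can be estimated before the test starts. The sketch below approximates the statistical power (the probability of detecting a real uplift) of a one-sided two-proportion z-test with a normal approximation. The helper name `approx_power`, the assumed conversion rates of 4% and 5%, and the sample sizes are illustrative assumptions, not figures from a real test.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(rate_a, rate_b, n_per_variant, alpha=0.05):
    """Approximate power of a one-sided two-proportion z-test (rate_b > rate_a)."""
    nd = NormalDist()
    mean_rate = (rate_a + rate_b) / 2
    # Standard error assuming no difference (used for the rejection threshold).
    se_null = sqrt(2 * mean_rate * (1 - mean_rate) / n_per_variant)
    # Standard error assuming the true rates really are rate_a and rate_b.
    se_alt = sqrt(
        rate_a * (1 - rate_a) / n_per_variant
        + rate_b * (1 - rate_b) / n_per_variant
    )
    z_crit = nd.inv_cdf(1 - alpha)
    # Probability that the observed difference exceeds the threshold.
    return nd.cdf(((rate_b - rate_a) - z_crit * se_null) / se_alt)


# Assumed true rates: 4% for the original, 5% for the variant.
for n in (500, 5000):
    print(f"n = {n} per variant: power = {approx_power(0.04, 0.05, n):.2f}")
```

Under these assumptions, a few hundred visitors per variant leave the test likely to miss the uplift, while several thousand make detection far more probable. This is the fluctuation problem of small samples expressed as a number.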
Not only samples that are too small but also samples that are too large can be a problem. With interval-scaled data, for example when measuring the length of stay, even very small mean differences become statistically significant in a very large sample. You can counteract this problem by calculating the effect size and the optimal sample size.
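As a rough illustration of both calculations, the following sketch computes Cohen's d as an effect size for an interval-scaled metric such as length of stay, and then the sample size per variant needed to detect a given d with a two-sided test. The function names, the tiny sample lists, and the defaults (5% significance level, 80% power) are assumptions chosen for the example. The sample size formula is the usual normal approximation, so treat its result as an estimate.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(sample_a, sample_b):
    """Standardized mean difference (B minus A) using the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_b) - mean(sample_a)) / sqrt(pooled_var)


def required_n_per_variant(d, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-sided test of effect size d."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Made-up lengths of stay in seconds for a handful of sessions per variant.
stay_a = [62, 75, 58, 90, 71, 66]
stay_b = [70, 81, 65, 95, 78, 73]
print(f"Cohen's d = {cohens_d(stay_a, stay_b):.2f}")

# Sample size needed to detect a small effect (d = 0.2 by Cohen's convention).
print(f"n per variant for d = 0.2: {required_n_per_variant(0.2)}")
```

Deciding on the smallest effect size that would matter for your business, and then collecting roughly the sample that calculation calls for, guards against both problems: too few visitors to detect a real effect, and so many that a trivial difference is reported as significant.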