As you well know, the conversion rate is the ratio of the number of users who performed the planned action (bought a product, sent an offer inquiry form, subscribed to the newsletter) to the number of all users. This metric is expressed as a percentage and cannot exceed 100%, because the users who converted are a subset of all users.
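As a minimal sketch of this definition, the calculation can be written in a few lines of Python. The function name `conversion_rate` and the example counts are hypothetical, chosen only for illustration.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Return the conversion rate as a percentage of all visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    if not 0 <= conversions <= visitors:
        # Converting users are a subset of all users, so the rate is capped at 100%.
        raise ValueError("conversions must be between 0 and visitors")
    return 100.0 * conversions / visitors


# Hypothetical example: 3 purchases among 100 visitors.
rate = conversion_rate(conversions=3, visitors=100)
print(f"{rate:.1f}%")  # 3.0%
```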

Optimizing websites for conversion is an activity, or a set of actions, that aims to increase the conversion rate. Its effect is that the website converts not 3 out of 100 users but 6 out of 100, which makes it twice as good at turning visitors into customers. What’s more, there is no fixed ceiling on how far the conversion rate can be improved (other than the obvious 100%); however, the further you go, the harder each additional gain becomes. Many factors affect the conversion rate. The task of the tools used to test websites for conversion is to choose the winning copy of the page, the one characterized by a higher conversion rate. On what basis is the winning version chosen?

According to Google AdWords: ‘The statistically significant difference is the one whose accidental occurrence is unlikely.’

More generally, we can say that a statistically significant result is one that, having been observed in a test (on a smaller sample), will with a certain (usually high) probability also hold in reality (in the whole population). In other words, we can be reasonably confident that the test result will translate into reality.

What, then, determines the statistical significance of an optimization test?

First and foremost, it is the number of people who participate in the experiment (the statistical sample). The more people “confirm” during the testing phase that a given copy of the page has a higher conversion rate, the more likely it is that this will also be the case in reality.

However, we cannot wait forever for as many people as possible to take part in the experiment; the test should not last longer than a month. We must decide as soon as possible which version of the page will bring us more income and implement it immediately. That is why it is so important to know the moment at which the result becomes statistically significant.

The first decision we have to make is the type of test that we will use to investigate the phenomenon in question. In the case of conversion optimization, it will be a proportion test.

Then we have to put forward hypotheses for our test. There are always two: the null hypothesis (H0) and the alternative hypothesis (H1). The null hypothesis is the statement we start from, e.g. “the conversion rates of both versions of the page (current and tested) are the same”, i.e. the proposed changes will not affect the conversion rate in reality. However, we want the tested version of the site to have a higher conversion rate, hence the alternative hypothesis is that “the conversion rate of the tested page is greater than the conversion rate of the current copy”. It is not difficult to guess that we will be interested in rejecting the null hypothesis in favor of the alternative hypothesis.
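In symbols, this pair of hypotheses can be written as follows, assuming that $p_1$ denotes the true conversion rate of the current page and $p_2$ that of the tested page (the assignment of indices is an assumption made here for illustration):

$$
H_0\colon\; p_1 = p_2 \qquad \text{versus} \qquad H_1\colon\; p_2 > p_1
$$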

Depending on which alternative hypothesis we choose (we have two more choices: “the conversion rate of the tested page is smaller” or simply “the conversion rates of the tested and current copies are different”), we use a different critical region for the test statistic, and this is what allows us to make the verification decision (whether we can reject the null hypothesis).
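For reference, one common form of the statistic in a test for two proportions is the pooled z statistic, sketched below. It is an assumption that this is the formula intended here. In it, $\hat{p}_1$ and $\hat{p}_2$ are the conversion rates observed for the current and tested copies (written p1 and p2 in the text), $n_1$ and $n_2$ are their page-view counts, and $x_1$ and $x_2$ (names introduced for illustration) are the numbers of conversions.

$$
z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}},
\qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2},
\qquad
\hat{p}_i = \frac{x_i}{n_i}
$$

If the null hypothesis is true and the samples are reasonably large, $z$ approximately follows the standard normal distribution, which is why critical values are taken from its tables.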

The verification decision is made by comparing the value of the calculated test statistic with the critical value of the test. The critical value is read from the tables of the standard normal distribution and depends on the chosen significance level and the type of alternative hypothesis; the decision depends on whether the calculated statistic is greater or smaller than that critical value.

Before making the verification decision (rejecting the null hypothesis, or finding no grounds to reject it), we must also set the significance level α (alpha), which is the probability of rejecting the null hypothesis when it is in fact true. By default, a 5% significance level is assumed, which means that if the two versions really convert equally well, the test will wrongly reject H0 in favor of H1 in at most 5% of cases.
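The decision rule can be sketched in Python, with the table lookup replaced by the inverse CDF of the standard normal distribution from the standard library (`statistics.NormalDist`, Python 3.8+). The function name `reject_null` and the labels `"greater"`, `"less"` and `"two-sided"` are hypothetical names chosen for this illustration, and the z value in the example is made up.

```python
from statistics import NormalDist


def reject_null(z: float, alpha: float = 0.05, alternative: str = "greater") -> bool:
    """Compare a computed z statistic with the critical value for the chosen H1."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "greater":    # H1: tested rate is higher (right-tailed)
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":       # H1: tested rate is lower (left-tailed)
        return z < std_normal.inv_cdf(alpha)
    if alternative == "two-sided":  # H1: the rates differ (two-tailed)
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Made-up statistic z = 1.80 at the default 5% significance level.
print(reject_null(1.80, alternative="greater"))    # True: critical value is about 1.645
print(reject_null(1.80, alternative="two-sided"))  # False: critical value is about 1.960
```

The same statistic can therefore lead to different decisions depending on the alternative hypothesis, because the 5% is placed in one tail or split across two.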

You have certainly noticed that, with this alternative hypothesis, the higher the value of the test statistic, the better for us, because the null hypothesis is more likely to be rejected.

The value of the test statistic depends on two things. The first is p1 and p2, which are nothing but the conversion rates of the individual page copies: the bigger the difference between them, the higher the calculated test statistic, and the greater the chance that the difference in conversion rates is statistically significant. The second is the size of the sample, i.e. the numbers of page views of the individual page copies, n1 and n2: for a given difference, the more page views, the larger the calculated statistic. This is also logical, because the more people participate in the experiment, the more likely it is that the results will translate into reality.
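Both effects can be seen in a short Python sketch. It assumes the pooled two-proportion z statistic, a common choice that may differ from the exact formula intended here; the function name `z_statistic` and all counts are hypothetical.

```python
from math import sqrt


def z_statistic(conv_1: int, n_1: int, conv_2: int, n_2: int) -> float:
    """Pooled z statistic for two proportions (copy 1 = current, copy 2 = tested)."""
    p_1 = conv_1 / n_1  # observed conversion rate of the current copy
    p_2 = conv_2 / n_2  # observed conversion rate of the tested copy
    pooled = (conv_1 + conv_2) / (n_1 + n_2)  # common rate assumed under H0
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p_2 - p_1) / std_error


# Same rates (3% vs 6%), ten times more page views.
small_sample = z_statistic(3, 100, 6, 100)
large_sample = z_statistic(30, 1000, 60, 1000)

# Same page views, larger difference (3% vs 9%).
wider_gap = z_statistic(30, 1000, 90, 1000)

print(f"{small_sample:.2f} {large_sample:.2f} {wider_gap:.2f}")
```

In this example the statistic grows both when the sample grows and when the gap between the rates widens, so the same 3% versus 6% result that is inconclusive on 100 page views per copy becomes convincing on 1,000.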

One of the most common scientific procedures is comparing two groups. With such data, researchers ask various questions: How much does one group differ from the other? Can we be sure that the difference is non-zero? How confident are we about the magnitude of this difference? Answering these questions is difficult, because the data are contaminated by random variability (despite the researchers’ efforts to minimize external influences on the data). In addition, experiments cannot provide unambiguous, yes-or-no answers; therefore, when interpreting data, we must rely on statistical methods of probabilistic reasoning.

In probability theory, which is the basis of statistical inference, there are two schools of thought about what probability means in practice: the frequentist and the Bayesian. The difference between them is that, according to the first, the probability of a random event is the frequency with which the event occurs in a sufficiently large number of identical trials, while the Bayesian approach, roughly speaking, understands probability as a measure of rational belief that the event will occur. The frequentist definition of probability is problematic for several reasons: it defines probability by referring to probability itself (because identical trials are those in which the chance of the event occurring is the same); it does not apply to unique events; and it really concerns limiting behavior, without strictly defining what the “limit”, or a sufficiently large number of trials, would actually be. The Bayesian understanding of probability is free from these problems.

According to the Bayesian school, in order to make a statistical inference we start by assuming a certain a priori (prior) probability, which is a measure of the rational belief that a given event will occur (it can be, for example, the frequency of occurrence estimated from the literature). Then, by carrying out experiments, we update it, obtaining the so-called a posteriori (posterior) probability.
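One simple way to illustrate this prior-to-posterior update for a conversion rate is the Beta-Binomial model, in which the belief about the rate is a Beta distribution and each observed visitor shifts its parameters. The choice of this model is an assumption made for illustration only, and the function names and all numbers below are hypothetical.

```python
def update_beta(prior_alpha: float, prior_beta: float,
                conversions: int, visitors: int) -> tuple[float, float]:
    """Conjugate update: Beta prior plus binomial data gives a Beta posterior."""
    post_alpha = prior_alpha + conversions              # add observed conversions
    post_beta = prior_beta + (visitors - conversions)   # add observed non-conversions
    return post_alpha, post_beta


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior belief: a conversion rate of about 3%.
prior = (3.0, 97.0)

# Hypothetical experiment: 12 conversions among 200 visitors.
posterior = update_beta(*prior, conversions=12, visitors=200)

print(beta_mean(*prior))      # 0.03
print(beta_mean(*posterior))  # 0.05
```

The posterior mean sits between the prior belief (3%) and the rate observed in the experiment (6%), and it moves closer to the observed rate as more visitors are recorded.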