Power and Level in A/B Test

Qingchuan Lyu
2 min read · Oct 26, 2020

In this post, we’ll learn what power, significance level, and Type I/II errors are, and how they relate to each other.

Power

Once we set up the null hypothesis and the alternative hypothesis, we collect data and compute a test statistic. The power function (or simply “power”) is the probability of rejecting the null hypothesis after seeing the data, viewed as a function of the true value of the parameter being tested.

For example, suppose your null hypothesis is that the population mean is no less than 22, and your alternative hypothesis is that the population mean is < 22. In this case, you collect a sample of n observations and compute the sample mean. You reject the null hypothesis if the sample mean is too small compared to 22, say if sample mean < 22-c for some constant c > 0. Then, the power of this test is the probability that the sample mean is less than 22-c, i.e., the CDF of the sample mean evaluated at 22-c.
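To make the example concrete, here is a minimal sketch of that power function. It assumes things the example above does not state: the population is normal with a known standard deviation, so the sample mean is normal with standard deviation sigma/sqrt(n). The function name `power` and the numbers (sigma = 2, n = 25, c = 0.5) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power(true_mean, c, sigma, n, null_mean=22.0):
    """P(sample mean < null_mean - c) when the population mean is true_mean.

    Assumes a normal population with known sigma (an illustrative assumption).
    """
    # Under that assumption the sample mean is Normal(true_mean, sigma / sqrt(n)).
    sample_mean_dist = NormalDist(mu=true_mean, sigma=sigma / sqrt(n))
    # Power is the CDF of the sample mean evaluated at the cutoff 22 - c.
    return sample_mean_dist.cdf(null_mean - c)


# Power grows as the true mean moves further below 22.
for mu in (22.0, 21.5, 21.0, 20.5):
    print(f"true mean = {mu:4.1f}  power = {power(mu, c=0.5, sigma=2.0, n=25):.3f}")
```

Evaluating the same function at different true means is what makes power a function rather than a single number.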

Type I and Type II Errors

A Type I error is rejecting the null hypothesis when the null hypothesis is true. The probability of making this error is the power evaluated when the null hypothesis is true. This probability is also called the “alpha level.”

A Type II error is not rejecting the null hypothesis when the null hypothesis is false. The probability of making this error is 1-power evaluated when the null hypothesis is false. This probability is also called the “beta level.”
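One way to see both error probabilities is to simulate the test many times. The sketch below is only an illustration under assumed settings: a normal population with sigma = 2, samples of n = 25, the rejection rule "sample mean < 22-c" with c = 0.5, a true mean of 22 for the "null is true" case and 21 for the "null is false" case. The helper name `rejection_rate` is hypothetical.

```python
import random
from statistics import fmean


def rejection_rate(true_mean, c=0.5, sigma=2.0, n=25, null_mean=22.0,
                   trials=20_000, seed=0):
    """Fraction of simulated samples for which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        # Reject when the sample mean falls below the cutoff null_mean - c.
        if sample_mean < null_mean - c:
            rejections += 1
    return rejections / trials


# Null is true (mean = 22): every rejection is a Type I error.
alpha_hat = rejection_rate(true_mean=22.0)
# Null is false (mean = 21): every non-rejection is a Type II error.
beta_hat = 1.0 - rejection_rate(true_mean=21.0)

print(f"estimated alpha: {alpha_hat:.3f}")
print(f"estimated beta:  {beta_hat:.3f}")
```

The estimate of alpha is the rejection rate (the power) at a mean where the null holds, and the estimate of beta is one minus the rejection rate at a mean where it does not.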

The graph below summarizes alpha and beta levels:

The blue line is the distribution of the test statistic under the null hypothesis; the red line is its distribution under the alternative hypothesis. The critical value corresponds to the significance level explained below.

The red shaded area shows the probability of not rejecting the null hypothesis when the alternative hypothesis is true (the beta level); the blue shaded area shows the probability of rejecting the null hypothesis when the null hypothesis is true (the alpha level).

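Problem: we want to keep the probabilities of both Type I and Type II errors low. However, there’s a tradeoff between them: for a fixed sample size, moving the critical value to make one smaller makes the other larger. This leads to the significance level.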
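The tradeoff can be checked numerically. The following sketch assumes a normal population with known sigma = 2, a sample size of 25, the rejection rule "sample mean < 22-c", and a single alternative mean of 21; these numbers and the name `error_rates` are illustrative assumptions, not part of the discussion above.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N = 2.0, 25                   # assumed population sd and sample size
NULL_MEAN, ALT_MEAN = 22.0, 21.0     # assumed means under null / alternative
SE = SIGMA / sqrt(N)                 # standard deviation of the sample mean


def error_rates(c):
    """Return (alpha, beta) for the rule: reject when sample mean < 22 - c."""
    cutoff = NULL_MEAN - c
    # alpha: probability of rejecting when the true mean is 22.
    alpha = NormalDist(NULL_MEAN, SE).cdf(cutoff)
    # beta: probability of NOT rejecting when the true mean is 21.
    beta = 1.0 - NormalDist(ALT_MEAN, SE).cdf(cutoff)
    return alpha, beta


# A stricter cutoff (larger c) lowers alpha but raises beta.
for c in (0.2, 0.4, 0.6, 0.8):
    alpha, beta = error_rates(c)
    print(f"c = {c:.1f}  alpha = {alpha:.3f}  beta = {beta:.3f}")
```

With the sample size held fixed, no choice of c lowers both numbers at once, which is why one of them has to be capped first.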

Significance Level

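The significance level is the upper bound on the probability of a Type I error. Having fixed this upper bound, we then minimize the probability of a Type II error subject to it. The significance level is usually pre-determined as 0.05. In the above graph, the critical value is the value of the test statistic corresponding to the significance level. If we let alpha=0.05 and reject for large values of the sample mean, then the critical value is the value at which the CDF of the sample mean under the null hypothesis equals 95%. If we reject for small values instead, as in the “population mean < 22” example, it is the value at which that CDF equals 5%.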
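As a rough illustration, the critical value can be computed by inverting the CDF of the sample mean under the null hypothesis. The sketch assumes a normal population with known sigma = 2, a sample size of 25, and a null mean of 22; those numbers and the variable names are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
SIGMA, N, NULL_MEAN = 2.0, 25, 22.0   # assumed values for illustration

# Distribution of the sample mean when the population mean is exactly 22.
null_dist = NormalDist(mu=NULL_MEAN, sigma=SIGMA / sqrt(N))

# Rejecting for small sample means: the null CDF equals 5% at the critical value.
lower_critical = null_dist.inv_cdf(ALPHA)
# Rejecting for large sample means: the null CDF equals 95% at the critical value.
upper_critical = null_dist.inv_cdf(1.0 - ALPHA)

# In the rule "reject if sample mean < 22 - c", this fixes the constant c.
c = NULL_MEAN - lower_critical

print(f"lower-tail critical value: {lower_critical:.3f}")
print(f"upper-tail critical value: {upper_critical:.3f}")
print(f"implied c: {c:.3f}")
```

Choosing the cutoff this way guarantees that the probability of a Type I error is at most 0.05 under the stated assumptions; the probability of a Type II error then depends on how far the true mean is from 22.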

