# How to choose the sample size in a study with two population means

## Description

When designing a study, it is important to choose a sample size that is large enough to perform a useful test but that is also economically feasible. How we choose the sample size depends on what test we plan to run on the data from our study. Here, let’s say our data will be used to compare two population means. If we are planning such a study, how do we determine how large it should be in order for the test that compares the population means to have a certain power?

## Using statsmodels, in Python

Example: Let’s say we’re designing a study to assess the effectiveness of a new four-week exercise program for weight loss. Assume that weight loss in four-week exercise programs is normally distributed with a standard deviation of around 5 pounds. We want the study to be able to detect whether the new program produces a mean weight loss 4 pounds higher than that of the average program. (Notice that we will be comparing the means of two populations: the weight loss in each of two programs.)

We choose a value $0 \leq \alpha \leq 1$ as the probability of a Type I error in our test that compares the two means. (Recall that a Type I error is a false positive: rejecting $H_0$ when it’s actually true.) Let’s set $\alpha$ to be 0.05 here.

We choose a value $0 \leq \beta \leq 1$ as the probability of a Type II error (false negative, failing to reject $H_0$ when it’s actually false). Let’s set $\beta$ to be 0.2 here. The test’s power is $1-\beta$, or in this case, 0.8.
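The two settings above can be turned into a rough sample size by hand. The following is a minimal sketch, assuming the common large-sample normal approximation $n \approx 2(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$ per group, where $d$ is the desired difference divided by the standard deviation (4 pounds and 5 pounds in this example). The function name `approx_n_per_group` is hypothetical, and this is not the exact t-based calculation, so it should land slightly below the exact answer.

```python
from statistics import NormalDist


def approx_n_per_group(delta, sd, alpha=0.05, beta=0.2):
    """Normal-approximation sample size per group for a two-sided, two-sample test."""
    d = delta / sd  # standardized effect size
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile matching the target power
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2


n = approx_n_per_group(delta=4, sd=5, alpha=0.05, beta=0.2)
print(round(n, 2))  # about 24.53
```

The approximation is useful as a sanity check: a larger desired difference or a smaller standard deviation increases $d$ and lowers the required sample size.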

What should the sample size be for each group?

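    from statsmodels.stats.power import TTestIndPower

    standard_deviation = 5  # pounds
    desired_increase = 4    # pounds
    alpha = 0.05            # probability of a Type I error
    beta = 0.2              # probability of a Type II error

    # effect_size is the standardized difference: desired increase / standard deviation
    analysis = TTestIndPower()
    analysis.solve_power(
        effect_size=desired_increase / standard_deviation,
        power=1 - beta,
        alpha=alpha,
    )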

The output is:

    25.52457250047935


Rounding up, we need at least 26 participants in each group (52 in total) for the test to have 80% power with our specified parameters.
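Because participants come in whole numbers, the computed value is rounded up, and since it is a per-group figure, the study needs twice that many people overall. A minimal sketch of that arithmetic, using only the standard library (the variable names are illustrative):

```python
import math

n_per_group = 25.52457250047935  # value reported by solve_power above

# Round up, never down: rounding down would leave the test below the target power.
participants_per_group = math.ceil(n_per_group)
total_participants = 2 * participants_per_group  # two equally sized groups

print(participants_per_group, total_participants)  # 26 52
```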

## Solution, in R

Example: Let’s say we’re designing a study to assess the effectiveness of a new four-week exercise program for weight loss. Assume that weight loss in four-week exercise programs is normally distributed with a standard deviation of around 5 pounds. We want the study to be able to detect whether the new program produces a mean weight loss 4 pounds higher than that of the average program. (Notice that we will be comparing the means of two populations: the weight loss in each of two programs.)

We choose a value $0 \le \alpha \le 1$ as the probability of a Type I error in our test that compares the two means. (Recall that a Type I error is a false positive: rejecting $H_0$ when it’s actually true.) Let’s set $\alpha$ to be 0.05 here.

We choose a value $0 \le \beta \le 1$ as the probability of a Type II error (false negative, failing to reject $H_0$ when it’s actually false). Let’s set $\beta$ to be 0.2 here. The test’s power is $1-\beta$, or in this case, 0.8.

What should the sample size be for each group?

    # sd        = standard deviation = 5 pounds
    # delta     = desired increase   = 4 pounds
    # sig.level = alpha              = 0.05
    # power     = 1 - beta           = 1 - 0.20 = 0.80
    # n = NULL so R computes it for us
    power.t.test(n = NULL, delta = 4, sd = 5, sig.level = 0.05, power = 0.80)

The output is:

         Two-sample t test power calculation

                  n = 25.52463
              delta = 4
                 sd = 5
          sig.level = 0.05
              power = 0.8
        alternative = two.sided

    NOTE: n is number in *each* group


Rounding up, we need at least 26 participants in each group (52 in total) for the test to have 80% power with our specified parameters.