How to choose the sample size in a study with two population means
Description
When designing a study, it is important to choose a sample size that is large enough to perform a useful test but that is also economically feasible. How we choose the sample size depends on what test we plan to run on the data from our study. Here, let’s say our data will be used to compare two population means. If we are planning such a study, how do we determine how large it should be in order for the test that compares the population means to have a certain power?
Using statsmodels, in Python
Example: Let’s say we’re designing a study to assess the effectiveness of a new four-week exercise program for weight loss. Assume that weight loss in four-week exercise programs is normally distributed with a standard deviation of around 5 pounds. We want the study to be able to detect a mean weight loss in the new program that is 4 pounds higher than in the average program. (Notice that we will be comparing the means of two populations, the weight loss in each of two programs.)
We choose a value $0 \leq \alpha \leq 1$ as the probability of a Type I error in our test that compares the two means. (Recall that a Type I error is a false positive: rejecting $H_0$ when it’s actually true.) Let’s set $\alpha$ to be 0.05 here.
We choose a value $0 \leq \beta \leq 1$ as the probability of a Type II error (false negative, failing to reject $H_0$ when it’s actually false). Let’s set $\beta$ to be 0.2 here. The test’s power is $1-\beta$, or in this case, 0.8.
What should the sample size be for each group?
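from statsmodels.stats.power import TTestIndPower
standard_deviation = 5   # assumed standard deviation of weight loss, in pounds
desired_increase = 4     # difference in means we want to detect, in pounds
alpha = 0.05             # probability of a Type I error
beta = 0.2               # probability of a Type II error
analysis = TTestIndPower()
# The effect size is the difference in means measured in standard deviations.
# Leaving nobs1 unspecified asks solve_power to compute the per-group sample size.
analysis.solve_power(effect_size=desired_increase / standard_deviation,
                     power=1 - beta, alpha=alpha)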
25.52457250047935
Rounding up to the next whole number, each group needs 26 participants (52 in total) in order for the power of the study to be at least 80% with our specified parameters.
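The rounding step can be checked with a short sketch. It assumes the same inputs as above and uses the `power` method of `TTestIndPower` to compare the power at 25 and at 26 participants per group; the variable names are illustrative only.

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 4 / 5  # desired increase divided by standard deviation

# Exact (fractional) per-group sample size for 80% power at alpha = 0.05
n_exact = analysis.solve_power(effect_size=effect_size, power=0.8, alpha=0.05)

# Sample sizes must be whole numbers, so round up rather than to the nearest integer
n_per_group = math.ceil(n_exact)

# Power actually achieved at the rounded-up size, and at one participant fewer
achieved_power = analysis.power(effect_size=effect_size, nobs1=n_per_group, alpha=0.05)
power_one_fewer = analysis.power(effect_size=effect_size, nobs1=n_per_group - 1, alpha=0.05)

print(n_per_group, achieved_power, power_one_fewer)
```

If the calculation behaves as expected, the power at 26 per group is at or just above 0.8, while the power at 25 per group falls slightly short, which is why we round up.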
Content last modified on 24 July 2023.
Solution, in R
Example: Let’s say we’re designing a study to assess the effectiveness of a new four-week exercise program for weight loss. Assume that weight loss in four-week exercise programs is normally distributed with a standard deviation of around 5 pounds. We want the study to be able to detect a mean weight loss in the new program that is 4 pounds higher than in the average program. (Notice that we will be comparing the means of two populations, the weight loss in each of two programs.)
We choose a value $0 \le \alpha \le 1$ as the probability of a Type I error in our test that compares the two means. (Recall that a Type I error is a false positive: rejecting $H_0$ when it’s actually true.) Let’s set $\alpha$ to be 0.05 here.
We choose a value $0 \le \beta \le 1$ as the probability of a Type II error (false negative, failing to reject $H_0$ when it’s actually false). Let’s set $\beta$ to be 0.2 here. The test’s power is $1-\beta$, or in this case, 0.8.
What should the sample size be for each group?
# sd = standard deviation = 5 pounds
# delta = desired increase = 4 pounds
# sig.level = alpha = 0.05
# power = 1 - beta = 1 - 0.20 = 0.80
# n = NULL so R computes it for us
power.t.test(n = NULL, delta = 4, sd = 5, sig.level = 0.05, power = 0.80)
Two-sample t test power calculation
n = 25.52463
delta = 4
sd = 5
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
Rounding up to the next whole number, each group needs 26 participants (52 in total) in order for the power of the study to be at least 80% with our specified parameters.
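As a rough cross-check that does not depend on any software package, the standard large-sample (normal) approximation for a two-sided, two-sample comparison with equal group sizes can be worked by hand. This is only a sketch, assuming the $z$-based formula is an acceptable stand-in for the $t$-based calculation; the symbols $\sigma$ (standard deviation), $\delta$ (desired difference in means), and $z_p$ (the standard normal quantile at probability $p$) are notation introduced here for illustration.

$$n \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2} = \frac{2\,(1.960 + 0.842)^2 \cdot 5^2}{4^2} \approx 24.5$$

The approximation suggests about 25 per group, slightly below the $t$-based answer because it ignores the extra uncertainty from estimating the standard deviation. That is consistent with the software result of about 25.52, rounded up to 26.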