# How to do a hypothesis test for the difference between two proportions

## Description

When dealing with qualitative data, we typically measure what proportion of the population falls into various categories (e.g., which religion a survey respondent adheres to, if any). We might want to compare two proportions by measuring their difference and asking whether it is equal to, greater than, or less than zero. How can we perform such a test?

## Using SciPy, in Python

We will use some fake data in this example, but you can replace it with your real data. Imagine we conduct a survey of people in Boston and of people in Nashville and ask them if they prefer chocolate or vanilla ice cream. We get data like the following.

| City | Prefer chocolate | Prefer vanilla | Total |
|---|---|---|---|
| Boston | 60 | 90 | 150 |
| Nashville | 85 | 50 | 135 |

We want to compare the proportions of people from the two cities who like vanilla.

Let $\bar{p}_1$ represent the proportion of people from Boston who like vanilla and $\bar{p}_2$ represent the proportion of people from Nashville who like vanilla.

```python
n1 = 150         # number of observations in sample 1
n2 = 135         # number of observations in sample 2
p_bar1 = 90/150  # proportion in sample 1
p_bar2 = 50/135  # proportion in sample 2
```
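If your real data arrive as raw counts rather than as proportions, the same four values can be derived from the table. The following is a minimal sketch; the `counts` dictionary layout and the `vanilla_summary` helper are hypothetical names introduced only for this illustration.

```python
# Hypothetical layout: counts[city] = (prefer_chocolate, prefer_vanilla)
counts = {"Boston": (60, 90), "Nashville": (85, 50)}

def vanilla_summary(chocolate, vanilla):
    """Return (sample size, proportion preferring vanilla) for one city."""
    n = chocolate + vanilla
    return n, vanilla / n

n1, p_bar1 = vanilla_summary(*counts["Boston"])     # sample 1
n2, p_bar2 = vanilla_summary(*counts["Nashville"])  # sample 2
print(n1, p_bar1, n2, p_bar2)
```

Deriving the totals from the counts avoids typing the sample sizes twice and keeps them consistent with the table.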


We choose a value $0 \le \alpha \le 1$ as our Type 1 error rate. For this example, we will use $\alpha=0.05$.

### Two-tailed test

In a two-tailed test, the null hypothesis states that the difference between the two proportions equals a hypothesized value; let’s choose zero, $H_0: \bar{p}_1 - \bar{p}_2 = 0$. We perform this test by computing a test statistic and $p$-value as shown below, then comparing the $p$-value to our chosen $\alpha$.
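In symbols, the calculation for this test can be sketched as follows. Here $x_1$ and $x_2$ are the vanilla counts (90 and 50); these two symbols are introduced only for this note, and $\bar{p}$ is the overall proportion, called `p_bar` in the code.

$$
\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
SE = \sqrt{\bar{p}\,(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad
z = \frac{\bar{p}_1 - \bar{p}_2}{SE}
$$

Pooling the two samples is reasonable here because the null hypothesis says the two proportions are equal, so both samples estimate the same value. For the survey data, $SE \approx 0.0593$ and $z \approx 3.87$, and the two-tailed $p$-value is $2\,P(Z \ge |z|)$ for a standard normal $Z$.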

```python
import numpy as np
from scipy import stats

p_bar = (90 + 50) / (150 + 135)                   # overall proportion
std_error = np.sqrt(p_bar*(1-p_bar)*(1/n1+1/n2))  # standard error
test_statistic = (p_bar1 - p_bar2)/std_error      # test statistic
2*stats.norm.sf(abs(test_statistic))              # two-tailed p-value
```

```
0.00010802693662804402
```


Our $p$-value, 0.000108, is smaller than $\alpha$, so we can reject the null hypothesis and conclude that the difference between the two proportions is different from zero.

But we did not need to compare the difference to zero; we could have used any hypothesized difference for comparison. Let’s repeat the above test, comparing the difference to $0.15$ instead, as an example.
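When the hypothesized difference is not zero, the null hypothesis no longer says the two proportions are equal, so a pooled proportion is no longer justified and each sample's variance is estimated separately. A sketch of the resulting formulas, writing $d_0$ for the hypothesized difference (a symbol introduced only for this note):

$$
SE = \sqrt{\frac{\bar{p}_1(1-\bar{p}_1)}{n_1} + \frac{\bar{p}_2(1-\bar{p}_2)}{n_2}}, \qquad
z = \frac{(\bar{p}_1 - \bar{p}_2) - d_0}{SE}
$$

With the survey data and $d_0 = 0.15$, this gives $SE \approx 0.0577$ and $z \approx 1.38$.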

```python
import numpy as np
from scipy import stats

hyp_diff = 0.15                                            # hypothesized difference
std_error = np.sqrt(p_bar1*(1-p_bar1)/n1
                    + p_bar2*(1-p_bar2)/n2)                # standard error
test_statistic = ((p_bar1 - p_bar2) - hyp_diff)/std_error  # test statistic
2*stats.norm.sf(abs(test_statistic))                       # two-tailed p-value
```

```
0.16744531573658772
```


Our $p$-value, 0.1674, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between these two proportions is significantly different from 0.15.
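The calculations in this solution differ only in the hypothesized difference and in which tail is used, so they can be folded into one helper. The following is a hedged sketch, not a SciPy function: the name `two_proportion_z_test`, its parameters, and the `alternative` labels are illustrative. It assumes the pooled standard error when the hypothesized difference is zero and the unpooled one otherwise, mirroring the examples here, and it uses `math.erfc` from the standard library in place of `stats.norm.sf` so that it runs without SciPy.

```python
import math

def normal_sf(z):
    """Upper-tail area of the standard normal distribution, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_proportion_z_test(x1, n1, x2, n2, hyp_diff=0.0, alternative="two-sided"):
    """Return (test statistic, p-value) for H0: p1 - p2 = hyp_diff.

    x1, x2 are success counts; n1, n2 are sample sizes.
    alternative is "two-sided", "greater" (right-tailed) or "less" (left-tailed).
    """
    p1, p2 = x1 / n1, x2 / n2
    if hyp_diff == 0:
        # Under H0 the proportions are equal, so pool the two samples.
        pooled = (x1 + x2) / (n1 + n2)
        std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    else:
        # Otherwise estimate each sample's variance separately.
        std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = ((p1 - p2) - hyp_diff) / std_error
    if alternative == "two-sided":
        p_value = 2 * normal_sf(abs(z))
    elif alternative == "greater":
        p_value = normal_sf(z)
    elif alternative == "less":
        p_value = normal_sf(-z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value

# Boston vs. Nashville, vanilla counts 90 of 150 and 50 of 135
z, p = two_proportion_z_test(90, 150, 50, 135)
print(z, p)
```

Calling it with `hyp_diff=0.15` should reproduce the second two-tailed example above, up to floating-point rounding.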

### Right-tailed test

In a right-tailed test, the null hypothesis states that the difference between the two proportions is less than or equal to a hypothesized value. Let’s begin by using zero as our hypothesized value, $H_0: \bar{p}_1 - \bar{p}_2 \le 0$.

We repeat some code below that we’ve seen above, just to make it easy to copy and paste the example elsewhere.

```python
import numpy as np
from scipy import stats

p_bar = (90 + 50) / (150 + 135)                   # overall proportion
std_error = np.sqrt(p_bar*(1-p_bar)*(1/n1+1/n2))  # standard error
test_statistic = (p_bar1 - p_bar2)/std_error      # test statistic
stats.norm.sf(test_statistic)                     # right-tailed p-value (signed statistic, no abs)
```

```
5.401346831402201e-05
```


Our $p$-value, about 0.000054, is smaller than $\alpha$, so we can reject the null hypothesis and conclude that the difference between the two proportions is significantly greater than zero.

But we did not need to compare the difference to zero; we could have used any hypothesized difference for comparison. Let’s repeat the above test, comparing the difference to $0.15$ instead, as an example.

```python
import numpy as np
from scipy import stats

hyp_diff = 0.15                                            # hypothesized difference
std_error = np.sqrt(p_bar1*(1-p_bar1)/n1
                    + p_bar2*(1-p_bar2)/n2)                # standard error
test_statistic = ((p_bar1 - p_bar2) - hyp_diff)/std_error  # test statistic
stats.norm.sf(test_statistic)                              # right-tailed p-value (signed statistic, no abs)
```

```
0.08372265786829386
```


Our $p$-value, 0.0837, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between these two proportions is significantly greater than 0.15.
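A right-tailed $p$-value is the upper-tail area beyond the signed test statistic, so the sign must not be discarded. The sketch below illustrates this by swapping the two cities, which makes the statistic negative. The variable names are illustrative, and `math.erfc` from the standard library stands in for `stats.norm.sf` so the example runs without SciPy.

```python
import math

def normal_sf(z):
    """Upper-tail area of the standard normal distribution, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n_boston, n_nashville = 150, 135
p_boston, p_nashville = 90 / 150, 50 / 135
pooled = (90 + 50) / (150 + 135)
std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_boston + 1 / n_nashville))

# Difference taken as Nashville minus Boston, so the statistic is negative.
z_swapped = (p_nashville - p_boston) / std_error

p_right = normal_sf(z_swapped)          # correct: keeps the sign
p_with_abs = normal_sf(abs(z_swapped))  # incorrect for a one-tailed test

print(z_swapped, p_right, p_with_abs)
```

With the sign kept, the right-tailed $p$-value is close to 1, which correctly gives no evidence that Nashville's proportion exceeds Boston's; taking the absolute value would wrongly report a very small $p$-value.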

### Left-tailed test

In a left-tailed test, the null hypothesis states that the difference between the two proportions is greater than or equal to a hypothesized value. Let’s begin by using zero as our hypothesized value, $H_0: \bar{p}_1 - \bar{p}_2 \ge 0$.

We repeat some code below that we’ve seen above, just to make it easy to copy and paste the example elsewhere.

```python
import numpy as np
from scipy import stats

p_bar = (90 + 50) / (150 + 135)                   # overall proportion
std_error = np.sqrt(p_bar*(1-p_bar)*(1/n1+1/n2))  # standard error
test_statistic = (p_bar1 - p_bar2)/std_error      # test statistic
stats.norm.sf(-test_statistic)                    # left-tailed p-value
```

```
0.999945986531686
```


Our $p$-value, 0.9999, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between the two proportions is significantly less than zero.

But we did not need to compare the difference to zero; we could have used any hypothesized difference for comparison. Let’s repeat the above test, comparing the difference to $0.15$ instead, as an example.

```python
import numpy as np
from scipy import stats

hyp_diff = 0.15                                            # hypothesized difference
std_error = np.sqrt(p_bar1*(1-p_bar1)/n1
                    + p_bar2*(1-p_bar2)/n2)                # standard error
test_statistic = ((p_bar1 - p_bar2) - hyp_diff)/std_error  # test statistic
stats.norm.sf(-test_statistic)                             # left-tailed p-value
```

```
0.9162773421317061
```


Our $p$-value, 0.9163, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between these two proportions is significantly less than 0.15.
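The three $p$-values for a given test statistic are closely related: the left- and right-tailed values sum to one, and the two-tailed value is twice the smaller of them. The following sketch checks this for the 0.15 example, using illustrative variable names and `math.erfc` from the standard library in place of `stats.norm.sf`.

```python
import math

def normal_sf(z):
    """Upper-tail area of the standard normal distribution, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n1, n2 = 150, 135
p_bar1, p_bar2 = 90 / 150, 50 / 135
hyp_diff = 0.15

# Unpooled standard error, as in the 0.15 examples
std_error = math.sqrt(p_bar1 * (1 - p_bar1) / n1 + p_bar2 * (1 - p_bar2) / n2)
z = ((p_bar1 - p_bar2) - hyp_diff) / std_error

p_right = normal_sf(z)            # right-tailed p-value
p_left = normal_sf(-z)            # left-tailed p-value
p_two = 2 * min(p_left, p_right)  # two-tailed p-value

print(p_left, p_right, p_two)
```

This is a quick consistency check when switching between tails: if the one-tailed values do not sum to one, the sign of the test statistic has probably been lost somewhere.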

## Solution, in R

We will use some fake data in this example, but you can replace it with your real data. Imagine we conduct a survey of people in Boston and of people in Nashville and ask them if they prefer chocolate or vanilla ice cream. We get data like the following.

| City | Prefer chocolate | Prefer vanilla | Total |
|---|---|---|---|
| Boston | 60 | 90 | 150 |
| Nashville | 85 | 50 | 135 |

We want to compare the proportions of people from the two cities who like vanilla.

Let $\bar{p}_1$ represent the proportion of people from Boston who like vanilla and $\bar{p}_2$ represent the proportion of people from Nashville who like vanilla.

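```r
n1 <- 150         # number of observations in sample 1
n2 <- 135         # number of observations in sample 2
p_bar1 <- 90/150  # proportion in sample 1
p_bar2 <- 50/135  # proportion in sample 2
```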


We choose a value $0 \le \alpha \le 1$ as our Type 1 error rate. For this example, we will use $\alpha=0.05$.

### Two-tailed test

In a two-tailed test, the null hypothesis states that the difference between the two proportions equals a hypothesized value; let’s choose zero, $H_0: \bar{p}_1 - \bar{p}_2 = 0$. We perform this test by computing a test statistic and $p$-value as shown below, then comparing the $p$-value to our chosen $\alpha$.

```r
p_bar <- (90 + 50) / (150 + 135)                      # overall proportion
std_error <- sqrt(p_bar*(1-p_bar)*(1/n1+1/n2))        # standard error
test_statistic <- (p_bar1 - p_bar2)/std_error         # test statistic
2*pnorm(q = abs(test_statistic), lower.tail = FALSE)  # two-tailed p-value
```

```
[1] 0.0001080269
```


Our $p$-value, 0.000108, is smaller than $\alpha$, so we can reject the null hypothesis and conclude that the difference between the two proportions is different from zero.

But we did not need to compare the difference to zero; we could have used any hypothesized difference for comparison. Let’s repeat the above test, comparing the difference to $0.15$ instead, as an example.

```r
hyp.diff <- 0.15                                            # hypothesized difference
std_error <- sqrt(p_bar1*(1-p_bar1)/n1
                  + p_bar2*(1-p_bar2)/n2)                   # standard error
test_statistic <- ((p_bar1 - p_bar2) - hyp.diff)/std_error  # test statistic
2*pnorm(q = abs(test_statistic), lower.tail = FALSE)        # two-tailed p-value
```

```
[1] 0.1674453
```


Our $p$-value, 0.1674, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between these two proportions is significantly different from 0.15.

### Right-tailed test

In a right-tailed test, the null hypothesis states that the difference between the two proportions is less than or equal to a hypothesized value. Let’s begin by using zero as our hypothesized value, $H_0: \bar{p}_1 - \bar{p}_2 \le 0$.

We repeat some code below that we’ve seen above, just to make it easy to copy and paste the example elsewhere.

```r
p_bar <- (90 + 50) / (150 + 135)                 # overall proportion
std_error <- sqrt(p_bar*(1-p_bar)*(1/n1+1/n2))   # standard error
test_statistic <- (p_bar1 - p_bar2)/std_error    # test statistic
pnorm(q = test_statistic, lower.tail = FALSE)    # right-tailed p-value
```

```
[1] 5.401347e-05
```


Our $p$-value, about 0.000054, is smaller than $\alpha$, so we can reject the null hypothesis and conclude that the difference between the two proportions is significantly greater than zero.

But we did not need to compare the difference to zero; we could have used any hypothesized difference for comparison. Let’s repeat the above test, comparing the difference to $0.15$ instead, as an example.

```r
hyp.diff <- 0.15                                            # hypothesized difference
std_error <- sqrt(p_bar1*(1-p_bar1)/n1
                  + p_bar2*(1-p_bar2)/n2)                   # standard error
test_statistic <- ((p_bar1 - p_bar2) - hyp.diff)/std_error  # test statistic
pnorm(q = test_statistic, lower.tail = FALSE)               # right-tailed p-value
```

```
[1] 0.08372266
```


Our $p$-value, 0.0837, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between these two proportions is significantly greater than 0.15.

### Left-tailed test

In a left-tailed test, the null hypothesis states that the difference between the two proportions is greater than or equal to a hypothesized value. Let’s begin by using zero as our hypothesized value, $H_0: \bar{p}_1 - \bar{p}_2 \ge 0$.

We repeat some code below that we’ve seen above, just to make it easy to copy and paste the example elsewhere.

```r
p_bar <- (90 + 50) / (150 + 135)                 # overall proportion
std_error <- sqrt(p_bar*(1-p_bar)*(1/n1+1/n2))   # standard error
test_statistic <- (p_bar1 - p_bar2)/std_error    # test statistic
pnorm(q = test_statistic, lower.tail = TRUE)     # left-tailed p-value
```

```
[1] 0.999946
```


Our $p$-value, 0.9999, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between the two proportions is significantly less than zero.

But we did not need to compare the difference to zero; we could have used any hypothesized difference for comparison. Let’s repeat the above test, comparing the difference to $0.15$ instead, as an example.

```r
hyp.diff <- 0.15                                            # hypothesized difference
std_error <- sqrt(p_bar1*(1-p_bar1)/n1
                  + p_bar2*(1-p_bar2)/n2)                   # standard error
test_statistic <- ((p_bar1 - p_bar2) - hyp.diff)/std_error  # test statistic
pnorm(q = test_statistic, lower.tail = TRUE)                # left-tailed p-value
```

```
[1] 0.9162773
```


Our $p$-value, 0.9163, is greater than $\alpha$, so we cannot reject the null hypothesis and cannot conclude that the difference between these two proportions is significantly less than 0.15.