How to compute a confidence interval for the difference between two proportions

Description

When working with categorical data, we often want a confidence interval for the difference between two population proportions. For example, if we test a drug on an experimental group and a control group of patients, we will want to compare the proportion of patients who recovered in each group.

How do we make such a comparison using a confidence interval?

Using SciPy, in Python

Here is some fake data for the purposes of this illustration. Let’s say we conduct a survey of people in Boston and of people in Nashville and ask them if they prefer chocolate or vanilla ice cream. We want to compare the proportions of people from the two cities who like vanilla.

Out of 150 people in Boston surveyed, 90 prefer vanilla.
Out of 135 people in Nashville surveyed, 50 prefer vanilla.

We’ll let $\bar{p}_1$ represent the sample proportion of people surveyed in Boston who prefer vanilla and $\bar{p}_2$ the sample proportion of people surveyed in Nashville who prefer vanilla. You can replace the fake data in the code below with your real data.

# number of observations in the samples
n1 = 150
n2 = 135
# proportions in the two samples
p_bar1 = 90/150
p_bar2 = 50/135
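
For reference, the interval computed in the next step is the standard large-sample (Wald) interval for a difference of two proportions. The symbol $z_{\alpha/2}$ is notation introduced here for the critical value of the standard normal distribution; it assumes both samples are large enough for the normal approximation to be reasonable.

$$
(\bar{p}_1 - \bar{p}_2) \;\pm\; z_{\alpha/2} \sqrt{\frac{\bar{p}_1(1-\bar{p}_1)}{n_1} + \frac{\bar{p}_2(1-\bar{p}_2)}{n_2}}
$$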

We now compute the confidence interval using tools from SciPy and NumPy.

# Find the critical value to compute the confidence interval
from scipy import stats
alpha = 0.05       # replace with your chosen alpha (here, a 95% confidence level)
critical_value = stats.norm.ppf(1-alpha/2)

# Compute the standard error of the difference between the two sample proportions
import numpy as np
std_error = np.sqrt( p_bar1*(1-p_bar1)/n1 + p_bar2*(1-p_bar2)/n2 )

# Compute the upper and lower bounds of the confidence interval
upper_bound = (p_bar1 - p_bar2) + critical_value*std_error
lower_bound = (p_bar1 - p_bar2) - critical_value*std_error
lower_bound, upper_bound

(0.11657216971616415, 0.3426870895430951)

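The 95% confidence interval for the difference between these two proportions is $[0.11657, 0.34269]$. Because the entire interval lies above 0, the data suggest that a larger proportion of people in Boston than in Nashville prefer vanilla.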
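
If you need this interval for several data sets, the steps above can be wrapped in one function that starts from raw counts. The following is a minimal sketch, not part of SciPy: the function name `two_proportion_ci` and its parameters are hypothetical, and it uses `statistics.NormalDist` from the Python standard library in place of `stats.norm.ppf` so that it runs without third-party packages.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(successes1, n1, successes2, n2, alpha=0.05):
    """Large-sample (Wald) confidence interval for p1 - p2.

    Hypothetical helper for illustration; assumes both samples are
    large enough for the normal approximation to be reasonable.
    """
    # sample proportions
    p_bar1 = successes1 / n1
    p_bar2 = successes2 / n2
    # two-sided critical value of the standard normal distribution
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    # standard error of the difference between the sample proportions
    std_error = sqrt(p_bar1 * (1 - p_bar1) / n1 + p_bar2 * (1 - p_bar2) / n2)
    margin = critical_value * std_error
    diff = p_bar1 - p_bar2
    return diff - margin, diff + margin

# Same survey data as above: 90 of 150 in Boston, 50 of 135 in Nashville
lower_bound, upper_bound = two_proportion_ci(90, 150, 50, 135)
print(round(lower_bound, 5), round(upper_bound, 5))  # 0.11657 0.34269
```

Passing counts rather than precomputed proportions keeps each sample size and its proportion together, which makes it harder to pair a proportion with the wrong `n`.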

Content last modified on 24 July 2023.

Solution, in R

Out of 150 people in Boston surveyed, 90 prefer vanilla.
Out of 135 people in Nashville surveyed, 50 prefer vanilla.

# number of observations in the samples
n1 <- 150
n2 <- 135
# proportions in the two samples
p_bar1 <- 90/150
p_bar2 <- 50/135

We now compute the confidence interval using R’s qnorm function.

# Find the critical value to compute the confidence interval
alpha <- 0.05       # replace with your chosen alpha (here, a 95% confidence level)
critical_value <- qnorm(p = alpha/2, lower.tail=FALSE)

# Compute the standard error of the difference between the two sample proportions
std_error <- sqrt( p_bar1*(1-p_bar1)/n1 + p_bar2*(1-p_bar2)/n2 )

# Compute the upper and lower bounds of the confidence interval and print them out
upper_bound <- (p_bar1 - p_bar2) + critical_value*std_error
lower_bound <- (p_bar1 - p_bar2) - critical_value*std_error
lower_bound
upper_bound

[1] 0.1165722

[1] 0.3426871

The 95% confidence interval for the difference between these two proportions is $[0.11657, 0.34269]$.
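
As a check on the code, the same numbers can be worked by hand. This is a sketch of the arithmetic with intermediate values rounded for display; the critical value $1.95996$ is the rounded result of `qnorm` for a 95% confidence level.

$$
\begin{aligned}
\bar{p}_1 - \bar{p}_2 &= \tfrac{90}{150} - \tfrac{50}{135} \approx 0.22963 \\
\text{standard error} &= \sqrt{\tfrac{0.6 \times 0.4}{150} + \tfrac{0.37037 \times 0.62963}{135}} \approx 0.057683 \\
\text{margin of error} &\approx 1.95996 \times 0.057683 \approx 0.11306 \\
\text{interval} &\approx 0.22963 \pm 0.11306 = [0.11657,\ 0.34269]
\end{aligned}
$$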

Topics that include this task

Bentley University MA214

Opportunities

This website does not yet contain a solution for this task in any of the following software packages.

Excel
Julia

If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.