# How to compute a confidence interval for the difference between two proportions

## Description

When dealing with qualitative data, we often want to construct a confidence interval for the difference between two population proportions. For example, if we are trying a drug on experimental and control groups of patients, we probably want to compare the proportion of patients who got well in one group versus the other.

How do we make such a comparison using a confidence interval?
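The solutions that follow compute what is commonly called the normal-approximation (Wald) interval. As a sketch of the formula they implement, assuming two independent samples that are both large enough for the normal approximation to be reasonable, let $\bar{p_1}$ and $\bar{p_2}$ be the sample proportions and $n_1$ and $n_2$ the sample sizes. The symbol $z_{\alpha/2}$ is notation introduced here for the standard normal critical value at confidence level $1-\alpha$.

$$
(\bar{p_1} - \bar{p_2}) \pm z_{\alpha/2} \sqrt{ \frac{\bar{p_1}(1-\bar{p_1})}{n_1} + \frac{\bar{p_2}(1-\bar{p_2})}{n_2} }
$$

The square root is the standard error of the difference between the two sample proportions.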

## Using SciPy, in Python

Here is some fake data for the purposes of this illustration. Let’s say we conduct a survey of people in Boston and of people in Nashville and ask them if they prefer chocolate or vanilla ice cream. We want to compare the proportions of people from the two cities who like vanilla.

• Out of 150 people in Boston surveyed, 90 prefer vanilla.
• Out of 135 people in Nashville surveyed, 50 prefer vanilla.

We’ll let $\bar{p_1}$ represent the proportion of people from Boston who like vanilla and $\bar{p_2}$ represent the proportion of people from Nashville who like vanilla. You can replace the code for this fake data below with your real data.

```python
# number of observations in the samples
n1 = 150
n2 = 135
# proportions in the two samples
p_bar1 = 90/150
p_bar2 = 50/135
```

We now compute the confidence interval using tools from SciPy and NumPy.

```python
# Find the critical value to compute the confidence interval
from scipy import stats
alpha = 0.05       # replace with your chosen alpha (here, a 95% confidence level)
critical_value = stats.norm.ppf(1-alpha/2)

# Compute the standard error of the proportions
import numpy as np
std_error = np.sqrt( p_bar1*(1-p_bar1)/n1 + p_bar2*(1-p_bar2)/n2 )

# Compute the upper and lower bounds of the confidence interval
upper_bound = (p_bar1 - p_bar2) + critical_value*std_error
lower_bound = (p_bar1 - p_bar2) - critical_value*std_error
lower_bound, upper_bound
```

```
(0.11657216971616415, 0.3426870895430951)
```

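With $\alpha = 0.05$, the 95% confidence interval for the difference between these two proportions is approximately $[0.11657, 0.34269]$. Because the interval lies entirely above zero, the data suggest that a larger proportion of people in Boston than in Nashville prefer vanilla.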
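If you need this interval for more than one pair of samples, the steps above can be wrapped in a small function. The following is a minimal sketch, not a SciPy function: the name `two_proportion_ci` and its parameter names are hypothetical, and it takes raw counts rather than precomputed proportions. It assumes independent samples that are large enough for the normal approximation to be reasonable.

```python
import numpy as np
from scipy import stats


def two_proportion_ci(successes1, n1, successes2, n2, alpha=0.05):
    """Normal-approximation confidence interval for p1 - p2.

    Hypothetical helper written for this illustration; it repeats the
    same steps as the code above.
    """
    # Sample proportions computed from the raw counts
    p_bar1 = successes1 / n1
    p_bar2 = successes2 / n2

    # Two-sided critical value from the standard normal distribution
    critical_value = stats.norm.ppf(1 - alpha / 2)

    # Standard error of the difference between the two proportions
    std_error = np.sqrt(p_bar1 * (1 - p_bar1) / n1 + p_bar2 * (1 - p_bar2) / n2)

    difference = p_bar1 - p_bar2
    lower = float(difference - critical_value * std_error)
    upper = float(difference + critical_value * std_error)
    return lower, upper


# Same fake survey data as above: 90 of 150 in Boston, 50 of 135 in Nashville
lower_bound, upper_bound = two_proportion_ci(90, 150, 50, 135)
print(round(lower_bound, 5), round(upper_bound, 5))  # 0.11657 0.34269
```

Passing a smaller `alpha`, such as `alpha=0.01`, gives a wider interval at a higher confidence level.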

## Solution, in R

Here is some fake data for the purposes of this illustration. Let's say we conduct a survey of people in Boston and of people in Nashville and ask them if they prefer chocolate or vanilla ice cream. We want to compare the proportions of people from the two cities who like vanilla.

• Out of 150 people in Boston surveyed, 90 prefer vanilla.
• Out of 135 people in Nashville surveyed, 50 prefer vanilla.

We'll let $\bar{p_1}$ represent the proportion of people from Boston who like vanilla and $\bar{p_2}$ represent the proportion of people from Nashville who like vanilla. You can replace the code for this fake data below with your real data.

```r
# number of observations in the samples
n1 <- 150
n2 <- 135
# proportions in the two samples
p_bar1 <- 90/150
p_bar2 <- 50/135
```

We now compute the confidence interval using R's `qnorm` function.

```r
# Find the critical value to compute the confidence interval
alpha <- 0.05       # replace with your chosen alpha (here, a 95% confidence level)
critical_value <- qnorm(p = alpha/2, lower.tail=FALSE)

# Compute the standard error of the proportions
std_error <- sqrt( p_bar1*(1-p_bar1)/n1 + p_bar2*(1-p_bar2)/n2 )

# Compute the upper and lower bounds of the confidence interval and print them out
upper_bound <- (p_bar1 - p_bar2) + critical_value*std_error
lower_bound <- (p_bar1 - p_bar2) - critical_value*std_error
lower_bound
upper_bound
```

```
[1] 0.1165722
[1] 0.3426871
```

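With $\alpha = 0.05$, the 95% confidence interval for the difference between these two proportions is approximately $[0.11657, 0.34269]$. Because the interval lies entirely above zero, the data suggest that a larger proportion of people in Boston than in Nashville prefer vanilla.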

