# How to compute a confidence interval for the difference between two proportions

## Description

When dealing with qualitative data, we often want to construct a confidence interval for the difference between two population proportions. For example, when testing a drug on experimental and control groups of patients, we probably want to compare the proportion of patients who recovered in one group with the proportion who recovered in the other.

How do we make such a comparison using a confidence interval?

Related tasks:

- How to compute a confidence interval for a mean difference (matched pairs)
- How to compute a confidence interval for a regression coefficient
- How to compute a confidence interval for a population mean
- How to compute a confidence interval for a single population variance
- How to compute a confidence interval for the difference between two means when both population variances are known
- How to compute a confidence interval for the difference between two means when population variances are unknown
- How to compute a confidence interval for the expected value of a response variable
- How to compute a confidence interval for the population proportion
- How to compute a confidence interval for the ratio of two population variances

## Using SciPy, in Python

Here is some fake data for the purposes of this illustration. Let’s say we conduct a survey of people in Boston and of people in Nashville and ask them if they prefer chocolate or vanilla ice cream. We want to compare the proportions of people from the two cities who like vanilla.

- Out of 150 people in Boston surveyed, 90 prefer vanilla.
- Out of 135 people in Nashville surveyed, 50 prefer vanilla.

We’ll let $\bar{p}_1$ represent the sample proportion of people from Boston who like vanilla and $\bar{p}_2$ represent the sample proportion of people from Nashville who like vanilla. You can replace the fake data in the code below with your real data.

```python
# number of observations in the samples
n1 = 150
n2 = 135
# proportions in the two samples
p_bar1 = 90/150
p_bar2 = 50/135
```

We now compute the confidence interval using tools from SciPy and NumPy.
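
As a point of reference, the calculation below follows the usual large-sample (Wald) formula for a $100(1-\alpha)\%$ confidence interval, assuming two independent samples that are large enough for the normal approximation to be reasonable. Here $n_1$ and $n_2$ are the sample sizes and $z_{\alpha/2}$ is the standard normal critical value:

$$(\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1-\bar{p}_1)}{n_1} + \frac{\bar{p}_2(1-\bar{p}_2)}{n_2}}$$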

```python
import numpy as np
from scipy import stats

# Find the critical value to compute the confidence interval
alpha = 0.05  # replace with your chosen alpha (here, a 95% confidence level)
critical_value = stats.norm.ppf(1 - alpha/2)

# Compute the standard error of the difference between the two proportions
std_error = np.sqrt(p_bar1*(1 - p_bar1)/n1 + p_bar2*(1 - p_bar2)/n2)

# Compute the upper and lower bounds of the confidence interval
upper_bound = (p_bar1 - p_bar2) + critical_value*std_error
lower_bound = (p_bar1 - p_bar2) - critical_value*std_error
lower_bound, upper_bound
```

```
(0.11657216971616415, 0.3426870895430951)
```

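The 95% confidence interval for the difference between these two proportions (Boston minus Nashville) is approximately $[0.11657, 0.34269]$. Because the interval lies entirely above zero, the data suggest that a larger proportion of people in Boston than in Nashville prefer vanilla.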
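
If you start from raw counts instead of precomputed proportions, the steps above can be wrapped in a small helper. The following is only a sketch: the function name `two_proportion_ci` and its parameters are hypothetical (not part of SciPy), and it uses the standard library's `statistics.NormalDist` in place of `stats.norm.ppf` so that it runs without extra dependencies. It assumes independent samples and the same large-sample formula used above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample (Wald) confidence interval for p1 - p2.

    x1, x2 are the counts of "successes" and n1, n2 the sample sizes.
    This helper is a hypothetical illustration, not a library function.
    """
    p_bar1 = x1 / n1
    p_bar2 = x2 / n2
    # Two-sided critical value from the standard normal distribution
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two sample proportions
    std_error = sqrt(p_bar1 * (1 - p_bar1) / n1 + p_bar2 * (1 - p_bar2) / n2)
    diff = p_bar1 - p_bar2
    return diff - critical_value * std_error, diff + critical_value * std_error


# Same fake survey data as above: 90 of 150 in Boston, 50 of 135 in Nashville
lower, upper = two_proportion_ci(90, 150, 50, 135)
print(round(lower, 5), round(upper, 5))  # prints 0.11657 0.34269
```

Passing a different `alpha` (for example `alpha=0.01` for a 99% confidence level) widens or narrows the interval accordingly.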

Content last modified on 24 July 2023.


## Solution, in R

Here is some fake data for the purposes of this illustration. Let’s say we conduct a survey of people in Boston and of people in Nashville and ask them if they prefer chocolate or vanilla ice cream. We want to compare the proportions of people from the two cities who like vanilla.

- Out of 150 people in Boston surveyed, 90 prefer vanilla.
- Out of 135 people in Nashville surveyed, 50 prefer vanilla.

We’ll let $\bar{p}_1$ represent the sample proportion of people from Boston who like vanilla and $\bar{p}_2$ represent the sample proportion of people from Nashville who like vanilla. You can replace the fake data in the code below with your real data.

```r
# number of observations in the samples
n1 <- 150
n2 <- 135
# proportions in the two samples
p_bar1 <- 90/150
p_bar2 <- 50/135
```

We now compute the confidence interval using R’s `qnorm` function.

```r
# Find the critical value to compute the confidence interval
alpha <- 0.05  # replace with your chosen alpha (here, a 95% confidence level)
critical_value <- qnorm(p = alpha/2, lower.tail = FALSE)

# Compute the standard error of the difference between the two proportions
std_error <- sqrt(p_bar1*(1 - p_bar1)/n1 + p_bar2*(1 - p_bar2)/n2)

# Compute the upper and lower bounds of the confidence interval and print them out
upper_bound <- (p_bar1 - p_bar2) + critical_value*std_error
lower_bound <- (p_bar1 - p_bar2) - critical_value*std_error
lower_bound
upper_bound
```

```
[1] 0.1165722
[1] 0.3426871
```

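The 95% confidence interval for the difference between these two proportions (Boston minus Nashville) is approximately $[0.11657, 0.34269]$. Because the interval lies entirely above zero, the data suggest that a larger proportion of people in Boston than in Nashville prefer vanilla.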


## Opportunities

This website does not yet contain a solution for this task in any of the following software packages.

- Excel
- Julia

If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.