How to compute a confidence interval for the difference between two means when population variances are unknown (in R)

Task

If we have samples from two independent populations and both of the population variances are unknown, how do we compute a confidence interval for the difference between the population means?

Solution

We’re going to use some fake data here to illustrate how to make the confidence interval. Replace our fake data with your actual data if you use this code.

sample.1 <- c(15, 10, 7, 22, 17, 14)
sample.2 <- c(9, 1, 11, 13, 3, 6)

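In the example below, we specify var.equal = FALSE to indicate that we cannot assume that the variances are equal; R then uses Welch's method, which estimates each variance separately and adjusts the degrees of freedom. If you can assume the variances are equal in your situation, replace FALSE with TRUE to use the pooled-variance interval instead.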

alpha <- 0.05       # replace with your chosen alpha (here, a 95% confidence level)
conf.interval <- t.test(sample.1, sample.2, var.equal = FALSE, conf.level = 1-alpha)
# If you need the upper and lower bounds later, store them in variables like this:
lower.bound <- conf.interval$conf.int[1]
upper.bound <- conf.interval$conf.int[2]
# Print out the lower and upper bounds
lower.bound
upper.bound

[1] 0.5852484

[1] 13.41475

Our 95% confidence interval for the true difference between these population means (population 1 minus population 2) is $[0.5852, 13.4148]$.
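To see where these bounds come from, here is a minimal sketch that rebuilds the same interval by hand, assuming the standard Welch formulas: the standard error is the square root of the sum of each sample variance divided by its sample size, and the degrees of freedom come from the Welch-Satterthwaite approximation. The variable names (std.error, welch.df, t.crit, manual.interval) are our own for illustration and are not part of t.test.

```r
sample.1 <- c(15, 10, 7, 22, 17, 14)
sample.2 <- c(9, 1, 11, 13, 3, 6)
alpha <- 0.05

n1 <- length(sample.1)
n2 <- length(sample.2)

# Each sample's contribution to the variance of the difference in means
v1 <- var(sample.1) / n1
v2 <- var(sample.2) / n2

# Standard error of the difference in sample means
std.error <- sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
welch.df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Critical value from the t distribution for a two-sided interval
t.crit <- qt(1 - alpha / 2, df = welch.df)

# Point estimate plus and minus the margin of error
mean.diff <- mean(sample.1) - mean(sample.2)
manual.interval <- c(mean.diff - t.crit * std.error,
                     mean.diff + t.crit * std.error)
manual.interval
```

The two values in manual.interval should agree with the lower and upper bounds that t.test reported above.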

You can also see the test statistic and $p$-value by inspecting the result of the t.test function we ran above.

conf.interval

	Welch Two Sample t-test

data:  sample.1 and sample.2
t = 2.4363, df = 9.8554, p-value = 0.0354
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  0.5852484 13.4147516
sample estimates:
mean of x mean of y 
14.166667  7.166667 
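If you need those numbers individually rather than as a printed summary, the object returned by t.test stores them as named components. The sketch below pulls out a few of them; the variable names on the left (result, t.stat, welch.df, p.value, mean.diff) are our own choices for illustration.

```r
sample.1 <- c(15, 10, 7, 22, 17, 14)
sample.2 <- c(9, 1, 11, 13, 3, 6)
result <- t.test(sample.1, sample.2, var.equal = FALSE, conf.level = 0.95)

# unname() drops the labels R attaches, leaving plain numbers
t.stat    <- unname(result$statistic)   # the t statistic
welch.df  <- unname(result$parameter)   # the degrees of freedom
p.value   <- result$p.value             # the two-sided p-value
mean.diff <- unname(result$estimate[1] - result$estimate[2])  # difference in sample means

t.stat
welch.df
p.value
mean.diff
```

These values should match the t, df, and p-value lines in the printed summary above, and mean.diff is the difference between the two sample means listed under "sample estimates."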

Content last modified on 24 July 2023.

Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)