# How to compute a confidence interval for the expected value of a response variable (in R)

## Task

If we have a simple linear regression model, $y = \beta_0 + \beta_1x + \epsilon$, where $\epsilon$ is some random error, then given any $x$ input, $y$ can be viewed as a random variable because of $\epsilon$. Let’s consider its expected value. How do we construct a confidence interval for that expected value, given a value for the predictor $x$?

Related tasks:

- How to compute a confidence interval for a mean difference (matched pairs)
- How to compute a confidence interval for a regression coefficient
- How to compute a confidence interval for a population mean
- How to compute a confidence interval for a single population variance
- How to compute a confidence interval for the difference between two means when both population variances are known
- How to compute a confidence interval for the difference between two means when population variances are unknown
- How to compute a confidence interval for the difference between two proportions
- How to compute a confidence interval for the population proportion
- How to compute a confidence interval for the ratio of two population variances

## Solution

Let’s assume that you already have a linear model. Here we build an example model from some fabricated data.

```r
# Make the linear model
x <- c(34, 9, 78, 60, 22, 45, 83, 59, 25)
y <- c(126, 347, 298, 309, 450, 187, 266, 385, 400)
model <- lm(y ~ x)
```

Construct a data frame containing just one entry, the value of the independent variable for which you want to compute the confidence interval. That data frame can then be passed to R’s `predict` function to get a confidence interval for the expected value of $y$.

```r
# Use your chosen value of x below:
data <- data.frame(x = 40)
# Compute the confidence interval for the expected value of y
# (here we use a 0.95 confidence level; choose a different one if needed):
predict(model, data, interval = "confidence", level = 0.95)
```

```
       fit     lwr      upr
1 313.7217 226.648 400.7954
```

Our 95% confidence interval is $[226.648, 400.7954]$. We can be 95% confident that the true average value of $y$, given that $x$ is 40, is between 226.648 and 400.7954.
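To see where these numbers come from, the interval can also be computed by hand. The following is a minimal sketch, assuming the standard $t$-based interval for the mean response in simple linear regression, $\hat{y}_0 \pm t_{\alpha/2,\,n-2} \cdot s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$. The names `x0`, `level`, `se_fit`, `t_crit`, `manual`, and `from_predict` are our own choices for illustration and do not appear in the solution above.

```r
# Same fabricated data and model as in the solution above
x <- c(34, 9, 78, 60, 22, 45, 83, 59, 25)
y <- c(126, 347, 298, 309, 450, 187, 266, 385, 400)
model <- lm(y ~ x)

x0 <- 40       # chosen value of the predictor (illustrative name)
level <- 0.95  # confidence level
n <- length(x)

# Point estimate of the expected value of y at x0
fit <- unname(coef(model)[1] + coef(model)[2] * x0)

# Residual standard error, with n - 2 degrees of freedom
s <- sqrt(sum(residuals(model)^2) / (n - 2))

# Standard error of the estimated mean response at x0
se_fit <- s * sqrt(1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

# Two-sided critical value from the t distribution
t_crit <- qt(1 - (1 - level) / 2, df = n - 2)

# Assemble the interval in the same fit/lwr/upr layout that predict() uses
manual <- c(fit = fit, lwr = fit - t_crit * se_fit, upr = fit + t_crit * se_fit)
manual

# Compare with predict(); all.equal() should report TRUE
from_predict <- predict(model, data.frame(x = x0),
                        interval = "confidence", level = level)
all.equal(unname(manual), unname(from_predict[1, ]))
```

Note that the standard error grows as `x0` moves away from `mean(x)`, so the interval is narrowest near the center of the observed predictor values.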

Content last modified on 24 July 2023.

Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)