How to compute a confidence interval for the expected value of a response variable

Description

If we have a simple linear regression model, $y = β_{0} + β_{1} x + ϵ$, where $ϵ$ is some random error, then for any given input $x$, $y$ can be viewed as a random variable because of $ϵ$. Let’s consider its expected value. How do we construct a confidence interval for that expected value, given a value for the predictor $x$?
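
For reference, the interval in question is usually written with the standard textbook formula for the mean response in simple linear regression. The notation here is introduced for illustration and does not come from the text above: $x_0$ is the chosen predictor value, $n$ the number of data points, $\bar{x}$ the mean of the observed $x$ values, $\hat{β}_0$ and $\hat{β}_1$ the fitted coefficients, $s$ the residual standard error, and $t_{α/2,\,n-2}$ the critical value of the $t$ distribution with $n-2$ degrees of freedom. The formula assumes the usual conditions on $ϵ$ (independent, normally distributed, constant variance).

$$
\hat{y}_0 \pm t_{α/2,\,n-2} \cdot s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
\hat{y}_0 = \hat{β}_0 + \hat{β}_1 x_0,
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{β}_0 - \hat{β}_1 x_i)^2}{n-2}}
$$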

Using statsmodels and sklearn, in Python

Let’s assume that you already have a linear model. We construct an example one here from some fabricated data. For a review of how this preparatory code works, see how to fit a linear model to two columns of data.

import statsmodels.api as sm

# Replace the following fake data with your actual data:
xs = [  34,   9,  78,  60,  22,  45,  83,  59,  25 ]
ys = [ 126, 347, 298, 309, 450, 187, 266, 385, 400 ]

# Create and fit a linear model to the data:
xs = sm.add_constant( xs )
model = sm.OLS( ys, xs ).fit()

Ask the model for a prediction at one particular input, in this example $x = 40$, with a 95% confidence interval included ($α = 0.05$). You can replace the $40$ with your chosen $x$ value, or pass an array of them, and you can replace the $0.05$ with your chosen value of $α$.

(The extra 1 in the input to get_prediction is a placeholder, required because the model has been expanded to include a constant term.)

model.get_prediction( [1,40] ).summary_frame( alpha=0.05 )

	mean	mean_se	mean_ci_lower	mean_ci_upper	obs_ci_lower	obs_ci_upper
0	313.721744	36.823483	226.648043	400.795444	45.876725	581.566762

Our 95% confidence interval is $[226.648, 400.7954]$. We can be 95% confident that the true average value of $y$, given that $x$ is 40, is between 226.648 and 400.7954.
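
As a cross-check, the same interval can be reproduced by hand using only Python's standard library. This is a minimal sketch, not how statsmodels computes it internally. The variable names (`x_vals`, `y_vals`, `x0`, `t_crit`) are illustrative. The critical value `t_crit` is hard-coded from a $t$ table for 7 degrees of freedom, which is an assumption that only holds for this data set and for $α = 0.05$.

```python
from math import sqrt
from statistics import mean

# Same fabricated data as above (names are illustrative):
x_vals = [34, 9, 78, 60, 22, 45, 83, 59, 25]
y_vals = [126, 347, 298, 309, 450, 187, 266, 385, 400]
x0 = 40  # predictor value at which we want the interval

# Assumption: critical value t_{0.975, 7} taken from a t table.
# It is valid only for n - 2 = 7 degrees of freedom and alpha = 0.05;
# scipy.stats.t.ppf(0.975, 7) would give the same number.
t_crit = 2.3646

n = len(x_vals)
x_bar, y_bar = mean(x_vals), mean(y_vals)

# Least-squares slope and intercept
s_xx = sum((x - x_bar) ** 2 for x in x_vals)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_vals, y_vals))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Estimated mean response at x0
fit = intercept + slope * x0

# Residual standard error, with n - 2 degrees of freedom
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_vals, y_vals))
s = sqrt(sse / (n - 2))

# Standard error of the estimated mean response at x0
se_mean = s * sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)

lower = fit - t_crit * se_mean
upper = fit + t_crit * se_mean
print(f"fit={fit:.4f}, se={se_mean:.4f}, CI=[{lower:.3f}, {upper:.3f}]")
```

The printed fit, standard error, and interval endpoints should agree with the `mean`, `mean_se`, `mean_ci_lower`, and `mean_ci_upper` columns above, up to the rounding of `t_crit`.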

Content last modified on 24 July 2023.

Solution, in R

Let’s assume that you already have a linear model. We construct an example one here from some fabricated data.

# Make the linear model
x <- c(34, 9, 78, 60, 22, 45, 83, 59, 25)
y <- c(126, 347, 298, 309, 450, 187, 266, 385, 400)
model <- lm(y ~ x)

Construct a data frame containing just one entry, the value of the independent variable for which you want to compute the confidence interval. That data frame can then be passed to R’s predict function to get a confidence interval for the expected value of $y$.

# Use your chosen value of x below:
data <- data.frame(x=40)
# Compute the confidence interval for the expected value of y:
predict(model, data, interval="confidence", level=0.95) # or choose a different confidence level; here we use 0.95

  fit      lwr     upr     
1 313.7217 226.648 400.7954

Our 95% confidence interval is $[226.648, 400.7954]$. We can be 95% confident that the true average value of $y$, given that $x$ is 40, is between 226.648 and 400.7954.
