How to predict the response variable in a linear model
Description
If we have a linear model and a value for each explanatory variable, how do we predict the corresponding value of the response variable?
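In symbols, prediction amounts to evaluating the fitted equation at the new values. The notation below is introduced here for illustration and is not part of the original page: $\hat{\beta}_0$ stands for the fitted constant term, $\hat{\beta}_1, \ldots, \hat{\beta}_k$ for the fitted coefficients of the $k$ explanatory variables, and $\hat{y}$ for the predicted response.

```latex
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k
```

The constant term $\hat{\beta}_0$ can be read as the coefficient of an extra input that always equals 1.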
Using statsmodels, in Python
Assume that you have already built a linear model. The example below uses made-up data, but you can substitute your own. For more information on the following code, see how to fit a multivariate linear model.
import pandas as pd
import statsmodels.api as sm

# Made-up data: three explanatory variables and one response variable
df = pd.DataFrame({
    'x1': [ 2,  7,  4,  3, 11, 18,   6, 15,   9,  12],
    'x2': [ 4,  6, 10,  1, 18, 11,   8, 20,  16,  13],
    'x3': [11, 16, 20,  6, 14,  8,   5, 23,  13,  10],
    'y':  [24, 60, 32, 29, 90, 45, 130, 76, 100, 120]
})

# add_constant() adds a column of ones so the model includes a constant term
model = sm.OLS(df['y'], sm.add_constant(df[['x1', 'x2', 'x3']])).fit()
Let’s say we want to estimate $y$ given that $x_1 = 5$, $x_2 = 12$, and $x_3=50$.
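We can use the model's predict() function as shown below. Because the model was fit with a constant term, the input must include an entry for that term, and that entry must be 1 so that the intercept is added to the prediction unchanged. The entries follow the column order used when fitting: the constant, then $x_1$, $x_2$, and $x_3$.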
model.predict( [ 1, 5, 12, 50 ] )
array([-91.71014402])
For the given values of the explanatory variables, the predicted value of the response variable is $-91.71014402$.
To compute the predicted values for all the data on which the model was trained, call model.predict() with no arguments; it defaults to using the training data.
model.predict()
array([ 47.5701159 , 24.35988296, 42.21531274, 47.27613825,
110.86526185, 70.03097584, 95.12689978, 70.91290879,
106.52986696, 91.11263692])
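To see where these fitted values come from, the sketch below reproduces them without statsmodels. It swaps in NumPy's general least-squares solver, `np.linalg.lstsq`, for the model fit, which is an assumption of this example rather than what statsmodels does internally. The names `X`, `beta`, `fitted`, and `residuals` are illustrative.

```python
import numpy as np

# Same made-up data as above
x1 = np.array([2, 7, 4, 3, 11, 18, 6, 15, 9, 12], dtype=float)
x2 = np.array([4, 6, 10, 1, 18, 11, 8, 20, 16, 13], dtype=float)
x3 = np.array([11, 16, 20, 6, 14, 8, 5, 23, 13, 10], dtype=float)
y = np.array([24, 60, 32, 29, 90, 45, 130, 76, 100, 120], dtype=float)

# Design matrix: a column of ones for the constant term, then x1, x2, x3
X = np.column_stack([np.ones_like(y), x1, x2, x3])

# Least-squares coefficients: intercept first, then one per variable
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predictions on the training data are the design matrix times the coefficients
fitted = X @ beta
residuals = y - fitted

print(np.round(fitted, 5))
print(np.isclose(residuals.sum(), 0.0, atol=1e-8))
```

The printed values should match the array returned by model.predict() up to rounding. Because the model includes a constant term, the residuals sum to zero up to floating-point error.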
Content last modified on 24 July 2023.
Solution, in R
Assume that you have already built a linear model. The example below uses made-up data, but you can substitute your own. For more information on the following code, see how to fit a multivariate linear model.
x1 <- c( 2, 7, 4, 3, 11, 18, 6, 15, 9, 12)
x2 <- c( 4, 6, 10, 1, 18, 11, 8, 20, 16, 13)
x3 <- c(11, 16, 20, 6, 14, 8, 5, 23, 13, 10)
y <- c(24, 60, 32, 29, 90, 45, 130, 76, 100, 120)
model <- lm(y ~ x1 + x2 + x3)
Let’s say we want to estimate $y$ given that $x_1 = 5$, $x_2 = 12$, and $x_3=50$.
We can use R's predict() function as shown below.
predict(model, newdata = data.frame(x1 = 5, x2 = 12, x3 = 50))
1
-91.71014
For the given values of the explanatory variables, the predicted value of the response variable is $-91.71014$.
To compute the predicted values for all the data on which the model was trained, call predict(model) with no new data; it defaults to using the training data.
predict(model)
1 2 3 4 5 6 7 8
47.57012 24.35988 42.21531 47.27614 110.86526 70.03098 95.12690 70.91291
9 10
106.52987 91.11264