# How to predict the response variable in a linear model

## Description

If we have a linear model and a value for each explanatory variable, how do we predict the corresponding value of the response variable?
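In general, once the coefficients $\hat{\beta}_0, \dots, \hat{\beta}_k$ have been estimated, a prediction plugs the new values of the explanatory variables into the fitted equation. The notation below is the usual one for ordinary least squares, not something taken from either library's documentation:

$$
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k
= \begin{bmatrix} 1 & x_1 & x_2 & \cdots & x_k \end{bmatrix}
\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}
$$

The leading 1 in the row vector is what multiplies the intercept $\hat{\beta}_0$.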

## Using statsmodels, in Python

Let’s assume that you’ve already built a linear model. We do an example below with fake data, but you can use your own actual data. For more information on the following code, see how to fit a multivariate linear model.

```python
import pandas as pd
import statsmodels.api as sm

# Fake data: three explanatory variables (x1, x2, x3) and a response (y)
df = pd.DataFrame( {
    'x1' : [ 2,  7,  4,  3, 11, 18,   6, 15,   9,  12],
    'x2' : [ 4,  6, 10,  1, 18, 11,   8, 20,  16,  13],
    'x3' : [11, 16, 20,  6, 14,  8,   5, 23,  13,  10],
    'y'  : [24, 60, 32, 29, 90, 45, 130, 76, 100, 120]
} )

# Fit y = b0 + b1*x1 + b2*x2 + b3*x3; add_constant() adds a column of ones
model = sm.OLS( df['y'], sm.add_constant( df[['x1','x2','x3']] ) ).fit()
```

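Let’s say we want to estimate $y$ given that $x_1 = 5$, $x_2 = 12$, and $x_3 = 50$.
We can use the model’s `predict()` function as shown below, but we must add an
entry for the constant term in the model. That entry must be 1, because it is
multiplied by the intercept coefficient, just like the column of ones that
`add_constant()` created for the training data.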

```python
model.predict( [ 1, 5, 12, 50 ] )
```

```
array([-91.71014402])
```

For the given values of the explanatory variables, the predicted value of the response variable is $-91.71014402$.

Note that if you want to compute the predicted values for all the data
on which the model was trained, simply call `model.predict()` with no arguments,
and it defaults to using the training data.

```python
model.predict()
```

```
array([ 47.5701159 ,  24.35988296,  42.21531274,  47.27613825,
       110.86526185,  70.03097584,  95.12689978,  70.91290879,
       106.52986696,  91.11263692])
```

Content last modified on 24 July 2023.


## Solution, in R

Let’s assume that you’ve already built a linear model. We do an example below with fake data, but you can use your own actual data. For more information on the following code, see how to fit a multivariate linear model.

```r
x1 <- c( 2,  7,  4,  3, 11, 18,   6, 15,   9,  12)
x2 <- c( 4,  6, 10,  1, 18, 11,   8, 20,  16,  13)
x3 <- c(11, 16, 20,  6, 14,  8,   5, 23,  13,  10)
y  <- c(24, 60, 32, 29, 90, 45, 130, 76, 100, 120)
model <- lm(y ~ x1 + x2 + x3)
```

Let’s say we want to estimate $y$ given that $x_1 = 5$, $x_2 = 12$, and $x_3 = 50$.
We can use R’s `predict()` function as shown below.

```r
predict(model, newdata = data.frame(x1 = 5, x2 = 12, x3 = 50))
```

```
        1
-91.71014
```

For the given values of the explanatory variables, the predicted value of the response variable is $-91.71014$.

Note that if you want to compute the predicted values for all the data
on which the model was trained, simply call `predict(model)` with no new data,
and it defaults to using the training data.

```r
predict(model)
```

```
        1         2         3         4         5         6         7         8
 47.57012  24.35988  42.21531  47.27614 110.86526  70.03098  95.12690  70.91291
        9        10
106.52987  91.11264
```


## Opportunities

This website does not yet contain a solution for this task in any of the following software packages.

- Excel
- Julia

If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.