# How to predict the response variable in a linear model (in Python, using statsmodels)

## Task

If we have a linear model and a value for each explanatory variable, how do we predict the corresponding value of the response variable?

## Solution

Let’s assume that you’ve already built a linear model. The example below uses fake data, but you can substitute your own. For more information on the following code, see how to fit a multivariate linear model.

import pandas as pd
import statsmodels.api as sm
df = pd.DataFrame({
    'x1': [ 2,  7,  4,  3, 11, 18,   6, 15,   9,  12],
    'x2': [ 4,  6, 10,  1, 18, 11,   8, 20,  16,  13],
    'x3': [11, 16, 20,  6, 14,  8,   5, 23,  13,  10],
    'y':  [24, 60, 32, 29, 90, 45, 130, 76, 100, 120]
})
model = sm.OLS(df['y'], sm.add_constant(df[['x1', 'x2', 'x3']])).fit()

Let’s say we want to estimate $y$ given that $x_1 = 5$, $x_2 = 12$, and $x_3 = 50$. We can use the model’s `predict()` function as shown below, but we must add an entry for the constant term in the model. That entry must be 1, matching the column of ones that `sm.add_constant()` added when the model was fit, so that the intercept is included unchanged.

model.predict( [ 1, 5, 12, 50 ] )

array([-91.71014402])

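For the given values of the explanatory variables, the predicted value of the response variable is approximately $-91.71$. Note that $x_3 = 50$ lies well outside the range of $x_3$ in the training data (5 to 23), so this prediction is an extrapolation and should be treated with caution.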

Note that if you want to compute the predicted values for all the data on which the model was trained, simply call `model.predict()` with no arguments, and it defaults to using the training data.

model.predict()

array([ 47.5701159 , 24.35988296, 42.21531274, 47.27613825,
110.86526185, 70.03097584, 95.12689978, 70.91290879,
106.52986696, 91.11263692])

Content last modified on 24 July 2023.

Contributed by Nathan Carter (ncarter@bentley.edu)