# How to compute the residuals of a linear model (in Python, using statsmodels)

## Task

If a model has been fit to a dataset, the *residuals* are the differences
between the actual data points and the results the model would predict.
Given a linear model and a dataset, how can we compute those residuals?

## Solution

Let’s assume that you’ve already built a linear model similar to the one below. This one uses a small amount of fake data, but it’s just an example.

```python
import statsmodels.api as sm

xs = [393, 453, 553, 679, 729, 748, 817]
ys = [24, 25, 27, 36, 55, 68, 84]
xs = sm.add_constant(xs)      # add an intercept column of ones
reg = sm.OLS(ys, xs).fit()    # fit an ordinary least squares model
```

We can extract the residuals by reading the fitted model’s `resid` attribute.

```python
reg.resid
```

```
array([  9.16263041,   2.19945659,  -9.07249979, -16.79516483,
        -4.43114302,   6.04718527,  12.88953537])
```

The result is an array containing one residual for each observation in the data set, in the same order as the input data.

Content last modified on 24 July 2023.

Contributed by Andrew Quagliaroli (aquagliaroli@falcon.bentley.edu)