How to compute the residuals of a linear model (in Python, using statsmodels)

Task

When a model has been fit to a dataset, the residuals are the differences between the observed values of the response variable and the values the model predicts for them. Given a linear model and a dataset, how can we compute those residuals?
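As a minimal illustration of the definition, the sketch below subtracts predicted values from observed values with NumPy. The numbers are hypothetical and are not produced by any fitted model. They only show that each residual is "observed minus predicted", computed element by element.

```python
import numpy as np

# Hypothetical observed responses and hypothetical model predictions
# for the same three inputs (illustration only).
observed = np.array([3.0, 5.0, 7.5])
predicted = np.array([2.5, 5.5, 7.0])

# Residual = observed value minus predicted value, one per data point.
residuals = observed - predicted
print(residuals)  # holds 0.5, -0.5, 0.5
```

A positive residual means the model underestimated that point, and a negative residual means it overestimated it.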

Solution

Let’s assume that you’ve already built a linear model like the one below. It uses a small amount of fake data purely as an example.

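import statsmodels.api as sm

# Small fake dataset: one predictor (xs) and one response (ys)
xs = [ 393, 453, 553, 679, 729, 748, 817 ]
ys = [  24,  25,  27,  36,  55,  68,  84 ]

xs = sm.add_constant( xs )     # add a column of 1s so the model has an intercept
reg = sm.OLS( ys, xs ).fit()   # fit an ordinary least squares regression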

We can get the residuals from the resid attribute of the fitted results object (reg in the code above). It is an attribute, so no parentheses are needed.

reg.resid

array([  9.16263041,   2.19945659,  -9.07249979, -16.79516483,
        -4.43114302,   6.04718527,  12.88953537])

The result is an array with one residual (observed value minus fitted value) for each observation in the dataset, in the same order as the data.

Content last modified on 24 July 2023.


Contributed by Andrew Quagliaroli (aquagliaroli@falcon.bentley.edu)