How to compute the residuals of a linear model (in Python, using statsmodels)
Task
If a model has been fit to a dataset, the residuals are the differences between the actual data points and the values the model predicts for them: each residual is the observed value minus the predicted value. Given a linear model and a dataset, how can we compute those residuals?
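In symbols, for a simple linear model with one predictor, the residual for the i-th of n observations is the observed value minus the fitted value. The notation here (e, y, y-hat, beta) is the usual regression notation, not something defined elsewhere in this article:

```latex
e_i = y_i - \hat{y}_i = y_i - \left( \hat{\beta}_0 + \hat{\beta}_1 x_i \right), \qquad i = 1, \dots, n
```

A positive residual means the model under-predicted that data point; a negative residual means it over-predicted it.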
Solution
Let’s assume that you’ve already built a linear model similar to the one below. It uses a small amount of made-up data, purely for illustration.
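import statsmodels.api as sm
# Made-up example data: predictor values (xs) and observed responses (ys)
xs = [ 393, 453, 553, 679, 729, 748, 817 ]
ys = [ 24, 25, 27, 36, 55, 68, 84 ]
# Prepend a column of ones so the model includes an intercept term
xs = sm.add_constant( xs )
# Fit an ordinary least squares regression of ys on xs
reg = sm.OLS( ys, xs ).fit()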
We can extract the residuals by accessing the resid attribute of the fitted results object (here, reg).
reg.resid
array([ 9.16263041, 2.19945659, -9.07249979, -16.79516483,
-4.43114302, 6.04718527, 12.88953537])
The result is a NumPy array containing one residual for each observation in the dataset, in the same order as the data points.
Content last modified on 24 July 2023.
Contributed by Andrew Quagliaroli (aquagliaroli@falcon.bentley.edu)