# How to compute the residuals of a linear model

## Description

If a model has been fit to a dataset, the residuals are the differences between the actual values of the response variable and the values the model predicts for those same data points (observed minus predicted). Given a linear model and a dataset, how can we compute those residuals?
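To make the definition concrete, here is a minimal, library-free sketch. It fits an ordinary least-squares line to the same small example dataset used in the solutions below, then subtracts each prediction from the observed value. The helper names `fit_line` and `residuals` are hypothetical and exist only for illustration; in practice you would let a library do this, as the solutions below do.

```python
# A minimal from-scratch sketch (no libraries): fit a least-squares line
# y = intercept + slope * x, then compute residual = observed - predicted.
xs = [393, 453, 553, 679, 729, 748, 817]
ys = [24, 25, 27, 36, 55, 68, 84]

def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance-style sum over variance-style sum
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed value minus the model's prediction, one per data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print([round(r, 4) for r in res])
```

Because the line includes an intercept, these residuals should agree with the ones statsmodels and R report in the sections below, and they sum to zero up to floating-point error.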

## Using statsmodels, in Python


Let’s assume that you’ve already built a linear model similar to the one below. This one uses a small amount of fake data, but it’s just an example.

```python
import statsmodels.api as sm

xs = [ 393, 453, 553, 679, 729, 748, 817 ]
ys = [  24,  25,  27,  36,  55,  68,  84 ]

# OLS does not add an intercept automatically, so add a column of ones to xs
reg = sm.OLS( ys, sm.add_constant( xs ) ).fit()
```

We can extract the residuals of the model from the `resid` attribute of the fitted results object.

```python
reg.resid
```

```
array([  9.16263041,   2.19945659,  -9.07249979, -16.79516483,
        -4.43114302,   6.04718527,  12.88953537])
```


The result is an array containing one residual for each point in the dataset, in the same order as the data.
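If you want to check where those numbers come from, or see why the intercept matters, the sketch below recomputes the residuals with plain NumPy least squares (`np.linalg.lstsq`) instead of statsmodels. statsmodels' `OLS` fits exactly the columns it is given and does not add an intercept on its own, so the sketch builds one design matrix with a column of ones (the role `sm.add_constant` plays) and one without, then compares the two sets of residuals. The helper `ols_residuals` is a hypothetical name used only here.

```python
import numpy as np

xs = np.array([393, 453, 553, 679, 729, 748, 817], dtype=float)
ys = np.array([24, 25, 27, 36, 55, 68, 84], dtype=float)

# Design matrix WITH an intercept column (a column of ones, then xs)
X_const = np.column_stack([np.ones_like(xs), xs])
# Design matrix WITHOUT an intercept (a single column of xs)
X_plain = xs.reshape(-1, 1)

def ols_residuals(X, y):
    """Solve least squares for the coefficients, then return observed - fitted."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

res_const = ols_residuals(X_const, ys)
res_plain = ols_residuals(X_plain, ys)

print(np.round(res_const, 4))
print(np.round(res_plain, 4))
```

With the intercept column, the residuals should match the array above and sum to (numerically) zero; without it, the fitted line is forced through the origin and the residuals come out quite different.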


## Solution, in R


Let’s assume that you’ve already built a linear model similar to the one below. This one uses a small amount of fake data, but it’s just an example. See also how to fit a linear model to two columns of data.

```r
xs <- c( 393, 453, 553, 679, 729, 748, 817 )
ys <- c(  24,  25,  27,  36,  55,  68,  84 )
model <- lm(ys ~ xs)
```


We can extract the residuals of the model in either of two ways.

R has a built-in residuals() function for this purpose.

```r
residuals(model)
```

```
         1          2          3          4          5          6          7 
  9.162630   2.199457  -9.072500 -16.795165  -4.431143   6.047185  12.889535 
```


The model object itself also stores the residuals as its `residuals` component, which we can access with `$`.

```r
model$residuals
```

```
         1          2          3          4          5          6          7 
  9.162630   2.199457  -9.072500 -16.795165  -4.431143   6.047185  12.889535 
```