How to compute the standard error of the estimate for a model (in Python, using statsmodels)

Task

One measure of the goodness of fit of a model is the standard error of its estimates. If the actual values are $y_{i}$ and the estimates are ${\hat{y}}_{i}$, then for $n$ data points this quantity is defined as follows.

$$\sigma_{est} = \sqrt{\frac{\sum (y_{i} - {\hat{y}}_{i})^{2}}{n}}$$

If we’ve fit a linear model, how do we compute the standard error of its estimates?

Solution

Let’s assume that you have already fit the linear model, as shown in the code below. This example uses a small amount of fake data. See also how to fit a linear model to two columns of data.

# Below is the fake data as an example. You can replace with your real data.
x = [  34,   9,  78,  60,  22,  45,  83,  59,  25 ]
y = [ 126, 347, 298, 309, 450, 187, 266, 385, 400 ]

# Use statsmodels to build a linear regression model
import statsmodels.api as sm
x = sm.add_constant( x )
model = sm.OLS( y, x ).fit()

The standard error of each coefficient estimate is shown as part of the model summary, reported by statsmodels’s built-in summary function. See the column entitled “std err” in the output below.

model.summary()

/opt/conda/lib/python3.10/site-packages/scipy/stats/_stats_py.py:1736: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=9
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "

OLS Regression Results
Dep. Variable:	y	R-squared:	0.063
Model:	OLS	Adj. R-squared:	-0.071
Method:	Least Squares	F-statistic:	0.4693
Date:	Mon, 24 Jul 2023	Prob (F-statistic):	0.515
Time:	20:38:01	Log-Likelihood:	-53.705
No. Observations:	9	AIC:	111.4
Df Residuals:	7	BIC:	111.8
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
const	354.0822	76.733	4.614	0.002	172.638	535.526
x1	-1.0090	1.473	-0.685	0.515	-4.492	2.474

Omnibus:	2.324	Durbin-Watson:	1.618
Prob(Omnibus):	0.313	Jarque-Bera (JB):	1.079
Skew:	-0.832	Prob(JB):	0.583
Kurtosis:	2.674	Cond. No.	112.

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

If we need to extract just the estimates or their standard errors, we can use code like the following.

model.params # just the model coefficients

array([354.0822479 ,  -1.00901261])

model.bse # just the standard errors of those estimates

array([76.73277161,  1.47293931])

The standard error of the estimate for the intercept is 76.73277161 and the standard error of the estimate for the slope is 1.47293931.

Content last modified on 24 July 2023.


Contributed by:

Ni Shi (shi_ni@bentley.edu)
Nathan Carter (ncarter@bentley.edu)