How to compute adjusted R-squared (in Python, using statsmodels)
Task
If we have fit a multivariate linear model, how can we compute the Adjusted $R^2$ for that model, to measure its goodness of fit?
Solution
We assume you have already fit a multivariate linear model to some data, as in the code below. (If you’re unfamiliar with how to do so, see how to fit a multivariate linear model.) The data shown below is fake, and we assume you will replace it with your own real data if you use this code.
import pandas as pd
import statsmodels.api as sm
df = pd.DataFrame( {
'x1':[2, 7, 4, 3, 11, 18, 6, 15, 9, 12],
'x2':[4, 6, 10, 1, 18, 11, 8, 20, 16, 13],
'x3':[11, 16, 20, 6, 14, 8, 5, 23, 13, 10],
'y':[24, 60, 32, 29, 90, 45, 130, 76, 100, 120]
} )
xs = df[['x1', 'x2', 'x3']]
y = df['y']
xs = sm.add_constant(xs)
model = sm.OLS(y, xs).fit()
You can get a lot of information about your model from its summary.
model.summary()
/opt/conda/lib/python3.11/site-packages/scipy/stats/_stats_py.py:1806: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "
Dep. Variable: | y | R-squared: | 0.594 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.390 |
Method: | Least Squares | F-statistic: | 2.921 |
Date: | Mon, 24 Jul 2023 | Prob (F-statistic): | 0.122 |
Time: | 17:47:21 | Log-Likelihood: | -45.689 |
No. Observations: | 10 | AIC: | 99.38 |
Df Residuals: | 6 | BIC: | 100.6 |
Df Model: | 3 | | |
Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
const | 77.2443 | 27.366 | 2.823 | 0.030 | 10.282 | 144.206 |
x1 | -2.7009 | 2.855 | -0.946 | 0.381 | -9.686 | 4.284 |
x2 | 7.2989 | 2.875 | 2.539 | 0.044 | 0.265 | 14.333 |
x3 | -4.8607 | 2.187 | -2.223 | 0.068 | -10.211 | 0.490 |

Omnibus: | 2.691 | Durbin-Watson: | 2.123 |
---|---|---|---|
Prob(Omnibus): | 0.260 | Jarque-Bera (JB): | 1.251 |
Skew: | 0.524 | Prob(JB): | 0.535 |
Kurtosis: | 1.620 | Cond. No. | 58.2 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In particular, that printout contains the Adjusted $R^2$ value, labeled "Adj. R-squared"; it is the second entry in the right-hand column of the top table ($0.390$ here).
You can also obtain it directly, as follows:
model.rsquared_adj
0.390392407508503
In this case, the Adjusted $R^2$ is approximately $0.3904$.
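As a cross-check, adjusted $R^2$ is conventionally defined as $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ is the number of predictors, not counting the constant. The sketch below is not part of the original solution: the helper name `adjusted_r_squared` is hypothetical, and it plugs in the rounded values from the summary above ($R^2 = 0.594$, $n = 10$, $p = 3$), so it should agree with `model.rsquared_adj` only to about three decimal places.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 for a linear model that includes an intercept.

    Hypothetical helper for illustration only.
    n_predictors counts the explanatory variables, excluding the constant.
    """
    residual_df = n_obs - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1 - (1 - r_squared) * (n_obs - 1) / residual_df


# Rounded values read off the summary table: R-squared, No. Observations, Df Model
approx = adjusted_r_squared(0.594, 10, 3)
print(round(approx, 3))
```

This should print a value of about 0.391; the small gap from $0.3904$ comes from starting with $R^2$ rounded to three decimals. With a fitted model in hand, the same inputs are available unrounded as `model.rsquared`, `model.nobs`, and `model.df_model`.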
Content last modified on 24 July 2023.
Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)