# How to do a test of joint significance (in Python, using Statsmodels)


If we have a multivariate linear model, how do we test the joint significance of all the variables in the model? In other words, how do we test the overall significance of the regression model?

## Solution

Let’s assume that you already made your multivariate linear model, similar to the one shown below. If you still need to create one, first see how to fit a multivariate linear model.

We use example data here, but you would use your own data instead.

```python
import pandas as pd
import statsmodels.api as sm

data = {
    'x1' : [ 2,  7,  4,  3, 11, 18,   6, 15,   9,  12],
    'x2' : [ 4,  6, 10,  1, 18, 11,   8, 20,  16,  13],
    'x3' : [11, 16, 20,  6, 14,  8,   5, 23,  13,  10],
    'y'  : [24, 60, 32, 29, 90, 45, 130, 76, 100, 120]
}
```


The following code fits the model to the data.

```python
df = pd.DataFrame(data)
# add_constant includes an intercept term, which sm.OLS does not add on its own
xs = sm.add_constant(df[['x1', 'x2', 'x3']])
y = df['y']
model = sm.OLS(y, xs).fit()
```
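As an aside, statsmodels' formula interface fits the same model and includes the intercept automatically, so no `sm.add_constant` call is needed. A minimal equivalent sketch:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same example data as above
df = pd.DataFrame({
    'x1': [2, 7, 4, 3, 11, 18, 6, 15, 9, 12],
    'x2': [4, 6, 10, 1, 18, 11, 8, 20, 16, 13],
    'x3': [11, 16, 20, 6, 14, 8, 5, 23, 13, 10],
    'y':  [24, 60, 32, 29, 90, 45, 130, 76, 100, 120],
})

# The formula interface adds an intercept by default
model = smf.ols('y ~ x1 + x2 + x3', data=df).fit()
print(model.params.index.tolist())
```

The fitted coefficients are labeled `Intercept`, `x1`, `x2`, and `x3`, matching the four rows in the summary table below.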


Now we want to test whether the model is significant. We will use a null hypothesis stating that all of the model's slope coefficients are equal to zero, that is, that the predictors are not jointly significant in predicting $y$. We can write $H_0: \beta_1 = \beta_2 = \beta_3 = 0$. (The intercept $\beta_0$ is not part of this test.)

We also choose a value $0 \le \alpha \le 1$ as our Type I error rate. Here we’ll use $\alpha=0.05$.
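Equivalently, instead of comparing the $p$-value to $\alpha$, you could compare the F-statistic to the critical value of the $F$ distribution with 3 and 6 degrees of freedom (3 slope coefficients, and $10 - 4 = 6$ residual degrees of freedom). A quick sketch using SciPy, not needed for the rest of the solution:

```python
from scipy.stats import f

alpha = 0.05
df_model, df_resid = 3, 6  # 3 slopes; 10 observations minus 4 estimated coefficients

# H0 is rejected when the observed F-statistic exceeds this critical value
f_crit = f.ppf(1 - alpha, dfn=df_model, dfd=df_resid)
print(f_crit)  # roughly 4.76
```

Because the F-statistic we find below (2.921) is smaller than this critical value, both decision rules lead to the same conclusion.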

The summary output for the model will give us both the F-statistic and the p-value.

```python
model.summary()
```


```
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.594
Model:                            OLS   Adj. R-squared:                  0.390
Method:                 Least Squares   F-statistic:                     2.921
Date:                Mon, 24 Jul 2023   Prob (F-statistic):              0.122
Time:                        20:42:37   Log-Likelihood:                -45.689
No. Observations:                  10   AIC:                             99.38
Df Residuals:                       6   BIC:                             100.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         77.2443     27.366      2.823      0.030      10.282     144.206
x1            -2.7009      2.855     -0.946      0.381      -9.686       4.284
x2             7.2989      2.875      2.539      0.044       0.265      14.333
x3            -4.8607      2.187     -2.223      0.068     -10.211       0.490
==============================================================================
Omnibus:                         2.691   Durbin-Watson:                  2.123
Prob(Omnibus):                    0.26   Jarque-Bera (JB):               1.251
Skew:                            0.524   Prob(JB):                       0.535
Kurtosis:                         1.62   Cond. No.                        58.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```

Near the top right of the output, we can see that the F-statistic is 2.921. The corresponding $p$-value immediately below it (labeled Prob (F-statistic)) is 0.122, which is greater than $\alpha=0.05$, so we do not have sufficient evidence to reject the null hypothesis.

We cannot conclude that the independent variables in our model are jointly significant in predicting the response variable.