
How to do a test of joint significance

Description

If we have a multivariate linear model, how do we test the joint significance of all the variables in the model? In other words, how do we test the overall significance of the regression model?

Using statsmodels, in Python


Let’s assume that you have already fit a multivariate linear model similar to the one shown below. If you still need to create one, first see how to fit a multivariate linear model.

We use example data here, but you would use your own data instead.

import pandas as pd
import statsmodels.api as sm
data = {
    'x1' : [ 2,  7,  4,  3, 11, 18,   6, 15,   9,  12],
    'x2' : [ 4,  6, 10,  1, 18, 11,   8, 20,  16,  13],
    'x3' : [11, 16, 20,  6, 14,  8,   5, 23,  13,  10],
    'y' :  [24, 60, 32, 29, 90, 45, 130, 76, 100, 120]
}

The following code fits the model to the data.

df = pd.DataFrame(data)
xs = df[['x1', 'x2', 'x3']]
y = df['y']
xs = sm.add_constant(xs)
model = sm.OLS(y, xs).fit()

Now we want to test whether the model is significant. We will use a null hypothesis that states that all of the model’s slope coefficients (every coefficient except the intercept) are equal to zero, that is, the predictors are not jointly significant in predicting $y$. We can write $H_0: \beta_1 = \beta_2 = \beta_3 = 0$.
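For reference, the overall F-test compares the fitted model with an intercept-only model. Write $n$ for the number of observations, $k$ for the number of predictors, RSS for the residual sum of squares of the fitted model, and TSS for the total sum of squares of $y$ around its mean. A standard form of the test statistic is then

$$
F = \frac{(\mathrm{TSS} - \mathrm{RSS})/k}{\mathrm{RSS}/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)},
$$

where the second form follows from $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$. When $H_0$ is true and the usual OLS assumptions hold (independent, normally distributed errors with constant variance), $F$ follows an $F_{k,\,n-k-1}$ distribution. In this example $n = 10$ and $k = 3$, so the reference distribution is $F_{3,6}$.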

We also choose a significance level $\alpha$, with $0 < \alpha < 1$, as our Type I error rate. Here we’ll use $\alpha = 0.05$.
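Choosing $\alpha$ fixes the rejection threshold before we look at the data. The sketch below assumes SciPy is available. For this example it hard-codes $k = 3$ slope coefficients and $n - k - 1 = 10 - 3 - 1 = 6$ residual degrees of freedom, then computes the critical value of the F distribution. An observed F-statistic above this value leads to rejecting $H_0$.

```python
from scipy import stats

alpha = 0.05
# Degrees of freedom for this example (assumed): 3 tested slopes, 10 - 3 - 1 = 6 residual df
k, df_resid = 3, 6

# Upper-alpha quantile of F(k, df_resid): reject H0 if the observed F exceeds it
f_crit = stats.f.ppf(1 - alpha, k, df_resid)
print(f"Critical value of F({k}, {df_resid}) at alpha = {alpha}: {f_crit:.3f}")
```

Comparing the F-statistic with this critical value gives the same decision as comparing the $p$-value with $\alpha$. The two rules are equivalent ways of stating the test.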

The summary output for the model will give us both the F-statistic and the p-value.

model.summary()
/opt/conda/lib/python3.10/site-packages/scipy/stats/_stats_py.py:1736: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  warnings.warn("kurtosistest only valid for n>=20 ... continuing "
                            OLS Regression Results

Dep. Variable:                      y   R-squared:                       0.594
Model:                            OLS   Adj. R-squared:                  0.390
Method:                 Least Squares   F-statistic:                     2.921
Date:                Mon, 24 Jul 2023   Prob (F-statistic):              0.122
Time:                        20:42:37   Log-Likelihood:                -45.689
No. Observations:                  10   AIC:                             99.38
Df Residuals:                       6   BIC:                             100.6
Df Model:                           3
Covariance Type:            nonrobust

               coef    std err          t      P>|t|     [0.025     0.975]
const       77.2443     27.366      2.823      0.030     10.282    144.206
x1          -2.7009      2.855     -0.946      0.381     -9.686      4.284
x2           7.2989      2.875      2.539      0.044      0.265     14.333
x3          -4.8607      2.187     -2.223      0.068    -10.211      0.490

Omnibus:                        2.691   Durbin-Watson:                   2.123
Prob(Omnibus):                  0.260   Jarque-Bera (JB):                1.251
Skew:                           0.524   Prob(JB):                        0.535
Kurtosis:                       1.620   Cond. No.                         58.2

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Near the top right of the output, we can see that the F-statistic is 2.921. The corresponding $p$-value, labeled Prob (F-statistic) and shown immediately below it, is 0.122. This is greater than $\alpha = 0.05$, so we do not have sufficient evidence to reject the null hypothesis.

We cannot conclude that the independent variables in our model are jointly significant in predicting the response variable.
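The sketch below shows where these numbers come from by recomputing the overall F-test from scratch with NumPy and SciPy. It is not statsmodels’ own implementation. It fits the full model by least squares and treats the intercept-only model as the restricted model under $H_0$, whose residual sum of squares is the total sum of squares of $y$ around its mean. It then refers the statistic to an $F(k, n - k - 1)$ distribution.

```python
import numpy as np
from scipy import stats

# Same example data as above
x1 = np.array([ 2,  7,  4,  3, 11, 18,   6, 15,   9,  12], dtype=float)
x2 = np.array([ 4,  6, 10,  1, 18, 11,   8, 20,  16,  13], dtype=float)
x3 = np.array([11, 16, 20,  6, 14,  8,   5, 23,  13,  10], dtype=float)
y  = np.array([24, 60, 32, 29, 90, 45, 130, 76, 100, 120], dtype=float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(y), x1, x2, x3])
n, p = X.shape   # n observations, p parameters including the intercept
k = p - 1        # number of slope coefficients tested under H0

# Full (unrestricted) model: least-squares coefficients and residual sum of squares
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)

# Restricted model under H0 (intercept only): its RSS equals the total sum of squares
tss = np.sum((y - y.mean()) ** 2)

# F-statistic and its upper-tail p-value from the F(k, n - k - 1) distribution
f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)
print(f"F = {f_stat:.3f} on {k} and {n - k - 1} DF, p = {p_value:.4f}")
```

The restricted-versus-full comparison is what makes this a *joint* test. It asks whether adding all three predictors together reduces the residual sum of squares by more than chance alone would explain.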

Content last modified on 24 July 2023.


Solution, in R



Let’s assume that you have already fit a multivariate linear model similar to the one shown below. If you still need to create one, first see how to fit a multivariate linear model.

We use example data here, but you would use your own data instead.

x1 <- c( 2,  7,  4,  3, 11, 18,   6, 15,   9,  12)
x2 <- c( 4,  6, 10,  1, 18, 11,   8, 20,  16,  13)
x3 <- c(11, 16, 20,  6, 14,  8,   5, 23,  13,  10)
y  <- c(24, 60, 32, 29, 90, 45, 130, 76, 100, 120)
model <- lm(y ~ x1 + x2 + x3)

Now we want to test whether the model is significant. We will use a null hypothesis that states that all of the model’s slope coefficients (every coefficient except the intercept) are equal to zero, that is, the predictors are not jointly significant in predicting $y$. We can write $H_0: \beta_1 = \beta_2 = \beta_3 = 0$.

We also choose a significance level $\alpha$, with $0 < \alpha < 1$, as our Type I error rate. Here we’ll use $\alpha = 0.05$.

The summary output for the model will give us both the F-statistic and the p-value.

summary(model)
Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.031 -20.218  -8.373  22.937  35.640 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   77.244     27.366   2.823   0.0302 *
x1            -2.701      2.855  -0.946   0.3806  
x2             7.299      2.875   2.539   0.0441 *
x3            -4.861      2.187  -2.223   0.0679 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30.13 on 6 degrees of freedom
Multiple R-squared:  0.5936,	Adjusted R-squared:  0.3904 
F-statistic: 2.921 on 3 and 6 DF,  p-value: 0.1222

In the final line of the output, we can see that the F-statistic is 2.921. The corresponding $p$-value in the same line is 0.1222, which is greater than $\alpha$, so we do not have sufficient evidence to reject the null hypothesis.
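As a quick arithmetic check, you can recover the F-statistic from the reported Multiple R-squared. The model has $k = 3$ predictors and $n - k - 1 = 10 - 3 - 1 = 6$ residual degrees of freedom, matching the "3 and 6 DF" in the output:

$$
F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} = \frac{0.5936/3}{(1 - 0.5936)/6} \approx 2.92
$$

This agrees with the value R reports, up to the rounding of $R^2$.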

We cannot conclude that the independent variables in our model are jointly significant in predicting the response variable.




Opportunities

This website does not yet contain a solution for this task in any of the following software packages.

  • Excel
  • Julia

If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.