# How to perform an analysis of covariance (ANCOVA)

## Description

Recall that covariates are variables that may be related to the outcome but are unaffected by treatment assignment. In a randomized experiment with one or more observed covariates, an analysis of covariance (ANCOVA) addresses this question: How would the mean outcome in each treatment group change if all groups were equal with respect to the covariate? The goal is to remove any variability in the outcome associated with the covariate from the unexplained variability used to determine statistical significance.
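Here is a minimal sketch of that question in Python with NumPy. It uses a small hypothetical dataset, not from any real study, in which one group happens to have larger covariate values. The sketch fits the outcome on an intercept, a group indicator and the covariate by least squares. It then compares the raw group means with the adjusted means, which are the model's predicted outcome for each group evaluated at the overall mean of the covariate. All variable names are illustrative.

```python
import numpy as np

# Hypothetical toy data (not from mtcars): group 1 happens to have larger
# covariate values, so its raw mean outcome is inflated by the covariate.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
covar = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
outcome = 2.0 + 3.0 * group + 0.5 * covar  # group effect built into the toy data is 3

# Fit outcome ~ intercept + group + covariate by ordinary least squares.
X = np.column_stack([np.ones_like(covar), group, covar])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
b0, b_group, b_covar = coef

# Raw (unadjusted) group means.
raw_means = [float(outcome[group == g].mean()) for g in (0, 1)]

# Adjusted means: predicted outcome for each group at the grand mean of the covariate.
covar_bar = covar.mean()
adj_means = [float(b0 + b_group * g + b_covar * covar_bar) for g in (0, 1)]

print("raw means:", raw_means)
print("adjusted means:", adj_means)
```

The raw means differ by 5. Once both groups are evaluated at the same covariate value, the difference shrinks to 3, which is the group effect built into the toy data.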

Related tasks:

- How to do a one-way analysis of variance (ANOVA)
- How to compare two nested linear models
- How to conduct a mixed designs ANOVA
- How to conduct a repeated measures ANOVA

## Using pingouin, in Python

The solution below uses an example dataset about car design and fuel consumption from a 1974 Motor Trend magazine. (See how to quickly load some sample data.)

```python
from rdatasets import data
df = data('mtcars')
```

Let’s use ANCOVA to check the effect of the engine type (0 = V-shaped, 1 = straight, in the variable `vs`) on miles per gallon when considering the weight of the car as a covariate. We will use the `ancova` function from the `pingouin` package to conduct the test.

```python
from pingouin import ancova
ancova(data=df, dv='mpg', covar='wt', between='vs')
```

| | Source | SS | DF | F | p-unc | np2 |
|---|---|---|---|---|---|---|
| 0 | vs | 54.228061 | 1 | 7.017656 | 1.292580e-02 | 0.194839 |
| 1 | wt | 405.425409 | 1 | 52.466123 | 5.632548e-08 | 0.644024 |
| 2 | Residual | 224.093877 | 29 | NaN | NaN | NaN |

The $p$-value for each variable is in the `p-unc` column.

The $p$-value for the `wt` variable tests the null hypothesis, “The quantities `wt` and `mpg` are not related if we hold `vs` constant.” Since it is below 0.05, we reject the null hypothesis and conclude that `wt` is significant in predicting `mpg`.

The $p$-value for the `vs` variable tests the null hypothesis, “The quantities `vs` and `mpg` are not related if we hold `wt` constant.” Since it is below 0.05, we reject the null hypothesis and conclude that `vs` is significant in predicting `mpg` even among cars with equal weight (`wt`).
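The `F` and `np2` columns come from the `SS` column. `F` is the term's mean square divided by the residual mean square. The partial eta squared is $SS_\text{term} / (SS_\text{term} + SS_\text{residual})$. The following plain-Python check uses only the numbers printed in the table above. It should reproduce those two columns up to rounding.

```python
# Values copied from the pingouin ANCOVA table above.
ss_terms = {"vs": 54.228061, "wt": 405.425409}
df_term = 1              # each term has 1 degree of freedom
ss_resid = 224.093877
df_resid = 29

ms_resid = ss_resid / df_resid  # residual mean square of the full model

results = {}
for term, ss_term in ss_terms.items():
    f_stat = (ss_term / df_term) / ms_resid   # F = MS_term / MS_residual
    np2 = ss_term / (ss_term + ss_resid)      # partial eta squared
    results[term] = (f_stat, np2)
    print(f"{term}: F = {f_stat:.6f}, np2 = {np2:.6f}")
```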

Note: Unfortunately, a two-factor ANCOVA is not possible in pingouin. However, a model with more than one covariate is possible, as you can provide a list as the `covar` parameter when calling `ancova`.

Content last modified on 24 July 2023.


## Solution, in R

The solution below uses an example dataset about car design and fuel consumption from a 1974 Motor Trend magazine. (See how to quickly load some sample data.)

```r
df <- mtcars
df$vs <- as.factor(df$vs)  # treat engine type as a categorical factor
```

Let’s use ANCOVA to check the effect of the engine type (0 = V-shaped, 1 = straight, in the variable `vs`) on miles per gallon when considering the weight of the car as a covariate. We fit a linear model with `lm` and use `anova` to conduct the test.

```r
cov.model <- lm(mpg ~ wt + vs, data = df)
anova(cov.model)
```

```
          Df    Sum Sq    Mean Sq    F value       Pr(>F)
wt         1 847.72525 847.725250 109.704168 2.284396e-11
vs         1  54.22806  54.228061   7.017656 1.292580e-02
Residuals 29 224.09388   7.727375         NA           NA
```

The $p$-value for each variable can be found in the final column of the output, called `Pr(>F)`.

The $p$-value for the `wt` variable tests the null hypothesis, “The quantities `wt` and `mpg` are not related.” Since it is below 0.05, we reject the null hypothesis and conclude that `wt` is significant in predicting `mpg`.

The $p$-value for the `vs` variable tests the null hypothesis, “The quantities `vs` and `mpg` are not related if we hold `wt` constant.” Since it is below 0.05, we reject the null hypothesis and conclude that `vs` is significant in predicting `mpg` even among cars with equal weight (`wt`).
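The `wt` row here (SS 847.7) differs from the one pingouin prints for the same model (SS 405.4), while the `vs` rows match. R's `anova()` reports sequential (Type I) sums of squares. Because `wt` is entered first, its row is not adjusted for `vs`. Pingouin's table appears to adjust each term for the other, which resembles Type II sums of squares. The two `wt` rows therefore answer slightly different questions. Since `vs` is the last term in the formula, its sequential and adjusted sums of squares coincide.

The sketch below computes both kinds of sums of squares from residual sums of squares of nested OLS fits. It uses hypothetical toy data, not mtcars, and the helper `rss` is illustrative rather than part of either library.

```python
import numpy as np

def rss(y, *predictors):
    """Residual sum of squares of an OLS fit of y on an intercept plus predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Hypothetical toy data: the binary factor g is correlated with the covariate x.
g = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.6, 2.9, 3.5, 4.1, 7.4, 8.1, 8.5, 8.9])

# Sequential (Type I) SS, as R's anova() reports for y ~ x + g:
# each term is credited with the drop in RSS when it is added in order.
type1_x = rss(y) - rss(y, x)
type1_g = rss(y, x) - rss(y, x, g)

# Type II SS: each term is credited with the drop in RSS when it is
# added to a model that already contains the other term.
type2_x = rss(y, g) - rss(y, x, g)
type2_g = rss(y, x) - rss(y, x, g)

print(f"x: Type I = {type1_x:.4f}, Type II = {type2_x:.4f}")
print(f"g: Type I = {type1_g:.4f}, Type II = {type2_g:.4f}")
```

The covariate entered first gets a much larger Type I sum of squares than Type II, because it also absorbs the variation it shares with the group factor. The last term gets the same value either way.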

If we wish to create a two-factor ANCOVA model, we can test whether engine type (`vs`: 0 = V-shaped, 1 = straight) and transmission type (`am`: 0 = automatic, 1 = manual) affect miles per gallon when the weight of the car is included as a covariate.

```r
cov.model.2 <- lm(mpg ~ wt + vs + am, data = df)
anova(cov.model.2)
```

```
          Df     Sum Sq    Mean Sq    F value       Pr(>F)
wt         1 847.725250 847.725250 109.729918 3.420018e-11
vs         1  54.228061  54.228061   7.019303 1.310627e-02
am         1   7.778149   7.778149   1.006807 3.242621e-01
Residuals 28 216.315728   7.725562         NA           NA
```

The $p$-values are again in the final column of the output. They show that, at the 5% significance level, engine type (`vs`) significantly affects miles per gallon while accounting for the weight of the car (`wt`), but transmission type (`am`) does not.
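Each row of a sequential table can be read as a nested-model comparison. For example, the `am` row can be recovered from the residual sums of squares of the two models fitted above. Adding `am` to `mpg ~ wt + vs` lowers the residual SS from 224.09388 to 216.315728, and that drop is `am`'s sum of squares. The following plain-Python check uses only those printed numbers. It is a sketch for illustration, not a replacement for `anova()`.

```python
# Residual sums of squares copied from the two R anova() tables above.
rss_wt_vs = 224.09388       # mpg ~ wt + vs       (29 residual df)
rss_wt_vs_am = 216.315728   # mpg ~ wt + vs + am  (28 residual df)
df_full = 28

# The sequential SS for am is the drop in residual SS when am is added.
ss_am = rss_wt_vs - rss_wt_vs_am

# F statistic for am: 1 numerator df (one extra coefficient),
# divided by the residual mean square of the larger model.
f_am = (ss_am / 1) / (rss_wt_vs_am / df_full)
print(f"SS(am) = {ss_am:.6f}, F = {f_am:.6f}")
```

The result matches the `am` row up to rounding in the printed values.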



## Opportunities

This website does not yet contain a solution for this task in any of the following software packages.

- Excel
- Julia

If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.