# How to do a one-way analysis of variance (ANOVA) (in R)

If we have multiple independent samples of the same quantity (such as students’ SAT scores from several different schools), we may want to test whether the groups all have the same mean. Analysis of variance (ANOVA) can determine whether at least one of the group means differs significantly from the others. How can we do an ANOVA?

## Solution

R expects you to have all the samples in one vector, and the groups they came from in a separate, categorical vector. So, for example, if we had SAT scores from four different schools (named A, B, C, and D), then our data might be arranged like this.

SAT.scores <- c(
1100, 1250, 1390, 970, 1510, 1010, 1050, 1090, 1110,
900, 1550, 1300, 1270, 1210, 900, 850, 1110, 1070, 910, 920
)
school.names <- c(
'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'D'
)
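
Before running the test, it can help to confirm that the two vectors line up as intended. The following is a minimal sketch, not part of the original solution: it combines the vectors into a data frame and summarizes each school. The names `sat.data`, `group.sizes`, and `group.means` are hypothetical and introduced only for illustration.

```r
# Same data as above, repeated so this block runs on its own
SAT.scores <- c(
    1100, 1250, 1390, 970, 1510, 1010, 1050, 1090, 1110,
    900, 1550, 1300, 1270, 1210, 900, 850, 1110, 1070, 910, 920
)
school.names <- c(
    'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
    'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'D'
)

# Hypothetical names, used only for this illustration
sat.data <- data.frame( score = SAT.scores, school = factor( school.names ) )

# Number of observations per school
group.sizes <- table( sat.data$school )
# Mean score per school
group.means <- tapply( sat.data$score, sat.data$school, mean )

print( group.sizes )
print( group.means )
```

If the group sizes are not what you expect, the two vectors are probably misaligned, and the ANOVA result would be meaningless.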


ANOVA tests the null hypothesis $H_0$ that all group means are equal. You choose $\alpha$, the probability of a Type I error (a false positive: rejecting $H_0$ when it is actually true). We will use $\alpha=0.05$ in this example.

# Run a one-way ANOVA and print a summary of all the output
result <- aov( SAT.scores ~ school.names )
summary( result )

             Df Sum Sq Mean Sq F value Pr(>F)
school.names  3 321715  107238   3.689 0.0342 *
Residuals    16 465140   29071
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
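
To see where the entries of the ANOVA table come from, they can be reproduced from the usual one-way ANOVA formulas. This is a sketch for illustration only, and `aov()` remains the practical way to run the test. All variable names other than `SAT.scores` and `school.names` are assumptions introduced here.

```r
# Same data as above, repeated so this block runs on its own
SAT.scores <- c(
    1100, 1250, 1390, 970, 1510, 1010, 1050, 1090, 1110,
    900, 1550, 1300, 1270, 1210, 900, 850, 1110, 1070, 910, 920
)
school.names <- c(
    'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
    'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'D'
)

group.means <- tapply( SAT.scores, school.names, mean )    # one mean per school
group.sizes <- tapply( SAT.scores, school.names, length )  # one count per school
grand.mean  <- mean( SAT.scores )

# Between-group sum of squares (the "school.names" row of the table)
ss.between <- sum( group.sizes * ( group.means - grand.mean )^2 )
# Within-group sum of squares (the "Residuals" row);
# ave() replaces each score with the mean of its own group
ss.within <- sum( ( SAT.scores - ave( SAT.scores, school.names ) )^2 )

# Degrees of freedom: (groups - 1) and (observations - groups)
df.between <- length( group.means ) - 1
df.within  <- length( SAT.scores ) - length( group.means )

# F statistic is the ratio of the two mean squares
F.stat <- ( ss.between / df.between ) / ( ss.within / df.within )
# p-value is the upper tail of the F distribution
p.manual <- pf( F.stat, df.between, df.within, lower.tail = FALSE )

print( c( ss.between, ss.within, F.stat, p.manual ) )
```

The two sums of squares should match the `Sum Sq` column (321715 and 465140), and `F.stat` and `p.manual` should agree with the `F value` and `Pr(>F)` entries up to rounding.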


The $p$-value reported in the ANOVA table is $0.0342$ (the `Pr(>F)` column). You could manually check whether $p<\alpha$. Since it is, we reject $H_0$ and conclude that at least one pair of means is statistically significantly different.

Or you could ask R to do the comparison for you, but getting the $p$-value from the ANOVA summary is fiddly:

alpha <- 0.05
p.value <- unname( unlist( summary( result ) ) )[9]
p.value < alpha

[1] TRUE
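
A possibly more readable alternative to the positional index is to pull the $p$-value out by column name. The sketch below assumes the summary of a one-way `aov` fit is a one-element list whose table has a column named `Pr(>F)`, as in the output shown earlier. The name `anova.table` is hypothetical.

```r
# Same data and model as above, repeated so this block runs on its own
SAT.scores <- c(
    1100, 1250, 1390, 970, 1510, 1010, 1050, 1090, 1110,
    900, 1550, 1300, 1270, 1210, 900, 850, 1110, 1070, 910, 920
)
school.names <- c(
    'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
    'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'D'
)
result <- aov( SAT.scores ~ school.names )
alpha <- 0.05

# The summary is a list; its first element is the ANOVA table
anova.table <- summary( result )[[1]]
# Take the Pr(>F) column by name; row 1 is the school.names row
p.value <- anova.table[["Pr(>F)"]][1]
p.value < alpha
```

The row is still selected by position, since the row labels in the summary table may carry padding and are awkward to match by name.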


Contributed by Nathan Carter (ncarter@bentley.edu)