How to do a one-way analysis of variance (ANOVA)
Description
If we have several independent samples of the same quantity (such as students’ SAT scores from several different schools), we may want to test whether the populations they come from all have the same mean. A one-way analysis of variance (ANOVA) tests whether at least two of the group means differ significantly, although it does not say which ones. How can we do a one-way ANOVA?
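The calculation behind every solution below is the ratio of between-group to within-group variability. In its usual textbook form, write $k$ for the number of groups, $n_i$ for the size of group $i$, $x_{ij}$ for its observations, $\bar{x}_i$ for its mean, and $\bar{x}$ for the mean of all $N = n_1 + \cdots + n_k$ observations. Then:

$$
F = \frac{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}{\dfrac{1}{N-k}\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}
$$

When all population means are equal (and the samples are independent and normally distributed with a common variance), $F$ follows an $F$ distribution with $k-1$ and $N-k$ degrees of freedom. The $p$-value is the probability of an $F$ value at least as large as the one observed. For the four schools below, $k = 4$ and $N = 20$, which gives the $(3, 16)$ degrees of freedom that appear in the outputs.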
Related tasks:
- How to do a two-sided hypothesis test for two sample means (which is just an ANOVA with only two samples)
- How to do a two-way ANOVA test with interaction
- How to do a two-way ANOVA test without interaction
- How to compare two nested linear models
- How to conduct a mixed designs ANOVA
- How to conduct a repeated measures ANOVA
- How to perform an analysis of covariance (ANCOVA)
- How to do a Kruskal-Wallis test
Solution, in Julia
Let’s assume we have our samples in several different Julia arrays. Here I’ll construct some made-up data about SAT scores at four different schools.
school1_SATs = [ 1100, 1250, 1390, 970, 1510 ];
school2_SATs = [ 1010, 1050, 1090, 1110 ];
school3_SATs = [ 900, 1550, 1300, 1270, 1210 ];
school4_SATs = [ 900, 850, 1110, 1070, 910, 920 ];
ANOVA tests the null hypothesis $H_0$ that all group means are equal. You choose $\alpha$, the probability of a Type I error you are willing to accept (a false positive: rejecting $H_0$ when it is actually true). I will use $\alpha=0.05$ in this example.
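using HypothesisTests
alpha = 0.05
# Run a one-way ANOVA on the four samples and extract its p-value
p_value = pvalue( OneWayANOVATest( school1_SATs, school2_SATs, school3_SATs, school4_SATs ) )
# Reject H0 if the p-value falls below alpha
reject_H0 = p_value < alpha
alpha, p_value, reject_H0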
(0.05, 0.03405326535040251, true)
Since $p < \alpha$, we reject $H_0$ and conclude that at least one pair of school means differs significantly. ANOVA alone does not tell us which pair.
If you are using the most common value, $\alpha = 0.05$, you can save a few lines of code and get a more detailed printout by displaying the test object itself. Its "outcome with 95% confidence" line gives the decision at $\alpha = 0.05$:
OneWayANOVATest( school1_SATs, school2_SATs, school3_SATs, school4_SATs )
One-way analysis of variance (ANOVA) test
-----------------------------------------
Population details:
    parameter of interest:   Means
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    p-value:                     0.0341

Details:
    number of observations: [5, 4, 5, 6]
    F statistic:            3.69513
    degrees of freedom:     (3, 16)
Content last modified on 24 July 2023.
Using SciPy, in Python
Let’s assume we have our samples in several different Python lists. (Other array-like sequences, such as NumPy arrays or pandas Series, work too.) Here I’ll construct some made-up data about SAT scores at four different schools.
school1_SATs = [ 1100, 1250, 1390, 970, 1510 ]
school2_SATs = [ 1010, 1050, 1090, 1110 ]
school3_SATs = [ 900, 1550, 1300, 1270, 1210 ]
school4_SATs = [ 900, 850, 1110, 1070, 910, 920 ]
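If your scores instead arrive in one pandas DataFrame in "long" format (one row per student, with a column naming the school, which is the layout the R solution uses), a minimal sketch is to split the scores into one Series per school with groupby and pass those Series to stats.f_oneway. The DataFrame and its column names school and SAT are hypothetical; the scores are the same made-up ones as above.

```python
import pandas as pd
from scipy import stats

# Hypothetical long-format table: one row per student, plus a column naming the school.
# The scores are the same made-up SAT data used on this page.
df = pd.DataFrame({
    "school": ["A"] * 5 + ["B"] * 4 + ["C"] * 5 + ["D"] * 6,
    "SAT": [1100, 1250, 1390, 970, 1510, 1010, 1050, 1090, 1110,
            900, 1550, 1300, 1270, 1210, 900, 850, 1110, 1070, 910, 920],
})

# Split the SAT column into one pandas Series per school
samples = [group["SAT"] for _, group in df.groupby("school")]

# f_oneway accepts any number of array-like samples, including pandas Series
F_statistic, p_value = stats.f_oneway(*samples)
print(F_statistic, p_value)
```

groupby sorts the school labels, so the Series come out in the order A, B, C, D; the order does not affect the ANOVA result.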
ANOVA tests the null hypothesis $H_0$ that all group means are equal. You choose $\alpha$, the probability of a Type I error you are willing to accept (a false positive: rejecting $H_0$ when it is actually true). I will use $\alpha=0.05$ in this example.
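To make "probability of a Type I error" concrete, here is a small simulation sketch. It assumes four hypothetical normal populations that all share one mean, so $H_0$ is true by construction. The population mean (1100), standard deviation (170), random seed, and number of trials are arbitrary choices for illustration, not anything estimated from the data above. Only the group sizes match the SAT example. Rejections at $\alpha = 0.05$ should then occur in roughly 5% of the trials.

```python
import numpy as np
from scipy import stats

alpha = 0.05
rng = np.random.default_rng(seed=0)   # fixed seed so the run is reproducible
n_trials = 2000
group_sizes = [5, 4, 5, 6]            # same group sizes as the SAT example

false_positives = 0
for _ in range(n_trials):
    # H0 is true here: every group is drawn from the same normal population
    # (mean 1100 and standard deviation 170 are arbitrary, hypothetical values)
    samples = [rng.normal(loc=1100, scale=170, size=n) for n in group_sizes]
    _, p = stats.f_oneway(*samples)
    if p < alpha:
        false_positives += 1          # rejected H0 even though it is true

type_I_rate = false_positives / n_trials
print(type_I_rate)                    # should land near alpha
```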
# Run a one-way ANOVA and display alpha, the p-value,
# and whether the comparison says to reject the null hypothesis.
from scipy import stats
alpha = 0.05
F_statistic, p_value = stats.f_oneway(
    school1_SATs, school2_SATs, school3_SATs, school4_SATs )
reject_H0 = p_value < alpha
alpha, p_value, reject_H0
(0.05, 0.0342311478489849, True)
Since $p < \alpha$, we reject $H_0$ and conclude that at least one pair of school means differs significantly. ANOVA alone does not tell us which pair.
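SciPy hides the arithmetic. As a sketch of what a one-way ANOVA computes, the function below (one_way_anova is a hypothetical helper, not part of SciPy) forms the between-group and within-group sums of squares directly with NumPy, divides each by its degrees of freedom, and takes the upper-tail probability of the resulting F ratio with scipy.stats.f.sf.

```python
import numpy as np
from scipy import stats

# The same made-up SAT data used on this page
groups = [
    [1100, 1250, 1390, 970, 1510],
    [1010, 1050, 1090, 1110],
    [900, 1550, 1300, 1270, 1210],
    [900, 850, 1110, 1070, 910, 920],
]

def one_way_anova(groups):
    """Hypothetical helper: return (F, p) for a classical one-way ANOVA."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    k = len(arrays)                               # number of groups
    N = sum(len(a) for a in arrays)               # total number of observations
    grand_mean = np.concatenate(arrays).mean()    # mean of all observations pooled
    # Between-group sum of squares: each group mean vs. the overall mean, weighted by group size
    ss_between = sum(len(a) * (a.mean() - grand_mean) ** 2 for a in arrays)
    # Within-group sum of squares: each observation vs. its own group mean
    ss_within = sum(((a - a.mean()) ** 2).sum() for a in arrays)
    df_between, df_within = k - 1, N - k
    F = (ss_between / df_between) / (ss_within / df_within)
    # p-value: upper-tail probability of the F(df_between, df_within) distribution
    p = stats.f.sf(F, df_between, df_within)
    return F, p

F, p = one_way_anova(groups)
print(F, p)
```

On this data the sums of squares come to 321715 and 465140, the same values R reports below, giving F ≈ 3.689 and p ≈ 0.0342 in agreement with SciPy. The Julia printout on this page shows F = 3.69513 instead. For what it is worth, that is the value the same arithmetic produces if the unweighted average of the four group means (1128.75) replaces the overall mean (1123.5), a difference that can only arise when the groups have unequal sizes.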
Solution, in R
R expects you to have all the samples in one vector, and the groups they came from in a separate, categorical vector. So, for example, if we had SAT scores from four different schools (named A, B, C, and D), then our data might be arranged like this.
SAT.scores <- c(
1100, 1250, 1390, 970, 1510, 1010, 1050, 1090, 1110,
900, 1550, 1300, 1270, 1210, 900, 850, 1110, 1070, 910, 920
)
school.names <- c(
'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B',
'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'D', 'D'
)
ANOVA tests the null hypothesis $H_0$ that all group means are equal. You choose $\alpha$, the probability of a Type I error you are willing to accept (a false positive: rejecting $H_0$ when it is actually true). I will use $\alpha=0.05$ in this example.
# Run a one-way ANOVA and print a summary of all the output
result <- aov( SAT.scores ~ school.names )
summary( result )
             Df Sum Sq Mean Sq F value Pr(>F)
school.names  3 321715  107238   3.689 0.0342 *
Residuals    16 465140   29071
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The $p$-value reported in that output (the Pr(>F) column) is $0.0342$. You can check by hand whether $p<\alpha$. Since it is, we reject $H_0$ and conclude that at least one pair of school means differs significantly.
Or you can ask R to do the comparison for you, although extracting the $p$-value from the ANOVA summary is a bit fiddly:
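alpha <- 0.05
# summary() on an aov object returns a list holding one table; take its
# Pr(>F) column and keep the first row (the school.names row)
p.value <- summary( result )[[1]][["Pr(>F)"]][1]
p.value < alpha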
[1] TRUE
Opportunities
This website does not yet contain a solution for this task in any of the following software packages.
- Excel
If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.