How to use Bonferroni’s Correction method

Description

If we run a one-way ANOVA test and find a significant difference among the population means, we might want to know which means actually differ from each other. One way to find out is the Bonferroni correction. This method runs a $t$-test for each pair of categories but holds each test to a stricter significance level (the overall $\alpha$ divided by the number of pairs compared), so that the chance of any false positive across the whole family of tests stays at most $\alpha$.

Solution, in R

Let’s assume that you have already done an analysis of variance (ANOVA). (See how to do a one-way analysis of variance (ANOVA) for details.)

As an example, we will use the fake data below, which looks at the number of transactions at an ice cream shop on the weekends. Let’s assume that we chose $\alpha$ to be 0.05 in that ANOVA.

# Store our fake data in vectors.  (You can replace this with your real data.)
num.transactions <- c(91, 134, 98, 105, 93, 89, 145, 132, 109,
                      94, 105, 99, 84, 128, 120, 115, 118)
days <- c("Fri", "Sun", "Sun", "Sat", "Fri", "Fri", "Sat", "Sun", "Sun",
          "Fri", "Sat", "Sat", "Fri", "Sun", "Fri", "Sat", "Sun")

# Perform an ANOVA and print a summary.
model <- aov(num.transactions ~ days)
summary(model)

            Df Sum Sq Mean Sq F value Pr(>F)
days         2   1965   982.7   4.348  0.034 *
Residuals   14   3164   226.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


The top-right value in the output is the $p$-value for the test, $0.034$. Because it is below our chosen significance level of $\alpha=0.05$, there are significant differences between the mean number of transactions at the ice cream shop across at least two of these weekend days. But specifically which two, or is it more than two?
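Before running a post hoc test, it can help to look at the sample mean for each day, since those are the quantities the pairwise comparisons contrast. The following is a minimal sketch using base R's tapply() on the same fake data; the variable name day.means is our own choice for illustration.

```r
# Same fake data as above.
num.transactions <- c(91, 134, 98, 105, 93, 89, 145, 132, 109,
                      94, 105, 99, 84, 128, 120, 115, 118)
days <- c("Fri", "Sun", "Sun", "Sat", "Fri", "Fri", "Sat", "Sun", "Sun",
          "Fri", "Sat", "Sat", "Fri", "Sun", "Fri", "Sat", "Sun")

# Mean number of transactions for each day (day.means is an illustrative name).
day.means <- tapply(num.transactions, days, mean)
print(round(day.means, 2))

# Number of observations for each day.
print(table(days))
```

Friday has the lowest sample mean and Sunday the highest, but the sample means alone do not tell us which of the gaps are statistically significant.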

We’ll use the PostHocTest() function in the DescTools package, specifying the Bonferroni method to build the confidence intervals for each pair of days. We keep $\alpha$ at 0.05, but with the Bonferroni correction this is now a bound on the overall probability of a Type I error across all of the tests below, rather than the error rate of each test separately.

# install.packages("DescTools") # If you have not already installed it
library(DescTools)

# Run the test and print the confidence intervals for each pair of days
PostHocTest(model, method = "bonferroni", conf.level = 0.95)

Posthoc multiple comparisons of means : Bonferroni
95% family-wise confidence level

$days
             diff     lwr.ci   upr.ci   pval
Sat-Fri 18.633333  -6.108523 43.37519 0.1798
Sun-Fri 24.666667   1.076232 48.25710 0.0392 *
Sun-Sat  6.033333 -18.708523 30.77519 1.0000

---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the output, R has highlighted the second row for us by placing a * after it. That is the one row where the $p$-value (in the final column) is below our chosen $\alpha=0.05$. Therefore, the only significant difference in mean number of transactions is between Sundays and Fridays. Notice also that the confidence interval in that row (from lwr.ci to upr.ci) does not include zero. (In that particular row, the confidence interval is $(1.076232, 48.25710)$.)
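As a cross-check that needs no extra package, base R's pairwise.t.test() can apply the same correction. This is a sketch of an alternative route, not a description of what PostHocTest() does internally. It assumes the default pooled standard deviation (pool.sd = TRUE) is appropriate, as in the ANOVA, and the names raw, adjusted, m, and by.hand are ours. It also shows the arithmetic of the correction: each Bonferroni-adjusted $p$-value is the unadjusted one multiplied by the number of comparisons, capped at 1.

```r
# Same fake data as above.
num.transactions <- c(91, 134, 98, 105, 93, 89, 145, 132, 109,
                      94, 105, 99, 84, 128, 120, 115, 118)
days <- c("Fri", "Sun", "Sun", "Sat", "Fri", "Fri", "Sat", "Sun", "Sun",
          "Fri", "Sat", "Sat", "Fri", "Sun", "Fri", "Sat", "Sun")

# Unadjusted pairwise t-tests (standard deviation pooled across all days).
raw <- pairwise.t.test(num.transactions, days, p.adjust.method = "none")

# The same tests with the Bonferroni adjustment applied to the p-values.
adjusted <- pairwise.t.test(num.transactions, days,
                            p.adjust.method = "bonferroni")
print(adjusted)

# Number of pairwise comparisons among the days (3 days give 3 pairs).
m <- choose(length(unique(days)), 2)

# Bonferroni by hand: multiply each raw p-value by m and cap the result at 1.
by.hand <- pmin(raw$p.value * m, 1)
print(round(by.hand, 4))
```

The adjusted $p$-values printed here should agree with the pval column of the PostHocTest() output, with Sun versus Fri again the only pair below 0.05.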
