How to perform post-hoc analysis with Tukey’s HSD test
Description
If we run a one-way ANOVA test and find that there is a significant difference between population means, we might want to know which means are actually different from each other. One way to do so is with Tukey’s Honestly Significant Differences (HSD) method. It creates confidence intervals for each pair of samples, while controlling for Type I error rate across all pairs. Thus the resulting intervals are a little wider than those produced using Fisher’s LSD method. How do we make these confidence intervals, with an appropriate visualization?
Using statsmodels and Matplotlib, in Python
We load here the same data that appears in the solution for how to perform pairwise comparisons. That solution used ANOVA to determine which pairs of groups have significant differences in their means; follow its link for more details.
1
2
3
from rdatasets import data
df = data('InsectSprays')
df
count | spray | |
---|---|---|
0 | 10 | A |
1 | 7 | A |
2 | 20 | A |
3 | 14 | A |
4 | 14 | A |
... | ... | ... |
67 | 10 | F |
68 | 26 | F |
69 | 26 | F |
70 | 24 | F |
71 | 13 | F |
72 rows × 2 columns
We now want to perform an unplanned comparison test on the data to determine the magnitudes of the differences between pairs of groups. We do this by applying Tukey’s HSD approach to perform pairwise comparisons and generate confidence intervals that maintain a specified experiment-wide error rate. Before that, the pairwise_tukeyhsd
module needs to be imported from the statsmodels
package.
1
2
3
from statsmodels.stats.multicomp import pairwise_tukeyhsd
tukey = pairwise_tukeyhsd(endog=df['count'], groups=df['spray'], alpha=0.05)
print(tukey)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------
A B 0.8333 0.9952 -3.8661 5.5327 False
A C -12.4167 0.0 -17.1161 -7.7173 True
A D -9.5833 0.0 -14.2827 -4.8839 True
A E -11.0 0.0 -15.6994 -6.3006 True
A F 2.1667 0.7542 -2.5327 6.8661 False
B C -13.25 0.0 -17.9494 -8.5506 True
B D -10.4167 0.0 -15.1161 -5.7173 True
B E -11.8333 0.0 -16.5327 -7.1339 True
B F 1.3333 0.9603 -3.3661 6.0327 False
C D 2.8333 0.4921 -1.8661 7.5327 False
C E 1.4167 0.9489 -3.2827 6.1161 False
C F 14.5833 0.0 9.8839 19.2827 True
D E -1.4167 0.9489 -6.1161 3.2827 False
D F 11.75 0.0 7.0506 16.4494 True
E F 13.1667 0.0 8.4673 17.8661 True
-----------------------------------------------------
Because the above table contains a lot of information, it’s often helpful to visualize these intervals.
Python’s statsmodels
package does not have a built-in way to do so, but we can create our own as follows.
1
2
3
4
5
6
7
import matplotlib.pyplot as plt
rows = tukey.summary().data[1:]
plt.hlines( range(len(rows)), [row[4] for row in rows], [row[5] for row in rows] )
plt.vlines( 0, -1, len( rows )-1, linestyles='dashed' )
plt.gca().set_yticks( range( len( rows ) ) )
plt.gca().set_yticklabels( [ f'{x[0]}-{x[1]}' for x in rows ] )
plt.show()
Confidence intervals that cross the vertical, dashed line at $x=0$ are those in which the means across those groups may be equal. Other intervals have mean differences whose 95% confidence intervals do not include zero.
Content last modified on 24 July 2023.
See a problem? Tell us or edit the source.
Using agricolae, in R
We load here the same data that appears in the solution for how to perform pairwise comparisons. That solution used ANOVA to determine which pairs of groups have significant differences in their means; follow its link for more details.
1
2
3
# Load an inbuilt data set called InsectSprays and assign it to the variable df
df <- InsectSprays
head( df, 10 )
1
2
3
4
5
6
7
8
9
10
11
count spray
1 10 A
2 7 A
3 20 A
4 14 A
5 14 A
6 12 A
7 10 A
8 23 A
9 17 A
10 20 A
We now want to perform an unplanned comparison test on the data to determine
the magnitudes of the differences between pairs of groups. Although R has a
built-in TukeyHSD
function, its output is not as complete as the HSD.test
function in the agricolae package, so here we will use that latter function.
We provide it the same ANOVA results that we computed in the solution to
how to perform pairwise comparisons.
1
2
3
4
# install.packages( "agricolae" ) # if you have not already done this
library(agricolae)
aov1 <- aov(count ~ spray, data = df)
HSD.test(aov1, "spray", group=FALSE, console=TRUE)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Study: aov1 ~ "spray"
HSD Test for count
Mean Square Error: 15.38131
spray, means
count std r se Min Max Q25 Q50 Q75
A 14.500000 4.719399 12 1.132156 7 23 11.50 14.0 17.75
B 15.333333 4.271115 12 1.132156 7 21 12.50 16.5 17.50
C 2.083333 1.975225 12 1.132156 0 7 1.00 1.5 3.00
D 4.916667 2.503028 12 1.132156 2 12 3.75 5.0 5.00
E 3.500000 1.732051 12 1.132156 1 6 2.75 3.0 5.00
F 16.666667 6.213378 12 1.132156 9 26 12.50 15.0 22.50
Alpha: 0.05 ; DF Error: 66
Critical Value of Studentized Range: 4.150851
Comparison between treatments means
difference pvalue signif. LCL UCL
A - B -0.8333333 0.9952 -5.532742 3.866075
A - C 12.4166667 0.0000 *** 7.717258 17.116075
A - D 9.5833333 0.0000 *** 4.883925 14.282742
A - E 11.0000000 0.0000 *** 6.300591 15.699409
A - F -2.1666667 0.7542 -6.866075 2.532742
B - C 13.2500000 0.0000 *** 8.550591 17.949409
B - D 10.4166667 0.0000 *** 5.717258 15.116075
B - E 11.8333333 0.0000 *** 7.133925 16.532742
B - F -1.3333333 0.9603 -6.032742 3.366075
C - D -2.8333333 0.4921 -7.532742 1.866075
C - E -1.4166667 0.9489 -6.116075 3.282742
C - F -14.5833333 0.0000 *** -19.282742 -9.883925
D - E 1.4166667 0.9489 -3.282742 6.116075
D - F -11.7500000 0.0000 *** -16.449409 -7.050591
E - F -13.1666667 0.0000 *** -17.866075 -8.467258
The table above highlights for us those confidence intervals whose means are significantly different from zero, and provides other information as well. To see if there is any statistical different between the pairs, look at the “signif” column. The more asterisks appear there, the more significant the difference.
Content last modified on 24 July 2023.
See a problem? Tell us or edit the source.
Solution, in R
We load here the same data that appears in the solution for how to perform pairwise comparisons. That solution used ANOVA to determine which pairs of groups have significant differences in their means; follow its link for more details.
1
2
3
# Load an inbuilt data set called InsectSprays and assign it to the variable df
df <- InsectSprays
head( df, 10 )
1
2
3
4
5
6
7
8
9
10
11
count spray
1 10 A
2 7 A
3 20 A
4 14 A
5 14 A
6 12 A
7 10 A
8 23 A
9 17 A
10 20 A
We now want to perform an unplanned comparison test on the data to determine
the magnitudes of the differences between pairs of groups. We do this by
applying Tukey’s HSD approach to perform pairwise comparisons and generate
confidence intervals that maintain a specified experiment-wide error rate.
We use R’s built-in TukeyHSD
function, and we give it the same ANOVA results
that we computed in the solution for how to perform pairwise comparisons.
1
2
aov1 <- aov(count ~ spray, data = df)
TukeyHSD(aov1, "spray", ordered=TRUE, conf.level = 0.95)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Tukey multiple comparisons of means
95% family-wise confidence level
factor levels have been ordered
Fit: aov(formula = count ~ spray, data = df)
$spray
diff lwr upr p adj
E-C 1.4166667 -3.282742 6.116075 0.9488669
D-C 2.8333333 -1.866075 7.532742 0.4920707
A-C 12.4166667 7.717258 17.116075 0.0000000
B-C 13.2500000 8.550591 17.949409 0.0000000
F-C 14.5833333 9.883925 19.282742 0.0000000
D-E 1.4166667 -3.282742 6.116075 0.9488669
A-E 11.0000000 6.300591 15.699409 0.0000000
B-E 11.8333333 7.133925 16.532742 0.0000000
F-E 13.1666667 8.467258 17.866075 0.0000000
A-D 9.5833333 4.883925 14.282742 0.0000014
B-D 10.4166667 5.717258 15.116075 0.0000002
F-D 11.7500000 7.050591 16.449409 0.0000000
B-A 0.8333333 -3.866075 5.532742 0.9951810
F-A 2.1666667 -2.532742 6.866075 0.7542147
F-B 1.3333333 -3.366075 6.032742 0.9603075
Because the above table contains a lot of information, it’s often helpful to
visualize these intervals. R lets us do so by simply calling plot
on the
above table. We add a few plotting parameters to improve its appearance.
1
2
plot( TukeyHSD(aov1, "spray", ordered=TRUE, conf.level = 0.95),
las=1, cex.axis=0.9 )
Confidence intervals that cross the vertical, dashed line at $x=0$ are those in which the means across those groups may be equal. Other intervals have mean differences whose 95% confidence intervals do not include zero.
Content last modified on 24 July 2023.
See a problem? Tell us or edit the source.
Topics that include this task
Opportunities
This website does not yet contain a solution for this task in any of the following software packages.
- Excel
- Julia
If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.