Link Search Menu Expand Document (external link)

How to perform post-hoc analysis with Tukey’s HSD test

Description

If we run a one-way ANOVA test and find that there is a significant difference between population means, we might want to know which means are actually different from each other. One way to do so is with Tukey’s Honestly Significant Differences (HSD) method. It creates confidence intervals for each pair of samples, while controlling for Type I error rate across all pairs. Thus the resulting intervals are a little wider than those produced using Fisher’s LSD method. How do we make these confidence intervals, with an appropriate visualization?

Using statsmodels and Matplotlib, in Python

View this solution alone.

We load here the same data that appears in the solution for how to perform pairwise comparisons. That solution used ANOVA to determine which pairs of groups have significant differences in their means; follow its link for more details.

1
2
3
from rdatasets import data
df = data('InsectSprays')
df
count spray
0 10 A
1 7 A
2 20 A
3 14 A
4 14 A
... ... ...
67 10 F
68 26 F
69 26 F
70 24 F
71 13 F

72 rows × 2 columns

We now want to perform an unplanned comparison test on the data to determine the magnitudes of the differences between pairs of groups. We do this by applying Tukey’s HSD approach to perform pairwise comparisons and generate confidence intervals that maintain a specified experiment-wide error rate. Before that, the pairwise_tukeyhsd module needs to be imported from the statsmodels package.

1
2
3
from statsmodels.stats.multicomp import pairwise_tukeyhsd
tukey = pairwise_tukeyhsd(endog=df['count'], groups=df['spray'], alpha=0.05)
print(tukey)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
 Multiple Comparison of Means - Tukey HSD, FWER=0.05 
=====================================================
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
     A      B   0.8333 0.9952  -3.8661  5.5327  False
     A      C -12.4167    0.0 -17.1161 -7.7173   True
     A      D  -9.5833    0.0 -14.2827 -4.8839   True
     A      E    -11.0    0.0 -15.6994 -6.3006   True
     A      F   2.1667 0.7542  -2.5327  6.8661  False
     B      C   -13.25    0.0 -17.9494 -8.5506   True
     B      D -10.4167    0.0 -15.1161 -5.7173   True
     B      E -11.8333    0.0 -16.5327 -7.1339   True
     B      F   1.3333 0.9603  -3.3661  6.0327  False
     C      D   2.8333 0.4921  -1.8661  7.5327  False
     C      E   1.4167 0.9489  -3.2827  6.1161  False
     C      F  14.5833    0.0   9.8839 19.2827   True
     D      E  -1.4167 0.9489  -6.1161  3.2827  False
     D      F    11.75    0.0   7.0506 16.4494   True
     E      F  13.1667    0.0   8.4673 17.8661   True
-----------------------------------------------------

Because the above table contains a lot of information, it’s often helpful to visualize these intervals. Python’s statsmodels package does not have a built-in way to do so, but we can create our own as follows.

1
2
3
4
5
6
7
import matplotlib.pyplot as plt
rows = tukey.summary().data[1:]
plt.hlines( range(len(rows)), [row[4] for row in rows], [row[5] for row in rows] )
plt.vlines( 0, -1, len( rows )-1, linestyles='dashed' )
plt.gca().set_yticks( range( len( rows ) ) )
plt.gca().set_yticklabels( [ f'{x[0]}-{x[1]}' for x in rows ] )
plt.show()

png

Confidence intervals that cross the vertical, dashed line at $x=0$ are those in which the means across those groups may be equal. Other intervals have mean differences whose 95% confidence intervals do not include zero.

Content last modified on 24 July 2023.

See a problem? Tell us or edit the source.

Using agricolae, in R

View this solution alone.

We load here the same data that appears in the solution for how to perform pairwise comparisons. That solution used ANOVA to determine which pairs of groups have significant differences in their means; follow its link for more details.

1
2
3
# Load an inbuilt data set called InsectSprays and assign it to the variable df
df <- InsectSprays
head( df, 10 )
1
2
3
4
5
6
7
8
9
10
11
   count spray
1  10    A    
2   7    A    
3  20    A    
4  14    A    
5  14    A    
6  12    A    
7  10    A    
8  23    A    
9  17    A    
10 20    A    

We now want to perform an unplanned comparison test on the data to determine the magnitudes of the differences between pairs of groups. Although R has a built-in TukeyHSD function, its output is not as complete as the HSD.test function in the agricolae package, so here we will use that latter function. We provide it the same ANOVA results that we computed in the solution to how to perform pairwise comparisons.

1
2
3
4
# install.packages( "agricolae" ) # if you have not already done this
library(agricolae)
aov1 <- aov(count ~ spray, data = df)
HSD.test(aov1, "spray", group=FALSE, console=TRUE)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Study: aov1 ~ "spray"

HSD Test for count 

Mean Square Error:  15.38131 

spray,  means

      count      std  r       se Min Max   Q25  Q50   Q75
A 14.500000 4.719399 12 1.132156   7  23 11.50 14.0 17.75
B 15.333333 4.271115 12 1.132156   7  21 12.50 16.5 17.50
C  2.083333 1.975225 12 1.132156   0   7  1.00  1.5  3.00
D  4.916667 2.503028 12 1.132156   2  12  3.75  5.0  5.00
E  3.500000 1.732051 12 1.132156   1   6  2.75  3.0  5.00
F 16.666667 6.213378 12 1.132156   9  26 12.50 15.0 22.50

Alpha: 0.05 ; DF Error: 66 
Critical Value of Studentized Range: 4.150851 

Comparison between treatments means

       difference pvalue signif.        LCL       UCL
A - B  -0.8333333 0.9952          -5.532742  3.866075
A - C  12.4166667 0.0000     ***   7.717258 17.116075
A - D   9.5833333 0.0000     ***   4.883925 14.282742
A - E  11.0000000 0.0000     ***   6.300591 15.699409
A - F  -2.1666667 0.7542          -6.866075  2.532742
B - C  13.2500000 0.0000     ***   8.550591 17.949409
B - D  10.4166667 0.0000     ***   5.717258 15.116075
B - E  11.8333333 0.0000     ***   7.133925 16.532742
B - F  -1.3333333 0.9603          -6.032742  3.366075
C - D  -2.8333333 0.4921          -7.532742  1.866075
C - E  -1.4166667 0.9489          -6.116075  3.282742
C - F -14.5833333 0.0000     *** -19.282742 -9.883925
D - E   1.4166667 0.9489          -3.282742  6.116075
D - F -11.7500000 0.0000     *** -16.449409 -7.050591
E - F -13.1666667 0.0000     *** -17.866075 -8.467258

The table above highlights for us those confidence intervals whose means are significantly different from zero, and provides other information as well. To see if there is any statistical different between the pairs, look at the “signif” column. The more asterisks appear there, the more significant the difference.

Content last modified on 24 July 2023.

See a problem? Tell us or edit the source.

Solution, in R

View this solution alone.

We load here the same data that appears in the solution for how to perform pairwise comparisons. That solution used ANOVA to determine which pairs of groups have significant differences in their means; follow its link for more details.

1
2
3
# Load an inbuilt data set called InsectSprays and assign it to the variable df
df <- InsectSprays
head( df, 10 )
1
2
3
4
5
6
7
8
9
10
11
   count spray
1  10    A    
2   7    A    
3  20    A    
4  14    A    
5  14    A    
6  12    A    
7  10    A    
8  23    A    
9  17    A    
10 20    A    

We now want to perform an unplanned comparison test on the data to determine the magnitudes of the differences between pairs of groups. We do this by applying Tukey’s HSD approach to perform pairwise comparisons and generate confidence intervals that maintain a specified experiment-wide error rate. We use R’s built-in TukeyHSD function, and we give it the same ANOVA results that we computed in the solution for how to perform pairwise comparisons.

1
2
aov1 <- aov(count ~ spray, data = df)
TukeyHSD(aov1, "spray", ordered=TRUE, conf.level = 0.95)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = count ~ spray, data = df)

$spray
          diff       lwr       upr     p adj
E-C  1.4166667 -3.282742  6.116075 0.9488669
D-C  2.8333333 -1.866075  7.532742 0.4920707
A-C 12.4166667  7.717258 17.116075 0.0000000
B-C 13.2500000  8.550591 17.949409 0.0000000
F-C 14.5833333  9.883925 19.282742 0.0000000
D-E  1.4166667 -3.282742  6.116075 0.9488669
A-E 11.0000000  6.300591 15.699409 0.0000000
B-E 11.8333333  7.133925 16.532742 0.0000000
F-E 13.1666667  8.467258 17.866075 0.0000000
A-D  9.5833333  4.883925 14.282742 0.0000014
B-D 10.4166667  5.717258 15.116075 0.0000002
F-D 11.7500000  7.050591 16.449409 0.0000000
B-A  0.8333333 -3.866075  5.532742 0.9951810
F-A  2.1666667 -2.532742  6.866075 0.7542147
F-B  1.3333333 -3.366075  6.032742 0.9603075

Because the above table contains a lot of information, it’s often helpful to visualize these intervals. R lets us do so by simply calling plot on the above table. We add a few plotting parameters to improve its appearance.

1
2
plot( TukeyHSD(aov1, "spray", ordered=TRUE, conf.level = 0.95),
      las=1, cex.axis=0.9 )

Confidence intervals that cross the vertical, dashed line at $x=0$ are those in which the means across those groups may be equal. Other intervals have mean differences whose 95% confidence intervals do not include zero.

Content last modified on 24 July 2023.

See a problem? Tell us or edit the source.

Topics that include this task

Opportunities

This website does not yet contain a solution for this task in any of the following software packages.

  • Excel
  • Julia

If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.