How to perform pairwise comparisons
Description
When analyzing data from a completely randomized single-factor design, suppose that you have performed an ANOVA and found a significant difference between at least one pair of treatment levels. How can pairwise comparisons help you explore which pairs of treatment levels differ?
Related tasks:
- How to do a one-way analysis of variance (ANOVA)
- How to perform post-hoc analysis with Tukey’s HSD test
Using statsmodels, in Python
The solution below uses an example dataset that details the counts of insects in an agricultural experiment with six types of insecticides, labeled A through F. (See how to quickly load some sample data.)
from rdatasets import data
df = data('InsectSprays')
df
| | count | spray |
|---|---|---|
| 0 | 10 | A |
| 1 | 7 | A |
| 2 | 20 | A |
| 3 | 14 | A |
| 4 | 14 | A |
| ... | ... | ... |
| 67 | 10 | F |
| 68 | 26 | F |
| 69 | 26 | F |
| 70 | 24 | F |
| 71 | 13 | F |
72 rows × 2 columns
Before performing any post hoc analysis, we conduct a one-way ANOVA to see whether the count of insects depends on the type of insecticide. (See also how to do a one-way analysis of variance (ANOVA).)
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Fit the one-way model, then build its ANOVA table.
model = ols('count ~ spray', data=df).fit()
sm.stats.anova_lm(model, typ=1)
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| spray | 5.0 | 2668.833333 | 533.766667 | 34.702282 | 3.182584e-17 |
| Residual | 66.0 | 1015.166667 | 15.381313 | NaN | NaN |
Because the p-value (about 3.18e-17) is far below 0.05, we conclude at the 5% significance level that the count differs according to the type of insecticide used. We assume that the model assumptions are met, but do not verify that here.
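As a quick arithmetic check on the ANOVA table above, the F statistic is the ratio of the two mean squares, and its p-value is the upper tail of an F distribution with 5 and 66 degrees of freedom. The sketch below recomputes both from the table's sums of squares. It assumes SciPy is installed, and the variable names are illustrative only.

```python
from scipy import stats

# Values copied from the ANOVA table above.
ss_spray, df_spray = 2668.833333, 5
ss_resid, df_resid = 1015.166667, 66

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_spray = ss_spray / df_spray
ms_resid = ss_resid / df_resid

# The F statistic compares between-group to within-group variability.
f_stat = ms_spray / ms_resid

# Upper-tail probability of the F(5, 66) distribution.
p_value = stats.f.sf(f_stat, df_spray, df_resid)
print(f_stat, p_value)
```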
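To compare the pairs without any correction, we can use the pairwise t-test provided by the posthoc_ttest function in the scikit_posthocs package. Setting pool_sd=True makes every comparison use a single standard deviation pooled across all six groups.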
import scikit_posthocs as sp
sp.posthoc_ttest(df, val_col='count', group_col='spray', p_adjust=None, pool_sd=True )
| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| A | 1.000000e+00 | 6.044761e-01 | 7.266893e-11 | 9.816910e-08 | 2.753922e-09 | 1.805998e-01 |
| B | 6.044761e-01 | 1.000000e+00 | 8.509776e-12 | 1.212803e-08 | 3.257986e-10 | 4.079858e-01 |
| C | 7.266893e-11 | 8.509776e-12 | 1.000000e+00 | 8.141205e-02 | 3.794750e-01 | 2.794343e-13 |
| D | 9.816910e-08 | 1.212803e-08 | 8.141205e-02 | 1.000000e+00 | 3.794750e-01 | 4.035610e-10 |
| E | 2.753922e-09 | 3.257986e-10 | 3.794750e-01 | 3.794750e-01 | 1.000000e+00 | 1.054387e-11 |
| F | 1.805998e-01 | 4.079858e-01 | 2.794343e-13 | 4.035610e-10 | 1.054387e-11 | 1.000000e+00 |
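To make the pooled-SD comparison concrete, here is a minimal sketch of how such unadjusted p-values can be computed by hand. It assumes that the pooled standard deviation is the square root of the residual mean square from the one-way ANOVA, with the residual degrees of freedom used for every t-test. The function name `pooled_pairwise_t` and the small `toy` dataset are hypothetical, invented for illustration; they are not part of scikit_posthocs or of the InsectSprays data.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def pooled_pairwise_t(data, val_col, group_col):
    """Unadjusted pairwise t-tests sharing one pooled variance estimate."""
    groups = {name: g[val_col].to_numpy(dtype=float)
              for name, g in data.groupby(group_col)}
    n_total = sum(len(v) for v in groups.values())
    df_resid = n_total - len(groups)
    # Residual sum of squares: squared deviations from each group's own mean.
    ss_resid = sum(((v - v.mean()) ** 2).sum() for v in groups.values())
    mse = ss_resid / df_resid
    rows = []
    for a, b in combinations(groups, 2):
        se = np.sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        t = (groups[a].mean() - groups[b].mean()) / se
        p = 2 * stats.t.sf(abs(t), df_resid)  # two-sided p-value
        rows.append((a, b, t, p))
    return pd.DataFrame(rows, columns=['group1', 'group2', 't', 'p'])


# Toy data (made up for this sketch): three groups of three observations.
toy = pd.DataFrame({
    'value': [1, 2, 3, 2, 3, 4, 7, 8, 9],
    'group': ['X'] * 3 + ['Y'] * 3 + ['Z'] * 3,
})
result = pooled_pairwise_t(toy, 'value', 'group')
print(result)
```

Calling `pooled_pairwise_t(df, 'count', 'spray')` on the insecticide data would be the analogous hand computation, under the same assumption about how the pooling is done.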
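Techniques that account for multiple comparisons include the Bonferroni correction, Fisher’s Least Significant Difference (LSD) method, Dunnett’s procedure, and Scheffé’s method. Of these, only the Bonferroni correction is a p-value adjustment that can be requested through the p_adjust argument, by passing 'bonferroni' in place of None. The argument also accepts other adjustments implemented in statsmodels, such as 'holm'. Fisher’s LSD amounts to the unadjusted pooled comparisons above carried out after a significant ANOVA, while Dunnett’s and Scheffé’s procedures are separate methods that p_adjust does not provide; see details here.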
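As a sketch of what the Bonferroni correction does to the table above, the example below takes the 15 unadjusted p-values from its upper triangle and adjusts them with the `multipletests` function from statsmodels. The `pairs` labels and variable names are illustrative. The manual line shows the underlying arithmetic: each p-value is multiplied by the number of comparisons and capped at 1.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Unadjusted p-values copied from the upper triangle of the table above.
pairs = ['A-B', 'A-C', 'A-D', 'A-E', 'A-F',
         'B-C', 'B-D', 'B-E', 'B-F',
         'C-D', 'C-E', 'C-F',
         'D-E', 'D-F',
         'E-F']
pvals = np.array([
    6.044761e-01, 7.266893e-11, 9.816910e-08, 2.753922e-09, 1.805998e-01,
    8.509776e-12, 1.212803e-08, 3.257986e-10, 4.079858e-01,
    8.141205e-02, 3.794750e-01, 2.794343e-13,
    3.794750e-01, 4.035610e-10,
    1.054387e-11,
])

# Library route: Bonferroni-adjusted p-values and reject decisions at 5%.
reject, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')

# Manual route: multiply by the number of tests and cap at 1.
manual = np.minimum(pvals * len(pvals), 1.0)

for pair, p, r in zip(pairs, p_bonf, reject):
    print(pair, p, r)
```

With this correction the pairs that were clearly different before remain so, because their unadjusted p-values are many orders of magnitude below 0.05 / 15.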
You can also determine the magnitude of these differences; see how to perform post-hoc analysis with Tukey’s HSD test.
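For orientation only, here is a minimal sketch of a Tukey HSD call using `pairwise_tukeyhsd` from statsmodels. The toy data are made up for this illustration and are not the InsectSprays data; the linked task covers the method properly.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Toy data (invented for this sketch): three groups of three observations.
values = np.array([1, 2, 3, 2, 3, 4, 7, 8, 9], dtype=float)
labels = np.array(['X'] * 3 + ['Y'] * 3 + ['Z'] * 3)

# Tukey's HSD reports the estimated mean difference for each pair,
# along with a reject decision at the chosen family-wise alpha.
tukey = pairwise_tukeyhsd(values, labels, alpha=0.05)
print(tukey.summary())
```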
Content last modified on 24 July 2023.
Solution, in R
The solution below uses an example dataset that details the counts of insects in an agricultural experiment with six types of insecticides, labeled A through F. (This is one of the datasets built into R for use in examples like this one.)
df <- InsectSprays
head( df, 10 )
count spray
1 10 A
2 7 A
3 20 A
4 14 A
5 14 A
6 12 A
7 10 A
8 23 A
9 17 A
10 20 A
Before performing any post hoc analysis, we conduct a one-way ANOVA to see whether the count of insects depends on the type of insecticide. (See also how to do a one-way analysis of variance (ANOVA).)
aov1 <- aov(count ~ spray, data = df)
summary(aov1)
Df Sum Sq Mean Sq F value Pr(>F)
spray 5 2669 533.8 34.7 <2e-16 ***
Residuals 66 1015 15.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Because the p-value (reported as <2e-16) is far below 0.05, we conclude at the 5% significance level that the count differs according to the type of insecticide used. We assume that the model assumptions are met, but do not verify that here.
To compare the pairs without any correction, we can use the pairwise.t.test function built into R.
pairwise.t.test(df$count, df$spray, p.adj="none")
Pairwise comparisons using t tests with pooled SD
data: df$count and df$spray
A B C D E
B 0.604 - - - -
C 7.3e-11 8.5e-12 - - -
D 9.8e-08 1.2e-08 0.081 - -
E 2.8e-09 3.3e-10 0.379 0.379 -
F 0.181 0.408 2.8e-13 4.0e-10 1.1e-11
P value adjustment method: none
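Techniques that account for multiple comparisons include the Bonferroni correction, Fisher’s Least Significant Difference (LSD) method, Dunnett’s procedure, and Scheffé’s method. Of these, only the Bonferroni correction is a p-value adjustment that can be requested through the p.adj argument, by passing "bonferroni" in place of "none". The argument also accepts the other methods supported by R’s p.adjust function, such as "holm". Fisher’s LSD amounts to the unadjusted pooled comparisons above carried out after a significant ANOVA, while Dunnett’s and Scheffé’s procedures are separate methods that p.adj does not provide; see details here.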
You can also determine the magnitude of these differences; see how to perform post-hoc analysis with Tukey’s HSD test.