How to do a Kruskal-Wallis test (in Python, using SciPy)

Task

If we have samples from several independent populations, we might want to test whether the population medians are equal. We may not be able to assume anything about the populations’ variances, nor whether they are normally distributed, but we do assume that the populations have distributions that are approximately the same shape. The Kruskal-Wallis Test will allow us to test the medians for equality. It is similar to a One-Way ANOVA but using medians instead of means. How do we perform a Kruskal-Wallis Test?

Solution

For the purposes of this example, let’s say we have a sample of GPAs from matriculated students at three Ivy League institutions: Harvard, Dartmouth, and Columbia. This is example data, and you can replace it with your actual data when you re-use this code.

SciPy accepts any array-like input, so we store each sample in a NumPy array, as shown below. A pandas Series (e.g., a column in a DataFrame) is not itself a NumPy array, but it is array-like and can be passed to SciPy functions directly.

import numpy as np
# Replace the fake data below with your real data
harvard   = np.array([3.40, 3.66, 3.90, 3.55, 3.90, 3.58])
dartmouth = np.array([3.90, 3.97, 3.92, 3.83, 4.00, 3.68])
columbia  = np.array([4.00, 3.75, 3.34])
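If your data lives in a pandas DataFrame instead, one possible way to get one array per group is sketched below. The long-format layout and the column names `school` and `gpa` are assumptions made for illustration; the values are the same example data as above.

```python
import pandas as pd

# Hypothetical long-format data: one row per student.
# The column names "school" and "gpa" are assumptions for this sketch.
df = pd.DataFrame({
    "school": ["Harvard"] * 6 + ["Dartmouth"] * 6 + ["Columbia"] * 3,
    "gpa": [3.40, 3.66, 3.90, 3.55, 3.90, 3.58,
            3.90, 3.97, 3.92, 3.83, 4.00, 3.68,
            4.00, 3.75, 3.34],
})

# Build a dictionary mapping each school name to a NumPy array of its GPAs
samples = {name: group["gpa"].to_numpy() for name, group in df.groupby("school")}
```

Each entry of `samples` (for example, `samples["Harvard"]`) can then be used wherever a NumPy array of one group's data is expected.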

The Kruskal-Wallis Test uses a null hypothesis that the category medians are equal, $H_0: m_C = m_H = m_D$. We choose $\alpha$, the Type I error rate, to be 0.05 and run the test as shown below.

from scipy import stats
stats.kruskal(harvard, dartmouth, columbia)

KruskalResult(statistic=3.706006006006005, pvalue=0.15676569090635095)
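To see where these two numbers come from, the sketch below recomputes them by hand using the standard textbook formula: rank the pooled data, sum the ranks within each group, apply a correction for tied values, and compare the result to a chi-squared distribution with (number of groups - 1) degrees of freedom. The variable names are illustrative, and this is not meant as a description of SciPy's internal code.

```python
import numpy as np
from scipy import stats

harvard   = np.array([3.40, 3.66, 3.90, 3.55, 3.90, 3.58])
dartmouth = np.array([3.90, 3.97, 3.92, 3.83, 4.00, 3.68])
columbia  = np.array([4.00, 3.75, 3.34])
groups = [harvard, dartmouth, columbia]

# Rank all observations together; tied values share their average rank
pooled = np.concatenate(groups)
n_total = pooled.size
ranks = stats.rankdata(pooled)

# Split the pooled ranks back into one array per group
split_points = np.cumsum([g.size for g in groups])[:-1]
group_ranks = np.split(ranks, split_points)

# Uncorrected H statistic from the rank sums
rank_term = sum(r.sum() ** 2 / r.size for r in group_ranks)
h_uncorrected = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N), with t the size of each tie group
_, tie_counts = np.unique(pooled, return_counts=True)
tie_correction = 1 - np.sum(tie_counts ** 3 - tie_counts) / (n_total ** 3 - n_total)
h = h_uncorrected / tie_correction

# p-value from the chi-squared approximation
p_value = stats.chi2.sf(h, df=len(groups) - 1)
print(h, p_value)
```

The values of `h` and `p_value` should agree with the `statistic` and `pvalue` fields reported by `stats.kruskal` up to floating-point rounding.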

The p-value, 0.1568, is greater than $\alpha$, so we fail to reject the null hypothesis. We do not have sufficient evidence to conclude that the median GPAs of matriculated students at these three schools are different from each other.
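The comparison against $\alpha$ can also be done in code, which is convenient when re-running the test on new data. The following is a minimal sketch; the variable names `alpha` and `reject_null` are our own choices.

```python
import numpy as np
from scipy import stats

harvard   = np.array([3.40, 3.66, 3.90, 3.55, 3.90, 3.58])
dartmouth = np.array([3.90, 3.97, 3.92, 3.83, 4.00, 3.68])
columbia  = np.array([4.00, 3.75, 3.34])

alpha = 0.05  # chosen Type I error rate
result = stats.kruskal(harvard, dartmouth, columbia)

# Reject the null hypothesis only if the p-value falls below alpha
reject_null = result.pvalue < alpha
if reject_null:
    print("Reject H0: at least one median differs.")
else:
    print("Fail to reject H0: no sufficient evidence that the medians differ.")
```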

Content last modified on 24 July 2023.

Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)