How to do a Spearman rank correlation test (in Python, using SciPy)

Task

When we want to determine whether there is a relationship between two variables, but our samples do not come from normally distributed populations, we can use the Spearman Rank Correlation Test. How do we conduct it?

Solution

We will use some fake data about height and weight measurements for people. You can replace it with your real data.

Our data should be NumPy arrays, as in the example below. (A numeric pandas DataFrame column is a Series backed by a NumPy array, so it can be passed to the same functions.)

import numpy as np
heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])

Let’s say we want to test the correlation between height (inches) and weight (pounds). Our null hypothesis states that the Spearman rank correlation coefficient is equal to zero, that is, that there is no monotonic relationship between height and weight, $H_0: \rho_s = 0$. We choose $\alpha$, the Type I error rate, to be 0.05 and carry out the Spearman Rank Correlation Test to get the test statistic and $p$-value.

from scipy.stats import spearmanr
spearmanr(heights, weights)

SignificanceResult(statistic=0.7857142857142859, pvalue=0.03623846267982713)

The test statistic is $\rho_s \approx 0.786$ and the $p$-value is $0.03624$, which is less than $\alpha=0.05$, so we reject the null hypothesis. There does appear to be a positive monotonic relationship between height and weight.
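The decision rule can also be written out in code. The following is a minimal sketch using the same fake data; the variable names `rho_s`, `p_value`, and `reject_null` are our own choices for illustration and are not part of SciPy.

```python
import numpy as np
from scipy.stats import spearmanr

heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])

alpha = 0.05  # Type I error rate chosen for this test

# The result unpacks into the test statistic and the two-sided p-value
rho_s, p_value = spearmanr(heights, weights)

# Reject the null hypothesis when the p-value falls below alpha
reject_null = p_value < alpha
print(f"rho_s = {rho_s:.4f}, p = {p_value:.5f}, reject H0: {reject_null}")
```

Unpacking the result as a pair, rather than reading named attributes, should work whether SciPy labels the result `SignificanceResult` or uses an older result type.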

(The $p$-value reported by SciPy differs from the one computed in the solution using R, because the two software packages use different approximation methods when the sample size is small.)
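To see where SciPy's numbers come from, we can reproduce them step by step. The sketch below assumes, as we understand the SciPy documentation to describe, that the $p$-value is based on a $t$ distribution with $n-2$ degrees of freedom. It ranks the data, computes the Pearson correlation of the ranks, and converts the result to a two-sided $p$-value. The shortcut formula $1 - 6\sum d_i^2 / (n(n^2-1))$ applies only when there are no ties, which is the case for this data.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr, t

heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])

# Step 1: replace each value by its rank (tied values would share an average rank)
height_ranks = rankdata(heights)
weight_ranks = rankdata(weights)

# Step 2: Spearman's rho_s is the Pearson correlation of the ranks
rho_by_hand = np.corrcoef(height_ranks, weight_ranks)[0, 1]

# With no ties, the shortcut formula based on rank differences agrees
n = len(heights)
d = height_ranks - weight_ranks
rho_shortcut = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

# Step 3: t approximation with n - 2 degrees of freedom (two-sided)
t_stat = rho_by_hand * np.sqrt((n - 2) / (1 - rho_by_hand**2))
p_by_hand = 2 * t.sf(abs(t_stat), df=n - 2)

# Compare with SciPy's own result
rho_scipy, p_scipy = spearmanr(heights, weights)
print(rho_by_hand, rho_shortcut, rho_scipy)
print(p_by_hand, p_scipy)
```

Because a $t$ approximation is only rough for a sample of seven pairs, a package that uses a different method for small samples can report a somewhat different $p$-value for the same data.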

Note that for right- or left-tailed tests, the following syntax can be used.

spearmanr(heights, weights, alternative="greater")  # right-tailed
spearmanr(heights, weights, alternative="less")     # left-tailed
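As a quick check on how the one-tailed results relate to the two-tailed one, we can run all three tests on the same data. This sketch assumes a SciPy version that supports the `alternative` parameter of `spearmanr`; the variable names are ours.

```python
import numpy as np
from scipy.stats import spearmanr

heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])

# The statistic is the same in every case; only the p-value changes
_, p_two_sided = spearmanr(heights, weights)
_, p_greater = spearmanr(heights, weights, alternative="greater")
_, p_less = spearmanr(heights, weights, alternative="less")

print(f"two-sided: {p_two_sided:.5f}")
print(f"greater:   {p_greater:.5f}")
print(f"less:      {p_less:.5f}")
```

Since the sample correlation here is positive, the right-tailed $p$-value is half the two-tailed one, and the left-tailed $p$-value is its complement.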

Content last modified on 24 July 2023.


Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)