# How to do a Spearman rank correlation test

## Description

When we want to determine whether there is a relationship between two variables, but our samples do not come from normally distributed populations, we can use the Spearman Rank Correlation Test. How do we conduct it?

## Using SciPy, in Python


We will use some fake data about height and weight measurements for people. You can replace it with your real data.

Our data should be NumPy arrays, as in the example below. (Recall that pandas DataFrame columns are Series backed by NumPy arrays, so they can be passed to SciPy in the same way.)

import numpy as np
heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])


Let’s say we want to test the correlation between height (inches) and weight (pounds). Our null hypothesis would state that the Spearman rank correlation coefficient is equal to zero, or that there is no monotonic relationship between height and weight, $H_0: \rho_s = 0$. We choose $\alpha$, or the Type I error rate, to be 0.05 and carry out the Spearman Rank Correlation Test to get the test statistic and $p$-value.

from scipy.stats import spearmanr
spearmanr(heights, weights)

SignificanceResult(statistic=0.7857142857142859, pvalue=0.03623846267982713)


Our $p$-value is $0.03624$, which is less than $\alpha=0.05$, so we reject the null hypothesis. There does appear to be a relationship between height and weight.
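
To see what the test statistic measures, the following sketch checks that Spearman's coefficient equals the Pearson correlation computed on the ranks of the data. It reuses the example data above; the variable names `height_ranks`, `weight_ranks`, and `r_on_ranks` are introduced here only for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])

# Replace each observation by its rank within its own sample
height_ranks = rankdata(heights)
weight_ranks = rankdata(weights)

# Pearson correlation of the ranks
r_on_ranks, _ = pearsonr(height_ranks, weight_ranks)

# Spearman correlation of the original data
rho, _ = spearmanr(heights, weights)

print(r_on_ranks, rho)  # the two values should agree up to rounding error
```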

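(This $p$-value differs from the one computed in the solution using R because the two packages compute it differently when the sample size is small: SciPy uses a $t$-distribution approximation with $n-2$ degrees of freedom, while R's cor.test computes an exact $p$-value by default for small samples without ties.)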
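
As a rough check on where SciPy's $p$-value comes from, the sketch below assumes the approach described in the SciPy documentation: transform the coefficient into a $t$ statistic, $t = \rho_s \sqrt{(n-2)/(1-\rho_s^2)}$, and compare it to a $t$-distribution with $n-2$ degrees of freedom. The names `t_stat` and `p_manual` are illustrative only.

```python
import numpy as np
from scipy import stats

heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])

# Coefficient and p-value as reported by SciPy
rho, p_scipy = stats.spearmanr(heights, weights)

# Manual two-sided p-value from the t approximation
n = len(heights)
t_stat = rho * np.sqrt((n - 2) / (1 - rho**2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(p_scipy, p_manual)  # the two values should agree up to rounding error
```

With only seven observations this approximation is coarse, which is consistent with the gap between the Python and R results.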

Note that for right- or left-tailed tests, the following syntax can be used.

spearmanr(heights, weights, alternative="greater")  # right-tailed
spearmanr(heights, weights, alternative="less")     # left-tailed
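
To turn a one-tailed call into a decision, compare the returned $p$-value with $\alpha$. This is a minimal sketch that reuses the example data and $\alpha = 0.05$ from above; the names `reject_right` and `reject_left` are illustrative only.

```python
import numpy as np
from scipy.stats import spearmanr

heights = np.array([60, 76, 57, 68, 70, 62, 63])
weights = np.array([145, 178, 120, 143, 174, 130, 137])
alpha = 0.05

# Right-tailed: the alternative hypothesis is a positive rank correlation
right_tailed = spearmanr(heights, weights, alternative="greater")
# Left-tailed: the alternative hypothesis is a negative rank correlation
left_tailed = spearmanr(heights, weights, alternative="less")

# Reject the null hypothesis when the p-value falls below alpha
reject_right = right_tailed.pvalue < alpha
reject_left = left_tailed.pvalue < alpha

print("right-tailed:", right_tailed.pvalue, "reject H0:", reject_right)
print("left-tailed: ", left_tailed.pvalue, "reject H0:", reject_left)
```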



## Solution, in R


We will use some fake data about height and weight measurements for people. You can replace it with your real data.

Our data should be stored in R vectors, as shown below.

heights <- c(60, 76, 57, 68, 70, 62, 63)
weights <- c(145, 178, 120, 143, 174, 130, 137)


Let’s say we want to test the correlation between height (inches) and weight (pounds). Our null hypothesis would state that the Spearman rank correlation coefficient is equal to zero, or that there is no monotonic relationship between height and weight, $H_0: \rho_s = 0$. We choose $\alpha$, or the Type I error rate, to be 0.05 and carry out the Spearman Rank Correlation Test to get the test statistic and $p$-value.

# Run the Spearman Rank Correlation Test to get the test statistic and p-value
cor.test(heights, weights, alternative = "two.sided", method = "spearman")

Spearman's rank correlation rho

data:  heights and weights
S = 12, p-value = 0.04802
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.7857143


Our $p$-value is $0.04802$, which is less than $\alpha=0.05$, so we reject the null hypothesis. There does appear to be a relationship between height and weight.

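(This $p$-value differs from the one computed in the solution using Python because the two packages compute it differently when the sample size is small: R's cor.test computes an exact $p$-value by default for small samples without ties, while SciPy uses a $t$-distribution approximation with $n-2$ degrees of freedom.)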

Note that for a right-tailed test, you can replace "two.sided" with "greater" in the alternative argument, and for a left-tailed test, you can replace it with "less".