How to do a hypothesis test for the ratio of two population variances (in Python, using SciPy)

Task

Let’s say we want to compare the variability of two populations. We take two samples of data, $x_1, x_2, x_3, \ldots, x_k$ from population 1 and $x'_1, x'_2, x'_3, \ldots, x'_m$ from population 2 (the two sample sizes need not be equal). What hypothesis tests can help us compare the population variances?

Solution

We’ll use R’s dataset EuStockMarkets to do an example. This dataset has information on the daily closing prices of 4 European stock indices. We’re going to compare the variability of Germany’s DAX and France’s CAC closing prices.

Let’s load the dataset. (See how to quickly load some sample data.) If using your own data, place it into the sample1 and sample2 variables instead of using the code below.

from rdatasets import data
import pandas as pd

# Load in the EuStockMarkets data and place it in a pandas DataFrame
EuStockMarkets = data('EuStockMarkets')
df = pd.DataFrame(EuStockMarkets[['DAX', 'CAC']])

# Choose the two columns we want to analyze
# (You can replace the two lines below with your actual data.)
sample1 = df['DAX']
sample2 = df['CAC']

For all tests below, we will use $\alpha=0.05$ as our Type I Error Rate, but any value between 0.0 and 1.0 can be used.

Two-tailed test

We can use a two-tailed test to test whether the two population variances are equal. Specifically, the null hypothesis will be:

\[H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1\]
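
The code that follows relies on a standard result, sketched here with notation that is not used elsewhere in this article: $s_1^2$ and $s_2^2$ are the sample variances and $n_1$ and $n_2$ are the sample sizes. Assuming both populations are normally distributed and the samples are independent, when $H_0$ is true the ratio of sample variances follows an $F$ distribution:

\[F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1,\ n_2 - 1)\]

The two-tailed $p$-value is then twice the smaller tail probability of the observed ratio $F_{\text{obs}}$:

\[p = 2 \min\left( P(F \le F_{\text{obs}}),\ P(F \ge F_{\text{obs}}) \right)\]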

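from scipy import stats
sample1_df = len(sample1) - 1                   # degrees of freedom
sample2_df = len(sample2) - 1                   # degrees of freedom
test_statistic = sample1.var() / sample2.var()  # ratio of sample variances
# Double the smaller tail so the p-value is valid whether the ratio is above or below 1
right_tail = stats.f.sf(test_statistic, dfn = sample1_df, dfd = sample2_df)
left_tail = stats.f.cdf(test_statistic, dfn = sample1_df, dfd = sample2_df)
2 * min(left_tail, right_tail)                  # p-value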

7.729079251495416e-151

Our $p$-value is smaller than our chosen $\alpha$, so we have sufficient evidence to reject the null hypothesis. The ratio of the variances of the closing prices on Germany’s DAX and France’s CAC is significantly different from 1, so we conclude that the variances are not equal.
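
If you run this test often, the steps can be wrapped in a small helper. The following is only a sketch: the function name `f_test_two_tailed` and the synthetic samples `wide` and `narrow` are illustrative assumptions, not part of the dataset used above. It doubles the smaller of the two tail probabilities, so the result does not depend on which sample is placed in the numerator.

```python
import numpy as np
from scipy import stats

def f_test_two_tailed(a, b):
    """Return (F statistic, two-tailed p-value) for H0: var(a) == var(b).

    Hypothetical helper for illustration; assumes both samples come from
    normally distributed populations.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dfn, dfd = len(a) - 1, len(b) - 1            # degrees of freedom
    f_stat = a.var(ddof=1) / b.var(ddof=1)       # ratio of sample variances
    upper = stats.f.sf(f_stat, dfn, dfd)         # P(F >= f_stat)
    lower = stats.f.cdf(f_stat, dfn, dfd)        # P(F <= f_stat)
    p_value = min(1.0, 2 * min(lower, upper))    # double the smaller tail
    return f_stat, p_value

# Synthetic data (an assumption for this sketch): different spreads and sizes
rng = np.random.default_rng(0)
wide = rng.normal(loc=0, scale=3, size=200)
narrow = rng.normal(loc=0, scale=1, size=150)

alpha = 0.05
f_stat, p_value = f_test_two_tailed(wide, narrow)
print(f_stat, p_value, p_value < alpha)          # reject H0 when the last value is True
```

With your own data, pass `sample1` and `sample2` in place of the synthetic arrays.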

Right-tailed test

In a right-tailed test, the null hypothesis is that the ratio is less than or equal to 1. This is equivalent to asking if $\sigma_1^2 \le \sigma_2^2$.

\[H_0: \frac{\sigma_1^2}{\sigma_2^2} \le 1\]

We repeat below some of the code above to make each example easy to copy and paste.

from scipy import stats
sample1_df = len(sample1) - 1                   # degrees of freedom
sample2_df = len(sample2) - 1                   # degrees of freedom
test_statistic = sample1.var() / sample2.var()  # test statistic
stats.f.sf(test_statistic, dfn = sample1_df, dfd = sample2_df)  # p-value

3.864539625747708e-151

To test the null hypothesis $\sigma_1^2 \ge \sigma_2^2$ instead (a left-tailed test), simply swap the roles of the two data columns in the above code.
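
As a sketch of why swapping works, the left-tailed $p$-value can also be computed directly from the lower tail of the $F$ distribution with `stats.f.cdf`, and the two approaches agree. The arrays `a` and `b` below are synthetic stand-ins (an assumption for illustration) for `sample1` and `sample2`.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for sample1 and sample2 (assumed data, for illustration only)
rng = np.random.default_rng(1)
a = rng.normal(scale=1.0, size=60)
b = rng.normal(scale=1.5, size=80)

df_a, df_b = len(a) - 1, len(b) - 1              # degrees of freedom
var_a, var_b = a.var(ddof=1), b.var(ddof=1)      # sample variances

# Left-tailed p-value for H0: sigma_a^2 / sigma_b^2 >= 1, using the lower tail
p_left = stats.f.cdf(var_a / var_b, dfn=df_a, dfd=df_b)

# The same test done by swapping the samples and using the upper tail
p_swapped = stats.f.sf(var_b / var_a, dfn=df_b, dfd=df_a)

print(p_left, p_swapped)                         # the two values agree up to rounding
```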

Content last modified on 24 July 2023.

Contributed by:

Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)
Nathan Carter (ncarter@bentley.edu)