How to do a hypothesis test for the ratio of two population variances (in Python, using SciPy)
Task
Let’s say we want to compare the variability of two populations. We take two samples of data, $x_1, x_2, x_3, \ldots, x_{n_1}$ from population 1 and $x'_1, x'_2, x'_3, \ldots, x'_{n_2}$ from population 2. What hypothesis tests can help us compare the population variances?
Related tasks:
- How to compute a confidence interval for the difference between two proportions
- How to do a hypothesis test for a mean difference (matched pairs)
- How to do a hypothesis test for a population proportion
- How to do a hypothesis test for population variance
- How to do a hypothesis test for the difference between means when both population variances are known
- How to do a hypothesis test for the difference between two proportions
- How to do a hypothesis test for the mean with known standard deviation
- How to do a hypothesis test of a coefficient’s significance
- How to do a one-sided hypothesis test for two sample means
- How to do a two-sided hypothesis test for a sample mean
- How to do a two-sided hypothesis test for two sample means
Solution
We’ll use R’s dataset EuStockMarkets to do an example. This dataset has information on the daily closing prices of 4 European stock indices. We’re going to compare the variability of Germany’s DAX and France’s CAC closing prices.
Let’s load the dataset. (See how to quickly load some sample data.) If using your own data, place it into the sample1 and sample2 variables instead of using the code below.
from rdatasets import data
import pandas as pd
# Load in the EuStockMarkets data and place it in a pandas DataFrame
EuStockMarkets = data('EuStockMarkets')
df = pd.DataFrame(EuStockMarkets[['DAX', 'CAC']])
# Choose the two columns we want to analyze
# (You can replace the two lines below with your actual data.)
sample1 = df['DAX']
sample2 = df['CAC']
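The test statistic below is built from sample1.var() and sample2.var(). pandas' Series.var() defaults to ddof=1, the unbiased sample variance, while NumPy's np.var() defaults to ddof=0. If you put plain NumPy arrays into sample1 and sample2 instead of pandas Series, here is a minimal sketch (using made-up numbers, for illustration only) showing that you need to pass ddof=1 to get the same variance:

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
values = [2.0, 4.0, 4.0, 5.0, 7.0]

series_var = pd.Series(values).var()   # pandas default: ddof=1 (divides by n - 1)
numpy_default = np.var(values)         # NumPy default: ddof=0 (divides by n)
numpy_sample = np.var(values, ddof=1)  # matches the pandas result

print(series_var, numpy_default, numpy_sample)
```

Using ddof=0 on both samples would change the ratio slightly whenever the sample sizes differ, so the $F$ distribution's degrees of freedom would no longer match the statistic.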
For all tests below, we will use $\alpha=0.05$ as our Type I error rate, but any value between 0 and 1 can be used.
Two-tailed test
We can use a two-tailed test to test whether the two population variances are equal. Specifically, the null hypothesis will be:
\[H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1\]
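from scipy import stats
sample1_df = len(sample1) - 1  # degrees of freedom
sample2_df = len(sample2) - 1  # degrees of freedom
test_statistic = sample1.var() / sample2.var()  # test statistic: ratio of sample variances
# Two-tailed p-value: twice the smaller of the two tail probabilities
2 * min(stats.f.cdf(test_statistic, dfn=sample1_df, dfd=sample2_df),
        stats.f.sf(test_statistic, dfn=sample1_df, dfd=sample2_df))  # p-value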
7.729079251495416e-151
Our $p$-value is smaller than our chosen alpha, so we have sufficient evidence to reject the null hypothesis. The ratio of the variances of closing prices on Germany’s DAX and France’s CAC is significantly different from 1, so we conclude that the two population variances are not equal.
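Instead of computing a $p$-value, we can reach the same decision by comparing the test statistic with critical values of the $F$ distribution: reject $H_0$ at level $\alpha$ when the statistic falls below the $\alpha/2$ quantile or above the $1-\alpha/2$ quantile. The sketch below uses simulated normal data (an assumption, so it runs without downloading EuStockMarkets) and computes both decision rules side by side.

```python
import numpy as np
from scipy import stats

# Simulated data for illustration only: population 1 has the larger spread
rng = np.random.default_rng(0)
sample1 = rng.normal(loc=0.0, scale=2.0, size=50)
sample2 = rng.normal(loc=0.0, scale=1.0, size=40)
alpha = 0.05

dfn = len(sample1) - 1
dfd = len(sample2) - 1
test_statistic = np.var(sample1, ddof=1) / np.var(sample2, ddof=1)

# Critical-value approach: reject if F falls outside the central 1 - alpha region
lower = stats.f.ppf(alpha / 2, dfn, dfd)
upper = stats.f.ppf(1 - alpha / 2, dfn, dfd)
reject_by_critical_values = test_statistic < lower or test_statistic > upper

# p-value approach: twice the smaller tail probability
p_value = 2 * min(stats.f.cdf(test_statistic, dfn, dfd),
                  stats.f.sf(test_statistic, dfn, dfd))
reject_by_p_value = p_value < alpha

print(f"F = {test_statistic:.3f}, critical values = ({lower:.3f}, {upper:.3f})")
print(f"p-value = {p_value:.3g}")
print("Reject H0:", reject_by_critical_values)
```

Because the critical values come from the same $F$ distribution used for the $p$-value, the two rules always lead to the same decision.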
Right-tailed test
In a right-tailed test, the null hypothesis is that the ratio is less than or equal to 1, which is equivalent to the null hypothesis $\sigma_1^2 \le \sigma_2^2$.
\[H_0: \frac{\sigma_1^2}{\sigma_2^2} \le 1\]
We repeat below some of the code above to make each example easy to copy and paste.
from scipy import stats
sample1_df = len(sample1) - 1 # degrees of freedom
sample2_df = len(sample2) - 1 # degrees of freedom
test_statistic = sample1.var() / sample2.var() # test statistic
stats.f.sf(test_statistic, dfn = sample1_df, dfd = sample2_df) # p-value
3.864539625747708e-151
Our $p$-value is smaller than our chosen alpha, so we have sufficient evidence to reject the null hypothesis. The ratio of the variances of closing prices on Germany’s DAX and France’s CAC is significantly greater than 1, so we conclude that the variance of closing prices on Germany’s DAX is greater than that on France’s CAC.
For a left-tailed test, whose null hypothesis is $H_0: \frac{\sigma_1^2}{\sigma_2^2} \ge 1$, simply swap the roles of the two data columns in the above code.
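Swapping the samples works because if $F \sim F_{d_1, d_2}$, then $1/F \sim F_{d_2, d_1}$, so the upper tail of the swapped statistic equals the lower tail of the original one. A minimal sketch with simulated data (assumed for illustration) computing the left-tailed $p$-value both ways:

```python
import numpy as np
from scipy import stats

# Simulated data for illustration only: population 1 has the smaller spread
rng = np.random.default_rng(1)
sample1 = rng.normal(scale=1.0, size=30)
sample2 = rng.normal(scale=1.5, size=35)

df1 = len(sample1) - 1
df2 = len(sample2) - 1
var1 = np.var(sample1, ddof=1)
var2 = np.var(sample2, ddof=1)

# Left-tailed test, H0: sigma1^2 / sigma2^2 >= 1
# Option 1: swap the samples and run the right-tailed test
p_swapped = stats.f.sf(var2 / var1, dfn=df2, dfd=df1)
# Option 2: keep the original order and use the lower tail instead
p_lower_tail = stats.f.cdf(var1 / var2, dfn=df1, dfd=df2)

print(p_swapped, p_lower_tail)
```

The two numbers agree up to floating-point rounding, so either form can be used.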
Content last modified on 24 July 2023.
Contributed by:
- Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)
- Nathan Carter (ncarter@bentley.edu)