How to do a hypothesis test for the ratio of two population variances

Description

Let’s say we want to compare the variability of two populations. We take two samples of data, $x_1, x_2, x_3, \ldots, x_{n_1}$ from population 1 and $x'_1, x'_2, x'_3, \ldots, x'_{n_2}$ from population 2; the two samples do not need to be the same size. What hypothesis tests can help us compare the population variances?

Using SciPy, in Python

We’ll use R’s dataset EuStockMarkets to do an example. This dataset has information on the daily closing prices of 4 European stock indices. We’re going to compare the variability of Germany’s DAX and France’s CAC closing prices.

Let’s load the dataset. (See how to quickly load some sample data.) If using your own data, place it into the sample1 and sample2 variables instead of using the code below.

from rdatasets import data
import pandas as pd

# Load in the EuStockMarkets data and place it in a pandas DataFrame
EuStockMarkets = data('EuStockMarkets')
df = pd.DataFrame(EuStockMarkets[['DAX', 'CAC']])

# Choose the two columns we want to analyze
# (You can replace the two lines below with your actual data.)
sample1 = df['DAX']
sample2 = df['CAC']

For all tests below, we will use $\alpha=0.05$ as our Type I Error Rate, but any value between 0.0 and 1.0 can be used.

Two-tailed test

We can use a two-tailed test to test whether the two population variances are equal. Specifically, the null hypothesis will be:

\[H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1\]
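
As a hedged note on where the $p$-value comes from: assuming the two samples are independent and each population is approximately normal (an assumption not stated above), the ratio of the sample variances $s_1^2$ and $s_2^2$ follows an $F$ distribution when $H_0$ is true. Writing $n_1$ and $n_2$ for the two sample sizes:

\[F = \frac{s_1^2}{s_2^2} \sim F_{n_1-1,\; n_2-1}\]

For an observed value $f$ of this statistic, a common convention for the two-tailed $p$-value is to double the smaller tail probability:

\[p = 2\,\min\bigl(P(F \ge f),\; P(F \le f)\bigr)\]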

from scipy import stats
sample1_df = len(sample1) - 1                   # degrees of freedom
sample2_df = len(sample2) - 1                   # degrees of freedom
test_statistic = sample1.var() / sample2.var()  # test statistic
upper_tail = stats.f.sf(test_statistic, dfn = sample1_df, dfd = sample2_df)
2 * min(upper_tail, 1 - upper_tail)             # p-value (double the smaller tail)

7.729079251495416e-151

Our $p$-value is smaller than our chosen alpha, so we have sufficient evidence to reject the null hypothesis. The ratio of the variances of the closing prices of Germany’s DAX and France’s CAC is significantly different from 1, so we conclude that the variances are not equal.
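
The calculation above can be wrapped in a small helper so it can be reused with other samples. The following is only a minimal sketch: the function name `variance_ratio_test` and its `alternative` argument are hypothetical (they are not part of SciPy), the small arrays are made-up illustration data, and the two-tailed branch doubles the smaller of the two tail probabilities so that it also behaves sensibly when the test statistic is below 1.

```python
import numpy as np
from scipy import stats

def variance_ratio_test(sample1, sample2, alternative="two-sided"):
    """Hypothetical helper: F test for the ratio of two population variances."""
    x = np.asarray(sample1, dtype=float)
    y = np.asarray(sample2, dtype=float)
    dfn, dfd = len(x) - 1, len(y) - 1                 # degrees of freedom
    f_stat = np.var(x, ddof=1) / np.var(y, ddof=1)    # ratio of sample variances
    upper = stats.f.sf(f_stat, dfn, dfd)              # P(F >= f_stat)
    lower = stats.f.cdf(f_stat, dfn, dfd)             # P(F <= f_stat)
    if alternative == "greater":        # H0: ratio <= 1 (right-tailed)
        p_value = upper
    elif alternative == "less":         # H0: ratio >= 1 (left-tailed)
        p_value = lower
    elif alternative == "two-sided":    # H0: ratio == 1
        p_value = min(1.0, 2 * min(upper, lower))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return f_stat, p_value

# Made-up illustration data, not the stock index data used above
a = [4.1, 5.2, 6.3, 5.8, 4.9, 5.5]
b = [3.0, 7.5, 9.1, 1.2, 6.6, 4.4, 8.3]
f_stat, p_value = variance_ratio_test(a, b)
print(f_stat, p_value)
```

With the data from this page, one would call `variance_ratio_test(sample1, sample2)` and compare the returned $p$-value to the chosen alpha.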

Right-tailed test

In a right-tailed test, the null hypothesis is that the ratio is less than or equal to 1. This is equivalent to asking if $\sigma_1^2 \le \sigma_2^2$.

\[H_0: \frac{\sigma_1^2}{\sigma_2^2} \le 1\]

We repeat below some of the code above to make each example easy to copy and paste.

from scipy import stats
sample1_df = len(sample1) - 1                   # degrees of freedom
sample2_df = len(sample2) - 1                   # degrees of freedom
test_statistic = sample1.var() / sample2.var()  # test statistic
stats.f.sf(test_statistic, dfn = sample1_df, dfd = sample2_df)  # p-value

3.864539625747708e-151

To test the null hypothesis $\sigma_1^2 \ge \sigma_2^2$ instead (a left-tailed test), swap the roles of sample1 and sample2 in the above code.
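
To illustrate the swap, here is a minimal sketch using made-up data (not the stock index data) and NumPy arrays rather than pandas columns, which is why `ddof=1` is passed explicitly to get the sample variance. It also shows that swapping the samples gives the same $p$-value as taking the lower tail of the original statistic.

```python
import numpy as np
from scipy import stats

# Made-up illustration data
sample1 = np.array([12.1, 11.8, 12.6, 12.0, 11.5, 12.3])
sample2 = np.array([10.2, 13.9, 11.1, 14.8, 9.7, 12.5, 13.3])

# Swap the roles of the two samples, then run the right-tailed test as before
swapped_statistic = sample2.var(ddof=1) / sample1.var(ddof=1)
p_swapped = stats.f.sf(swapped_statistic,
                       dfn=len(sample2) - 1, dfd=len(sample1) - 1)

# Equivalent view: the lower tail of the original, unswapped statistic
original_statistic = sample1.var(ddof=1) / sample2.var(ddof=1)
p_lower = stats.f.cdf(original_statistic,
                      dfn=len(sample1) - 1, dfd=len(sample2) - 1)

print(p_swapped, p_lower)  # the two values agree up to rounding
```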

Content last modified on 24 July 2023.

Solution, in R

Let’s load the dataset. (See how to quickly load some sample data.) If using your own data, place it into the sample.1 and sample.2 variables instead of using the code below.

# The datasets package ships with base R, so no installation is needed
library(datasets)

# Load the dataset and convert it to a data frame, then extract two columns
EuStockMarkets <- data.frame(EuStockMarkets)
sample.1 <- EuStockMarkets$DAX
sample.2 <- EuStockMarkets$CAC

For all tests below, we will use $\alpha=0.05$ as our Type I Error Rate, but any value between 0.0 and 1.0 can be used.

Two-tailed test

We can use a two-tailed test to test whether the two population variances are equal. Specifically, the null hypothesis will be:

\[H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1\]

sample.1.df <- length(sample.1) - 1            # degrees of freedom
sample.2.df <- length(sample.2) - 1            # degrees of freedom
test.statistic <- var(sample.1)/var(sample.2)  # test statistic
upper.tail <- pf(test.statistic, df1=sample.1.df, df2=sample.2.df, lower.tail=FALSE)
2*min(upper.tail, 1 - upper.tail)              # p-value (double the smaller tail)

[1] 7.729079e-151

Right-tailed test

In a right-tailed test, the null hypothesis is that the ratio is less than or equal to 1. This is equivalent to asking if $\sigma_1^2 \le \sigma_2^2$.

\[H_0: \frac{\sigma_1^2}{\sigma_2^2} \le 1\]

We repeat below some of the code above to make each example easy to copy and paste.

sample.1.df <- length(sample.1) - 1            # degrees of freedom
sample.2.df <- length(sample.2) - 1            # degrees of freedom
test.statistic <- var(sample.1)/var(sample.2)  # test statistic
pf(test.statistic, df1=sample.1.df, df2=sample.2.df, lower.tail=FALSE) # p-value

[1] 3.86454e-151

To test the null hypothesis $\sigma_1^2 \ge \sigma_2^2$ instead (a left-tailed test), swap the roles of sample.1 and sample.2 in the above code.

Content last modified on 24 July 2023.
