How to do a hypothesis test for population variance (in R)
Task
Assume we want to estimate the variability of a quantity across a population,
starting from a sample of data, and to test whether the population variance
equals, exceeds, or falls below a hypothesized value.
Related tasks:
- How to compute a confidence interval for the population proportion
- How to do a hypothesis test for a mean difference (matched pairs)
- How to do a hypothesis test for a population proportion
- How to do a hypothesis test for the difference between means when both population variances are known
- How to do a hypothesis test for the difference between two proportions
- How to do a hypothesis test for the mean with known standard deviation
- How to do a hypothesis test for the ratio of two population variances
- How to do a hypothesis test of a coefficient’s significance
- How to do a one-sided hypothesis test for two sample means
- How to do a two-sided hypothesis test for a sample mean
- How to do a two-sided hypothesis test for two sample means
Solution
We’ll use R’s dataset EuStockMarkets to do an example. This dataset has
information on the daily closing prices of 4 European stock indices.
We’re going to look at the variability of Germany’s DAX closing prices.
Let’s load the dataset. (See how to quickly load some sample data.)
If using your own data, place it into the values variable instead of using
the code below.
# The datasets package ships with base R, so no installation is needed
library(datasets)
EuStockMarkets <- data.frame(EuStockMarkets)
values <- EuStockMarkets$DAX
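Before running any test, it can help to look at the two quantities every test below relies on: the sample size and the sample variance. The following is a minimal sketch; the names n and sample.var are introduced here for illustration only.

```r
# Load the DAX column as a plain numeric vector
values <- as.numeric(datasets::EuStockMarkets[, "DAX"])

n <- length(values)        # sample size
sample.var <- var(values)  # sample variance (divides by n - 1)

n
sample.var
```

Comparing sample.var with the hypothesized variance gives a first impression of which direction the data point in, before any p-value is computed.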
Two-tailed test
We may ask whether the population variance is significantly different from a hypothesized value. Let’s test against a variance of 1,000,000.
Our null hypothesis states that the population variance is equal to 1,000,000,
that is, $H_0: \sigma^2 = 1{,}000{,}000$, and the alternative is that it differs.
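For reference, the statistic computed in this test is the standard chi-squared statistic for a single variance, which assumes the sample comes from an approximately normal population. Here $n$ is the sample size, $s^2$ the sample variance, and $\sigma_0^2$ the hypothesized variance (symbols introduced here for illustration):

```latex
\chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2},
\qquad
\chi^2 \sim \chi^2_{n-1} \ \text{under } H_0
```

In the code, df plays the role of $n-1$, var(values) is $s^2$, and hyp.var is $\sigma_0^2$.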
hyp.var <- 1000000 # hypothesized variance
df <- length(values) - 1 # degrees of freedom
test.statistic <- df*var(values)/hyp.var # test statistic
2*pchisq(test.statistic, df=df, lower.tail=FALSE) # two-tailed p-value
[1] 3.189769e-07
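Our p-value, 3.189769e-07, is far smaller than any conventional significance
level (for example, $\alpha = 0.05$), so we reject the null hypothesis and
conclude that the population variance differs from 1,000,000.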
Left-tailed test
What if we wanted to determine if the population variance were significantly
less than 1,000,000? Our null hypothesis would therefore be
$H_0: \sigma^2 \ge 1{,}000{,}000$, tested against the alternative
$H_a: \sigma^2 < 1{,}000{,}000$.
The computations are very similar to the previous case, but with a different
formula for the p-value.
hyp.var <- 1000000 # hypothesized variance
df <- length(values) - 1 # degrees of freedom
test.statistic <- df*var(values)/hyp.var # test statistic
pchisq(test.statistic, df=df, lower.tail=TRUE) # left-tailed p-value
[1] 0.9999998
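Our p-value, 0.9999998, is greater than any conventional significance level
(for example, $\alpha = 0.05$), so we fail to reject the null hypothesis. The
data give no evidence that the population variance is less than 1,000,000.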
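The same left-tailed decision can also be reached by comparing the test statistic with a critical value rather than computing a p-value. The sketch below assumes a significance level of 0.05, which is an illustrative choice and not a value taken from the text; the names alpha, critical.value, and reject are likewise introduced here.

```r
values <- as.numeric(datasets::EuStockMarkets[, "DAX"])

alpha <- 0.05                        # assumed significance level
hyp.var <- 1000000                   # hypothesized variance
df <- length(values) - 1             # degrees of freedom
test.statistic <- df * var(values) / hyp.var

# Lower-tail critical value: reject only if the statistic falls below it
critical.value <- qchisq(alpha, df = df, lower.tail = TRUE)
reject <- test.statistic < critical.value
reject
```

Because the statistic lies well above the lower-tail critical value, reject is FALSE, which agrees with the large p-value.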
Right-tailed test
What if we wanted to determine if the population variance were significantly
greater than 1,000,000? Our null hypothesis would therefore be
$H_0: \sigma^2 \le 1{,}000{,}000$, tested against the alternative
$H_a: \sigma^2 > 1{,}000{,}000$.
The computations are very similar to the previous case, but with a different
formula for the p-value.
hyp.var <- 1000000 # hypothesized variance
df <- length(values) - 1 # degrees of freedom
test.statistic <- df*var(values)/hyp.var # test statistic
pchisq(test.statistic, df=df, lower.tail=FALSE) # right-tailed p-value
[1] 1.594884e-07
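Our p-value, 1.594884e-07, is far smaller than any conventional significance
level (for example, $\alpha = 0.05$), so we reject the null hypothesis and
conclude that the population variance is greater than 1,000,000.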
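The three tests share the same statistic and differ only in which tail of the chi-squared distribution supplies the p-value, so they can be collected into one helper. This is a minimal sketch: the function name one.sample.var.test and its alternative argument are hypothetical and do not come from any package. For the two-sided case it doubles the smaller tail, which matches doubling the upper tail whenever the sample variance exceeds the hypothesized one, as it does here.

```r
# Hypothetical helper: chi-squared test for a single population variance
one.sample.var.test <- function(values, hyp.var,
                                alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  df <- length(values) - 1                       # degrees of freedom
  test.statistic <- df * var(values) / hyp.var   # chi-squared statistic
  lower <- pchisq(test.statistic, df = df, lower.tail = TRUE)
  upper <- pchisq(test.statistic, df = df, lower.tail = FALSE)
  p.value <- switch(alternative,
                    two.sided = min(1, 2 * min(lower, upper)),
                    less      = lower,
                    greater   = upper)
  list(statistic = test.statistic, df = df, p.value = p.value)
}

values <- as.numeric(datasets::EuStockMarkets[, "DAX"])
one.sample.var.test(values, 1000000, "two.sided")$p.value
one.sample.var.test(values, 1000000, "less")$p.value
one.sample.var.test(values, 1000000, "greater")$p.value
```

These three calls should reproduce the p-values from the two-tailed, left-tailed, and right-tailed sections respectively.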
Content last modified on 24 July 2023.
Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)