How to perform a chi-squared test on a contingency table (in R)
Task
If we have a contingency table showing the frequencies observed in two categorical variables, how can we run a $\chi^2$ test to see if the two variables are independent?
Solution
Here we will use a $2\times4$ matrix to store a contingency table of education vs. gender, taken from Penn State University’s online stats review website. You should use your own data. (Note: R’s table function is useful for creating contingency tables from data.)
# Rows are gender (F, M); columns are education level (HS, BS, MS, PhD).
data <- matrix( c( 60, 54, 46, 41, 40, 44, 53, 57 ), ncol = 4,
                dimnames = list( c('F','M'), c('HS','BS','MS','PhD') ),
                byrow = TRUE )
data
HS BS MS PhD
F 60 54 46 41
M 40 44 53 57
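If your data is one row per observation rather than a table of counts, the table function mentioned in the note above can do the counting. The following is a minimal sketch; the survey data frame and its values are invented purely for illustration and are not the Penn State data.

```r
# Hypothetical raw data: one row per person (values invented for illustration).
survey <- data.frame(
  gender    = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'F'),
  education = c('HS', 'BS', 'BS', 'PhD', 'MS', 'HS', 'HS', 'PhD')
)

# Fix the order of the education levels; otherwise table() sorts them alphabetically.
survey$education <- factor(survey$education, levels = c('HS', 'BS', 'MS', 'PhD'))

# Cross-tabulate the two categorical variables into a contingency table.
counts <- table(survey$gender, survey$education)
counts
```

The result can be passed to chisq.test in place of a matrix, although a sample this small is far too sparse for the test to be meaningful.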
The $\chi^2$ test’s null hypothesis, $H_0$, is that the two variables are independent. We choose a value $0\leq\alpha\leq1$ as the acceptable probability of a Type I error (a false positive: rejecting $H_0$ when it is actually true).
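To make the null hypothesis concrete, here is a sketch of the arithmetic behind Pearson's statistic, $\chi^2=\sum (O-E)^2/E$, where each expected count $E$ is (row total $\times$ column total) / grand total. The variable names observed, expected, statistic, df, and p_value are chosen here for illustration only.

```r
# The same contingency table as above, rebuilt so this block runs on its own.
observed <- matrix(c(60, 54, 46, 41, 40, 44, 53, 57), ncol = 4, byrow = TRUE,
                   dimnames = list(c('F', 'M'), c('HS', 'BS', 'MS', 'PhD')))

# Expected count in each cell if gender and education were independent:
# (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: squared deviations scaled by the expected counts.
statistic <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper-tail probability of the chi-squared distribution.
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

c(statistic = statistic, df = df, p_value = p_value)
```

In practice there is no need to do this by hand; the sketch only shows what the test measures.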
R provides a chisq.test function that does exactly what we need.
results <- chisq.test( data )
results
Pearson's Chi-squared test
data: data
X-squared = 8.0061, df = 3, p-value = 0.04589
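The printed summary is only one view of the object that chisq.test returns. As a brief sketch, the individual pieces can be pulled out by name, which also makes it easy to check the common rule of thumb that every expected count should be at least 5 (that threshold is a convention, not something stated in the output above).

```r
# Rebuild the table and rerun the test so this block runs on its own.
data <- matrix(c(60, 54, 46, 41, 40, 44, 53, 57), ncol = 4, byrow = TRUE,
               dimnames = list(c('F', 'M'), c('HS', 'BS', 'MS', 'PhD')))
results <- chisq.test(data)

results$statistic   # the X-squared value
results$parameter   # the degrees of freedom
results$p.value     # the p-value
results$expected    # expected counts under independence

# Rule-of-thumb check: are all expected counts at least 5?
all(results$expected >= 5)
```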
We can manually compare the $p$-value to an $\alpha$ we’ve chosen, or ask R to do it.
alpha <- 0.05 # or choose your own alpha here
results$p.value < alpha # reject the null hypothesis?
[1] TRUE
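An equivalent way to reach the same decision is to compare the test statistic with the critical value of the $\chi^2$ distribution at the chosen $\alpha$. The sketch below assumes the same table and $\alpha=0.05$; the name critical is introduced here for illustration.

```r
# Rebuild the table and rerun the test so this block runs on its own.
data <- matrix(c(60, 54, 46, 41, 40, 44, 53, 57), ncol = 4, byrow = TRUE,
               dimnames = list(c('F', 'M'), c('HS', 'BS', 'MS', 'PhD')))
results <- chisq.test(data)
alpha <- 0.05

# Critical value: the point with probability alpha in the upper tail.
critical <- qchisq(1 - alpha, df = unname(results$parameter))

# Reject the null hypothesis when the statistic exceeds the critical value.
unname(results$statistic) > critical
```

This gives the same answer as the $p$-value comparison, because the $p$-value falls below $\alpha$ exactly when the statistic exceeds the critical value.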
Content last modified on 24 July 2023.
Contributed by Nathan Carter (ncarter@bentley.edu)