# How to perform a chi-squared test on a contingency table (in R)

## Task

If we have a contingency table showing the frequencies observed in two categorical variables, how can we run a $\chi^2$ test to see if the two variables are independent?

## Solution

Here we will use a $2\times4$ matrix to store a contingency table of
education vs. gender, taken from
Penn State University’s online stats review website.
You should use your own data.
(Note: R’s `table` function is useful for creating contingency tables from data.)
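
As a minimal sketch of that note, the code below builds a contingency table from raw, one-row-per-person data. The `survey` data frame and its values are made up purely for illustration and are not the Penn State data.

```r
# Hypothetical raw data: one row per person (values invented for illustration).
survey <- data.frame(
    gender    = c('F', 'F', 'M', 'M', 'F', 'M'),
    education = c('HS', 'BS', 'HS', 'PhD', 'BS', 'MS')
)

# Cross-tabulate the two categorical columns into a contingency table.
# Rows are the levels of gender; columns are the levels of education.
counts <- table( survey$gender, survey$education )
counts
```

A table built this way can be passed to `chisq.test` just like a matrix, although a sample this small is far too sparse for the test to be meaningful.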

    data <- matrix( c( 60, 54, 46, 41, 40, 44, 53, 57 ), ncol = 4,
                    dimnames = list( c('F','M'), c('HS','BS','MS','PhD') ),
                    byrow = TRUE )
    data

      HS BS MS PhD
    F 60 54 46  41
    M 40 44 53  57

The $\chi^2$ test’s null hypothesis, $H_0$, is that the two variables are independent. We choose a significance level $\alpha$ between 0 and 1, which is the probability of a Type I error we are willing to accept (a false positive: rejecting $H_0$ when it is actually true).

R provides a `chisq.test` function that does exactly what we need.

    results <- chisq.test( data )
    results

    	Pearson's Chi-squared test

    data:  data
    X-squared = 8.0061, df = 3, p-value = 0.04589
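
To see where those numbers come from, here is a rough sketch that recomputes the statistic by hand using the standard Pearson formula. The variable names `expected`, `statistic`, `dof`, and `p_value` are chosen for this illustration; this is not a description of how `chisq.test` is implemented internally.

```r
# Same contingency table as above, repeated so this block runs on its own.
data <- matrix( c( 60, 54, 46, 41, 40, 44, 53, 57 ), ncol = 4,
                dimnames = list( c('F','M'), c('HS','BS','MS','PhD') ),
                byrow = TRUE )

# Expected count in each cell if the two variables were independent:
# (row total * column total) / grand total.
expected <- outer( rowSums( data ), colSums( data ) ) / sum( data )

# Pearson's statistic: sum over all cells of (observed - expected)^2 / expected.
statistic <- sum( ( data - expected )^2 / expected )

# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
dof <- ( nrow( data ) - 1 ) * ( ncol( data ) - 1 )

# The p-value is the upper-tail probability of the chi-squared distribution.
p_value <- pchisq( statistic, df = dof, lower.tail = FALSE )

c( statistic = statistic, df = dof, p_value = p_value )
```

These values should agree with the `X-squared`, `df`, and `p-value` that `chisq.test` reports for the same table.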

We can manually compare the $p$-value to an $\alpha$ we’ve chosen, or ask R to do it.

    alpha <- 0.05             # or choose your own alpha here
    results$p.value < alpha   # reject the null hypothesis?

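    [1] TRUE

Because the $p$-value (0.04589) is below $\alpha = 0.05$, we reject the null hypothesis: these data give evidence that education and gender are not independent.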
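
Beyond the $p$-value, the object returned by `chisq.test` carries other components that may be worth inspecting. The sketch below looks at the expected counts and the Pearson residuals. The check that every expected count is at least 5 is a commonly cited rule of thumb for the chi-squared approximation, not something taken from this article’s source.

```r
# Same contingency table as above, repeated so this block runs on its own.
data <- matrix( c( 60, 54, 46, 41, 40, 44, 53, 57 ), ncol = 4,
                dimnames = list( c('F','M'), c('HS','BS','MS','PhD') ),
                byrow = TRUE )
results <- chisq.test( data )

# Expected counts under the null hypothesis of independence.
results$expected

# Pearson residuals, (observed - expected) / sqrt(expected);
# cells with large absolute values contribute most to the statistic.
results$residuals

# Rule-of-thumb check (an assumption of this sketch): all expected counts >= 5.
all( results$expected >= 5 )
```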

Content last modified on 24 July 2023.

Contributed by Nathan Carter (ncarter@bentley.edu)