# How to perform a chi-squared test on a contingency table (in Julia)

If we have a contingency table showing the frequencies observed in two categorical variables, how can we run a $\chi^2$ test to see if the two variables are independent?

## Solution

Here we will use a two-dimensional Julia array to store a contingency table of education vs. gender, taken from Penn State University’s online stats review website. You should use your own data.

data = [
#   HS  BS  MS  PhD
60  54  46  41    # females
40  44  53  57    # males
]

2×4 Matrix{Int64}:
60  54  46  41
40  44  53  57


The $\chi^2$ test’s null hypothesis $H_0$ is that the two variables are independent. We choose a significance level $\alpha$, with $0\leq\alpha\leq1$, as the acceptable probability of a Type I error (a false positive: rejecting $H_0$ when it is actually true).
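
For reference, the standard Pearson statistic behind this test can be written out as follows. The symbols are notation introduced only for this sketch and do not appear in the code: $O_{ij}$ is the observed count in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, $n$ is the grand total, and $r$ and $c$ are the numbers of rows and columns.

$$
E_{ij} = \frac{R_i\,C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r-1)(c-1)
$$

Here $E_{ij}$ is the count we would expect in each cell if the two variables were independent. For the $2\times4$ table above, $\text{df} = (2-1)(4-1) = 3$.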

alpha = 0.05  # or choose your own alpha here

using HypothesisTests
p_value = pvalue( ChisqTest( data ) )
reject_H0 = p_value < alpha
alpha, p_value, reject_H0

(0.05, 0.04588650089174742, true)


In this case, the $p$-value (about $0.0459$) is below $\alpha=0.05$, so the sample gives us enough evidence to reject the null hypothesis at that level. The data suggest that the two categorical variables are not independent.

If you are using the most common $\alpha$ value of $0.05$, you can save a few lines of code and get more output by just writing the test itself:

ChisqTest( data )

Pearson's Chi-square Test
-------------------------
Population details:
parameter of interest:   Multinomial Probabilities
value under h_0:         [0.128826, 0.124339, 0.126249, 0.121852, 0.127537, 0.123096, 0.126249, 0.121852]
point estimate:          [0.151899, 0.101266, 0.136709, 0.111392, 0.116456, 0.134177, 0.103797, 0.144304]
95% confidence interval: [(0.1089, 0.1978), (0.05823, 0.1472), (0.09367, 0.1826), (0.06835, 0.1573), (0.07342, 0.1624), (0.09114, 0.1801), (0.06076, 0.1497), (0.1013, 0.1902)]

Test summary:
outcome with 95% confidence: reject h_0
one-sided p-value:           0.0459

Details:
Sample size:        395
statistic:          8.006066246262527
degrees of freedom: 3
residuals:          [1.27763, -1.30048, 0.585074, -0.595536, -0.61671, 0.627737, -1.25583, 1.27828]
std. residuals:     [2.10956, -2.10956, 0.962783, -0.962783, -1.01656, 1.01656, -2.06656, 2.06656]
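
As a sanity check, the `statistic` and `degrees of freedom` lines in the output above can be reproduced by hand with base Julia alone. This is only a minimal sketch of the usual Pearson calculation, not a description of how `ChisqTest` is implemented internally. The variable names (`row_totals`, `col_totals`, `expected`, `statistic`, `df`) are chosen here for illustration.

```julia
# Same contingency table as in the solution above
data = [
    60  54  46  41    # females
    40  44  53  57    # males
]

# Marginal totals and grand total
row_totals = sum(data, dims=2)    # 2×1 matrix of row sums
col_totals = sum(data, dims=1)    # 1×4 matrix of column sums
n = sum(data)                     # total sample size

# Expected counts if the two variables were independent:
# (row total × column total) / grand total, for every cell
expected = row_totals * col_totals / n    # (2×1) * (1×4) gives a 2×4 matrix

# Pearson chi-squared statistic and its degrees of freedom
statistic = sum((data .- expected) .^ 2 ./ expected)
df = (size(data, 1) - 1) * (size(data, 2) - 1)

println((n, statistic, df))
```

The values of `n`, `statistic`, and `df` should agree with the sample size (395), statistic (about 8.006), and degrees of freedom (3) that `ChisqTest( data )` reports.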