How to perform a chi-squared test on a contingency table (in Julia)
Task
If we have a contingency table showing the frequencies observed in two categorical variables, how can we run a $\chi^2$ test to see if the two variables are independent?
Solution
Here we will use a two-dimensional Julia array to store a contingency table of education vs. gender, taken from Penn State University’s online stats review website. You should use your own data.
data = [
# HS BS MS PhD
60 54 46 41 # females
40 44 53 57 # males
]
2×4 Matrix{Int64}:
60 54 46 41
40 44 53 57
The $\chi^2$ test’s null hypothesis is that the two variables are independent. We choose a value $0\leq\alpha\leq1$ as the probability of a Type I error (false positive, finding we should reject $H_0$ when it’s actually true).
using HypothesisTests
alpha = 0.05                           # or choose your own alpha here
p_value = pvalue( ChisqTest( data ) )  # p-value of Pearson's chi-squared test
reject_H0 = p_value < alpha            # true means we reject the null hypothesis
alpha, p_value, reject_H0
(0.05, 0.04588650089174742, true)
In this case, the samples give us enough evidence to reject the null hypothesis at the $\alpha=0.05$ level. The data suggest that the two categorical variables are not independent.
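Whether we reject $H_0$ depends on the $\alpha$ chosen beforehand. As a small illustration (a sketch only: the p-value is copied from the output above, and the list of $\alpha$ levels is an arbitrary choice made for this example), the same p-value can lead to different decisions at different levels:

```julia
# p-value copied from the output above; the alpha levels are illustrative assumptions
p_value = 0.04588650089174742
alphas = [0.10, 0.05, 0.01]

# For each alpha, reject H0 exactly when the p-value falls below it
decisions = [p_value < alpha for alpha in alphas]

for (alpha, reject) in zip(alphas, decisions)
    println("alpha = ", alpha, ": reject H0? ", reject)
end
```

At $\alpha=0.01$ these data would not be enough to reject $H_0$, which is one reason to fix $\alpha$ before looking at the p-value.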
If you are using the most common $\alpha$ value of $0.05$, you can save a few lines of code and get more output by just writing the test itself:
ChisqTest( data )
Pearson's Chi-square Test
-------------------------
Population details:
parameter of interest: Multinomial Probabilities
value under h_0: [0.128826, 0.124339, 0.126249, 0.121852, 0.127537, 0.123096, 0.126249, 0.121852]
point estimate: [0.151899, 0.101266, 0.136709, 0.111392, 0.116456, 0.134177, 0.103797, 0.144304]
95% confidence interval: [(0.1089, 0.1978), (0.05823, 0.1472), (0.09367, 0.1826), (0.06835, 0.1573), (0.07342, 0.1624), (0.09114, 0.1801), (0.06076, 0.1497), (0.1013, 0.1902)]
Test summary:
outcome with 95% confidence: reject h_0
one-sided p-value: 0.0459
Details:
Sample size: 395
statistic: 8.006066246262527
degrees of freedom: 3
residuals: [1.27763, -1.30048, 0.585074, -0.595536, -0.61671, 0.627737, -1.25583, 1.27828]
std. residuals: [2.10956, -2.10956, 0.962783, -0.962783, -1.01656, 1.01656, -2.06656, 2.06656]
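The statistic and degrees of freedom in this report can be reproduced with base Julia alone, which may help clarify what the test measures. The following is a sketch based on the textbook definition of Pearson's statistic, where each expected count is (row total × column total) / grand total. It is not a description of how HypothesisTests works internally, and the variable names are our own.

```julia
# Same contingency table as above
data = [
    60 54 46 41   # females
    40 44 53 57   # males
]

row_totals = sum(data, dims=2)   # 2×1 matrix of row sums
col_totals = sum(data, dims=1)   # 1×4 matrix of column sums
n = sum(data)                    # grand total (sample size)

# Expected counts under independence: (row total × column total) / grand total
expected = row_totals * col_totals / n

# Pearson's statistic: sum over all cells of (observed - expected)^2 / expected
statistic = sum((data .- expected) .^ 2 ./ expected)

# Degrees of freedom: (rows - 1) × (columns - 1)
dof = (size(data, 1) - 1) * (size(data, 2) - 1)

println("statistic = ", statistic)
println("degrees of freedom = ", dof)
```

The printed values should agree with the statistic (about 8.006) and the 3 degrees of freedom in the report. If the Distributions package is also installed, `ccdf(Chisq(dof), statistic)` should then give the p-value shown above.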
Content last modified on 24 July 2023.
Contributed by Nathan Carter (ncarter@bentley.edu)