How to perform a chi-squared test on a contingency table (in Julia)

Task

If we have a contingency table showing the frequencies observed in two categorical variables, how can we run a $\chi^2$ test to see if the two variables are independent?

Solution

Here we use a two-dimensional Julia array to store a contingency table of gender (rows) vs. education level (columns), taken from Penn State University’s online stats review website. Substitute your own data.

data = [
#   HS  BS  MS  PhD
    60  54  46  41    # females
    40  44  53  57    # males
]
2×4 Matrix{Int64}:
 60  54  46  41
 40  44  53  57
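
If your data arrive as one record per person rather than as a table of counts, the counts can be tallied first. The following is a minimal sketch using only base Julia; the vectors `gender` and `education` and their values are hypothetical and far too small for a real test. They only illustrate the shape of the computation.

```julia
# Hypothetical raw observations, one entry per person (not the Penn State data)
gender    = ["F", "M", "F", "F", "M", "M", "F"]
education = ["HS", "BS", "HS", "MS", "BS", "HS", "BS"]

# Fix the order of the categories so rows and columns are predictable
row_levels = ["F", "M"]
col_levels = ["HS", "BS", "MS"]

# Entry (i, j) counts the people in row category i and column category j
table = [count((gender .== g) .& (education .== e))
         for g in row_levels, e in col_levels]
```

The resulting `table` is an integer matrix with one row per gender and one column per education level, the same layout as `data` above.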

The $\chi^2$ test’s null hypothesis, $H_0$, is that the two variables are independent. We choose a value $0\leq\alpha\leq1$ as the acceptable probability of a Type I error (a false positive: rejecting $H_0$ when it is actually true).

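using HypothesisTests

alpha = 0.05  # significance level; or choose your own alpha here

# Run Pearson's chi-squared test on the contingency table and extract its p-value
p_value = pvalue(ChisqTest(data))
reject_H0 = p_value < alpha   # true means we reject independence
alpha, p_value, reject_H0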
(0.05, 0.04588650089174742, true)

In this case, the samples give us enough evidence to reject the null hypothesis at the $\alpha=0.05$ level. The data suggest that the two categorical variables are not independent.

If you are using the most common $\alpha$ value of $0.05$, you can save a few lines of code and get more output by just writing the test itself:

ChisqTest( data )
Pearson's Chi-square Test
-------------------------
Population details:
    parameter of interest:   Multinomial Probabilities
    value under h_0:         [0.128826, 0.124339, 0.126249, 0.121852, 0.127537, 0.123096, 0.126249, 0.121852]
    point estimate:          [0.151899, 0.101266, 0.136709, 0.111392, 0.116456, 0.134177, 0.103797, 0.144304]
    95% confidence interval: [(0.1089, 0.1978), (0.05823, 0.1472), (0.09367, 0.1826), (0.06835, 0.1573), (0.07342, 0.1624), (0.09114, 0.1801), (0.06076, 0.1497), (0.1013, 0.1902)]

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           0.0459

Details:
    Sample size:        395
    statistic:          8.006066246262527
    degrees of freedom: 3
    residuals:          [1.27763, -1.30048, 0.585074, -0.595536, -0.61671, 0.627737, -1.25583, 1.27828]
    std. residuals:     [2.10956, -2.10956, 0.962783, -0.962783, -1.01656, 1.01656, -2.06656, 2.06656]
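
To see where the statistic, degrees of freedom, and p-value in this output come from, the test can also be reproduced by hand. Under independence, the expected count in cell $(i,j)$ is $E_{ij} = R_i C_j / N$, where $R_i$ and $C_j$ are the row and column totals and $N$ is the sample size. The statistic is $\chi^2 = \sum_{i,j} (O_{ij}-E_{ij})^2 / E_{ij}$, with $(r-1)(c-1)$ degrees of freedom for an $r\times c$ table. The sketch below assumes the Distributions package is installed (it is not used elsewhere in this solution), and the variable names are illustrative.

```julia
using Distributions  # assumption: installed; provides Chisq and ccdf

observed = [
    60  54  46  41    # females
    40  44  53  57    # males
]

n          = sum(observed)            # sample size N
row_totals = sum(observed, dims=2)    # 2×1 matrix of row totals
col_totals = sum(observed, dims=1)    # 1×4 matrix of column totals

# Expected counts under independence: (row total × column total) / N
expected = row_totals * col_totals / n

# Pearson's statistic and its degrees of freedom
statistic = sum((observed .- expected) .^ 2 ./ expected)
dof = (size(observed, 1) - 1) * (size(observed, 2) - 1)

# Right-tail probability of a chi-squared distribution with dof degrees of freedom
p_value = ccdf(Chisq(dof), statistic)

statistic, dof, p_value
```

Up to floating-point rounding, these values should agree with the statistic, degrees of freedom, and p-value that `ChisqTest` reports above.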

Content last modified on 24 July 2023.

Contributed by Nathan Carter (ncarter@bentley.edu)