How to perform a chi-squared test on a contingency table
Description
If we have a contingency table showing the frequencies observed in two
categorical variables, how can we run a chi-squared test to check whether the two variables are independent?
Solution, in Julia
Here we will use a two-dimensional Julia array to store a contingency table of education vs. gender, taken from Penn State University’s online stats review website. You should use your own data.
data = [
# HS BS MS Phd
60 54 46 41 # females
40 44 53 57 # males
]
2×4 Matrix{Int64}:
60 54 46 41
40 44 53 57
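The null hypothesis of the chi-squared test is that the two variables are independent. We can run the test with the HypothesisTests package and compare its p-value to a chosen alpha.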
alpha = 0.05 # or choose your own alpha here
using HypothesisTests
p_value = pvalue( ChisqTest( data ) )
reject_H0 = p_value < alpha
alpha, p_value, reject_H0
(0.05, 0.04588650089174742, true)
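In this case, the samples give us enough evidence to reject the null hypothesis
at the 0.05 significance level. The data suggest that the two categorical variables are not independent.
If you are using the most common alpha value of 0.05, you can instead display the test object itself, whose summary reports the outcome at 95% confidence: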
ChisqTest( data )
Pearson's Chi-square Test
-------------------------
Population details:
parameter of interest: Multinomial Probabilities
value under h_0: [0.128826, 0.124339, 0.126249, 0.121852, 0.127537, 0.123096, 0.126249, 0.121852]
point estimate: [0.151899, 0.101266, 0.136709, 0.111392, 0.116456, 0.134177, 0.103797, 0.144304]
95% confidence interval: [(0.1089, 0.1978), (0.05823, 0.1472), (0.09367, 0.1826), (0.06835, 0.1573), (0.07342, 0.1624), (0.09114, 0.1801), (0.06076, 0.1497), (0.1013, 0.1902)]
Test summary:
outcome with 95% confidence: reject h_0
one-sided p-value: 0.0459
Details:
Sample size: 395
statistic: 8.006066246262527
degrees of freedom: 3
residuals: [1.27763, -1.30048, 0.585074, -0.595536, -0.61671, 0.627737, -1.25583, 1.27828]
std. residuals: [2.10956, -2.10956, 0.962783, -0.962783, -1.01656, 1.01656, -2.06656, 2.06656]
Content last modified on 24 July 2023.
Using SciPy, in Python
Here we will use nested Python lists to store a contingency table of education vs. gender, taken from Penn State University’s online stats review website. You should use your own data, and it can be in Python lists or NumPy arrays or a pandas DataFrame.
data = [
# HS BS MS Phd
[ 60, 54, 46, 41 ], # females
[ 40, 44, 53, 57 ] # males
]
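If your own data arrives as one record per observation rather than as a ready-made table, the counts need to be tallied first. The following is a minimal sketch using only the standard library. The `records` list and the label orderings are hypothetical and exist only to illustrate the tallying step; such small counts would not be suitable for an actual chi-squared test.

```python
from collections import Counter

# Hypothetical raw records: one (gender, education) pair per respondent.
records = [
    ("F", "HS"), ("M", "BS"), ("F", "HS"), ("M", "PhD"),
    ("F", "MS"), ("M", "HS"), ("F", "BS"), ("M", "PhD"),
]

# Fix the row and column order so the table layout is predictable.
row_labels = ["F", "M"]
col_labels = ["HS", "BS", "MS", "PhD"]

# Count each (row, column) combination; missing combinations count as 0.
counts = Counter(records)
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]

print(table)  # [[2, 1, 1, 0], [1, 1, 0, 2]]
```

The resulting nested list has the same shape as the `data` table used in this solution.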
The null hypothesis of the chi-squared test is that the two variables are independent.
SciPy’s stats package provides a chi2_contingency function that does exactly what we need.
alpha = 0.05 # or choose your own alpha here
from scipy import stats
# Run a chi-squared test and print out alpha, the p value,
# and whether the comparison says to reject the null hypothesis.
# (The dof and ex variables are values we don't need here.)
chi2_statistic, p_value, dof, ex = stats.chi2_contingency( data )
reject_H0 = p_value < alpha
alpha, p_value, reject_H0
(0.05, 0.045886500891747214, True)
In this case, the samples give us enough evidence to reject the null hypothesis
at the 0.05 significance level. The data suggest that the two categorical variables are not independent.
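To see what the test computes, here is a minimal pure-Python sketch of Pearson's chi-squared statistic for the same table. The helper name `chi_squared_statistic` is our own and is not part of SciPy. The sketch computes only the statistic, the degrees of freedom, and the expected counts under independence; it does not compute the p-value.

```python
# Contingency table from the example above (rows: females, males).
data = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]

def chi_squared_statistic(table):
    """Return (statistic, dof, expected) for Pearson's chi-squared test."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

statistic, dof, expected = chi_squared_statistic(data)
print(round(statistic, 4), dof)  # 8.0061 3
```

The statistic and degrees of freedom should agree with the first and third values returned by chi2_contingency, and the expected counts correspond to its fourth return value.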
Solution, in R
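Here we will use a matrix to store a contingency table of education vs. gender, taken from Penn State University’s online stats review website. You should use your own data. (R’s table
function is useful for creating contingency tables from data.)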
data <- matrix( c( 60, 54, 46, 41, 40, 44, 53, 57 ), ncol = 4,
                dimnames = list( c('F','M'), c('HS','BS','MS','PhD') ),
                byrow = TRUE )
data
HS BS MS PhD
F 60 54 46 41
M 40 44 53 57
The null hypothesis of the chi-squared test is that the two variables are independent.
R provides a chisq.test function that does exactly what we need.
results <- chisq.test( data )
results
Pearson's Chi-squared test
data: data
X-squared = 8.0061, df = 3, p-value = 0.04589
We can manually compare the p-value to our chosen alpha:
alpha <- 0.05 # or choose your own alpha here
results$p.value < alpha # reject the null hypothesis?
[1] TRUE
Opportunities
This website does not yet contain a solution for this task in any of the following software packages.
- Excel
If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.