How to perform a chi-squared test on a contingency table (in Python, using SciPy)
Task
If we have a contingency table showing the frequencies observed in two
categorical variables, how can we run a chi-squared test to see whether the two variables are independent of one another?
Solution
Here we will use nested Python lists to store a contingency table of education vs. gender, taken from Penn State University’s online stats review website. Substitute your own data, which can be stored as Python lists, a NumPy array, or a pandas DataFrame.
data = [
    #  HS  BS  MS  PhD
    [ 60, 54, 46, 41 ],  # females
    [ 40, 44, 53, 57 ]   # males
]
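As a minimal sketch of the pandas option, the same table can be stored in a labeled DataFrame and passed directly to SciPy. The variable name `table` and the row and column labels are illustrative choices, not part of the original solution.

```python
import pandas as pd
from scipy import stats

# Same counts as the nested lists, with labels attached (labels are illustrative).
table = pd.DataFrame(
    [[60, 54, 46, 41],
     [40, 44, 53, 57]],
    index=["females", "males"],
    columns=["HS", "BS", "MS", "PhD"],
)

# chi2_contingency accepts any array-like input, including a DataFrame.
chi2_statistic, p_value, dof, expected = stats.chi2_contingency(table)
print(p_value)
```

The labels do not affect the test; they only make the table easier to read and check.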
The null hypothesis of the chi-squared test is that the two variables are independent of one another. SciPy’s stats package provides a chi2_contingency function that does exactly what we need.
from scipy import stats
alpha = 0.05 # or choose your own alpha here
# Run a chi-squared and print out alpha, the p value,
# and whether the comparison says to reject the null hypothesis.
# (The dof and ex variables are values we don't need here.)
chi2_statistic, p_value, dof, ex = stats.chi2_contingency( data )
reject_H0 = p_value < alpha
alpha, p_value, reject_H0
(0.05, 0.045886500891747214, True)
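In this case, the samples give us enough evidence to reject the null hypothesis
at the alpha = 0.05 level, because the p value (about 0.0459) is less than alpha. The data suggest that the two categorical variables are not independent.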
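To see what the function computes, the following sketch rebuilds the statistic by hand and compares it with SciPy's result. It assumes the standard Pearson formula, with expected counts equal to row total times column total divided by the grand total, and degrees of freedom equal to (rows - 1) times (columns - 1). The names ending in `_manual` are hypothetical and used only for illustration.

```python
import numpy as np
from scipy import stats

observed = np.array([[60, 54, 46, 41],
                     [40, 44, 53, 57]])

# Expected counts under independence: (row total * column total) / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-squared statistic and its degrees of freedom.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-squared distribution gives the p value.
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# Compare with SciPy's built-in computation.
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)
print(np.isclose(chi2_manual, chi2_scipy), np.isclose(p_manual, p_scipy))
```

The two computations agree here because the table has more than one degree of freedom. For a 2x2 table, chi2_contingency applies Yates' continuity correction by default, so the uncorrected manual formula would differ unless correction=False is passed.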
Content last modified on 24 July 2023.
Contributed by Nathan Carter (ncarter@bentley.edu)