How to test data for normality with the Jarque-Bera test (in Python, using SciPy)
Task
We often want to know whether a set of data is normally distributed, so that we can decide which inference tests are appropriate to conduct. If we have a sample and want to assess whether it comes from a population that follows a normal distribution, one tool that can help is the Jarque-Bera test for normality. How do we perform it?
Related tasks:
- How to create a QQ-plot
- How to test data for normality with the D’Agostino-Pearson test
- How to test data for normality with Pearson’s chi-squared test
Solution
We’re going to use some fake restaurant data, but you can replace our fake data with your real data in the code below. The values in our fake data represent the amount of money that customers spent on a Sunday morning at the restaurant.
# Replace your data here
spending = [ 34, 12, 19, 56, 54, 34, 45, 37, 13, 22, 65, 19,
16, 45, 19, 50, 36, 23, 28, 56, 40, 61, 45, 47, 37 ]
The Jarque-Bera test checks whether the sample's skewness coefficient $S$ and excess kurtosis coefficient $K$ (kurtosis minus 3) are jointly consistent with zero, the values both take for a normal distribution. Our null hypothesis is therefore $H_0: S=K=0$, that is, the sample data come from a normal distribution. We choose a value $0 \le \alpha \le 1$ as our Type 1 error rate. We’ll let $\alpha$ be 0.05 here.
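To make the hypothesis concrete, the test statistic can be computed by hand. The following is a minimal sketch, assuming the standard form $JB = \frac{n}{6}\left(S^2 + \frac{K^2}{4}\right)$ with $K$ taken as the excess kurtosis (kurtosis minus 3), and a $p$-value read from a chi-squared distribution with 2 degrees of freedom. The names `jb_manual` and `p_manual` are ours, chosen for illustration.

```python
import numpy as np
from scipy import stats

spending = [34, 12, 19, 56, 54, 34, 45, 37, 13, 22, 65, 19,
            16, 45, 19, 50, 36, 23, 28, 56, 40, 61, 45, 47, 37]

x = np.asarray(spending, dtype=float)
n = x.size

# Sample skewness and excess kurtosis (SciPy's defaults: biased
# estimators, Fisher definition so a normal distribution gives 0).
S = stats.skew(x)
K = stats.kurtosis(x)

# Jarque-Bera statistic and its asymptotic chi-squared(2) p-value.
jb_manual = n / 6 * (S**2 + K**2 / 4)
p_manual = stats.chi2.sf(jb_manual, df=2)

print(jb_manual, p_manual)
```

These values should agree with what SciPy's built-in test reports for the same data. Because the chi-squared reference distribution is an asymptotic approximation, the $p$-value is only approximate for a sample as small as this one.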
We can use the jarque_bera() function in SciPy’s stats package to run the hypothesis test.
from scipy import stats
stats.jarque_bera( spending )
SignificanceResult(statistic=1.3347292970972002, pvalue=0.5130588882194849)
Our $p$-value of about $0.5131$ is greater than $\alpha = 0.05$, so we fail to reject our null hypothesis. The sample gives no significant evidence against normality, so we would continue to operate under our original assumption that the data come from a normally distributed population.
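The comparison with $\alpha$ can also be done in code. This is a small sketch, assuming the returned result exposes `statistic` and `pvalue` attributes (as the output shown above suggests); the variable names `alpha` and `reject` are ours.

```python
from scipy import stats

spending = [34, 12, 19, 56, 54, 34, 45, 37, 13, 22, 65, 19,
            16, 45, 19, 50, 36, 23, 28, 56, 40, 61, 45, 47, 37]

alpha = 0.05  # chosen Type 1 error rate

result = stats.jarque_bera(spending)

# Reject H0 (normality) only when the p-value falls below alpha.
reject = result.pvalue < alpha

if reject:
    print("Reject H0: the data show evidence against normality.")
else:
    print("Fail to reject H0: no significant evidence against normality.")
```

Keeping `alpha` in a variable makes it easy to rerun the decision with a different error rate or with your own data.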
Content last modified on 24 July 2023.
Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)