How to test data for normality with the D’Agostino-Pearson test (in Python, using SciPy)

Task

We often want to know whether a set of data is normally distributed, so that we can decide which inference tests are appropriate to conduct. If we have a set of data and want to assess whether it comes from a population that follows a normal distribution, one tool that can help is the D’Agostino-Pearson test (sometimes also called the D’Agostino-Pearson omnibus test, or the D’Agostino-Pearson $K^2$ test). How do we perform it?

Solution

We’re going to use some fake restaurant data, but you can replace our fake data with your real data in the code below. The values in our fake data represent the amount of money that customers spent on a Sunday morning at the restaurant.

import numpy as np

# Replace this list with your own data
spending = [34, 12, 19, 56, 54, 34, 45, 37, 13, 22, 65, 19,
            16, 45, 19, 50, 36, 23, 28, 56, 40, 61, 45, 47, 37]

np.mean(spending), np.std(spending, ddof=1)

(36.52, 15.772127313713899)

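We will now conduct a test of the following null hypothesis: The data come from a population that is normally distributed. The test does not fix particular values for the population mean and standard deviation; the sample mean (36.52) and sample standard deviation (about 15.77) computed above simply describe our data.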

We will use a value $\alpha=0.05$ as our Type I error rate. The normaltest() function in SciPy’s stats module performs the D’Agostino-Pearson test for normality, which combines the skewness and kurtosis of the data into a single test statistic.

from scipy import stats
stats.normaltest(spending)

NormaltestResult(statistic=3.0866213696851097, pvalue=0.21367252674488552)
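To make the role of skewness and kurtosis more concrete, here is a minimal sketch of how the omnibus statistic can be rebuilt from its two components. It assumes, as SciPy's documentation describes, that the statistic is the sum of the squared z-scores from skewtest() and kurtosistest(), and that its p-value comes from a chi-squared distribution with 2 degrees of freedom. The variable names z_skew, z_kurt, k2, and p_value are our own, chosen for illustration.

```python
import numpy as np
from scipy import stats

spending = [34, 12, 19, 56, 54, 34, 45, 37, 13, 22, 65, 19,
            16, 45, 19, 50, 36, 23, 28, 56, 40, 61, 45, 47, 37]

# z-scores from the separate skewness and kurtosis tests
z_skew = stats.skewtest(spending).statistic
z_kurt = stats.kurtosistest(spending).statistic

# Omnibus statistic: the sum of the two squared z-scores
k2 = z_skew**2 + z_kurt**2

# Under the null hypothesis, K^2 is approximately chi-squared with 2 df,
# so the p-value is the upper-tail probability beyond k2
p_value = stats.chi2.sf(k2, df=2)

# Compare with the values that normaltest() reports directly
result = stats.normaltest(spending)
print(k2, p_value)
print(result.statistic, result.pvalue)
```

If the assumption above holds for your SciPy version, both printed lines should show the same statistic and p-value.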

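The p-value is approximately 0.21367, which is greater than $\alpha=0.05$, so we fail to reject our null hypothesis. We would continue to operate under our original assumption that the data come from a normally distributed population. Note that failing to reject does not prove normality; it only means the data show no statistically significant departure from it.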
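If you prefer to have the code report the decision rather than comparing the numbers by eye, something like the following sketch would work. The names alpha and decision are illustrative choices of ours, not part of SciPy.

```python
from scipy import stats

spending = [34, 12, 19, 56, 54, 34, 45, 37, 13, 22, 65, 19,
            16, 45, 19, 50, 36, 23, 28, 56, 40, 61, 45, 47, 37]

alpha = 0.05  # Type I error rate chosen before running the test

result = stats.normaltest(spending)

# Reject the null hypothesis of normality only when the p-value is below alpha
if result.pvalue < alpha:
    decision = "reject"
else:
    decision = "fail to reject"

print(f"p-value = {result.pvalue:.5f}; we {decision} the null hypothesis at alpha = {alpha}")
```

Setting alpha in one place makes it easy to rerun the same check with a different Type I error rate or with your own data.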

Content last modified on 24 July 2023.

Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)