How to do a two-sided hypothesis test for two sample means (in Python, using SciPy)

Task

If we have two samples, $x_{1}, \dots, x_{n}$ and $x'_{1}, \dots, x'_{m}$, and we compute the mean of each one, we might want to ask whether the two means seem approximately equal. Or more precisely, is their difference statistically significant at a given level?

Solution

If we call the mean of the first sample ${\bar{x}}_{1}$ and the mean of the second sample ${\bar{x}}_{2}$, and write $\mu_{1}$ and $\mu_{2}$ for the means of the populations the samples were drawn from, then this is a two-sided test with the null hypothesis $H_{0} : \mu_{1} = \mu_{2}$. We use ${\bar{x}}_{1}$ and ${\bar{x}}_{2}$ as evidence about whether that hypothesis is plausible. We choose a value $0 \leq α \leq 1$ as the probability of a Type I error (a false positive: rejecting $H_{0}$ when it is actually true). Let’s use $α = 0.10$ as an example.

from scipy import stats

# Replace these first three lines with the values from your situation.
alpha = 0.10
sample1 = [ 6, 9, 7, 10, 10, 9 ]
sample2 = [ 12, 14, 10, 17, 9 ]

# Run a two-sample t-test (Welch's version, since equal_var=False).
# The result contains the test statistic and the p value.
stats.ttest_ind( sample1, sample2, equal_var=False )

Ttest_indResult(statistic=-2.4616581720814326, pvalue=0.05097283741847698)

The output says that the $p$-value is about $0.05097$, which is less than $α = 0.10$. In this case, the samples give us enough evidence to reject the null hypothesis at the $α = 0.10$ level. That is, the data suggest that the means of the two underlying populations differ. Note that the same $p$-value would not lead to rejection at the stricter $α = 0.05$ level, since $0.05097 > 0.05$.
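The comparison between the $p$-value and $α$ can also be made in code rather than by eye. The following is a minimal sketch that reuses the example data above; the variable names `result`, `p_value`, and `reject_null` are illustrative choices, not part of the original solution.

```python
from scipy import stats

# Same example values as above; replace with your own.
alpha = 0.10
sample1 = [6, 9, 7, 10, 10, 9]
sample2 = [12, 14, 10, 17, 9]

# Two-sample t-test without assuming equal variances (Welch's t-test).
result = stats.ttest_ind(sample1, sample2, equal_var=False)
p_value = result.pvalue

# Reject the null hypothesis when the p-value is below alpha.
reject_null = p_value < alpha

print(f"alpha = {alpha}")
print(f"p-value = {p_value:.5f}")
print(f"reject H0: {reject_null}")
```

With the example data, this should report a $p$-value of about $0.05097$ and a decision to reject the null hypothesis.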

Passing equal_var=False tells SciPy not to assume that the two samples come from populations with equal variances (this version of the test is known as Welch's t-test). If in your case the variances are equal, you can omit that parameter, and it will revert to its default value of True.
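To see how much the equal-variance assumption matters for a given data set, one option is to run the test both ways and compare. This is only a sketch using the example data above; the names `welch` and `pooled` are hypothetical labels chosen for illustration.

```python
from scipy import stats

sample1 = [6, 9, 7, 10, 10, 9]
sample2 = [12, 14, 10, 17, 9]

# Welch's t-test: does not assume equal population variances.
welch = stats.ttest_ind(sample1, sample2, equal_var=False)

# Pooled-variance t-test: equal_var is omitted, so it defaults to True.
pooled = stats.ttest_ind(sample1, sample2)

print(f"equal_var=False: t = {welch.statistic:.4f}, p = {welch.pvalue:.5f}")
print(f"equal_var=True:  t = {pooled.statistic:.4f}, p = {pooled.pvalue:.5f}")
```

For this data the two versions give different $p$-values, so the choice of equal_var can change the conclusion when the $p$-value is close to $α$.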

Content last modified on 24 July 2023.

Contributed by Nathan Carter (ncarter@bentley.edu)