How to test for a treatment effect in a single factor design (in Python, using SciPy and statsmodels)
Suppose you are given a dataset that has more than one treatment level and you wish to see if there is a unit-level treatment effect. How would you check that?
The solution below uses an example dataset about tooth growth in 60 guinea pigs, each given one of three Vitamin C dosage levels (in mg) by one of two delivery methods (orange juice vs. ascorbic acid), so that every combination of dose and delivery method has 10 animals. (See how to quickly load some sample data.)
    from rdatasets import data
    df = data('ToothGrowth')
In this dataset, there are only two treatments (orange juice and ascorbic acid, in the variable supp). We can therefore perform a two-sample $t$ test. But first we must filter the outcome variable len (tooth length) based on the value of supp.
    import scipy.stats as stats
    subjects_receiving_oj = df[df['supp']=='OJ']['len']
    subjects_receiving_vc = df[df['supp']=='VC']['len']
    stats.ttest_ind( subjects_receiving_oj, subjects_receiving_vc, equal_var=False )
    Ttest_indResult(statistic=1.91526826869527, pvalue=0.06063450788093387)
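The equal_var=False argument in the call above requests Welch's version of the two-sample $t$ test, which does not assume that the two groups share a common variance. As a rough sketch of what that statistic measures, the snippet below computes Welch's $t$ and its approximate degrees of freedom by hand. The function name welch_t and the two small samples are made up for illustration; they are not the ToothGrowth data, and the snippet does not compute a $p$ value.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each group's contribution to the squared standard error: s^2 / n
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = mean_diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Made-up samples for illustration only (not the ToothGrowth data)
group_a = [10, 12, 14, 16, 18]
group_b = [9, 10, 11, 12, 13]

t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

SciPy then compares the statistic to a $t$ distribution with those degrees of freedom to obtain the two-sided $p$ value.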
At the 5% significance level, the $p$ value of about 0.06 is not small enough to reject the null hypothesis, so we do not have sufficient evidence that mean tooth length differs between the two delivery methods. We assume that the model assumptions are met, but do not check that here.
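For readers who do want to check the assumptions, one common approach (an assumption here, not something the original solution prescribes) is to test each group for normality with the Shapiro-Wilk test and to compare the group variances with Levene's test, both available in scipy.stats. The samples below are made up for illustration; in practice you would pass the filtered variables defined earlier.

```python
from scipy import stats

# Made-up samples for illustration only (not the ToothGrowth data)
group_a = [20.1, 23.4, 19.8, 25.0, 22.3, 21.7, 24.2, 18.9]
group_b = [16.5, 17.9, 21.0, 15.2, 19.4, 18.8, 22.6, 14.7]

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
shapiro_p = {}
for name, sample in [("group_a", group_a), ("group_b", group_b)]:
    _, p_value = stats.shapiro(sample)
    shapiro_p[name] = p_value
    print(name, "Shapiro-Wilk p =", p_value)

# Levene: the null hypothesis is that the groups have equal variances
_, levene_p = stats.levene(group_a, group_b)
print("Levene p =", levene_p)
```

A small $p$ value from either test would suggest that the corresponding assumption is questionable, in which case a nonparametric test may be the safer choice.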
For a factor with two or more levels, you can apply the parametric one-way ANOVA test, which in this case gives a $p$ value similar to that of the $t$ test.
    from statsmodels.formula.api import ols
    import statsmodels.api as sm
    model = ols('len ~ supp', data = df).fit()
    sm.stats.anova_lm(model, typ=1)
In the resulting table, the $p$ value appears in the final column, PR(>F), and is very similar to the one from the $t$ test.
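The similarity is no accident: with only two groups, the one-way ANOVA $F$ statistic equals the square of the equal-variance (pooled) two-sample $t$ statistic. The sketch below illustrates this by hand, using only the standard library. The function names one_way_f and pooled_t and the two small samples are made up for illustration; they are not the ToothGrowth data.

```python
import math
import statistics

def one_way_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def pooled_t(sample_a, sample_b):
    """Return the equal-variance (pooled) two-sample t statistic."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / (n_a + n_b - 2)
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return mean_diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Made-up samples for illustration only (not the ToothGrowth data)
group_a = [10, 12, 14, 16, 18]
group_b = [9, 10, 11, 12, 13]

f_stat = one_way_f(group_a, group_b)
t_stat = pooled_t(group_a, group_b)
print(f_stat, t_stat ** 2)  # the two values agree up to rounding
```

When the two groups have the same size, the Welch and pooled $t$ statistics coincide and only their degrees of freedom differ, which is one reason the two $p$ values can end up so close.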
However, if the assumptions of ANOVA are not met, we can use a nonparametric approach, the Kruskal-Wallis test. We use the filtered variables defined above and import the kruskal function from SciPy.
    from scipy.stats import kruskal
    kruskal( subjects_receiving_oj, subjects_receiving_vc )
    KruskalResult(statistic=3.4453580631407035, pvalue=0.06342967639688878)
As with the previous results, the $p$ value of about 0.06 exceeds 0.05, so at the 5% significance level we do not have sufficient evidence that tooth length differs between the delivery methods.
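To see what the Kruskal-Wallis test computes, the sketch below ranks the pooled observations and builds the $H$ statistic from the rank sums of each group. It is a simplified illustration that assumes no tied values, so it omits the tie correction that SciPy applies. The function names kruskal_h and chi2_sf_1df and the samples are made up for illustration, and the $p$ value helper is valid only for two groups, where $H$ is compared to a chi-square distribution with one degree of freedom.

```python
import math

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic, assuming no tied values (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    # Rank 1 goes to the smallest pooled observation; valid only without ties
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n_total = len(pooled)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Made-up samples without ties, for illustration only (not the ToothGrowth data)
group_a = [10, 12, 14, 16, 18]
group_b = [9, 11, 13, 15, 17]

h_stat = kruskal_h(group_a, group_b)
print(h_stat, chi2_sf_1df(h_stat))
```

Passing the statistic reported above, 3.4453580631407035, to chi2_sf_1df returns approximately 0.0634, which agrees with the $p$ value in the SciPy output.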
Content last modified on 24 July 2023.
Contributed by Krtin Juneja (KJUNEJA@falcon.bentley.edu)