How to summarize and compare data by groups (in R)

Task

Given a dataset with several treatment conditions and an outcome variable, a natural first step in exploratory data analysis is to summarize the outcome within each condition. How can we quantitatively compare the treatment conditions with regard to the outcome variable?

Solution

The solution below uses R's built-in ToothGrowth dataset, which records tooth length in 60 guinea pigs, 10 for each combination of three vitamin C dosage levels (0.5, 1, and 2 mg/day) and two delivery methods (orange juice vs. ascorbic acid). (See how to quickly load some sample data.)

df <- ToothGrowth

To obtain descriptive statistics for the quantitative column (len, the tooth length) within each level of the treatment column (supp, the delivery method), we can use either tapply from base R or favstats from the mosaic package. The code here uses tapply.

attach(df)                  # make the columns len and supp visible by name
tapply(len, supp, summary)  # summarize len within each level of supp
$OJ
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8.20   15.53   22.70   20.66   25.73   30.90 

$VC
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4.20   11.20   16.50   16.96   23.10   33.90 
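
The favstats alternative mentioned above can be sketched as follows. This is a minimal example, assuming the mosaic package is installed (it is not part of base R). The name stats_by_supp is only illustrative. The formula len ~ supp asks for statistics of len within each level of supp.

```r
# Assumption: the mosaic package is installed, e.g. via install.packages("mosaic").
# The guard skips the example when it is not available.
if (requireNamespace("mosaic", quietly = TRUE)) {
  # One row per delivery method, with quartiles, mean, sd, and group size
  stats_by_supp <- mosaic::favstats(len ~ supp, data = ToothGrowth)
  print(stats_by_supp)
}
```

Compared with tapply and summary, favstats also reports the standard deviation, the group size, and the number of missing values, and it returns a data frame rather than a list.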

You can replace summary in the call to tapply with mean, median, max, min, or quantile to get a single statistic for each group. The example below computes the first quartile of each group.

tapply(df$len, df$supp, quantile, probs = 0.25) # 1st quartile of len for each supp level
    OJ     VC 
15.525 11.200 
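
To compare the groups directly, the same per-group approach can feed a simple difference in means, and base R's aggregate can summarize by both treatment columns at once. The sketch below is one possible way to do this, not the only one. The variable names group_means, mean_diff, and means_by_both are illustrative.

```r
df <- ToothGrowth

# Mean tooth length for each delivery method
group_means <- tapply(df$len, df$supp, mean)
print(group_means)

# Difference in mean length between orange juice (OJ) and ascorbic acid (VC)
mean_diff <- unname(group_means["OJ"] - group_means["VC"])
print(mean_diff)

# Mean tooth length for every combination of delivery method and dose
means_by_both <- aggregate(len ~ supp + dose, data = df, FUN = mean)
print(means_by_both)
```

Referring to columns as df$len and df$supp, or passing data = df, avoids relying on attach, so the code behaves the same in a fresh R session.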

Content last modified on 24 July 2023.

Contributed by Krtin Juneja (KJUNEJA@falcon.bentley.edu)