# How to add an interaction term to a model (in R)

Sometimes, a simple linear model isn’t sufficient for our data, and we need more complex terms or transformed variables in the model to make adequate predictions. How do we include these complex and transformed terms in a regression model?

## Solution

We’re going to use the ToothGrowth dataset built into R as example data. It contains 60 observations of tooth growth in guinea pigs that received vitamin C at one of three doses (0.5, 1, or 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid (VC). You would use your own data instead.

df <- ToothGrowth


Let’s model tooth length (len) using two predictors, the supplement given (supp) and its dosage (dose), together with their interaction. Inside an R model formula, the * operator is not ordinary multiplication: supp * dose is shorthand for supp + dose + supp:dose, meaning both main effects plus the interaction term supp:dose. We use it when creating the model, as shown below.

Note that supp is a categorical variable with two values, OJ and VC, so the model will include a binary indicator variable (suppVC) that equals 1 when the supplement is VC and 0 when it is OJ, the reference level.
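As a quick check of how R reads the formula, the following minimal sketch lists the terms R derives from supp * dose and the columns of the resulting design matrix. It assumes the built-in ToothGrowth data; the names term_labels and X are only illustrative.

```r
# Assumption: the built-in ToothGrowth data, as used in this article
df <- ToothGrowth

# Terms that R derives from the formula: both main effects plus the interaction
term_labels <- attr(terms(len ~ supp * dose), "term.labels")
print(term_labels)

# Columns of the design matrix that lm() would fit.
# supp is a factor, so it is dummy-coded into a single 0/1 column (suppVC).
X <- model.matrix(len ~ supp * dose, data = df)
print(colnames(X))

# The interaction column is the product of the indicator and the dose
print(all(X[, "suppVC:dose"] == X[, "suppVC"] * X[, "dose"]))
```

The interaction column is where an actual multiplication happens: R multiplies the 0/1 indicator by the dose, row by row.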

# Build the model
model <- lm(len ~ supp*dose, data = df)
summary(model)

Call:
lm(formula = len ~ supp * dose, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-8.2264 -2.8462  0.0504  2.2893  7.9386

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   11.550      1.581   7.304 1.09e-09 ***
suppVC        -8.255      2.236  -3.691 0.000507 ***
dose           7.811      1.195   6.534 2.03e-08 ***
suppVC:dose    3.904      1.691   2.309 0.024631 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.083 on 56 degrees of freedom
Multiple R-squared:  0.7296,	Adjusted R-squared:  0.7151
F-statistic: 50.36 on 3 and 56 DF,  p-value: 6.521e-16


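Now we have a fitted model of the form $\hat L = 11.55 - 8.255s + 7.811d + 3.904sd$, where $L$ stands for tooth length, $s$ is 1 if the VC supplement was given and 0 if OJ was given, and $d$ is the dose given. The interaction coefficient, 3.904, is the additional increase in predicted tooth length per unit of dose when the supplement is VC rather than OJ.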
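One way to read this equation is as two separate lines, one per supplement. Setting $s = 0$ gives the OJ line, $\hat L = 11.55 + 7.811d$. Setting $s = 1$ gives the VC line, with intercept $11.55 - 8.255 = 3.295$ and slope $7.811 + 3.904 = 11.715$. The sketch below recovers both lines from the fitted coefficients and cross-checks them against predict(). It refits the model so that it runs on its own; the names b, oj_line, vc_line, and new_obs are illustrative.

```r
# Refit the model from the article so this block is self-contained
df <- ToothGrowth
model <- lm(len ~ supp * dose, data = df)
b <- coef(model)

# s = 0 (OJ, the reference level): intercept and slope are the plain terms
oj_line <- c(intercept = unname(b["(Intercept)"]),
             slope     = unname(b["dose"]))

# s = 1 (VC): suppVC shifts the intercept, suppVC:dose shifts the slope
vc_line <- c(intercept = unname(b["(Intercept)"] + b["suppVC"]),
             slope     = unname(b["dose"] + b["suppVC:dose"]))

print(round(rbind(OJ = oj_line, VC = vc_line), 3))

# Cross-check against predict() at a dose of 1 for each supplement
new_obs <- data.frame(
  supp = factor(c("OJ", "VC"), levels = levels(df$supp)),
  dose = 1
)
pred <- predict(model, newdata = new_obs)
print(pred)
```

At a dose of 1, each prediction is that supplement's intercept plus its slope, so the values returned by predict() should agree with the two lines computed by hand.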