How to add an interaction term to a model (in R)
Task
Sometimes, a simple linear model isn’t sufficient for our data, and we need more complex terms or transformed variables in the model to make adequate predictions. How do we include these complex and transformed terms in a regression model?
Solution
We’re going to use the ToothGrowth dataset in R as example data.
It contains observations of tooth growth for guinea pigs who received various doses of
various supplements. You would use your own data instead.
df <- ToothGrowth
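Before modeling, it can help to confirm what the columns look like. The following is a minimal sketch that only uses base R functions on the built-in dataset; with your own data, the column names would differ.

```r
# Load the built-in example data
df <- ToothGrowth

# Show the column types: len and dose are numeric, supp is a factor
str(df)

# The two supplement types; the first level is the baseline in the model
levels(df$supp)

# The distinct dose values that appear in the data
sort(unique(df$dose))
```

Because OJ is the first level of supp, R treats it as the baseline and creates an indicator variable for VC.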
Let’s model tooth length (len) based on two predictors,
the supplement given (supp) and its dosage (dose), together with their interaction.
In an R model formula, the * operator is not ordinary multiplication: supp*dose is
shorthand for supp + dose + supp:dose, so the model includes both main effects plus
the interaction (product) term, as shown below.
Note that supp is a categorical variable with two values (OJ and VC), so the model will
include a binary variable for whether the supplement was equal to “VC.”
# Build the model
model <- lm(len ~ supp*dose, data = df)
summary(model)
Call:
lm(formula = len ~ supp * dose, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-8.2264 -2.8462  0.0504  2.2893  7.9386

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   11.550      1.581   7.304 1.09e-09 ***
suppVC        -8.255      2.236  -3.691 0.000507 ***
dose           7.811      1.195   6.534 2.03e-08 ***
suppVC:dose    3.904      1.691   2.309 0.024631 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.083 on 56 degrees of freedom
Multiple R-squared:  0.7296, Adjusted R-squared:  0.7151
F-statistic: 50.36 on 3 and 56 DF,  p-value: 6.521e-16
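The coefficient table lists terms for suppVC and dose as well as suppVC:dose, because a formula written with * includes the main effects along with the interaction. As a quick check, the sketch below refits the model with the terms written out explicitly and compares the coefficients; the name model_explicit is introduced here only for illustration.

```r
df <- ToothGrowth

# Shorthand: main effects plus interaction
model <- lm(len ~ supp * dose, data = df)

# The same model with every term spelled out
model_explicit <- lm(len ~ supp + dose + supp:dose, data = df)

# TRUE if both fits produce the same coefficients
same_fit <- isTRUE(all.equal(coef(model), coef(model_explicit)))
same_fit
```

To fit only the interaction term without the main effects, you would write supp:dose by itself, though that is rarely what you want.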
Now we have a model of the form $\hat L = 11.55 - 8.255s + 7.811d + 3.904sd$, where $\hat L$ stands for predicted tooth length, $s$ is 1 if the VC supplement was given and 0 if OJ was given, and $d$ is the dose given.
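One way to read the interaction is that each supplement gets its own line relating dose to length. Setting $s = 0$ gives $\hat L = 11.55 + 7.811d$ for OJ, and setting $s = 1$ gives $\hat L = 3.295 + 11.715d$ for VC. The sketch below recovers those slopes from the fitted coefficients and asks the model for predictions at a dose of 1; the names slope_oj, slope_vc, and new_data are hypothetical and chosen only for this example.

```r
df <- ToothGrowth
model <- lm(len ~ supp * dose, data = df)

# Extract the fitted coefficients by name
b <- coef(model)

# Slope of length on dose for each supplement
slope_oj <- b[["dose"]]                        # baseline (OJ) slope
slope_vc <- b[["dose"]] + b[["suppVC:dose"]]   # VC slope adds the interaction

# Predict tooth length at dose = 1 for each supplement
new_data <- data.frame(
  supp = factor(c("OJ", "VC"), levels = levels(df$supp)),
  dose = 1
)
predictions <- predict(model, newdata = new_data)
predictions
```

Using the rounded coefficients from the summary, the predictions at a dose of 1 come out to about 19.4 for OJ and about 15.0 for VC.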
Content last modified on 24 July 2023.
Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)