How to add a transformed term to a model (in R)
Task
Sometimes, a simple linear model isn’t sufficient for our data, and we need more complex terms or transformed variables in the model to make adequate predictions. How do we include these complex and transformed terms in a regression model?
Solution
We’re going to use the pressure dataset, which ships with R in the built-in datasets package, as example data.
It contains 19 observations of the vapor pressure of mercury (in millimeters of mercury) at different temperatures (in degrees Celsius).
You would use your own data instead.
# The datasets package is loaded by default, so no install.packages() or library() call is needed
data("pressure")
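Before transforming a variable, it can help to look at its structure and range, since a transformation such as the logarithm is only defined for positive values. The following is a minimal sketch of that check on the example data; the check itself is a suggestion, not a required step.

```r
# Load the example data (part of R's built-in datasets package)
data("pressure")

# Show the column names, types, and first few values
str(pressure)

# log() returns -Inf for zero and NaN for negative inputs,
# so confirm the predictor is strictly positive before using log(pressure)
stopifnot(all(pressure$pressure > 0))

# Smallest and largest pressure values
range(pressure$pressure)
```

With your own data, a failed check here would suggest choosing a different transformation or handling the non-positive values first.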
Let’s model temperature as the dependent variable with the logarithm of pressure as the independent variable. To place the “log of pressure” term in the model, we apply R’s log function to pressure directly in the model formula, as shown below. It uses the natural logarithm (base $e$).
# Build the model
model.log <- lm(temperature ~ log(pressure), data = pressure)
summary(model.log)
Call:
lm(formula = temperature ~ log(pressure), data = pressure)

Residuals:
   Min     1Q Median     3Q    Max
-28.60 -22.30 -10.13  20.00  48.61

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    153.970      6.330   24.32 1.20e-14 ***
log(pressure)   23.784      1.372   17.33 3.07e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 26.81 on 17 degrees of freedom
Multiple R-squared:  0.9464, Adjusted R-squared:  0.9433
F-statistic: 300.3 on 1 and 17 DF,  p-value: 3.07e-12
The model is $\hat t = 153.97 + 23.784\log p$, where $t$ stands for temperature and $p$ for pressure.
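To see how the fitted equation is used, here is a small sketch that predicts temperature for a new pressure value in two ways: with predict(), and by hand from the coefficients. The value 10 and the names new.data, pred, and by.hand are chosen only for illustration.

```r
data("pressure")
model.log <- lm(temperature ~ log(pressure), data = pressure)

# Hypothetical new pressure value, on the original (untransformed) scale
new.data <- data.frame(pressure = 10)

# predict() applies the log() from the formula to the new data automatically
pred <- predict(model.log, newdata = new.data)

# The same calculation by hand: intercept + slope * log(p)
b <- coef(model.log)
by.hand <- b[1] + b[2] * log(10)

# The two approaches should agree
all.equal(unname(pred), unname(by.hand))
```

Note that the new data is supplied on the original scale; you do not take the logarithm yourself before calling predict(). Plugging $p = 10$ into the rounded equation above gives roughly $153.97 + 23.784 \times 2.3026 \approx 208.7$.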
Another example transformation is the square root transformation. As with log, just apply the sqrt function to the appropriate term when defining the model.
# Build the model
model.sqrt <- lm(temperature ~ sqrt(pressure), data = pressure)
summary(model.sqrt)
Call:
lm(formula = temperature ~ sqrt(pressure), data = pressure)

Residuals:
   Min     1Q Median     3Q    Max
-98.72 -34.74  11.53  42.75  56.59

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      98.561     15.244   6.465 5.81e-06 ***
sqrt(pressure)   11.446      1.367   8.372 1.95e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 51.16 on 17 degrees of freedom
Multiple R-squared:  0.8048, Adjusted R-squared:  0.7933
F-statistic: 70.1 on 1 and 17 DF,  p-value: 1.953e-07
The model is $\hat t = 98.561 + 11.446\sqrt{p}$, with $t$ and $p$ having the same meanings as above.
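Because both models predict the same response (temperature) from the same data, their R-squared values can be compared directly. The sketch below pulls those values out of the two summaries instead of reading them off the printed output; the vector name r2 is just for illustration, and R-squared is only one of several possible criteria.

```r
data("pressure")
model.log  <- lm(temperature ~ log(pressure),  data = pressure)
model.sqrt <- lm(temperature ~ sqrt(pressure), data = pressure)

# Extract the multiple R-squared from each model summary
r2 <- c(log  = summary(model.log)$r.squared,
        sqrt = summary(model.sqrt)$r.squared)

# Print both values side by side, rounded as in the summaries
round(r2, 4)
```

In the summaries shown above, the log model has the higher R-squared (0.9464 versus 0.8048), which suggests the log transformation fits this particular dataset better than the square root. A residual plot would be a sensible further check.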
Content last modified on 24 July 2023.
Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)