How to add a polynomial term to a model
Description
Sometimes, a simple linear model isn’t sufficient to describe the data. How can we include a higher-order term in a regression model, such as the square or cube of one of the predictors?
Using sklearn, in Python
We begin with a fabricated dataset of 20 points. You can replace the code below with your own real data.
import numpy as np
import pandas as pd
x = np.arange(0,20) # NumPy array of the integers 0 through 19
y = [3,4,5,7,9,20,31,50,70,75,80,91,101,120,135,160,179,181,190,193] # List of 20 integers
We extend our dataset with a new column (or “feature”), containing $x^2$.
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures( degree=2, include_bias=False )
x_matrix = x.reshape( -1, 1 ) # make x a matrix so that we can add columns
poly_features = poly.fit_transform( x_matrix ) # add a second column, so we now have x and x^2
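To make the effect of the transformation concrete, here is a minimal sketch that builds the same two columns by hand with NumPy. The name `manual_features` is illustrative and not part of the original solution. For a single predictor, with `degree=2` and `include_bias=False`, the transformed matrix holds $x$ in the first column and $x^2$ in the second.

```python
import numpy as np

x = np.arange(0, 20)  # same predictor as above: the integers 0 through 19

# Column 0 is x, column 1 is x squared (hypothetical name, for illustration only)
manual_features = np.column_stack([x, x**2])

print(manual_features[:4])  # first rows: [0 0], [1 1], [2 4], [3 9]
```

Building the columns manually is fine for one predictor. `PolynomialFeatures` becomes more useful with several predictors, because it also generates the cross terms between them.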
Next, fit a regression model to the new features, which are $x$ and $x^2$.
from sklearn.linear_model import LinearRegression
poly_reg_model = LinearRegression() # Our model will be linear in the features x and x^2
poly_reg_model.fit( poly_features, y ) # Use regression to create the model
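The model is called linear because it is linear in the coefficients, even though it is quadratic in $x$. As a rough sketch of what the fit amounts to, the same least-squares problem can be solved directly with NumPy. This is an illustration, not a description of how sklearn works internally, and the names `design`, `b0`, `b1`, and `b2` are assumptions introduced here.

```python
import numpy as np

x = np.arange(0, 20)
y = np.array([3, 4, 5, 7, 9, 20, 31, 50, 70, 75, 80, 91, 101, 120,
              135, 160, 179, 181, 190, 193], dtype=float)

# Design matrix with columns 1, x, x^2; the column of ones carries the intercept
design = np.column_stack([np.ones_like(x, dtype=float), x, x**2])

# Ordinary least squares: minimize the sum of squared residuals of design @ coeffs - y
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2 = coeffs

print(b0, b1, b2)  # intercept, coefficient of x, coefficient of x^2
```

The three values should agree, up to floating-point error, with the `intercept_` and `coef_` attributes of the fitted sklearn model.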
Finally, get the coefficients and intercept of the model.
poly_reg_model.intercept_, poly_reg_model.coef_
(-8.384415584415635, array([6.28628389, 0.27420825]))
Thus the equation for our model of degree two, with coefficients rounded to two decimal places, is $\widehat{y} = -8.38 + 6.29x + 0.27x^2$
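As a cross-check that does not need sklearn, the sketch below fits the same quadratic with `np.polyfit` and then evaluates it at a new point with `np.polyval`. The value $x = 20$ and the name `y_hat_20` are arbitrary choices made for illustration. Note that `np.polyfit` returns the coefficients from the highest power down to the constant.

```python
import numpy as np

x = np.arange(0, 20)
y = [3, 4, 5, 7, 9, 20, 31, 50, 70, 75, 80, 91, 101, 120,
     135, 160, 179, 181, 190, 193]

# Least-squares quadratic; coefficients come back highest power first
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(b0, b1, b2)  # should reproduce the intercept and coefficients reported above

# Evaluate the fitted polynomial at a point outside the data (illustrative only)
y_hat_20 = np.polyval([b2, b1, b0], 20)
print(y_hat_20)
```

Predictions outside the observed range of $x$ should be treated with caution, since a polynomial can diverge quickly from the data beyond the points used to fit it.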
Content last modified on 24 July 2023.
Solution, in R
We’re going to use the pressure dataset, which ships with R in its built-in datasets package, as example data. It contains observations of pressure and temperature. You would use your own data instead.
# The pressure dataset is part of R's built-in datasets package, so no extra library is needed
data("pressure")
Let’s model temperature as the dependent variable, with pressure and pressure squared as the predictors. To place the “pressure squared” term in the model, we use R’s poly function, as shown below. It automatically includes a first-degree pressure term as well (not squared).
# Build the model
model <- lm(temperature ~ poly(pressure, 2), data = pressure)
summary(model)
Call:
lm(formula = temperature ~ poly(pressure, 2), data = pressure)
Residuals:
Min 1Q Median 3Q Max
-113.095 -44.543 6.157 50.459 75.791
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 180.00 14.31 12.581 1.03e-09 ***
poly(pressure, 2)1 361.84 62.36 5.802 2.70e-05 ***
poly(pressure, 2)2 -186.66 62.36 -2.993 0.0086 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 62.36 on 16 degrees of freedom
Multiple R-squared: 0.7271, Adjusted R-squared: 0.693
F-statistic: 21.31 on 2 and 16 DF, p-value: 3.079e-05
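Because poly uses orthogonal polynomials by default, the fitted model has the form $\hat t = 180 + 361.84\,P_1(p) - 186.66\,P_2(p)$, where $t$ stands for temperature, $p$ for pressure, and $P_1$ and $P_2$ are orthogonal polynomials in $p$ of degree one and two. These estimates are therefore not the coefficients of $p$ and $p^2$ themselves. To obtain those, fit the model with poly(pressure, 2, raw = TRUE) instead.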
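When reading this output, keep in mind that R’s poly returns orthogonal polynomial columns unless raw = TRUE is passed, so the reported coefficients depend on that basis while the fitted values do not. The Python sketch below illustrates the idea with NumPy. It is not R’s exact algorithm, and it reuses the fabricated data from the Python solution rather than the pressure dataset; both choices are assumptions made for illustration.

```python
import numpy as np

x = np.arange(0, 20)
y = np.array([3, 4, 5, 7, 9, 20, 31, 50, 70, 75, 80, 91, 101, 120,
              135, 160, 179, 181, 190, 193], dtype=float)

# Raw basis: columns 1, x, x^2
raw = np.column_stack([np.ones(len(x)), x, x**2])

# Orthonormal basis spanning the same column space, via a reduced QR decomposition
q, _ = np.linalg.qr(raw)

raw_coef, *_ = np.linalg.lstsq(raw, y, rcond=None)  # coefficients of 1, x, x^2
orth_coef = q.T @ y  # with orthonormal columns, least squares reduces to a projection

fitted_raw = raw @ raw_coef
fitted_orth = q @ orth_coef

print(np.allclose(fitted_raw, fitted_orth))  # True: same fitted values
print(raw_coef)   # interpretable as coefficients of 1, x, x^2
print(orth_coef)  # different numbers, tied to the orthonormal basis
```

Both bases describe the same fitted curve. Only the raw coefficients can be read off as multipliers of the predictor and its square.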
You can change the number in the poly function. For example, if we wanted to create a third-degree polynomial term, then we would have specified poly(pressure, 3), and it would have included pressure, pressure squared, and pressure cubed.
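The same change of degree can be sketched in Python. The example below reuses the fabricated data from the Python solution (an assumption, since the pressure data is not reproduced here) and compares the residual sum of squares of a quadratic and a cubic fit. The helper name `rss` is hypothetical.

```python
import numpy as np

x = np.arange(0, 20)
y = np.array([3, 4, 5, 7, 9, 20, 31, 50, 70, 75, 80, 91, 101, 120,
              135, 160, 179, 181, 190, 193], dtype=float)

def rss(deg):
    """Residual sum of squares of the least-squares polynomial of the given degree."""
    coefs = np.polyfit(x, y, deg)           # coefficients, highest power first
    residuals = y - np.polyval(coefs, x)    # observed minus fitted values
    return float(np.sum(residuals**2))

rss2 = rss(2)  # model with x and x^2
rss3 = rss(3)  # model with x, x^2, and x^3

print(rss2, rss3)
```

Adding a term can never increase the residual sum of squares on the data used for fitting, so a smaller value alone does not show that the higher-degree model is better. A significance test on the new term, as in the R summary output, is one way to judge that.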
Opportunities
This website does not yet contain a solution for this task in any of the following software packages.
- Excel
- Julia
If you can contribute a solution using any of these pieces of software, see our Contributing page for how to help extend this website.