# How to add a polynomial term to a model

## Description

Sometimes, a simple linear model isn’t sufficient to describe the data. How can we include a higher-order term in a regression model, such as the square or cube of one of the predictors?

## Using sklearn, in Python

We begin with a fabricated dataset of 20 points. You can replace it with your own real data.

import numpy as np
import pandas as pd

x = np.arange(0,20)                                                  # NumPy array of the integers 0 through 19
y = [3,4,5,7,9,20,31,50,70,75,80,91,101,120,135,160,179,181,190,193] # List of 20 integers


We extend our dataset with a new column (or “feature”), containing $x^2$.

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures( degree=2, include_bias=False )
x_matrix = x.reshape( -1, 1 )                   # make x a matrix so that we can add columns
poly_features = poly.fit_transform( x_matrix )  # add a second column, so we now have x and x^2


Next, fit a regression model to the new features, which are $x$ and $x^2$.

from sklearn.linear_model import LinearRegression
poly_reg_model = LinearRegression()     # Our model will be linear in the features x and x^2
poly_reg_model.fit( poly_features, y )  # Use regression to create the model


Finally, get the coefficients and intercept of the model.

poly_reg_model.intercept_, poly_reg_model.coef_

(-8.384415584415635, array([6.28628389, 0.27420825]))


Thus the equation for our model of degree two is $\widehat{y} = -8.38 + 6.29x + 0.27x^2$.
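
The two steps above (expanding the features, then fitting) can also be chained so that new values are transformed automatically before prediction. The sketch below is one possible way to do that with scikit-learn's `make_pipeline`, not the only one; the value `x_new = 25` is a hypothetical input chosen purely for illustration, and the names `pipeline`, `reg`, `b0`, `b1`, and `b2` are our own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(0, 20)
y = [3,4,5,7,9,20,31,50,70,75,80,91,101,120,135,160,179,181,190,193]

# Chain the two steps: expand x into (x, x^2), then fit a linear regression
pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipeline.fit(x.reshape(-1, 1), y)

# The fitted regression is the last step of the pipeline
reg = pipeline[-1]
b0 = reg.intercept_
b1, b2 = reg.coef_

# Predict at a hypothetical new x and compare with evaluating the equation by hand
x_new = 25
by_model = pipeline.predict(np.array([[x_new]]))[0]
by_hand = b0 + b1 * x_new + b2 * x_new**2
print(by_model, by_hand)                  # the two values should agree
```

Keeping the transformation inside the pipeline avoids forgetting to square new inputs before calling `predict`.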

## Solution, in R

We’re going to use the pressure dataset, which ships with R in the built-in datasets package, as example data. It contains observations of pressure and temperature. You would use your own data instead.

data("pressure")  # built-in dataset; no extra packages are needed


Let’s model temperature as the dependent variable, with pressure and pressure squared as the predictors. To place the “pressure squared” term in the model, we use R’s poly function, as shown below. It automatically includes the lower-order pressure term as well.

# Build the model
model <- lm(temperature ~ poly(pressure, 2), data = pressure)
summary(model)

Call:
lm(formula = temperature ~ poly(pressure, 2), data = pressure)

Residuals:
     Min       1Q   Median       3Q      Max
-113.095  -44.543    6.157   50.459   75.791

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          180.00      14.31  12.581 1.03e-09 ***
poly(pressure, 2)1   361.84      62.36   5.802 2.70e-05 ***
poly(pressure, 2)2  -186.66      62.36  -2.993   0.0086 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 62.36 on 16 degrees of freedom
Multiple R-squared:  0.7271,	Adjusted R-squared:  0.693
F-statistic: 21.31 on 2 and 16 DF,  p-value: 3.079e-05


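Now we have a model of the form $\hat t = 180 + 361.84\,P_1(p) - 186.66\,P_2(p)$, where $t$ stands for temperature, $p$ for pressure, and $P_1$ and $P_2$ are the degree-one and degree-two orthogonal polynomials in $p$ that poly constructs by default. To obtain coefficients that multiply $p$ and $p^2$ directly, use poly(pressure, 2, raw = TRUE) instead.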
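
One caveat about reading the coefficients in the summary: they should not be plugged in as multipliers of $p$ and $p^2$. The following note sketches why. The symbols $\beta_k$, $P_k$, and $a_k$ are notation introduced here, not names from R. By default, poly builds its columns so that each sums to zero over the observations and the columns are orthonormal, which means the fitted model is

$$\hat t = \beta_0 + \beta_1 P_1(p) + \beta_2 P_2(p), \qquad \sum_i P_k(p_i) = 0, \qquad \sum_i P_j(p_i)\,P_k(p_i) = \delta_{jk},$$

where $P_k$ is a polynomial of degree $k$ in $p$ and the sums run over the observations $p_i$. Two consequences follow. First, $\beta_0$ equals the mean of the observed temperatures, which is the 180.00 reported as the intercept. Second, since $1$, $P_1$, $P_2$ and $1$, $p$, $p^2$ span the same space of functions, expanding the right-hand side gives

$$\beta_0 + \beta_1 P_1(p) + \beta_2 P_2(p) = a_0 + a_1 p + a_2 p^2$$

for some $a_0$, $a_1$, $a_2$. The fitted values, residuals, and R-squared are the same in both forms; only the coefficients differ, and the $a_k$ are the ones poly(pressure, 2, raw = TRUE) would report.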

You can change the number in the poly function. For example, if we wanted to create a third-degree polynomial term then we would have specified poly(pressure, 3), and it would have included pressure, pressure squared, and pressure cubed.