# How to add a polynomial term to a model (in Python, using sklearn)


Sometimes, a simple linear model isn’t sufficient to describe the data. How can we include a higher-order term in a regression model, such as the square or cube of one of the predictors?

## Solution

We begin with a fabricated dataset of 20 points. You can replace the code below with your own real data.

```python
import numpy as np

x = np.arange(0, 20)                                                 # NumPy array of integers 0 through 19
y = [3,4,5,7,9,20,31,50,70,75,80,91,101,120,135,160,179,181,190,193] # List of 20 integers
```

We extend our dataset with a new column (or “feature”), containing $x^2$.

```python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures( degree=2, include_bias=False )
x_matrix = x.reshape( -1, 1 )                   # make x a column matrix so that we can add columns
poly_features = poly.fit_transform( x_matrix )  # add a second column, so we now have x and x^2
```
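To see what `fit_transform` produced, one can inspect the shape and the first few rows of the result; the first column holds $x$ and the second holds $x^2$. A minimal self-contained sketch:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(0, 20)
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(x.reshape(-1, 1))

print(poly_features.shape)  # one row per data point, two columns: x and x^2
print(poly_features[:3])    # rows for x = 0, 1, 2: the pairs (0,0), (1,1), (2,4)
```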


Next, fit a regression model to the new features, which are $x$ and $x^2$.

```python
from sklearn.linear_model import LinearRegression

poly_reg_model = LinearRegression()     # Our model will be linear in the features x and x^2
poly_reg_model.fit( poly_features, y )  # Use regression to create the model
```
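Once fitted, the model can make predictions for new $x$ values; note that new inputs must pass through the same polynomial transform before being handed to `predict`. A sketch, using hypothetical new values 5.5 and 20:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.arange(0, 20)
y = [3,4,5,7,9,20,31,50,70,75,80,91,101,120,135,160,179,181,190,193]

poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(x.reshape(-1, 1))
poly_reg_model = LinearRegression().fit(poly_features, y)

# New x values must be transformed into (x, x^2) pairs first
x_new = np.array([5.5, 20.0]).reshape(-1, 1)
y_pred = poly_reg_model.predict(poly.transform(x_new))
print(y_pred)
```

The model's fit on the training data can also be checked with `poly_reg_model.score(poly_features, y)`, which returns the $R^2$ value.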


Finally, get the coefficients and intercept of the model.

```python
poly_reg_model.intercept_, poly_reg_model.coef_
```

```
(-8.384415584415635, array([6.28628389, 0.27420825]))
```


Thus the equation for our model of degree two is $\widehat{y} = -8.38 + 6.29x + 0.27x^2$.
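As a sanity check, evaluating that equation by hand with the fitted intercept and coefficients should agree with `predict`. A sketch, using the arbitrary point $x = 10$:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.arange(0, 20)
y = [3,4,5,7,9,20,31,50,70,75,80,91,101,120,135,160,179,181,190,193]

poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(x.reshape(-1, 1))
poly_reg_model = LinearRegression().fit(poly_features, y)

b0 = poly_reg_model.intercept_
b1, b2 = poly_reg_model.coef_

x0 = 10.0
manual = b0 + b1 * x0 + b2 * x0**2                       # y-hat from the equation
from_model = poly_reg_model.predict(poly.transform([[x0]]))[0]
print(manual, from_model)                                # the two values agree
```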