
How to add a polynomial term to a model (in Python, using sklearn)


Task

Sometimes, a simple linear model isn’t sufficient to describe the data. How can we include a higher-order term in a regression model, such as the square or cube of one of the predictors?


Solution

We begin with a fabricated dataset of 20 points. You can replace the code below with your own, real, data.

import numpy as np
import pandas as pd

x = np.arange(0,20)                                                  # Array of the integers 0 through 19
y = [3,4,5,7,9,20,31,50,70,75,80,91,101,120,135,160,179,181,190,193] # List of 20 integers
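A quick scatter plot makes the curvature in this data visible, which is why a straight line alone will not fit it well. This step is optional and assumes matplotlib is installed; it is not needed for the rest of the solution.

import matplotlib.pyplot as plt

plt.scatter( x, y )   # the points curve upward, so a plain line would underfit
plt.xlabel( 'x' )
plt.ylabel( 'y' )
plt.show()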

We extend our dataset with a new column (or “feature”), containing $x^2$.

from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures( degree=2, include_bias=False )
x_matrix = x.reshape( -1, 1 )                   # make x a matrix so that we can add columns
poly_features = poly.fit_transform( x_matrix )  # add a second column, so we now have x and x^2
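If you would like to verify what the transformation produced, you can inspect the result; each row of poly_features contains a value of $x$ followed by its square. (The get_feature_names_out call below assumes a reasonably recent version of scikit-learn.)

print( poly_features[:5] )             # each row is [x, x^2] for one data point
print( poly.get_feature_names_out() )  # names of the generated columns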

Next, fit a regression model to the new features, which are $x$ and $x^2$.

from sklearn.linear_model import LinearRegression
poly_reg_model = LinearRegression()     # Our model will be linear in the features x and x^2
poly_reg_model.fit( poly_features, y )  # Use regression to create the model
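As an optional check on the quality of the fit (not part of the original output), you can compare the model's predictions to the observed $y$ values, or ask the model for its $R^2$ score on the training data.

y_pred = poly_reg_model.predict( poly_features )      # fitted values for each x
r_squared = poly_reg_model.score( poly_features, y )  # R^2 of the fit on this data
print( r_squared )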

Finally, get the coefficients and intercept of the model.

poly_reg_model.intercept_, poly_reg_model.coef_
(-8.384415584415635, array([6.28628389, 0.27420825]))

Thus the equation for our model of degree two is $\widehat{y} = -8.38 + 6.28x + 0.27x^2$.
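To apply the model to a new input, transform it with the same poly object before calling predict. For example, the following sketch estimates $\widehat{y}$ at $x=20$, a value just beyond the original data, chosen purely as an illustration.

new_x = np.array( [[20]] )              # a new input, as a 1-by-1 matrix
new_features = poly.transform( new_x )  # becomes the row [20, 400], i.e. [x, x^2]
poly_reg_model.predict( new_features )  # about -8.38 + 6.28*20 + 0.27*400, roughly 227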

Content last modified on 24 July 2023.


Contributed by Debayan Sen (DSEN@bentley.edu)