How to fit a linear model to two columns of data (in Python, using SciPy)
Task
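Let’s say we have two columns of data, one for a single independent variable $x$ and the other for a single dependent variable $y$. How can we find the best-fit linear model that predicts $y$ from $x$?
In other words, what are the model coefficients $\beta_0$ and $\beta_1$ that give the best linear model $\hat{y} = \beta_0 + \beta_1 x$ for our data?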
Related tasks:
- How to compute R-squared for a simple linear model
- How to fit a multivariate linear model
- How to predict the response variable in a linear model
Solution
This solution uses fake example data. When using this code, replace our fake data with your real data.
Although the solution below uses plain Python lists of data, it also works if the data are stored in NumPy arrays or pandas Series.
# Here is the fake data you should replace with your real data.
xs = [ 393, 453, 553, 679, 729, 748, 817 ]
ys = [ 24, 25, 27, 36, 55, 68, 84 ]
# We will use SciPy to build the model
import scipy.stats as stats
# If you need the model coefficients stored in variables for later use, do:
model = stats.linregress( xs, ys )
beta0 = model.intercept
beta1 = model.slope
# If you just need to see the coefficients (and some other related data),
# do this alone:
stats.linregress( xs, ys )
LinregressResult(slope=0.1327195637885226, intercept=-37.32141898334582, rvalue=0.8949574425541466, pvalue=0.006486043236692156, stderr=0.029588975845594334, intercept_stderr=18.995444317768097)
The linear model in this example is approximately $\hat{y} = 0.1327x - 37.3214$.
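As a cross-check on the slope and intercept reported by `linregress` above, the same numbers can be reproduced with the standard closed-form least-squares formulas, using nothing but plain Python. This is only a minimal sketch for building intuition, not a description of how SciPy computes the fit internally; the names `s_xx`, `s_xy`, and `predict` are hypothetical helpers introduced here for illustration.

```python
# Same fake data as in the solution above.
xs = [393, 453, 553, 679, 729, 748, 817]
ys = [24, 25, 27, 36, 55, 68, 84]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Sum of squared deviations of x, and sum of cross-products of x and y,
# both taken about the sample means.
s_xx = sum((x - x_mean) ** 2 for x in xs)
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

# Closed-form least-squares estimates for a simple linear model.
beta1 = s_xy / s_xx                # slope
beta0 = y_mean - beta1 * x_mean    # intercept

def predict(x):
    """Evaluate the fitted line at x (hypothetical helper, not part of SciPy)."""
    return beta0 + beta1 * x

print(round(beta1, 4), round(beta0, 4))  # 0.1327 -37.3214
print(round(predict(600), 2))            # 42.31
```

The rounded values agree with the `slope` and `intercept` fields in the `LinregressResult` shown above, and `predict` shows how the two coefficients turn a new $x$ value into an estimated $y$.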
Content last modified on 24 July 2023.
Contributed by Nathan Carter (ncarter@bentley.edu)