How to fit a linear model to two columns of data (in R)
Task
Let’s say we have two columns of data, one for a single independent variable $x$ and the other for a single dependent variable $y$. How can we find the best-fit linear model that predicts $y$ based on $x$?
In other words, what are the model coefficients $\beta_0$ and $\beta_1$ that give us the best linear model $\hat y=\beta_0+\beta_1x$ based on our data?
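Here "best" is assumed to mean best in the ordinary least-squares sense, which is the criterion R's `lm` function uses: the coefficients are chosen to minimize the sum of squared differences between each observed $y_i$ and its prediction $\hat y_i$. Under that assumption, writing $\bar x$ and $\bar y$ for the sample means of the $n$ data points (notation introduced here for illustration), the coefficients have the standard closed form:

$$\beta_1=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2},\qquad \beta_0=\bar y-\beta_1\bar x$$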
Related tasks:
- How to compute R-squared for a simple linear model
- How to fit a multivariate linear model
- How to predict the response variable in a linear model
Solution
This solution uses fake example data. When using this code, replace our fake data with your real data.
# Here is the fake data you should replace with your real data.
xs <- c( 393, 453, 553, 679, 729, 748, 817 )
ys <- c( 24, 25, 27, 36, 55, 68, 84 )
# If you need the model coefficients stored in variables for later use, do:
model <- lm( ys ~ xs )
beta0 <- model$coefficients[1]
beta1 <- model$coefficients[2]
# If you just need to see the coefficients, do this alone:
lm( ys ~ xs )
Call:
lm(formula = ys ~ xs)
Coefficients:
(Intercept)           xs
   -37.3214       0.1327
The linear model in this example is approximately $\hat y=0.133x-37.32$.
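As a sanity check on the fitted model, the following minimal sketch recomputes the two coefficients by hand and compares them with what `lm` reports. It assumes the model is an ordinary least-squares fit with one predictor, in which case the slope equals the sample covariance of the two columns divided by the sample variance of `xs`. The names `slope`, `intercept`, `x_new`, and `y_hat` are illustrative and not part of the solution above.

```r
# Same fake data as above; replace with your real data.
xs <- c(393, 453, 553, 679, 729, 748, 817)
ys <- c(24, 25, 27, 36, 55, 68, 84)

model <- lm(ys ~ xs)

# Closed-form least-squares estimates for a single predictor.
slope     <- cov(xs, ys) / var(xs)
intercept <- mean(ys) - slope * mean(xs)

# Compare with lm(); unname() drops the "(Intercept)" and "xs" labels
# so that only the numeric values are compared.
all.equal(unname(coef(model)), c(intercept, slope))

# Evaluate the fitted line at a hypothetical new x value.
x_new <- 600
y_hat <- intercept + slope * x_new
y_hat
```

The comparison uses `all.equal` rather than `==` because the two computations can differ by floating-point rounding.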
Content last modified on 24 July 2023.
Contributed by Nathan Carter (ncarter@bentley.edu)