How to fit a multivariate linear model (in R)

Task

Let’s say we have several independent variables, $x_1, x_2, \ldots, x_k$, and a dependent variable $y$. How can I fit a linear model that uses these independent variables to best predict the dependent variable?

In other words, what are the model coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ that give me the best linear model $\hat{y}=\beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k$ based on my data?
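
To make "best" concrete: R's `lm` function, used in the solution, fits by ordinary least squares, so the coefficients it reports are the ones that minimize the sum of squared differences between the observed and predicted values. The notation below ($n$ for the number of observations, $y_i$ for the $i$th observed value, and $x_{ij}$ for the $i$th observed value of $x_j$) is introduced here for illustration and does not appear elsewhere in this article.

$$\min_{\beta_0, \beta_1, \ldots, \beta_k} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik} \right)^2$$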

Solution

We’re going to use fake data here for illustrative purposes. You can replace our fake data with your real data in the code below.

# Replace this fake data with your real data
x1 <- c(2, 7, 4, 3, 11, 18, 6, 15, 9, 12)
x2 <- c(4, 6, 10, 1, 18, 11, 8, 20, 16, 13)
x3 <- c(11, 16, 20, 6, 14, 8, 5, 23, 13, 10)
y <- c(24, 60, 32, 29, 90, 45, 130, 76, 100, 120)

# Fit the linear model of y on x1, x2, and x3
model <- lm(y ~ x1 + x2 + x3)

# If you'll need the model coefficients later, store them as variables like this:
beta0 <- coef(model)[1]  # intercept
beta1 <- coef(model)[2]  # coefficient of x1
beta2 <- coef(model)[3]  # coefficient of x2
beta3 <- coef(model)[4]  # coefficient of x3

# To see the model summary, which includes the coefficients and much more, do this:
summary(model)
Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-25.031 -20.218  -8.373  22.937  35.640 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   77.244     27.366   2.823   0.0302 *
x1            -2.701      2.855  -0.946   0.3806  
x2             7.299      2.875   2.539   0.0441 *
x3            -4.861      2.187  -2.223   0.0679 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30.13 on 6 degrees of freedom
Multiple R-squared:  0.5936,	Adjusted R-squared:  0.3904 
F-statistic: 2.921 on 3 and 6 DF,  p-value: 0.1222

The intercept and the coefficients appear about halfway down the output, in the first numeric column of the “Coefficients” table, under the heading “Estimate.”
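
If you would rather pull the “Estimate” column out in code than read it off the printed summary, the sketch below shows one way to do it. It refits the same fake data so that it runs on its own. The names `coef_table` and `estimates` are our own choices for illustration.

```r
# Same fake data as in the solution above
x1 <- c(2, 7, 4, 3, 11, 18, 6, 15, 9, 12)
x2 <- c(4, 6, 10, 1, 18, 11, 8, 20, 16, 13)
x3 <- c(11, 16, 20, 6, 14, 8, 5, 23, 13, 10)
y <- c(24, 60, 32, 29, 90, 45, 130, 76, 100, 120)
model <- lm(y ~ x1 + x2 + x3)

# The "Coefficients" table from the summary, as a numeric matrix
coef_table <- coef(summary(model))

# The "Estimate" column holds the intercept and the three coefficients
estimates <- coef_table[, "Estimate"]
print(round(estimates, 3))

# The same values are available directly from the fitted model
print(all.equal(estimates, coef(model)))
```

The other columns of `coef_table` (“Std. Error”, “t value”, and “Pr(>|t|)”) can be extracted by name in the same way.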

Thus the multivariate linear model from the example data is $\hat y = 77.244 - 2.701x_1 + 7.299x_2 - 4.861x_3$.
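
Once the model is fitted, it can be used to predict $y$ for new values of the independent variables. The following is a minimal sketch, assuming the data is stored in a data frame (here called `df`, a name we chose for illustration), as real data often is. The new observation `new_point` contains made-up values.

```r
# Same fake data as in the solution above, stored in a data frame
df <- data.frame(
  x1 = c(2, 7, 4, 3, 11, 18, 6, 15, 9, 12),
  x2 = c(4, 6, 10, 1, 18, 11, 8, 20, 16, 13),
  x3 = c(11, 16, 20, 6, 14, 8, 5, 23, 13, 10),
  y  = c(24, 60, 32, 29, 90, 45, 130, 76, 100, 120)
)
model <- lm(y ~ x1 + x2 + x3, data = df)

# A hypothetical new observation (values made up for illustration)
new_point <- data.frame(x1 = 5, x2 = 9, x3 = 12)

# predict() plugs the new values into the fitted equation
predicted <- predict(model, newdata = new_point)

# The same prediction computed by hand from the coefficients
b <- coef(model)
by_hand <- b[1] + b[2] * 5 + b[3] * 9 + b[4] * 12

print(unname(predicted))
print(unname(by_hand))
```

Both printed numbers should agree, since `predict` evaluates the same linear equation that the coefficients define.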

Content last modified on 24 July 2023.

Contributed by Elizabeth Czarniak (CZARNIA_ELIZ@bentley.edu)