# How to fit a linear model to two columns of data (in Julia)


Let’s say we have two columns of data, one for a single independent variable $x$ and the other for a single dependent variable $y$. How can I find the best-fit linear model that predicts $y$ based on $x$?

In other words, what are the model coefficients $\beta_0$ and $\beta_1$ that give me the best linear model $\hat y=\beta_0+\beta_1x$ based on my data?
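For reference, if "best" is taken to mean ordinary least squares (the criterion that the `lm` function used below applies), the coefficients have a well-known closed form. Here $\bar x$ and $\bar y$ denote the sample means of the two columns, and $n$ is the number of rows; these symbols are introduced only for this sketch.

$$
\beta_1=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2},
\qquad
\beta_0=\bar y-\beta_1\bar x
$$

In practice there is no need to compute these by hand, since the modeling tools do it for you.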

## Solution

This solution uses fake example data. When using this code, replace our fake data with your real data.

```julia
# Here is the fake data you should replace with your real data.
xs = [ 393, 453, 553, 679, 729, 748, 817 ]
ys = [  24,  25,  27,  36,  55,  68,  84 ]

# Place the data into a DataFrame, because that's what Julia's modeling tools expect:
using DataFrames
data = DataFrame( xs=xs, ys=ys )  # Or you can name the columns whatever you like

# Create the linear model:
using GLM
lm( @formula( ys ~ xs ), data )
```

```
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

ys ~ 1 + xs

Coefficients:
───────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)    Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────────
(Intercept)  -37.3214    18.9954    -1.96    0.1066  -86.1508      11.5079
xs             0.13272    0.029589   4.49    0.0065    0.0566587    0.20878
───────────────────────────────────────────────────────────────────────────
```


The linear model in this example is approximately $\hat y=0.13272x-37.3214$, that is, $\beta_0\approx-37.3214$ and $\beta_1\approx0.13272$.
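
If you want to use the coefficients in later code rather than read them off the printed table, one possible approach is sketched below. It repeats the same fake data so that it runs on its own. The variable names `model`, `β0`, `β1`, and `new_data`, and the input value 600, are our own choices for illustration.

```julia
using DataFrames, GLM

# Same fake data as in the solution; replace with your real data.
xs = [ 393, 453, 553, 679, 729, 748, 817 ]
ys = [  24,  25,  27,  36,  55,  68,  84 ]
data = DataFrame( xs=xs, ys=ys )

# Store the fitted model in a variable so we can query it afterwards.
model = lm( @formula( ys ~ xs ), data )

# coef returns the estimates in the same order as the printed table:
# the intercept first, then the coefficient on xs.
β0, β1 = coef( model )

# Predict y for a new x value (600 is an arbitrary example input).
# The column name must match the one used in the formula.
new_data = DataFrame( xs=[ 600 ] )
predict( model, new_data )
```

The prediction is simply $\beta_0+\beta_1\cdot 600$ evaluated with the fitted coefficients.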