Julia Essentials - Gradient Descent

Julia First Steps: Linear Regressions

This short notebook aims to demonstrate how easy it is to do linear algebra with Julia. Let's start by generating a vector $X$ of 50 normally distributed random values:

using LinearAlgebra
using Plots
X = randn(50)
50-element Array{Float64,1}:
  1.2315276650849976
 -1.4524774035807928
 -1.331487964705029
  1.7910530069325952
  1.4344147755194752
  0.8381566948997999
  0.37732934639480875
 -0.08465783247472314
 -1.2007366388946283
  0.2477648455025647
  0.36153611350540193
  1.0671530445403783
  0.48110313559012585
  ⋮
 -0.9292021823632189
  0.6925673689749254
 -0.4111717889244624
 -1.066442631729392
 -1.0254831803316289
  1.030013173212353
 -0.36342805460780975
  0.6463700426665658
 -1.4416779178988512
 -0.9530989447609137
 -0.5719567028504781
 -1.494616935078167

Then we use $X$ to generate the target variable $y$ according to $y_i = 5X_i + 2 + \epsilon_i$, where $\epsilon_i$ is a small noise term:

ϵ = randn(50)/10;
y = (5 * X) + ϵ .+ 2
50-element Array{Float64,1}:
  8.1407692513959
 -5.3081880651629465
 -4.580108604523786
 10.979354279905353
  9.217772217481926
  6.200257553598876
  3.974797320462655
  1.4380463508634955
 -4.010151843847709
  3.2260249975738176
  3.5870402952610814
  7.3820007915715875
  4.604098918967308
  ⋮
 -2.743409826160721
  5.564858955179007
 -0.19705563404837934
 -3.3428052322697033
 -3.089865165111327
  7.234024641459847
  0.36568334066534147
  5.240195075581495
 -5.197421511055493
 -2.743474127301406
 -0.9295139178487957
 -5.497662035395874

This is how simple it is to fit a linear regression with Julia:

X\y
4.831442452366241

As $y$ has been generated with an intercept, this approach yields an inaccurate slope estimate. Let's add a column of ones to $X$: the resulting $50 \times 2$ matrix, Xconst, is our design matrix:

Xconst = hcat(ones(50), X)
50×2 Array{Float64,2}:
 1.0   1.23153
 1.0  -1.45248
 1.0  -1.33149
 1.0   1.79105
 1.0   1.43441
 1.0   0.838157
 1.0   0.377329
 1.0  -0.0846578
 1.0  -1.20074
 1.0   0.247765
 1.0   0.361536
 1.0   1.06715
 1.0   0.481103
 ⋮    
 1.0  -0.929202
 1.0   0.692567
 1.0  -0.411172
 1.0  -1.06644
 1.0  -1.02548
 1.0   1.03001
 1.0  -0.363428
 1.0   0.64637
 1.0  -1.44168
 1.0  -0.953099
 1.0  -0.571957
 1.0  -1.49462

Using the backslash operator on the design matrix, we are now able to recover the parameters used to generate $y$:

para = Xconst\y
2-element Array{Float64,1}:
 1.9969334464481343
 5.012372712566652

This evaluates the parameters much more accurately. Note that, in Julia, x = A\y solves $Ax = y$, returning the least-squares solution when the system is overdetermined, as it is here.
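
For instance, with a small square system (a made-up 2×2 example, unrelated to the data above), the backslash operator returns the exact solution:

A = [2.0 1.0; 1.0 3.0]
b = [3.0, 5.0]
x = A \ b    # exact solution of A*x = b, here [0.8, 1.4]
A * x ≈ b    # true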

Using the normal equation, we know the parameters can also be estimated by $(X^T X)^{-1} X^T y$:

(transpose(Xconst) * Xconst)^(-1) * transpose(Xconst) * y
2-element Array{Float64,1}:
 1.9969334464481343
 5.012372712566649

Let's plot the data together with the fitted regression line:

scatter(Xconst[:,2], y)
Plots.abline!(para[2], para[1], line=:dash, legend=false)

(Figure: scatter plot of X against y with the fitted regression line drawn as a dashed line.)

Fortunately, Julia has packages that make quick diagnostics easy. The GLM (Generalized Linear Models) package can be used here to get standard errors, p-values, and confidence intervals:

using GLM
lm(Xconst, y)
LinearModel{GLM.LmResp{Array{Float64,1}},GLM.DensePredChol{Float64,Cholesky{Float64,Array{Float64,2}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
    Estimate  Std. Error  t value  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1   1.99693   0.0138377  144.312    <1e-64    1.96911    2.02476
x2   5.01237   0.0139446  359.448    <1e-83    4.98434    5.04041
─────────────────────────────────────────────────────────────────
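
As a side note, the same model can also be fitted through GLM's formula interface once the data are stored in a DataFrame. This is only a sketch and assumes the DataFrames package is available; the column names are chosen here for illustration:

using DataFrames
df = DataFrame(X = X, y = y)
lm(@formula(y ~ X), df)    # the intercept is added automatically by the formula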

To show how easy it is to automatically differentiate a function with Julia, let's define the predict and cost functions. predict takes the two parameters (intercept and slope) and returns the predicted values $\hat{y}$; cost evaluates how far $\hat{y}$ is from $y$:

predict(intercept, x) = Xconst[:,1] .* intercept + Xconst[:,2] .* x
predict (generic function with 1 method)
cost(intercept, x) = sum((predict(intercept, x) - y).^2)
cost (generic function with 1 method)
println(cost(1.5, 4))
println(cost(2, 5))
58.77621086610803
0.46418446941640396

In order to minimize cost, the gradients will be calculated automatically by Julia using the ForwardDiff package. A fixed number of iterations (niter) is then run to find the two values that minimize cost:
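
Each iteration applies the standard gradient descent update

$\beta^{(k+1)} = \beta^{(k)} - \eta \, \nabla \mathrm{cost}(\beta^{(k)})$

where $\beta$ stacks the intercept and the slope, and $\eta$ corresponds to the learning rate lr in the code below.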

using ForwardDiff
ForwardDiff.gradient(z -> cost(z[1], z[2]), [1.0, 2.0])
2-element Array{Float64,1}:
  -72.81718390149325
 -287.7385372871958
ForwardDiff.gradient(z -> cost(z[1], z[2]), [2.0, 3.0])
2-element Array{Float64,1}:
   18.26089198493291
 -198.1888809734103
grad(intercept, x) = ForwardDiff.gradient(z -> cost(z[1], z[2]), [intercept, x])
grad (generic function with 1 method)
grad(1.0, 2.0)
2-element Array{Float64,1}:
  -72.81718390149325
 -287.7385372871958
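
As a quick sanity check, the automatic gradient can be compared against the analytical gradient of the least-squares cost, $\nabla \mathrm{cost}(\beta) = 2X^T(X\beta - y)$. The helper name analytic_grad below is purely illustrative:

analytic_grad(intercept, x) = 2 * transpose(Xconst) * (Xconst * [intercept, x] - y)
analytic_grad(1.0, 2.0) ≈ grad(1.0, 2.0)    # should return true
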
function graddescent(interceptStart=0.0, xStart=0.0, niter=1000, lr=0.01)
  evalIntercept = interceptStart
  evalX = xStart
  for i in 1:niter
    # gradient of the cost at the current estimates
    gradVector = grad(evalIntercept, evalX)
    # move both parameters one step of size lr in the direction of steepest descent
    evalIntercept = evalIntercept - lr * gradVector[1]
    evalX = evalX - lr * gradVector[2]
  end
  return evalIntercept, evalX
end
graddescent (generic function with 5 methods)

Running gradient descent returns an accurate estimate of the parameters used to generate $y$:

graddescent()
(1.9969334464481345, 5.012372712566651)
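
For comparison, this estimate agrees with the least-squares solution obtained earlier with the backslash operator:

collect(graddescent()) ≈ para    # true: both approaches recover the same coefficients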