Gradient Descent

[Screenshots: input prompts for Matrix A, Matrix B, Alpha (for the unoptimised version) and Max Iterations, and the plotting range of the contour (range of both x and y).]

[Screenshot: Surface Plot.]
To solve an equation of the form Ax = b, we consider its quadratic form and look for its minimum (for a symmetric, positive-definite A, the minimiser of the quadratic form is exactly the solution of Ax = b).
A quadratic form is simply a scalar, quadratic function of a vector of the form

f(x) = \frac{1}{2}x^{T}Ax -b^{T}x + c
Taking x = [x1, x2] for easier visualisation, we take the matrix A, the vector b and the scalar c as input.

For example,
A = \left[\matrix{3& 2\\ 2& 6}\right] b = \left[\matrix{2\\ -8}\right] c = 0
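As a quick check (not part of the inputs above): this A is symmetric and positive definite, so the minimum of f coincides with the solution of Ax = b,

\left[\matrix{3& 2\\ 2& 6}\right]\left[\matrix{x_1\\ x_2}\right] = \left[\matrix{2\\ -8}\right] \implies x = \left[\matrix{2\\ -2}\right]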
In the gradient descent algorithm, we start from any initial x1, x2, which gives a point on the contour defined by the above quadratic function. From there we have to decide which direction to go in; to head towards the minimum, that direction is -f'(x). To calculate it we differentiate:
f'(x) = \frac{1}{2}A^Tx + \frac{1}{2}Ax - b
As you can observe above, for a symmetric matrix A this reduces to f'(x) = Ax - b.
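As a concrete illustration, here is a minimal Python/NumPy sketch of the quadratic form and its gradient for the example above (the variable and function names are our own, not necessarily those used in the program):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    # f(x) = 1/2 x^T A x - b^T x + c
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    # General gradient 1/2 A^T x + 1/2 A x - b;
    # for a symmetric A this equals A x - b.
    return 0.5 * A.T @ x + 0.5 * A @ x - b
```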


Unoptimised Iterations (Choosing Alpha)

In the unoptimised form of gradient descent, we use a constant alpha value: from the current point we move α*(-f'(x)) and keep iterating until we reach the minimum or start diverging. (Note that not every alpha gives the solution; if alpha is too large, the iterates may keep jumping over the solution.)
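Continuing the NumPy sketch above (again, the names are illustrative), the unoptimised iteration with a constant alpha might look like this:

```python
def fixed_alpha_descent(x0, alpha, max_iters=100, tol=1e-8):
    # Repeatedly step a constant alpha along -f'(x) until the gradient is
    # (nearly) zero or the iteration budget runs out.
    x = np.array(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * g
    return x

# For this A (largest eigenvalue 7), any alpha below 2/7 converges, e.g.
# fixed_alpha_descent([-2.0, 2.0], alpha=0.1)  ->  approximately [2, -2]
```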


Optimized Iterations

The question is, how big a step should we take?
A line search is a procedure that chooses α to minimize f along a line. We are restricted to points along the search line; the intersection of the vertical plane containing this line with the paraboloid is a parabola. What is the value of α at the base of this parabola?
From basic calculus, α minimizes f when the directional derivative d/dα f(x_{(1)}) is equal to zero. By the chain rule,

\frac{d}{d\alpha}f(x_{(1)}) = f'(x_{(1)})^{T} \frac{d}{d\alpha}x_{(1)} = f'(x_{(1)})^{T}r_{(0)}
where r_{(i)} = b - Ax_{(i)} is the residual. Setting the above expression to zero, we find that α should be chosen such that r_{(0)} and f'(x_{(1)}) are orthogonal. There is an intuitive reason why we should expect these vectors to be orthogonal at the minimum. There are different gradient vectors at various points along the search line. The slope of the parabola at any point is equal to the magnitude of the projection of the gradient onto the line. These projections represent the rate of increase of f as one traverses the search line. f is minimized where the projection is zero, that is, where the gradient is orthogonal to the search line.
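Carrying the derivation one step further (this is the standard steepest-descent step, stated here for the symmetric case): with f'(x) = Ax - b and x_{(1)} = x_{(0)} + αr_{(0)}, setting f'(x_{(1)})^{T}r_{(0)} = 0 gives

\alpha = \frac{r_{(0)}^{T}r_{(0)}}{r_{(0)}^{T}Ar_{(0)}}

Continuing the NumPy sketch from earlier (the helper name is our own), the optimised iteration with this exact line search might look like:

```python
def line_search_descent(x0, max_iters=100, tol=1e-8):
    # Optimised iteration: at each step pick the alpha that exactly minimises
    # f along the search direction r = b - A x (the negative gradient).
    x = np.array(x0, dtype=float)
    for _ in range(max_iters):
        r = b - A @ x                     # residual = -f'(x) for symmetric A
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))   # alpha at the base of the parabola
        x = x + alpha * r
    return x

# line_search_descent([-2.0, 2.0])  ->  approximately [2, -2]
```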