Graduate Descent

Gradient-vector product


We've all written the following test for our gradient code (known as the finite-difference approximation).

$$\frac{\partial}{\partial x_i} f(x) \approx \frac{1}{2 \varepsilon} \Big( f(x + \varepsilon \cdot e_i) - f(x - \varepsilon \cdot e_i) \Big)$$

where $\varepsilon > 0$ and $e_i$ is a vector of zeros except at position $i$, where it is 1. This approximation is exact in the limit and accurate to $o(\varepsilon)$ additive error.
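For concreteness, here is a minimal numpy sketch of that check; the function `f`, the hand-coded gradient `grad_f`, and the test point `x` below are placeholders, not anything from the post.

```python
import numpy as np

def fd_gradient(f, x, eps=1e-5):
    """Approximate the gradient of f at x by central differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e_i = np.zeros_like(x)
        e_i[i] = 1.0
        g[i] = (f(x + eps * e_i) - f(x - eps * e_i)) / (2 * eps)
    return g

# Example: test a hand-coded gradient on f(x) = sum(x**2).
f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2 * x
x = np.random.randn(5)
assert np.allclose(grad_f(x), fd_gradient(f, x), atol=1e-4)
```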

This is a specific instance of a more general approximation! The dot product of the gradient and any (conformable) vector d can be approximated with the following formula,

$$\nabla f(x)^\top d \approx \frac{1}{2 \varepsilon} \Big( f(x + \varepsilon \cdot d) - f(x - \varepsilon \cdot d) \Big)$$

We get the special case above when $d = e_i$. This is also exact in the limit and just as accurate.
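A sketch of the generalized check; the names are again placeholders:

```python
import numpy as np

def fd_grad_dot(f, x, d, eps=1e-5):
    """Approximate the dot product grad f(x) . d by a central difference along d."""
    return (f(x + eps * d) - f(x - eps * d)) / (2 * eps)

# Passing d = e_i recovers the per-coordinate approximation above;
# comparing against grad_f(x) @ d tests the gradient along an arbitrary direction.
```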

Runtime? Finite-difference approximation is probably too slow for approximating a high-dimensional gradient because the number of function evaluations required is 2n where n is the dimensionality of x. However, if the end goal is to approximate a gradient-vector product, a mere 2 function evaluations is probably faster than specialized code for computing the gradient.

How to set $\varepsilon$? The second approach is more sensitive to $\varepsilon$ because $d$ is arbitrary, unlike $e_i$, which is a simple unit-norm vector. Luckily some guidance is available. Andrei (2009) recommends

$$\varepsilon = \sqrt{\epsilon_{\text{mach}}} \cdot (1 + \lVert x \rVert) \, / \, \lVert d \rVert$$

where $\epsilon_{\text{mach}}$ is machine epsilon. (Numpy users: `numpy.finfo(x.dtype).eps`.)
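A small helper along those lines (a sketch, assuming `x` and `d` are float numpy arrays and the square-root form of the rule shown above):

```python
import numpy as np

def andrei_eps(x, d):
    """Step size for the directional difference, following Andrei (2009)."""
    eps_mach = np.finfo(x.dtype).eps
    return np.sqrt(eps_mach) * (1 + np.linalg.norm(x)) / np.linalg.norm(d)
```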

Why do I care?

  1. Well, I tend to work on sparse, but high-dimensional problems where finite-difference would be too slow. Thus, my usual solution is to only test several randomly selected dimensions, biasing samples toward dimensions which should be nonzero. With the new trick, I can effectively test more dimensions at once by taking random vectors $d$. I recommend sampling $d$ from a spherical Gaussian so that we're uniform on the angle of the vector (see the sketch after this list).

  2. Sometimes the gradient-vector dot product is the end goal. This is the case with Hessian-vector products, which arise in many optimization algorithms, such as stochastic meta-descent.
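A sketch of the randomized test from item 1, reusing the helpers above (`f`, `grad_f`, and `x` stand in for whatever function, gradient code, and test point you are checking):

```python
import numpy as np

def random_direction_grad_test(f, grad_f, x, n_trials=20):
    """Test grad_f against directional finite differences along random directions."""
    for _ in range(n_trials):
        d = np.random.randn(*x.shape)   # spherical Gaussian => uniform over directions
        eps = andrei_eps(x, d)          # step size from the sketch above
        approx = (f(x + eps * d) - f(x - eps * d)) / (2 * eps)
        exact = grad_f(x) @ d
        assert np.isclose(approx, exact, rtol=1e-4, atol=1e-8), (approx, exact)
```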

Hessian-vector product

Hessian-vector products are an instance of the gradient-vector dot product because the Hessian is just the gradient of the gradient! Now you only need to remember one formula!

$$H(x) \, d \approx \frac{1}{2 \varepsilon} \Big( \nabla f(x + \varepsilon \cdot d) - \nabla f(x - \varepsilon \cdot d) \Big)$$

With this trick you never have to actually compute the gnarly Hessian! More on Justin Domke's blog.
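A corresponding sketch (as before, the names are placeholders; only calls to your own `grad_f` are needed, never the Hessian itself):

```python
import numpy as np

def fd_hessian_vector_product(grad_f, x, d, eps=1e-5):
    """Approximate H(x) @ d with a central difference of the gradient along d."""
    return (grad_f(x + eps * d) - grad_f(x - eps * d)) / (2 * eps)

# Quick check on f(x) = sum(x**2): the Hessian is 2*I, so H @ d should equal 2*d.
grad_f = lambda x: 2 * x
x, d = np.random.randn(5), np.random.randn(5)
assert np.allclose(fd_hessian_vector_product(grad_f, x, d), 2 * d, atol=1e-4)
```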
