Expected value of a quadratic: Suppose we'd like to compute the expectation of a quadratic function, i.e., \(\mathbb{E}\left[ x^{\top}\negthinspace\negthinspace A x \right]\), where \(x\) is a random vector and \(A\) is a deterministic symmetric matrix. Let \(\mu\) and \(\Sigma\) be the mean and covariance of \(x\). It turns out the expected value of a quadratic has the following simple form:
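\[
\mathbb{E}\left[ x^{\top}\negthinspace\negthinspace A x \right] = \mu^{\top}\negthinspace\negthinspace A \mu + \operatorname{tr}\left( A \Sigma \right).
\]

To see why, note that \(x^{\top}\negthinspace\negthinspace A x\) is a scalar, so it equals its own trace: \(\mathbb{E}\left[ x^{\top}\negthinspace\negthinspace A x \right] = \mathbb{E}\left[ \operatorname{tr}\left( A\, x x^{\top} \right) \right] = \operatorname{tr}\left( A\, \mathbb{E}\left[ x x^{\top} \right] \right) = \operatorname{tr}\left( A \left( \Sigma + \mu \mu^{\top} \right) \right)\), which expands to the formula above.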
Delta Method: Suppose we'd like to compute the expected value of a nonlinear function \(f\) applied to our random variable \(x\), i.e., \(\mathbb{E}\left[ f(x) \right]\). The Delta method approximates this expectation by replacing \(f\) with its second-order Taylor approximation \(\hat{f}_{a}\) taken at some point \(a\):
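\[
f(x) \approx \hat{f}_{a}(x) = f(a) + \nabla f(a)^{\top} (x - a) + \frac{1}{2} (x - a)^{\top} H(a)\, (x - a),
\]

where \(\nabla f(a)\) and \(H(a)\) are the gradient and Hessian of \(f\) at \(a\).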
This Taylor approximation is a quadratic function, so let's apply our new equation for the expected value of a quadratic! We can use the trick from above with \(A = H(a)\) and \(x\) replaced by \((x - a)\). Note that the covariance matrix is shift-invariant, so \((x - a)\) has mean \((\mu - a)\) and covariance \(\Sigma\), and the Hessian is a symmetric matrix, as the trick requires. This gives
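\[
\mathbb{E}\left[ \hat{f}_{a}(x) \right] = f(a) + \nabla f(a)^{\top} (\mu - a) + \frac{1}{2} \left( (\mu - a)^{\top} H(a)\, (\mu - a) + \operatorname{tr}\left( H(a)\, \Sigma \right) \right).
\]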
Taking the Taylor expansion around \(a = \mu\) kills the terms involving \((\mu - a)\), simplifying the equation as follows:
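\[
\mathbb{E}\left[ f(x) \right] \approx \mathbb{E}\left[ \hat{f}_{\mu}(x) \right] = f(\mu) + \frac{1}{2} \operatorname{tr}\left( H(\mu)\, \Sigma \right).
\]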
That looks much more tractable! Error bounds are possible to derive, but are outside the scope of this post. For nice uses of the delta method in machine learning, see (Wager+,'13) and (Smith & Eisner,'06).
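As a quick sanity check, here is a minimal sketch (not from the original post) that compares the delta method against a Monte Carlo estimate of \(\mathbb{E}\left[ f(x) \right]\) for Gaussian \(x\). The choice of \(f\) (log-sum-exp), the dimension, and the distribution parameters are all arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Test function: log-sum-exp, a smooth nonlinear function whose
# Hessian is diag(p) - p p^T, where p = softmax(x).
def f(x):
    return np.log(np.sum(np.exp(x)))

def hessian(x):
    p = np.exp(x - f(x))              # softmax(x)
    return np.diag(p) - np.outer(p, p)

# Arbitrary mean and positive-definite covariance for x ~ N(mu, Sigma).
d = 3
mu = rng.standard_normal(d)
B = rng.standard_normal((d, d))
Sigma = B @ B.T + np.eye(d)

# Delta method around mu: E[f(x)] ~= f(mu) + 0.5 * tr(H(mu) Sigma).
delta = f(mu) + 0.5 * np.trace(hessian(mu) @ Sigma)

# Monte Carlo estimate of E[f(x)].
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
mc = np.mean([f(s) for s in samples])

print(f"delta method: {delta:.4f}")
print(f"monte carlo:  {mc:.4f}")
```

The two printed numbers should be close but not identical: the delta method is only a second-order approximation, so the gap reflects the higher-order terms it drops.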