# Expected value of a quadratic and the Delta method

Expected value of a quadratic: Suppose we'd like to compute the expectation of a quadratic function, i.e., $$\mathbb{E}\left[ x^{\top}\negthinspace\negthinspace A x \right]$$, where $$x$$ is a random vector and $$A$$ is a deterministic symmetric matrix. Let $$\mu$$ and $$\Sigma$$ be the mean and covariance matrix of $$x$$. It turns out the expected value of a quadratic has the following simple form:

$$\mathbb{E}\left[ x^{\top}\negthinspace\negthinspace A x \right] = \text{trace}\left( A \Sigma \right) + \mu^{\top}\negthinspace A \mu$$
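
As a quick sanity check, here is a minimal NumPy sketch of the identity. The helper name `expected_quadratic` and the randomly generated test data are assumptions made for illustration. The check relies on the fact that the identity holds exactly for the empirical distribution of any sample, provided the covariance uses the $$1/n$$ normalization.

```python
import numpy as np

def expected_quadratic(A, mu, Sigma):
    """E[x^T A x] for a random vector x with mean mu and covariance Sigma."""
    return np.trace(A @ Sigma) + mu @ A @ mu

rng = np.random.default_rng(0)
d, n = 3, 1000

# A random symmetric matrix A (illustrative data only).
B = rng.normal(size=(d, d))
A = (B + B.T) / 2

# Correlated samples with a nonzero mean, one sample per row.
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) + rng.normal(size=d)

# Empirical mean and covariance; bias=True selects the 1/n normalization.
mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False, bias=True)

# Average x^T A x over the samples directly ...
direct = np.mean(np.einsum("ni,ij,nj->n", X, A, X))
# ... and via the trace formula.
closed_form = expected_quadratic(A, mu_hat, Sigma_hat)

print(direct, closed_form)  # the two values agree up to floating-point error
```

With the default $$1/(n-1)$$ normalization of `np.cov`, the two numbers would agree only approximately.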

Delta Method: Suppose we'd like to compute the expected value of a nonlinear function $$f$$ applied to our random variable $$x$$, $$\mathbb{E}\left[ f(x) \right]$$. The Delta method approximates this expectation by replacing $$f$$ with its second-order Taylor approximation $$\hat{f_{a}}$$ taken at some point $$a$$, where $$\nabla f(a)$$ and $$H(a)$$ denote the gradient and Hessian of $$f$$ at $$a$$:

$$\hat{f_{a}}(x) = f(a) + \nabla f(a)^{\top} (x - a) + \frac{1}{2} (x - a)^\top H(a) (x - a)$$

This Taylor approximation is a quadratic function of $$x$$, so let's apply our new equation for the expected value of a quadratic. We can use the trick from above with $$A = H(a)$$ and with $$x - a$$ in place of $$x$$. Note that shifting by $$a$$ changes the mean to $$\mu - a$$ but leaves the covariance matrix $$\Sigma$$ unchanged, and the Hessian is a symmetric matrix (assuming $$f$$ has continuous second derivatives)!

\begin{aligned} \mathbb{E}\left[ \hat{f_{a}}(x) \right] & = \mathbb{E} \left[ f(a) + \nabla\negthinspace f(a)^{\top} (x - a) + \frac{1}{2} (x - a)^{\top} H(a)\, (x - a) \right] \\\ & = f(a) + \nabla\negthinspace f(a)^{\top} ( \mu - a ) + \frac{1}{2} \mathbb{E} \left[ (x - a)^{\top} H(a)\, (x - a) \right] \\\ & = f(a) + \nabla\negthinspace f(a)^{\top} ( \mu - a ) + \frac{1}{2}\left( \text{trace}\left( H(a) \, \Sigma \right) + (\mu - a)^{\top} H(a)\, (\mu - a) \right) \end{aligned}
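
The last line of this derivation can be sketched in a few lines of NumPy. The function name `delta_method` and the test function below are hypothetical, chosen for illustration. Since a quadratic $$f$$ equals its own second-order Taylor expansion, the approximation should reproduce the exact expectation for any expansion point $$a$$, which gives a convenient check.

```python
import numpy as np

def delta_method(f, grad, hessian, a, mu, Sigma):
    """Expectation of the second-order Taylor expansion of f taken at a."""
    H = hessian(a)
    shift = mu - a
    return f(a) + grad(a) @ shift + 0.5 * (np.trace(H @ Sigma) + shift @ H @ shift)

# Illustrative quadratic test function f(x) = x^T A x + b^T x with symmetric A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
f = lambda x: x @ A @ x + b @ x
grad = lambda x: 2 * A @ x + b   # gradient of f (uses symmetry of A)
hessian = lambda x: 2 * A        # Hessian of f is constant

mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

# Exact expectation from the quadratic formula plus the linear term.
exact = np.trace(A @ Sigma) + mu @ A @ mu + b @ mu

# Two different expansion points give the same answer for a quadratic f.
at_origin = delta_method(f, grad, hessian, np.zeros(2), mu, Sigma)
at_other = delta_method(f, grad, hessian, np.array([3.0, -2.0]), mu, Sigma)
print(exact, at_origin, at_other)
```

For a non-quadratic $$f$$ the result generally depends on the choice of $$a$$.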

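Taking the Taylor expansion around $$\mu$$ simplifies the equation: with $$a = \mu$$ we have $$\mu - a = 0$$, so both the gradient term and the quadratic term in $$\mu - a$$ vanish, leaving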

\begin{aligned} \mathbb{E}\left[ \hat{f_{\mu}} (x) \right] &= \mathbb{E}\left[ f(\mu) + \nabla\negthinspace f(\mu)^{\top} (x - \mu) + \frac{1}{2} (x - \mu)^{\top} H(\mu)\, (x - \mu) \right] \\\ &= f(\mu) + \frac{1}{2} \, \text{trace}\Big( H(\mu) \, \Sigma \Big) \end{aligned}
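
To get a feel for the accuracy, here is a small sketch that compares this approximation with a case where the exact answer is known. The example is an assumption chosen for illustration: $$f(x) = \exp(w^{\top} x)$$ with Gaussian $$x$$, for which the exact expectation is $$\exp\left(w^{\top}\mu + \frac{1}{2} w^{\top} \Sigma w\right)$$. The helper name `delta_method_at_mean` and the numbers are hypothetical.

```python
import numpy as np

def delta_method_at_mean(f, hessian, mu, Sigma):
    """Second-order delta-method approximation of E[f(x)], expanded at mu."""
    return f(mu) + 0.5 * np.trace(hessian(mu) @ Sigma)

# Illustrative parameters (assumed, not from any dataset).
w = np.array([0.3, -0.2])
mu = np.array([1.0, 2.0])
Sigma = np.array([[0.10, 0.02], [0.02, 0.05]])

f = lambda x: np.exp(w @ x)
hessian = lambda x: np.exp(w @ x) * np.outer(w, w)  # Hessian of exp(w^T x)

approx = delta_method_at_mean(f, hessian, mu, Sigma)

# Exact value under the assumption x ~ N(mu, Sigma).
exact = np.exp(w @ mu + 0.5 * w @ Sigma @ w)

print(approx, exact, abs(approx - exact) / exact)
```

In this example the approximation amounts to replacing $$\exp(s)$$ with $$1 + s$$ for $$s = \frac{1}{2} w^{\top} \Sigma w$$, so it is accurate when the variance of $$w^{\top} x$$ is small and degrades as that variance grows.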

That looks much more tractable! Error bounds can be derived, but they are outside the scope of this post. For nice uses of the delta method in machine learning, see (Wager+,'13) and (Smith & Eisner,'06).