Graduate Descent

Numerically stable p-norms


Consider the p-norm

$$ || \boldsymbol{x} ||_p = \left( \sum_i |x_i|^p \right)^{\frac{1}{p}} $$

In python this translates to:

from numpy import array

def norm1(x, p):
    "First-pass implementation of p-norm."
    return (np.abs(x)**p).sum() ** (1./p)

Now, suppose \(|x_i|^p\) causes overflow (for some \(i\)). This will occur for sufficiently large \(p\) or sufficiently large \(x_i\)—even if \(x_i\) is representable (i.e., not NaN or \(\infty\)).

For example:

>>> big = 1e300
>>> x = array([big])
>>> norm1(x, p=2)
[ inf]   # expected: 1e+300

This fails because we can't square big

>>> np.array([big])**2
[ inf]

A little math

There is a way to avoid overflowing because of a few large \(x_i\).

Here's a little fact about p-norms: for any \(p\) and \(\boldsymbol{x}\)

$$ || \alpha \cdot \boldsymbol{x} ||_p = |\alpha| \cdot || \boldsymbol{x} ||_p $$

We'll use the following version (harder to remember)

$$ || \boldsymbol{x} ||_p = |\alpha| \cdot || \boldsymbol{x} / \alpha ||_p $$

Don't believe it? Here's some algebra:

$$ \begin{eqnarray*} || \boldsymbol{x} ||_p &=& \left( \sum_i |x_i|^p \right)^{\frac{1}{p}} \\ &=& \left( \sum_i \frac{|\alpha|^p}{|\alpha|^p} \cdot |x_i|^p \right)^{\frac{1}{p}} \\ &=& |\alpha| \cdot \left( \sum_i \left( \frac{|x_i| }{|\alpha|} \right)^p \right)^{\frac{1}{p}} \\ &=& |\alpha| \cdot \left( \sum_i \left| \frac{x_i }{\alpha} \right|^p \right)^{\frac{1}{p}} \\ &=& |\alpha| \cdot || \boldsymbol{x} / \alpha ||_p \end{eqnarray*} $$

Back to numerical stability

Suppose we pick \(\alpha = \max_i |x_i|\). Now, the largest number we have to take the power of is one—making it very difficult to overflow on the account of \(\boldsymbol{x}\). This should remind you of the infamous log-sum-exp trick.

def robust_norm(x, p):
    a = np.abs(x).max()
    return a * norm1(x / a, p)

Now, our example from before works :-)

>>> robust_norm(x, p=2)


  • It appears as if scipy.linalg.norm is robust to overflow, while numpy.linalg.norm is not. Note that scipy.linalg.norm appears to be a bit slower.

  • The logsumexp trick is nearly identical, but operates in the log-domain, i.e., \(\text{logsumexp}(\log(|x|) \cdot p) / p = \log || x ||_p\). You can implement both tricks with the same code, if you use different number classes for log-domain and real-domain—a trick you might have seen before.

from arsenal.math import logsumexp
from numpy import log, exp, abs
>>> p = 2
>>> x = array([1,2,4,5])
>>> logsumexp(log(abs(x)) * p) / p
>>> log(robust_norm(x, p))