Consider the p-norm

\[ \|\boldsymbol{x}\|_p = \left( \sum_i |x_i|^p \right)^{1/p} \]

In Python this translates to:
import numpy as np
from numpy import array

def norm1(x, p):
    "First-pass implementation of p-norm."
    return (np.abs(x)**p).sum() ** (1./p)
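As a quick sanity check (my example, not from the original), it agrees with the familiar Euclidean norm:

>>> norm1(array([3., 4.]), p=2)
5.0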
Now, suppose \(|x_i|^p\) causes overflow (for some \(i\)). This will occur for sufficiently large \(p\) or sufficiently large \(x_i\)—even if \(x_i\) is representable (i.e., not NaN or \(\infty\)).
For example:
>>> big = 1e300
>>> x = array([big])
>>> norm1(x, p=2)
inf   # expected: 1e+300
This fails because we can't square big:
>>> np.array([big])**2
array([inf])
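The same failure shows up with modest entries once \(p\) is large (my example, not from the original post): 2.0**2000 already exceeds the largest double, even though the true norm of this vector is exactly 2.0.

>>> norm1(array([2.0]), p=2000)
inf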
A little math
There is a way to avoid overflowing because of a few large \(x_i\).

Here's a little fact about p-norms: for any \(p\), any \(\boldsymbol{x}\), and any \(\alpha > 0\),

\[ \|\alpha \cdot \boldsymbol{x}\|_p = \alpha \cdot \|\boldsymbol{x}\|_p. \]

We'll use the following version (harder to remember):

\[ \|\boldsymbol{x}\|_p = \alpha \cdot \|\boldsymbol{x}/\alpha\|_p. \]

Don't believe it? Here's some algebra:

\[ \alpha \cdot \|\boldsymbol{x}/\alpha\|_p
 = \alpha \cdot \left( \sum_i \left| \frac{x_i}{\alpha} \right|^p \right)^{1/p}
 = \alpha \cdot \frac{1}{\alpha} \left( \sum_i |x_i|^p \right)^{1/p}
 = \|\boldsymbol{x}\|_p. \]
Back to numerical stability
Suppose we pick \(\alpha = \max_i |x_i|\). Now, the largest number we have to raise to the power \(p\) is one, making it very difficult to overflow on account of \(\boldsymbol{x}\). This should remind you of the infamous log-sum-exp trick.
def robust_norm(x, p):
    "Overflow-resistant p-norm: rescale by the largest magnitude."
    a = np.abs(x).max()
    return a * norm1(x / a, p)
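One edge case worth flagging: if \(\boldsymbol{x}\) is all zeros, then a == 0 and x / a fills the array with NaNs. A guarded variant, as a sketch (the zero check is my addition, not part of the original function):

def robust_norm_safe(x, p):
    "Like robust_norm, but handles the all-zeros vector."
    a = np.abs(x).max()
    if a == 0:   # the norm of the zero vector is 0; avoid 0/0 -> NaN
        return 0.0
    return a * norm1(x / a, p)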
Now, our example from before works :-)
>>> robust_norm(x, p=2)
1e+300
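The large-\(p\) case from earlier is fixed as well (continuing my example):

>>> robust_norm(array([2.0]), p=2000)
2.0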
Remarks
- It appears as if scipy.linalg.norm is robust to overflow, while numpy.linalg.norm is not. Note that scipy.linalg.norm appears to be a bit slower. (A quick check you can run yourself is sketched after this list.)
- The logsumexp trick is nearly identical, but operates in the log-domain, i.e., \(\text{logsumexp}(\log(|\boldsymbol{x}|) \cdot p) / p = \log \|\boldsymbol{x}\|_p\). You can implement both tricks with the same code, if you use different number classes for log-domain and real-domain, a trick you might have seen before. (A sketch of such a number class follows the example below.)
from arsenal.math import logsumexp
from numpy import array, log, abs
>>> p = 2
>>> x = array([1,2,4,5])
>>> logsumexp(log(abs(x)) * p) / p
1.91432069824
>>> log(robust_norm(x, p))
1.91432069824
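First, the promised check of the scipy/numpy remark. The outputs shown are what the remark predicts; exact behavior may vary across numpy and scipy versions, so treat this as something to run on your own install:

>>> from numpy.linalg import norm as np_norm
>>> from scipy.linalg import norm as sp_norm
>>> y = array([1e300])
>>> np_norm(y)    # squares the entries directly, so it can overflow
inf
>>> sp_norm(y)    # rescales internally, so it survives
1e+300

Second, a minimal sketch of the number-class idea. LogReal and generic_norm are hypothetical names of my own (not from arsenal): the class stores each nonnegative value as its log, so the exact same sum-power-root recipe as norm1 computes \(\log \|\boldsymbol{x}\|_p\) stably.

class LogReal:
    "A nonnegative real stored as its log (hypothetical sketch)."
    def __init__(self, log_value):
        self.log_value = log_value
    @classmethod
    def from_real(cls, value):
        return cls(np.log(value))
    def __add__(self, other):
        # real-domain addition = log-domain logaddexp
        return LogReal(np.logaddexp(self.log_value, other.log_value))
    def __pow__(self, p):
        # real-domain power = log-domain scaling
        return LogReal(self.log_value * p)

def generic_norm(xs, p):
    "The same sum-of-powers-then-root recipe as norm1, over + and **."
    total = xs[0] ** p
    for v in xs[1:]:
        total = total + v ** p
    return total ** (1. / p)

>>> xs = [LogReal.from_real(abs(v)) for v in [1, 2, 4, 5]]
>>> generic_norm(xs, p=2).log_value    # log ||x||_2, computed stably
1.91432069824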