Graduate Descent

Multidimensional array index

This is a simple note on how to compute a bijective mapping between the indices of an \(n\)-dimensional array and a flat, one-dimensional array. We'll look at both directions of the mapping: (tuple -> int) and (int -> tuple).

We'll assume each index \(a, b, c, \ldots\) is a nonnegative integer, bounded by its dimension's size: \(a < A, b < B, c < C, \ldots\)

Start small

Let's start by looking at \(n = 3\) and generalize from there.

def index_3(a, A):
    _,J,K = A
    i,j,k = a
    return ((i*J + j)*K + k)

def inverse_3(ix, A):
    _,J,K = A
    total = J*K
    i = ix // total
    ix = ix % total
    total = K
    j = ix // total
    k = ix % total
    return (i,j,k)

Here's our test case:

A,B,C = 3,4,5
key = 0
for a in range(A):
    for b in range(B):
        for c in range(C):
            print((a,b,c), '->', key)
            assert inverse_3(key, (A,B,C)) == (a,b,c)
            assert index_3((a,b,c), (A,B,C)) == key
            key += 1

Note: This is not the only bijective mapping from tuple to int that we could have come up with. The one we chose corresponds to row-major layout, which is apparent from the enumeration order in the test case.

For \(n=4\) the pattern is \(((a \cdot B + b) \cdot C + c) \cdot D + d\).
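Spelled out in code, following the same conventions as index_3 (index_4 is my name for this transcription, not something from the earlier snippets):

```python
def index_4(a, A):
    _,J,K,L = A   # the first bound is unused, as before
    i,j,k,l = a
    return ((i*J + j)*K + k)*L + l

# ((1*3 + 2)*4 + 3)*5 + 4 == 119
assert index_4((1,2,3,4), (2,3,4,5)) == 119
```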

Sidenote: We don't actually need the first bound \(A\) in either index or inverse. This gives us a little extra flexibility because our first dimension can be infinite/unknown.
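To see this concretely, here is index_3 again: the first bound is unpacked into _ and never read, so the first coordinate is free to exceed it and the mapping stays well-defined and unique.

```python
def index_3(a, A):
    _,J,K = A   # first bound ignored
    i,j,k = a
    return (i*J + j)*K + k

# i = 5 exceeds the nominal bound A = 3, but the result simply lands
# past the end of the 3*4*5 block: (5*4 + 0)*5 + 0 == 100.
assert index_3((5,0,0), (3,4,5)) == 100
```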

General case

def index(a, A):
    "Map tuple ``a`` to index with known bounds ``A``."
    # the pattern (Horner-style):
    # ((i*J + j)*K + k)*L + l
    key = a[0]
    for i in range(1, len(A)):
        key = key*A[i] + a[i]
    return key

def inverse(ix, A):
    "Find key given index ``ix`` and bounds ``A``."
    total = 1
    for x in A:
        total *= x
    key = []
    for i in range(len(A)):
        total //= A[i]
        key.append(ix // total)
        ix %= total
    return tuple(key)


Testing the general case

import numpy as np, itertools

def test_layout(D):
    "Test that `index` produces the layout we expect."
    z = [index(d, D) for d in itertools.product(*(range(a) for a in D))]
    assert z == list(range(np.prod(D)))

def test_inverse(key, D):
    got = inverse(index(key, D), D)
    assert tuple(key) == tuple(got)

if __name__ == '__main__':

    test_layout((3, 4, 5))

    test_inverse(key = (1,), D = (10,))
    test_inverse(key = (1,3), D = (2,4))
    test_inverse(key = (3,2,5), D = (10,4,8))
    test_inverse(key = (3,2,5,1), D = (10,4,8,2))
    test_inverse(key = (3,2,5,1,11), D = (10,4,8,2,20))
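As a sanity check, NumPy ships the same row-major mapping as np.ravel_multi_index, with np.unravel_index as its inverse, so we can cross-check one of the cases above:

```python
import numpy as np

D = (10, 4, 8)
key = (3, 2, 5)
ix = np.ravel_multi_index(key, D)   # (3*4 + 2)*8 + 5 == 117
assert ix == 117
assert np.unravel_index(ix, D) == key
```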

Rant against grid search

Grid search is a simple and intuitive algorithm for optimizing and/or exploring the effects of parameters to a function. However, given its rigid definition, grid search is susceptible to degenerate behavior. One type of unfortunate behavior occurs in the presence of unimportant parameters: many (potentially expensive) function evaluations are wasted.

This is a very simple point, but I'll nonetheless illustrate it with a simple example: let's find the argmax of \(f(x,y) = -x^2\). Note that \(f\) doesn't depend on \(y\) at all.

Suppose we search over a \(10\)-by-\(10\) grid, resulting in a total of \(100\) function evaluations. Since \(f\) only depends on \(x\), we expect precision proportional to the number of distinct samples in the \(x\)-dimension, which is only \(10\)! On the other hand, randomly sampling \(100\) points over the same space gives \(100\) distinct samples in every dimension.
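A quick way to see the waste (a sketch, with a made-up search box \([-1,1]^2\)): the grid pays for \(100\) evaluations but only ever tries \(10\) distinct \(x\) values, while \(100\) random points try \(100\):

```python
import numpy as np

rng = np.random.default_rng(0)
G = 10

# 10-by-10 grid over [-1, 1]^2: 100 evaluations, 10 distinct x values.
gx, gy = np.meshgrid(np.linspace(-1, 1, G), np.linspace(-1, 1, G))
grid_x = gx.ravel()

# 100 random points over the same box: every sample is a fresh x value.
rand_x = rng.uniform(-1, 1, size=G * G)

assert len(set(grid_x)) == G
assert len(set(rand_x)) == G * G

# The best f(x, y) = -x**2 the grid can find is capped by the 2/9 spacing:
# the closest grid x to the argmax x = 0 is 1/9.
best_grid = max(-x**2 for x in grid_x)
```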

In other words, randomly sample instead of using a rigid grid. If you want points that are not uniformly spaced, I'm willing to bet that an appropriate probability distribution exists to sample from instead.

This type of problem is common in hyperparameter optimization. For further reading, see Bergstra & Bengio (2012).

Other thoughts:

  1. Local search is often much more effective. For example, gradient-based optimization, Nelder-Mead, stochastic local search, coordinate ascent.

  2. Grid search tends to produce nicer-looking plots.

  3. What about variance in the results? Two things: (a) This is a concern for replicability, but is easily remedied by making sampled parameters available. (b) There is always some probability that the sampling gives you a terrible set of points. This shouldn't be a problem if you use enough samples.
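On point 1, even a crude stochastic local search beats the grid on this example. Here's a minimal sketch; the step size, iteration budget, and accept-if-better rule are my own arbitrary choices, not a prescription:

```python
import random

def local_search(f, x, step=0.5, iters=200, seed=0):
    "Stochastic local search: keep a random perturbation whenever it improves f."
    rng = random.Random(seed)
    best, fbest = list(x), f(*x)
    for _ in range(iters):
        cand = [xi + rng.uniform(-step, step) for xi in best]
        fc = f(*cand)
        if fc > fbest:      # accept only improvements, so fbest is monotone
            best, fbest = cand, fc
    return best, fbest

# Maximize f(x, y) = -x**2 starting from a deliberately bad point.
best, fbest = local_search(lambda x, y: -x**2, [0.8, 0.3])
```

With 200 accepted-if-better perturbations, the search homes in on \(x \approx 0\) far more tightly than the \(1/9\) resolution the \(10\)-by-\(10\) grid can offer.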