# Multidimensional algorithms and iteration

01 Feb 2016  |  Tim Holy

Starting with release 0.4, Julia makes it easy to write elegant and efficient multidimensional algorithms. The new capabilities rest on two foundations: a new type of iterator, called `CartesianRange`, and sophisticated array indexing mechanisms. Before I explain, let me emphasize that developing these capabilities was a collaborative effort, with the bulk of the work done by Matt Bauman (@mbauman), Jutho Haegeman (@Jutho), and myself (@timholy).

These new iterators are deceptively simple, so much so that Ive never been entirely convinced that this blog post is necessary: once you learn a few principles, theres almost nothing to it. However, like many simple concepts, the implications can take a while to sink in. There also seems to be some widespread confusion about the relationship between these iterators and `Base.Cartesian`, which is a completely different (and much more painful) approach to solving the same problem. There are still a few occasions where `Base.Cartesian` is necessary, but for many problems these new capabilities represent a vastly simplified approach.

Lets introduce these iterators with an extension of an example taken from the manual.

# eachindex, CartesianIndex, and CartesianRange

You may already know that, in julia 0.4, there are two recommended ways to iterate over the elements in an `AbstractArray`: if you dont need an index associated with each element, then you can use

``` for a in A # A is an AbstractArray # Code that does something with the element a end ```

If instead you also need the index, then use

```for i in eachindex(A) # Code that does something with i and/or A[i] end ```

In some cases, the first line of this loop expands to ```for i = 1:length(A)```, and `i` is just an integer. However, in other cases, this will expand to the equivalent of

``` for i in CartesianRange(size(A)) # i is now a CartesianIndex # Code that does something with i and/or A[i] end ```

Lets see what these objects are:

``` A = rand(3,2) julia> for i in CartesianRange(size(A)) @show i end i = CartesianIndex{2}((1,1)) i = CartesianIndex{2}((2,1)) i = CartesianIndex{2}((3,1)) i = CartesianIndex{2}((1,2)) i = CartesianIndex{2}((2,2)) i = CartesianIndex{2}((3,2)) ```

A `CartesianIndex{N}` represents an `N`-dimensional index. `CartesianIndex`es are based on tuples, and indeed you can access the underlying tuple with `i.I`. However, they also support certain arithmetic operations, treating their contents like a fixed-size `Vector{Int}`. Since the length is fixed, julia/LLVM can generate very efficient code (without introducing loops) for operations with N-dimensional `CartesianIndex`es.

A `CartesianRange` is just a pair of `CartesianIndex`es, encoding the start and stop values along each dimension, respectively:

``` julia> CartesianRange(size(A)) CartesianRange{CartesianIndex{2}}(CartesianIndex{2}((1,1)),CartesianIndex{2}((3,2))) ```

You can construct these manually: for example,

```julia> CartesianRange(CartesianIndex((-7,0)), CartesianIndex((7,15))) CartesianRange{CartesianIndex{2}}(CartesianIndex{2}((-7,0)),CartesianIndex{2}((7,15))) ```

constructs a range that will loop over `-7:7` along the first dimension and `0:15` along the second.

One reason that `eachindex` is recommended over `for i = 1:length(A)` is that some `AbstractArray`s cannot be indexed efficiently with a linear index; in contrast, a much wider class of objects can be efficiently indexed with a multidimensional iterator. (SubArrays are, generally speaking, a prime example.) `eachindex` is designed to pick the most efficient iterator for the given array type. You can even use

``` for i in eachindex(A, B) ... ```

to increase the likelihood that `i` will be efficient for accessing both `A` and `B`.

As well see below, these iterators have another purpose: independent of whether the underlying arrays have efficient linear indexing, multidimensional iteration can be a powerful ally when writing algorithms. The rest of this blog post will focus on this latter application.

# Writing multidimensional algorithms with CartesianIndex iterators

## A multidimensional boxcar filter

Lets suppose we have a multidimensional array `A`, and we want to compute the moving average over a 3-by-3-by- block around each element. From any given index position, well want to sum over a region offset by `-1:1` along each dimension. Edge positions have to be treated specially, of course, to avoid going beyond the bounds of the array.

In many languages, writing a general (N-dimensional) implementation of this conceptually-simple algorithm is somewhat painful, but in Julia its a piece of cake:

``` function boxcar3(A::AbstractArray) out = similar(A) R = CartesianRange(size(A)) I1, Iend = first(R), last(R) for I in R n, s = 0, zero(eltype(out)) for J in CartesianRange(max(I1, I-I1), min(Iend, I+I1)) s += A[J] n += 1 end out[I] = s/n end out end ```

Lets walk through this line by line:

• `out = similar(A)` allocates the output. In a real implementation, youd want to be a little more careful about the element type of the output (what if the input array element type is `Int`?), but were cutting a few corners here for simplicity.

• `R = CartesianRange(size(A))` creates the iterator for the array, ranging from `CartesianIndex((1, 1, 1, ...))` to `CartesianIndex((size(A,1), size(A,2), size(A,3), ...))`. We dont use `eachindex`, because we cant be sure whether that will return a `CartesianRange` iterator, and here we explicitly need one.

• `I1 = first(R)` and `Iend = last(R)` return the lower (`CartesianIndex((1, 1, 1, ...))`) and upper (`CartesianIndex((size(A,1), size(A,2), size(A,3), ...))`) bounds of the iteration range, respectively. Well use these to ensure that we never access out-of-bounds elements of `A`.

Conveniently, `I1` can also be used to compute the offset range.

• `for I in R`: here we loop over each entry of `A`.

• `n = 0` and `s = zero(eltype(out))` initialize the accumulators. `s` will hold the sum of neighboring values. `n` will hold the number of neighbors used; in most cases, after the loop well have `n == 3^N`, but for edge points the number of valid neighbors will be smaller.

• `for J in CartesianRange(max(I1, I-I1), min(Iend, I+I1))` is probably the most clever line in the algorithm. `I-I1` is a `CartesianIndex` that is lower by 1 along each dimension, and `I+I1` is higher by 1. Therefore, this constructs a range that, for interior points, extends along each coordinate by an offset of 1 in either direction along each dimension.

However, when `I` represents an edge point, either `I-I1` or `I+I1` (or both) might be out-of-bounds. `max(I-I1, I1)` ensures that each coordinate of `J` is 1 or larger, while `min(I+I1, Iend)` ensures that `J[d] <= size(A,d)`.

• The inner loop accumulates the sum in `s` and the number of visited neighbors in `n`.

• Finally, we store the average value in `out[I]`.

Not only is this implementation simple, but it is surprisingly robust: for edge points it computes the average of whatever nearest-neighbors it has available. It even works if `size(A, d) < 3` for some dimension `d`; we dont need any error checking on the size of `A`.

## Computing a reduction

For a second example, consider the implementation of multidimensional reductions. A reduction takes an input array, and returns an array (or scalar) of smaller size. A classic example would be summing along particular dimensions of an array: given a three-dimensional array, you might want to compute the sum along dimension 2, leaving dimensions 1 and 3 intact.

### The core algorithm

An efficient way to write this algorithm requires that the output array, `B`, is pre-allocated by the caller (later well see how one might go about allocating `B` programmatically). For example, if the input `A` is of size `(l,m,n)`, then when summing along just dimension 2 the output `B` would have size `(l,1,n)`.

Given this setup, the implementation is shockingly simple:

```function sumalongdims!(B, A) # It's assumed that B has size 1 along any dimension that we're summing fill!(B, 0) Bmax = CartesianIndex(size(B)) for I in CartesianRange(size(A)) B[min(Bmax,I)] += A[I] end B end ```

The key idea behind this algorithm is encapsulated in the single statement `B[min(Bmax,I)]`. For our three-dimensional example where `A` is of size `(l,m,n)` and `B` is of size `(l,1,n)`, the inner loop is essentially equivalent to

```B[i,1,k] += A[i,j,k] ```

because `min(1,j) = 1`.

### The wrapper, and handling type-instability using function barriers

As a user, you might prefer an interface more like ```sumalongdims(A, dims)``` where `dims` specifies the dimensions you want to sum along. `dims` might be a single integer, like `2` in our example above, or (should you want to sum along multiple dimensions at once) a tuple or `Vector{Int}`. This is indeed the interface used in `sum(A, dims)`; here we want to write our own (somewhat simpler) implementation.

A bare-bones implementation of the wrapper is straightforward:

```function sumalongdims(A, dims) sz = [size(A)...] sz[[dims...]] = 1 B = Array(eltype(A), sz...) sumalongdims!(B, A) end ```

Obviously, this simple implementation skips all relevant error checking. However, here the main point I wish to explore is that the allocation of `B` turns out to be type-unstable: `sz` is a `Vector{Int}`, the length (number of elements) of a specific `Vector{Int}` is not encoded by the type itself, and therefore the dimensionality of `B` cannot be inferred.

Now, we could fix that in several ways, for example by annotating the result:

```B = Array(eltype(A), sz...)::typeof(A) ```

However, this isnt really necessary: in the remainder of this function, `B` is not used for any performance-critical operations. `B` simply gets passed to `sumalongdims!`, and its the job of the compiler to ensure that, given the type of `B`, an efficient version of `sumalongdims!` gets generated. In other words, the type instability of `B`s allocation is prevented from spreading by the fact that `B` is henceforth used only as an argument in a function call. This trick, using a function-call to separate a performance-critical step from a potentially type-unstable precursor, is sometimes referred to as introducing a function barrier.

As a general rule, when writing multidimensional code you should ensure that the main iteration is in a separate function from type-unstable precursors. Even when you take appropriate precautions, theres a potential gotcha: if your inner loop is small, julias ability to inline code might eliminate the intended function barrier, and you get dreadful performance. For this reason, its recommended that you annotate function-barrier callees with `@noinline`:

```@noinline function sumalongdims!(B, A) ... end ```

Of course, in this example theres a second motivation for making this a standalone function: if this calculation is one youre going to repeat many times, re-using the same output array can reduce the amount of memory allocation in your code.

## Filtering along a specified dimension (exploiting multiple indexes)

One final example illustrates an important new point: when you index an array, you can freely mix `CartesianIndex`es and integers. To illustrate this, well write an exponential smoothing filter. An efficient way to implement such filters is to have the smoothed output value `s[i]` depend on a combination of the current input `x[i]` and the previous filtered value `s[i-1]`; in one dimension, you can write this as

```function expfilt1!(s, x, ) 0 < <= 1 || error(" must be between 0 and 1") s[1] = x[1] for i = 2:length(a) s[i] = *x[i] + (1-)*s[i-1] end s end ```

This would result in an approximately-exponential decay with timescale `1/`.

Here, we want to implement this algorithm so that it can be used to exponentially filter an array along any chosen dimension. Once again, the implementation is surprisingly simple:

```function expfiltdim(x, dim::Integer, ) s = similar(x) Rpre = CartesianRange(size(x)[1:dim-1]) Rpost = CartesianRange(size(x)[dim+1:end]) _expfilt!(s, x, , Rpre, size(x, dim), Rpost) end @noinline function _expfilt!(s, x, , Rpre, n, Rpost) for Ipost in Rpost # Initialize the first value along the filtered dimension for Ipre in Rpre s[Ipre, 1, Ipost] = x[Ipre, 1, Ipost] end # Handle all other entries for i = 2:n for Ipre in Rpre s[Ipre, i, Ipost] = *x[Ipre, i, Ipost] + (1-)*s[Ipre, i-1, Ipost] end end end s end ```

Note once again the use of the function barrier technique. In the core algorithm (`_expfilt!`), our strategy is to use two `CartesianIndex` iterators, `Ipre` and `Ipost`, where the first covers dimensions `1:dim-1` and the second `dim+1:ndims(x)`; the filtering dimension `dim` is handled separately by an integer-index `i`. Because the filtering dimension is specified by an integer input, there is no way to infer how many entries will be within each index-tuple `Ipre` and `Ipost`. Hence, we compute the `CartesianRange`s in the type-unstable portion of the algorithm, and then pass them as arguments to the core routine `_expfilt!`.

What makes this implementation possible is the fact that we can index `x` as `x[Ipre, i, Ipost]`. Note that the total number of indexes supplied is `(dim-1) + 1 + (ndims(x)-dim)`, which is just `ndims(x)`. In general, you can supply any combination of integer and `CartesianIndex` indexes when indexing an `AbstractArray` in Julia.

The AxisAlgorithms package makes heavy use of tricks such as these, and in turn provides core support for high-performance packages like Interpolations that require multidimensional computation.

```for Ipost ... for i ... for Ipre ... x[Ipre, i, Ipost] ... ```
so that `x` would be traversed in memory-order.