A research problem I'm working on involves Isomap, a method of dimensionality reduction that requires computing all shortest paths over relatively large graphs. In interpreted languages like Matlab this is often done through the use of the Floyd-Warshall algorithm, a simple $latex O(n^3)$ dynamic programming approach to computing all shortest paths. In the reference Isomap code this algorithm takes only three lines:

for k=1:N
  D = min(D, repmat(D(:,k),[1 N]) + repmat(D(k,:),[N 1]));
end

For languages like Matlab, the benefits of Floyd-Warshall are two-fold. First, the implementation is short and simple. Second, the implementation can be expressed in terms of linear algebra primitives that are highly optimized in such languages: in the snippet above, the repmat, addition, and min operations are all low-level, highly optimized library calls. This means that Floyd-Warshall, despite its asymptotically non-optimal $latex O(n^3)$ runtime, is often the fastest Matlab approach to computing all shortest paths, even on reasonably large problem sizes.
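For reference, the textbook formulation that this vectorized one-liner compresses is a triple loop over intermediate vertices. A minimal sketch in plain Python (the function name and the use of np.inf for missing edges are my own conventions, not from the original code):

```python
import numpy as np

def floyd_warshall_loops(D):
    """Classic triple-loop Floyd-Warshall on a dense distance matrix.

    D[i, j] holds the edge weight from i to j (np.inf if no edge);
    the outer loop allows each vertex k in turn as an intermediate.
    """
    D = D.astype(float).copy()
    n = D.shape[0]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Relax the i -> j distance through intermediate vertex k.
                if D[i, k] + D[k, j] < D[i, j]:
                    D[i, j] = D[i, k] + D[k, j]
    return D
```

Each pass of the vectorized Matlab update collapses the inner i/j loops into a single matrix operation, which is exactly the part an interpreter would otherwise execute slowly.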

In my work, I've reimplemented Isomap in Python (which will soon appear in my algorithms gallery). My implementation of Floyd-Warshall uses NumPy for access to similarly optimized low-level linear algebra routines. Here's my original Python code snippet:

for k in range(n):
    adj = np.minimum(adj, np.add.outer(adj[:,k], adj[k,:]))
return adj

Note that I used an outer sum (np.add.outer) in my code, instead of the repmat approach of the original Isomap implementation. I decided to see if using tile, the NumPy equivalent of repmat, would result in an even faster Floyd-Warshall implementation:

for k in range(n):
    adj = np.minimum(adj, np.tile(adj[:,k].reshape(-1,1), (1,n)) + np.tile(adj[k,:], (n,1)))
return adj
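The two update rules should be numerically identical: np.add.outer(adj[:,k], adj[k,:]) and the pair of tiled matrices both build the same n-by-n matrix of sums adj[i,k] + adj[k,j]. A quick sanity check on a small random matrix (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
adj = rng.random((5, 5))
k = 2

# Outer-sum form: broadcasts the k-th column against the k-th row.
outer = np.add.outer(adj[:, k], adj[k, :])

# repmat/tile form: materializes both replicated matrices explicitly.
tiled = np.tile(adj[:, k].reshape(-1, 1), (1, 5)) + np.tile(adj[k, :], (5, 1))

print(np.array_equal(outer, tiled))
```

One plausible source of the speed gap is allocation: the tile version builds two full n-by-n temporaries before adding them, whereas add.outer (or the broadcasting form `adj[:, k, None] + adj[None, k, :]`) produces the sum matrix directly.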

To test these two approaches, I generated a few random graphs on 1000 nodes and computed all shortest paths several times (to account for variations in CPU usage). To my delight, it turns out that my original implementation ran, on average, about 30 seconds faster (42.8 seconds) than the approach using tile (72.7 seconds). Though more experimentation would be required to make this comparison definitive, I am trying to uncover possible reasons for the performance difference, and in particular why the Matlab code uses repmat instead of outer products (I have yet to compare the performance of different Matlab versions of this algorithm).
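A sketch of the kind of timing comparison described above, using timeit. The function names are mine, the graph is smaller than the 1000-node graphs mentioned in the text so it runs quickly, and absolute numbers will of course vary by machine and NumPy build:

```python
import timeit
import numpy as np

def fw_outer(adj):
    # Floyd-Warshall via outer sums.
    adj = adj.copy()
    for k in range(adj.shape[0]):
        adj = np.minimum(adj, np.add.outer(adj[:, k], adj[k, :]))
    return adj

def fw_tile(adj):
    # Floyd-Warshall via explicit tiling (repmat-style).
    adj = adj.copy()
    n = adj.shape[0]
    for k in range(n):
        adj = np.minimum(adj, np.tile(adj[:, k].reshape(-1, 1), (1, n))
                              + np.tile(adj[k, :], (n, 1)))
    return adj

rng = np.random.default_rng(1)
n = 200  # illustrative size; the experiments in the text used n = 1000
adj = rng.random((n, n))
np.fill_diagonal(adj, 0.0)

t_outer = timeit.timeit(lambda: fw_outer(adj), number=3)
t_tile = timeit.timeit(lambda: fw_tile(adj), number=3)
print(f"outer: {t_outer:.2f}s  tile: {t_tile:.2f}s")
```

Repeating each run (here via `number=3`) smooths out the CPU-usage variation noted above; averaging over several random graphs would smooth out input-dependent effects as well.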