Computing All Shortest Paths in Python
by JS
A research problem I’m working on involves Isomap, a method of dimensionality reduction that requires computing all shortest paths over relatively large graphs. In interpreted languages like Matlab this is often done through the use of the Floyd-Warshall algorithm, a simple dynamic programming approach to computing all shortest paths. In the reference Isomap code this algorithm takes only three lines:
for k=1:N D = min(D,repmat(D(:,k),[1 N])+repmat(D(k,:),[N 1])); end
The benefits of Floyd-Warshall are two-fold for languages like Matlab. The first is that the implementation is short and simple. The second is that the implementation can employ a number of linear algebra primitives that are highly optimized in languages like Matlab. In the code snippet above the repmat, addition, and min operations are all implemented as low-level highly optimized library calls. This means that Floyd-Warshall, despite the non-optimal runtime, is often the fastest Matlab approach to computing all shortest paths even on reasonably large problem sizes.
In my work, I’ve reimplemented Isomap in Python (which will soon appear in my algorithms gallery). My implementation of Floyd-Warshall uses NumPy for access to similarly optimized low-level linear algebra routines. Here’s my original Python code snippet:
for k in range(n): adj = np.minimum( adj, np.add.outer(adj[:,k],adj[k,:]) ) return adj
Note that I used outer products in my code, instead of the repmat approach in the original Isomap implementation. I decided to see if using tile, the NumPy equivalent or repmat, would result in an even faster Floyd-Warshall implementation:
for k in range(n): adj = np.minimum(adj, np.tile(adj[:,k].reshape(-1,1),(1,n)) + np.tile(adj[k,:],(n,1))) return adj
To test these two approaches, I generated a few random graphs on 1000 nodes and computed all shortest paths several times (to account for variations in CPU usage). To my delight, it turns out that my original implementation ran an average of 30 seconds faster (42.8 seconds) than the approach using tile (72.7 seconds). Though more experimentation might be required to make this comparison definitive, I am trying to uncover possible reasons for the performance difference, and in particular why the Matlab code uses repmat instead of outer products (I have yet to compare performance of different Matlab versions of this algorithm).
Comments
I wonder, does this have real world applications? For instance, shortest path would be very helpful in a warehouse/distribution center setting. It would be highly beneficial to be able to determine the shortest route through a warehouse when filling a multiple item order.
This has all sorts of real world applications! What you describe, however, seems like a shortest path problem of a slightly different sort. You want shortest paths that are constrained to include certain way points (the locations of the components of the order). The algorithm above computes the shortest paths between any two points, not any five, ten, or more points. What you describe seems closer to another famous (and much harder!) problem known as the traveling salesman problem: http://en.wikipedia.org/wiki/Travelling_salesman_problem.
[...] compared two Python implementations for computing shortest paths in a previous post. This was all inspired by some Matlab code for computing Isomap that contained the following [...]
We switched to repmat because it made the code run an order of magnitude faster on the MATLAB at the time. The problem with outer products is that it replicates the given vector by repeatedly performing floating-point multiplication by 1, as opposed to simply making copies of the entries.
cheers
Vin
do you have the full set of code? like an example beginning to end?