Algorithm: RANSAC
by JS
RANSAC came up recently in a paper for a reading group I organize. I thought a Python implementation would make a nice addition to the algorithm bazaar that is gradually growing here at depth first search.
RANSAC stands for random sample consensus. The idea is that when you want to fit a model to some data (say, like a line characterized by a slope and intercept to some points), you could try to fit the model to some random subset of the data, and then test to see if the model generalizes well over the entire dataset (modulo any outliers).
Suppose our data was just a bunch of 2-D points (with some noise):
We might try randomly selecting some of the points in order to try to fit a model that explains the points (in this case a line). If we do this enough times, chances are our best model will look pretty good, it’ll explain most of the points and correctly label the anamolous points as outliers:
I think it’s pretty surprising that RANSAC works at all. It is, perhaps, parameter sensistive, but just the idea that a random selection of points can lead to a workable global model of the data is unexpected, and unexpectedly useful.

