Yesterday I spent some time trying to optimize a rather large PCA-type transformation for some images. The task was such that if I wasn’t careful, I’d run out of physical memory and end up using swap space (where the physical disk is used for memory).
I found out that numpy dot products can, for some reason I don’t yet understand, blow up memory usage. So can pickling and/or bz2 file compression. Eventually I found a reasonably memory- and time-efficient approach using Cython + memmap and a few other tricks.
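To give a rough idea of what the memmap side of this can look like, here is a minimal sketch in plain numpy (no Cython). It is not the code from my repository: the function name `project_in_chunks`, its parameters, and the chunk size are made up for illustration. It assumes the projection matrix is small enough to hold in memory, and it multiplies one block of rows at a time, writing each result into a disk-backed array so that no single product has to be as large as the full output.

```python
import numpy as np

def project_in_chunks(data, components, out_path, chunk_rows=1024):
    """Compute data @ components block by block into a disk-backed array.

    data:       (n_rows, n_features) array, in memory or itself a memmap
    components: (n_features, n_components) array, assumed small
    out_path:   file that will back the output (hypothetical name)
    """
    n_rows = data.shape[0]
    n_components = components.shape[1]

    # The output lives in a file; the OS pages it in and out as needed.
    out = np.memmap(out_path, dtype=np.float64, mode="w+",
                    shape=(n_rows, n_components))

    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # Only this block's product exists as a temporary at any one time.
        out[start:stop] = np.asarray(data[start:stop]) @ components

    out.flush()  # make sure everything is written to disk
    return out
```

The chunk size is the knob here: smaller blocks keep temporaries small at the cost of more Python-level loop iterations.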
Ultimately, this entire problem was something of a sideshow. If my data hadn’t fit in memory to start with, I would have considered some sort of sub-sampling approach. But the data did fit, with little to spare, and a careful look at the linear algebra I needed led me to believe I could do the rest in the remaining memory. That made for a compelling optimization problem.
It’s hard to turn away from an interesting problem, even if there are simpler solutions that take less time and avoid the issue entirely.
You can find my updated PCA code here: https://github.com/stober/pca.