depth first search

“We can only see a short distance ahead, but we can see plenty there that needs to be done.”

This Week in Microblogging

  • Avengers was awesome. I think Whedon has finally unlocked the "buy a vineyard" achievement. #
  • That's still a thing that famous directors do, right? #
  • That was the taxi line from hell. #
  • Today I get the chance to vote against all the city council incumbents who allowed the plastic bag ban to proceed a few months ago. #

This Week in Microblogging

  • Is it me, or is the more surprising story the fact that Facebook just gives out usage data like this? http://t.co/p1r1MYNT #
  • Seems like the question, "How many people are using Draw Something?" should be very difficult to answer unless you work at Zynga. #
  • But no, the answer is right there on the internet. Weird. #
  • "The price of entertainment is obedience." http://t.co/kFIsbrnT #

Speed and Memory

In a previous post I talked about some optimization I needed to do to get PCA to run on a corpus of images. In that post, I tended to blur the line between fast and memory-efficient. The relationship between execution speed and memory usage is more complicated than that, but for the specific problem I was running into, swap space usage slowed the computation down. With heavy use of swap space, memory accesses have to go all the way out to disk, so the memory pipeline grows in length. Reducing memory usage shortens that pipeline and has a clear impact on computation speed. In the jargon, this meant turning an IO-bound process back into a CPU-bound process.

There are other possible relationships between speed and memory efficiency. For example, you can trade memory for speed, or speed for memory. Lookup tables of precomputed values, for instance, trade away memory for speed (a lookup is faster than a value recomputation). The reverse, where values are recomputed as necessary, trades CPU cycles for memory.
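To make the trade concrete, here is a minimal Python sketch. The sine table, its one-degree resolution, and the function names are all made up for illustration; whether the lookup actually wins depends on how expensive the recomputation is.

```python
import math

# Trade memory for speed: precompute sin() for every whole degree once.
# The table costs 360 floats of memory, but each later call is just an index.
SIN_TABLE = [math.sin(math.radians(d)) for d in range(360)]


def sin_lookup(degrees: int) -> float:
    """Return sin(degrees) from the precomputed table (memory for speed)."""
    return SIN_TABLE[degrees % 360]


def sin_recompute(degrees: int) -> float:
    """Return sin(degrees) by recomputing it every time (CPU cycles for memory)."""
    return math.sin(math.radians(degrees % 360))
```

Both functions return the same values for whole-degree inputs. The only difference is where the cost lands: the table pays up front in memory, the recomputation pays on every call in CPU time.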

Crypto and Criminal

Every time I read stories like this (http://news.ycombinator.com/item?id=3929507) I can’t help but wonder. I mean, it seems like the technology already exists for simple encrypted friend-2-friend email/chat/voice. If not, I’m sure a smart programmer with basic crypto knowledge and access to existing toolsets could put one together in something like one weekend and poof goes the entire law enforcement wiretap agenda.

But I suppose the issue is not that smart criminals can avoid wiretaps, it’s that dumb criminals can’t. There’s a more disturbing option, which is that criminals aren’t really the target at all…

Friday Home Office Downtempo Jam


Fast PCA in Cython

Yesterday I spent some time trying to optimize a rather large PCA-type transformation for some images. The task was such that if I wasn’t careful, I’d run out of physical memory and end up using swap space (where the physical disk is used for memory).

I found out that numpy dot products can, for some reason I don’t yet understand, blow up the memory usage. So can pickle and/or bz2 file compression. Eventually I found a reasonably memory/time efficient approach using Cython + memmap and a few other tricks.
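A rough sketch of the general idea follows, in plain NumPy rather than Cython. It is not the code in the pca repository linked at the end of this post; it only illustrates keeping the data in a memmap and accumulating the covariance one chunk of rows at a time, so that no full-size temporary array is created. The file name, array shape, chunk size, and function names are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def chunked_covariance(data, chunk_rows=256):
    """Covariance of `data` (n_samples x n_features), accumulated in row chunks."""
    n, d = data.shape
    mean = np.zeros(d)
    for start in range(0, n, chunk_rows):
        mean += data[start:start + chunk_rows].sum(axis=0)
    mean /= n

    cov = np.zeros((d, d))
    for start in range(0, n, chunk_rows):
        # Only this chunk is centered and held in RAM, never the whole array.
        block = data[start:start + chunk_rows] - mean
        cov += block.T @ block
    return mean, cov / (n - 1)


def demo():
    """Build a small disk-backed array and project a few rows onto 2 components."""
    rng = np.random.default_rng(0)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "images.dat")  # hypothetical file name
        data = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000, 8))
        data[:] = rng.standard_normal((1000, 8))
        data.flush()

        mean, cov = chunked_covariance(data)
        # eigh returns eigenvalues in ascending order, so reverse the columns
        # to put the directions of largest variance first.
        _, eigvecs = np.linalg.eigh(cov)
        components = eigvecs[:, ::-1][:, :2]
        projected = (np.asarray(data[:5]) - mean) @ components
        del data  # release the mapping before the temp directory is removed
    return cov, projected
```

The d x d covariance matrix is the only large intermediate, so this works best when the number of features is much smaller than the number of samples. For very wide image data a different decomposition would be needed.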

Ultimately, this entire problem was sort of a side show. If my data hadn’t fit in memory to start with (it did, with little to spare), I would have considered some sort of sub-sampling approach. But the data did fit, and a careful look at the linear algebra led me to believe I could do what I needed in the remaining memory. That made for a compelling optimization problem.

It’s hard to turn away from an interesting problem, even if there are simpler solutions that take less time and avoid the issue entirely.

You can find my updated PCA code here: https://github.com/stober/pca.

This Week in Microblogging

  • Not a bad Tuesday night. http://t.co/ARrvOKSH #
  • Scratch that. Blues is the only cure for bitterness. #
  • I really need a good grammar checker for email to catch when I accidentally a word. #
  • So this is neat: http://t.co/GHwoqGzS (ht @kottke) #
  • When domain name transfers go bad, they go really really bad. #
  • whois information from GoDaddy for my .bz domain is not propagating to any other whois servers. #
  • After reading more than I want to about whois infrastructure, I'm not sure whois propagation is required. #
  • But other .bz domain registrars have domains with whois information that propagates just fine. #
  • I suspect "brogrammer" stories proliferate because people really like using the word brogrammer. #
  • Does every gender related issue need a futzy new name? #
  • Let's trivialize this problem while driving up page views by creating a cheeky new portmanteau! #

This Week in Microblogging

  • Today I learned about the TEXINPUTS environment variable. Living the dream. #
  • I'm not sure what to think of this controversy. There hasn't been a Metafilter thread yet. #
  • Fatalism is the only cure for bitterness. #

This Week in Microblogging

  • I always think of great things to say about 15 minutes after a conversation ends. #
  • Amazing how much a decent chair can improve focus and productivity. #
  • Also, you can get by with a cheap new chair as long as your previous chair was old and really cheap. Relative comfort matters! #
  • When you really get into it, you realize that Python package and module management is a complete fucking mess. #
  • At least it's not Scala. #
  • I'm not sure what's worse, that there exists a whitespace-mode in Emacs or the fact that I was in desperate need of it. #
  • Okay, I'll admit that Cython is pretty neat. #
  • Longhorn run! #

Quote of the Day

Texting is no excuse for not being prepared to see a bear right in front of you. There is no excuse. I am bear-ready 100% of the time. On a rollercoaster at Disneyland, you still need to be thinking, “What will I do, if there is suddenly a bear?”