Quote of the Day
by JS
I mentioned Developing Intelligence the other day, so I thought I’d quote from there and hopefully spread a little knowledge around:
Anyway, the Bennett point is simple: when you run a large number of statistical tests simultaneously, even on a random dataset, you’re bound to find some percentage of tests that turn up “significant” just as a result of chance, and with some probability those significant results will randomly cluster together in 3D space. If one fails to correct the significance threshold for the large number of statistical tests performed, then you get unreliable results, even if you only consider those significant results that cluster in 3D space.
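The effect described in the quote is easy to reproduce. Below is a minimal sketch, not the analysis from the Bennett poster: it runs many two-sample tests on pure noise and counts how many come out "significant" with and without a Bonferroni correction. The function names, the sample sizes, and the choice of a simple z-test with known variance are all assumptions made for illustration.

```python
import math
import random


def two_sided_z_test_p(sample_a, sample_b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known
    standard deviation `sigma` in both groups (illustrative helper)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    std_err = sigma * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = mean_diff / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_null_tests(num_tests=10_000, n=20, alpha=0.05, seed=0):
    """Run `num_tests` tests where both groups are drawn from the same
    distribution, so every 'significant' result is a false positive."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(num_tests):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p_values.append(two_sided_z_test_p(group_a, group_b))
    # Uncorrected: compare each p-value to alpha directly.
    uncorrected = sum(p < alpha for p in p_values)
    # Bonferroni: divide alpha by the number of tests performed.
    bonferroni = sum(p < alpha / num_tests for p in p_values)
    return uncorrected, bonferroni


if __name__ == "__main__":
    uncorrected, bonferroni = simulate_null_tests()
    print("significant without correction:", uncorrected)
    print("significant with Bonferroni:   ", bonferroni)
```

With the default settings, roughly 5% of the 10,000 tests should pass the uncorrected threshold even though there is no real effect anywhere in the data, while the corrected threshold should let through few or none. The sketch does not model the spatial clustering mentioned in the quote.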
This quote refers to a humorous poster by Bennett et al. about zombie fish. If you want to read more about this and other common statistical mistakes, take a look at the notes here. Part III includes a discussion of multiple inference (sorry, no zombie fish in the notes).

Comments
I was just talking with a statistician about this yesterday. To me, it seems arbitrary to correct the significance threshold within a single paper, a single body of work, or any other boundary. If I run 40 tests, yeah, some are going to be significant just by chance. But for any individual test, it shouldn’t matter that I ran 40 others. If I run just one test and 39 groups at other places run the other 39, what’s the difference?
I guess it comes down to the question you are asking. If you are asking “is there a significant result somewhere in this data?”, then it makes sense to correct for the number of tests you are running.
Well, the problem is that without proper corrections to p-values, the significant results in your data may be of the “zombie fish” variety.
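To put a rough number on the 40-test example above, here is a small sketch of the chance of seeing at least one false positive. It assumes the tests are independent and that no real effect exists; the function names are hypothetical.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across `num_tests`
    independent tests, each run at threshold `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test threshold that keeps the overall rate at or below `alpha`."""
    return alpha / num_tests


if __name__ == "__main__":
    print(familywise_error_rate(40))                            # about 0.87
    print(bonferroni_threshold(40))                             # 0.00125
    print(familywise_error_rate(40, bonferroni_threshold(40)))  # just under 0.05
```

Under these assumptions, 40 uncorrected tests at the 0.05 level produce at least one spurious "significant" result about 87% of the time, whether one lab runs all 40 or 40 labs run one each. That is why the answer depends on whether the question is about one pre-specified test or about any result in the whole set.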