Archive for the “math” Category

I’ve been reading up on graph cuts methods for low-level vision. These are methods that, when properly applied, can de-noise an image as was done in the original paper:

D. M. Greig, B. T. Porteous, and A. H. Seheult, “Exact maximum a posteriori estimation for binary images,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 271-279, 1989.

I was initially interested because I wanted to know whether there was a general connection between graph cut problems and MAP estimation (which is the fancy thing you try to do when you want to uncover the true image from a noisy one). I’m not the only one who has asked this question, as the following paper is focused on precisely defining the applicability of graph cut methods in estimation problems:

V. Kolmogorov and R. Zabin, “What energy functions can be minimized via graph cuts?” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 147-159, 2004.

I find myself somewhat unsatisfied by results that use a clever reduction from one problem (like MAP estimation in image de-noising) to another problem (like finding minimal graph cuts) where the ‘trick’ sort of happens by happy accident, and not something deeper. When this happens in pure mathematics, an awful lot of work seems to go into elucidating the nature of possible connections, searching for some deeper theory, etc. I can’t be really precise here, but I imagine something like what happened with Moonshine theory.

In general, I think it is too much to expect all the tricks of the trade of applied fields to fit into particularly elegant theories. The upside, I suppose, is that with a nice applied result, you can actually do something you could not do before.

Comments No Comments »

Entropy has always struck me as a somewhat puzzling concept. The definition is simple enough

 H(X) = - \sum_i p_i \log p_i

but raises all kinds of puzzling questions. What is with the minus sign? Why use the logarithm? Imagine for a moment that you do not know anything about logarithms, and you were trying to understand the definition above

 H(X) = - \sum_i p_i f(p_i)

where  f \equiv \log . Of course, we still have this mysterious minus sign, so lets fold that into our mystery function  g \equiv - f . Now

 H(X) = \sum_i p_i g(p_i)

looks like a weighted average of some mystery function  g .

Let’s go one step further and rewrite this definition in terms of outcomes where P(X = w_i) = p_i and h(w_i) \equiv g(P(X=w_i)) is a mystery function of the outcomes w_i of X. Then we have

 H(X) = \sum_i P(X = w_i) h(w_i).

The weights, quite naturally, are the probabilities of various outcomes. Compare this, for instance, with the expected value

E(X) = \sum_i P(X = w_i) w_i .

Now we need to introduce some intuition for entropy. We want to capture a certain common sense notion of “disorder.” In the language of probability theory, a random variable is maximally disordered if all outcomes are equally likely.

But if what if all outcomes are not equally likely? Some outcomes may be surprising (because they have low probability). Some outcomes may be unsurprising (because they have high probability). This notion of “surprise” is precisely what the mystery function is trying to measure. And entropy is the weighted average of surprise.

So, here are some properties we’d like our surprise function to have:

  • h(w_i) should be large when p_i (the probability of w_i) is small.
  • h(w_i) should be small when p_i (the probability of w_i) is large.

We can probably think of a number of functions  h that have these two properties. It turns out that we need to satisfy another property as well.

  • Suppose w_i is an event that is composed of two mutual independent events a_i and b_i. Then we’d like h(w_i) = h(a_i) + h(b_i).

Intuitively, an event should be as surprising as the sum of the surprise of any sub events. Consider the following function of an event w_i

h(w_i) = \frac{1}{P(X = w_i)}.

For large  P(X = w_i) , h(w_i) is small. Similarly, for small  P(X = w_i) , h(w_i) is large. Unfortunately, this function does not satisfy property 3 since for mutually independent sub-events, we have

h(w_i) =  \frac{1}{P(X = a_i)P(X=b_i)} \ne   \frac{1}{P(X = a_i)} + \frac{1}{P(X=b_i)}.

By using the logarithm we can define a function h that satisfies all the properties we want, including the summation property:

h(w_i) =  \log(\frac{1}{P(X = a_i)P(X=b_i)}) =   \log(\frac{1}{P(X = a_i)}) + \log(\frac{1}{P(X=b_i)}).

Now observe that

h(w_i) = \log(\frac{1}{P(X = w_i)}) = - \log(P(X = w_i)) = -\log(p_i)

leading to the definition of entropy that we started with!

Comments 1 Comment »

Now, I wish that was all I needed to say. But, the point rather cogently made by the comic above is that that isn’t good enough, that an unexamined “it just happened that we were all men” is never good enough. All of us, men and women both need to examine whether they’re being that guy in the comic. A lot of people seem to have picked up the idea that if you didn’t mean to be sexist (racist, etc.), then you weren’t. That is, that sexism is a deontological problem, not a teleological problem. As long as they don’t catch themselves being sexist, they are safe.

The quote references this comic.

Comments No Comments »

I just watched an online presentation of SAGE, a nifty toolkit for doing mathematics. One of the really neat things about SAGE is that it glues together a ton of existing open source packages. We’re beginning to transition into the stage of software engineering where software development starts (in the minds of many futurists) to resemble archeology — digging up ancient secrets and putting them to work in new ways.

Comments No Comments »

I am in fact a scorpio.

Comments No Comments »

I’m clearing out my draft posts, without actually trying to flesh them out. Anyway, here’s some questions I’m thinking about. As you may be able to infer, I’m trying to teach myself statistics.


Natural conjugate priors – prior has the same functional form as the likelihood. Is there a category theoretical explanation of “natural” in this context? Things I’m trying to understand: exponential families, sufficient statistic, natural conjugate prior.Some websites:

OCW 1
OCW 2
OCW 3
OCW 4
OCW 5
OCW 6

Comments No Comments »

I’m happy about this post because I understood it, as opposed to say, most of the stuff at n-Category Cafe. I’m especially happy because at one time or another, I knew the statement of, and the corresponding proof of:

1. The Hairy Ball Theorem
2. Cauchy Integral Theorem
3. Nash Equilibrium Theorem

I just hope it’s not a problem that I don’t exactly remember the details anymore.

Random Thought: I’m skeptical of category theory mainly because I’m not entirely sure what advantage we derive from theory encoded in categorical language over theory as classically presented. I’m worried that the difference is only aesthetic. What’s worse, category theorists don’t seem to be able to explain the advantages in anything other than categorical language. Though this comes close.

Comments No Comments »

I was a math major in college. Though I’m sure the experience was beneficial, upon reflection I can’t identify how any content I learned in the math classes I took applies to computer science. I’m using a lot of Bayesian inference stuff right now, and PCA, but these two topics never came up in the any of the math courses I took, though linear algebra did cover Eigenvalues and Eigenvectors.

I recently read a book on category theory and computer science. The categorical view of computation is somewhat interesting, but category theory in general is much more complicated than the small set of categories relevant to computer science. In some sense, I’m still searching for uses of non-elementary mathematics in computer science.

An essential cornerstone of undergraduate mathematics is the study of abstract algebra. No undergraduate mathematics curriculum is complete without a good introduction to groups, rings and fields, up to at least Galois theory. Groups in particular, are supposed to be very general algebraic objects.

So my question to you, are there any topics in computer science that benefit from group theory?

I have a semi-answer, Polya’s Theorem. This is a theorem in combinatorics that depends on group actions, both in its statement and proof. It seems highly relevant to computer science, being such a pivotal counting result. And yet I can’t actually think of a specific use for it in computer science.

I guess I’m just wondering whether I’ve been wasting my time trying to learn mathematics.

UPDATE: Turing’s proof that the halting problem is not computable implies Godel’s incompleteness theorem. And Curry-Howard correspondence links type systems with constructive logic, which shows that certain areas of logic are best thought of computationally. I guess I’m interested in applications of group theory in part because it’s a step or two removed from logic.

Comments No Comments »

How can you have inverses but still have zero divisors, you may wonder? It’s because nonassociativity is bad juju.

From a comment on this.

Then there’s the following from the same comment thread:

There are illegal primes, but being prime numbers, they are inevitably real, positive, rational, integral, natural and wholesome.

A goldmine!

And here’s a spurious link to a comment on that thread that I might want to write about later.

Comments No Comments »