Entropy
by JS
Entropy has always struck me as a somewhat puzzling concept. The definition is simple enough
$$H(X) = -\sum_i p_i \log p_i$$
but raises all kinds of questions. What is with the minus sign? Why use the logarithm? Imagine for a moment that you do not know anything about logarithms, and you were trying to understand the definition above written as
$$H(X) = -\sum_i p_i f(p_i)$$
where $f$ is some mystery function. Of course, we still have this mysterious minus sign, so let's fold that into our mystery function: $g(p_i) = -f(p_i)$. Now
$$H(X) = \sum_i p_i g(p_i)$$
looks like a weighted average of some mystery function $g$.
Let’s go one step further and rewrite this definition in terms of outcomes $x_i$, where $p_i = p(x_i)$ is the probability of outcome $x_i$ and $s$ is a mystery function of the outcomes $x_i$ of $X$. Then we have
$$H(X) = \sum_i p(x_i)\, s(x_i).$$
The weights, quite naturally, are the probabilities of various outcomes. Compare this, for instance, with the expected value
$$E[X] = \sum_i p(x_i)\, x_i.$$
Now we need to introduce some intuition for entropy. We want to capture a certain common sense notion of “disorder.” In the language of probability theory, a random variable is maximally disordered if all outcomes are equally likely.
But what if all outcomes are not equally likely? Some outcomes may be surprising (because they have low probability). Some outcomes may be unsurprising (because they have high probability). This notion of “surprise” is precisely what the mystery function is trying to measure. And entropy is the weighted average of surprise.
So, here are some properties we’d like our surprise function to have:
- $s(x)$ should be large when $p(x)$ (the probability of $x$) is small.
- $s(x)$ should be small when $p(x)$ (the probability of $x$) is large.
We can probably think of a number of functions that have these two properties. It turns out that we need to satisfy another property as well.
- Suppose $x$ is an event that is composed of two mutually independent events $a$ and $b$. Then we’d like $s(x) = s(a) + s(b)$.
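Intuitively, an event should be as surprising as the sum of the surprises of its independent sub-events. Consider the following function of an event $x$:
$$s(x) = \frac{1}{p(x)}.$$
For large $p(x)$, $s(x)$ is small. Similarly, for small $p(x)$, $s(x)$ is large. Unfortunately, this function does not satisfy property 3, since for mutually independent sub-events $a$ and $b$ of $x$ we have $p(x) = p(a)\,p(b)$, and so
$$s(x) = \frac{1}{p(a)\,p(b)} = s(a)\, s(b),$$
which is a product rather than a sum.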
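To make the failure concrete, here is a small sketch in Python. The coin-and-die setup and the name `inverse_surprise` are my own assumptions for illustration, not part of the argument above. It takes two independent events, a fair coin landing heads and a fair die showing a six, and compares the reciprocal-probability surprise of the joint event with the sum of the individual surprises.

```python
from fractions import Fraction


def inverse_surprise(p):
    """Candidate surprise function 1/p (hypothetical name, for illustration)."""
    return 1 / p


# Two independent events: a fair coin lands heads, a fair die shows a six.
p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)

# Independence means the joint probability is the product.
p_both = p_heads * p_six

joint = inverse_surprise(p_both)
summed = inverse_surprise(p_heads) + inverse_surprise(p_six)
product = inverse_surprise(p_heads) * inverse_surprise(p_six)

print("surprise of joint event:", joint)
print("sum of surprises:       ", summed)
print("product of surprises:   ", product)
```

The joint surprise matches the product of the individual surprises, not their sum, which is exactly why this candidate fails the additivity property.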
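By using the logarithm we can define a function that satisfies all the properties we want, including the summation property:
$$s(x) = \log \frac{1}{p(x)} = -\log p(x).$$
Now observe that
$$H(X) = \sum_i p(x_i)\, s(x_i) = -\sum_i p(x_i) \log p(x_i),$$
leading to the definition of entropy that we started with!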
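As a quick sanity check, the following Python sketch computes entropy literally as the probability-weighted average of surprise. The base-2 logarithm, the function names, and the two example distributions are assumptions I picked for illustration, since the post does not fix a base. It also checks the summation property on a coin-and-die example and compares a uniform distribution with a skewed one.

```python
import math


def surprise(p):
    """Surprise of an outcome with probability p, here taken as -log2(p)."""
    return -math.log2(p)


def entropy(probs):
    """Entropy as the probability-weighted average of surprise.

    Outcomes with zero probability are skipped, since they contribute nothing.
    """
    return sum(p * surprise(p) for p in probs if p > 0)


# Summation property: for independent events the surprises add.
p_heads, p_six = 1 / 2, 1 / 6
additive = math.isclose(
    surprise(p_heads * p_six), surprise(p_heads) + surprise(p_six)
)
print("surprise is additive:", additive)

# A uniform distribution should be more "disordered" than a skewed one.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
h_uniform = entropy(uniform)
h_skewed = entropy(skewed)
print("entropy of uniform:", h_uniform)
print("entropy of skewed: ", h_skewed)
```

With four equally likely outcomes every outcome carries the same surprise, so the average is just that common value. The skewed distribution spends most of its weight on an unsurprising outcome, which pulls the average down.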

Comments
Using M’s computer. I think it’s novel (and surprising) to see entropy defined as the amount of (-surprise). Am I right that high entropy is not surprising and low entropy is?
Anyway, the novelty (surprise) is great, especially when the result leads to the low entropy result of the existing equation!
- D