On the Ugly Duckling Theorem

by JS

While reading Pattern Classification by Duda, Hart, and Stork, I came across what is known as the Ugly Duckling Theorem. The theorem is simple to state.

Insofar as we use a finite set of predicates that are capable of distinguishing any two objects considered, the number of predicates shared by any two such objects is constant, independent of the choice of the two objects.

You can think of the term predicate as any kind of yes or no question we may ask about the object in order to determine that object’s place in a hierarchy. For this discussion you may find it useful to keep in mind an example hierarchy, such as the biological hierarchy. If we naively try to determine how related two animals are, we might ask certain kinds of yes or no questions of each and compare the results. For example, is the animal warm blooded? Does the animal live on land?

[Aside: Biological classification is much more complex than the cartoonish example I'm putting forward as an illustrative example of the use of predicates in determining similarity. The Ugly Duckling Theorem, however, applies to any grouping scheme, including the complex evolutionary analysis that informs our modern understanding of the classification of living organisms. ]

Classifying objects into categories is a central problem in machine learning. Many of the algorithms I’ve explored on this blog were designed to make various facets of that process tractable on computers. I think it is worthwhile to explore some of the limits of what we should expect machine learning algorithms to do. For example, can we consider a simple approach to determining similarity based on predicate knowledge?

The somewhat surprising answer is no. For any finite set of predicates, any two objects will agree on a constant number of such predicates. The tricky observation that Watanabe (this theory’s originator) makes is that we also need to consider all logical relations of predicates. That is, if we use predicates A and B to determine the similarity between two objects, we ought to consider predicates constructed out of logical relations among A and B. Taking all of these together, we find that any two objects will agree on the same number of these predicates. In other words, every object is equidistant from every other object.

Clearly, we are able to measure similarity between objects, and we have machine learning algorithms that are also able to infer distinctions based on similarity measures. The way around the Ugly Duckling Theorem is to weight propositions differently. Not all propositions for comparing objects are created equal. Intuitively, the difference between plants and animals is “larger” than the difference between warm and cold-blooded animals.

One way to see how the number of shared predicates might be object independent is to consider some predicates, Red and Black, and two objects, one red and one black. Now we might intuit that these objects do not share any predicates, but in fact they do. They share the predicate “Red or Black.” It may happen that “Red or Black” is not a predicate that we would ever consider important, but its lack of importance is a property we place on the predicate, and prevents the sort of predicate counting approach that the Ugly Duckling Theorem rules out.

I still have some questions I’m trying to work through with regards to this theorem. For example, it seems like using all prepositions to determine similarity is like trying to learn a concept class consisting of all Boolean formula in N variables. I suspect we can draw some sort of equivalence between the Ugly Duckling Theorem and the idea that you have to restrict the space of possible hypotheses when trying to learn a classification function, even though the theorem applies to predicates, whereas hypothesis classes are essentially spaces of functions.

The other issue is how a weighting of propositions comes about. For us, does evolution or development provide our intrinsic propositional biases? It seems like effective similarities arise because they are useful, not because they are general. Can we capture this in a kind of improved version of the Ugly Duckling Theorem?

  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • Tumblr
  • Twitter