Unsupervised Classifiers, Mutual Information and 'Phantom Targets'

Part of Advances in Neural Information Processing Systems 4 (NIPS 1991)


John Bridle, Anthony Heading, David MacKay


David J.C. MacKay

California Institute of Technology 139-74

Pasadena CA 91125 U.S.A

We derive criteria for training adaptive classifier networks to perform unsupervised data analysis. The first criterion turns a simple Gaussian classifier into a simple Gaussian mixture analyser. The second criterion, which is much more generally applicable, is based on mutual information. It simplifies to an intuitively reasonable difference between two entropy functions, one encouraging 'decisiveness,' the other 'fairness' to the alternative interpretations of the input. This 'firm but fair' criterion can be applied to any network that produces probability-type outputs, but it does not necessarily lead to useful behavior.

1 Unsupervised Classification

One of the main distinctions made in discussing neural network architectures, and pattern analysis algorithms generally, is between supervised and unsupervised data analysis. We should therefore be interested in any method of building bridges between techniques in these two categories. For instance, it is possible to use an unsupervised system such as a Boltzmann machine to learn the joint distribution of inputs and a teacher's classification labels. The particular type of bridge we seek is a method of taking a supervised pattern classifier and turning it into an unsupervised data analyser. That is, we are interested in methods of "bootstrapping" classifiers.

Consider a classifier system. Its input is a vector x, and the output is a probability vector y(x). (That is, the elements of y are positive and sum to 1.) The elements of y, (y_i(x), i = 1 ... N_c), are to be taken as the probabilities that x should be assigned to each of N_c classes. (Note that our definition of classifier does not include a decision process.)
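
To make this definition concrete, the following is a minimal sketch of one system that fits it. It assumes linear scores passed through a softmax, which is only one of many ways to obtain a probability vector; the text above does not prescribe it. The names softmax, classify, weights and biases, and all numerical values, are hypothetical and chosen for illustration only.

```python
import math

def softmax(scores):
    """Map real-valued scores to a probability vector (positive, sums to 1)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(x, weights, biases):
    """Return y(x): one probability per class, with no decision step.

    Assumption: class scores are linear in x. Any other scoring
    function could be substituted without changing the interface.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical parameters for N_c = 3 classes and a 2-dimensional input.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]

y = classify([0.5, -0.2], weights, biases)
```

Here y is the whole output of the classifier. Choosing a single class, for example by taking the largest element of y, would be a separate decision process and is deliberately left out, in line with the definition above.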