Unsupervised learning of distributions on binary vectors using two layer networks

Part of Advances in Neural Information Processing Systems 4 (NIPS 1991)

Bibtex Metadata Paper


Yoav Freund, David Haussler


We study a particular type of Boltzmann machine with a bipartite graph structure called a harmo(cid:173) nium. Our interest is in using such a machine to model a probability distribution on binary input vectors. We analyze the class of probability distributions that can be modeled by such machines. showing that for each n ~ 1 this class includes arbitrarily good appwximations to any distribution on the set of all n-vectors of binary inputs. We then present two learning algorithms for these machines .. The first learning algorithm is the standard gradient ascent heuristic for computing maximum likelihood estimates for the parameters (i.e. weights and thresholds) of the modeL Here we give a closed form for this gradient that is significantly easier to compute than the corresponding gradient for the general Boltzmann machine . The second learning algorithm is a greedy method that creates the hidden units and computes their weights one at a time. This method is a variant of the standard method for projection pursuit density estimation . We give experimental results for these learning methods on synthetic data and natural data from the domain of handwritten digits.