Large Scale Bayes Point Machines

Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)



Ralf Herbrich, Thore Graepel


The concept of averaging over classifiers is fundamental to the Bayesian analysis of learning. Based on this viewpoint, it has recently been demonstrated for linear classifiers that the centre of mass of version space (the set of all classifiers consistent with the training set) - also known as the Bayes point - exhibits excellent generalisation abilities. However, the billiard algorithm as presented in [4] is restricted to small sample sizes because it requires O(m²) memory and O(N·m²) computational steps, where m is the number of training patterns and N is the number of random draws from the posterior distribution. In this paper we present a method based on the simple perceptron learning algorithm which allows us to overcome this algorithmic drawback. The method is algorithmically simple and is easily extended to the multi-class case. We present experimental results on the MNIST data set of handwritten digits which show that Bayes point machines (BPMs) are competitive with the current world champion, the support vector machine. In addition, the computational complexity of BPMs can be tuned by varying the number of samples from the posterior. Finally, rejecting test points on the basis of their (approximate) posterior probability leads to a rapid decrease in generalisation error, e.g. 0.1% generalisation error for a given rejection rate of 10%.
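The perceptron-based idea described in the abstract can be illustrated with a minimal sketch: each random permutation of a linearly separable training set drives the perceptron to a (generally different) classifier consistent with the data, i.e. a rough sample from version space, and averaging the normalised solutions approximates its centre of mass. This is an illustrative reconstruction, not the authors' exact procedure; the function names and parameters (`perceptron`, `bayes_point_estimate`, `n_samples`, `max_epochs`) are assumptions for the sketch, and it uses plain vectors rather than the kernelised, multi-class form discussed in the paper.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    # Classical perceptron on one ordering of the data; returns a
    # weight vector consistent with the training set (assumes the
    # data are linearly separable so the loop terminates).
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            break
    return w

def bayes_point_estimate(X, y, n_samples=50, seed=0):
    # Approximate the Bayes point (centre of mass of version space)
    # by averaging N perceptron solutions, each obtained from a
    # fresh random permutation of the training set.
    rng = np.random.default_rng(seed)
    ws = []
    for _ in range(n_samples):
        perm = rng.permutation(len(y))
        w = perceptron(X[perm], y[perm])
        ws.append(w / np.linalg.norm(w))  # normalise before averaging
    return np.mean(ws, axis=0)
```

Because version space is a convex cone, the average of consistent (normalised) weight vectors is itself consistent with the training set, while typically generalising better than any single perceptron run; memory usage stays linear in m, in contrast to the O(m²) billiard algorithm.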