Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Anita Faul, Michael Tipping
The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model.
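A sketch of the stated result, assuming the paper's standard notation: \(\alpha_i\) is the hyperparameter governing the weight of basis function \(\phi_i\), and \(s_i\), \(q_i\) are the derived 'sparsity' and 'quality' quantities computed with \(\phi_i\) excluded from the model. The forms below are a reconstruction of the closed-form maximum the abstract refers to, not quoted from it.

```latex
% Sketch, assuming the paper's notation: the log marginal likelihood
% separates into a term independent of alpha_i plus a term depending
% only on alpha_i through s_i ('sparsity') and q_i ('quality').
\[
  \mathcal{L}(\boldsymbol{\alpha}) \;=\; \mathcal{L}(\boldsymbol{\alpha}_{-i})
    \;+\; \tfrac{1}{2}\!\left[\,\log \alpha_i \;-\; \log(\alpha_i + s_i)
    \;+\; \frac{q_i^2}{\alpha_i + s_i}\,\right].
\]
% Conditioned on all other hyperparameters, this term has a unique
% maximum in alpha_i, available in closed form:
\[
  \alpha_i \;=\;
  \begin{cases}
    \dfrac{s_i^2}{\,q_i^2 - s_i\,}, & \text{if } q_i^2 > s_i
      \quad\text{(finite maximum; } \phi_i \text{ retained)},\\[10pt]
    \infty, & \text{otherwise (maximum at infinity; } \phi_i \text{ pruned)}.
  \end{cases}
\]
```

On this reading, the 'sparsity criterion' \(q_i^2 \le s_i\) is the condition under which the conditional maximum lies at \(\alpha_i = \infty\), which drives the corresponding weight exactly to zero and so is equivalent to pruning that basis function from the model.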