Bayesian Averaging is Well-Temperated

Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)



Lars Hansen


Bayesian predictions are stochastic, just like the predictions of any other inference scheme that generalizes from a finite sample. While a simple variational argument shows that Bayes averaging is generalization optimal given that the prior matches the teacher parameter distribution, the situation is less clear if the teacher distribution is unknown. I define a class of averaging procedures, the temperated likelihoods, including both Bayes averaging with a uniform prior and maximum likelihood estimation as special cases. I show that Bayes is generalization optimal in this family for any teacher distribution for two learning problems that are analytically tractable: learning the mean of a Gaussian and asymptotics of smooth learners.
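
The Gaussian-mean case can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the family is taken to be indexed by an exponent `beta` applied to the likelihood (so `beta = 1` gives Bayes averaging with a uniform prior and `beta` going to infinity gives maximum likelihood), the noise variance `sigma2` is known, and generalization error is measured as the expected negative log predictive density. The function names are hypothetical. Under these assumptions the predictive distribution is Gaussian with variance `sigma2 * (1 + 1 / (beta * n))`, and the expected error has a closed form that does not depend on the teacher mean.

```python
import math


def generalization_error(beta, n, sigma2=1.0):
    """Expected negative log predictive density for tempered averaging.

    Assumptions (not taken from the abstract): Gaussian data with known
    variance sigma2, a uniform prior on the mean, and a likelihood raised
    to the power beta before averaging.
    """
    # Predictive variance: noise variance plus the variance of the
    # tempered posterior over the mean, sigma2 / (beta * n).
    s2 = sigma2 * (1.0 + 1.0 / (beta * n))
    # Expected squared distance between a fresh test point and the sample
    # mean, averaged over test points and training sets of size n.
    expected_sq = sigma2 * (1.0 + 1.0 / n)
    return 0.5 * math.log(2.0 * math.pi * s2) + expected_sq / (2.0 * s2)


def best_beta(n, betas):
    """Return the candidate exponent with the lowest expected error."""
    return min(betas, key=lambda b: generalization_error(b, n))


if __name__ == "__main__":
    candidates = [0.25, 0.5, 1.0, 2.0, 4.0, math.inf]  # math.inf: ML limit
    for b in candidates:
        print(b, generalization_error(b, n=10))
    print("best:", best_beta(10, candidates))
```

In this sketch the error is minimized when the predictive variance equals `sigma2 * (1 + 1 / n)`, which happens at `beta = 1`, in line with the claim that Bayes averaging is optimal within the family for this problem.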