From PAC-Bayes Bounds to KL Regularization

Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)

Bibtex Metadata Paper


Pascal Germain, Alexandre Lacasse, Mario Marchand, Sara Shanian, Fran├žois Laviolette


We show that convex KL-regularized objective functions are obtained from a PAC-Bayes risk bound when using convex loss functions for the stochastic Gibbs classifier that upper-bound the standard zero-one loss used for the weighted majority vote. By restricting ourselves to a class of posteriors, that we call quasi uniform, we propose a simple coordinate descent learning algorithm to minimize the proposed KL-regularized cost function. We show that standard ellp-regularized objective functions currently used, such as ridge regression and ellp-regularized boosting, are obtained from a relaxation of the KL divergence between the quasi uniform posterior and the uniform prior. We present numerical experiments where the proposed learning algorithm generally outperforms ridge regression and AdaBoost.