Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process[PDF] [BibTeX]
Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, Bayesian inference methods often require high-dimensional averages and can be slow to compute, especially with the potentially unbounded representations associated with nonparametric models. We address the challenge of scaling nonparametric Bayesian inference to the increasingly large datasets found in real-world applications, focusing on the case of parallelising inference in the Indian Buffet Process (IBP). Our approach divides a large data set between multiple processors. The processors use message passing to compute likelihoods in an asynchronous, distributed fashion and to propagate statistics about the global Bayesian posterior. This novel MCMC sampler is the first parallel inference scheme for IBP-based models, scaling to datasets orders of magnitude larger than had previously been possible.