Bayesian Agglomerative Clustering with Coalescents

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)

Yee Teh, Hal Daume III, Daniel M. Roy


We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman’s coalescent. We develop novel greedy and sequential Monte Carlo inference algorithms that operate in a bottom-up, agglomerative fashion. We show experimentally that our algorithms outperform state-of-the-art methods, and we demonstrate our approach on document clustering and phylolinguistics.
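The following minimal Python sketch draws a tree from Kingman’s coalescent to make the prior concrete. It covers only the standard generative process of the prior, not the paper's greedy or sequential Monte Carlo inference. The function name `sample_kingman_coalescent` and the representation of merge events as tuples are hypothetical choices made for this illustration. Times here are positive distances measured back from the leaves, which is a convention chosen for this sketch.

```python
import random


def sample_kingman_coalescent(n, seed=None):
    """Sample a binary tree over n leaves from Kingman's coalescent.

    Hypothetical helper for illustration. Returns a list of merge events
    (time, left, right, parent_id), ordered from the leaves toward the root.
    Leaves have ids 0..n-1; internal nodes get ids n, n+1, ..., 2n-2.
    """
    rng = random.Random(seed)
    active = list(range(n))  # lineages that have not yet merged
    next_id = n              # id for the next internal node
    t = 0.0                  # time measured back from the leaves
    merges = []
    while len(active) > 1:
        k = len(active)
        # Each of the k*(k-1)/2 pairs coalesces at rate 1, so the waiting
        # time until the next merge is Exponential with rate k*(k-1)/2.
        t += rng.expovariate(k * (k - 1) / 2)
        # The merging pair is chosen uniformly among all pairs.
        i, j = rng.sample(range(k), 2)
        left, right = active[i], active[j]
        # Replace the two merged lineages by their new parent.
        active = [a for idx, a in enumerate(active) if idx not in (i, j)]
        active.append(next_id)
        merges.append((t, left, right, next_id))
        next_id += 1
    return merges


if __name__ == "__main__":
    for event in sample_kingman_coalescent(5, seed=0):
        print(event)
```

Because merges happen quickly when many lineages remain and slowly near the root, sampled trees have short branches near the leaves and long branches near the top. A bottom-up, agglomerative procedure mirrors the order in which this process creates merges.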