Latent Dirichlet Allocation

Blei, David; Ng, Andrew; Jordan, Michael

David M. Blei, Andrew Y. Ng, Michael I. Jordan

Advances in Neural Information Processing Systems 14 (NIPS 2001)

Abstract

We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
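To make the "mixture of topics with Dirichlet-distributed proportions" idea concrete, here is a minimal Python sketch of the generative process, assuming the standard LDA formulation: each document draws topic proportions theta from Dirichlet(alpha), and each word first draws a topic z from theta and then a word from that topic's word distribution beta[z]. The function names `sample_dirichlet` and `generate_corpus` and the parameter names `alpha` and `beta` are illustrative assumptions, not code from the paper. Document length is held fixed here as a simplification; it is not drawn from a distribution.

```python
import random


def sample_dirichlet(alpha, rng):
    # A Dirichlet draw can be obtained by sampling independent
    # Gamma(alpha_i, 1) variables and normalizing them to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_length, alpha, beta, seed=0):
    """Sample a toy corpus from an LDA-style generative process.

    alpha: list of K positive Dirichlet parameters (one per topic).
    beta:  K x V list of lists; beta[k] is topic k's word distribution.
    Returns a list of (theta, words) pairs, where words are vocabulary ids.
    """
    rng = random.Random(seed)
    num_topics = len(alpha)
    vocab_size = len(beta[0])
    corpus = []
    for _ in range(num_docs):
        # Per-document topic proportions: the latent Dirichlet variable.
        theta = sample_dirichlet(alpha, rng)
        words = []
        for _ in range(doc_length):
            # Pick a topic for this word position according to theta ...
            z = rng.choices(range(num_topics), weights=theta)[0]
            # ... then pick a word from that topic's word distribution.
            w = rng.choices(range(vocab_size), weights=beta[z])[0]
            words.append(w)
        corpus.append((theta, words))
    return corpus
```

For example, `generate_corpus(5, 20, [0.5, 0.5, 0.5], beta)` with a 3 x V matrix `beta` yields five documents of 20 word ids each. A small alpha like 0.5 tends to concentrate each document on a few topics, which is the behavior a per-document mixture is meant to capture. Note that this sketch only covers generation; it does not implement the variational inference and learning procedures the abstract mentions.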
