Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Thomas Griffiths, Mark Steyvers, David Blei, Joshua Tenenbaum
Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. On tasks such as part-of-speech tagging and document classification, this model is competitive with models that exclusively use short-range and long-range dependencies, respectively.
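To make the abstract's description concrete, the following is a minimal, illustrative sketch of a generative process that combines a hidden Markov chain over syntactic classes (short-range dependencies) with document-level topics (long-range dependencies). All names, dimensions, and hyperparameters here (e.g. `phi_topics`, `trans`, the choice of class 0 as the "semantic" class) are assumptions for the toy example, not the paper's implementation or its inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions only)
V = 50        # vocabulary size
T = 3         # number of semantic topics
C = 4         # number of syntactic classes; class 0 acts as the "semantic" class
doc_len = 20

# Topic-word distributions (long-range, document-level dependencies)
phi_topics = rng.dirichlet(np.ones(V) * 0.1, size=T)
# Class-word distributions for the purely syntactic classes
phi_classes = rng.dirichlet(np.ones(V) * 0.1, size=C)
# HMM transition matrix over syntactic classes (short-range dependencies)
trans = rng.dirichlet(np.ones(C), size=C)

def generate_document(doc_len):
    """Generate one document from the composite HMM/topic process."""
    theta = rng.dirichlet(np.ones(T))            # document-specific topic mixture
    words, c = [], rng.integers(C)               # start in a random syntactic class
    for _ in range(doc_len):
        c = rng.choice(C, p=trans[c])            # Markov step over classes
        if c == 0:
            z = rng.choice(T, p=theta)           # semantic class: draw a topic
            w = rng.choice(V, p=phi_topics[z])   # emit word from that topic
        else:
            w = rng.choice(V, p=phi_classes[c])  # emit word from the class distribution
        words.append(w)
    return words

print(generate_document(doc_len))
```

In this sketch, each word's class depends only on the previous word's class, while words emitted by the semantic class share a document-wide topic mixture, so both kinds of statistical dependency arise from the same process.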