Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Mark Paskin
Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e., syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text.
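To make the grammatical bigram factorization concrete, below is a minimal sketch, not the paper's implementation. It scores a parsed sentence as a product of P(dependent | head) terms over word pairs and re-estimates those probabilities from pair counts. The function names, the ROOT token, and the toy corpus are illustrative assumptions; in the unsupervised setting the abstract describes, the counts in the re-estimation step would be expected counts computed by summing over candidate syntactic structures in the E-step of EM.

```python
from collections import defaultdict

def sentence_likelihood(words, heads, prob):
    """Score a parsed sentence under the grammatical bigram factorization:
    the probability is a product of P(dependent | head) terms, one per
    syntactic word pair. (Sketch: words[0] is an artificial ROOT token,
    and heads[i] is the index of word i's syntactic head.)"""
    p = 1.0
    for i in range(1, len(words)):
        p *= prob.get((words[heads[i]], words[i]), 0.0)
    return p

def reestimate(parsed_corpus):
    """M-step-style re-estimation of P(dependent | head) from pair counts.
    With labelled parses these are observed counts; in unsupervised EM
    they would be expected counts produced by the E-step."""
    counts = defaultdict(float)
    head_totals = defaultdict(float)
    for words, heads in parsed_corpus:
        for i in range(1, len(words)):
            counts[(words[heads[i]], words[i])] += 1.0
            head_totals[words[heads[i]]] += 1.0
    return {pair: c / head_totals[pair[0]] for pair, c in counts.items()}

# Toy usage: "the dog barks", where 'barks' depends on ROOT,
# 'dog' depends on 'barks', and 'the' depends on 'dog'.
corpus = [(["<ROOT>", "the", "dog", "barks"], [0, 2, 3, 0])]
prob = reestimate(corpus)
print(sentence_likelihood(*corpus[0], prob))  # 1.0 on this one-sentence corpus
```

The key design point the sketch illustrates is the source of the model's efficiency: because the sentence probability factors into independent word-pair terms, parameter re-estimation reduces to normalized pair counting, which is what makes EM training tractable on large corpora.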