A Probabilistic Model for Learning Concatenative Morphology

Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)



Matthew Snover, Michael Brent


This paper describes a system for the unsupervised learning of morphological suffixes and stems from word lists. The system is composed of a generative probability model together with hill-climbing and directed search algorithms. By extracting and examining morphologically rich subsets of an input lexicon, the directed search identifies highly productive paradigms. The hill-climbing algorithm then further maximizes the probability of the hypothesis. Quantitative results are reported by measuring the accuracy of the morphological relations identified. Experiments in English and Polish, as well as comparisons with another recent unsupervised morphology learning algorithm, demonstrate the effectiveness of this technique.
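To make the notion of a paradigm concrete, here is a minimal sketch, assuming a paradigm is simply a set of stems that all combine with every suffix in a shared suffix set to form words in the input list. The function name `stems_for_paradigm` and the toy word list are hypothetical illustrations; this is not the paper's probability model, directed search, or hill-climbing procedure, only a check of which stems fully attest a given candidate suffix set.

```python
def stems_for_paradigm(words, suffixes):
    """Return the stems that combine with every suffix in `suffixes`
    to form a word in `words`.

    Hypothetical illustration of a paradigm (stems x shared suffixes);
    not the scoring or search procedure described in the paper.
    """
    vocab = set(words)
    stems = set()
    for word in vocab:
        for suffix in suffixes:
            # The empty suffix "" stands for the bare stem.
            if not word.endswith(suffix):
                continue
            stem = word[: len(word) - len(suffix)]
            # Keep the stem only if every stem + suffix combination is attested.
            if stem and all(stem + t in vocab for t in suffixes):
                stems.add(stem)
    return stems


words = ["walk", "walks", "walked", "talk", "talks", "talked", "jump", "jumps"]
print(sorted(stems_for_paradigm(words, ("", "s", "ed"))))
```

With the toy list, "jump" drops out of the {"", "s", "ed"} paradigm because "jumped" is absent, but it joins the smaller {"", "s"} paradigm. A real system would still need a model to decide which of many such candidate paradigms to prefer.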