Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
Padhraic Smyth
This paper discusses a probabilistic model-based approach to clustering sequences using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed. Second, the more difficult problem of determining the number of clusters, K, from the data is investigated. Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.
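To make the mixture framing concrete, the following is a minimal sketch, not the paper's implementation. It treats each of K clusters as an HMM with parameters λ_k and mixing weight p_k, so that a sequence S has density p(S) = Σ_k p_k p(S | λ_k). It then computes each sequence's soft cluster membership p(k | S) from per-cluster log-likelihoods. Several assumptions apply. Emissions are discrete symbols. All parameters are taken as already known, so there is no EM fitting, no initialization procedure, and no selection of K. The function names and the two example models (`hmm_a`, `hmm_b`) are hypothetical illustrations.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical HMM representation: (initial probs pi, transitions A, emissions B),
# with discrete emission symbols indexed 0..M-1 (an assumption for this sketch).
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def hmm_log_likelihood(obs: Sequence[int], pi: List[float],
                       A: List[List[float]], B: List[List[float]]) -> float:
    """log p(obs | HMM) via the forward algorithm, rescaled at each step."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for o in obs[1:]:
        # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)  # the sum of log scale factors is log p(obs)
        alpha = [a / scale for a in alpha]
    return log_lik


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def mixture_posteriors(obs: Sequence[int], weights: List[float],
                       hmms: List[HMM]) -> Tuple[List[float], float]:
    """Return (p(k | obs) for each cluster k, log p(obs)) under a mixture of HMMs."""
    joint = [math.log(w) + hmm_log_likelihood(obs, *h)
             for w, h in zip(weights, hmms)]
    total = logsumexp(joint)
    return [math.exp(j - total) for j in joint], total


# Two illustrative 2-state clusters: hmm_a mostly emits symbol 0, hmm_b symbol 1.
A_shared = [[0.9, 0.1], [0.1, 0.9]]
hmm_a: HMM = ([0.5, 0.5], A_shared, [[0.9, 0.1], [0.8, 0.2]])
hmm_b: HMM = ([0.5, 0.5], A_shared, [[0.1, 0.9], [0.2, 0.8]])

post, log_p = mixture_posteriors([0, 0, 0, 1, 0, 0], [0.5, 0.5], [hmm_a, hmm_b])
print("p(k | S):", post, "log p(S):", log_p)
```

In a full clustering procedure, these memberships would serve as the E-step of an EM loop that re-estimates each λ_k and p_k. The quality of the starting parameters and the choice of K, the two issues the paper addresses, are deliberately left out of this sketch.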