Making Latin Manuscripts Searchable using gHMM's

Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)


Authors

Jaety Edwards, Yee Teh, Roger Bock, Michael Maire, Grace Vesom, David Forsyth

Abstract

We describe a method that can make a scanned, handwritten mediaeval Latin manuscript accessible to full-text search. A generalized HMM is fitted, using transcribed Latin to obtain a transition model and one example each of 22 letters to obtain an emission model. We show results for unigram, bigram and trigram models. Our method transcribes 25 pages of a manuscript of Terence with fair accuracy (75% of letters correctly transcribed). Search results are very strong; we use examples of variant spellings to demonstrate that the search respects the ink of the document. Furthermore, our model produces fair searches on a document from which we obtained no training data.
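As a rough illustration of how a transition model can be obtained from transcribed text, the sketch below estimates letter-bigram transition probabilities by counting adjacent character pairs. It is not the authors' implementation: the function name `bigram_transitions`, the additive smoothing, the sample line, and the alphabet derived from that sample are all assumptions made for the example, and the paper's 22-letter alphabet and its unigram and trigram variants are not reproduced here.

```python
from collections import Counter, defaultdict


def bigram_transitions(text, alphabet, smoothing=1.0):
    """Estimate P(next | current) over `alphabet` from raw text.

    Hypothetical helper for illustration only. Additive smoothing keeps
    every transition probability non-zero, including unseen pairs.
    """
    # Keep only the symbols the model knows about.
    symbols = [c for c in text.lower() if c in alphabet]

    # Count how often each symbol is followed by each other symbol.
    counts = defaultdict(Counter)
    for cur, nxt in zip(symbols, symbols[1:]):
        counts[cur][nxt] += 1

    # Normalise each row of counts into a probability distribution.
    model = {}
    for cur in alphabet:
        total = sum(counts[cur].values()) + smoothing * len(alphabet)
        model[cur] = {
            nxt: (counts[cur][nxt] + smoothing) / total for nxt in alphabet
        }
    return model


# Illustrative sample text (an assumption, not the paper's training corpus).
sample = "arma virumque cano troiae qui primus ab oris"
alphabet = sorted(set(sample))  # assumed alphabet: whatever occurs in the sample
model = bigram_transitions(sample, alphabet)

# The most likely successor of "q" in this sample.
most_likely_after_q = max(model["q"], key=model["q"].get)
```

In a generalized HMM of the kind the abstract describes, a table like `model` would play the role of the transition model over letters, while the emission model would be built separately from the letter examples.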