{"title": "Planar Hidden Markov Modeling: From Speech to Optical Character Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 731, "page_last": 738, "abstract": null, "full_text": "Planar Hidden Markov Modeling: \n\nfrom Speech to Optical Character Recognition \n\nEsther Levin and Roberto Pieraccini \n\nA IT Bell Laboratories \n\n600 Mountain Ave. \n\nMurray Hill, NJ 07974 \n\nAbstract \n\nWe propose in this paper a statistical model (planar hidden Markov model -\nPHMM) describing statistical properties of images. The model generalizes \nthe single-dimensional HMM, used for speech processing, to the planar case. \nFor this model to be useful an efficient segmentation algorithm, similar to the \nViterbi algorithm for HMM, must exist We present conditions in terms of \nthe PHMM parameters that are sufficient to guarantee that the planar \nsegmentation problem can be solved in polynomial time, and describe an \nalgorithm for that. This algorithm aligns optimally the image with the model, \nand therefore is insensitive to elastic distortions of images. Using this \nalgorithm a joint optima1 segmentation and recognition of the image can be \nperformed, thus overcoming the weakness of traditional OCR systems where \nsegmentation is performed independently before the recognition leading to \nunrecoverable recognition errors. \n\nTbe PHMM approach was evaluated using a set of isolated band-written \ndigits. An overall digit recognition accuracy of 95% was acbieved. An \nanalysis of the results showed that even in the simple case of recognition of \nisolated characters, the elimination of elastic distortions enhances the \nperformance Significantly. We expect that the advantage of this approach will \nbe even more \nsuch as connected writing \nrecognition/spotting, for whicb there is no known high accuracy method of \nrecognition. 
\n\nsignificant \n\nfor \n\ntasks \n\n1 Introduction \n\nThe performance of traditional OCR systems deteriorate very quickly when documents \nare degraded by noise, blur, and other forms of distortion. Tbe main reason for sucb \ndeterioration is that in addition to the intra-class cbaracter variability caused by distortion, \nthe segmentation of the text into words and characters becomes a nontrivial task. In most \nof the traditional systems, such segmentation is done before recognition, leading to many \nrecognition errors, since recognition algorithms cannot usually recover from errors \nintroduced in the segmentation pbase. Moreover, in many cases the segmentation is ill(cid:173)\ndefined, since many plausible segmentations migbt exist, and only grammatical and \nlinguistic analysis can find the \"rigbt \" one. To address these problems, an algorithm is \nneeded that can : \n\n\u2022 be tolerant to distortions leading to intra-class variability \n\n731 \n\n\f732 \n\nLevin and Pieraccini \n\n\u2022 perform segmentation together with recogruuon, \n\nthus jointly optimizing both \n\nprocesses, while incorporating grammatica1llinguistic constraints. \n\nIn this paper we describe a planar segmentation algorithm that has the above properties. \nIt results from a direct extension of the Viterbi (Forney, 1973) algorithm, widely used in \nautomatic speech recognition, to two-dimensional signals. \nIn the next section we desaibe the basic hidden Markov model and define the \nsegmentation problem. In section 3 we introduce the planar HMM that extends the HMM \nconcept to model images. The planar segmentation problem for PHMM is defined in \nsection 4. It was recently shown (Kearns and Levin, 1992) that the planar segmentation \nproblem is NP-hard, and therefore, in order to obtain an effective planar segmentation \nalgorithm, we propose to constrain the parameters of the PHMM. 
We show sufficient conditions in terms of the PHMM parameters for such an algorithm to exist, and describe the algorithm. This approach differs from the one taken in (Chellappa and Chatterjee, 1985) and (Derin and Elliot, 1987), where instead of restricting the problem, a suboptimal solution to the general problem was found. Since (Kearns and Levin, 1992) also showed that the planar segmentation problem is hard to approximate, such a suboptimal solution doesn't have any guaranteed bounds. The segmentation algorithm can now be used effectively not only for aligning isolated images, but also for joint recognition/segmentation, eliminating the need for independent segmentation that usually leads to unrecoverable errors in recognition. The same algorithm is used for estimating the parameters of the model given a set of example images. In section 5, results of isolated hand-written digit recognition experiments are presented. The results indicate that even in the simple case of isolated characters, the elimination of planar distortions enhances the performance significantly. Section 6 contains the summary of this work.

2 Hidden Markov Model

The HMM is a statistical model that is used to describe temporal signals G = {g(t): 1 ≤ t ≤ T, g ∈ G ⊂ R^n} in speech processing applications (Rabiner, 1989; Lee et al., 1990; Wilpon et al., 1990; Pieraccini and Levin, 1991). The HMM is a composite statistical source comprising a set S = {1, ..., T_R} of T_R sources called states. The i-th state, i ∈ S, is characterized by its probability distribution p_i(g) over G. At each time t only one of the states is active, emitting the observable g(t). We denote by s(t), s(t) ∈ S, the random variable corresponding to the active state at time t. 
The joint probability distribution (for real-valued g) or discrete probability mass (for g being a discrete variable) P(s(t), g(t)) for t > 1 is characterized by the following property:

P(s(t), g(t) | s(1:t-1), g(1:t-1)) = P(s(t) | s(t-1)) P(g(t) | s(t)) =
                                  = P(s(t) | s(t-1)) p_s(t)(g(t)),     (1)

where s(1:t-1) stands for the sequence {s(1), ..., s(t-1)}, and g(1:t-1) = {g(1), ..., g(t-1)}. We denote by a_ij the transition probability P(s(t)=j | s(t-1)=i), and by π_i the probability of state i being active at t=1, π_i = P(s(1)=i). The probability of the entire sequence of states S = s(1:T) and observations G = g(1:T) can be expressed as

P(G, S) = π_s(1) p_s(1)(g(1)) ∏_{t=2}^{T} a_s(t-1)s(t) p_s(t)(g(t)).     (2)

The interpretation of equations (1) and (2) is that the observable sequence G is generated in two stages: first, a sequence S of T states is chosen according to the Markovian distribution parametrized by {a_ij} and {π_i}; then each one of the states s(t), 1 ≤ t ≤ T, in S generates an observable g(t) according to its own memoryless distribution p_s(t), forming the observable sequence G. This model is called a hidden Markov model, because the state sequence S is not given, and only the observation sequence G is known. A particular case of this model, called a left-to-right HMM, where a_ij = 0 for j
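As a concrete illustration of eq. (2) and of the Viterbi-style alignment mentioned in the introduction, the following is a minimal sketch for a discrete-observation HMM. It is not the paper's planar algorithm; the toy parameters (a 3-state left-to-right chain with binary symbols) and all function names are hypothetical, chosen only to make the two quantities computable: the joint probability P(G, S) of eq. (2) for a given state sequence, and the dynamic-programming search for the most likely sequence.

```python
import numpy as np

def joint_log_prob(pi, A, B, states, obs):
    """Log of eq. (2): P(G,S) = pi_{s(1)} p_{s(1)}(g(1)) * prod_t a_{s(t-1)s(t)} p_{s(t)}(g(t))."""
    lp = np.log(pi[states[0]] * B[states[0], obs[0]])
    for t in range(1, len(obs)):
        lp += np.log(A[states[t - 1], states[t]] * B[states[t], obs[t]])
    return float(lp)

def viterbi(pi, A, B, obs):
    """Most likely state sequence S* = argmax_S P(G,S), by dynamic programming."""
    with np.errstate(divide="ignore"):       # log(0) = -inf marks forbidden transitions
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    T = len(obs)
    delta = log_pi + log_B[:, obs[0]]        # best log-prob of a path ending in each state
    psi = np.zeros((T, len(pi)), dtype=int)  # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path into i, then i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace backpointers from the best final state
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta.max())

# Toy left-to-right HMM: 3 states, binary observations.
pi = np.array([1.0, 0.0, 0.0])               # always start in state 0
A = np.array([[0.5, 0.5, 0.0],               # a_ij = 0 for backward jumps
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1],                    # p_i(g): rows are states, columns symbols
              [0.1, 0.9],
              [0.9, 0.1]])
obs = [0, 1, 1, 0]
path, lp = viterbi(pi, A, B, obs)            # path aligns each g(t) with a state
```

The returned `path` is exactly the segmentation of the observation sequence into states, which is why the same machinery, once extended to the planar case, performs segmentation and recognition jointly.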