{"title": "Planar Hidden Markov Modeling: From Speech to Optical Character Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 731, "page_last": 738, "abstract": null, "full_text": "Planar Hidden Markov Modeling: \n\nfrom Speech to Optical Character Recognition \n\nEsther Levin and Roberto Pieraccini \n\nA IT Bell Laboratories \n\n600 Mountain Ave. \n\nMurray Hill, NJ 07974 \n\nAbstract \n\nWe propose in  this paper a  statistical  model  (planar hidden  Markov model  -\nPHMM)  describing  statistical  properties  of images.  The  model generalizes \nthe single-dimensional HMM,  used for speech processing, to  the  planar case. \nFor this model to be useful an efficient segmentation algorithm, similar to the \nViterbi  algorithm  for  HMM,  must exist  We  present conditions  in  terms  of \nthe  PHMM  parameters  that  are  sufficient  to  guarantee  that  the  planar \nsegmentation  problem  can  be  solved  in  polynomial  time,  and  describe  an \nalgorithm for that.  This algorithm aligns optimally the image with the model, \nand  therefore  is  insensitive  to  elastic  distortions  of  images.  Using  this \nalgorithm a joint optima1  segmentation and recognition of the  image  can  be \nperformed, thus overcoming the  weakness of traditional OCR systems where \nsegmentation  is  performed  independently  before  the  recognition  leading  to \nunrecoverable recognition errors. \n\nTbe  PHMM  approach  was  evaluated  using  a  set  of  isolated  band-written \ndigits.  An  overall  digit  recognition  accuracy  of  95%  was  acbieved.  An \nanalysis of the results showed  that even in  the  simple case of recognition  of \nisolated  characters,  the  elimination  of  elastic  distortions  enhances  the \nperformance Significantly. We expect that the advantage of this approach  will \nbe  even  more \nsuch  as  connected  writing \nrecognition/spotting,  for  whicb  there  is  no  known  high  accuracy  method  of \nrecognition. \n\nsignificant \n\nfor \n\ntasks \n\n1  Introduction \n\nThe  performance  of traditional  OCR systems deteriorate  very  quickly  when  documents \nare  degraded  by  noise,  blur,  and  other  forms  of distortion.  Tbe  main  reason  for  sucb \ndeterioration is that in addition to the intra-class cbaracter variability caused by distortion, \nthe  segmentation of the text into words and characters becomes a nontrivial task.  In most \nof the  traditional systems, such segmentation is done before recognition, leading to  many \nrecognition  errors,  since  recognition  algorithms  cannot  usually  recover  from  errors \nintroduced in  the  segmentation pbase. Moreover,  in  many  cases  the  segmentation  is  ill(cid:173)\ndefined,  since  many  plausible  segmentations  migbt  exist,  and  only  grammatical  and \nlinguistic analysis can  find  the  \"rigbt  \" one.  To address  these problems,  an  algorithm  is \nneeded that can : \n\n\u2022  be tolerant to distortions leading to intra-class variability \n\n731 \n\n\f732 \n\nLevin and Pieraccini \n\n\u2022  perform  segmentation  together  with  recogruuon, \n\nthus  jointly  optimizing  both \n\nprocesses, while incorporating grammatica1llinguistic constraints. \n\nIn this paper we  describe a  planar segmentation algorithm  that has  the above properties. \nIt results from a direct extension of the Viterbi (Forney,  1973) algorithm,  widely used in \nautomatic speech recognition, to two-dimensional signals. 
In the next section we describe the basic hidden Markov model and define the segmentation problem. In section 3 we introduce the planar HMM that extends the HMM concept to model images. The planar segmentation problem for PHMM is defined in section 4. It was recently shown (Kearns and Levin, 1992) that the planar segmentation problem is NP-hard, and therefore, in order to obtain an effective planar segmentation algorithm, we propose to constrain the parameters of the PHMM. We show sufficient conditions in terms of PHMM parameters for such an algorithm to exist and describe the algorithm. This approach differs from the one taken in references (Chellappa and Chatterjee, 1985) and (Derin and Elliot, 1987), where instead of restricting the problem, a suboptimal solution to the general problem was found. Since in (Kearns and Levin, 1992) it was also shown that the planar segmentation problem is hard to approximate, such a suboptimal solution does not have any guaranteed bounds. The segmentation algorithm can now be used effectively not only for aligning isolated images, but also for joint recognition/segmentation, eliminating the need for an independent segmentation step that usually leads to unrecoverable errors in recognition. The same algorithm is used for estimation of the parameters of the model given a set of example images. In section 5, results of isolated hand-written digit recognition experiments are presented. The results indicate that even in the simple case of isolated characters, the elimination of planar distortions enhances the performance significantly. Section 6 contains the summary of this work.

2 Hidden Markov Model

The HMM is a statistical model that is used to describe temporal signals G = {g(t): 1 <= t <= T, g ∈ G ⊂ R^n} in speech processing applications (Rabiner, 1989; Lee et al., 1990; Wilpon et al., 1990; Pieraccini and Levin, 1991). The HMM is a composite statistical source comprising a set s = {1, ..., T_R} of T_R sources called states. The i-th state, i ∈ s, is characterized by its probability distribution p_i(g) over G. At each time t only one of the states is active, emitting the observable g(t). We denote by s(t), s(t) ∈ s, the random variable corresponding to the active state at time t. The joint probability distribution (for real-valued g) or discrete probability mass (for g being a discrete variable) P(s(t), g(t)) for t > 1 is characterized by the following property:

$P(s(t), g(t) \mid s(1:t-1), g(1:t-1)) = P(s(t) \mid s(t-1)) \, P(g(t) \mid s(t)) = P(s(t) \mid s(t-1)) \, p_{s(t)}(g(t))$,   (1)

where s(1:t-1) stands for the sequence {s(1), ..., s(t-1)}, and g(1:t-1) = {g(1), ..., g(t-1)}. We denote by a_{ij} the transition probability P(s(t)=j | s(t-1)=i), and by π_i the probability of state i being active at t=1, π_i = P(s(1)=i). The probability of the entire sequence of states S = s(1:T) and observations G = g(1:T) can be expressed as

$P(G, S) = \pi_{s(1)} \, p_{s(1)}(g(1)) \prod_{t=2}^{T} a_{s(t-1)s(t)} \, p_{s(t)}(g(t))$.   (2)

The interpretation of equations (1) and (2) is that the observable sequence G is generated in two stages: first, a sequence S of T states is chosen according to the Markovian distribution parametrized by {a_{ij}} and {π_i}; then each one of the states s(t), 1 <= t <= T, in S generates an observable g(t) according to its own memoryless distribution p_{s(t)}, forming the observable sequence G. This model is called a hidden Markov model, because the state sequence S is not given, and only the observation sequence G is known.
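To make the two-stage generative process concrete, the following minimal sketch (our own illustration, not from the paper; the array layout and the toy discrete observation alphabet are assumptions) samples a state sequence and observations from a discrete-output HMM and evaluates the joint log-likelihood of equation (2):

import numpy as np

# pi: (N,) initial state probabilities; A: (N, N) transition matrix;
# B: (N, M) per-state distributions over M discrete observation symbols.

def sample_hmm(pi, A, B, T, rng):
    # Stage 1: draw the state sequence S from the Markov chain.
    S = np.empty(T, dtype=int)
    S[0] = rng.choice(len(pi), p=pi)
    for t in range(1, T):
        S[t] = rng.choice(A.shape[1], p=A[S[t - 1]])
    # Stage 2: each active state emits an observable, memorylessly.
    G = np.array([rng.choice(B.shape[1], p=B[s]) for s in S])
    return S, G

def joint_log_likelihood(S, G, pi, A, B):
    # log P(G, S) as in equation (2).
    ll = np.log(pi[S[0]]) + np.log(B[S[0], G[0]])
    for t in range(1, len(S)):
        ll += np.log(A[S[t - 1], S[t]]) + np.log(B[S[t], G[t]])
    return ll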
A particular case of this model, called a left-to-right HMM, where a_{ij} = 0 for j < i and π_1 = 1, is especially useful for speech recognition. In this case each state of the model represents an unspecified acoustic unit, and due to the \"left-to-right\" structure, the whole word is modeled as a concatenation of such acoustic units. The time spent in each of the states is not fixed, and therefore the model can take into account the duration variability between different utterances of the same word.

The segmentation problem of HMM is that of estimating the most probable state sequence S, given the observation G,

$\hat{S} = \arg\max_S P(S \mid G) = \arg\max_S P(G, S)$.   (3)

The problem of finding S through exhaustive search is of exponential complexity, since there exist T_R^T possible state sequences, but it can be solved in polynomial time using a dynamic programming approach (i.e., the Viterbi algorithm). The segmentation plays a central role in all HMM-based speech recognizers, since for connected speech it gives the segmentation into words or sub-word units and performs recognition simultaneously, in an optimal way. This is in contrast to sequential systems, in which the connected speech is first segmented into words/subwords according to some rules, and then the individual segments are recognized by computing the appropriate likelihoods, and where many recognition errors are caused by unrecoverable segmentation errors. Higher-level syntactical knowledge can be integrated into the decoding process through transition probabilities between the models. The segmentation is also used for estimating the HMM parameters using a corpus of training data.
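For reference, here is a minimal Viterbi sketch (our own illustration, with the same array conventions as the sketch in the previous section) that solves (3) by dynamic programming in O(T T_R^2) time, working in log space to avoid numerical underflow:

import numpy as np

def viterbi(G, pi, A, B):
    # Most probable state sequence argmax_S P(G, S), equation (3).
    T, N = len(G), len(pi)             # N = T_R states
    delta = np.zeros((T, N))           # best log-score ending in state i at time t
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = np.log(pi) + np.log(B[:, G[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)   # (from, to) log-scores
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, G[t]])
    S = np.empty(T, dtype=int)
    S[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):     # backtrack along the best path
        S[t] = psi[t + 1, S[t + 1]]
    return S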
3 The Two-Dimensional Case: Planar HMM

In this section we describe a statistical model for a planar image G = {g(x,y): (x,y) ∈ L_{X,Y}, g ∈ G}. We call this model \"planar HMM\" (PHMM) and design it to extend the advantages of conventional HMM to the two-dimensional case. The PHMM is a composite source, comprising a set s = {(x,y): 1 <= x <= X_R, 1 <= y <= Y_R} of N = X_R Y_R states. Each state in s is a stochastic source characterized by its probability density p_{x,y}(g) over the space of observations g ∈ G. It is convenient to think of the states of the model as being located on a rectangular lattice, where each state corresponds to a pixel of the corresponding reference image. Similarly to the conventional HMM, only one state is active in the generation of the (x,y)-th image pixel g(x,y). We denote by s(x,y) ∈ s the active state of the model that generates g(x,y). The joint distribution governing the choice of active states and image values has the following Markovian property:

$P(g(x,y), s(x,y) \mid g(1:X, 1:y-1), g(1:x-1, y), s(1:X, 1:y-1), s(1:x-1, y)) = P(g(x,y) \mid s(x,y)) \, P(s(x,y) \mid s(x-1,y), s(x,y-1)) = p_{s(x,y)}(g(x,y)) \, P(s(x,y) \mid s(x-1,y), s(x,y-1))$,   (4)

where g(1:X, 1:y-1) = {g(x,y): (x,y) ∈ R_{X,y-1}}, g(1:x-1, y) = {g(1,y), ..., g(x-1,y)}, s(1:X, 1:y-1) and s(1:x-1, y) are the active states involved in generating g(1:X, 1:y-1) and g(1:x-1, y), respectively, and R_{X,y-1} is an axis-parallel rectangle between the origin and the point (X, y-1). Similarly to the one-dimensional case, it is useful to define a left-to-right bottom-up PHMM, where P(s(x,y)=(m,n) | s(x-1,y)=(i,j), s(x,y-1)=(k,l)) ≠ 0 only when i <= m and l <= n, which does not allow for \"fold-overs\" in the state image. The Markovian property (4) allows the left-to-right bottom-up PHMM to model elastic distortions among different realizations of the same image, similarly to the way the Markovian property in left-to-right HMM handles temporal alignment. We have chosen this definition (4) of the Markovian property rather than others (see for example Derin and Kelly, 1989) since it leads to the formulation of a segmentation problem which is similar to the planar alignment defined in (Levin and Pieraccini, 1992).

Using property (4), the joint likelihood of the image G = g(1:X, 1:Y) and the state image S = s(1:X, 1:Y) can be written as

$P(G, S) = \prod_{x=1}^{X} \prod_{y=1}^{Y} p_{s(x,y)}(g(x,y)) \cdot \pi_{s(1,1)} \prod_{x=2}^{X} a^H_{s(x-1,1), s(x,1)} \prod_{y=2}^{Y} a^V_{s(1,y-1), s(1,y)} \prod_{y=2}^{Y} \prod_{x=2}^{X} A_{s(x-1,y), s(x,y-1), s(x,y)}$,   (5)

where

$A_{(i,j),(k,l),(m,n)} = P(s(x,y)=(m,n) \mid s(x-1,y)=(i,j), s(x,y-1)=(k,l))$,

$a^H_{(i,j),(m,n)} = P(s(x,1)=(m,n) \mid s(x-1,1)=(i,j))$,

$a^V_{(k,l),(m,n)} = P(s(1,y)=(m,n) \mid s(1,y-1)=(k,l))$,

$\pi_{ij} = P(s(1,1)=(i,j))$

denote the generalized transition probabilities of the PHMM. Similarly to HMM, (5) suggests that an image G is generated by the PHMM in two successive stages: in the first stage the state matrix S is generated according to the Markovian probability distribution parametrized by {A}, {a^H}, {a^V}, and {π}. In the second stage, the image value in the (x,y)-th pixel is produced independently from other pixels according to the distribution of the s(x,y)-th state, p_{s(x,y)}(g). As in HMM, the state matrix S in most of the applications is not known; only G is observed.
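A direct transcription of equation (5) into code may help fix the indexing. The sketch below is our own illustration, not the paper's implementation; the dictionary-based parameter layout and the log_emit callable are assumptions. It scores a given state matrix S against an image G:

import numpy as np

def phmm_joint_log_likelihood(G, S, log_pi, log_aH, log_aV, log_A, log_emit):
    # G: (X, Y) image; S: (X, Y, 2) array of state coordinates (m, n).
    # log_pi, log_aH, log_aV, log_A: dictionaries of log-probabilities
    # keyed by state tuples; log_emit(state, g) returns log p_state(g).
    X, Y = G.shape
    def st(x, y):
        return tuple(S[x, y])
    ll = log_pi[st(0, 0)]
    for x in range(1, X):      # bottom row: horizontal transitions a^H
        ll += log_aH[st(x - 1, 0), st(x, 0)]
    for y in range(1, Y):      # left column: vertical transitions a^V
        ll += log_aV[st(0, y - 1), st(0, y)]
    for y in range(1, Y):      # interior: conditioned on left and bottom neighbors
        for x in range(1, X):
            ll += log_A[st(x - 1, y), st(x, y - 1), st(x, y)]
    for x in range(X):         # memoryless per-pixel emissions
        for y in range(Y):
            ll += log_emit(st(x, y), G[x, y])
    return ll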
4 Planar Segmentation Problem

The segmentation problem of PHMM is that of finding the state matrix S that best explains the observable image G and defines an optimal alignment of the image to the model. Solving this problem eliminates the sensitivity to intra-class elastic distortions and allows for simultaneous segmentation/recognition of images, similarly to the one-dimensional case. S can be estimated as in (3) by $\hat{S} = \arg\max_S P(G, S)$. If we approach this maximization by exhaustive search, the computational complexity is exponential, since there are (X_R Y_R)^{XY} different state matrices. Since the segmentation problem is NP-hard (Kearns and Levin, 1992), we suggest simplifying the problem by constraining the parameters of the PHMM, so that an efficient segmentation algorithm can be found. In this section we present conditions in terms of the generalized transition probabilities of the PHMM that are sufficient to guarantee that the most likely state image S can be computed in polynomial time, and describe an algorithm for doing that.

For the problem of finding S to be solved in polynomial time, there should exist a grouping of the set s of states of the model into N_G mutually exclusive[1] subsets of states γ_p, $s = \bigcup_{p=1}^{N_G} \gamma_p$. The generalized transition probabilities should satisfy the two following constraints with respect to such a grouping:

$A_{(i,j),(k,l),(m,n)} \neq 0$; $a^H_{(i,j),(m,n)} \neq 0$ only if there exists p, 1 <= p <= N_G, such that (i,j), (m,n) ∈ γ_p.   (6)

$A_{(i,j),(k,l),(m,n)} = A_{(i,j),(k_1,l_1),(m,n)}$; $a^V_{(k,l),(m,n)} = a^V_{(k_1,l_1),(m,n)}$   (7)

if there exists p, 1 <= p <= N_G, such that (k,l), (k_1,l_1) ∈ γ_p.

[1] It is possible to drop the mutual exclusiveness constraint by duplicating states, but then we have to ensure that the number of subsets N_G is polynomial in the dimensions of the model X_R, Y_R.

Condition (6) means that the left neighbor (i,j) of the state (m,n) in the state matrix S must be a member of the same subset γ_p as (m,n). Condition (7) means that the value of the transition probability A_{(i,j),(k,l),(m,n)} does not depend explicitly on the identity (k,l) of the bottom neighboring state, but only on the subset γ_p to which (k,l) belongs.

Under (6) and (7) the most likely state matrix S can be found using an algorithm described in (Levin and Pieraccini, 1992). This algorithm makes use of the Viterbi procedure at two different levels. In the first stage an optimal segmentation is computed for each subset γ_p with each image row using Viterbi. Then the global segmentation is found, through Viterbi, by combining optimally the segmentations obtained in the previous stage.

Although conditions (6), (7) are hard to check in practice, since any possible grouping of the states has to be considered, they can be effectively used in a constructive mode, i.e., choosing one particular grouping and then imposing the constraints (6) and (7) on the generalized transition probabilities with respect to this grouping. For example, if we choose γ_p = {(x,y) | 1 <= x <= X_R, y = p}, 1 <= p <= Y_R, then the constraints (6), (7) become:

$A_{(i,j),(k,l),(m,n)} \neq 0$, $a^H_{(i,j),(m,n)} \neq 0$ only for j = n,   (8)

and

$A_{(i,j),(k,l),(m,n)} = A_{(i,j),(k_1,l),(m,n)}$, $a^V_{(k,l),(m,n)} = a^V_{(k_1,l),(m,n)}$ for 1 <= k_1, k <= X_R.   (9)

Note that constraints (6), (7) break the symmetry between the roles of the two coordinates. Other sets of conditions can be obtained from (6) and (7) by coordinate transformation. For example, the roles of the vertical and the horizontal axes can be exchanged. A grouping and constraint set chosen for a particular application should reflect the geometric properties of the images.
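Under the row grouping (8), (9), the two-level procedure can be sketched as follows. This is our own simplified rendering, not the paper's implementation: it assumes the horizontal transition matrix depends only on the current model row, summarizes the vertical coupling by a single model-row-to-model-row score, and returns only the best alignment score (backpointers for recovering S are omitted for brevity). Level 1 costs O(Y Y_R X X_R^2) and level 2 costs O(Y Y_R^2), so the whole procedure is polynomial:

import numpy as np

def row_viterbi(obs_ll, log_first_col, log_h_n):
    # Level 1: 1-D Viterbi over the columns of one image row.
    # obs_ll: (X, X_R) emission log-likelihoods of the row's pixels under
    # the candidate model row's states; log_h_n: (X_R, X_R) column-to-column
    # transition scores. Monotone left-to-right structure can be imposed by
    # placing -inf below the diagonal of log_h_n.
    delta = log_first_col + obs_ll[0]
    for x in range(1, obs_ll.shape[0]):
        delta = (delta[:, None] + log_h_n).max(axis=0) + obs_ll[x]
    return delta.max()

def planar_viterbi_score(Y, Y_R, row_obs, log_first_row, log_v,
                         log_first_col, log_h):
    # row_obs(y, n): (X, X_R) emission scores of image row y under model row n.
    # Level 1: best within-row alignment for every (image row, model row) pair.
    R = np.array([[row_viterbi(row_obs(y, n), log_first_col[n], log_h[n])
                   for n in range(Y_R)] for y in range(Y)])
    # Level 2: 1-D Viterbi over image rows, assigning one model row to each;
    # log_v: (Y_R, Y_R) row-to-row scores, masked for bottom-up monotonicity.
    D = log_first_row + R[0]
    for y in range(1, Y):
        D = (D[:, None] + log_v).max(axis=0) + R[y]
    return D.max()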
5 Experimental Results

The PHMM approach was tested on a writer-independent isolated handwritten digit recognition application. The data we used in our experiments was collected from 12 subjects (6 for training and 6 for test). Each subject was asked to write 10 samples of each digit. Samples were written in fixed-size boxes, and were therefore naturally size-normalized and centered. Each sample in the database was represented by a 16x16 binary image.

Each character class (digit) was represented by a single PHMM satisfying (6) and (7). Each PHMM had a strictly left-to-right bottom-up structure, where the state matrix S was restricted to contain every state of the model, i.e., states could not be skipped. All models had the same number of states. Each state was represented by its own binary probability distribution, i.e., the probability of a pixel being 1 (black) or 0 (white). We estimated these probabilities from the training data with the following generalization of the Viterbi training algorithm (Jelinek, 1976). For the initialization we uniformly divided each training image into regions corresponding to the states of its model. The initial value of P_i(g=1) for the i-th state was obtained as a frequency count of the black pixels in the corresponding region over all the samples of the same digit. Each iteration of the algorithm consisted of two stages: first, the samples were aligned with the corresponding model by finding the best state matrix S; then, a new frequency count for each state was used to update P_i(1) according to the obtained alignment. We noticed that the training procedure usually converged after 2-4 iterations, and in all the experiments the algorithm was stopped at the 10th iteration. The recognition was performed by assigning the test sample to the class k for which the alignment likelihood was maximal.
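In outline, the training loop described above might look like the following sketch (our own rendering; align stands for the planar segmentation routine, and the smoothing constant eps is an assumption, not from the paper):

import numpy as np

def viterbi_train(images, X_R, Y_R, align, n_iter=10, eps=1e-3):
    # images: list of (16, 16) binary arrays of one digit class.
    # align(img, P) returns a (16, 16, 2) state matrix assigning each pixel
    # to a model state, e.g. a planar Viterbi as sketched in section 4.
    # P[m, n] is the probability that state (m, n) emits a black pixel.
    # Initialization: divide each image uniformly into X_R x Y_R regions.
    xs = (np.arange(16) * X_R) // 16
    ys = (np.arange(16) * Y_R) // 16
    S0 = np.stack([np.repeat(xs[:, None], 16, axis=1),
                   np.repeat(ys[None, :], 16, axis=0)], axis=-1)
    state_mats = [S0] * len(images)
    P = np.full((X_R, Y_R), 0.5)
    for _ in range(n_iter):
        black = np.zeros((X_R, Y_R))
        total = np.zeros((X_R, Y_R))
        for img, S in zip(images, state_mats):
            for x in range(16):
                for y in range(16):
                    m, n = S[x, y]
                    total[m, n] += 1
                    black[m, n] += img[x, y]
        P = (black + eps) / (total + 2 * eps)           # smoothed frequency count
        state_mats = [align(img, P) for img in images]  # realign with new model
    return P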
Table 1 shows the number of errors in the recognition of the training set and the test set for different sizes of the models.

    Number of states (X_R = Y_R)    Training errors    Test errors
    6                               78                 82
    8                               36                 50
    9                               35                 48
    10                              26                 32
    11                              21                 38
    12                              18                 42
    16                              36                 64

Table 1: Number of errors in the recognition of the training set and the test set for different sizes of the models (out of 600 trials in both cases).

It is worth noting the following two points. First, the test error shows a minimum of 5% for X_R = Y_R = 10. By increasing or decreasing the number of states this error increases. This phenomenon is due to the following:

1. The typical under/over-parametrization behavior.

2. Increasing the number of states closer to the size of the modeled images reduces the flexibility of the alignment procedure, making the alignment a trivial uniform one when X_R = Y_R = 16.

Also, the training error decreases monotonically with increasing number of states up to X_R = Y_R = 16. This is again typical behavior for such systems, since by increasing the number of states, the number of model parameters grows, improving the fit to the training data. But when the number of states equals the dimensions of the sample images, X_R = Y_R = 16, there is a sudden significant increase in the training error. This behavior is consistent with point (2) above.

Fig. 1 shows three sets of models with different numbers of states. The states of the models in this figure are represented by squares, where the grey level of the square encodes the probability P(g=1). The (6x6)-state models have a very coarse representation of the digits, because the number of states is so small. The (10x10)-state models appear much sharper than the (16x16)-state models, due to their ability to align the training samples.

This preliminary experiment shows that eliminating elastic distortions by the alignment procedure discussed above plays an important role in the task of isolated character recognition, improving the recognition accuracy significantly. Note that the simplicity of this task does not stress the full power of the PHMM representation, since the data was isolated, size-normalized, and centered. On this task, the achieved performance is comparable to that of many other OCR systems. We expect that in harder tasks, involving connected text, the advantage of the proposed method will enhance the performance. Recently, this approach has been successfully applied to the task of recognition of noisy, degraded printed messages (Agazzi et al., 1993).

6 Summary and Discussion

In this paper we describe a planar hidden Markov model and develop a planar segmentation algorithm that generalizes the Viterbi procedure widely used in speech recognition. This algorithm can be used to perform joint optimal recognition/segmentation of images, incorporating some grammatical constraints and tolerating intra-class elastic distortions. The PHMM approach was tested on an isolated hand-written digit recognition application. An analysis of the results indicates that even in the simple case of isolated characters, the elimination of elastic distortions enhances recognition performance significantly. We expect that the advantage of this approach will be even more valuable in harder tasks, such as cursive writing recognition/spotting, for which an effective solution using the currently available techniques has not yet been found.

[Figure 1: Three sets of models with 6x6, 10x10, and 16x16 states.]

References

O. E. Agazzi, S. S. Kuo, E. Levin, R. Pieraccini, \"Connected and Degraded Text Recognition Using Planar Hidden Markov Models,\" Proc. of Int. Conference on Acoustics, Speech and Signal Processing, April 1993.

R. Chellappa, S. Chatterjee, \"Classification of Textures Using Gaussian Markov Random Fields,\" IEEE Transactions on ASSP, Vol. 33, No. 4, pp. 959-963, August 1985.

H. Derin, H. Elliot, \"Modeling and Segmentation of Noisy and Textured Images Using Gibbs Random Fields,\" IEEE Transactions on PAMI, Vol. 9, No. 1, pp. 39-55, January 1987.
Elliot,  \"Modeling and Segmentation of Noisy and Textured Images Using \nGibbs Random Fields,\"  IEEE Transactions on PAMI, Vol. 9,  No.1 pp.  39-55, January \n1987. \n\nH.  Derin,  P.  A.  Kelly,  'Discrete-Index  Markov-Type  Random  Processes,'  in  IEEE \nProceedings, vol 77, #10, pp.1485-1510, 1989 \n\nG.D. Forney, \"The Viterbi algorithm,\" Proc. IEEE. Mar. 1973. \n\nF.  Jelinek,  \"Continuous  Speech  Recognition  by  Statistical  Methods,\"  Proceedings  of \nIEEE,  vol. 64, pp. 532-556, April 1976. \n\nM. Keams, E. Levin, Unpublished,  1992. \n\nC.-H.  Lee,  L.  R.  Rabiner,  R.  Pieraccini,  J.  G.  Wilpon,  \"Acoustic  Modeling  for  Large \nVocabulary  Speech  Recognition,\"  Computer  Speech  and  Language,  1990,  No.4,  pp. \n127-165. \n\nE. Levin, R.  Pieraccini, \"Dynamic Planar Warping and Planar Hidden Markov Modeling: \nfrom  Speech  to  Optical  Character  Recognition,\"  submitted  to  IEEE  Trans.  on  PAMl. \n1992. \n\nR.  Pieraccini,  E.  Levin,  \"Stochastic  Representation  of  Semantic  Structure  for  Speech \nUnderstanding,\"  Proceedings  of  EUROSPEECH  91,  Vo1.2,  pp.  383-386,  Genova, \nSeptember 1991. \n\nL.R.  Rabiner,  \"A  Tutorial  on  Hidden  Markov  Models  and  Selected  Applications  in \nSpeech Recognition,\" Proc.  IEEE, Feb. 1989. \n\n1.  G.  Wilpon,  L.  R.  Rabiner,  C.-H.  Lee,  E.  R.  Goldman,  \"Automatic  Recognition  of \nKeywords  in  Unconstrained  Speech  Using  Hidden  Markov  Models,\"  IEEE  Trans.  on \nASSP, Vol. 38, No. 11, pp 1870-1878, November 1990. \n\n\f", "award": [], "sourceid": 633, "authors": [{"given_name": "Esther", "family_name": "Levin", "institution": null}, {"given_name": "Roberto", "family_name": "Pieraccini", "institution": null}]}