{"title": "Recognition-based Segmentation of On-Line Cursive Handwriting", "book": "Advances in Neural Information Processing Systems", "page_first": 777, "page_last": 784, "abstract": null, "full_text": "Recognition-based Segmentation of \nOn-line Cursive Handwriting \n\nNicholas S. Flann \n\nDepartment of Computer Science \nUtah State University \nLogan, UT 84322-4205 \n\nflann@nick.cs.usu.edu \n\nAbstract \n\nThis paper introduces a new recognition-based segmentation approach \nto recognizing on-line cursive handwriting from a database \nof 10,000 English words. The original input stream of x, y pen coordinates \nis encoded as a sequence of uniform stroke descriptions that \nare processed by six feed-forward neural networks, each designed \nto recognize letters of a different size. Words are then recognized by \nperforming best-first search over the space of all possible segmentations. \nResults demonstrate that the method is effective at both \nwriter-dependent recognition (1.7% to 15.5% error rate) and \nwriter-independent recognition (5.2% to 31.1% error rate). \n\n1 Introduction \n\nWith the advent of pen-based computers, the problem of automatically recognizing \nhandwriting from the motions of a pen has gained much significance. Progress has \nbeen made in reading disjoint block letters [Weissman et al., 93]. Cursive writing, \nhowever, is much quicker and more natural for humans, but it poses a significant \nchallenge to pattern recognition systems because of its variability, its ambiguity, and \nthe need to both segment and recognize the individual letters. Recent techniques \nemploying self-organizing networks are described in [Morasso et al., 93] and \n[Schomaker, 1993]. This paper presents an alternative approach based on \nfeed-forward networks. 
\n\nFigure 1: The five principal stages of preprocessing: (a) The original data, x, y \nvalues sampled every 10 ms; (b) The slant is normalized through a shear transformation; \n(c) Stroke boundaries are determined at points where y velocity equals 0 or \npen-up or pen-down events occur; (d) Delayed strokes are reordered and associated \nwith corresponding strokes of the same letters; (e) Each stroke is resampled in space \nto correspond to exactly 8 points. Note pen-down strokes are shown as thick lines, \npen-up strokes as thin lines. \n\nOn-line handwriting consists of writing with a pen on a touch-terminal or digitizing \ntablet. The device produces a regular stream of x, y coordinates, describing the \npositions of the pen while writing. Hence the problem of recognizing on-line cursively \nwritten words is one of mapping a variable length sequence of x, y coordinates \nto a variable length sequence of letters. Developing a system that can accurately \nperform this mapping faces two principal problems: First, because handwriting is \ndone with little regularity in speed, there is unavoidable variability in input size. \nSecond, because no pen-up events or spatial gaps signal the end of one letter and the \nbeginning of the next, the system must perform both segmentation and recognition. \n\nThis second problem necessitates the development of a recognition-based segmentation \napproach. In [Schenkel et al., 93] one such approach is described for connected \nblock letter recognition where the system learns to recognize segmentation points. \nIn this paper an alternative method is presented that first performs exhaustive \nrecognition then searches the space of possible segmentations. 
The remainder of \nthe paper describes the method in more detail and presents results that demonstrate \nits effectiveness at recognizing a variety of cursive handwriting styles. \n\n2 Methodology \n\nThe recognition system consists of three subsystems: (a) the preprocessor that maps \nthe initial stream of x, y coordinates to a stream of stroke descriptions; (b) the letter \nclassifier that learns to recognize individual letters of different size; and (c) the word \nfinder that performs recognition-based segmentation over the output of the letter \nclassifier to identify the most likely word written. \n\n2.1 Preprocessing \n\nThe preprocessing stage follows steps outlined in [Guerfali & Plamondon, 93] and \nis illustrated in Figure 1. First, the original data is smoothed by passing it through \na low-pass filter, then reslanted to make the major stroke directions vertical. This \nis achieved by computing the mean angle of all the individual lines and then applying \na shear transformation to remove it. Second, the stroke boundaries are identified \nas points where $\\dot{y} = 0$ or where the pen is picked up or put down. Zero y velocity \nwas chosen rather than minimum absolute velocity [Morasso et al., 93] since it was \nfound to be more robust. Third, delayed strokes such as those that dot an i or cross \na t are reordered to be associated with their corresponding letter. Here the delayed \nstroke is placed to immediately follow the closest down stroke and linked into the \nstroke sequence by straight line pen-up strokes. Fourth, each stroke is resampled in \nthe space domain (using linear interpolation) so as to represent it as exactly eight \nx, y coordinates. Finally, the new stream of x, y coordinates is converted to a stream \nof 14 feature values. \n\nEight of these features are similar to those used in [Weissman et 
al., 93], and represent \nthe angular acceleration (as the sin and cos of the angle), the angular velocity \nof the line (as the sin and cos of the angle), the x, y coordinates (x has a linear \nramp removed), and the first differentials $\\delta x$, $\\delta y$. One feature denotes whether the pen \nwas down or up when the line was drawn. The remaining five features encode more \nabstract information about the stroke. \n\nFigure 2: The pyramid-style architecture of the network used to recognize 2 stroke \nletters. The input size is 32 x 14; 32 is from the 4 input strokes (each represented by \n8 resampled points), that is, the two central strokes from the letter and the 2 context \nstrokes, one on each side; 14 is the number of features employed to represent each point. \nNot all the receptive fields are shown. The first hidden layer consists of 7 fields, \none over each of the 4 strokes and 3 more spanning the stroke boundaries. The next hidden \nlayer consists of 5 fields, each spanning 3 x 20 inputs. The output is a 32 bit \nerror-correcting code. \n\nFigure 3: Examples of the class \"other\" for stroke sizes 1 through 6. Each sample is \na random fragment of a word that does not form an alphabetic letter. 
\n\n2.2 Letter Recognition \n\nThe letter classifier consists of six separate pyramid-style neural networks, each \nwith an architecture suitable for recognizing a letter of one through six strokes. \nA neural network designed to recognize letters of size j strokes encodes a mapping \nfrom a sequence of j + 2 stroke descriptions to a 32 bit error-correcting code \n[Dietterich & Bakiri, 91]. Experiments have shown that this use of a context window \nimproves performance, since the allograph of the current letter is dependent on the \nallographs of the previous and following letters. The network architecture for stroke \nsize two is illustrated in Figure 2. The architecture is similar to a time-delayed \nneural network [Lang & Waibel, 90] in that the hierarchical structure enables different \nlevels of abstract features to be learned. However, the individual receptive \nfields are not shared as in a TDNN, since translational variance is not a problem and \nthe sequence of data is important. \n\nThe networks are trained using 80% of the raw data collected. This set is further \ndivided into a training and a verification set. All training and verification data is \npreprocessed and hand segmented, via a graphical interface, into letter samples. \nThese are then sorted according to size and assembled into distinct training and \nverification sets. It is often the case that the same letter will appear in multiple \nsize files due to variability in writing and different contexts (for example, an o \nfollowed by a g is at least a three stroke allograph, while an o followed by an l is \nusually only a two stroke allograph). Included in these letter samples are samples \nof a new letter class \"other,\" illustrated in Figure 3. 
Experiments demonstrated \nthat use of an \"other\" class tightens decision boundaries and thus prevents spurious \nfragments (of which there are many during performance) from being recognized as \nreal letters. Each network is trained using back-propagation until correctness on \nthe verification set is maximized, usually requiring fewer than 100 epochs. \n\n2.3 Word Interpreter \n\nTo identify the correct word, the word interpreter explores the space of all possible \nsegmentations of the input stroke sequence. First, the input sequence is partitioned \ninto all possible fragments of size one through six, then the appropriately sized \nnetwork is used to classify each fragment. An example of this process is illustrated \nas a matrix in Figure 4(a). \n\nThe word interpreter then performs a search of this matrix to identify candidate \nwords. Figure 4(b) and Figure 4(c) illustrate two sets of candidate words found \nfor the example in Figure 4(a). Candidates in this search process are generated \naccording to the following constraints: \n\n\u2022 A legal segmentation of the input stream is one where no two adjacent \nfragments overlap or leave a gap. To impose this constraint, the i'th \nfragment of size j may be extended only by the fragments that begin at \nstroke i + j, if they exist. \n\n\u2022 A legal candidate letter sequence must be a subsequence of a word in the \nprovided lexicon of expected English words. \n\nFigure 4: (a) The matrix of fragments and their classifications that is generated by \napplying the letter recognizers to a sample of the word \"are\". The original handwriting \nsample, following preprocessing, is given at the top of the matrix. 
The bottom row \nof the matrix corresponds to all fragments of size one (with zero overlap), the second \nrow to all fragments of size two (with an overlap of one stroke), etc. The column \nof letters in each fragment box represents the letter classifications generated by \nthe neural network of appropriate size. The higher the letter in the column, the \nmore confident the classification. Those fragments with no high scoring letter were \nrecognized as examples of the class \"other.\" (b) The first five candidates found by \nthe word recognizer employing no lexicon. The first column is the word recognized, \nthe second column is the score for that word, the third is the sequence of fragments \nand their classifications. (c) The first five candidates found by the word recognizer \nemploying a lexicon of 10,748 words. \n\nIn a forward search, a candidate of size n consists of: (a) a legal sequence of fragments \n$f_1, f_2, \\ldots, f_n$ that form a prefix of the input stroke sequence, (b) a sequence \nof letters $l_1, l_2, \\ldots, l_n$ that form a prefix of an English word from the given dictionary \nand (c) a score s for this candidate, defined as the average letter recognition \nerror: \n\n$$s = \\frac{\\sum_{i=1}^{n} \\delta(f_i, l_i)}{n}$$ \n\nwhere $\\delta(f_i, l_i)$ is the Hamming distance between letter $l_i$'s code and the actual code \nproduced by the neural network when given $f_i$ as input. This scoring function is \nthe same as employed in [Edelman et al., 90]. \n\nThe best word candidate is one that conforms with the constraints and has the \nlowest score. Although this is a reasonable scoring function, it is easy to show \nthat it is not admissible when used as an evaluation function in forward search. \nWith a forward search, problems arise when the prefix of the correct word is poorly \nrecognized. 
To help combat this problem without greatly increasing the size of the \nsearch space, both forward and backward searches are performed. \nSearch is initiated by first generating all one letter and one fragment prefix and suffix \ncandidates. Then at each step in the search, the candidate with the lowest score is \nexpanded by considering the cross product of all legal letter extensions (according to \nthe lexicon) with all legal fragment extensions (according to the fragment-sequence \nconstraints). The list of candidates is maintained as a heap for efficiency. The search \nprocess terminates when the best candidate satisfies: (1) the letter sequence is a \ncomplete word in the lexicon and (2) the fragment sequence uses all the available \ninput strokes. \n\nThe result of this bi-directional search process is illustrated in Figure 4(b) and 4(c), where \nthe five best candidates found are given for no lexicon and a large lexicon. The use \nof a 10,748 word lexicon eliminates meaningless candidates such as cvre, \nwhich is a reasonable segmentation but is not a word in the lexicon. The first two candidates \nare the same fragment sequence, found by the two search directions. The third \ncandidate with a 10,748 word dictionary illustrates an alternative segmentation of \nthe correct word. This candidate was identified by a backward search, but not a \nforward search, due to the poor recognition of the first fragment. \n\n3 Evaluation \n\nTo evaluate the system, 10 writers provided samples of approximately 100 \nwords picked by a random process, biased to better represent uncommon letters. \nTwo kinds of experiments were performed. First, to test the ability of the system to \nlearn a variety of writing styles, the system was trained and tested on distinct sets \nof samples from the same writer. This experiment was repeated 10 times, once for \neach writer. 
The error rate varied between 1.7% and 15.5%, with a mean of 6.2%, \nwhen employing a database of 10,748 English words. The second experiment tested \nthe ability of the system to recognize handwriting of a writer not represented in the \ntraining set. Here the samples from the 10 writers were split into two sets: a training set \nof 9 writers, with the remaining writer forming the test set. The error rate was \nunderstandably higher, varying between 5.2% and 31.1%, with a mean of 10.8%, \nwhen employing a database of 10,748 English words. \n\n4 Summary \n\nThis paper has presented a recognition-based segmentation approach for on-line \ncursive handwriting. The method is very flexible because segmentation is performed \nfollowing exhaustive recognition. Hence, we expect the method to be successful \nwith more natural unconstrained writing, which can include mixed block, cursive \nand disjoint letters, diverse orderings of delayed strokes, overwrites and erasures. \n\nAcknowledgements \n\nThis work was supported by a Utah State University Faculty Grant. Thanks to \nBalaji Allamapatti, Rebecca Rude and Prashanth G Bilagi for code development. \n\nReferences \n\n[Dietterich & Bakiri, 91] Dietterich T. G. & Bakiri G. (1991). Error correcting \noutput codes: A general method for improving multiclass inductive learning \nprograms. In Proceedings of the Ninth National Conference on Artificial \nIntelligence, Vol. 2, pp. 572-577. \n\n[Edelman et al., 90] Edelman S., Flash T. & Ullman S. (1990). Reading cursive \nhandwriting by alignment of letter prototypes. International Journal of \nComputer Vision, 5:3, 303-331. \n\n[Guerfali & Plamondon, 93] Guerfali W. & Plamondon R. (1993). Normalizing and \nrestoring on-line handwriting. Pattern Recognition, Vol. 26, No. 3, pp. 419-431. \n\n[Guyon et al., 90] Guyon I., Albrecht P., Le Cun Y., Denker J. & Hubbard W. \n(1991). Design of a neural network character recognizer for a touch terminal. 
\nPattern Recognition, Vol. 24, No. 2, pp. 105-119. \n\n[Lang & Waibel, 90] Lang K. J. & Waibel A. H. (1990). A time-delayed neural \nnetwork architecture for isolated word recognition. Neural Networks, Vol. 3, \npp. 33-43. \n\n[Morasso et al., 93] Morasso P., Barberis S., Pagliano S. & Vergano D. (1993). \nRecognition experiments of cursive dynamic handwriting with self-organizing \nnetworks. Pattern Recognition, Vol. 26, No. 3, pp. 451-460. \n\n[Schenkel et al., 93] Schenkel M., Weissman H., Guyon I., Nohl C. & Henderson D. \n(1993). Recognition-based segmentation of on-line hand-printed words. In \nS. J. Hanson, J. D. Cowan & C. L. Giles (Eds), Advances in Neural Information \nProcessing Systems, 5, 723-730. San Mateo, CA: Morgan Kaufmann. \n\n[Schomaker, 1993] Schomaker L. (1993). Using stroke or character based self-organizing \nmaps in the recognition of on-line connected cursive script. Pattern \nRecognition, Vol. 26, No. 3, pp. 442-450. \n\n[Srihari & Bozinovic, 87] Srihari S. N. & Bozinovic R. M. (1987). A multi-level \nperception approach to reading cursive script. Artificial Intelligence, 33, \n217-255. \n\n[Weissman et al., 93] Weissman H., Schenkel M., Guyon I., Nohl C. & Henderson \nD. (1993). Recognition-based segmentation of on-line run-on hand printed \nwords: input vs. output segmentation. Pattern Recognition. \n", "award": [], "sourceid": 794, "authors": [{"given_name": "Nicholas", "family_name": "Flann", "institution": null}]}