{"title": "Neural Networks that Learn to Discriminate Similar Kanji Characters", "book": "Advances in Neural Information Processing Systems", "page_first": 332, "page_last": 339, "abstract": null, "full_text": "332 \n\nNEURAL NETWORKS THAT LEARN TO \n\nDISCRIMINATE SIMILAR KANJI CHARACTERS \n\nYoshihiro Morl \n\nKazuhiko Yokosawa \n\nATR Auditory and Visual Perception Research Laboratories \n\n2-1-61 Shiromi Higashiku Osaka 540 Japan \n\nABSTRACT \n\nis \n\nto \n\nnetwork \n\nis applied \n\nfeed-forward \n\ntwo new methods are utilized \n\ncharacters. Using \na \nlearning algorithm. a \n\nthe problem of \nb a c k \nthree(cid:173)\n\nA neural network \nrecognizing Kanji \npropagation network \nlayered. \ntrained \nrecognize similar handwritten Kanji characters. \naddition. \ntraining effective. The \nhigher \nanalysis of connection weights showed \nthe hierarchical \nnetworks can discern \nKanji characters. This strategy of \nmakes high \nresults \neffective for Kanji character recognition. \n\nto \nIn \nto make \nrecognition accuracy was \nthat of conventional methods. An \nthat \ntrained \nstructure of \ntrained networks \nrecognition accuracy possible. Our \nvery \n\nthat neural networks \n\nsuggest \n\nthan \n\nare \n\n1 INTRODUCTION \n\nthese networks has been better than \n\nNeural networks are applied to recognition tasks in many fields. \nthe field of letter recognition. net work s \nwith good results. In \nhave been made which recognize hand-written digits \n[Burr 1986] \n1988]. The \nand complex printed Chinese \nthat of \nperformance of \nconventional methods. However. \nstill \nlarge number of \nrudimentary when we consider not only \nin hand-written \nKanji characters. but \ncharacters. We are aiming \nthat \nrecognizes \nin Japan. \nSince it is difficult for a single network to discriminate 3000 \ncharacters. 
our plan \nby \n\nthe 3000 Kanji characters commonly used \n\nlarge-scale network \n\nthe \ninvolved \n\nthe distortion \n\nto create a \n\nto make a \n\nlarge-scale \n\ncharacters \n\nnetwork \n\nresults \n\nthese \n\n[Ho \n\nare \n\nis \n\n\fNeural Networks that Learn Kanji Characters \n\n333 \n\nassembling many smaller ones that would each be responsible for \nrecognizing only a \n\nsmall number of characters. \n\nThere are two issues concerning implementation of such a \nscale network : the ability of individual networks. and organizing \nthe networks. As a first step. the ability of a small network to \ndiscriminate similar Kanji characters was \ninvestigated. We found \nlearning speed and performance of networks are highly \nthat \ninfluenced by environment \nthe order. number. \nand repetition of training samples). New methods of teaching the \nenvironment are utilized to make \n\nlearning effective. \n\ninstance. \n\nlarge(cid:173)\n\nthe \n\n(for \n\n2 NEW TYPES OF TEACHERS \n\n2.1 PROBLEMS OF BACKPROPAGATION \n\nlearning \n\nalgorithm only \n\nteaches \nThe Backpropagation(BP) \nthe \ncorrect answers [Rumelhart 1986]. BP does not care about \nrecognition rate of each category. If we use ordinary BP \nin a \nif there are both easy and \nsituation of limited resources. and \ndifficult categories to \nlearn in the training set. what happens is \nthat the easier category uses up most of the resources in the early \nstages of training \nthe difficult \ncategory to learn should get more resources. This weakness of BP \nmakes \n\n(Figure 1). Yet. for efficiency. \n\nlearning time longer. \n\nthe \n\nlearning procedures \n\nthe real \nTwo new methods are used \nisolation. \nworld. \nThere is also a learning environment. It is therefore natural. and \neven necessary. \nincorporate \nenvironmental \n\nto avoid \n(human) do not exist \n\nthis problem. In \nin \n\nteaching methods \n\nto devise \n\nfactors. 
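The ordinary BP regime criticized above can be sketched as follows. This is a minimal toy implementation, with layer sizes and synthetic data of our own choosing rather than the paper's setup; it shows how plain BP presents every sample with equal frequency, whether its category is easy or hard.

```python
# Minimal sketch of ordinary BP on a three-layer feed-forward net.
# Toy sizes and synthetic data (our own choices, not the paper's setup):
# every sample is presented equally often, easy or hard.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid, n_out = 8, 4, 2
W1 = rng.normal(0, 0.5, (n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.5, (n_out, n_hid)); b2 = np.zeros(n_out)

X = rng.normal(size=(20, n_in))
labels = (X[:, 0] > 0).astype(int)          # a learnable toy rule
T = np.eye(n_out)[labels]                   # one-hot targets

def total_error():
    H = sigmoid(X @ W1.T + b1)
    Y = sigmoid(H @ W2.T + b2)
    return np.mean((Y - T) ** 2)

err_before = total_error()
lr = 0.5
for _ in range(200):                        # epochs
    for x, t in zip(X, T):                  # uniform presentation order
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        dy = (y - t) * y * (1 - y)          # output delta, squared error
        dh = (W2.T @ dy) * h * (1 - h)      # hidden delta
        W2 -= lr * np.outer(dy, h); b2 -= lr * dy
        W1 -= lr * np.outer(dh, x); b1 -= lr * dh
err_after = total_error()
```

Nothing in this loop looks at per-category error; resources go wherever the gradient pulls first, which is the imbalance the two teachers below are designed to correct.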
Figure 1. Easily Learned Category Takes More Resources (separation by BP versus the ideal separation, in the feature space of samples)

Figure 2. Two New Methods (environment, learning procedure, category; back propagation)

2.2 FIRST METHOD (REVIEW METHOD)

This method tracks the performance for each category. At first, training is focused on the categories that are not being recognized well. After this, on a more fine-grained level, the error for each sample is checked, and the greater this error, the more often that sample is presented (Figure 2). This leads to a more balanced recognition over the categories.

2.3 SECOND METHOD (PREPARATION METHOD)

The second method, designed to prevent over-training, is to increase the number of training samples when the network's total error rate is observed to fall below a certain value (Figure 2).

3 RECOGNITION EXPERIMENT

3.1 INPUT PATTERN AND NETWORK STRUCTURE

Kanji characters are composed of sub-characters called radicals (Figure 3). The four Kanji characters used in our experiment are shown in Figure 4. These characters are all combinations of two kinds of left radicals and two kinds of right radicals. Visually, these characters are similar and hence difficult to discriminate. The training samples for this network were chosen from a database of about 3000 Kanji characters [Saito 1985]. For each character, there are 200 handwritten samples from different writers: 100 are used as training samples, and the remaining 100 are used to test the recognition accuracy of the trained network. All samples in the database consist of 64 by 63 dots. If we were to use this pattern as the input to our neural net, the number of units required for the input layer would be too large for the computational abilities of current computers.
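The two teachers can be sketched as follows. The function names, the presentation-probability rule, the growth rule for the sample pool, and the threshold value are all our own assumptions; the paper specifies only that high-error samples are presented more often and that samples are added once total error falls below a certain value.

```python
# Sketch of the two teachers, assuming per-sample errors are available.
# Review method: present a sample with probability proportional to its error.
# Preparation method: enlarge the active training set once total error is low.
# Names, growth rule, and threshold are assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(1)

def review_schedule(errors, n_presentations):
    """Pick sample indices with probability proportional to current error."""
    p = np.asarray(errors, dtype=float)
    p = p / p.sum()
    return rng.choice(len(errors), size=n_presentations, p=p)

def prepare_more_samples(pool, active_count, total_error, threshold=0.05):
    """Add samples from the reserve pool when total error is low enough."""
    if total_error < threshold and active_count < len(pool):
        active_count = min(len(pool), active_count * 2)   # assumed growth rule
    return active_count

errors = [0.40, 0.10, 0.01, 0.49]            # samples 0 and 3 are the hard ones
picks = review_schedule(errors, 1000)
counts = np.bincount(picks, minlength=4)     # hard samples reviewed most often
active = prepare_more_samples(pool=list(range(100)), active_count=10,
                              total_error=0.02)
```

Presenting samples in proportion to their error is what balances recognition across categories; the easy, already-learned samples stop consuming presentations.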
Therefore, two kinds of feature vectors extracted from the handwritten patterns are used as the input. In one of the feature vectors, the "MESH feature", there are 64 dimensions computing the density of each of the 8 by 8 small squares into which the handwritten samples are divided. In the other, the "LDCD feature" [Hagita 1983], there are 256 dimensions computing line segment length along four directions - horizontal, vertical, and two diagonals - in the same small squares. In this experiment, we use a feed-forward neural network with three layers - an input layer, a hidden layer and an output layer. Each unit of the input layer is connected to all units of the hidden layer, and each unit of the hidden layer is connected to all units of the output layer.

Fig. 3 Concept of Radical (the Kanji for "Theory", decomposed into its left radical and right radical)

Fig. 4 Example of Kanji Characters

Figure 5. LDCD Feature (horizontal component)

3.2 RECOGNITION RESULTS (MESH VS. LDCD)

Average recognition rates when the MESH feature was used were 98.5% for training samples and 82.5% for testing samples. Average recognition rates when the LDCD feature was used were 99.5% for training samples and 92.0% for testing samples. These recognition rates for neural networks were higher than those of the conventional methods we used.
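The MESH feature can be sketched as below. The paper does not say how the 64 by 63 dot pattern is divided into an even 8 by 8 grid, so this sketch pads the 63-dot axis to 64 before taking per-cell ink densities; the padding choice is our own assumption.

```python
# Sketch of the MESH feature: divide a binary character image into an 8x8
# grid of small squares and take the ink density of each, giving 64 numbers.
# Padding the 63-dot axis up to 64 is an assumption; the paper does not
# specify how an uneven division was handled.
import numpy as np

def mesh_feature(img, grid=8):
    img = np.asarray(img, dtype=float)
    h = -(-img.shape[0] // grid) * grid        # round rows up to a grid multiple
    w = -(-img.shape[1] // grid) * grid        # round cols up likewise
    padded = np.zeros((h, w))
    padded[:img.shape[0], :img.shape[1]] = img
    # axis 0/2 index the cell, axis 1/3 the dots inside each cell
    cells = padded.reshape(grid, h // grid, grid, w // grid)
    return cells.mean(axis=(1, 3)).ravel()     # 64-dim density vector

sample = np.zeros((64, 63))                    # database patterns are 64x63 dots
sample[:8, :7] = 1                             # ink only in the top-left cell
f = mesh_feature(sample)                       # f[0] = 56 inked dots / 64 dots
```

The LDCD feature works over the same small squares but accumulates line segment lengths in the four directions instead of densities, which is why it has 4 x 64 = 256 dimensions.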
3.3 Recognition Rate & the Number of Samples

We gradually increased the number of training samples to investigate the influence of this number on the recognition rate of testing samples. Figure 6 shows the recognition rate of testing samples for ten different amounts of training samples. When the number of training samples is 2 or 3, the recognition rates are lower than for 1 training sample. This result is probably due to the fact that the second samples in each set are not well written, and it means that an average pattern should be used in the early training period.

Figure 6. Recognition Rate and the Number of Training Samples (percent correct on testing samples versus samples per Kanji category, from 1 to 100)

3.4 ANALYSIS OF INNER REPRESENTATION

3.4.1 Weights vs. Difference Between Averaged Samples

To investigate how this neural network learns to solve the given task, the weight vector from the input layer to each hidden unit is compared to the difference between averaged samples with a common radical. Since the four Kanji characters in this task are all combinations of two kinds of left radicals and two kinds of right radicals, two hidden units which take charge of the left and right radicals, respectively, are enough to accomplish recognition. At first, 200 samples with the same left radical were averaged. Since there are just two left radicals in the four Kanji characters, this produced two averaged patterns. These patterns were then subtracted, yielding a pattern that corresponds to the difference between the two left radicals.
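The averaging-and-subtraction analysis can be sketched on synthetic feature vectors as follows. The random prototypes, noise levels, and the artificially aligned hidden-unit weight vector are stand-ins of our own; only the procedure (average 200 samples per radical, subtract, correlate with the weights) follows the text.

```python
# Sketch of the weight analysis in 3.4.1, on synthetic stand-in data:
# average the samples sharing a left radical, subtract the two averages,
# and correlate that difference pattern with a hidden unit's weight vector.
import numpy as np

rng = np.random.default_rng(2)
dim = 64                                        # e.g. MESH feature dimensions
left_a = rng.normal(0.0, 1.0, dim)              # prototype of left radical A
left_b = rng.normal(0.0, 1.0, dim)              # prototype of left radical B

# 200 noisy samples per left radical, as in the paper's averaging step
samples_a = left_a + rng.normal(0, 0.3, (200, dim))
samples_b = left_b + rng.normal(0, 0.3, (200, dim))
diff = samples_a.mean(axis=0) - samples_b.mean(axis=0)

# a hypothetical hidden unit whose weights align with the radical difference
w_hidden = 0.5 * diff + rng.normal(0, 0.05, dim)
r = np.corrcoef(diff, w_hidden)[0, 1]           # Pearson R, as in the paper
```

With weights built to track the difference pattern, R comes out close to 1, which is the signature (R=0.71 and R=0.79 in the real networks) of a hidden unit specializing in one side's radicals.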
The same method was used to obtain a pattern that corresponds to the difference between the two right radicals. Then, for each of these patterns, the correlation coefficient with the weights from the input layer to each hidden unit is calculated. The pattern for the left radicals was very highly correlated with hidden unit 1 (R=0.71, p<0.01), and not correlated with hidden unit 2. On the other hand, the pattern for the right radicals was very highly correlated with hidden unit 2 (R=0.79, p<0.01), and not correlated with hidden unit 1. In other words, each hidden unit is discriminating among the radicals of one particular side of the Kanji characters.

3.4.2 Weights vs. Bayes Discrimination

The Bayes method is used as a discrimination function when the distribution of the categories is known. Supposing that the distribution of categories in this task is a normal distribution and that the covariance matrix of each category is equal, the discrimination function becomes first order, as given below:

f(X) = (μl - μr)^t Σ X + c    (1)

Σ : covariance matrix of the samples with the same radical
μl : average vector of the samples with the same left radical
μr : average vector of the samples with the same right radical
X : input feature vector
c : constant

The input vector to the input layer is translated to a hidden unit as follows:

y = WX + a    (2)

y : input sum
X : input feature vector
W : weights matrix from the input layer to a hidden unit
a : threshold

Equation (2) is similar to equation (1). If the network uses a strategy similar to Bayes discrimination, there should be some correlation between the Bayes weights (μl - μr)^t Σ in equation (1) and W in equation (2).
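The comparison between equations (1) and (2) can be sketched as follows. The synthetic means, the identity covariance, and the unrelated random weight row are all our own stand-ins; the point is only the mechanics of forming the Bayes weight vector and correlating it with a hidden unit's weights.

```python
# Sketch of the comparison in 3.4.2: build the Bayes weight vector
# (mu_l - mu_r)^t Sigma of equation (1) and correlate it with a hidden
# unit's weight row W of equation (2). All quantities here are synthetic.
import numpy as np

rng = np.random.default_rng(3)
dim = 256                                      # e.g. LDCD feature dimensions
mu_l = rng.normal(size=dim)                    # mean vector, same left radical
mu_r = rng.normal(size=dim)                    # mean vector, same right radical
sigma = np.eye(dim)                            # assumed shared covariance matrix

bayes_w = (mu_l - mu_r) @ sigma                # equation (1) weight vector
net_w = rng.normal(size=dim)                   # an unrelated hidden-unit row of W

r = np.corrcoef(bayes_w, net_w)[0, 1]          # near zero for unrelated weights
```

An unrelated weight row yields a correlation near zero, which mirrors the paper's finding (R=0.02, n.s.): the trained network's weights are not the Bayes weights.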
When the correlation coefficient between the Bayes weights and the weights from the input layer to each hidden unit was calculated, there was no significant correlation between them (R=0.02, p>0.05). In other words, the network does not use a strategy like Bayes discrimination.

4 CONCLUSION

For this experiment, we observed that the learning procedure is influenced by the surrounding environment. With this fact in mind, new methods were proposed to make the learning process more effective. These methods lead to balanced recognition rates over the categories. The most important result from this experiment is that a network trained with BP can perceive that Kanji characters are composed of radicals. Based on this ability, it is possible to estimate the number of units required for the hidden layer of a network. Such a network could then form the building block of a large-scale network capable of recognizing as many as the 3000 Kanji characters commonly used in Japan.

Acknowledgments

We are grateful to Dr. Michio Umeda for his support and encouragement. Special thanks to Kazuki Joe for the ideas he provided in our many discussions and for his help in developing simulation programs.

Reference

[Burr 1986] D.J.Burr, "A Neural Network Digit Recognizer", IEEE-SMC, 1621-1625, 1986.
[Ho 1988] A.Ho and W.Furmanski, "Pattern Recognition by Neural Network Model on Hypercubes", HCCA3-528.
[Rumelhart 1986] D.E.Rumelhart et al., "Parallel Distributed Processing", vol.1, The MIT Press, 1986.
[Saito 1985] T.Saito, H.Yamada, K.Yamamoto, "On the Data Base ETL9 of Handprinted Characters in JIS Chinese Characters and Its Analysis", J68-D, 4, 757-764, 1985.
[Hagita 1983] N.Hagita, S.Naito, I.Masuda, "Recognition of Handprinted Chinese Characters by Global and Local Direction Contributivity Density-Feature", J66-D, 6, 722-729, 1983.
", "award": [], "sourceid": 158, "authors": [{"given_name": "Yoshihiro", "family_name": "Mori", "institution": null}, {"given_name": "Kazuhiko", "family_name": "Yokosawa", "institution": null}]}