{"title": "Adaptive Development of Connectionist Decoders for Complex Error-Correcting Codes", "book": "Advances in Neural Information Processing Systems", "page_first": 691, "page_last": 697, "abstract": null, "full_text": "Adaptive Development of Connectionist Decoders \n\nfor Complex Error-Correcting Codes \n\nSheri L. Gish Mario Blalull \n\nIBM Rf'search Division \n\nAlmaden Research Center \n\n650 Harry Road \n\nSan Jose, C A 95120 \n\nAbstract \n\n\\Ve present. an approach for df'velopment of a decoder for any complex \nbinary error-correct.ing code- (ECC) via training from examples of decoded \nreceived words. Our decoder is a connectionist architecture. We describe \ntwo sepa.rate solutions: A system-level solution (the Cascaded Networks \nDecoder); and the ECC-Enhanced Decoder, a solution which simplifies \nthe mapping problem which must be solved for decoding. Although both \nsolutions meet our basic approach constraint for simplicity and compact(cid:173)\nness. only the ECC-Enhanced Decoder meet.s our second basic constraint \nof being a generic solution. \n\n1 \n\nINTRODUCTION \n\n1.1 THE DECODING PROBLEM \n\nAn error-correcting code (ECC) is used to identify and correct errors in a received \nbinary vector which is possibly corrupted clue to transmission across a noisy channel. \nIn order to use a selected error-correcting code. the information bits, or the bits \ncontaining t.he message. are tllCOdid int.o a valid ECC codeword by the addition of \na set of f'xtra hits, the redulldallcy, detf'fmined by tlw properties of the selected \nECC. To decode a received word. there is a pre-processing step first in which a \nsyndrome is calculated from the word. The syndrome is a vector whose length is \nequal t.o the redundancy. If the syndrome is the all-zero vector, then the received \n691 \n\n\f692 \n\nGish and Blaum \n\nword is a valid codeword (no errors). 
The non-zero syndromes have a one-to-one relationship with the error vectors, provided the number of errors does not exceed the error-correcting capability of the code. (An error vector is a binary vector equal in length to an ECC codeword, with the error positions having the value 1 while the rest of the positions have the value 0.) The decoding process is defined as the mapping of a syndrome to its associated error vector. Once an error vector is found, the corrected codeword can be calculated by XORing the error vector with the received word. For more background in error-correcting codes, the reader is referred to any book in the field, such as [2, 9]. \n\nECC's differ in the number of errors which they can correct and also in the distance (measured as a Hamming distance in codespace) which can be recognized between the received word and a true codeword. Codes which can correct more errors and cover greater distances are considered more powerful. However, in practice the difficulty of developing an efficient decoder which can correct many errors prevents the use of most ECC's in the solution of real-world problems. Although decoding can be done for any ECC via lookup table, this method quickly becomes intractable as the length of codewords and the number of errors possibly corrected increase. Development of an efficient decoder for a particular ECC is not straightforward. Moreover, it was shown that decoding of a random code is an NP-hard problem [1, 4]. The purpose of our work is to develop an ECC decoder using the trainable machine paradigm; i.e., we develop a decoder via training using examples of decoded received words. To prove our concept, we have selected a binary block code, the (23,12,7) Golay code, which has \"real world\" complexity. The Golay code corrects up to 3 errors and has minimum distance 7. 
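The syndrome-to-error-vector lookup and XOR correction described above can be sketched as follows. This is a minimal illustration using a (7,4) Hamming code as a stand-in for a general binary ECC; the (23,12,7) Golay code studied in this paper works on the same principle, only with a much larger syndrome table:

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code; column i is the binary
# representation of i. An illustrative stand-in for a general binary ECC.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def syndrome(word):
    """Syndrome = H w (mod 2); the all-zero syndrome means a valid codeword."""
    return tuple(H.dot(word) % 2)

# Build the one-to-one syndrome -> error-vector table over all correctable
# (here: single-bit) error vectors.
table = {}
for i in range(7):
    e = np.zeros(7, dtype=int)
    e[i] = 1
    table[syndrome(e)] = e

def decode(received):
    s = syndrome(received)
    if not any(s):
        return received                   # no errors detected
    return (received + table[s]) % 2      # XOR the error vector back in

codeword = np.array([1, 0, 1, 1, 0, 1, 0])   # a valid codeword (H c = 0)
corrupted = codeword.copy()
corrupted[2] ^= 1                            # one transmission error
recovered = decode(corrupted)                # equal to codeword again
```

For the Golay code the analogous table needs 2^11 = 2048 entries, one per error pattern of weight up to 3 (the code is perfect), which illustrates the lookup-table growth that motivates a trained decoder.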
A Golay codeword is 23 bits long (12 information bits, 11 bits of redundancy); the syndrome is 11 bits long. There exist many efficient decoding methods for the Golay code [2, 3, 9], but the code complexity represents quite a challenge for our proposed approach. \n\n1.2 A CONNECTIONIST ECC DECODER \n\nWe use a connectionist architecture as our ECC decoder; the input is a syndrome (we assume that the straightforward step of syndrome calculation is pre-processing) and the output is the portion of the error vector corresponding to the information bits in the received word (we ignore the redundancy). The primary reason for our choice of a connectionist architecture is its inherent simplicity and compactness; a connectionist architecture is readily implemented in either hardware or software solutions to complex real-world problems. The particular architecture we use is the multi-layer feedforward network with one hidden layer. There are full connections only between adjacent layers. The number of nodes in the input layer is the number of bits in the syndrome, and the number of nodes in the output layer is the number of information bits in the ECC codeword. The number of nodes in the hidden layer is a free parameter, but typically this number is no more than 1 or 2 nodes greater than the number of nodes in the input layer. Our activation function is the logistic function and our training algorithm is backpropagation (see [10] for a description of both). This architectural approach has been demonstrated to be both cost-effective and a superior performer compared to classical statistical alternatives in the solution of complex mapping problems when it is used as a trainable pattern classifier [6, 7]. 
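A minimal sketch of the decoder network just described: an 11-node input layer (the syndrome), a 12-node output layer (the information-bit error positions), one hidden layer sized close to the input layer, logistic activations, and plain backpropagation. The hidden size of 13 and the learning rate here are illustrative assumptions, not values reported by the authors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes follow the paper: 11 syndrome bits in, 12 information-bit
# error positions out. Hidden size (input + 2) and learning rate are
# illustrative assumptions.
n_in, n_hid, n_out, lr = 11, 13, 12, 0.5

W1 = rng.normal(scale=0.1, size=(n_hid, n_in))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden -> output weights

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    h = logistic(W1 @ x)
    y = logistic(W2 @ h)
    return h, y

def train_step(x, target):
    """One backpropagation step (squared error; biases and momentum omitted)."""
    global W1, W2
    h, y = forward(x)
    d_out = (y - target) * y * (1.0 - y)      # output-layer delta
    d_hid = (W2.T @ d_out) * h * (1.0 - h)    # backpropagated hidden delta
    W2 -= lr * np.outer(d_out, h)
    W1 -= lr * np.outer(d_hid, x)

# Toy usage: memorize one (syndrome, error-vector) training pair.
x = rng.integers(0, 2, n_in).astype(float)
t = rng.integers(0, 2, n_out).astype(float)
for _ in range(500):
    train_step(x, t)
_, y = forward(x)    # y should now round to the target bits
```

A real training run would of course cycle over a dataset of many (syndrome, error-vector) pairs, as described in Section 2.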
There are two basic constraints which we have placed on our trainable connectionist decoder. First, the final connectionist architecture must be simple and contain as few nodes as possible. Second, the method we use to develop our decoder must be able to be generalized to any binary ECC. To meet the second constraint, we ensured that the training dataset contained only examples of decoded words (i.e., no a priori knowledge of code patterning or existing decoding algorithms was included), and also that the training dataset was as small a subset of the possible error vectors as was required to obtain generalization by trained networks. \n\n2 RESULTS \n\n2.1 THE CASCADED NETWORKS DECODER \n\nUsing our basic approach, we have developed two separate solutions. One, the Cascaded Networks Decoder (see Figure 1), is a system-level solution which parses the decoding problem into a set of more tractable problems, each addressed by a separate network. These smaller networks each either solve simple classification problems (binary decisions) or are specialized decoders. Performance of the Cascaded Networks Decoder is 95% correct for the Golay code (tested on all 2^11 possible error vectors), and the whole system is small and compact. However, this solution does not meet our constraint that the solution method be generic, since the parsing of the original problem does require some a priori knowledge about the ECC, and the training of each network is done on a separate, self-contained schedule. \n\n2.2 THE ECC-ENHANCED DECODER \n\nThe approach taken by the Cascaded Networks Decoder simplifies the solution strategy of the decoding problem, while the ECC-Enhanced Decoder simplifies the mapping problem to be solved by the decoder. 
In the ECC-Enhanced Decoder, both the input syndrome and the output error vector are encoded as codewords of an ECC. Such encoding should serve to separate the inputs in input space and the outputs in output space, creating a \"region-to-region\" mapping which is much easier than the \"point-to-point\" mapping required without encoding [8]. In addition, the decoding of the network output compensates for some level of uncertainty in the network's performance; an output vector within a small distance of the target vector will be corrected to the actual target by the ECC. This enhances training procedures [5, 8]. \n\nWe have found that the ECC-Enhanced Decoder method meets all of our constraints for a connectionist architecture. However, we also have found that choosing the best ECC for encoding the input and choosing the best ECC for encoding the output represent two critical and quite separate problems which must be solved in order for the method to succeed. \n\n2.2.1 Choosing the Input ECC Encoding \n\nThe goal for the chosen ECC into which the input is encoded is to achieve maximum separation of input patterns in code space. The major constraint is the size of the codeword (i.e., the number of redundancy bits which must be added), because longer codewords increase the complexity of training and the size (in number of nodes) of the connectionist architecture. \n\nFigure 1: Cascaded Networks Decoder. A system-level solution incorporating 5 cascaded neural networks, mapping the 11-bit syndrome to the 12-bit error vector. \n\nTo determine the effect of different types of ECC's on the separation of input patterns in code space, we constructed a 325-pattern training dataset (mapping the 11-bit syndrome to the 12-bit error vector) and encoded only the inputs using 4 different ECC's. 
The candidate ECC's (with the size of redundancy required to encode the 11-bit syndrome) were \n\n• Hamming (bit level, 4-bit redundancy) \n• Extended Hamming (bit level, 5-bit redundancy) \n• Reed-Solomon (4-bit byte level, 2-byte redundancy) \n• Fire (bit level, 11-bit redundancy) \n\nWe trained 5 networks (1 with no encoding of the input, 1 each with a different ECC encoding) using this training dataset. Empirically, we had determined that this training dataset is slightly too small to achieve generalization for this task; we trained each network until its performance level on a 435-pattern test dataset (different patterns from the training dataset, but encoded identically) degraded 20%. We then analyzed the effect of the input encoding on the patterning of error positions we observed for the output vectors. \n\nThe results of our analysis are illustrated in Figures 2 and 3. These bar graphs look only at output vectors found to have 2 or more errors, and show the proximity of error positions within an output vector. Each bar corresponds to the maximum distance of error positions within a vector (adjacent positions have a distance of 1). The bar height represents the total frequency of vectors with a given maximum distance; each bar is color-coded to break down the frequency by total number of errors per vector. This type of measurement shows the degree of burst (clustering of error positions) in the errors; knowing whether or not one has burst errors influences the likelihood of correction of those errors by an ECC (for instance, Fire codes are burst-correcting codes). \n\n
Figure 2: Bar graphs of output errors made by the decoder, with no encoding of the input in this instance. Training dataset results are on the left, test dataset results are on the right. \n\nOur analysis shows that the Reed-Solomon ECC is the only input encoding which separated the input patterns in a way which made the use of an output-pattern ECC encoding effective (it resulted in more burst-type errors, and decreased the total number of error positions in output vectors which had errors). The 11-bit redundancy required by the Fire code for input encoding increased complexity so much that this solution was worse than the others in terms of performance. Thus, we have chosen the Reed-Solomon ECC for input encoding in our ECC-Enhanced Decoder. \n\nFigure 3: Bar graphs of the effects of different ECC input encodings on output errors made by the decoder. Training dataset results are on the left, test dataset results are on the right. Top row is the Hamming code encoding, bottom row is the Reed-Solomon encoding. \n\n2.2.2 Choosing the Output ECC Encoding \n\nThe goal for the chosen ECC into which the output is encoded is correction of the maximum number of errors made by the decoder. Like the constraint imposed on the chosen ECC for input encoding, the ECC selected for encoding the output should add as small a redundancy as possible. However, there is another, even more important constraint on the choice of ECC for output encoding: decoding simplicity. 
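To illustrate how an output ECC absorbs slight decoder uncertainty: after thresholding the network's real-valued outputs to bits, even a simple single-error-correcting code can repair one unit that landed on the wrong side of 0.5. The sketch below uses a (7,4) Hamming code purely for illustration; the paper's actual output encodings are Hamming and Fire codes over the 12-bit error vector:

```python
import numpy as np

# (7,4) Hamming parity-check matrix: column i is the binary encoding of i,
# so the syndrome, read as an integer, points directly at the flipped bit.
# Illustrative only; not the paper's exact output code.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def correct_output(analog):
    """Threshold network outputs to bits, then repair up to one flipped bit."""
    bits = (np.asarray(analog) > 0.5).astype(int)
    s = H.dot(bits) % 2
    pos = int(s[0] + 2 * s[1] + 4 * s[2])   # 0 means no error detected
    if pos:
        bits[pos - 1] ^= 1
    return bits

target = np.array([1, 0, 1, 1, 0, 1, 0])    # the encoded target (H t = 0)
# A slightly uncertain network output: unit 3 is on the wrong side of 0.5.
analog = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.1]
cleaned = correct_output(analog)            # equal to target again
```

The syndrome-as-index shortcut above is exactly the kind of cheap output decoding the text argues for; a code without such an efficient decoder would forfeit this advantage.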
The major advantage gained from encoding the output is the correction of slight uncertainty in the performance of the decoder, and this advantage is gained after the output is decoded. Thus, any ECC selected for output encoding should be one which can be decoded efficiently. \n\nThe error-separation results we gained from our analysis of the effects of input encoding were used to guide our choices for an ECC into which the output would be encoded. We chose our ECC from the 4 candidates we considered for the input (these ECC's all can be decoded efficiently). The redundancy cost for encoding a 12-bit error vector was the same as in the 11-bit input case for the Reed-Solomon and Fire codes, but was increased by 1 bit for the Hamming codes. Based on the result that a Reed-Solomon encoding of the input both increased the amount of burst errors and decreased the total number of errors per output vector, we chose the Hamming code and the Fire code for our output encoding ECC. Both encodings yielded excellent performance on the Golay code decoding problem; the Fire code output encoding resulted in better generalization by the network and thus better performance (87% correct) than the Hamming code output encoding (84% correct). \n\nReferences \n\n[1] E. R. Berlekamp, R. J. McEliece and H. C. A. van Tilborg, \"On the Inherent Intractability of Certain Coding Problems,\" IEEE Trans. on Inf. Theory, Vol. IT-24, pp. 384-386, May 1978. \n\n[2] R. E. Blahut, Theory and Practice of Error Control Codes, Addison-Wesley, 1983. \n\n[3] M. Blaum and J. Bruck, \"Decoding the Golay Code with Venn Diagrams,\" IEEE Trans. on Inf. Theory, Vol. IT-36, pp. 906-910, July 1990. \n\n[4] J. Bruck and M. Naor, \"The Hardness of Decoding Linear Codes with Preprocessing,\" IEEE Trans. on Inf. 
Theory, Vol. IT-36, pp. 381-385, March 1990. \n\n[5] T. G. Dietterich and G. Bakiri, \"Error-Correcting Output Codes: A General Method for Improving Multiclass Inductive Learning Programs,\" Oregon State University Computer Science TR 91-30-2, 1991. \n\n[6] S. L. Gish and W. E. Blanz, \"Comparing a Connectionist Trainable Classifier with Classical Statistical Decision Analysis Methods,\" IBM Research Report RJ 6891 (65717), June 1989. \n\n[7] S. L. Gish and W. E. Blanz, \"Comparing the Performance of Connectionist and Statistical Classifiers on an Image Segmentation Problem,\" in D. S. Touretzky (ed.), Neural Information Processing Systems 2, pp. 614-621, Morgan Kaufmann Publishers, 1990. \n\n[8] H. Li, T. Kronander and I. Ingemarsson, \"A Pattern Classifier Integrating Multilayer Perceptron and Error-Correcting Code,\" in Proceedings of the IAPR Workshop on Machine Vision Applications, pp. 113-116, Tokyo, November 1990. \n\n[9] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, Amsterdam, The Netherlands: North-Holland, 1977. \n\n[10] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, \"Learning Internal Representations by Error Propagation,\" in D. E. Rumelhart, J. L. McClelland et al. (eds), Parallel Distributed Processing, Vol. 1, Chapter 8, MIT Press, 1986. \n", "award": [], "sourceid": 519, "authors": [{"given_name": "Sheri", "family_name": "Gish", "institution": null}, {"given_name": "Mario", "family_name": "Blaum", "institution": null}]}