{"title": "Regular and Irregular Gallager-type Error-Correcting Codes", "book": "Advances in Neural Information Processing Systems", "page_first": 272, "page_last": 278, "abstract": null, "full_text": "Regular and Irregular Gallager-type Error-Correcting Codes \n\nY. Kabashima and T. Murayama \n\nDept. of Compt. Intl. & Syst. Sci. \n\nTokyo Institute of Technology \n\nYokohama 2268502, Japan \n\nD. Saad and R. Vicente \n\nNeural Computing Research Group \n\nAston University \n\nBirmingham B4 7ET, UK \n\nAbstract \n\nThe performance of regular and irregular Gallager-type error-correcting codes is investigated via methods of statistical physics. The transmitted codeword comprises products of the original message bits selected by two randomly-constructed sparse matrices; the numbers of non-zero row/column elements in these matrices define a family of codes. We show that Shannon's channel capacity may be saturated in equilibrium for many of the regular codes, while slightly lower performance is obtained for others which may be of higher practical relevance. Decoding aspects are considered by employing the TAP approach, which is identical to the commonly used belief-propagation-based decoding. We show that irregular codes may saturate Shannon's capacity but with improved dynamical properties. \n\n1 Introduction \n\nThe ever increasing information transmission in the modern world is based on reliably communicating messages through noisy transmission channels; these can be telephone lines, deep space, magnetic storage media etc. Error-correcting codes play a significant role in correcting errors incurred during transmission; this is carried out by encoding the message prior to transmission and decoding the corrupted received codeword to retrieve the original message. 
\n\nIn his ground-breaking papers, Shannon[1] analyzed the capacity of communication channels, setting an upper bound on the achievable noise-correction capability of codes, given their code (or symbol) rate, defined as the ratio between the number of bits in the original message and in the transmitted codeword. Shannon's bound is non-constructive and does not provide a recipe for devising optimal codes. The quest for more efficient codes, in the hope of saturating the bound set by Shannon, has been going on ever since, providing many useful but sub-optimal codes. \n\nOne family of codes, presented originally by Gallager[2], has attracted significant interest recently as it has been shown to outperform most currently used techniques[3]. Gallager-type codes are characterized by several parameters, the choice of which defines a particular member of this family of codes. Current theoretical results[3] offer only bounds on the error probability of various architectures, proving the existence of very good codes under some restrictions; decoding issues are examined via numerical simulations. \n\nIn this paper we analyze the typical performance of Gallager-type codes for several parameter choices via methods of statistical mechanics. We then validate the analytical solution by comparing the results to those obtained by the TAP approach and via numerical methods. \n\n2 The general framework \n\nIn a general scenario, a message represented by an $N$-dimensional Boolean vector $\\xi$ is encoded to the $M$-dimensional vector $J^0$, which is transmitted through a noisy channel with some flipping probability $p$ per bit (other noise types may also be studied). The received message $J$ is then decoded to retrieve the original message. 
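As a minimal illustration of the channel model in the general scenario above, the sketch below flips each transmitted bit independently with probability p. It is not taken from the paper: the function name, the seed and the vector length are assumptions made purely for the example.

```python
import random


def binary_symmetric_channel(codeword, p, rng):
    # Flip each bit independently with probability p (the per-bit flip rate).
    return [bit ^ (1 if rng.random() < p else 0) for bit in codeword]


# Hypothetical parameters, chosen only for illustration.
rng = random.Random(1)
codeword = [rng.randint(0, 1) for _ in range(10000)]
received = binary_symmetric_channel(codeword, 0.1, rng)

# The empirical fraction of corrupted bits should lie close to p.
flip_fraction = sum(a != b for a, b in zip(codeword, received)) / len(codeword)
```

For long vectors the measured flip fraction concentrates around p, which is the only property of the channel that the analysis relies on.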
\n\nIn this paper we analyze a slightly different version of Gallager-type codes termed the MN code[3], which is based on choosing two randomly-selected sparse matrices $A$ and $B$ of dimensionality $M \\times N$ and $M \\times M$ respectively; these are characterized by $K$ and $L$ non-zero unit elements per row and $C$ and $L$ per column respectively. The finite numbers $K$, $C$ and $L$ define a particular code; both matrices are known to both sender and receiver. Encoding is carried out by constructing the modulo 2 inverse of $B$ and the matrix $B^{-1}A$ (mod 2); the vector $J^0 = B^{-1}A\\,\\xi$ (mod 2, $\\xi$ Boolean vector) constitutes the codeword. Decoding is carried out by taking the product of the matrix $B$ and the received message $J = J^0 + \\zeta$ (mod 2), corrupted by the Boolean noise vector $\\zeta$, resulting in $A\\xi + B\\zeta$. The equation \n\n$$A\\xi + B\\zeta = AS + B\\tau \\pmod 2 \\qquad (1)$$ \n\nis solved via the iterative methods of Belief Propagation (BP)[3] to obtain the most probable Boolean vectors $S$ and $\\tau$; BP methods in the context of error-correcting codes have recently been shown to be identical to a TAP[4] based solution of a similar physical system[5]. \n\nThe similarity between error-correcting codes of this type and Ising spin systems was first pointed out by Sourlas[6], who formulated the mapping of a simpler code, somewhat similar to the one presented here, onto an Ising spin system Hamiltonian. We recently extended the work of Sourlas, which focused on extensively connected systems, to the finite connectivity case[5] as well as to the case of MN codes[7]. \n\nTo facilitate the current investigation we first map the problem to that of an Ising model with finite connectivity. We employ the binary representation ($\\pm 1$) of the dynamical variables $S$ and $\\tau$ and of the vectors $J$ and $J^0$ rather than the Boolean (0,1) one; the vector $J^0$ is generated by taking products of the relevant binary message bits $J^0_\\mu = \\prod_{i \\in \\mu} \\xi_i$, where the indices $\\mu$ = ($i_1$, ... 
$i_K$) correspond to the non-zero elements of $B^{-1}A$, producing a binary version of $J^0$. As we use statistical mechanics techniques, we consider the message and codeword dimensionality ($N$ and $M$ respectively) to be infinite, keeping the ratio between them $R = N/M$, which constitutes the code rate, finite. Using the thermodynamic limit is quite natural as Gallager-type codes are usually used for transmitting long ($10^4$-$10^5$) messages, where finite size corrections are likely to be negligible. To explore the system's capabilities we examine the Hamiltonian \n\n$$\\mathcal{H} = \\sum_{\\mu,\\sigma} D_{\\mu\\sigma}\\, \\delta\\Big[-1;\\, J_{\\mu\\sigma} \\prod_{i \\in \\mu} S_i \\prod_{j \\in \\sigma} \\tau_j\\Big] - \\frac{F_s}{\\beta} \\sum_k S_k - \\frac{F_\\tau}{\\beta} \\sum_k \\tau_k . \\qquad (2)$$ \n\nThe tensor product $D_{\\mu\\sigma} J_{\\mu\\sigma}$, where $J_{\\mu\\sigma} = \\prod_{i \\in \\mu} \\xi_i \\prod_{j \\in \\sigma} \\zeta_j$ and $\\sigma = (j_1, \\ldots, j_L)$, is the binary equivalent of $A\\xi + B\\zeta$, treating both signal ($S$ and index $i$) and noise ($\\tau$ and index $j$) simultaneously. Elements of the sparse connectivity tensor $D_{\\mu\\sigma}$ take the value 1 if the corresponding indices of both signal and noise are chosen (i.e., if all corresponding indices of the matrices $A$ and $B$ are 1) and 0 otherwise; it has $C$ unit elements per $i$-index and $L$ per $j$-index, representing the system's degree of connectivity. The $\\delta$ function provides 1 if the selected sites' product $\\prod_{i \\in \\mu} S_i \\prod_{j \\in \\sigma} \\tau_j$ is in disagreement with the corresponding element $J_{\\mu\\sigma}$, recording an error, and 0 otherwise. Notice that this term is not frustrated, as there are $M+N$ degrees of freedom and only $M$ constraints from Eq.(1), and can therefore vanish at sufficiently low temperatures. The last two terms on the right represent our prior knowledge in the case of sparse or biased messages, $F_s$, and of the noise level, $F_\\tau$, and require assigning certain values to these additive fields. The choice of $\\beta \\to \\infty$ imposes the restriction of Eq.(1), limiting the solutions to those for which the first term of Eq.(2) vanishes, while the last two terms, scaled with $\\beta$, survive. 
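The encoding and syndrome steps behind Eq.(1) can be illustrated with a small sketch. It is not the authors' construction: all helper names are hypothetical, A is drawn with K non-zero entries per row but without enforcing the column weight C, and B is replaced by a bidiagonal matrix chosen only because it is trivially invertible modulo 2, whereas the MN code uses a random sparse B with L non-zero elements per row and column.

```python
import random


def sparse_rows(n_rows, n_cols, k, rng):
    # Each row is stored as the sorted column indices of its k non-zero entries.
    return [sorted(rng.sample(range(n_cols), k)) for _ in range(n_rows)]


def matvec_mod2(rows, vec):
    # Sparse matrix-vector product over GF(2).
    return [sum(vec[j] for j in row) % 2 for row in rows]


def bidiagonal_rows(m):
    # Toy stand-in for B (assumption): ones on the diagonal and first sub-diagonal.
    return [[0]] + [[i - 1, i] for i in range(1, m)]


def solve_bidiagonal_mod2(rhs):
    # Solve B x = rhs (mod 2) by forward substitution for the toy B above.
    x, prev = [], 0
    for r in rhs:
        prev = r ^ prev
        x.append(prev)
    return x


def encode(a_rows, xi):
    # Codeword J0 = B^{-1} A xi (mod 2).
    return solve_bidiagonal_mod2(matvec_mod2(a_rows, xi))


# Hypothetical sizes: rate R = N / M = 1/2, K = 3, flip rate p = 0.1.
rng = random.Random(0)
N, M, K, p = 6, 12, 3, 0.1
A = sparse_rows(M, N, K, rng)
B = bidiagonal_rows(M)
xi = [rng.randint(0, 1) for _ in range(N)]               # message
zeta = [1 if rng.random() < p else 0 for _ in range(M)]  # channel noise
J0 = encode(A, xi)
J = [c ^ e for c, e in zip(J0, zeta)]                    # received vector
z = matvec_mod2(B, J)                                    # decoder computes B J
expected = [(a + b) % 2
            for a, b in zip(matvec_mod2(A, xi), matvec_mod2(B, zeta))]
```

The quantity z computed by the receiver coincides with A xi + B zeta (mod 2), the left-hand side of Eq.(1); finding the most probable pair (S, tau) that reproduces it is the task handed to BP or TAP decoding, which this sketch does not attempt.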
\n\nNote that the noise dynamical variables $\\tau$ are irrelevant to measuring the retrieval success $m = \\frac{1}{N} \\big\\langle \\sum_{i=1}^{N} \\xi_i\\, \\mathrm{sign} \\langle S_i \\rangle_\\beta \\big\\rangle_\\xi$. The latter monitors the normalized mean overlap between the Bayes-optimal retrieved message, shown to correspond to the alignment of $\\langle S_i \\rangle_\\beta$ to the nearest binary value[6], and the original message; the subscript $\\beta$ denotes thermal averaging. \n\nSince the first part of Eq.(2) is invariant under the map $S_i \\to S_i \\xi_i$, $\\tau_j \\to \\tau_j \\zeta_j$ and $J_{\\mu\\sigma} \\to J_{\\mu\\sigma} \\prod_{i \\in \\mu} \\xi_i \\prod_{j \\in \\sigma} \\zeta_j = 1$, it is useful to decouple the correlation between the vectors $S$, $\\tau$ and $\\xi$, $\\zeta$. Rewriting Eq.(2) one obtains a similar expression apart from the last terms on the right which become $(F_s/\\beta) \\sum_k S_k \\xi_k$ and $(F_\\tau/\\beta) \\sum_k \\tau_k \\zeta_k$. \n\nThe random selection of elements in $D$ introduces disorder to the system which is treated via methods of statistical physics. More specifically, we calculate the partition function $Z(D, J) = \\mathrm{Tr}_{\\{S,\\tau\\}} \\exp[-\\beta \\mathcal{H}]$ averaged over the disorder and the statistical properties of the message and noise, using the replica method[5, 8, 9]. Taking $\\beta \\to \\infty$ gives rise to a set of order parameters \n\n$$q_{\\alpha,\\beta,\\ldots,\\gamma} = \\Big\\langle \\frac{1}{N} \\sum_{i=1}^{N} Z_i S_i^{\\alpha} S_i^{\\beta} \\cdots S_i^{\\gamma} \\Big\\rangle_{\\beta \\to \\infty}, \\qquad r_{\\alpha,\\beta,\\ldots,\\gamma} = \\Big\\langle \\frac{1}{M} \\sum_{j=1}^{M} Y_j \\tau_j^{\\alpha} \\tau_j^{\\beta} \\cdots \\tau_j^{\\gamma} \\Big\\rangle_{\\beta \\to \\infty}$$ \n\nwhere $\\alpha, \\beta, \\ldots$ represent replica indices, and the variables $Z_i$ and $Y_j$ come from enforcing the restriction of $C$ and $L$ connections per index respectively[5]: \n\n$$\\delta\\Big( \\sum_{(i_2, \\ldots, i_K)} D_{(i, i_2, \\ldots, i_K)} - C \\Big) = \\oint_0^{2\\pi} \\frac{dZ}{2\\pi}\\, Z^{\\sum_{(i_2, \\ldots, i_K)} D_{(i, i_2, \\ldots, i_K)} - (C+1)}, \\qquad (3)$$ \n\nand similarly for the restriction on the $j$ indices. \n\nTo proceed with the calculation one has to make an assumption about the order parameters' symmetry. The assumption made here, and validated later on, is that of replica symmetry in the following representation of the order parameters and the related conjugate variables \n\n$$q_{\\alpha,\\beta,\\ldots,\\gamma} = a_q \\int dx\\, \\pi(x)\\, x^l, \\qquad \\hat{q}_{\\alpha,\\beta,\\ldots,\\gamma} = a_{\\hat{q}} \\int d\\hat{x}\\, \\hat{\\pi}(\\hat{x})\\, \\hat{x}^l, \\qquad (4)$$ \n$$r_{\\alpha,\\beta,\\ldots,\\gamma} = a_r \\int dy\\, \\rho(y)\\, y^l, \\qquad \\hat{r}_{\\alpha,\\beta,\\ldots,\\gamma} = a_{\\hat{r}} \\int d\\hat{y}\\, \\hat{\\rho}(\\hat{y})\\, \\hat{y}^l,$$ \n\nwhere $l$ is the number of replica indices, $a_{\\cdot}$ are normalization coefficients, and $\\pi(x)$, $\\hat{\\pi}(\\hat{x})$, $\\rho(y)$ and $\\hat{\\rho}(\\hat{y})$ represent probability distributions. Unspecified integrals are over the range $[-1, +1]$. One then obtains an expression for the free energy per spin expressed in terms of these probability distributions, $\\frac{1}{N} \\langle \\ln Z \\rangle_{\\xi,\\zeta,D}$. The free energy can then be calculated via the saddle point method. Solving the equations obtained by varying the free energy w.r.t. the probability distributions $\\pi(x)$, $\\hat{\\pi}(\\hat{x})$, $\\rho(y)$ and $\\hat{\\rho}(\\hat{y})$ is difficult as they generally comprise both delta peaks and regular[9] solutions for the ferromagnetic and paramagnetic phases (there is no spin-glass solution here as the system is not frustrated). The solutions obtained in the case of unbiased messages (the most interesting case as most messages are compressed prior to transmission) are for the ferromagnetic phase: \n\n$$\\pi(x) = \\delta(x - 1), \\quad \\hat{\\pi}(\\hat{x}) = \\delta(\\hat{x} - 1), \\quad \\rho(y) = \\delta(y - 1), \\quad \\hat{\\rho}(\\hat{y}) = \\delta(\\hat{y} - 1), \\qquad (5)$$ \n\nand for the paramagnetic phase: \n\n$$\\pi(x) = \\delta(x), \\quad \\hat{\\pi}(\\hat{x}) = \\delta(\\hat{x}), \\quad \\hat{\\rho}(\\hat{y}) = \\delta(\\hat{y}),$$ \n$$\\rho(y) = \\frac{1 + \\tanh F_\\tau}{2}\\, \\delta(y - \\tanh F_\\tau) + \\frac{1 - \\tanh F_\\tau}{2}\\, \\delta(y + \\tanh F_\\tau). \\qquad (6)$$ \n\nThese solutions obey the saddle point equations. However, it is unclear whether the contribution of other delta peaks or of an additional continuous solution will be significant and whether the solutions (5) and (6) are stable or not. In addition, it is also necessary to validate the replica symmetric ansatz itself. 
To address these questions we obtained solutions to the system described by the Hamiltonian (2) via TAP methods of finitely connected systems[5]; we solved the saddle point equations derived from the free energy numerically, representing all probability distributions by up to $10^4$ bin models and carrying out the integrations via Monte-Carlo methods; finally, to show the consistency between theory and practice we carried out large scale simulations for several cases, which will be presented elsewhere. \n\n3 Structure of the solutions \n\nThe various methods indicate that the solutions may be divided into two different categories: $K = L = 2$ and either $K \\geq 3$ or $L \\geq 3$. We therefore treat them separately. \n\nFor unbiased messages and either $K \\geq 3$ or $L \\geq 3$ we obtain the solutions (5) and (6) both by applying the TAP approach and by solving the saddle point equations numerically. The former was carried out at the value of $F_\\tau$ which corresponds to the true noise and input bias levels (for unbiased messages $F_s = 0$) and thus to Nishimori's condition[10], where no replica symmetry breaking effects are expected. This is equivalent to having the correct prior within the Bayesian framework[6] and enables one to obtain analytic expressions for some observables as long as some gauge requirements are obeyed[10]. Numerical solutions show the emergence of stable dominant delta peaks, consistent with those of (5) and (6). The question of longitudinal mode stability (corresponding to the replica symmetric solution) was addressed by setting initial conditions for the numerical solutions close to the solutions (5) and (6), showing that they converge back to these solutions, which are therefore stable. \n\nThe most interesting quantity to examine is the maximal code rate, for a given corruption process, for which messages can be perfectly retrieved. 
This is defined in the case of $K, L \\geq 3$ by the value of $R = K/C = N/M$ for which the free energy of the ferromagnetic solution becomes smaller than that of the paramagnetic solution, constituting a first order phase transition. A schematic description of the solutions obtained is shown in the inset of Fig.1a. The paramagnetic solution ($m = 0$) has a lower free energy than the ferromagnetic one (low/high free energies are denoted by the thick and thin lines respectively, there are no axis lines at $m = 0, 1$) for noise levels $p > p_c$ and vice versa for $p \\leq p_c$; both solutions are stable. The critical code rate is derived by equating the ferromagnetic and paramagnetic free energies to obtain $R_c = 1 - H_2(p) = 1 + (p \\log_2 p + (1 - p) \\log_2 (1 - p))$. This coincides with Shannon's capacity. To validate these results we obtained TAP solutions for the unbiased message case ($K = L = 3$, $C = 6$) as shown in Fig.1a (as +) in comparison to Shannon's capacity (solid line). \n\nAnalytical solutions for the saddle point equations cannot be obtained for biased patterns and we therefore resort to numerical methods and the TAP approach. The maximal information rate (i.e., code-rate $\\times H_2(f_s = (1 + \\tanh F_s)/2)$ - the source redundancy) obtained by the TAP method (0) and numerical solutions of the saddle point equations (0), for each noise level, are shown in Fig.1a. Numerical results have been obtained using $10^3$-$10^4$ bin models for each probability distribution and were run for $10^5$ steps per noise level point. The various results are highly consistent and practically saturate Shannon's bound for the same noise level. \n\nThe MN code for $K, L \\geq 3$ seems to offer optimal performance. 
However, the main drawback is rooted in the co-existence of the stable $m = 1$ and $m = 0$ solutions, shown in Fig.1a (inset), which implies that from some initial conditions the system will converge to the undesired paramagnetic solution. Moreover, studying the ferromagnetic solution numerically shows a highly limited basin of attraction, which becomes smaller as $K$ and $L$ increase, while the paramagnetic solution at $m = 0$ always enjoys a wide basin of attraction. Computer simulations (see also [3]) show that as initial conditions for the decoding process are typically of close-to-zero magnetization (almost no prior information about the original message is assumed) it is likely that the decoding process will converge to the paramagnetic solution. \n\nWhile all codes with $K, L \\geq 3$ saturate Shannon's bound in their equilibrium properties and are characterized by a first order, paramagnetic to ferromagnetic, phase transition, codes with $K = L = 2$ show lower performance and different physical characteristics. The analytical solutions (5) and (6) are unstable at some flip rate levels and one resorts to solving the saddle point equations numerically and to TAP based solutions. The picture that emerges is sketched in the inset of Fig.1b: The paramagnetic solution dominates the high flip rate regime up to the point $p_1$ (denoted as 1 in the inset) in which a stable, ferromagnetic solution, of higher free energy, appears (thin lines at $m = \\pm 1$). At a lower flip rate value $p_2$ the paramagnetic solution becomes unstable (dashed line) and is replaced by two stable sub-optimal ferromagnetic (broken symmetry) solutions which appear as a couple of peaks in the various probability distributions; typically, these have a lower free energy than the ferromagnetic solution until $p_3$, after which the ferromagnetic solution becomes dominant. 
Still, only once the sub-optimal ferromagnetic solutions disappear, at the spinodal point $p_s$, does a unique ferromagnetic solution emerge as a single delta peak in the numerical results (plus a mirror solution). The point at which the sub-optimal ferromagnetic solutions disappear constitutes the maximal practical flip rate for the current code-rate and was determined numerically (0) and via TAP solutions (+) as shown in Fig.1b. \n\nNotice that initial conditions for TAP and the numerical solutions were chosen almost randomly, with a slight bias of $O(10^{-12})$ in the initial magnetization. The TAP dynamical equations are identical to those used for practical BP decoding[5], and therefore provide equivalent results to computer simulations with the same parameterization, supporting the analytical results. The excellent convergence results obtained point to the existence of a unique pair of global solutions to which the system converges (below $p_s$) from practically all initial conditions. This observation and the practical implications of using the $K = L = 2$ code have not been obtained by information theory methods (e.g.[3]); these prove the existence of very good codes for $C = L \\geq 3$, and examine decoding properties only via numerical simulations. \n\n4 Irregular Constructions \n\nIrregular codes with a non-uniform number of non-zero elements per column and a uniform number of elements per row were recently introduced[11, 12] and were found to outperform regular codes. It is relatively straightforward to adapt our methods to study these particular constructions. The restriction of the number of connections per index can be replaced by a set of $N$ restrictions of the form (1), enforcing $C_j$ non-zero elements in the $j$-th column of the matrix $A$, and other $M$ restrictions enforcing $L_l$ non-zero elements in the $l$-th column of the matrix $B$. 
By construction these restrictions must obey the relations $\\sum_{j=1}^{N} C_j = MK$ and $\\sum_{l=1}^{M} L_l = ML$. One can assume that a particular set of restrictions is generated independently by the probability distributions $P(C)$ and $P(L)$. With that we can compute average properties of irregularly constructed codes generated by arbitrary distributions. \n\nProceeding along the same lines as those of the regular case one can find a very similar expression for the free energy, which can be interpreted as a mixture of regular codes with column weights sampled with probabilities $P(C)$ and $P(L)$. As long as we choose probability distributions which vanish for $C, L = 0$ (avoiding trivial non-invertible matrices) and $C, L = 1$ (avoiding single checked bits), the solutions to the saddle point equations are the same as those obtained in the regular case (Eqs.5, 6), leading to exactly the same predictions for the maximum performance. The differences between regular and irregular codes show up in their dynamical behavior. In the irregular case with $K > 2$ and for biased messages the basin of attraction is larger for higher noise levels[13]. \n\n5 Conclusion \n\nIn this paper we examined the typical performance of Gallager-type codes. We discovered that for a certain choice of parameters, either $K \\geq 3$ or $L \\geq 3$, one potentially obtains optimal performance, saturating Shannon's bound. This comes at the expense of a decreasing basin of attraction, making the decoding process increasingly impractical. Another code, $K = L = 2$, shows close to optimal performance with a very large basin of attraction, making it highly attractive for practical purposes. The decoding performance of both code types was examined by employing the TAP approach, an iterative method identical to the commonly used BP. Both numerical and TAP solutions agree with the theoretical results. 
The equilibrium properties of regular and irregular constructions are shown to be the same. The improved performance of irregular codes reported in the literature can be explained as a consequence of dynamical properties. This study examines the typical performance of these increasingly important error-correcting codes, from which optimal parameter choices can be derived, complementing the bounds and empirical results provided in the information theory literature. Important aspects that are yet to be investigated include other noise types, finite size effects and the decoding dynamics itself. \n\nAcknowledgement Support by the JSPS RFTF program (YK), The Royal Society and EPSRC grant GR/N00562 (DS) is acknowledged. \n\nFigure 1: Critical code rate as a function of the flip rate $p$, obtained from numerical solutions and the TAP approach ($N = 10^4$), and averaged over 10 different initial conditions with error bars much smaller than the symbols size. (a) Numerical solutions for $K = L = 3$, $C = 6$ and varying input bias $f_s$ (0) and TAP solutions for both unbiased (+) and biased (0) messages; initial conditions were chosen close to the analytical ones. The critical rate is multiplied by the source information content to obtain the maximal information transmission rate, which clearly does not go beyond $R = 3/6$ in the case of biased messages; for unbiased patterns $H_2(f_s) = 1$. Inset: The ferromagnetic and paramagnetic solutions as functions of $p$; thick and thin lines denote stable solutions of lower and higher free energies respectively. 
(b) For the unbiased case of $K = L = 2$; initial conditions for the TAP (+) and the numerical solutions (0) are of almost zero magnetization. Inset: The ferromagnetic (optimal/sub-optimal) and paramagnetic solutions as functions of $p$; thick and thin lines are as in (a), dashed lines correspond to unstable solutions. \n\nReferences \n\n[1] C.E. Shannon, Bell Sys. Tech. J., 27, 379 (1948); 27, 623 (1948). \n[2] R.G. Gallager, IRE Trans. Info. Theory, IT-8, 21 (1962). \n[3] D.J.C. MacKay, IEEE Trans. IT, 45, 399 (1999). \n[4] D. Thouless, P.W. Anderson and R.G. Palmer, Phil. Mag., 35, 593 (1977). \n[5] Y. Kabashima and D. Saad, Europhys. Lett., 44, 668 (1998) and 45, 97 (1999). \n[6] N. Sourlas, Nature, 339, 693 (1989) and Europhys. Lett., 25, 159 (1994). \n[7] Y. Kabashima, T. Murayama and D. Saad, Phys. Rev. Lett., (1999) in press. \n[8] K.Y.M. Wong and D. Sherrington, J. Phys. A, 20, L793 (1987). \n[9] C. De Dominicis and P. Mottishaw, J. Phys. A, 20, L1267 (1987). \n[10] H. Nishimori, Prog. Theo. Phys., 66, 1169 (1981). \n[11] M. Luby et al., IEEE proceedings of ISIT98 (1998) and Analysis of Low Density Codes and Improved Designs Using Irregular Graphs, preprint. \n[12] D.J.C. MacKay et al., IEEE Trans. Comm., 47, 1449 (1999). \n[13] R. Vicente et al., http://xxx.lanl.gov/abs/cond-mat/9908358 (1999). \n", "award": [], "sourceid": 1700, "authors": [{"given_name": "Yoshiyuki", "family_name": "Kabashima", "institution": null}, {"given_name": "Tatsuto", "family_name": "Murayama", "institution": null}, {"given_name": "David", "family_name": "Saad", "institution": null}, {"given_name": "Renato", "family_name": "Vicente", "institution": null}]}