{"title": "Single-Iteration Threshold Hamming Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 564, "page_last": 571, "abstract": null, "full_text": "Single-iteration Threshold Hamming \n\nNetworks \n\nIsaac Meilijson \n\nEytan Ruppin \n\nMoshe Sipper \n\nSchool of Mathematical Sciences \n\nRaymond and Beverly Sackler Faculty of Exact Sciences \n\nTel Aviv University, 69978 Tel Aviv, Israel \n\nAbstract \n\nWe analyze in detail the performance of a Hamming network clas(cid:173)\nsifying inputs that are distorted versions of one of its m stored \nmemory patterns. The activation function of the memory neurons \nin the original Hamming network is replaced by a simple threshold \nfunction. The resulting Threshold Hamming Network (THN) cor(cid:173)\nrectly classifies the input pattern, with probability approaching 1, \nusing only O(mln m) connections, in a single iteration. The THN \ndrastically reduces the time and space complexity of Hamming Net(cid:173)\nwork classifiers. \n\n1 \n\nIntroduction \n\nOriginally presented in (Steinbuch 1961, Taylor 1964) the Hamming network (HN) \nhas received renewed attention in recent years (Lippmann et. al. 1987, Baum et. \nal. 1988). The HN calculates the Hamming distance between the input pattern \nand each memory pattern, and selects the memory with the smallest distance. It \nis composed of two subnets: The similarity subnet, consisting of an n-neuron input \nlayer connected with an m-neuron memory layer, calculates the number of equal bits \nbetween the input and each memory pattern. The winner-take-all (WTA) subnet, \nconsisting of a fully connected m-neuron topology, selects the memory neuron that \nbest matches the input pattern. \n\n564 \n\n\fSingle-iteration Threshold Hamming Networks \n\n565 \n\nThe similarity subnet uses mn connections and performs a single iteration. The \nWTA sub net has m 2 connections. 
 With randomly generated input and memory patterns, it converges in Θ(m ln(mn)) iterations (Floreen 1991). Since m is exponential in n, the space and time complexity of the network is primarily due to the WTA subnet (Domany & Orland 1987). We analyze the performance of the HN in the practical scenario where the input pattern is a distorted version of some stored memory vector. We show that it is possible to replace the original activation function of the neurons in the memory layer by a simple threshold function, and completely discard the WTA subnet. If the threshold is properly tuned, only the neuron standing for the 'correct' memory is likely to be activated. The resulting Threshold Hamming Network (THN) will perform correctly (with probability approaching 1) in a single iteration, using only O(m ln m) connections instead of the O(m^2) connections in the original HN. We identify the optimal threshold, and measure its performance relative to the original HN. \n\n2 The Threshold Hamming Network \n\nWe examine a HN storing m+1 memory patterns ξ^μ, 1 ≤ μ ≤ m+1, each being an n-dimensional vector of ±1. The input pattern x is generated by selecting some memory pattern ξ^μ (w.l.o.g., ξ^{m+1}), and letting each bit x_i be either ξ_i^μ or -ξ_i^μ with probabilities a and (1-a) respectively, where a > 0.5. To analyze this HN, we use some tight approximations to the binomial distribution. Due to space considerations, their proofs are omitted. \n\nLemma 1. Let X ~ Bin(n, p). If x_n are integers such that lim_{n→∞} x_n/n = β ∈ (p, 1), then \n\nP(X > x_n) ≈ [(1-p) / ((1 - p/β) √(2πnβ(1-β)))] exp{-n[β ln(β/p) + (1-β) ln((1-β)/(1-p))]}   (1) \n\nin the sense that the ratio between LHS and RHS converges to 1 as n → ∞. For the special case p = 1/2, let G(β) = ln 2 + β ln β + (1-β) ln(1-β); then \n\nP(X > x_n) ≈ exp{-nG(β)} / [(2 - 1/β) √(2πnβ(1-β))]   (2) \n\nLemma 2. Let X_i ~ Bin(n, 1/2) be independent, γ ∈ (0, 1), and let x_n be as in Lemma 1. If \n\nm = (2 - 1/β) √(2πnβ(1-β)) (ln(1/γ)) e^{nG(β)}   (3) \n\nthen \n\nP(max(X_1, ..., X_m) < x_n) → γ   (4) \n\nLemma 3. Let Y ~ Bin(n, a) with a > 1/2, let (X_i) and γ be as in Lemma 2, and let η ∈ (0, 1). Let x_n be the integer closest to nβ, where \n\nβ = a - √(a(1-a)/n) z_η - 1/(2n)   (5) \n\nand z_η is the η-quantile of the standard normal distribution, i.e., \n\nη = (1/√(2π)) ∫_{-∞}^{z_η} e^{-x²/2} dx   (6) \n\nThen, if Y and (X_i) are independent, \n\nP(max(X_1, X_2, ..., X_m) < Y) ≥ P(max(X_1, X_2, ..., X_m) < x_n ≤ Y) → γη   (7) \n\nas n → ∞, for m as in (3). \n\nBased on the above binomial probability approximations, we can now propose and analyze an n-neuron Threshold Hamming Network (THN) that classifies the input patterns with probability of error not exceeding ε, when the input vector is generated with an initial bit-similarity a: Let X_j be the similarity between the input vector and the j'th memory pattern (1 ≤ j ≤ m), and let Y be the similarity with the 'correct' memory pattern ξ^{m+1}. Choose γ and η so that γη ≥ 1 - ε, e.g., γ = η = √(1-ε); determine β by (5) and m by (3). Discard the WTA subnet, and simply replace the neurons of the memory layer by m neurons having a threshold x_n, the integer closest to nβ. If any memory neuron with similarity at least x_n is declared 'the winner', then, by Lemma 3, the probability of error is at most ε, where 'error' may be due to the existence of no winner, a wrong winner, or multiple winners. \n\n3 The Hamming Network and an Optimal Threshold Hamming Network \n\nWe now calculate the choice of the threshold x_n that maximizes the storage capacity m = m(n, ε, a).
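 The recipe of Section 2 (choose γ and η, determine β by (5) and m by (3), and set the threshold to the integer closest to nβ) is straightforward to evaluate numerically. The following is a minimal sketch, not the authors' code; the function name and the parameter values are illustrative assumptions, and the standard-normal quantile is taken from Python's statistics module: \n\n```python
# Sketch of the THN design recipe of Section 2 (illustrative parameter values):
# gamma = eta = sqrt(1 - eps), beta from eq. (5), capacity m from eq. (3),
# firing threshold x_n = integer closest to n * beta.
import math
from statistics import NormalDist

def thn_design(n, a, eps):
    gamma = eta = math.sqrt(1.0 - eps)          # so that gamma * eta >= 1 - eps
    z_eta = NormalDist().inv_cdf(eta)           # eta-quantile of N(0, 1), eq. (6)
    beta = a - math.sqrt(a * (1 - a) / n) * z_eta - 1.0 / (2 * n)   # eq. (5)
    x_n = round(n * beta)                       # threshold of the memory neurons
    G = math.log(2) + beta * math.log(beta) + (1 - beta) * math.log(1 - beta)
    m = ((2 - 1 / beta) * math.sqrt(2 * math.pi * n * beta * (1 - beta))
         * math.log(1 / gamma) * math.exp(n * G))                   # eq. (3)
    return beta, x_n, m

print(thn_design(n=210, a=0.7, eps=0.047))
```
\nFor n = 210 and a = 0.7 this initial choice γ = η already places the threshold in the 133-136 range examined below; the final capacity and threshold depend on the subsequent optimization of the γ/η split. \n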
 Writing x_0 for the threshold given by (5) and M = max(X_1, ..., X_m) for the maximal similarity among the m 'wrong' memories, the normal failure-rate approximation gives \n\nP(Y < x) ≈ Φ̄(z_η + (x_0 - x)/√(na(1-a))) ≈ (1-η) exp{r(z_η)(x - x_0)/√(na(1-a))}   (10) \n\nwhere φ is the standard normal density function, Φ is the standard normal cumulative distribution function, Φ̄ = 1 - Φ, and r = φ/Φ̄ is the corresponding failure rate function. The probability of correct recognition using a threshold x can now be expressed as \n\nP(M < x) P(Y ≥ x) = γ^{(β/(1-β))^{x_0-x}} (1 - (1-η) exp{r(z_η)(x - x_0)/√(na(1-a))})   (11) \n\nWe differentiate expression (11) with respect to x_0 - x, and equate the derivative at x_0 = x to zero, to obtain the relation between γ and η that yields the optimal threshold, i.e., that which maximizes the probability of correct recognition. This yields \n\nγ = exp{- r(z_η)(1-η) / (√(na(1-a)) ln(β/(1-β)))}   (12) \n\nWe now approximate \n\n1 - γ ≈ -ln γ ≈ [r(z_η) / (√(na(1-a)) ln(β/(1-β)))] (1-η)   (13) \n\nand thus the optimal proportion between the two error probabilities is \n\n(1-γ)/(1-η) ≈ r(z_η) / (√(na(1-a)) ln(β/(1-β))) = δ   (14) \n\nBased on Lemma 4, if the desired probability of error is ε, we choose \n\nη = 1 - ε/(1+δ),  γ = 1 - δε/(1+δ)   (15) \n\nWe start with γ = η = √(1-ε), obtain β from (5) and δ from (14), and recompute η and γ from (15). The limiting values of β and γ in this iterative process give the maximal capacity m and threshold x_n. \n\nWe now compute the error probability ε(m, n, a) of the original HN (with the WTA subnet) for arbitrary m, n and a, and compare it with ε. \n\nLemma 5. For arbitrary n, a and ε, let m, β, γ, η and δ be as calculated above. Then, the probability of error ε(m, n, a) of the HN satisfies \n\nε(m, n, a) ≈ Γ(1-δ) [(1 - e^{-δ ln(β/(1-β))}) / (δ ln(β/(1-β)))] [(εδ)^δ / (1+δ)^{1+δ}] ε   (16) \n\nwhere Γ is the Gamma function. \n\nProof: \n\nP(Y ≤ M) = Σ_x P(Y ≤ x) P(M = x) = Σ_x P(Y ≤ x) [P(M < x+1) - P(M < x)]   (17) \n\n≈ Σ_x (1-η) e^{-δ(x_0-x) ln(β/(1-β))} [P(M < x+1) - P(M < x)]   (18) \n\nWe now approximate this sum by the integral of the summand: let b = β/(1-β) and c = δ ln(β/(1-β)). We have seen that the probability of incorrect performance of the WTA subnet is equal to \n\nP(Y ≤ M) ≈ Σ_x (1-η) e^{-c(x_0-x)} [γ^{b^{x_0-x-1}} - γ^{b^{x_0-x}}]   (19) \n\nNow we transform variables t = b^{x_0-x} ln(1/γ) to get the integral in the form of the convergent difference between two divergent Gamma function integrals. We perform integration by parts to obtain a representation as an integral with t^{-c/ln b} instead of t^{-(1 + c/ln b)} in the integrand. For 0 ≤ c/ln b < 1, the corresponding integral converges. The final result is then \n\nP(Y ≤ M) ≈ (1-η) [(1 - e^{-c})/c] Γ(1 - c/ln b) (ln(1/γ))^{c/ln b}   (21) \n\nHence, since c/ln b = δ, 1-η = ε/(1+δ) and ln(1/γ) ≈ 1-γ = δε/(1+δ), we have \n\nP(Y ≤ M) ≈ Γ(1-δ) [(1 - e^{-δ ln(β/(1-β))}) / (δ ln(β/(1-β)))] [(εδ)^δ / (1+δ)^{1+δ}] ε   (22) \n\nas claimed. \n\nthreshold, m | predicted THN | predicted HN | experimental THN | experimental HN \n133, 145 | 2.46 (1-γ = 1.03, 1-η = 1.46) | 0.144 | 2.552 (1-γ = 1.0, 1-η = 1.552) | 0.103 \n134, 346 | 3.4 (1-γ = 1.37, 1-η = 2.11) | 0.272 | 3.468 (1-γ = 1.373, 1-η = 2.168) | 0.253 \n135, 825 | 4.714 (1-γ = 1.776, 1-η = 2.991) | 0.494 | 4.152 (1-γ = 1.606, 1-η = 2.576) | 0.485 \n136, 1970 | 6.346 (1-γ = 2.274, 1-η = 4.167) | 0.857 | 6.447 (1-γ = 2.335, 1-η = 4.162) | 0.863 \n\nTable 1: The performance of a HN and optimal THN: a comparison between calculated and experimental results (% error; a = 0.7, n = 210).
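 The closed form (16) is easy to evaluate. Below is a minimal numerical sketch, not the authors' code; the function name is an assumption, and the parameter values are taken from the first row of Table 1 (threshold 133, predicted THN error 2.46%, 1-γ = 1.03%, 1-η = 1.46%): \n\n```python
# Sketch: evaluating the factor K(eps, delta, beta) of eq. (16), which maps the
# THN error probability eps onto the HN error probability K * eps.
# Parameter values are those of the first row of Table 1 (a = 0.7, n = 210).
import math

def K_factor(eps, delta, beta):
    c = delta * math.log(beta / (1 - beta))
    return (math.gamma(1 - delta)
            * (1 - math.exp(-c)) / c
            * (eps * delta) ** delta
            / (1 + delta) ** (1 + delta))

eps = 0.0246              # predicted THN error
delta = 0.0103 / 0.0146   # (1 - gamma)/(1 - eta), eq. (14)
beta = 133 / 210          # threshold / n
print(K_factor(eps, delta, beta) * eps)   # about 0.0014, i.e. the 0.144% predicted HN error
```
\nThe agreement with the 0.144% entry of Table 1 illustrates how much smaller the HN error is than the corresponding THN error at the same m and n. \n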
 Expression (22) is presented as K(ε, δ, β)·ε, where K(ε, δ, β) is the factor (≤ 1) by which the probability of error ε of the THN should be multiplied in order to get the probability of error of the original HN with the WTA subnet. For small δ, K is close to 1; however, as will be seen in the next section, K is typically considerably smaller than 1. \n\n4 Numerical results \n\nThe experimental results presented in Table 1 testify to the accuracy of the HN and THN calculations. Figure 1 presents the calculated error probabilities for various values of input similarity a and memory capacity m, as a function of the input size n. As is evident, the performance of the THN is worse than that of the HN, but due to the exponential growth of m, it requires only a minor increment in n to obtain a THN that performs as well as the original HN. \n\nTo examine the sensitivity of the THN to threshold variation, we fixed a = 0.7, n = 210, m = 825, and let the threshold vary between 132 and 138. As we can see in Figure 2, the threshold 135 is indeed optimal, but the performance with threshold values of 134 and 136 is practically identical. The magnitude of the two error types varies considerably with the threshold value, but this variation has no effect on the overall performance near the optimum. These two error probabilities might as well be taken equal to each other. \n\n5 Conclusion \n\nIn this paper we analyzed in detail the performance of a Hamming Network and a Threshold Hamming Network. Given a desired storage capacity and performance, we described how to compute the corresponding minimal network size required. The THN drastically reduces the time and connectivity requirements of Hamming Network classifiers.
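 \n\nThe single-iteration operation of the THN is easy to simulate. The following is a minimal illustrative sketch, not the authors' experimental code: random ±1 memories, an input obtained by keeping each bit of the 'correct' memory with probability a, and a memory neuron that fires iff its similarity reaches x_n (parameters as in Figure 2): \n\n```python
# Sketch of a single-iteration THN run (illustrative, assumption-laden):
# classification succeeds iff exactly one neuron fires, and it is the correct one.
import random

def run_thn(n=210, m=825, a=0.7, x_n=135, seed=0):
    rng = random.Random(seed)
    # m 'wrong' memories plus the 'correct' one, each a random +/-1 vector
    memories = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m + 1)]
    correct = memories[m]                       # w.l.o.g. the stored pattern
    # each input bit agrees with the correct memory with probability a
    x = [b if rng.random() < a else -b for b in correct]
    # similarity = number of equal bits; neuron j fires iff similarity >= x_n
    winners = [j for j, mem in enumerate(memories)
               if sum(bi == xi for bi, xi in zip(mem, x)) >= x_n]
    return winners == [m]                       # a single winner, the correct one

trials = 20
accuracy = sum(run_thn(seed=s) for s in range(trials)) / trials
print(accuracy)   # the error probability is predicted to be under 5% at these settings
```
\nErrors here are exactly those counted in the analysis: no winner, a wrong winner, or multiple winners. \n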
 \n\nFigure 1: Probability of error as a function of network size: three networks are depicted (α = 0.6, m = 10^3; α = 0.7, m = 10^6; α = 0.8, m = 10^9), displaying the performance of the THN and the HN at various values of α and m. For graphical convenience, the error probability ε is plotted on a logarithmic scale versus the network size n. \n\nFigure 2: Threshold sensitivity of the THN (a = 0.7, n = 210, m = 825): the % error curves ε, 1-γ and 1-η as the threshold varies from 132 to 138. \n\nReferences \n\n[1] K. Steinbuch. Die Lernmatrix. Kybernetik, 1:36-45, 1961. \n\n[2] W.K. Taylor. Cortico-thalamic organization and memory. Proc. of the Royal Society of London B, 159:466-478, 1964. \n\n[3] R.P. Lippmann, B. Gold, and M.L. Malpass. A comparison of Hamming and Hopfield neural nets for pattern classification. Technical Report TR-769, MIT Lincoln Laboratory, 1987. \n\n[4] E.B. Baum, J. Moody, and F. Wilczek. Internal representations for associative memory. Biological Cybernetics, 59:217-228, 1988. \n\n[5] P. Floreen. The convergence of Hamming memory networks. IEEE Trans. on Neural Networks, 2(4):449-457, 1991. \n\n[6] E. Domany and H. Orland. A maximum overlap neural network for pattern recognition. Physics Letters A, 125:32-34, 1987. \n\n[7] M.R. Leadbetter, G. Lindgren, and H. Rootzen. Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, Berlin-Heidelberg-New York, 1983. \n", "award": [], "sourceid": 668, "authors": [{"given_name": "Isaac", "family_name": "Meilijson", "institution": null}, {"given_name": "Eytan", "family_name": "Ruppin", "institution": null}, {"given_name": "Moshe", "family_name": "Sipper", "institution": null}]}