{"title": "A Second-Order Translation, Rotation and Scale Invariant Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 313, "page_last": 319, "abstract": null, "full_text": "A Second-Order Translation, Rotation and \n\nScale Invariant Neural Network \n\nShelly D.D. Goggin \n\nKristina M. Johnson \n\nKarl E. Gustafson\u00b7 \n\nOptoelectronic Computing Systems Center and \n\nDepartment of Electrical and Computer Engineering \n\nUniversity of Colorado at Boulder \n\nBoulder, CO 80309 \n\nshellg@boulder.colorado.edu \n\nABSTRACT \n\nA second-order architecture is presented here for translation, rotation and \nscale invariant processing of 2-D images mapped to n input units. This \nnew architecture has a complexity of O( n) weights as opposed to the O( n 3 ) \nweights usually required for a third-order, rotation invariant architecture. \nThe reduction in complexity is due to the use of discrete frequency infor(cid:173)\nmation. Simulations show favorable comparisons to other neural network \narchitectures. \n\n1 \n\nINTRODUCTION \n\nMultiplicative interactions in neural networks have been proposed (Pitts and Mc(cid:173)\nCulloch, 1947; Giles and Maxwell, 1987; McClelland et aI, 1988) both to explain bi(cid:173)\nological neural functions and to provide invariances in pattern recognition. Higher(cid:173)\norder neural networks are useful for invariant pattern recognition problems, but \ntheir complexity prohibits their use in mal1Y large image processing applications. \nThe complexity of the third-order rotation invariant neural network of Reid et aI, \n1990 is 0(n 3 ), which will clearly not scale. For example, when 11 is on the order \nof 106 , as in high definition television (HDTV), 0(10 18) weights would be required \nin a third-order neural network. Clearly, image processing applications are best \napproached with neural networks of lower complexity. 
We present a translation, rotation and scale invariant architecture, which has a weight complexity of O(n) and requires only multiplicative and additive operations in the activation function.

*Department of Mathematics

2 HIGHER-ORDER NEURAL NETWORKS

Higher-order neural networks (HONN) have multiplicative terms in their activation function, such that the output of a unit, O_k, has the form

    O_k = f[ \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \cdots \sum_{l=0}^{n-1} w_{ij...lk} x_i x_j \cdots x_l ]        (1)

where f is a thresholding function, w_{ij...lk} is the weight for each term, and x_i is one of n input values. Some of the x_i could be bias units to give lower-order terms. The number of multiplications is O(n^m) for an mth-order network, but the number of weights can be lower. Since the multiplications of the data can be done in a preprocessing stage, the major factor in the computational burden is the number of weights. The emphasis on the complexity of the weights is especially relevant for optical implementations of higher-order networks (Psaltis et al., 1988; Zhang et al., 1990), since the multiplications can usually be performed in parallel.

Invariances can be achieved with higher-order neural networks by using the spatial frequencies of the input as a priori information. Wechsler and Zimmerman, 1988, compute the Fourier transform of the data in polar coordinates and use these data as inputs to a neural network to achieve rotation, scale and translation invariance. The disadvantage of this approach is that the Fourier transform and the computation of polar coordinates require more complex operations than addition and multiplication of inputs. It has been shown that second-order networks can be constructed to provide either translation and scale invariance or rotation and scale invariance (Giles et al., 1988).
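For concreteness, a second-order (m = 2) instance of equation (1) can be sketched as follows. This is a minimal illustration; the weight matrix, threshold function and names here are ours, not taken from the paper:

```python
import numpy as np

def second_order_unit(x, W, f=lambda s: float(s > 0)):
    """Output of one second-order unit: O = f(sum_{i,j} W[i,j] * x[i] * x[j]).

    Each unit carries O(n^2) weights; the n^2 pairwise products x_i * x_j
    can be formed once, in a preprocessing stage, and shared by all units.
    """
    s = x @ W @ x          # sum over all pairwise products x_i * x_j
    return f(s)

x = np.array([1.0, 0.0, 1.0])    # n = 3 input values
W = np.ones((3, 3))              # hypothetical weights for this one unit
print(second_order_unit(x, W))   # four pairwise products are nonzero -> f(4.0) = 1.0
```

With binary inputs, the pairwise products are themselves binary, which is what later lets structure be built into the weights.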
However, their approach does not consider the difficulties in defining scale and rotation for images made up of pixels. Our architecture directly addresses the problem of rotation, translation and scale invariance in pattern recognition for 2-D arrays of binary pixels. Restrictions permit structure to be built into the weights, which reduces their complexity.

3 WEDGE-RING HONN

We present a new architecture for a second-order neural network based on the concept of the wedge-ring detector (Casasent, 1985). When a wedge-ring detector is used in the Fourier plane of an optical processor, a set of features is obtained that is invariant to scale, rotation and translation. As shown in figure 1, the lens performs a spatial Fourier transform on an image, which yields an intensity pattern that is invariant to translations in the image plane. The ring detectors sum the amplitudes of the spatial frequencies with the same radial distance from the zero frequency, to give features that are invariant to rotation and shift changes. The wedge detectors sum the amplitudes of frequencies within a range of angles with respect to the zero frequency to produce features that are invariant to scale and shift changes, assuming the images retain the same zero-frequency power as they are scaled.

Figure 1: A Wedge-Ring Detector Optical Processor
(diagram: laser -> image -> Fourier-transform lens -> wedge-ring detector -> computer)

In a multi-pixel, binary image, a second-order neural network can perform the same function as the wedge-ring detector without the need for a Fourier transform.
For an image of dimensions √n × √n, let us define the pixel spatial frequency f_{k,l} as

    f_{k,l} = \sum_{i=0}^{\sqrt{n}-1-|k|} \sum_{j=0}^{\sqrt{n}-1-|l|} x_{i,j} x_{i+|k|, j+|l|},    -(\sqrt{n}-1) \le k, l \le \sqrt{n}-1,        (2)

where x_{i,j} is a binary-valued pixel at location (i, j). Note that the pixel frequencies have symmetry: f_{k,l} = f_{-k,-l}. The frequency terms can be arranged in a grid in a manner analogous to the Fourier transform image in the optical wedge-ring detector. (See figure 2.)

Figure 2: A Simple Input Image and its Associated Pixel Spatial Frequencies, Pixel Ring Terms and Pixel Wedge Terms
(The figure shows a 3 × 3 input image of units x_{0,0} through x_{2,2}, the grid of its pixel spatial frequencies f_{k,l}, the ring terms r_0 through r_4, and the wedge terms v_27 through v_180.)

For all integers p, 0 ≤ p ≤ 2(√n − 1), the ring pixel terms r_p are given by

    r_p = 2 \sum_{|k|+|l|=p} f_{k,l},    0 \le k \le \sqrt{n}-1;  0 \le l \le \sqrt{n}-1 if k = 0;  -(\sqrt{n}-1) \le l \le \sqrt{n}-1 if k > 0,        (3)

as shown in figure 2. This definition of the ring pixel terms works well for images with a small number of pixels. Larger pixel arrays can use the following definition. For 0 ≤ p ≤ 2(√n − 1)^2,

    r_p = 2 \sum_{k^2+l^2=p} f_{k,l},    0 \le k \le \sqrt{n}-1;  0 \le l \le \sqrt{n}-1 if k = 0;  -(\sqrt{n}-1) \le l \le \sqrt{n}-1 if k > 0.        (4)

Note that p will not take on all values less than 2n. The number of ring pixel terms generated by equation 4 is less than or equal to ⌈n/2⌉ + ⌊√n/2⌋. The number of ring pixel terms can be reduced by making the rings a fixed width, Δr.
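A minimal sketch of equation (2), using array slicing for the overlap products (the helper name is ours):

```python
import numpy as np

def pixel_frequencies(img):
    """f[k,l] per equation (2): products of the binary image with itself at
    offset (|k|, |l|); the symmetry f[k,l] = f[-k,-l] holds by construction."""
    m = img.shape[0]                      # image is sqrt(n) x sqrt(n)
    f = {}
    for k in range(-(m - 1), m):
        for l in range(-(m - 1), m):
            a, b = abs(k), abs(l)
            f[(k, l)] = float((img[:m - a, :m - b] * img[a:, b:]).sum())
    return f

img = np.array([[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]], dtype=float)
f = pixel_frequencies(img)
print(f[(0, 0)], f[(1, 1)], f[(-1, -1)])   # 3.0 2.0 2.0
```

Only additions and multiplications of inputs are needed, as claimed, and each f_{k,l} is a sum of second-order terms x_i x_j.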
Then, for all integers p, 0 ≤ p ≤ ⌈√2 (√n − 1)/Δr⌉,

    r_p = 2 \sum_{(p-1)\Delta r < \sqrt{k^2+l^2} \le p\Delta r} f_{k,l},        (5)

with the same restrictions on k and l as in equation 3. As the image size increases, the ring pixel terms will approximate continuous rings.

For 0 < θ ≤ 180°, the wedge pixel terms v_θ are

    v_\theta = 2 \sum_{\tan^{-1}(k/l)=\theta} f_{k,l},    -(\sqrt{n}-1) \le k \le 0;  -(\sqrt{n}-1) \le l \le -1 if k = 0;  -(\sqrt{n}-1) \le l \le \sqrt{n}-1 if k < 0,        (6)

as shown in figure 2. The number of wedge pixel terms is less than or equal to 2n − 2√n + 1. The number of wedge pixel terms can be reduced by using a fixed wedge width, Δv. Then, for all integers q, 1 ≤ q ≤ ⌈180°/Δv⌉,

    v_q = 2 \sum_{(q-1)\Delta v < \tan^{-1}(k/l) \le q\Delta v} f_{k,l},        (7)

with the same restrictions on k and l as in equation 6. For small pixel arrays, the pixel frequencies are not evenly distributed between the wedges.

All of the operations from the second-order terms to the pixel frequencies, and from the pixel frequencies to the ring and wedge pixel terms, are linear. Therefore, the values of the wedge-ring features can be obtained by directly summing the second-order terms, without explicitly determining the individual spatial frequencies:

    r_p = 2 \sum_{|k|+|l|=p} \sum_{i=0}^{\sqrt{n}-1-|k|} \sum_{j=0}^{\sqrt{n}-1-|l|} x_{i,j} x_{i+|k|, j+|l|},        (8)

    v_\theta = 2 \sum_{\tan^{-1}(k/l)=\theta} \sum_{i=0}^{\sqrt{n}-1-|k|} \sum_{j=0}^{\sqrt{n}-1-|l|} x_{i,j} x_{i+|k|, j+|l|},

with the same restrictions on k and l as in equations 3 and 6.
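The linearity claim can be checked in a small sketch (names ours): ring features computed by first forming the pixel frequencies, as in equations (2) and (3), match those obtained by summing the second-order pixel products directly, as in equation (8):

```python
import numpy as np

def pixel_freq(img, k, l):
    """f[k,l] per equation (2)."""
    m, a, b = img.shape[0], abs(k), abs(l)
    return float((img[:m - a, :m - b] * img[a:, b:]).sum())

def ring_term(img, p):
    """r_p per equation (3): sum precomputed pixel frequencies over |k|+|l| = p."""
    m = img.shape[0]
    pairs = [(k, l) for k in range(m) for l in range(-(m - 1), m)
             if abs(k) + abs(l) == p and not (k == 0 and l < 0)]
    return 2.0 * sum(pixel_freq(img, k, l) for k, l in pairs)

def ring_term_direct(img, p):
    """r_p per equation (8): one pass over the second-order products,
    never forming the pixel frequencies explicitly."""
    m = img.shape[0]
    total = 0.0
    for k in range(m):
        for l in range(-(m - 1), m):
            if abs(k) + abs(l) != p or (k == 0 and l < 0):
                continue                      # half-plane restriction of equation (3)
            for i in range(m - abs(k)):
                for j in range(m - abs(l)):
                    total += img[i, j] * img[i + abs(k), j + abs(l)]
    return 2.0 * total
```

In network terms, equation (8) says each ring or wedge output unit is a second-order unit of the form of equation (1) whose weights are fixed at 0 or 2, which is what brings the weight complexity down to O(n).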