{"title": "Learning to See Rotation and Dilation with a Hebb Rule", "book": "Advances in Neural Information Processing Systems", "page_first": 320, "page_last": 326, "abstract": null, "full_text": "Learning to See Rotation and Dilation with a Hebb Rule \n\nMartin I. Sereno and Margaret E. Sereno \n\nCognitive Science D-015 \nUniversity of California, San Diego \nLa Jolla, CA 92093-0115 \n\nAbstract \n\nPrevious work (M.I. Sereno, 1989; cf. M.E. Sereno, 1987) showed that a \nfeedforward network with area V1-like input-layer units and a Hebb rule \ncan develop area MT-like second-layer units that solve the aperture \nproblem for pattern motion. The present study extends this earlier work \nto more complex motions. Saito et al. (1986) showed that neurons with \nlarge receptive fields in macaque visual area MST are sensitive to \ndifferent senses of rotation and dilation, irrespective of the receptive field \nlocation of the movement singularity. A network with an MT-like \nsecond layer was trained and tested on combinations of rotating, dilating, \nand translating patterns. Third-layer units learn to detect specific senses \nof rotation or dilation in a position-independent fashion, despite having \nposition-dependent direction selectivity within their receptive fields. \n\n1 INTRODUCTION \nThe visual systems of mammals and especially primates are capable of prodigious feats of \nmovement, object, and scene recognition under noisy conditions--feats we would like to \ncopy with artificial networks. We are just beginning to understand how biological \nnetworks are wired up during development and during learning in the adult. Even at this \nstage, however, it is clear that explicit error signals and the apparatus for propagating them \nbackwards across layers are probably not involved. On the other hand, 
there is a growing \nbody of evidence for connections whose strength can be modified (via NMDA channels) \nas a function of the correlation between pre- and post-synaptic activity. The aim of the present project \nwas to learn to detect pattern rotation and dilation by example, using a simple Hebb \nrule. By building up complex filters in stages using a simple, realistic learning rule, we \nreduce the complexity of what must be learned with more explicit supervision at higher \nlevels. \n\n1.1 ORIENTATION SELECTIVITY \nSome of the connections responsible for the selectivity of cortical neurons to local stimulus \nfeatures develop in the absence of patterned visual experience. For example, primary \nvisual cortex (V1 or area 17) contains orientation-selective neurons at birth in several \nanimals. Linsker (1986a,b) has shown that feedforward networks with gaussian \ntopographic interlayer connections, linear summation, and simple Hebb rules develop \norientation-selective units in higher layers when trained on noise. In his linear system, \nweight updates for a layer can be written as a function of the two-point correlation \ncharacterizing the previous layer. Noise applied to the input layer causes the emergence of \nconnections that generate gaussian correlations at the second layer. This in turn drives the \ndevelopment of more complex correlation functions in the third layer (e.g., \ndifference-of-gaussians). Rotational symmetry is broken in higher layers with the emergence of \nGabor-function-like connection patterns reminiscent of simple cells in the cortex. \n\n1.2 PATTERN MOTION SELECTIVITY \nThe ability to see coherent motion fields develops late in primates. Human babies, for \nexample, fail to see the transition from unstructured to structured motion--e.g., the \ntransition between randomly moving dots and circular 2-D motion--for several months. 
\nThe transition from horizontally moving dots with random y-axis velocities to dots with \nsinusoidal y-axis velocities (which gives the percept of a rotating 3-D cylinder) is seen even \nlater (Spitz, Stiles-Davis, & Siegel, 1988). This suggests that the cortex requires many \nexperiences of moving displays in order to learn how to recognize the various types of \ncoherent texture motions. \nHowever, orientation gradients, shape from shading, and pattern translation, dilation, and \nrotation cannot be detected with the kinds of filters that can be generated solely by noise. \nThe correlations present in visual scenes are required in order for these higher-level filters \nto arise. \n\n1.3 NEUROPHYSIOLOGICAL MOTIVATION \nMoving stimuli are processed in successive stages in primate visual cortical areas. The first \ncortical stage is layer 4C\u03b1 of V1, which receives its main ascending input from the \nmagnocellular layers of the lateral geniculate nucleus. Layer 4C\u03b1 projects to layer 4B, \nwhich contains many tightly tuned direction-selective neurons. These neurons, however, \nrespond to moving contours as if these contours were moving perpendicular to their local \norientation (Movshon et al., 1985). \nLayer 4B neurons project directly and indirectly to area MT, where a subset of neurons \nshows a relatively narrow peak in the direction tuning curve for a plaid that is lined up with \nthe peak for a single grating. These neurons therefore solve the aperture problem for \npattern translation presented to them by the local motion detectors in layer 4B of V1. MT \nneurons, however, appear to be largely blind to the sense of pattern rotation or dilation \n(Saito et al., 1986). Thus, there is a higher-order 'aperture problem' that is solved by the \nneurons in the parts of areas MST and 7a that distinguish senses of pattern rotation and \ndilation. 
The present model provides a rationale for how these stages might naturally arise \nin development. \n\n2 RESULTS \nIn previous work (M.I. Sereno, 1989; cf. M.E. Sereno, 1987) a simple 2-layer feedforward \narchitecture sufficed for an MT-like solution to the aperture problem for local translational \nmotion. Units in the first layer were granted tuning curves like those in V1, layer 4B. Each \nfirst-layer unit responded to a particular range of directions and speeds of the component \nof movement perpendicular to a local contour. Second-layer units developed MT-like \nreceptive fields that solved the aperture problem for local pattern translation when trained \non locally jiggled gratings rigidly moving in randomly chosen pattern directions. \n\n2.1 NETWORK ARCHITECTURE \nA similar architecture was used for second-to-third layer connections (see Fig. 1--a sample \nnetwork with 5 directions and 3 speeds). As with Linsker, a new input layer was \nconstructed from a canonical unit, suitably transformed. Thus, second-layer units were \ngranted tuning curves resembling those found in MT (as well as those generated by \nfirst-to-second layer learning)--that is, they responded to the local pattern translation but were \nblind to particular senses of local rotation, dilation, and shear. \n\nFigure 1: Network Architecture (first layer = V1, layer 4B; second layer = MT; third layer = MST; inset: probability of connection) \n\nThere were 12 different local pattern directions and 4 different local pattern speeds at each x-y location \n(48 different units at each of 100 x-y points). Second-layer excitatory tuning curves were piecewise linear \nwith half-height overlap for both direction and speed. Direction tuning was set to be 2-3 \ntimes as important as speed tuning in determining the activation of input units. 
Input units \ngenerated untuned feedforward inhibition for off-directions and off-speeds. Total \ninhibition was adjusted to balance total excitation. The probability that a unit in the first \nlayer connected to a unit in the second layer fell off as a gaussian centered on the \nretinotopically equivalent point in the second layer. Since receptive fields in areas MST \nand 7a are large, the interlayer divergence was increased relative to the divergence in the \nfirst-to-second layer connections. Third-layer units received several thousand connections. \nThe network is similar to that of Linsker except that there is no activity-independent decay \n($k_1$) for synaptic weights and no offset ($k_2$) for the correlation term. The activation, $out_j$, \nfor each unit is a linear weighted sum of its inputs, $in_i$, scaled by $a$ and clipped to maximum \nand minimum values: \n\n$$out_j = \\begin{cases} a \\sum_i in_i \\, weight_{ij} & \\text{if within the clipping range} \\\\ out_{max,min} & \\text{otherwise} \\end{cases}$$ \n\nWeights are also clipped to maximum and minimum values. The change in each weight, \n$\\Delta weight_{ij}$, is a simple fraction, $\\delta$, of the product of the pre- and post-synaptic values: \n\n$$\\Delta weight_{ij} = \\delta \\, in_i \\, out_j$$ \n\nThe learning rate, $\\delta$, was set so that about 1,000 patterns could be presented before most \nweights saturated. The stable second-layer weight patterns seen by Linsker (1986a) are \nreproduced by this model when it is trained on noise input. However, since it lacks $k_2$, it \ncannot generate center-surround weight structures given only gaussian correlations as \ninput. \n\n2.2 TRAINING PATTERNS \nSecond-to-third layer connections were trained with full or partial field rotations, dilations, \nand translations. Each stimulus consisted of a set of local pattern motions at each x-y point \nthat were: 1) rotating clockwise or counterclockwise around, 2) dilating or contracting \ntoward, or 3) translating through a randomly chosen location. 
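The three stimulus classes just described can be sketched as velocity fields on the input grid. This is a minimal illustration and not the authors' simulator code: the function name flow_field, the 10 x 10 grid (assumed from the 100 x-y points mentioned above), the rate parameter, and the sign convention (x to the right, y upward, positive rate meaning counterclockwise rotation or outward dilation) are all assumptions made for the example.

```python
import math

def flow_field(kind, center=(5, 5), size=10, rate=1.0, direction=0.0):
    # Return a dict mapping each (x, y) grid point to a local velocity (vx, vy).
    # kind      : 'rotation', 'dilation' or 'translation'
    # center    : singularity of the rotation or dilation (unused for translation)
    # rate      : angular rate (rotation), expansion rate (dilation; negative
    #             values give contraction) or speed (translation)
    # direction : heading in radians, used only for translation
    cx, cy = center
    field = {}
    for x in range(size):
        for y in range(size):
            dx, dy = x - cx, y - cy
            if kind == 'rotation':
                # velocity is perpendicular to the radius from the singularity
                v = (-rate * dy, rate * dx)
            elif kind == 'dilation':
                # velocity points along the radius from the singularity
                v = (rate * dx, rate * dy)
            elif kind == 'translation':
                # same velocity everywhere
                v = (rate * math.cos(direction), rate * math.sin(direction))
            else:
                raise ValueError('unknown kind: ' + kind)
            field[(x, y)] = v
    return field

# Hypothetical example: a clockwise rotation about the point (3, 7).
stimulus = flow_field('rotation', center=(3, 7), rate=-1.0)
```

In the model itself these local velocities would still have to be converted into activations of the direction- and speed-tuned second-layer units; that step is omitted here.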
The singularity was always \nwithin the input array. Both full- and partial-field rotations and dilations were effective \ntraining stimuli for generating rotation and dilation selectivity. \n\n2.3 POSITION-INDEPENDENT TUNING CURVES \nPost-training rotation and dilation tuning curves for different receptive-field locations were \ngenerated for many third-layer units using paradigms similar to those used on real neurons. \nThe location of the motion singularity of the test stimulus was varied across layer two. \nThird-layer units often responded selectively to a particular sense of rotation or dilation at \neach visual field test location. A sizeable fraction of units (10-60%) responded in a \nposition-independent way after unsupervised learning on rotating and dilating fields. \nSimilar responses were found using both partial- and full-field test stimuli. \nThese units thus resemble the neurons in primate visual area MSTd (10-40% of the total \nthere) recorded by Saito et al. (1986), Duffy and Wurtz (1990), and Andersen et al. (1990) \nthat showed position-independent responses to rotations and dilations. Other third-layer \nunits had position-dependent tuning--that is, they changed their selectivity for stimuli \ncentered at different visual field locations, as, in fact, do a majority of actual MSTd \nneurons. \n\n2.4 POSITION-DEPENDENT WEIGHT STRUCTURES \nGiven the position-independence of the selective response to rotations and/or dilations in \nsome of the third-layer units, it was surprising to find that most such units had weight \nstructures indicating that local direction sensitivity varied systematically across a unit's \nreceptive field. Regions of maximum weights in direction-speed subspace tended to vary \nsmoothly across x-y space such that opposite ends of the receptive field were sensitive to \nopposite directions. 
This picture held for full- and medium-sized partial-field training \nexamples, breaking down only when the rotating and dilating training patterns were \nsubstantially smaller than the receptive fields of third-layer units. In the last case, smooth \nchanges in direction selectivity across space were interrupted at intervals by discontinuities. \nAn essentially position-independent tuning curve is achieved because any off-center \nclockwise rotation that has its center within the receptive field of a unit selective for \nclockwise rotation will activate a much larger number of input units connected with large \npositive weights than will any off-center counterclockwise rotation (see Fig. 2). \n\nFigure 2: Position-dependent weights and position-independent responses (panels: local direction selectivity of a trained unit sensitive to clockwise rotation; test for position invariance by rotating an off-center stimulus in the receptive field; most local directions match with a clockwise stimulus; most local directions clash with the opposite rotation) \n\nSaito et al. (1986), Duffy & Wurtz (1990), and Andersen et al. (1990) have all suggested \nthat true translationally invariant detection of rotation and dilation sense must involve \nseveral hierarchical processing stages and a complex connection pattern. The present \nresults show that position-independent responses are exhibited by units with \nposition-dependent local direction selectivity, as originally shown with small stimuli in area 7a \nby Motter and Mountcastle (1981). \n\n2.5 WHY WEIGHTS ARE PERIODIC IN DIRECTION-SPEED SUBSPACE \nFor all training sets, the receptive fields of all units contained regions of all-max weights \nand all-min weights within the direction-speed subspace at each x-y point. For comparison, 
\nif the model is trained on uncorrelated direction noise (a different random local direction at \neach x-y point), third-layer input weight structures still exhibit regions of all-max and \nall-min weights in the direction-speed subspace at each x-y point in the second layer. In \ncontrast to weight structures generated by rigid motion, however, the locations of these \nregions for a unit are not correlated across x-y space. These regions emerge at each x-y \nlocation because the overlap in the input unit tuning curves generates local two-point \ncorrelations in direction-speed subspace that are amplified by a Hebb rule (Linsker, 1986a). \nThis mechanism prevents more complex weight structures (like those envisaged by the \nneurophysiologists and those generated by backpropagation) from emerging. The \ntwo-point correlations across x-y space generated by jiggled gratings, or by the rotation and \ndilation training sets, serve to align the all-max or all-min regions in the case of translation \nsensitivity, or generate smooth gradients in the case of sensitivity to rotation and dilation. \n\n2.6 WHY MT DOES NOT LEARN TO DETECT ROTATION AND DILATION \nSaito et al. (1986) demonstrated that MT neurons are not selective for particular senses of \npattern rotation and dilation, but only for particular pattern translations (MT neurons will \nof course respond to a part of a large rotation or dilation that locally approximates the unit's \ntranslational directional tuning). MT neurons in the present model do not develop this \nselectivity even when trained on rotating and dilating stimuli because of the smaller \ndivergence in the first layer (V1) to second layer (MT) connection. The local views of \nrotations and dilations seen by MT are apparently noise-like enough that any second-order \nselectivity is averaged out. A larger (unrealistic) divergence allows a few units to solve the \naperture problem and detect rotation and dilation in one step. 
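The role of divergence can be illustrated with a small sketch. It assumes a gaussian connection probability with peak value 1 and width sigma (a stand-in for interlayer divergence), and applies the clipped linear activation and Hebb update of Section 2.1 to whichever connections exist. All names (connection_probability, expected_fan_in, sample_inputs, hebb_step) and all numerical values are hypothetical and are not taken from the original simulations.

```python
import math
import random

def connection_probability(dx, dy, sigma):
    # Gaussian fall-off around the retinotopically equivalent point.
    # A peak probability of 1.0 at zero offset is an assumption.
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def expected_fan_in(center, sigma, size=10):
    # Expected number of lower-layer x-y points feeding one upper-layer unit.
    cx, cy = center
    return sum(connection_probability(x - cx, y - cy, sigma)
               for x in range(size) for y in range(size))

def sample_inputs(center, sigma, size=10, seed=0):
    # Draw one random set of connected x-y points for a single unit.
    rng = random.Random(seed)
    cx, cy = center
    return [(x, y) for x in range(size) for y in range(size)
            if rng.random() < connection_probability(x - cx, y - cy, sigma)]

def clip(value, low, high):
    return max(low, min(high, value))

def hebb_step(weights, inputs, a=1.0, rate=0.01,
              out_range=(-1.0, 1.0), w_range=(-1.0, 1.0)):
    # weights : dict from connected input id to weight, updated in place
    # inputs  : dict from input id to activity; unconnected inputs are ignored
    # Activation is the scaled weighted sum, clipped to out_range.
    out = clip(a * sum(inputs[k] * w for k, w in weights.items()), *out_range)
    # Each weight changes by rate * pre * post and is clipped to w_range.
    for k in weights:
        weights[k] = clip(weights[k] + rate * inputs[k] * out, *w_range)
    return out

# Narrow (MT-like) versus wide (MST-like) divergence on a 10 x 10 grid.
narrow = expected_fan_in((5, 5), sigma=1.0)
wide = expected_fan_in((5, 5), sigma=3.0)
```

With a small sigma a unit samples only a few neighbouring x-y points, so it sees a nearly uniform patch of any rotation or dilation; a larger sigma lets the same update rule act on inputs from opposite sides of the singularity.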
\nTraining sets that contain many pure-translation stimuli along with the rotating and dilating \nstimuli fail to bring about the emergence of selectivity to senses of rotation and dilation \n(most units reliably detect only particular translations in this case). Satisfactory \nperformance is achieved only if the translating stimuli are on average smaller than the \nrotating and dilating stimuli. This may point to a regularity in the poorly characterized \nstimulus set that the real visual system experiences and, perhaps in this case, has come to \ndepend on for normal development. \n\n3 DISCUSSION \nThis exercise found a particularly simple solution to our problem that in retrospect should \nhave been obvious from first principles. The present results suggest that this simple \nsolution is also easily learned with a simple Hebb rule. Two points warrant discussion. \nFirst, this model achieves a reasonable degree of translational invariance in the detection of \nseveral simple kinds of pattern motion despite having weight structures that approximate a \nsimple centered template. Such a solution to approximately translationally invariant \npattern detection may be applicable, and more importantly, practically learnable, for other \nmore complex patterns, as long as the local features of interest vary reasonably smoothly \nand the pattern is not presented too far off-center. These constraints may characterize many \nfoveated objects. \n\nSecond, given that the tuning curves for particular stimulus features often change in a \ncontinuous fashion as one moves across the cortex (e.g., orientation tuning, direction \ntuning), there is likely to be a pervasive tendency in the cortex for receptive fields in higher \nareas to be constructed from subunits that receive strong connections from nearby cells in \nthe lower area. \n\nAcknowledgements \nWe thank Udo Wehmeier, Nigel Goddard, and David Zipser for help and discussions. 
\nNetworks and displays were constructed on the Rochester Connectionist Simulator. \n\nReferences \nAndersen, R., M. Graziano, and R. Snowden (1990) Translational invariance and \nattentional modulation of MST cells. Soc. Neurosci., Abstr. 16:7. \n\nDuffy, C.J. and R.H. Wurtz (1990) Organization of optic flow sensitive receptive fields in \ncortical area MST. Soc. Neurosci., Abstr. 16:6. \n\nLinsker, R. (1986a) From basic network principles to neural architecture: emergence of \nspatial-opponent cells. Proc. Nat. Acad. Sci. 83, 7508-7512. \n\nLinsker, R. (1986b) From basic network principles to neural architecture: emergence of \norientation-selective cells. Proc. Nat. Acad. Sci. 83, 8390-8394. \n\nMotter, B.C. and V.B. Mountcastle (1981) The functional properties of the light-sensitive \nneurons of the posterior parietal cortex studied in waking monkeys: foveal sparing and \nopponent vector organization. Jour. Neurosci. 1:3-26. \n\nMovshon, J.A., E.H. Adelson, M.S. Gizzi, and W.T. Newsome (1985) Analysis of moving \nvisual patterns. In C. Chagas, R. Gattass, and C. Gross (eds.), Pattern Recognition \nMechanisms. Springer-Verlag, pp. 117-151. \n\nSaito, H., M. Yukie, K. Tanaka, K. Hikosaka, Y. Fukada and E. Iwai (1986) Integration of \ndirection signals of image motion in the superior temporal sulcus of the macaque \nmonkey. Jour. Neurosci. 6:145-157. \n\nSereno, M.E. (1987) Modeling stages of motion processing in neural networks. \nProceedings of the 9th Annual Cognitive Science Conference, pp. 405-416. \n\nSereno, M.I. (1988) The visual system. In l.W.v. Seelen, U.M. Leinhos, & G. Shaw (eds.), \nOrganization of Neural Networks. VCH, pp. 176-184. \n\nSereno, M.I. (1989) Learning the solution to the aperture problem for pattern motion with \na Hebb rule. In D.S. Touretzky (ed.), Advances in Neural Information Processing \nSystems I. Morgan Kaufmann Publishers, pp. 468-476. \n\nR. V. Spitz, J. Stiles-Davis & R.M. Siegel. 
Infant perception of rotation from rigid \nstructure-from-motion displays. Soc. Neurosci., Abstr. 14, 1244 (1988). \n", "award": [], "sourceid": 366, "authors": [{"given_name": "Martin", "family_name": "Sereno", "institution": null}, {"given_name": "Margaret", "family_name": "Sereno", "institution": null}]}