{"title": "Emergence of Global Structure from Local Associations", "book": "Advances in Neural Information Processing Systems", "page_first": 1101, "page_last": 1108, "abstract": null, "full_text": "Emergence of Global Structure from Local Associations\n\nThea B. Ghiselli-Crippa\nDepartment of Information Science\nUniversity of Pittsburgh\nPittsburgh PA 15260\n\nPaul W. Munro\nDepartment of Information Science\nUniversity of Pittsburgh\nPittsburgh PA 15260\n\nABSTRACT\n\nA variant of the encoder architecture, where units at the input and output layers represent nodes on a graph, is applied to the task of mapping locations to sets of neighboring locations. The degree to which the resulting internal (i.e., hidden unit) representations reflect global properties of the environment depends upon several parameters of the learning procedure. Architectural bottlenecks, noise, and incremental learning of landmarks are shown to be important factors in maintaining topographic relationships at a global scale.\n\n1  INTRODUCTION\n\nThe acquisition of spatial knowledge by exploration of an environment has been the subject of several recent experimental studies, investigating such phenomena as the relationship between distance estimation and priming (e.g., McNamara et al., 1989) and the influence of route information (McNamara et al., 1984). Clayton and Habibi (1991) have gathered data suggesting that temporal contiguity during exploration is an important factor in determining associations between spatially distinct sites. This data supports the notion that spatial associations are built by a temporal process that is active during exploration and by extension supports Hebb's (1949) neurophysiological postulate that temporal associations underlie mechanisms of synaptic learning. 
Local spatial information acquired during the exploration process is continuously integrated into a global representation of the environment (cognitive map), which is typically arrived at by also considering global constraints, such as low dimensionality, not explicitly represented in the local relationships.\n\n2  NETWORK ARCHITECTURE AND TRAINING\n\nThe goal of this network design is to reveal structure among the internal representations that emerges solely from integration of local spatial associations; in other words, to show how a network trained to learn only local spatial associations characteristic of an environment can develop internal representations which capture global spatial properties. A variant of the encoder architecture (Ackley et al., 1985) is used to associate each node on a 2-D graph with the set of its neighboring nodes, as defined by the arcs in the graph. This 2-D neighborhood mapping task is similar to the 1-D task explored by Wiles (1993) using an N-2-N architecture, which can be characterized in terms of a graph environment as a circular chain with broad neighborhoods.\n\nIn the neighborhood mapping experiments described in the following, the graph nodes are visited at random: at each iteration, a training pair (node-neighborhood) is selected at random from the training set. As in the standard encoder task, the input patterns are all orthogonal, so that there is no structure in the input domain that the network could exploit in constructing the internal representations; the only information about the structure of the environment comes from the local associations that the network is shown during training. 
\n\n2.1  N-H-N NETWORKS\n\nThe neighborhood mapping task was first studied using a strictly layered feed-forward N-H-N architecture, where N is the number of input and output units, corresponding to the number of nodes in the environment, and H is the number of units in the single hidden layer. Experiments were done using square grid environments with wrap-around (toroidal) and without wrap-around (bounded) at the edges. The resulting hidden unit representations reflect global properties of the environment to the extent that distances between them correlate with distances between corresponding points on the grid. These two distance measures are plotted against one another in Figure 1 for toroidal and bounded environments.\n\n[Figure 1 panels: 5x5 grid, 4 hidden units, with wrap-around (left) and with no wrap-around (right, R^2 = 0.499); horizontal axis: Grid Distance.]\n\nFigure 1:  Scatterplots of Distances between Hidden Unit Representations vs. Distances between Corresponding Locations in the Grid Environment.\n\n2.2  N-2-H-N NETWORKS\n\nA hidden layer with just two units forces representations into a 2-D space, which matches the dimensionality of the environment. Under this constraint, the image of the environment in the 2-D space may reflect the topological structure of the environment. 
This conjecture leads to a further conjecture that the 2-D representations will also reveal global relationships of the environment. Since the neighborhoods in a 2-D representation are not linearly separable regions, another layer (H-layer) is introduced between the two-unit layer and the output (see Figure 2). Thus, the network has a strictly layered feed-forward N-2-H-N architecture, where the N units at the input and output layers correspond to the N nodes in the environment, two units make up the topographic layer (T-layer), and H is the number of units chosen for the new layer (H is estimated according to the complexity of the graph). Responses for the hidden units (in both the T- and H-layers) are computed using the hyperbolic tangent (which ranges from -1 to +1), while the standard sigmoid (0 to +1) is used for the output units, to promote orthogonality between representations (Munro, 1989). Instead of the squared error, the cross entropy function (Hinton, 1987) is used to avoid problems with low derivatives observed in early versions of the network.\n\nFigure 2:  A 3x3 Environment and the Corresponding Network. When input unit 3 is activated, the network responds by activating the same unit and all its neighbors.\n\n3  RESULTS\n\n3.1  T-UNIT RESPONSES\n\nNeighborhood mapping experiments were done using bounded square grid environments and N-2-H-N networks. 
After training, the topographic unit activities corresponding to each of the N possible inputs are plotted, with connecting lines representing the arcs from the environment. Each axis in Figure 3 represents the activity of one of the T-units. These maps can be readily examined to study the relationship between their global structure and the structure of the environment. The receptive fields of the T-units give an alternative representation of the same data: the response of each T-unit to all N inputs is represented by N circles arranged in the same configuration as the nodes in the grid environment. Circle size is proportional to the absolute value of the unit activity; filled circles indicate negative values, open circles indicate positive values. The receptive field represents the T-unit's sensitivity with respect to the environment.\n\nFigure 3:  Representations at the Topographic Layer. Activity plots and receptive fields for two 3x3 grids (left and middle) and a 4x4 grid (right).\n\nThe two 3x3 cases shown in Figure 3 illustrate alternative solutions that are each locally consistent, but have different global structure. In the first case, it is evident how the first unit is sensitive to changes in the vertical location of the grid nodes, while the second unit is sensitive to their horizontal location. The axes are essentially rotated 45 degrees in the second case. Except for this rotation of the reference axes, both representations captured the global structure of the 3x3 environment. 
\n\n3.2  NOISE IN THE HIDDEN UNITS\n\nWhile networks tended to form maps in the T-layer that reflect the global structure of the environment, in some cases the maps showed correspondences that were less obvious: i.e., the grid lines crossed, even though the network converged. A few techniques have proven valuable for promoting global correspondence between the topographic representations and the environment, including Judd and Munro's (1993) introduction of noise as pressure to separate representations. The noise is implemented as a small probability for reversing the sign of individual H-unit outputs. As reported in a previous study (Ghiselli-Crippa and Munro, 1994), the presence of noise causes the network to develop topographic representations which are more separated, and therefore more robust, so that the correct output units can be activated even if one or more of the H-units provides an incorrect output. From another point of view, the noise can be seen as causing the network to behave as if it had an effective number of hidden units which is smaller than the given number H. The introduction of noise as a means to promote robust topographic representations can be appreciated by examining Figure 4, which illustrates the representations of a 5x5 grid developed by a 25-2-20-25 network trained without noise (left) and with noise (middle) (the network was initialized with the same set of small random weights in all cases). Note that the representations developed by the network subject to noise are more separated and exhibit the same global structure as the environment. 
To avoid convergence problems observed with the use of noise throughout the whole training process, the noise can be introduced at the beginning of training and then gradually reduced over time.\n\nA similar technique involves the use of low-level noise injected in the T-layer to directly promote the formation of well-separated representations. Either Gaussian or uniform noise directly added to the T-unit outputs gives comparable results. The use of noise in either hidden layer has a beneficial influence on the formation of globally consistent representations. However, since the noise in the H-units exerts only an indirect influence on the T-unit representations, the choice of its actual value seems to be less crucial than in the case where the noise is directly applied at the T-layer.\n\nThe drawback of using noise is an increase in the number of iterations required for the network to converge, which scales up with the magnitude and duration of the noise.\n\nFigure 4:  Representations at the Topographic Layer. Training with no noise (left) and with noise in the hidden units (middle); training using landmarks (right).\n\n3.3  LANDMARK LEARNING\n\nAnother effective method involves the organization of training in two separate phases, to model the acquisition of landmark information followed by the development of route and/or survey knowledge (Hart and Moore, 1973; Siegel and White, 1975). This method is implemented by manipulating the training set during learning, using coarse spatial resolution at the outset and introducing interstitial features as learning progresses to the second phase. The first phase involves training the network only on a subset of the possible N patterns (landmarks). Once the landmarks have been learned, the remaining patterns are added to the training set. In the second phase, 
training proceeds as usual with the full set of training patterns; the only restriction is applied to the landmark points, whose topographical representations are not allowed to change (the corresponding weights between input units and T-units are frozen), thus modeling the use of landmarks as stable reference points when learning the details of a new environment. The right pane of Figure 4 illustrates the representations developed for a 5x5 grid using landmark training; the same 25-2-20-25 network mentioned above was trained in two phases, first on a subset of 9 patterns (landmarks) and then on the full set of 25 patterns (the landmarks are indicated as white circles in the activity plot).\n\n3.4  NOISE IN LANDMARK LEARNING\n\nThe techniques described above (noise and landmark learning) can be combined to better promote the emergence of well-structured representation spaces. In particular, noise can be used during the first phase of landmark learning to encourage a robust representation of the landmarks: Figure 5 illustrates the representations obtained for a 5x5 grid using landmark training with two different levels of noise in the H-units during the first phase. The effect of noise is evident when comparing the 4 corner landmarks in the right pane of Figure 4 (landmark learning with no noise) with those in Figure 5. With increasing levels of noise, the T-unit activities corresponding to the 4 corner landmarks approach the asymptotic values of +1 and -1; the activity plots illustrate this effect by showing how the corner landmark representations move toward the corners of T-space, reaching a configuration which provides more resistance to noise. 
During the second phase of training, the landmarks function as reference points for the additional features of the environment and their positioning in the representational space therefore becomes very important. A well-formed, robust representation of the landmarks at the end of the first phase is crucial for the formation of a map in T-space that reflects global structure, and the use of noise can help promote this.\n\nFigure 5:  Representations at the Topographic Layer. Landmark training using noise in phase 1: low noise level (left), high noise level (right).\n\n4  DISCUSSION\n\nLarge scale constraints intrinsic to natural environments, such as low dimensionality, are not necessarily reflected in local neighborhood relations, but they constitute information which is essential to the successful development of useful representations of the environment. In our model, some of the constraints imposed on the network architecture effectively reduce the dimensionality of the representational space. Constraints have been introduced in several ways: bottlenecks, noise, and landmark learning; in all cases, these constraints have had constructive influences on the emergence of globally consistent representation spaces. The approach described presents an alternative to Kohonen's (1982) scheme for capturing topography; here, topographic relations emerge in the representational space, rather than in the weights between directly connected units.\n\nThe experiments described thus far have focused on how global spatial structure can emerge from the integration of local associations and how it is affected by the introduction of global constraints. 
As mentioned in the introduction, one additional factor influencing the process of acquisition of spatial knowledge needs to be considered: temporal contiguity during exploration, that is, how temporal associations of spatially adjacent locations can influence the representation of the environment. For example, a random type of exploration (\"wandering\") can be considered, where the next node to be visited is selected at random from the neighbors of the current node. Preliminary studies indicate that such temporal contiguity during training results in the formation of hidden unit representations with global properties qualitatively similar to those reported here. Alternatively, more directed exploration methods can be studied, with a systematic pattern guiding the choice of the next node to be visited. The main purpose of these studies will be to show how different exploration strategies can affect the formation and the characteristics of cognitive maps of the environment.\n\nHigher order effects of temporal and spatial contiguity can also be considered. However, in order to capture regularities in the training process that span several exploration steps, simple feed-forward networks may no longer be sufficient; partially recurrent networks (Elman, 1990) are a likely candidate for the study of such processes.\n\nAcknowledgements\n\nWe wish to thank Stephen Hirtle, whose expertise in the area of spatial cognition greatly benefited our research. We are also grateful for the insightful comments of Janet Wiles.\n\nReferences\n\nD. H. Ackley, G. E. Hinton, and T. J. Sejnowski (1985) \"A learning algorithm for Boltzmann machines,\" Cognitive Science, vol. 9, pp. 147-169.\n\nK. Clayton and A. 
Habibi (1991) \"The contribution of temporal contiguity to the spatial priming effect,\" Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 17, pp. 263-271.\n\nJ. L. Elman (1990) \"Finding structure in time,\" Cognitive Science, vol. 14, pp. 179-211.\n\nT. B. Ghiselli-Crippa and P. W. Munro (1994) \"Learning global spatial structures from local associations,\" in M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend (Eds.), Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, NJ: Erlbaum.\n\nR. A. Hart and G. T. Moore (1973) \"The development of spatial cognition: A review,\" in R. M. Downs and Stea (Eds.), Image and Environment, Chicago, IL: Aldine.\n\nD. O. Hebb (1949) The Organization of Behavior, New York, NY: Wiley.\n\nG. E. Hinton (1987) \"Connectionist learning procedures,\" Technical Report CMU-CS-87-115, version 2, Pittsburgh, PA: Carnegie-Mellon University, Computer Science Department.\n\nS. Judd and P. W. Munro (1993) \"Nets with unreliable hidden nodes learn error-correcting codes,\" in C. L. Giles, S. J. Hanson, and J. D. Cowan (Eds.), Advances in Neural Information Processing Systems 5, San Mateo, CA: Morgan Kaufmann.\n\nT. Kohonen (1982) \"Self-organized formation of topologically correct feature maps,\" Biological Cybernetics, vol. 43, pp. 59-69.\n\nT. P. McNamara, J. K. Hardy, and S. C. Hirtle (1989) \"Subjective hierarchies in spatial memory,\" Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 15, pp. 211-227.\n\nT. P. McNamara, R. Ratcliff, and G. McKoon (1984) \"The mental representation of knowledge acquired from maps,\" Journal of Experimental Psychology: Learning, Memory, and Cognition, vol. 10, pp. 723-732.\n\nP. W. 
Munro (1989) \"Conjectures on representations in backpropagation networks,\" Technical Report TR-89-035, Berkeley, CA: International Computer Science Institute.\n\nA. W. Siegel and S. H. White (1975) \"The development of spatial representations of large-scale environments,\" in H. W. Reese (Ed.), Advances in Child Development and Behavior, New York, NY: Academic Press.\n\nJ. Wiles (1993) \"Representation of variables and their values in neural networks,\" in Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum.\n", "award": [], "sourceid": 852, "authors": [{"given_name": "Thea", "family_name": "Ghiselli-Crippa", "institution": null}, {"given_name": "Paul", "family_name": "Munro", "institution": null}]}