{"title": "A unified theory for the origin of grid cells through the lens of pattern formation", "book": "Advances in Neural Information Processing Systems", "page_first": 10003, "page_last": 10013, "abstract": "Grid cells in the brain fire in strikingly regular hexagonal patterns across space. There are currently two seemingly unrelated frameworks for understanding these patterns. Mechanistic models account for hexagonal firing fields as the result of pattern-forming dynamics in a recurrent neural network with hand-tuned center-surround connectivity. Normative models specify a neural architecture, a learning rule, and a navigational task, and observe that grid-like firing fields emerge due to the constraints of solving this task. Here we provide an analytic theory that unifies the two perspectives by casting the learning dynamics of neural networks trained on navigational tasks as a pattern forming dynamical system. This theory provides insight into the optimal solutions of diverse formulations of the normative task, and shows that symmetries in the representation of space correctly predict the structure of learned firing fields in trained neural networks. Further, our theory proves that a nonnegativity constraint on firing rates induces a symmetry-breaking mechanism which favors hexagonal firing fields. We extend this theory to the case of learning multiple grid maps and demonstrate that optimal solutions consist of a hierarchy of maps with increasing length scales. These results unify previous accounts of grid cell firing and provide a novel framework for predicting the learned representations of recurrent neural networks.", "full_text": "A uni\ufb01ed theory for the origin of grid cells through\n\nthe lens of pattern formation\n\nBen Sorscher*1, Gabriel C. Mel*2, Surya Ganguli1, Samuel A. 
Ocko1\n\n1Department of Applied Physics, Stanford University\n2Neurosciences PhD Program, Stanford University\n\nAbstract\n\nGrid cells in the brain \ufb01re in strikingly regular hexagonal patterns across space.\nThere are currently two seemingly unrelated frameworks for understanding these\npatterns. Mechanistic models account for hexagonal \ufb01ring \ufb01elds as the result of\npattern-forming dynamics in a recurrent neural network with hand-tuned center-\nsurround connectivity. Normative models specify a neural architecture, a learning\nrule, and a navigational task, and observe that grid-like \ufb01ring \ufb01elds emerge due to\nthe constraints of solving this task. Here we provide an analytic theory that uni\ufb01es\nthe two perspectives by casting the learning dynamics of neural networks trained\non navigational tasks as a pattern forming dynamical system. This theory pro-\nvides insight into the optimal solutions of diverse formulations of the normative\ntask, and shows that symmetries in the representation of space correctly predict\nthe structure of learned \ufb01ring \ufb01elds in trained neural networks. Further, our theory\nproves that a nonnegativity constraint on \ufb01ring rates induces a symmetry-breaking\nmechanism which favors hexagonal \ufb01ring \ufb01elds. We extend this theory to the case\nof learning multiple grid maps and demonstrate that optimal solutions consist of a\nhierarchy of maps with increasing length scales. These results unify previous ac-\ncounts of grid cell \ufb01ring and provide a novel framework for predicting the learned\nrepresentations of recurrent neural networks.\n\n1\n\nIntroduction\n\nHow does the brain construct an internal map of space? One such map is generated by grid cells in\nthe medial entorhinal cortex (MEC), which exhibit regular hexagonal spatial \ufb01ring \ufb01elds, forming a\nperiodic, low-dimensional representation of space [1]. 
Grid cells are clustered into discrete modules sharing a periodicity and an orientation, but varying randomly in phase [1, 2]. A complementary map is generated by place cells in the adjacent hippocampus, which exhibit localized spatial firing fields, forming a sparser representation of space [3].

Early mechanistic models of grid cells corresponded to recurrent neural networks (RNNs) with hand-tuned connectivity designed specifically to reproduce hexagonal grid cell firing patterns [4, 5, 6]. Such continuous attractor models can robustly integrate and store 2D positional information via path integration [7]. Recent enhancements to such attractor networks that incorporate plastic inputs from landmark cells can explain why grid cells deform in irregular environments [8], and when they either phase shift or remap in altered virtual reality environments [9]. However, none of these recurrent network models show that grid-like firing patterns are required to solve navigational tasks. Thus they cannot demonstrate that hexagonal firing patterns naturally arise as the optimal solution to any computational problem, precisely because the hexagonal patterns are simply assumed in the first place by hand-tuning the recurrent connectivity.

More recent normative models have shown that neural networks trained on tasks that involve encoding a representation of spatial position learn grid-like responses in their hidden units. For example, [10] found that the weights of a one layer neural network, trained via Oja's rule on simulated place cell inputs, learned grid cells with square grid firing patterns. When a non-negativity constraint was imposed, these grids became hexagonal. In [11], the eigenvectors of the graph Laplacian for a navigational task were shown to be square-like grids.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

[12] showed that learning basis functions for local transition operators also yields square grids. [13, 14] trained recurrent neural networks on a path integration task, and observed that the hidden units developed grid-like patterns. [13] found square grids in square environments and hexagonal grids in triangular environments. [14] claimed to find hexagonal grids, though their resulting patterns had substantial heterogeneity. While these normative models hint at the intriguing hypothesis that grid-like representations in the brain may arise as an inevitable consequence of solving a spatial navigation task, these models alone do not offer any theoretical clarity on when and why grid cell patterns emerge from such navigational solutions, and if they do, why they are sometimes square, sometimes hexagonal, or sometimes highly heterogeneous.¹

Here we provide such theoretical clarity by forging an analytic link between the learning dynamics of neural networks trained on navigational tasks and a central, unifying pattern forming dynamical system. Our theory correctly predicts the structure and hierarchy of grid patterns learned in diverse neural architectures, and proves that nonnegativity is just one of a family of interpretable representational constraints that promote hexagonal grids. Furthermore, this theory unifies both mechanistic and normative models, by proving that the learning dynamics induced by optimizing a normative position encoding objective is equivalent to the mechanistic pattern forming dynamics implemented by a population of recurrently connected neurons.

2 Optimally encoding position yields diverse grid-like patterns

To study the space of solutions achieved by the normative models outlined above, we train a variety of neural network architectures on navigational tasks, reproducing the work of [10, 13, 14] (Fig. 1A). 
We simulate an animal moving in a square box, 2.2 m on a side, and record the activity of nP simulated place cells tiled randomly and uniformly throughout the environment (Fig. 1B). We collect the place cell activations at nx locations as the animal explores its environment in a matrix P ∈ R^{nx×nP}. We then train the following network architectures on the following tasks:

1. 1-layer NN. Following [10], we train a single layer neural network to perform unsupervised Hebbian learning on place cell activations P as inputs. Hidden unit representations are made orthogonal by a generalized Hebbian algorithm similar to Gram-Schmidt orthogonalization (see [10] for details). This learning procedure is equivalent to performing PCA on place cell inputs.

2. RNN. We train an RNN to encode position by path integrating velocity inputs. At each time step, the network receives the animal's 2-dimensional velocity v(t) as input. The velocity signal is integrated by the network's nG recurrently connected units, and the network's current position representation is linearly read out into a layer of estimated place cells. This approach is identical to that used in [13], except that our RNNs are trained to encode position by encoding in their outputs a place cell representation rather than a 2D vector of Cartesian (x, y) position.

3. LSTM. We train a significantly more complex LSTM architecture on the same path integration task as in 2, reproducing the work of [14]. The "grid cells" in this architecture are not recurrently connected, but reside in a dense layer immediately following the LSTM. The grid cells are also subject to dropout at a rate of 0.5. We train both with and without the additional objective of integrating head direction inputs, and obtain qualitatively similar results.

Remarkably, in each case the networks learn qualitatively similar grid-like representations (Fig. 1C-E). 
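All three architectures share the same supervised target: the matrix P of place cell activations at the visited locations. A minimal sketch of assembling such a matrix (the parameter values below are illustrative stand-ins, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in parameters:
box_size = 2.2      # box side length in meters, as in the text
n_p = 512           # number of simulated place cells
n_x = 1000          # number of sampled locations along the trajectory
sigma = 0.11        # Gaussian tuning width in meters

# Place cell centers tile the box randomly and uniformly (Fig. 1B).
centers = rng.uniform(0, box_size, size=(n_p, 2))
locations = rng.uniform(0, box_size, size=(n_x, 2))

# P[x, i] is the activation of place cell i at location x, here with a
# Gaussian tuning curve centered on the cell's preferred location.
sq_dists = ((locations[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
P = np.exp(-sq_dists / (2 * sigma**2))          # shape (n_x, n_p)
```

Swapping the Gaussian for a difference-of-Gaussians (center-surround) tuning curve changes only the last line, which is the knob varied in Fig. 1C-E.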
We observe that the structure of the grid patterns depends sensitively on the shape of the place cell tuning curves. We first train with Gaussian place cell tuning curves of size 400 cm², and find that each network develops regular square grid patterns (Fig. 1C), like those in [13]. We next train with a center-surround place cell tuning curve, like that used in [10], and find that each network develops amorphous, quasi-periodic patterns (Fig. 1D). These patterns have grid-like features, and occasionally appear hexagonal, much like the patterns found in [14]. We also note that we obtained similar amorphous patterns when we trained with the tuning curves used in [14], which are extremely sharply peaked (1 cm²), so that the place cell code is approximately one-hot (Fig. 6).

¹Though several works have shown that if grid cells do obey a lattice-like structure, hexagonal lattices are better than other lattice types for decoding position under noise [15] and are more economical [16].

Figure 1: Neural networks trained on normative tasks develop grid-like firing fields. (A) A schematic of the position encoding objective. Depending on the task, G may receive external inputs directly from place cells (as in [10]), or from an RNN or an LSTM that in turn only receives velocity inputs (as in [13, 14]). In the latter case, the recurrent net must generate hidden representations G that convert velocity inputs to a place code P through a set of trained read-out weights W. (B) Left: simulated animal trajectory. Right: the place cell centers (dots) of the desired place code P in (A) uniformly and isotropically tile the environment. Blue to red indicates low to high firing rates when the animal is at the location on the left. (C-E) From left to right, we train a single layer neural network, an RNN, and an LSTM on place cell outputs, reproducing the results of [10, 13, 14]. (C) When the place cell receptive field (left) is broad, all networks learn square grids. (D) When the place cell receptive field is a difference of Gaussians, all networks learn amorphous, quasi-periodic patterns. (E) When a nonnegativity constraint is imposed on hidden unit activations, all networks now learn regular hexagonal grids.

A non-negativity constraint induces hexagonal grids. [10] observed that imposing a non-negativity constraint on the output of their 1-layer neural network changed the learned grid patterns from square to hexagonal. However, since such feedforward networks only convert place cell inputs to outputs learned in an unsupervised manner, it is a priori unclear how non-negativity might impact the internal representations of recurrent neural networks trained to convert velocity inputs to place cell outputs. To investigate whether the same constraint would alter the structure of the patterns observed in the other architectures and in more complex navigational tasks, we retrain all architectures while imposing non-negativity on the activations of the hidden units that ultimately give rise to grid cells. We find that this constraint consistently yields regular hexagonal grids in each architecture (Fig. 1E).

Figure 2: Pattern formation theory predicts structure of learned representations. (A, B) Visualization of Σ̃ and Σ̃*. (D-E) The top subspace of Σ̃* is a degenerate ring. Absent any other constraint, pattern forming dynamics will yield arbitrary combinations of Fourier modes on this ring. (B) When the ring is close to zero, only 90° combinations of modes are available due to discretization effects, yielding square lattices. (C) When the ring is larger, many degenerate modes are available. (E) The softened nonnegativity constraint of Eq. 6 induces a three-body interaction between triplets of spatial frequencies which add to zero. Within the top subspace of Σ̃*, these form an equilateral triangle, yielding a hexagonal lattice. (F) The sum of three plane waves at 60° interferes to form a hexagonal lattice if all waves are in phase. (G-I) Numerical simulation of pattern forming dynamics (Eq. 5). (G) When Σ̃ is peaked near zero, pattern forming dynamics yield square grids. (H) When Σ̃ is peaked far from zero, pattern forming dynamics yield quasi-periodic lattices comprised of many modes. (I) A softened nonnegativity constraint induces a regular hexagonal lattice. Insets in (G)-(I): 2D Fourier transforms of the learned maps: 4 peaks, widely distributed activity, and 6 hexagonally distributed peaks, respectively.

This collection of results raises fundamental scientific questions. Why do these diverse architectures, across diverse tasks (both navigation and autoencoding), all converge to a grid-like solution, and what governs the lattice structure of this solution? We address this question by noting that all optimization problems in networks 1-3 contain within them a common sub-problem, which we call the position encoding objective: selecting hidden responses G and linearly combining them with readout weights W in order to predict place cell responses P (Fig. 1A). We further show that due to the translation-invariance of place cell responses, the learning dynamics of this position encoding objective can be formulated as a pattern forming dynamical system, allowing us to understand the nature and structure of the resultant grid-like solutions and their dependence on various parameters.

3 Pattern formation theory predicts structure of learned representations

The common position encoding sub-problem identified in the previous section can be mathematically formulated as minimizing the following objective function

E(G, W) = ||P − P̂||²_F,  where  P̂ = GW.   (1)

Here P ∈ R^{nx×nP} represents true place cell activations, where P_{x,i} is the activation of place cell i at spatial index x. G ∈ R^{nx×nG} represents hidden layer activations (which will learn grid-like representations), where G_{x,j} is the activation of hidden unit j at spatial index x. W ∈ R^{nG×nP} represents linear readout weights, where W_{ji} is the contribution of grid cell j to place cell i. P̂ ∈ R^{nx×nP} represents the predictions of the place cell encoding system. For simplicity, we consider an L2 penalty on encoding errors. Because we are ultimately interested in the hidden unit activations G, we replace W with its optimum value for fixed G (see App. B.1 for details):

argmin_W E(G, W) = (GᵀG)⁻¹GᵀP.   (2)

The objective E is unchanged by any transformation of the form G → GZ, W → Z⁻¹W. In particular, we can simplify our objective by choosing Z so that G's columns are orthonormal. 
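The closed-form readout in Eq. 2 is easy to sanity-check numerically against a generic least-squares solver; a minimal sketch with random stand-in matrices (all names and sizes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_g, n_p = 200, 16, 64
G = rng.standard_normal((n_x, n_g))   # hidden (grid-like) activations
P = rng.standard_normal((n_x, n_p))   # target place cell activations

# Closed-form optimal readout from Eq. 2: W* = (G^T G)^{-1} G^T P.
W_star = np.linalg.solve(G.T @ G, G.T @ P)

# It agrees with a generic least-squares solve of min_W ||P - G W||_F^2.
W_lstsq, *_ = np.linalg.lstsq(G, P, rcond=None)
assert np.allclose(W_star, W_lstsq)
```

The residual P − GW* is orthogonal to the column space of G, which is the normal-equations condition underlying Eq. 2.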
Enforcing this constraint via Lagrange multipliers, we obtain the following Lagrangian

L = Tr[GᵀΣG − Λ(GᵀG − I)],   (3)

where Σ = PPᵀ is the nx×nx correlation matrix of place cell outputs. Note that, assuming the place cell receptive fields uniformly and isotropically cover space, Σ will, in the limit of large numbers of place cells, be approximately translation invariant (i.e. Toeplitz) and circularly symmetric. This Lagrangian is optimized when G's columns span the top nG eigenvectors of Σ (Eckart-Young-Mirsky, see App. B.2), and is invariant to a unitary transformation G → GU. Moreover, since Σ is a Toeplitz matrix, the eigenvectors of Σ are approximately plane waves. Thus the optimization in (3) yields arbitrary linear combinations of different plane wave eigenmodes of Σ corresponding to the nG largest eigenvalues. However, this multiplicity of solutions is a special feature due to the lack of any further constraints, like non-negativity. As we'll see below, once a nonlinear constraint, like non-negativity, is added, this multiplicity of solutions disappears, and the optimization favors a single type of map corresponding to hexagonal grid cells.

3.1 Single-cell dynamics

To build intuition, we begin by studying the case of a single encoding cell g ∈ R^{nx} and difference of Gaussian place cell tuning. The Lagrangian for this cell is given by

L = gᵀΣg + λ(1 − gᵀg).   (4)

Gradient ascent on this objective function at fixed λ yields the dynamics

d/dt g = −λg + Σg.   (5)

This is a pattern forming dynamics in which the firing fields at two positions g_x and g_{x′} mutually excite (inhibit) each other if the spatial autocorrelation Σ_{xx′} of the desired place cell code at the two positions x and x′ is positive (negative). 
Under this dynamics, patterns corresponding to the eigenmodes of largest eigenvalue of Σ grow the fastest, with an exponential growth rate given by the corresponding eigenvalue. In actuality, to solve the constrained optimization problem in (4) we run a projected gradient ascent algorithm in which we iteratively project g back to the constraint surface gᵀg = 1. Such a dynamics converges to a linear combination of degenerate eigenmodes of Σ, all of which share the same maximal eigenvalue. Since Σ is translation invariant and circularly symmetric, these eigenmodes are linear combinations of plane waves whose wave-vectors lie on a ring in Fourier space whose radius k* is determined by the top eigenvalue-eigenvector pair of Σ.

In Fig. 2 we plot the eigenvalue associated with each plane wave as a function of its wave-vector for a Gaussian (A) and difference of Gaussian (B) place cell tuning curve. For Gaussian tuning, the optimal wave-vectors, corresponding to the largest eigenvalues of Σ, lie close to the origin, while for difference of Gaussians tuning with covariance Σ*, the optimal wave-vectors, corresponding to the largest eigenvalues of Σ*, are concentrated on a ring of radius k* in Fourier space, far from the origin.

Consistent with this analysis and the results of Fig. 1, numerical simulations of the pattern forming dynamics corresponding to optimizing (4) yield quasi-periodic patterns like those in Fig. 2G,H. In simulations, the finite box size discretizes Fourier space onto a lattice. Thus, numerical solutions will consist of discrete combinations of plane waves with wave-vectors of radius k*. The lowest nonzero Fourier modes occur at 0° and 90° on the Fourier lattice. Therefore, when Σ's spectrum is peaked near the origin, as in the case of Gaussian place cell tuning (Fig. 2C), solutions will be dominated by square grids like those found in [10, 13]. 
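In one dimension, this projected gradient ascent can be sketched in a few lines; it is essentially power iteration on Σ. The circulant Σ below is an illustrative stand-in for the place cell correlation matrix, not the paper's exact kernel:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128

# 1D stand-in correlation matrix: circulant (translation invariant), built
# from a difference-of-Gaussians autocorrelation profile.
d = np.minimum(np.arange(n), n - np.arange(n))        # periodic distance
profile = np.exp(-d**2 / (2 * 4.0**2)) - 0.5 * np.exp(-d**2 / (2 * 8.0**2))
Sigma = np.array([np.roll(profile, i) for i in range(n)])

# Projected gradient ascent on L = g^T Sigma g subject to g^T g = 1:
# step along Sigma g (the -lambda*g term is handled by the projection).
g = rng.standard_normal(n)
g /= np.linalg.norm(g)
for _ in range(500):
    g = g + 0.1 * (Sigma @ g)
    g /= np.linalg.norm(g)        # re-project onto the constraint surface

# The dynamics converges into the top (degenerate) eigenspace of Sigma:
# the Rayleigh quotient approaches the largest eigenvalue.
top = np.linalg.eigvalsh(Sigma)[-1]
assert np.isclose(g @ Sigma @ g, top, rtol=1e-3)
```

Because the difference-of-Gaussians spectrum peaks at a nonzero frequency, the converged g is an oscillatory pattern, the 1D analogue of the ring of optimal wave-vectors at radius k*.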
Rings further from the origin may occasionally intersect six lattice points, "accidentally" yielding hexagonal grids like those observed in [14]. However, as the difference of Gaussians case shows, in general, optimal patterns can contain any mixture of wavevectors from a ring (Fig. 2D), giving rise to amorphous patterns (Fig. 2H; inset shows Fourier power distributed over the whole ring). Indeed, the encoding objective considered above does not prefer one type of lattice to another. As we will see, adding a nonnegativity constraint to our objective breaks this degeneracy, and reliably picks out hexagonal solutions.

3.2 A nonnegativity constraint favors hexagonal grids

We have seen empirically that a nonnegativity constraint tends to produce hexagonal grids (Fig. 1E). To understand this effect, we add a softened nonnegativity constraint to our objective function as follows

L = gᵀΣg + λ(1 − gᵀg) + σ(g),   (6)

where σ(g) penalizes negative activities in the map g. It will be convenient to write g_x as g(x), treating g as a scalar field defined for all points in space. Our objective then takes the form

L[g(x)] = ∫∫_{x,x′} g(x)Σ(x − x′)g(x′) + λ(1 − ∫_x g²(x)) + ∫_x σ(g(x)).   (7)

We can approximate the negativity penalty by Taylor expanding about 0: σ(g) ≈ σ₀ + σ₁g + σ₂g² + σ₃g³. Our Lagrangian then has a straightforward form in Fourier space

L̃[g̃(k)] ≈ ∫_k |g̃(k)|² Σ̃(k) + λ̃(1 − ∫_k |g̃(k)|²) + [σ₀ + σ₁g̃(0) + σ₂ ∫_k |g̃(k)|² + σ₃ ∫∫∫_{k,k′,k″} g̃(k)g̃(k′)g̃(k″) δ(k + k′ + k″)].   (8)

σ₀, σ₁, and σ₂ will not qualitatively change the structure of the solutions: σ₀ simply shifts the optimal value of L, but not its argmax; σ₁ controls the amount of the constant mode in the maps, and does not affect their qualitative shape; and σ₂ can be absorbed into λ̃ [17]. 
Critically, however, the cubic term σ₃ introduces an interaction between wavevector triplets k, k′, k″ whenever the three sum to zero (Fig. 2E).

In the limit of weak σ₃, the maps will be affected in two separate ways. First, weak σ₃ will pull the maps slightly outside of the linear span of the optimal plane waves, or eigenmodes of Σ of largest eigenvalue. As σ₃ → 0, this effect shrinks and effectively disappears, so that we can assume the optimal maps are still constrained to be linear combinations of plane waves, with wave-vectors on the same ring in Fourier space. The second, stronger effect is due to the fact that no matter how small σ₃ is made, it will break L's symmetry, effectively forcing it to choose one solution from the set of previously degenerate optima. Therefore, in the limit of small σ₃, we can determine the optimal maps by considering which wavevector mixture on the ring of radius k* maximizes the nonlinear term

L_int = ∫∫∫_{k,k′,k″} g̃(k)g̃(k′)g̃(k″) δ(k + k′ + k″).   (9)

Subject to the normalization constraint ∫|g̃(k)|² = 1, this term is maximized when g̃(k) = (1/√6) Σ_{i=1..3} δ(k − k_i) + c.c.², where k₁ + k₂ + k₃ = 0. The only combination of k₁, k₂, k₃ on the ring of radius k* that sums to zero is an equilateral triangle (Fig. 2E). Therefore, rather than arbitrary linear combinations of plane waves as in Eq. 1, the optimal solutions consist of three plane waves with equal amplitude and wavevectors that lie on an equilateral triangle:

g(x) = (1/√6)(e^{i(k₁·x + φ₁)} + e^{i(k₂·x + φ₂)} + e^{i(k₃·x + φ₃)} + c.c.).   (10)

The interaction L_int is maximized when φ₁ + φ₂ + φ₃ = 0, in which case the three plane waves interfere to form a regular hexagonal lattice (Fig. 2F).

²Note that c.c. is shorthand for complex conjugate. For any real solution g(x) to Eq. 6, g̃(−k) = g̃*(k); therefore, for each wavevector k we must also include its negative, −k.
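The interference pattern of Eq. 10 can be verified directly: with zero phases, three equal-amplitude plane waves whose wavevectors form an equilateral triangle produce a field with the symmetries of a hexagonal lattice. A sketch with an illustrative unit wavelength:

```python
import numpy as np

rng = np.random.default_rng(3)

# Three unit wavevectors whose tips form an equilateral triangle in
# Fourier space, so that k1 + k2 + k3 = 0 (the zero-sum condition of Eq. 9).
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
ks = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # rows k1, k2, k3
assert np.allclose(ks.sum(axis=0), 0)

def g(x):
    # Eq. 10 with all phases phi_i = 0: each plane wave plus its complex
    # conjugate contributes 2*cos(k . x).
    return (2 / np.sqrt(6)) * np.cos(x @ ks.T).sum(axis=-1)

x = rng.uniform(-10, 10, size=(100, 2))

# The pattern is a regular hexagonal lattice: invariant under a
# 60-degree rotation about the origin...
c, s = np.cos(np.pi / 3), np.sin(np.pi / 3)
R60 = np.array([[c, -s], [s, c]])
assert np.allclose(g(x), g(x @ R60.T))

# ...and under translation by a hexagonal lattice vector.
a1 = np.array([2 * np.pi, 2 * np.pi / np.sqrt(3)])
assert np.allclose(g(x), g(x + a1))
```

The rotation check works because the six wavevectors ±k₁, ±k₂, ±k₃ sit every 60° around the ring, so a 60° rotation permutes them.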
We can optimize the above Lagrangian using the same pattern forming dynamics as in Eq. 5, under the nonnegativity constraint defined above³. When we perform numerical simulations of this dynamics, we find regular hexagonal grids (Fig. 2I). Taking the 2D Fourier transform of the resulting pattern reveals that the nonnegativity constraint has picked out three wavevectors oriented at 60° relative to one another (and their negatives) from the optimal ring of solutions (Fig. 2I, inset).

3.3 Hexagonal grids and g → −g symmetry breaking

We see from the above argument that the rectification nonlinearity is but one of a large class of nonlinearities which will favor hexagonal grids. A generic nonlinearity with a non-trivial cubic term in its Taylor expansion will break the g → −g symmetry, and introduce a three-body interaction which picks out hexagonal lattices. While nonnegativity is a specific nonlinearity motivated by biological considerations, a broad class of nonlinearities will achieve the same effect (numerical simulations in Appendix A1).

4 Multiple cells & nonnegativity yields hierarchies of hexagonal grids

We now return to the full Lagrangian with multiple grid cells (Eq. 3),

L = Tr GᵀΣG + Σ_{ij} λ_{ij}(I − GᵀG)_{ij} + σ(G).   (11)

Recall that the solution space of the unperturbed objective with σ = 0 corresponds to maps G whose wavevectors fall on a series of concentric rings in Fourier space, Σ̃_top (Fig. 3B). By symmetry of L, any unitary mixture of such a set of maps, G → GU, will perform equally well. The nonlinearity σ then breaks U-symmetry and selects for specific mixtures of wavevectors. 
As before, the cubic term σ₃ induces a three-body interaction which promotes triplets of wavevectors that sum to zero.

If the number of maps to be learned nG is small enough that the top subspace rings Σ̃_top form a relatively thin annulus in Fourier space, then all wavevector triplets that sum to 0 will be approximately equilateral, giving rise to regular hexagonal grid maps. Once the number of maps to be learned is large, the top subspace rings will form a thick annulus, inside of which many different triplet wavevector arrangements - not just equilateral triangles - will sum to 0. Despite this, a significant fraction of maps learned in simulations are still hexagonal. As a first step toward understanding these results, we analyze a few possible non-equilateral triplets and show that the optimum has a dominant equilateral component.

One simple non-equilateral arrangement is any triplet of the form (k, k, −2k), corresponding to a stripe pattern with its first overtone. In Appendix B.3, we prove that such an arrangement contributes at most L_int^coupled = 3/2 to the Lagrangian, whereas an equilateral triangle contributes L_int^decoupled = 2·2·3!/6^{3/2} ≈ 1.63. Another possibility is a hybrid map consisting of a mixture of two equilateral triangles, one twice as big as the other. We prove that the optimal mixture puts most weight on either the big or the small triangle, making the optimal solutions relatively pure-tone hexagonal maps.

Empirically, we can optimize the unperturbed multiple grid cell Lagrangian in Eq. 3 using the pattern forming dynamics

d/dt G = −GΛ + ΣG   (12)

and enforcing orthonormality of G. Introducing the nonnegativity constraint, we obtain dominantly hexagonal maps across multiple spatial scales (Fig. 3D).
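The multi-map dynamics of Eq. 12, with orthonormality re-enforced after every step, amounts to subspace iteration; a minimal sketch (the 1D circulant Σ is again an illustrative stand-in, and QR is one convenient way to project back onto GᵀG = I):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_g = 96, 4

# Illustrative 1D stand-in for the place cell correlation matrix.
d = np.minimum(np.arange(n), n - np.arange(n))
profile = np.exp(-d**2 / (2 * 3.0**2)) - 0.5 * np.exp(-d**2 / (2 * 6.0**2))
Sigma = np.array([np.roll(profile, i) for i in range(n)])

# Pattern forming dynamics for multiple maps (Eq. 12), enforcing the
# orthonormality constraint G^T G = I after every step via QR.
G, _ = np.linalg.qr(rng.standard_normal((n, n_g)))
for _ in range(1000):
    G = G + 0.1 * (Sigma @ G)
    G, _ = np.linalg.qr(G)

# The learned maps span the top-n_g eigenspace of Sigma, as the
# Eckart-Young-Mirsky argument below Eq. 3 predicts.
captured = np.trace(G.T @ Sigma @ G)
assert np.isclose(captured, np.linalg.eigvalsh(Sigma)[-n_g:].sum(), rtol=1e-2)
```

With a spectrum peaked at a nonzero frequency, the top-n_g eigenmodes occupy successive frequency pairs, the 1D analogue of filling up concentric rings in Fourier space.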
Historically, the roughly constant ratio of grid scale in neighboring MEC modules has led to interest in geometric hierarchies - both which kinds of encoding objectives favor them, and which pattern forming dynamics produce them. During simulations, we find that lattice discretization effects can sometimes create the illusion of a geometric hierarchy (Fig. 3E). Roughly speaking, if Σ̃ is peaked at the origin, as the number of encoding maps is increased, wavevector rings are filled up one by one. Due to the geometry of the lattice points in wave-vector space, the five wave-vectors of smallest length (i.e. smallest spatial frequency) will have relative length ratios given by (1, √2, 2, √5, 2√2). These correspond exactly to the peaks we observe in Fig. 3E.

³In App. B.4 we prove that the dynamics satisfies KKT optimality conditions for the Lagrangian in Eq. 6.

Figure 3: A) Visualization of Σ̃. B) Top subspace of Σ when multiple grid cells are available. Absent any other constraint, there is a full rotational degree of freedom within this space. C) A cubic term in the nonlinearity induces a three-body interaction between triplets of spatial frequencies which add to zero. D) Results of multi-grid pattern forming dynamics with nonnegativity constraint show regular hexagonal grids across multiple spatial scales. E) Left: distribution of grid scales for pattern forming dynamics with Gaussian place cell tuning curve. This distribution can create the illusion of geometric hierarchy, but is due to discretization restricting the lowest frequency modes to the lattice. Right: (top) first five available spatial frequencies and (bottom) their corresponding wavevectors. Three of the first four lattice spacings are separated by a ratio of √2 ≈ 1.4.

Because these particular scale relationships are strongly dependent on discretization effects and boundary conditions - both of which arise from an ad hoc modelling decision to use a square, periodic environment - it is not clear that this effect in the model is a likely explanation for the apparent geometric hierarchy of maps in MEC.

5 Unifying normative and mechanistic models of grid cells

This theory establishes a remarkable equivalence between optimizing the position encoding objective and the recurrent neural network dynamics of continuous attractor models. To see this, consider a closely related single-cell Lagrangian L̄ obtained by adding a constraint on the sum of the cell activations:

L̄ = gᵀΣg − λgᵀg + μ1ᵀg.   (13)

Empirically, we find that this sum constraint minimally affects the structure of the optimal patterns. We can optimize this objective for nonnegative g by stepping along L̄'s gradient and rectifying the result. In the limit of small step size, this becomes

d/dt g = −λg + Σg + μ         if g > 0,
d/dt g = −λg + [Σg + μ]₊      if g = 0,   (14)

where [·]₊ is the rectifying nonlinearity. Interestingly, this is almost exactly the dynamics proposed in continuous attractor models of grid cells [4, 5, 7, 6], with appropriate choice of time constant τ and scaling of the recurrent weights J and feedforward drive b:

τ d/dt g = −g + [Jg + b]₊.   (15)

We prove in App. B.4 and B.5 that the dynamics of Eq. 14 and Eq. 15 have identical fixed points which satisfy KKT optimality conditions for the constrained position encoding objective. 
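This fixed-point correspondence can be checked numerically under the natural mapping J = Σ/λ and b = (μ/λ)1. A sketch with an illustrative stand-in Σ, rescaled so the rectified update is a contraction with a unique fixed point (a convenience for the demo, not a condition of the theory):

```python
import numpy as np

n = 64
lam, mu = 1.0, 0.1

# Stand-in place cell correlation matrix: circulant difference of Gaussians,
# rescaled so ||Sigma|| < lam. This makes g -> [Jg + b]_+ a contraction,
# so simple iteration converges to the unique fixed point.
d = np.minimum(np.arange(n), n - np.arange(n))
profile = np.exp(-d**2 / (2 * 3.0**2)) - 0.5 * np.exp(-d**2 / (2 * 6.0**2))
Sigma = np.array([np.roll(profile, i) for i in range(n)])
Sigma *= 0.9 * lam / np.abs(np.linalg.eigvalsh(Sigma)).max()

# Attractor network of Eq. 15 with J = Sigma / lam and b = (mu / lam) * 1.
J = Sigma / lam
b = (mu / lam) * np.ones(n)
relu = lambda v: np.maximum(v, 0.0)

# Iterate to the fixed point g = [Jg + b]_+ of Eq. 15.
g = np.zeros(n)
for _ in range(300):
    g = relu(J @ g + b)

# The fixed point satisfies the stationarity conditions of Eq. 14:
# -lam*g + Sigma@g + mu = 0 where g > 0, and Sigma@g + mu <= 0 where g = 0.
active = g > 1e-9
grad = -lam * g + Sigma @ g + mu
assert np.allclose(grad[active], 0.0, atol=1e-8)
assert np.all(grad[~active] <= 1e-8)
```

In other words, a fixed point of the mechanistic attractor dynamics is simultaneously a KKT point of the constrained encoding objective, which is the content of the equivalence proved in App. B.4 and B.5.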
In this unification of normative and mechanistic models, the spatial autocorrelation structure Σ of the desired place cell code in a normative position encoding model corresponds to the recurrent connectivity J of the mechanistic RNN model. While the normative model learns grid cell firing fields as a function of space by minimizing an encoding objective, the mechanistic RNN model generates stable periodic bump patterns on a sheet of neurons by choosing translation invariant connectivity between neurons on the sheet. Our theory predicts that if the neural nonlinearity (such as a ReLU) in a mechanistic model breaks the firing rate inversion symmetry g → −g, then periodic patterns on the neural sheet should be hexagonal. Historically, a rectifying nonlinearity has indeed been used in mechanistic models, and hexagonal grids have emerged. Consistent with our theory, other nonlinearities that preserve the firing rate inversion symmetry yield square grids (see Appendix A1).

This unifying connection between normative and mechanistic models yields an intuitive insight: continuous attractor dynamics not only reproduce the patterns of activity observed in the MEC; they are equivalent to optimizing the position encoding objective. That is, the patterns formed in the continuous attractor model are also optimal for linearly generating place cell-like activations.

6 Discussion

Our unifying theory and numerical results suggest that much of the diversity and structure of solutions obtained across many different normative models of MEC can be explained by the learning dynamics of a simple linear, place cell encoding objective. This is intriguing given that the architectures and navigational tasks employed in [13, 14] are considerably more sophisticated. Further studies could explore how changing this common subproblem changes the solutions found by RNNs trained to navigate. 
Moreover, our theory predicts why hexagonal grids should emerge in RNNs trained to path integrate, but it does not explain how RNNs trained via backprop learn to stabilize and update these patterns in order to path integrate. Future studies could reverse engineer these trained networks to determine whether their inner workings coincide with the simple mechanistic models that we describe in Section 5 and that have been proposed over the past 20 years.

Furthermore, our theory made simplifying assumptions about uniform, isotropic coverage of place cell representations, yielding highly regular, stable grid cell patterns by solving the position encoding objective. Our theoretical framework enables us to explore quantitatively how grid cell solutions change when the environment is deformed, rewards or obstacles are incorporated, or place cells are lesioned. Recent experiments have characterized the MEC's response to each of these scenarios [18, 19, 20, 21], but no unified theory has been put forth to explain the results. These questions could potentially be addressed by drawing on a rich literature describing how patterns respond to defects in either spatial statistics or neural connectivity. Such defects could play a role in accounting for the heterogeneity of grid cells [22]. Another interesting approach is to incorporate landmark inputs, in addition to velocity inputs, into trained networks. Such landmark inputs are known to correct drift in the grid cell system [23] and can successfully account for various deformations in grid cell firing patterns due to environmental manipulations [8, 9].

Finally, a growing body of work has explored experimentally the hypothesis that MEC encodes continuous variables other than position, such as sound pitch [24] or abstract quantities like the width and height of a bird [25].
While we have referred to a "position" encoding objective and "path" integration, we note that our theory actually holds for generic continuous variables. That is, we would expect networks trained to keep track of, say, sound pitch and volume to behave the same way. Intriguingly, grid-like structure may even be relevant for neural processing in more abstract domains of semantic cognition [25, 26]. Overall, the unifying pattern formation framework we have identified, which spans both normative and mechanistic models, affords a powerful conceptual tool to address many questions about the origins, structure, variability and robustness of grid-like representations in the brain.

Acknowledgments
S.G. thanks the Simons and James S. McDonnell Foundations, and NSF CAREER award 1845166, for funding. B.S. thanks the Stanford Graduate Fellowship for financial support.

References

[1] Torkel Hafting et al. "Microstructure of a spatial map in the entorhinal cortex". In: Nature 436.7052 (Aug. 2005), pp. 801–806. DOI: 10.1038/nature03721.

[2] Vegard Heimly Brun et al. "Progressive increase in grid scale from dorsal to ventral medial entorhinal cortex". In: Hippocampus 18.12 (2008), pp. 1200–1212.

[3] B. L. McNaughton, C. A. Barnes, and J. O'Keefe. "The Contributions of Position, Direction, and Velocity to Single Unit Activity in the Hippocampus of Freely-moving Rats". In: Experimental Brain Research 52 (1983), pp. 41–49. DOI: 10.1007/BF00237147.

[4] William E. Skaggs et al. An Information-Theoretic Approach to Deciphering the Hippocampal Code. Tech. rep. URL: https://pdfs.semanticscholar.org/079a/c9a229f99400777bb433c96417d549761bb8.pdf.

[5] Kechen Zhang.
"Representation of Spatial Orientation by the Intrinsic Dynamics of the Head-Direction Cell Ensemble: A Theory". In: Journal of Neuroscience 16.6 (1996), pp. 2112–2126. URL: http://www.jneurosci.org/content/jneuro/16/6/2112.full.pdf.

[6] Mark C. Fuhs and David S. Touretzky. "A Spin Glass Model of Path Integration in Rat Medial Entorhinal Cortex". In: Journal of Neuroscience 26.16 (2006). DOI: 10.1523/JNEUROSCI.4353-05.2006. URL: http://www.jneurosci.org/content/jneuro/26/16/4266.full.pdf.

[7] Y. Burak and I. R. Fiete. "Accurate Path Integration in Continuous Attractor Network Models of Grid Cells". In: PLoS Comput Biol 5.2 (2009), e1000291. DOI: 10.1371/journal.pcbi.1000291.

[8] Samuel A. Ocko et al. "Emergent elasticity in the neural code for space". In: Proceedings of the National Academy of Sciences 115.50 (Dec. 2018), E11798–E11806. DOI: 10.1073/pnas.1805959115.

[9] Malcolm G. Campbell et al. "Principles governing the integration of landmark and self-motion cues in entorhinal cortical codes for navigation". In: Nature Neuroscience 21.8 (Aug. 2018), pp. 1096–1106. DOI: 10.1038/s41593-018-0189-y.

[10] Yedidyah Dordek et al. "Extracting grid cell characteristics from place cell inputs using non-negative principal component analysis". In: eLife 5 (Mar. 2016), e10094. DOI: 10.7554/eLife.10094.

[11] Kimberly L. Stachenfeld, Matthew M. Botvinick, and Samuel J. Gershman. "The hippocampus as a predictive map". In: Nature Neuroscience 20.11 (Nov. 2017), pp. 1643–1653. DOI: 10.1038/nn.4650.

[12] James C. R. Whittington et al. Generalisation of structural knowledge in the hippocampal-entorhinal system. Tech. rep.

[13] Christopher J. Cueva and Xue-Xin Wei. Emergence of Grid-like Representations by Training Recurrent Neural Networks to Perform Spatial Localization. Tech. rep. arXiv: 1803.07770. URL: https://arxiv.org/pdf/1803.07770.pdf.

[14] Andrea Banino et al. "Vector-based navigation using grid-like representations in artificial agents". In: Nature 557.7705 (2018), pp. 429–433. DOI: 10.1038/s41586-018-0102-6.

[15] Alexander Mathis, Martin B. Stemmler, and Andreas V. M. Herz. "Probable nature of higher-dimensional symmetries underlying mammalian grid-cell activity patterns". In: eLife 4 (2015), pp. 1–29. DOI: 10.7554/eLife.05979. arXiv: 1411.2136.

[16] Xue-Xin Wei, Jason Prentice, and Vijay Balasubramanian. "A principle of economy predicts the functional architecture of grid cells". In: eLife 4 (2015), e08362. DOI: 10.7554/eLife.08362.

[17] M. C. Cross and P. C. Hohenberg. "Pattern formation outside of equilibrium". In: Rev. Mod. Phys. 65.3 (July 1993), pp. 851–1112. DOI: 10.1103/RevModPhys.65.851.

[18] William N. Butler, Kiah Hardcastle, and Lisa M. Giocomo. "Remembered reward locations restructure entorhinal spatial maps". In: Science 363.6434 (Mar. 2019), pp. 1447–1452. DOI: 10.1126/science.aav5297.

[19] Julija Krupic et al. "Grid cell symmetry is shaped by environmental geometry". In: Nature 518.7538 (Feb. 2015), pp. 232–235. DOI: 10.1038/nature14153.

[20] Revekka Ismakov et al. "Grid Cells Encode Local Positional Information". In: Current Biology 27.15 (2017), pp. 2337–2343.e3. DOI: 10.1016/j.cub.2017.06.034.

[21] Jena B. Hales et al. "Medial Entorhinal Cortex Lesions Only Partially Disrupt Hippocampal Place Cells and Hippocampus-Dependent Place Memory". In: Cell Reports 9 (2014), pp. 893–901. DOI: 10.1016/j.celrep.2014.10.009.

[22] Kiah Hardcastle et al. "A Multiplexed, Heterogeneous, and Adaptive Code for Navigation in Medial Entorhinal Cortex". In: Neuron (2017). DOI: 10.1016/j.neuron.2017.03.025.

[23] Kiah Hardcastle, Surya Ganguli, and Lisa M. Giocomo. "Environmental Boundaries as an Error Correction Mechanism for Grid Cells". In: Neuron 86.3 (May 2015), pp. 827–839. DOI: 10.1016/j.neuron.2015.03.039.

[24] Dmitriy Aronov, Rhino Nevers, and David W. Tank. "Mapping of a non-spatial dimension by the hippocampal-entorhinal circuit". In: Nature 543.7647 (Mar. 2017), pp. 719–722. DOI: 10.1038/nature21692.

[25] Alexandra O. Constantinescu, Jill X. O'Reilly, and Timothy E. J. Behrens. "Organizing conceptual knowledge in humans with a gridlike code". In: Science 352.6292 (June 2016), pp. 1464–1468. DOI: 10.1126/science.aaf0941.

[26] Andrew M. Saxe, James L. McClelland, and Surya Ganguli. "A mathematical theory of semantic development in deep neural networks". In: Proceedings of the National Academy of Sciences 116.23 (June 2019), pp. 11537–11546. DOI: 10.1073/pnas.1820226116. arXiv: 1810.10531.

[27] Uğur M. Erdem and Michael Hasselmo. "A goal-directed spatial navigation model using forward trajectory planning based on grid cells". In: European Journal of Neuroscience 35.6 (Mar. 2012), pp. 916–931. DOI: 10.1111/j.1460-9568.2012.08015.x.