{"title": "Cortical microcircuits as gated-recurrent neural networks", "book": "Advances in Neural Information Processing Systems", "page_first": 272, "page_last": 283, "abstract": "Cortical circuits exhibit intricate recurrent architectures that are remarkably similar across different brain areas. Such stereotyped structure suggests the existence of common computational principles. However, such principles have remained largely elusive. Inspired by gated-memory networks, namely long short-term memory networks (LSTMs), we introduce a recurrent neural network in which information is gated through inhibitory cells that are subtractive (subLSTM).  We propose a natural mapping of subLSTMs onto known canonical excitatory-inhibitory cortical microcircuits.   Our empirical evaluation across sequential image classification and language modelling tasks shows that subLSTM units can achieve similar performance to LSTM units. These results suggest that cortical circuits can be optimised to solve complex contextual problems and proposes a novel view on their computational function. Overall our work provides a step towards unifying recurrent networks as used in machine learning with their biological counterparts.", "full_text": "Cortical microcircuits as\n\ngated-recurrent neural networks\n\nRui Ponte Costa\u2217\n\nYannis M. Assael\u2217\n\nCentre for Neural Circuits and Behaviour\n\nDept. of Computer Science\n\nDept. of Physiology, Anatomy and Genetics\n\nUniversity of Oxford, Oxford, UK\n\nUniversity of Oxford, Oxford, UK\n\nrui.costa@cncb.ox.ac.uk\n\nand DeepMind, London, UK\n\nyannis.assael@cs.ox.ac.uk\n\nBrendan Shillingford\u2217\n\nDept. of Computer Science\n\nUniversity of Oxford, Oxford, UK\n\nand DeepMind, London, UK\n\nbrendan.shillingford@cs.ox.ac.uk\n\nNando de Freitas\n\nDeepMind\nLondon, UK\n\nnandodefreitas@google.com\n\nTim P. Vogels\n\nCentre for Neural Circuits and Behaviour\n\nDept. 
of Physiology, Anatomy and Genetics\n\nUniversity of Oxford, Oxford, UK\n\ntim.vogels@cncb.ox.ac.uk\n\nAbstract\n\nCortical circuits exhibit intricate recurrent architectures that are remarkably similar\nacross different brain areas. Such stereotyped structure suggests the existence of\ncommon computational principles. However, such principles have remained largely\nelusive. Inspired by gated-memory networks, namely long short-term memory\nnetworks (LSTMs), we introduce a recurrent neural network in which information\nis gated through inhibitory cells that are subtractive (subLSTM). We propose a\nnatural mapping of subLSTMs onto known canonical excitatory-inhibitory cortical\nmicrocircuits. Our empirical evaluation across sequential image classi\ufb01cation\nand language modelling tasks shows that subLSTM units can achieve similar\nperformance to LSTM units. These results suggest that cortical circuits can be\noptimised to solve complex contextual problems and proposes a novel view on\ntheir computational function. Overall our work provides a step towards unifying\nrecurrent networks as used in machine learning with their biological counterparts.\n\n1\n\nIntroduction\n\nOver the last decades neuroscience research has collected enormous amounts of data on the ar-\nchitecture and dynamics of cortical circuits, unveiling complex but stereotypical structures across\nthe neocortex (Markram et al., 2004; Harris and Mrsic-Flogel, 2013; Jiang et al., 2015). One of\nthe most prevalent features of cortical nets is their laminar organisation and their high degree of\nrecurrence, even at the level of local (micro-)circuits (Douglas et al., 1995; Song et al., 2005; Harris\nand Mrsic-Flogel, 2013; Jiang et al., 2015) (Fig. 1a). 
Another key feature of cortical circuits is\nthe detailed and tight balance of excitation and inhibition, which has received growing support\n\n* These authors contributed equally to this work.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fboth at the experimental (Froemke et al., 2007; Xue et al., 2014; Froemke, 2015) and theoretical\nlevel (van Vreeswijk and Sompolinsky, 1996; Brunel, 2000; Vogels and Abbott, 2009; Hennequin\net al., 2014, 2017). However, the computational processes that are facilitated by these architectures\nand dynamics are still elusive. There remains a fundamental disconnect between the underlying\nbiophysical networks and the emergence of intelligent and complex behaviours.\n\nArti\ufb01cial recurrent neural networks (RNNs), on the other hand, are crafted to perform speci\ufb01c\ncomputations. In fact, RNNs have recently proven very successful at solving complex tasks such\nas language modelling, speech recognition, and other perceptual tasks (Graves, 2013; Graves et al.,\n2013; Sutskever et al., 2014; van den Oord et al., 2016; Assael et al., 2016). In these tasks, the input\ndata contains information across multiple timescales that needs to be \ufb01ltered and processed according\nto its relevance. The ongoing presentation of stimuli makes it dif\ufb01cult to learn to separate meaningful\nstimuli from background noise (Hochreiter et al., 2001; Pascanu et al., 2012). RNNs, and in particular\ngated-RNNs, can solve this problem by maintaining a representation of relevant input sequences until\nneeded, without interference from new stimuli. In principle, such protected memories conserve past\ninputs and thus allow back-propagation of errors further backwards in time (Pascanu et al., 2012).\nBecause of their memory properties, one of the \ufb01rst and most successful types of gated-RNNs was\nnamed \u201clong short-term memory networks\u201d (LSTMs, Hochreiter and Schmidhuber (1997), Fig. 
1c).\n\nHere we note that the architectural features of LSTMs overlap closely with known cortical structures,\nbut with a few important differences with regard to the mechanistic implementation of gates in a\ncortical network and LSTMs (Fig. 1b). In LSTMs, the gates control the memory cell as a multiplicative\nfactor, but in biological networks, the gates, i.e. inhibitory neurons, act (to a \ufb01rst approximation)\nsubtractively \u2014 excitatory and inhibitory (EI) currents cancel each other linearly at the level of\nthe postsynaptic membrane potential (Kandel et al., 2000; Gerstner et al., 2014). Moreover, such a\nsubtractive inhibitory mechanism must be well balanced (i.e. closely match the excitatory input) to act\nas a gate to the inputs in the \u2019closed\u2019 state, without perturbing activity \ufb02ow with too much inhibition.\nPrevious models have explored gating in subtractive excitatory and inhibitory balanced networks\n(Vogels and Abbott, 2009; Kremkow et al., 2010), but without a clear computational role. On the other\nhand, predictive coding RNNs with EI features have been studied (Bastos et al., 2012; Deneve and\nMachens, 2016), but without a clear match to state-of-the-art machine learning networks. Regarding\nprevious neuroscienti\ufb01c interpretations of LSTMs, there have been suggestions of LSTMs as models\nof working memory and different brain areas (e.g. prefrontal cortex, basal ganglia and hippocampus)\n(O\u2019Reilly and Frank, 2006; Krueger and Dayan, 2009; Cox and Dean, 2014; Marblestone et al., 2016;\nHassabis et al., 2017; Bhalla, 2017), but without a clear interpretation of the individual components\nof LSTMs and a speci\ufb01c mapping to known circuits.\n\nWe propose to map the architecture and function of LSTMs directly onto cortical circuits, with\ngating provided by lateral subtractive inhibition. 
Our networks have the potential to exhibit the\nexcitation-inhibition balance observed in experiments (Douglas et al., 1989; Bastos et al., 2012;\nHarris and Mrsic-Flogel, 2013) and yield simpler gradient propagation than multiplicative gating.\n\nWe study these dynamics through our empirical evaluation showing that subLSTMs achieve similar\nperformance to LSTMs in the Penn Treebank and Wikitext-2 language modelling tasks, as well\nas pixelwise sequential MNIST classi\ufb01cation. By transferring the functionality of LSTMs into a\nbiologically more plausible network, our work provides testable hypotheses for the most recently\nemerging, technologically advanced experiments on the functionality of entire cortical microcircuits.\n\n2 Biological motivation\n\nThe architecture of LSTM units, with their general feedforward structure aided by additional recurrent\nmemory and controlled by lateral gates, is remarkably similar to the columnar architecture of cortical\ncircuits (Fig. 1; see also Fig. S1 for a more detailed neocortical schematic). The central element in\nLSTMs and similar RNNs is the memory cell, which we hypothesise to be implemented by local\nrecurrent networks of pyramidal cells in layer-5. This is in line with previous studies showing a\nrelatively high level of recurrence and non-random connectivity between pyramidal cells in layer-\n5 (Douglas et al., 1995; Thomson et al., 2002; Song et al., 2005). Furthermore, layer-5 pyramidal\nnetworks display rich activity on (relatively) long time scales in vivo (Barth\u00f3 et al., 2009; Sakata\nand Harris, 2009; Luczak et al., 2015; van Kerkoerle et al., 2017) and in slices (Egorov et al., 2002;\nWang et al., 2006), consistent with LSTM-like function. There is strong evidence for persistent\n\n2\n\n\fneuronal activity both in higher cortical areas (Goldman-Rakic, 1995) and in sensory areas (Huang\net al., 2016; van Kerkoerle et al., 2017; Kornblith et al., 2017). Relatively speaking, sensory areas\n(e.g. 
visual cortex) exhibit shorter timescales than higher brain areas (e.g. prefrontal cortex), which we\nwould expect given the different temporal requirements these brain areas have. A similar behaviour is\nexpected in multi-area (or layer) LSTMs. Note that such longer timescales can also be present in\nmore superficial layers (e.g. layer 2/3) (Goldman-Rakic, 1995; van Kerkoerle et al., 2017), suggesting\nthe possibility of more than one memory cell per cortical microcircuit. Slow memory decay in these\nnetworks may be controlled through short- (York and van Rossum, 2009; Costa et al., 2013, 2017a)\nand long-term synaptic plasticity (Abbott and Nelson, 2000; Senn et al., 2001; Pfister and Gerstner,\n2006; Zenke et al., 2015; Costa et al., 2015, 2017a,b) at recurrent excitatory synapses.\n\nThe gates that protect a given memory in LSTMs can be mapped onto lateral inhibitory inputs in\ncortical circuits. We propose that, similar to LSTMs, the input gate is implemented by inhibitory\nneurons in layer-2/3 (or layer-4; Fig. 1a). Such lateral inhibition is consistent with the canonical view\nof microcircuits (Douglas et al., 1989; Bastos et al., 2012; Harris and Mrsic-Flogel, 2013) and sparse\nsensory-evoked responses in layer-2/3 (Sakata and Harris, 2009; Harris and Mrsic-Flogel, 2013).\nIn the brain, this inhibition is believed to originate from (parvalbumin) basket cells, providing a\nnear-exact balanced inhibitory counter signal to a given excitatory feedforward input (Froemke et al.,\n2007; Xue et al., 2014; Froemke, 2015). Excitatory and inhibitory inputs thus cancel each other and\narriving signals are ignored by default. Consequently, any activity within the downstream memory\nnetwork remains largely unperturbed, unless it is altered through targeted modulation of the inhibitory\nactivity (Harris and Mrsic-Flogel, 2013; Vogels and Abbott, 2009; Letzkus et al., 2015). 
Similarly,\nthe memory cell itself can only affect the output of the LSTM when its activity is unaccompanied by\ncongruent inhibition (mapped onto layer-5, layer-6 or layer 2/3 in the same microcircuit, which are\nknown to project to higher brain areas (Harris and Mrsic-Flogel, 2013); see Fig. S1), i.e. when lateral\ninhibition is turned down and the gate is open.\n\n2.1 Why subtractive neural integration?\n\nWhen a presynaptic cell fires, neurotransmitter is released by its synaptic terminals. The neurotransmitter\nis subsequently bound by postsynaptic receptors where it prompts a structural change of an ion\nchannel to allow the flow of electrically charged ions into or out of the postsynaptic cell. Depending\non the receptor type, the ion flux will either increase (depolarise) or decrease (hyperpolarise) the\npostsynaptic membrane potential. If sufficiently depolarising \u201cexcitatory\u201d input is provided, the\npostsynaptic potential will reach a threshold and the cell will fire a stereotyped action potential (\u201cspike\u201d, Kandel\net al. (2000)). This behaviour can be formalised as an RC circuit (R = resistance, C = capacitance),\nwhich follows Ohm\u2019s law $u = RI$ and yields the standard leaky-integrate-and-fire neuron model\n(Gerstner and Kistler, 2002), $\\tau_m \\dot{u} = -u + R I_{exc} - R I_{inh}$, where $\\tau_m = RC$ is the membrane time\nconstant, and $I_{exc}$ and $I_{inh}$ are the excitatory and inhibitory (hyperpolarising) synaptic input currents,\nrespectively.\n\nAction potentials are initiated in this standard model (Brette and Gerstner, 2005; Gerstner et al.,\n2014) when the membrane potential hits a hard threshold $\\theta$. They are modelled as a momentary\npulse and a subsequent reset to a resting potential. 
Neuronal excitation and inhibition have opposite\neffects, such that inhibitory inputs act linearly and subtractively on the membrane potential.\n\nThe leaky-integrate-and-fire model can be approximated at the level of firing rates as\n\n$\\mathrm{rate} \\sim \\left( \\tau_m \\ln \\frac{R(I_{exc} - I_{inh})}{R(I_{exc} - I_{inh}) - \\theta} \\right)^{-1}$\n\n(see Fig. 1a for the input-output response; Gerstner and Kistler (2002)),\nwhich we used to demonstrate the impact of subtractive gating (Fig. 1b), and contrast it with\nmultiplicative gating (Fig. 1c).\n\nThis firing-rate approximation forms the basis for our gated-RNN model, which has a similar subtractive\nbehaviour and input-output function (cf. Fig. 1b; bottom). Moreover, the rate formulation also\nallows a cleaner comparison to LSTM units and the use of existing machine learning optimisation\nmethods.\n\nIt could be argued that a different form of inhibition (shunting inhibition), which counteracts excitatory\ninputs by decreasing the overall membrane resistance, has a characteristic multiplicative gating effect\non the membrane potential. However, when analysed at the level of the output firing rate its effect\nbecomes subtractive (Holt and Koch, 1997; Prescott and De Koninck, 2003). This is consistent with\n\nFigure 1: Biological and artificial gated recurrent neural networks. (a) Example unit of a simplified\ncortical recurrent neural network. Sensory (or downstream) input arrives at pyramidal cells in\nlayer-2/3 (L2/3, or layer-4), which is then fed onto memory cells (recurrently connected pyramidal\ncells in layer-5). The memory decays with a decay time constant f. Input onto layer-5 is balanced\nout by inhibitory basket cells (BC). The balance is represented by the diagonal \u2018equal\u2019 connection.\nThe output of the memory cell is gated by basket cells at layer-6, 2/3 or 4 within the same area\n(or at an upstream brain area). 
(b) Implementation of (a), following a similar notation to LSTM\nunits, but with $i_t$ and $o_t$ as the input and output subtractive gates. Dashed connections represent the\npotential to have a balance between excitatory and inhibitory input (weights are set to 1). (c) LSTM\nrecurrent neural network cell (see main text for details). The plots below illustrate the different\ngating modes: (a) using a simple current-based noisy leaky-integrate-and-fire neuron (capped to\n200 Hz) with subtractive inhibition; (b) sigmoidal activation functions with subtractive gating; (c)\nsigmoidal activation functions with multiplicative gating. Output rate represents the number of spikes\nper second (Hz) as in biological circuits.\n\n[Figure 1 image not reproduced. Panel titles: cortical circuit (a), subLSTM (b), LSTM (c). Lower plots: output rate (Hz) against input; legend entries: baseline, weak inh., strong inh., inh. = exc. (closed gate), weak gate, strong gate; subtractive gating versus multiplicative gating.]\n\nour approach in that our model is framed at the firing-rate level (rather than at the level of membrane\npotentials).\n\n3 Subtractive-gated long short-term memory\n\nIn an LSTM unit (Hochreiter and Schmidhuber, 1997; Greff et al., 2015) the access to the memory\ncell $c_t$ is controlled by an input gate $i_t$ (see Fig. 1c). At the same time a forget gate $f_t$ controls\nthe decay of this memory (see footnote 1), and the output gate $o_t$ controls whether the content of the memory\ncell $c_t$ is transmitted to the rest of the network. An LSTM network consists of many LSTM units,\neach containing its own memory cell $c_t$, input $i_t$, forget $f_t$ and output $o_t$ gates. The LSTM state\nis described as $h_t = f(x_t, h_{t-1}, i_t, f_t, o_t)$ and the unit follows the dynamics given in the middle\ncolumn below.\n\nFootnote 1: Note that this leak is controlled by the input and recurrent units, which may be biologically unrealistic.\n\n$$\\begin{array}{lll}\n & \\text{LSTM} & \\text{subLSTM} \\\\\n{[f_t, o_t, i_t]}^{T} = & \\sigma(W x_t + R h_{t-1} + b), & \\sigma(W x_t + R h_{t-1} + b), \\\\\nz_t = & \\tanh(W x_t + R h_{t-1} + b), & \\sigma(W x_t + R h_{t-1} + b), \\\\\nc_t = & c_{t-1} \\odot f_t + z_t \\odot i_t, & c_{t-1} \\odot f_t + z_t - i_t, \\\\\nh_t = & \\tanh(c_t) \\odot o_t. & \\sigma(c_t) - o_t.\n\\end{array}$$\n\n(See footnote 2 for the subLSTM gates.)\n\nHere, $c_t$ is the memory cell (note the multiplicative control of the input gate), $\\odot$ denotes element-wise\nmultiplication and $z_t$ is the new weighted input, with $x_t$ and $h_{t-1}$ being the input vector and\nthe recurrent input from other LSTM units, respectively. The overall output of the LSTM unit is then\ncomputed as $h_t$. LSTM networks can have multiple layers with millions of parameters (weights and\nbiases), which are typically trained using stochastic gradient descent in a supervised setting. Above,\nthe parameters are $W$, $R$ and $b$. The multiple gates allow the network to adapt the flow of information\ndepending on the task at hand. In particular, they enable writing to the memory cell (controlled by\ninput gate, $i_t$), adjusting the timescale of the memory (controlled by forget gate, $f_t$) and exposing the\nmemory to the network (controlled by output gate, $o_t$). The combined effect of these gates makes it\npossible for LSTM units to capture temporal (contextual) dependencies across multiple timescales.\n\nHere, we introduce and study a new RNN unit, subLSTM. SubLSTM units are a mapping of LSTMs\nonto known canonical excitatory-inhibitory cortical microcircuits (Douglas et al., 1995; Song et al.,\n2005; Harris and Mrsic-Flogel, 2013). Similarly, subLSTMs are defined as $h_t = f(x_t, h_{t-1}, i_t, f_t, o_t)$\n(Fig. 1b); here, however, the gating is subtractive rather than multiplicative. A subLSTM is defined by a\nmemory cell $c_t$, the transformed input $z_t$ and the input gate $i_t$. 
In our model we use a simplified notion\nof balance in the gating, $(\\theta z_t^j - \\theta i_t^j)$ for the $j$th unit, where $\\theta = 1$ (see footnote 3). For the memory forgetting we\nconsider two options: (i) controlled by gates (as in an LSTM unit) as $f_t = \\sigma(W x_t + R h_{t-1} + b)$ or\n(ii) a more biologically plausible learned simple decay in [0, 1], referred to in the results as fix-subLSTM.\nSimilarly to its input, subLSTM\u2019s output $h_t$ is also gated through a subtractive output gate $o_t$ (see\nequations above). We evaluated different activation functions and sigmoidal transformations had the\nhighest performance.\n\nThe key difference to other gated-RNNs is in the subtractive inhibitory gating ($i_t$ and $o_t$) that has\nthe potential to be balanced with the excitatory input ($z_t$ and $c_t$, respectively; Fig. 1b). A more\ndetailed comparison of the different gating modes is given below.\n\n3.1 Subtractive versus multiplicative gating in RNNs\n\nThe key difference between subLSTMs and LSTMs lies in the implementation of the gating mechanism.\nLSTMs typically use a multiplicative factor to control the amplitude of the input signal.\nSubLSTMs use a more biologically plausible interaction of excitation and inhibition. An important\nconsequence of subtractive gating is the potential for an improved gradient flow backwards towards\nthe input layers. To illustrate this we can compare the gradients for the subLSTMs and LSTMs in a\nsimple example.\n\nFirst, we review the derivatives of the loss with respect to the various components of the subLSTM,\nusing notation based on Greff et al. (2015). In this notation, $\\delta a$ represents the derivative of the loss\n\nFootnote 2: Note that we consider two versions of subLSTMs: one with a forget gate as in LSTMs (subLSTM) and\nanother with a simple memory decay (i.e. 
a scalar in [0, 1] that defines the memory time constant, fix-subLSTM).\nFootnote 3: These weights could also be optimised, but for this model we decided to keep the number of parameters to a\nminimum for simplicity and ease of comparison with LSTMs.\n\nwith respect to $a$, and $\\Delta_t \\overset{\\mathrm{def}}{=} \\frac{d\\,\\mathrm{loss}}{d h_t}$, the error from the layer above. Then by the chain rule we have:\n\n$$\\begin{aligned}\n\\delta h_t &= \\Delta_t \\\\\n\\delta \\bar{o}_t &= -\\delta h_t \\odot \\sigma'(\\bar{o}_t) \\\\\n\\delta c_t &= \\delta h_t \\odot \\sigma'(c_t) + \\delta c_{t+1} \\odot f_{t+1} \\\\\n\\delta \\bar{f}_t &= \\delta c_t \\odot c_{t-1} \\odot \\sigma'(\\bar{f}_t) \\\\\n\\delta \\bar{i}_t &= -\\delta c_t \\odot \\sigma'(\\bar{i}_t) \\\\\n\\delta \\bar{z}_t &= \\delta c_t \\odot \\sigma'(\\bar{z}_t)\n\\end{aligned}$$\n\nFor comparison, the corresponding derivatives for an LSTM unit are given by:\n\n$$\\begin{aligned}\n\\delta h_t &= \\Delta_t \\\\\n\\delta \\bar{o}_t &= \\delta h_t \\odot \\tanh(c_t) \\odot \\sigma'(\\bar{o}_t) \\\\\n\\delta c_t &= \\delta h_t \\odot o_t \\odot \\tanh'(c_t) + \\delta c_{t+1} \\odot f_{t+1} \\\\\n\\delta \\bar{f}_t &= \\delta c_t \\odot c_{t-1} \\odot \\sigma'(\\bar{f}_t) \\\\\n\\delta \\bar{i}_t &= \\delta c_t \\odot z_t \\odot \\sigma'(\\bar{i}_t) \\\\\n\\delta \\bar{z}_t &= \\delta c_t \\odot i_t \\odot \\tanh'(\\bar{z}_t)\n\\end{aligned}$$\n\nwhere $\\sigma(\\cdot)$ is the sigmoid activation function and the overlined variables $\\bar{z}_t$, $\\bar{f}_t$, etc. are the\npre-activation values of a gate or input transformation (e.g. $\\bar{o}_t = W_o x_t + R_o h_{t-1} + b_o$ for the output\ngate of a subLSTM). Note that, compared to those of an LSTM, the subLSTM gradients are simpler,\nwith fewer multiplicative factors.\n\nNow, the LSTM\u2019s weights $W_z$ of the input transformation $z$ are updated according to\n\n$$\\delta W_z = \\sum_{t=0}^{T} \\sum_{t'=t}^{T} \\Delta_{t'} \\frac{\\partial h_{t'}}{\\partial c_{t'}} \\cdots \\frac{\\partial c_t}{\\partial z_t} \\frac{\\partial z_t}{\\partial W_z}, \\qquad (1)$$\n\nwhere $T$ is the total number of temporal steps and the ellipsis abbreviates the recurrent gradient\npaths through time, containing the path backwards through time via $h_s$ and $c_s$ for $t \\le s \\le t'$. 
For\nsimplicity of analysis, we ignore these recurrent connections as they are the same in LSTM and\nsubLSTM, and only consider the depth-wise path through the network; we call this the $t$th timestep\ndepth-only contribution to the derivative, $(\\delta W_z)_t$. For an LSTM, by this slight abuse of notation, we\nhave\n\n$$(\\delta W_z)_t = \\Delta_t \\frac{\\partial h_t}{\\partial c_t} \\frac{\\partial c_t}{\\partial z_t} \\frac{\\partial z_t}{\\partial \\bar{z}_t} \\frac{\\partial \\bar{z}_t}{\\partial W_z} = \\Big( \\Delta_t \\odot \\underbrace{o_t}_{\\text{output gate}} \\odot \\tanh'(c_t) \\odot \\underbrace{i_t}_{\\text{input gate}} \\odot \\tanh'(\\bar{z}_t) \\Big) x_t^{\\top}, \\qquad (2)$$\n\nwhere $\\tanh'(\\cdot)$ is the derivative of tanh. Notice that when either the input or the output gate is set to\nzero (closed), the corresponding contribution to the gradient is zero. For a network with subtractive\ngating, the depth-only derivative contribution becomes\n\n$$(\\delta W_z)_t = \\Big( \\Delta_t \\odot \\sigma'(c_t) \\odot \\sigma'(\\bar{z}_t) \\Big) x_t^{\\top}, \\qquad (3)$$\n\nwhere $\\sigma'(\\cdot)$ is the sigmoid derivative. In this case, the input and output gates, $i_t$ and $o_t$, are not\npresent. As a result, the subtractive gates in subLSTMs do not (directly) impair error propagation.\n\n4 Results\n\nThe aims of our work were two-fold. First, inspired by cortical circuits we aimed to propose a\nbiologically plausible implementation of an LSTM unit, which would allow us to better understand\ncortical architectures and their dynamics. To compare the performance of subLSTM units to LSTMs,\nwe first compared the learning dynamics for subtractive and multiplicative networks mathematically.\nIn a second step, we empirically compared subLSTM and fix-subLSTM with LSTM networks in\ntwo tasks: sequential MNIST classification and word-level language modelling on Penn Treebank\n(Marcus et al., 1993) and Wikitext-2 (Merity et al., 2016). 
The network weights are initialised with\nGlorot initialisation (Glorot and Bengio, 2010), and LSTM units have an initial forget gate bias of 1.\nWe selected the number of units for fix-subLSTM such that the number of parameters is held constant\nacross experiments to facilitate fair comparison with LSTMs and subLSTMs.\n\n4.1 Sequential MNIST\n\nIn the \u201csequential\u201d MNIST digit classification task, each digit image from the MNIST dataset is\npresented to the RNN as a sequence of pixels (Le et al. (2015); Fig. 2a). We decompose the MNIST\nimages of 28\u00d728 pixels into sequences of 784 steps. The network was optimised using RMSProp\nwith momentum (Tieleman and Hinton, 2012), a learning rate of $10^{-4}$, one hidden layer and 100\nhidden units. Our results show that subLSTMs achieve similar results to LSTMs (Fig. 2b). Our\nresults are comparable to previous results using the same task (Le et al., 2015) and RNNs.\n\n[Figure 2 image not reproduced. Panel (a): MNIST samples, each 28x28 image unrolled into a sequence. Panel (b): bar chart of testing accuracy (%) for LSTM, subLSTM and fix-subLSTM; the bar labels read 97.96, 97.29 and 97.27 (in the extracted order).]\n\nFigure 2: Comparison of LSTM and subLSTM networks for sequential pixel-by-pixel MNIST, using\n100 hidden units. (a) Samples from MNIST dataset. We converted each matrix of 28\u00d728 pixels into\na temporal sequence of 784 timesteps. (b) Classification accuracy on the test set. fix-subLSTM has a\nfixed but learned forget gate.\n\n4.2 Language modelling\n\nLanguage modelling represents a more challenging task for RNNs, with both short and long-term\ndependencies. RNN language models (RNN LMs) model the probability of text by autoregressively\npredicting a sequence of words. Each timestep is trained to predict the following word; in other\nwords, we model the word sequence as a product of conditional multinoulli distributions. We evaluate\nthe RNN LMs by measuring their perplexity, defined for a sequence of $n$ words as\n\n$$\\mathrm{perplexity} = P(w_1, \\ldots, w_n)^{-1/n}. \\qquad (4)$$\n\nWe first used the Penn Treebank (PTB) dataset to train our model on word-level language modelling\n(929k training, 73k validation and 82k test words; with a vocabulary of 10k words).\n\nAll RNNs tested have 2 hidden layers; backpropagation is truncated to 35 steps, and the batch size is 20.\nTo optimise the networks we used RMSProp with momentum. We also performed a hyperparameter\nsearch on the validation set over input, output, and update dropout rates, the learning rate, and\nweight decay. The hyperparameter search was done with Google Vizier, which performs black-box\noptimisation using Gaussian process bandits and transfer learning (Golovin et al., 2017). Tables 2 and 3 show the resulting\nhyperparameters. Table 1 reports perplexity on the test set. To understand how\nsubLSTMs scale with network size we varied the number of hidden units between 10, 100, 200 and\n650.\n\nWe also tested the Wikitext-2 language modelling dataset based on Wikipedia articles. This dataset\nis twice as large as the PTB dataset (2000k training, 217k validation and 245k test words) and also\nfeatures a larger vocabulary (33k words). Therefore, it is well suited to evaluate model performance\non longer term dependencies and reduces the likelihood of overfitting.\n\nOn both datasets, our results show that subLSTMs achieve perplexity similar to LSTMs (Table 1a\nand 1b). 
Interestingly, the more biologically plausible version of subLSTM (with a simple decay as\nforget gates) achieves performance similar to or better than subLSTMs.\n\n(a) Penn Treebank (PTB) test perplexity\n\nsize  subLSTM  fix-subLSTM  LSTM\n10    222.80   213.86       215.93\n100   91.46    91.84        88.39\n200   79.59    81.97        74.60\n650   76.17    70.58        64.34\n\n(b) Wikitext-2 test perplexity\n\nsize  subLSTM  fix-subLSTM  LSTM\n10    268.33   259.89       271.44\n100   103.36   105.06       102.77\n200   89.00    94.33        86.15\n650   78.92    79.49        74.27\n\nTable 1: Language modelling (word-level) test set perplexities on (a) Penn Treebank and\n(b) Wikitext-2. The models have two layers and fix-subLSTM uses a fixed but learned forget\ngate $f \\in [0, 1]$ for each unit. The number of units for fix-subLSTM was chosen such that the number\nof parameters was the same as that of (sub)LSTM to facilitate fair comparison. Size indicates the\nnumber of units.\n\nThe number of hidden units for fix-subLSTM was selected such that the number of parameters was\nthe same as for LSTM and subLSTM, facilitating fair comparison.\n\n5 Conclusions & future work\n\nCortical microcircuits exhibit complex and stereotypical network architectures that support rich\ndynamics, but their computational power and dynamics have yet to be properly understood. It is\nknown that excitatory and inhibitory neuron types interact closely to process sensory information\nwith great accuracy, but making sense of these interactions is beyond the scope of most contemporary\nexperimental approaches.\n\nLSTMs, on the other hand, are a well-understood and powerful tool for contextual tasks, and their\nstructure maps intriguingly well onto the stereotyped connectivity of cortical circuits. Here, we\nanalysed whether biologically constrained LSTMs (i.e. 
subLSTMs) could perform similarly well, and indeed,\n\nModel        hidden units  input dropout  output dropout  update dropout  learning rate  weight decay\nLSTM         10            0.026          0.047           0.002           0.01186        0.000020\nsubLSTM      10            0.012          0.045           0.438           0.01666        0.000009\nfix-subLSTM  11            0.009          0.043           0               0.01006        0.000029\n\nLSTM         100           0.099          0.074           0.015           0.00906        0.000532\nsubLSTM      100           0.392          0.051           0.246           0.01186        0.000157\nfix-subLSTM  115           0.194          0.148           0.042           0.00400        0.000218\n\nLSTM         200           0.473          0.345           0.013           0.00496        0.000191\nsubLSTM      200           0.337          0.373           0.439           0.01534        0.000076\nfix-subLSTM  230           0.394          0.472           0.161           0.00382        0.000066\n\nLSTM         650           0.607          0.630           0.083           0.00568        0.000145\nsubLSTM      650           0.562          0.515           0.794           0.00301        0.000227\nfix-subLSTM  750           0.662          0.730           0.530           0.00347        0.000136\n\nTable 2: Penn Treebank hyperparameters.\n\nModel        hidden units  input dropout  output dropout  update dropout  learning rate  weight decay\nLSTM         10            0.015          0.039           0               0.01235        0\nsubLSTM      10            0.002          0.030           0.390           0.00859        0.000013\nfix-subLSTM  11            0.033          0.070           0.013           0.00875        0\n\nLSTM         100           0.198          0.154           0.002           0.01162        0.000123\nsubLSTM      100           0.172          0.150           0.009           0.00635        0.000177\nfix-subLSTM  115           0.130          0.187           0               0.00541        0.000172\n\nLSTM         200           0.379          0.351           0               0.00734        0.000076\nsubLSTM      200           0.342          0.269           0.018           0.00722        0.000111\nfix-subLSTM  230           0.256          0.273           0               0.00533        0.000160\n\nLSTM         650           0.572          0.566           0.071           0.00354        0.000112\nsubLSTM      650           0.633          0.567           0.257           0.00300        0.000142\nfix-subLSTM  750           0.656          0.590           0.711           0.00321        0.000122\n\nTable 3: Wikitext-2 hyperparameters.\n\nsuch subtractively gated excitation-inhibition recurrent neural networks show promise compared\nagainst LSTMs (see footnote 4) on benchmarks such as sequence classification and word-level language 
modelling.\n\nWhile it is notable that subLSTMs could not outperform their traditional counterpart (yet), we hope\nthat our work will serve as a platform to discuss and develop ideas of cortical function and to establish\nlinks to relevant experimental work on the role of excitatory and inhibitory neurons in contextual\nlearning (Froemke et al., 2007; Froemke, 2015; Poort et al., 2015; Pakan et al., 2016; Kuchibhotla\net al., 2017). In future work, it will be interesting to study how additional biological detail may affect\nperformance. Next steps should aim to include Dale\u2019s principle (i.e. that a given neuron can only\nmake either excitatory or inhibitory connections, Strata and Harvey (1999)), and naturally focus\non the perplexing diversity of inhibitory cell types (Markram et al., 2004) and behaviour, such as\nshunting inhibition and mixed subtractive and divisive control (Doiron et al., 2001; Mejias et al.,\n2013; El Boustani and Sur, 2014; Seybold et al., 2015).\n\nOverall, given the success of multiplicative gated LSTMs, it will be most insightful to understand if\nsome of the biological tricks of cortical networks may give LSTMs a further performance boost.\n\nAcknowledgements\n\nWe would like to thank Everton Agnes, \u00c7a\u02d8glar G\u00fcl\u00e7ehre, Gabor Melis and Jake Stroud for helpful\ncomments and discussion. R.P.C. and T.P.V. were supported by a Sir Henry Dale Fellowship by the\nWellcome Trust and the Royal Society (WT 100000). Y.M.A. was supported by the EPSRC and the\nResearch Council UK (RCUK). B.S. was supported by the Clarendon Fund.\n\nReferences\n\nAbbott, L. F. and Nelson, S. B. (2000). Synaptic plasticity: taming the beast. Nature Neuroscience, 3:1178.\n\nAssael, Y. M., Shillingford, B., Whiteson, S., and de Freitas, N. (2016). Lipnet: Sentence-level lipreading. arXiv\n\npreprint arXiv:1611.01599.\n\nBarth\u00f3, P., Curto, C., Luczak, A., Marguet, S. L., and Harris, K. D. (2009). 
Population coding of tone stimuli in auditory cortex: dynamic rate vector analysis. European Journal of Neuroscience, 30(9):1767\u20131778.\n\nBastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., and Friston, K. J. (2012). Canonical microcircuits for predictive coding. Neuron, 76(4):695\u2013711.\n\nBhalla, U. S. (2017). Dendrites, deep learning, and sequences in the hippocampus. Hippocampus.\n\n\u2074 Although here we have focused on a comparison with LSTMs, similar points would also apply to other gated-RNNs, such as Gated Recurrent Units (Chung et al., 2014).\n\nBrette, R. and Gerstner, W. (2005). Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. Journal of Neurophysiology, 94(5):3637.\n\nBrunel, N. (2000). Dynamics of Sparsely Connected Networks of Excitatory and Inhibitory Spiking Neurons. Journal of Computational Neuroscience, 8(3):183\u2013208.\n\nChung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv.org.\n\nCosta, R. P., Froemke, R. C., Sjostrom, P. J., and van Rossum, M. C. W. (2015). Unified pre- and postsynaptic long-term plasticity enables reliable and flexible learning. eLife, 4:e09457.\n\nCosta, R. P., Mizusaki, B. E. P., Sjostrom, P. J., and van Rossum, M. C. W. (2017a). Functional consequences of pre- and postsynaptic expression of synaptic plasticity. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 372(1715):20160153.\n\nCosta, R. P., Padamsey, Z., D\u2019amour, J. A., Emptage, N. J., Froemke, R. C., and Vogels, T. P. (2017b). Synaptic Transmission Optimization Predicts Expression Loci of Long-Term Plasticity. Neuron, 96(1):177\u2013189.e7.\n\nCosta, R. P., Sjostrom, P. J., and van Rossum, M. C. W. (2013). Probabilistic inference of short-term synaptic plasticity in neocortical microcircuits. 
Frontiers in Computational Neuroscience, 7:75.\n\nCox, D. D. and Dean, T. (2014). Neural Networks and Neuroscience-Inspired Computer Vision. Current Biology, 24(18):R921\u2013R929.\n\nDeneve, S. and Machens, C. K. (2016). Efficient codes and balanced networks. Nature Neuroscience, 19(3):375\u2013382.\n\nDoiron, B., Longtin, A., Berman, N., and Maler, L. (2001). Subtractive and divisive inhibition: effect of voltage-dependent inhibitory conductances and noise. Neural Computation, 13(1):227\u2013248.\n\nDouglas, R., Koch, C., Mahowald, M., Martin, K., and Suarez, H. (1995). Recurrent excitation in neocortical circuits. Science, 269(5226):981\u2013985.\n\nDouglas, R. J., Martin, K. A. C., and Whitteridge, D. (1989). A Canonical Microcircuit for Neocortex. Neural Computation, 1(4):480\u2013488.\n\nEgorov, A. V., Hamam, B. N., Frans\u00e9n, E., Hasselmo, M. E., and Alonso, A. A. (2002). Graded persistent activity in entorhinal cortex neurons. Nature, 420(6912):173\u2013178.\n\nEl Boustani, S. and Sur, M. (2014). Response-dependent dynamics of cell-specific inhibition in cortical networks in vivo. Nature Communications, 5:5689.\n\nFroemke, R. C. (2015). Plasticity of cortical excitatory-inhibitory balance. Annual Review of Neuroscience, 38(1):195\u2013219.\n\nFroemke, R. C., Merzenich, M. M., and Schreiner, C. E. (2007). A synaptic memory trace for cortical receptive field plasticity. Nature.\n\nGerstner, W. and Kistler, W. M. (2002). Spiking Neuron Models. Single Neurons, Populations, Plasticity. Cambridge University Press.\n\nGerstner, W., Kistler, W. M., Naud, R., and Paninski, L. (2014). Neuronal Dynamics. From Single Neurons to Networks and Models of Cognition. Cambridge University Press.\n\nGlorot, X. and Bengio, Y. (2010). 
Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249\u2013256.\n\nGoldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron, 14(3):477\u2013485.\n\nGolovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., and Sculley, D. (2017). Google Vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1487\u20131495. ACM.\n\nGraves, A. (2013). Generating Sequences With Recurrent Neural Networks. arXiv.org.\n\nGraves, A., Mohamed, A.-r., and Hinton, G. (2013). Speech recognition with deep recurrent neural networks. arXiv preprint arXiv:1303.5778.\n\nGreff, K., Srivastava, R. K., Koutn\u00edk, J., Steunebrink, B. R., and Schmidhuber, J. (2015). LSTM: A Search Space Odyssey. arXiv.org.\n\nHarris, K. D. and Mrsic-Flogel, T. D. (2013). Cortical connectivity and sensory coding. Nature, 503(7474):51\u201358.\n\nHassabis, D., Kumaran, D., Summerfield, C., and Botvinick, M. (2017). Neuroscience-Inspired Artificial Intelligence. Neuron, 95(2):245\u2013258.\n\nHennequin, G., Agnes, E. J., and Vogels, T. P. (2017). Inhibitory Plasticity: Balance, Control, and Codependence. Annual Review of Neuroscience, 40(1):557\u2013579.\n\nHennequin, G., Vogels, T. P., and Gerstner, W. (2014). Optimal Control of Transient Dynamics in Balanced Networks Supports Generation of Complex Movements. Neuron, 82(6):1394\u20131406.\n\nHochreiter, S., Bengio, Y., Frasconi, P., and Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.\n\nHochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735\u20131780.\n\nHolt, G. R. and Koch, C. (1997). Shunting inhibition does not have a divisive effect on firing rates. 
Neural Computation, 9(5):1001\u20131013.\n\nHuang, Y., Matysiak, A., Heil, P., K\u00f6nig, R., Brosch, M., and King, A. J. (2016). Persistent neural activity in auditory cortex is related to auditory working memory in humans and nonhuman primates. eLife, 5:e15441.\n\nJiang, X., Shen, S., Cadwell, C. R., Berens, P., Sinz, F., Ecker, A. S., Patel, S., and Tolias, A. S. (2015). Principles of connectivity among morphologically defined cell types in adult neocortex. Science, 350(6264):aac9462\u2013aac9462.\n\nKandel, E. R., Schwartz, J. H., Jessell, T. M., and Siegelbaum, S. A. (2000). Principles of neural science.\n\nKornblith, S., Quiroga, R. Q., Koch, C., Fried, I., and Mormann, F. (2017). Persistent Single-Neuron Activity during Working Memory in the Human Medial Temporal Lobe. Current biology : CB, 0(0).\n\nKremkow, J., Aertsen, A., and Kumar, A. (2010). Gating of signal propagation in spiking neural networks by balanced and correlated excitation and inhibition. The Journal of neuroscience, 30(47):15760\u201315768.\n\nKrueger, K. A. and Dayan, P. (2009). Flexible shaping: How learning in small steps helps. Cognition, 110(3):380\u2013394.\n\nKuchibhotla, K. V., Gill, J. V., Lindsay, G. W., Papadoyannis, E. S., Field, R. E., Sten, T. A. H., Miller, K. D., and Froemke, R. C. (2017). Parallel processing by cortical inhibition enables context-dependent behavior. Nature Neuroscience, 20(1):62\u201371.\n\nLe, Q. V., Jaitly, N., and Hinton, G. E. (2015). A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. arXiv.org.\n\nLetzkus, J. J., Wolff, S., and L\u00fcthi, A. (2015). Disinhibition, a Circuit Mechanism for Associative Learning and Memory. Neuron.\n\nLuczak, A., McNaughton, B. L., and Harris, K. D. (2015). Packet-based communication in the cortex. Nature Reviews Neuroscience.\n\nMarblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an Integration of Deep Learning and Neuroscience. 
Frontiers in Computational Neuroscience, 10:94.\n\nMarcus, M. P., Marcinkiewicz, M. A., and Santorini, B. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313\u2013330.\n\nMarkram, H., Toledo-Rodriguez, M., Wang, Y., Gupta, A., Silberberg, G., and Wu, C. (2004). Interneurons of the neocortical inhibitory system. Nature Reviews Neuroscience, 5(10):793\u2013807.\n\nMejias, J. F., Kappen, H. J., Longtin, A., and Torres, J. J. (2013). Short-term synaptic plasticity and heterogeneity in neural systems. 1510:185.\n\nMerity, S., Xiong, C., Bradbury, J., and Socher, R. (2016). Pointer Sentinel Mixture Models. arXiv.org.\n\nO\u2019Reilly, R. C. and Frank, M. J. (2006). Making working memory work: a computational model of learning in the prefrontal cortex and basal ganglia. Neural Computation, 18(2):283\u2013328.\n\nPakan, J. M., Lowe, S. C., Dylda, E., Keemink, S. W., Currie, S. P., Coutts, C. A., Rochefort, N. L., and Mrsic-Flogel, T. D. (2016). Behavioral-state modulation of inhibition is context-dependent and cell type specific in mouse visual cortex. eLife, 5:e14985.\n\nPascanu, R., Mikolov, T., and Bengio, Y. (2012). On the difficulty of training Recurrent Neural Networks. arXiv.org.\n\nPfister, J.-P. and Gerstner, W. (2006). Triplets of spikes in a model of spike timing-dependent plasticity. Journal of Neuroscience, 26(38):9673\u20139682.\n\nPoort, J., Khan, A. G., Pachitariu, M., Nemri, A., Orsolic, I., Krupic, J., Bauza, M., Sahani, M., Keller, G. B., Mrsic-Flogel, T. D., and Hofer, S. B. (2015). Learning Enhances Sensory and Multiple Non-sensory Representations in Primary Visual Cortex. Neuron, 86(6):1478\u20131490.\n\nPrescott, S. A. and De Koninck, Y. (2003). Gain control of firing rate by shunting inhibition: roles of synaptic noise and dendritic saturation. Proc. Natl. Acad. Sci. USA, 100(4):2076\u20132081.\n\nSakata, S. and Harris, K. D. (2009). 
Laminar structure of spontaneous and sensory-evoked population activity in auditory cortex. Neuron, 64(3):404\u2013418.\n\nSenn, W., Markram, H., and Tsodyks, M. (2001). An algorithm for modifying neurotransmitter release probability based on pre- and postsynaptic spike timing. Neural Computation, 13(1):35\u201367.\n\nSeybold, B. A., Phillips, E. A. K., Schreiner, C. E., and Hasenstaub, A. R. (2015). Inhibitory Actions Unified by Network Integration. Neuron, 87(6):1181\u20131192.\n\nSong, S., Sj\u00f6str\u00f6m, P. J., Reigl, M., Nelson, S., and Chklovskii, D. B. (2005). Highly Nonrandom Features of Synaptic Connectivity in Local Cortical Circuits. PLoS Biology, 3(3):e68.\n\nStrata, P. and Harvey, R. (1999). Dale\u2019s principle. Brain research bulletin, 50(5):349\u2013350.\n\nSutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks. arXiv.org.\n\nThomson, A. M., West, D. C., Wang, Y., and Bannister, A. P. (2002). Synaptic connections and small circuits involving excitatory and inhibitory neurons in layers 2-5 of adult rat and cat neocortex: triple intracellular recordings and biocytin labelling in vitro. Cerebral cortex (New York, N.Y. : 1991), 12(9):936\u2013953.\n\nTieleman, T. and Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26\u201331.\n\nvan den Oord, A., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., and Kavukcuoglu, K. (2016). Conditional Image Generation with PixelCNN Decoders. arXiv.org.\n\nvan Kerkoerle, T., Self, M. W., and Roelfsema, P. R. (2017). Layer-specificity in the effects of attention and working memory on activity in primary visual cortex. Nature Communications, 8:13804.\n\nvan Vreeswijk, C. and Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293):1724\u20131726.\n\nVogels, T. 
P. and Abbott, L. F. (2009). Gating multiple signals through detailed balance of excitation and inhibition in spiking networks. Nature Neuroscience, 12(4):483.\n\nWang, Y., Markram, H., Goodman, P. H., Berger, T. K., Ma, J., and Goldman-Rakic, P. S. (2006). Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nature Publishing Group, 9(4):534\u2013542.\n\nXue, M., Atallah, B. V., and Scanziani, M. (2014). Equalizing excitation-inhibition ratios across visual cortical neurons. Nature, 511(7511):596\u2013600.\n\nYork, L. C. and van Rossum, M. C. W. (2009). Recurrent networks with short term synaptic depression. Journal of Computational Neuroscience, 27(3):607\u2013620.\n\nZenke, F., Agnes, E. J., and Gerstner, W. (2015). Diverse synaptic plasticity mechanisms orchestrated to form and retrieve memories in spiking neural networks. Nature Communications, 6:6922.", "award": [], "sourceid": 216, "authors": [{"given_name": "Rui", "family_name": "Costa", "institution": "University of Oxford"}, {"given_name": "Ioannis Alexandros", "family_name": "Assael", "institution": "DeepMind"}, {"given_name": "Brendan", "family_name": "Shillingford", "institution": "University of Oxford"}, {"given_name": "Nando", "family_name": "de Freitas", "institution": "University of Oxford"}, {"given_name": "TIm", "family_name": "Vogels", "institution": "University of Oxford"}]}