Advances in Neural Information Processing Systems (NeurIPS 2018), pp. 9389–9400.

The emergence of multiple retinal cell types through efficient coding of natural movies

Samuel A.
Ocko*†, Jack Lindsey*, Surya Ganguli1, Stephane Deny†

Department of Applied Physics, Stanford and 1Google Brain, Mountain View, CA

Abstract

One of the most striking aspects of early visual processing in the retina is the immediate parcellation of visual information into multiple parallel pathways, formed by different retinal ganglion cell types each tiling the entire visual field. Existing theories of efficient coding have been unable to account for the functional advantages of such cell-type diversity in encoding natural scenes. Here we go beyond previous theories to analyze how a simple linear retinal encoding model with different convolutional cell types efficiently encodes naturalistic spatiotemporal movies given a fixed firing rate budget. We find that optimizing the receptive fields and cell densities of two cell types makes them match the properties of the two main cell types in the primate retina, midget and parasol cells, in terms of spatial and temporal sensitivity, cell spacing, and their relative ratio. Moreover, our theory gives a precise account of how the ratio of midget to parasol cells decreases with retinal eccentricity. Also, we train a nonlinear encoding model with a rectifying nonlinearity to efficiently encode naturalistic movies, and again find emergent receptive fields resembling those of midget and parasol cells that are now further subdivided into ON and OFF types. Thus our work provides a theoretical justification, based on the efficient coding of natural movies, for the existence of the four most dominant cell types in the primate retina that together comprise 70% of all ganglion cells.

1 Introduction

The time-honored principle that the visual system evolved to efficiently encode the structure of our visual world opens up the tantalizing possibility that we can predict, ab initio, the functional organization of visual circuitry simply in terms of the statistical structure of natural scenes. Indeed, efficient coding theory has achieved several successes in the retina by simply considering coding of static spatial scenes [1, 2, 3] or mostly temporal sequences [4]. However, such theories have not yet accounted for one of the most salient aspects of retinal computation, namely the existence of a diversity of retinal ganglion cell types, each forming a mosaic that uniformly tiles the visual field [5].
A few theoretical studies have suggested reasons for different cell types. One suggestion is the feature detector hypothesis [6, 7, 8], or the need to detect highly specialized, behaviorally relevant environmental cues. However, many cell types respond broadly to general classes of stimuli whose direct behavioral relevance remains unclear [9, 10, 11]. Another line of argument involves metabolic efficiency. In particular the division of ganglion cells into rectifying ON and OFF populations is more metabolically efficient than linear encoding with a single population [12], and the asymmetry between ON and OFF cells can be related to the asymmetric distribution of light intensity in natural spatial scenes [13]. Another efficient coding argument explains why two populations with similar receptive fields (RFs) have different activation thresholds in the salamander retina [14].

*Equal contribution.
All code available at https://github.com/ganguli-lab/RetinalCellTypes.
†Corresponding authors: samocko@gmail.com and stephane.deny.pro@gmail.com.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Here we go beyond previous efficient coding theories of the retina by optimizing convolutional retinal models with multiple cell types of differing spatial densities to efficiently encode the spatiotemporal structure of natural movies, rather than simply the spatial structure in natural scenes. Indeed, psychophysical studies of human sensitivity [15, 16] suggest our visual system is optimized to process the spatiotemporal information content of natural movies. Our theory enables us to account for several detailed aspects of retinal function. In particular, the primate retina is dominated by four types of ganglion cells, ON midget and parasol cells and their OFF counterparts. Together, these types constitute 68% of all ganglion cells [17], and more than 95% in the central retina [18]. Midget cells are characterized by (1) a high density of cells (52% of the whole population), (2) a small spatial RF, (3) slow temporal filtering, and (4) low sensitivity – as measured by the slope of their contrast-response function. In contrast, parasol cells are characterized by (1) a low density of cells (16% of the whole population), (2) a large RF, (3) fast temporal filtering and (4) high sensitivity [19, 20, 21, 22]. Moreover, the density ratio of midget to parasol cells systematically decreases across retinal eccentricity from the fovea to the periphery [23].
Remarkably, our theory reveals how all these detailed retinal properties arise as a natural consequence of the statistical structure of natural movies and realistic energy constraints. In particular, our theory simultaneously accounts for: (1) why it is beneficial to have these multiple cell types in the first place, (2) why the four properties of cell density, spatial RF size, temporal filtering speed, and contrast sensitivity co-vary the way they do across midget and parasol types, and (3) quantitatively explains the variation in midget to parasol density ratios over retinal eccentricities. Moreover, our theory, combined with simulations of efficient nonlinear encoding models, also accounts for the existence of both ON and OFF midget and parasol cells. Thus simply by extending efficient coding theory to multiple cell types and natural movies with a realistic energy constraint, we account for cell-type diversity that captures 70% of all ganglion cells.

2 A theoretical framework for optimal retinal function

Retinal model. We define a ganglion cell type as a convolutional array of neurons sampling linearly from a regularly spaced array of Np photoreceptors, indexed by i = 0, ..., Np − 1 (Fig. 1A). We model photoreceptors as linearly encoding local image contrast. We also assume the NC ganglion cells of cell type C, indexed by j = 0, ..., NC − 1, each have a common spatiotemporal RF, defined as FC(i, t), whose center is shifted to a position j · sC on the photoreceptor array. Here sC is the convolutional stride of type C, which is an integer denoting the number of photoreceptors separating adjacent RF centers of ganglion cells of type C, so that Np = sC NC. This yields a retinal model

YC,j(t) = Σ_{i=0}^{Np−1} Σ_{t′=0}^{T−1} FC(i + j·sC, t − t′) · Xi(t′) + ηC,j(t),   (1)

where Xi(t′) is the activation of photoreceptor i at time t′, ηC,j(t) is additive noise that is white across both space (C and j) and time (t) with constant variance σ²η [24], and YC,j(t) is the firing rate of ganglion cell j of type C. We work in one spatial dimension (generalizing to two is straightforward). This linear model will provide conceptual insight into spatiotemporal RFs of different cell types through exact mathematical analysis. However, in Sec. 5 we also consider a nonlinear version of the model to account for rectifying properties of different cell types.
Natural movie statistics. We approximate natural movies by their second order statistics (Fig. 1B), assuming Gaussianity. Natural movies are statistically translation invariant, and so their second order statistics are described by their Fourier power spectrum, which follows an approximately space-time separable power law as function of spatial (k) and temporal (ω) frequency [25, 26, 27]:

S(k, ω) ∝ 1 / (|k|² |ω|²).   (2)

Optimization Framework. We assume the objective of the retina is to faithfully encode natural movies while minimizing overall ganglion cell firing rate. We quantify encoding fidelity by the amount of input variance V explained by the optimal minimum mean squared error (MMSE) reconstruction of photoreceptor activity patterns from ganglion cell outputs by a linear decoder. Moreover, we assume an overall penalty on output firing rate that is proportional to a power p of the rate:

O({FC}) = V − λ Σ_C Σ_{j=0}^{NC−1} ⟨Y²C,j⟩^{p/2}.   (3)
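As a concrete illustration of this setup, the encoding model (Eq. 1) and the power-law movie statistics (Eq. 2) can be simulated directly. The sketch below is a minimal NumPy version under several assumptions not in the text: toy array sizes, a random (not optimized) receptive field, circular boundary conditions, and a regularized spectrum at zero frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
Np, T = 64, 64                    # photoreceptors and time steps (toy sizes)

# Sample a Gaussian "movie" whose power spectrum follows the separable
# power law of Eq. 2, regularized at the origin (an assumption; the
# paper specifies the spectrum only away from zero frequency).
k = np.abs(np.fft.fftfreq(Np))[:, None]
w = np.abs(np.fft.fftfreq(T))[None, :]
amp = 1.0 / ((k + 1e-2) * (w + 1e-2))             # sqrt of S(k, w) ~ 1/(k^2 w^2)
X = np.real(np.fft.ifft2(amp * np.fft.fft2(rng.standard_normal((Np, T)))))

# One convolutional cell type with stride s, per Eq. 1 (random toy RF F).
s, sigma_eta = 4, 0.1
NC = Np // s
F = rng.standard_normal((Np, T))
Xf = np.fft.fft(X, axis=1)                        # FFT over time
Y = np.empty((NC, T))
for j in range(NC):
    Fj = np.roll(F, -j * s, axis=0)               # F(i + j*s, .)
    # circular temporal convolution via FFT, then sum over photoreceptors i
    Y[j] = np.real(np.fft.ifft((np.fft.fft(Fj, axis=1) * Xf).sum(axis=0)))
Y += sigma_eta * rng.standard_normal((NC, T))     # white output noise
```

With stride s = 4 the type has NC = Np/4 = 16 cells, matching the constraint Np = sC NC.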
This objective function (Eq. 3) is maximized over the set of convolutional filters {FC}.

Figure 1: An efficient coding model for multiple convolutional cell types. A) Natural movies are encoded in a set of ganglion cells (circles) through linear filters corrupted by noise (Eq. 1). Left: one convolutional cell type with stride s (green). Right: two different cell types (blue and orange) with different strides sA and sB. B) The Fourier power spectrum S(k, ω) of natural movies decays as a separable power law in both spatial (k) and temporal (ω) frequency (see Eq. 2). Dashed lines are iso-contours of constant power in linear (left) and logarithmic (right) axes with power varying from high (red) to low (blue). Note that, aside from the origin, the most powerful Fourier modes are contained in two distinct regions: (1) low k, high ω, and (2) high k, low ω. C) We will show that two convolutional cell types (orange, blue) encode visual information more efficiently than one type (green) by specializing their filters to cover these two regions. The orange, parasol-like cell type specializes to region (1) using a small number of cells at large stride with large spatial RFs (low k), and fast temporal filters (high ω) that fire sensitively at high rates. The blue, midget-like cell type specializes to region (2) using a large number of cells at small stride with small spatial RFs (high k) and slow temporal filters (low ω) using low firing rates.
Together, these two specialized cell types can encode photoreceptor input patterns with the same fidelity as a single, undifferentiated cell type (green), with less total firing rate.

Here λ is a parameter that trades off between the competing desiderata of maximizing encoding fidelity versus minimizing firing rate. Our final results will focus on the choice p = 1, motivated by the linear relationship between metabolic cost and firing rate [28, 29]. This choice is similar to an ℓ1 penalty used in [30, 31, 2]. However we also consider more general p, including p = 2, both to connect to prior work on efficient coding [1, 3] and as a building block for solving the p = 1 case.
Outline. In Sec. 3, we prove mathematically that multiple cell types enable a more efficient code (i.e. same or better encoding fidelity with lower firing rates) than a single cell type, as long as p < 2. The fundamental idea is that different cell types allow higher efficiency by specializing to different regions of the power spectrum of natural movies (Fig. 1C). In Sec. 4, we find the optimal cell types for natural movies, demonstrating that the best two-type strategy substantially out-performs the best one-type strategy. We then compare these optimal types to midget and parasol cells in the primate retina and find striking agreement between optimal and biological cell types. In Sec. 5, we extend our theory to non-linear ganglion cells and account for both ON and OFF midget and parasol cells.

3 Mathematical proof of the benefit of multiple cell types

Here we derive mathematically how multiple specialized cell types can confer an efficient coding advantage compared to a single cell type (Fig. 1A,C). In Sec. 3.1, we start with the simple case of a single cell type with stride 1, yielding an equal number of ganglion cells and photoreceptors, encoding static images [1, 32].
We then extend this framework to varying strides (Sec. 3.2), encoding natural movies (Sec. 3.3), and multiple cell types, proving that they can confer an advantage (Sec. 3.4). We will solve for the optimal RFs and strides of each cell type in Sec. 4.

3.1 Encoding Np photoreceptors with NC = Np ganglion cells of a single type

As a warmup, for purely spatial scenes, we first consider optimizing the single cell-type retinal filter FC(i, t) in Eq. 1 under the objective function in Eq. 3, in the simple case of stride sC = 1 so that NC = Np. Since we only have one cell type, we drop the cell-type index C in the following. The case of NC = Np simplifies because we can ignore aliasing [33], which we address in the next section. Thus we can show (App. A) that each spatial Fourier mode X̃(n) of photoreceptor patterns maps in one-to-one fashion onto a single spatial Fourier mode Ỹ(m) of ganglion cell patterns (Fig. 2A1):

Yj = Σ_{i=0}^{Np−1} F(i + j) · Xi + ηj   ⟹   Ỹ(m) = δ_{m,n} F̃(n) · X̃(n) + η̃(m),   (4)

where δ is the Kronecker delta function, n ∈ {−Np/2 + 1, ..., 0, 1, ..., Np/2} indexes photoreceptor Fourier modes, m ∈ {−NC/2 + 1, ..., 0, 1, ..., NC/2} indexes ganglion cell Fourier modes, and η̃(m) is the spatial Fourier transform of the noise (which also has variance σ²η). F̃ is the Fourier transform of F across photoreceptors, rescaled by √NC. Each mode number n (m) corresponds to a photoreceptor (ganglion cell) spatial frequency kn ≡ 2πn/Np (pm ≡ 2πm/NC). The power S(n) in photoreceptor mode n is simply proportional to the power S(k, ω) in natural movies (Eq. 2) evaluated at spatial frequency k = k|n|. Finally, because image statistics are translation invariant, the objective (Eq. 3) can be written (App. A) in terms of independent photoreceptor spatial modes (here p = 2):

O = V − λ Σ_{j=0}^{NC−1} ⟨|Yj|²⟩ = Σ_{n=−Np/2+1}^{Np/2} [ |F̃(n)|² S(n)² / (σ²η + |F̃(n)|² S(n)) − λ ( |F̃(n)|² S(n) + σ²η ) ].   (5)

Thus O can be maximized independently for each filter mode n, yielding the optimal filter (App. A):

|F̃Opt(n)|² = Q⁻¹(n) [H − Q⁻¹(n)]₊, where Q(n) = √(S(n)/σ²η), H = 1/√λ,   (6)

where Q(n) is a measure of the quality of photoreceptor Fourier mode, or input channel, n.
This solution has an appealing water-filling [34] interpretation (Fig. 2A2) in which each channel n of quality Q(n) corresponds to a beaker with base height and width both equal to Q(n)⁻¹. These beakers are filled with water up to height H = 1/√λ, and the power |F̃Opt(n)|² assigned to filter mode n is simply the volume of water in beaker n. Thus extremely low quality channels with beaker base Q(n)⁻¹ greater than the water height H are not used. Similarly, high quality channels with a low base are not assigned much filter strength because they are also narrow. Thus the optimal solution assigns filter strength as a non-monotonic function of channel quality (Fig. 2A3), favoring channels of intermediate quality, eschewing extremely low quality channels that do not contribute much to encoding fidelity, while attenuating channels that are already high-quality, whose amplification would yield a cost in firing rate that outweighs the improved coding fidelity. As the penalty λ on firing rate is reduced, the water height H increases, and more lower quality channels are used by the optimal filter.
Because the power spectrum of natural movies decays with spatial frequency [25], higher (lower) quality channels correspond to lower (higher) spatial frequencies. Thus the non-monotonic optimal filter strength as a function of channel quality (Fig. 2A3) leads to two qualitative effects (Fig. 2A4): (1) the attenuation of very low frequency high quality channels relative to intermediate frequency channels (spatial whitening), driven primarily by the need to lower firing rate, and (2) the eschewing of very high frequency low quality channels (spatial smoothing), which do not contribute strongly to encoding fidelity.

3.2 Encoding Np photoreceptors with NC < Np ganglion cells of a single type

In the case of strides greater than 1, with fewer ganglion cells than photoreceptors, more than one spatial Fourier mode of photoreceptor activity can map to the same spatial Fourier mode of ganglion cell activity, a phenomenon known as aliasing. Indeed, not only does photoreceptor mode index m map to ganglion cell mode index m, as in Sec. 3.1, but so does every other photoreceptor mode n separated from m by an integer multiple of NC (Fig. 2B, App. B), yielding the map

Yj = Σ_i F(i + js) · Xi + ηj   ⟹   Ỹ(m) = Σ_{n=m+n′NC} F̃(n) · X̃(n) + η̃(m),   (7)

where n′ ranges over integers such that n = m + n′NC enumerates all photoreceptor frequencies n within the bounds −Np/2 < n ≤ Np/2 which alias to the same ganglion cell frequency m. Despite the many-to-one map from photoreceptor to ganglion cell Fourier modes, one can still optimize Eq. 3 independently over different filter modes akin to Eq. 5 through the following argument. The firing of each ganglion cell frequency m comes from the set of photoreceptor frequencies which alias to it, i.e. n = m + n′NC (Fig. 2B1).
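The water-filling solution of Eq. 6 is simple to implement. Below is a minimal sketch; the spectrum shape, noise variance, and penalty value are arbitrary illustrative choices, not the paper's fitted parameters.

```python
import numpy as np

def waterfill_power(S, sigma_eta2=1.0, lam=1.0 / 400):
    """Optimal filter power per Fourier mode (Eq. 6), water-filling form.
    S: signal power of each mode; lam: firing-rate penalty lambda."""
    Q = np.sqrt(S / sigma_eta2)                # channel quality Q(n)
    H = 1.0 / np.sqrt(lam)                     # water height; rises as penalty shrinks
    return (1.0 / Q) * np.maximum(H - 1.0 / Q, 0.0)

# Power-law spectrum: low modes are high quality, high modes low quality.
n = np.arange(1, 33)
P = waterfill_power(1.0 / n**2)

best = int(np.argmax(P))
assert 0 < best < len(n) - 1   # most power goes to an intermediate-quality channel
assert P[0] < P[best]          # highest-quality channel attenuated (whitening)
assert P[-1] == 0.0            # worst channels unused (smoothing)
```

The assertions check the three qualitative regimes described above: attenuation of the best channels, peak filter power at intermediate quality, and complete rejection of the worst channels.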
Figure 2: A1) In the convolutional framework of [1], every photoreceptor spatial frequency (upper dots) maps to a corresponding ganglion cell spatial frequency (lower dots). A2) The optimal filter strength assigned to each mode n can be viewed as the volume of water assigned to a corresponding beaker whose base height and width are inversely related to channel quality Q(n), and the water height H is inversely related to the firing rate penalty parameter λ. A3) The optimal filter strength is a non-monotonic function of channel quality. A4) This also leads to non-monotonic optimal filter strength as a function of spatial frequency. B1) With fewer ganglion cells than photoreceptors, the optimal filter will sample only from the lowest photoreceptor spatial frequencies that map in one-to-one fashion to the lowest ganglion cell spatial frequencies, and will ignore the higher photoreceptor frequencies that alias to the same ganglion cell frequencies (App. B). B2) Thus the optimal filter achieves a similar solution as in A4, with an additional upper bound on frequency to avoid aliasing. C) In spacetime, the optimal filter maps spatiotemporal photoreceptor frequencies (upper dots) to ganglion cell frequencies (lower dots) in a one-to-one fashion, ignoring higher photoreceptor spatial frequencies to avoid spatial aliasing. Different spatial (temporal) frequencies are indicated by shade (color). D) (See Sec. 3.4) The blue ellipsoid illustrates a correlated stimulus covariance across two photoreceptors (PR). Top: the two arrows denote two filters F⃗0 and F⃗1 of two ganglion cells of a single cell type, related by a convolutional translation, modulo Np = 2, and therefore reflection-symmetric about the diagonal (Sec. 3.4). Bottom: the two arrows denote two rotated filters F⃗A and F⃗B that each specialize to a different eigenbasis vector of the stimulus covariance, thereby differentiating into two cell types, enabling a more efficient neural code with the same fidelity but lower firing rate cost.

First we show that it is optimal for a single ganglion cell mode m to draw only from the input eigen-mode with largest power (App. B.1). Now given the lowest photoreceptor spatial frequencies have the highest power, the optimal convolutional filter should sample only from the lowest photoreceptor frequencies, and not from the higher aliasing frequencies. Thus the optimal filter has F̃(n) = 0 for all n with |n| > NC/2. Therefore, the optimal filter in frequency space is simply a scaled, truncated version of Eq. 6 (Fig. 2B1, B2):

|F̃Opt(n)|² = ΘH(NC/2 − |n|) Q⁻¹(n) [H − Q⁻¹(n)]₊, where Q(n) = √(S(n)/σ²η), H = 1/√λ,   (8)

where ΘH is the Heaviside step function.
Note the optimal upper frequency cutoff to avoid aliasing naturally yields tiling, in which the spatial RF width becomes proportional to the stride [33]. To see this, consider a cell type with a stride in physical space of length s. Spatial frequencies higher than O(1/s) will lead to aliasing, yielding a frequency cut-off ks ∝ 1/s. Further assume, for simplicity, that the water-filling solution fills all spatial frequency modes up to ks with the same amplitude, and chooses the same phase. This yields a box Fourier spectrum whose inverse spatial RF is a sinc function whose first zero crossing occurs at spatial scale 1/ks ∝ s.
Thus the RF width is proportional to stride, and cells with high (low) frequency cut-offs have small (large) RF widths and strides.

3.3 Generalizing the framework to spatiotemporal movies for a single cell type

With the addition of time, we can Fourier transform Eq. 1 in both space and time, yielding

Ỹ(m, ω) = Σ_{n=m+n′NC} F̃(n, ω) · X̃(n, ω) + η̃(m, ω),   (9)

where Xi(t′) is the activity of photoreceptor i at time t′, F̃ is the Fourier transform of F rescaled by √(NC T), and η̃(m, ω) is the Fourier transform of the noise. Note that with fewer ganglion cells than photoreceptors, there will be a many-to-one map from photoreceptor spatial frequency to each ganglion cell spatial frequency as in Eq. 7, but a one-to-one map from photoreceptor to ganglion cell temporal frequencies (Fig. 2C, App. C). As in Sec. 3.2, the optimal filter will map the lowest photoreceptor spatial frequencies one-to-one to the lowest ganglion cell frequencies, while ignoring higher photoreceptor spatial frequencies to avoid aliasing. Moreover, within this aliasing constraint, spatiotemporal photoreceptor frequencies map one-to-one to ganglion cell frequencies, yielding an optimal solution given by Eq. 8 with channel quality depending on spacetime power S(kn, ω).

3.4 Interplay of firing rate penalty and the benefit of multiple cell types

To build intuition for when and why multiple cell types can enable more efficient neural codes, we consider the simplest possible scenario of Np = 2 photoreceptors and two convolutional ganglion cells of a single type. The stimulus statistics and ganglion cell filters are given by (see also Fig. 2D):

CXX = [[1, c], [c, 1]],   F⃗0 = (f0, f1)ᵀ,   F⃗1 = (f1, f0)ᵀ.

Note the two filters are equal up to a translation (modulo Np) and therefore obey the convolutional constraint.
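This two-photoreceptor setup is easy to check numerically. The sketch below uses arbitrary illustrative values (c = 0.6, f0 = 1.0, f1 = 0.3, not from the paper) and compares the firing-rate cost of the convolutional pair against the pair rotated into the eigenbasis of CXX, as in Fig. 2D:

```python
import numpy as np

def rate_cost(filters, C, p):
    # Signal part of the firing-rate cost: sum_j <Y_j^2>^(p/2), with <Y_j^2> = F^T C F.
    return sum((f @ C @ f) ** (p / 2) for f in filters)

c, f0, f1 = 0.6, 1.0, 0.3                    # arbitrary illustrative values
C_XX = np.array([[1.0, c], [c, 1.0]])
F0, F1 = np.array([f0, f1]), np.array([f1, f0])  # one type: translated copies
FA = (F0 + F1) / np.sqrt(2)                  # rotated into the eigenbasis of C_XX,
FB = (F0 - F1) / np.sqrt(2)                  # playing the role of two specialized types

# For p = 1 (metabolic cost linear in rate) the rotated pair is cheaper...
assert rate_cost([FA, FB], C_XX, p=1) < rate_cost([F0, F1], C_XX, p=1)
# ...while for p = 2 the two costs coincide, so specialization brings no benefit.
assert np.isclose(rate_cost([FA, FB], C_XX, p=2), rate_cost([F0, F1], C_XX, p=2))
```

Varying c and p confirms numerically that the saving vanishes exactly at c = 0 or p = 2.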
Let's call D the optimal decoder. Then the reconstruction X⃗r of the input is:

X⃗r = D(FX⃗ + η⃗) = DFX⃗ + Dη⃗.

Here F is a 2 by 2 filter matrix whose rows are given by the two ganglion cell filters. Now the decoding performance is unaffected by an orthogonal rotation of the rows of F. Indeed, when F → RF, we can transform D → DR⁻¹, yielding the reconstruction

X⃗r = DR⁻¹(RFX⃗ + η⃗) = DFX⃗ + DR⁻¹η⃗.

Because R is a rotation (i.e. R⁻¹ = Rᵀ) and η⃗ is isotropic Gaussian white noise, the statistics of X⃗r conditioned on X⃗, and thus the explained variance V, are unchanged by the rotation. More formally, the explained variance can be computed to be V = 2 Tr DFCXX − Tr DFCXX(DF)ᵀ − σ²η Tr DDᵀ [35], and is independent of the transformation effected by R. This yields an entire manifold of optimal filter matrices F with the same explained variance V.
Now consider a particular choice of rotation R that rotates the two convolutional filters into the eigenbasis of CXX:

F⃗A = (F⃗0 + F⃗1)/√2 = ((f0 + f1)/√2) (1, 1)ᵀ,   F⃗B = (F⃗0 − F⃗1)/√2 = ((f0 − f1)/√2) (1, −1)ᵀ.

The rotated filters F⃗A and F⃗B are no longer related by any translation. Thus the convolutional constraint is relaxed and they are analogous to two different cell types. We now compare the signal component of the total firing rate cost for the single cell-type convolutional filters, given by:

⟨Y0²⟩^{p/2} + ⟨Y1²⟩^{p/2} = 2 [f0² + f1² + 2c f0f1]^{p/2},   (10)

with the rotated, specialized, two-cell-type filters, given by

⟨YA²⟩^{p/2} + ⟨YB²⟩^{p/2} = [(1 + c)(f0 + f1)²]^{p/2} + [(1 − c)(f0 − f1)²]^{p/2}.   (11)

As long as c ≠ 0 and p < 2, the rotated (Eq. 11) two-type solution uses a lower firing rate budget than the one-type solution (Eq. 10). We generalize this proof in App. D to arbitrary numbers of cells, convolutional types and natural movie statistics. Thus intriguingly we find a sharp transition in the exponent p relating firing rate to cost, with multiple cell types favored if and only if p < 2.
Some prior work on efficient coding [1, 3] employed an ℓ2 penalty on firing rate (i.e. p = 2), while others [30, 31, 2] have employed an ℓ1 penalty (i.e. p = 1 in our Gaussian scenario). We note that energetic considerations suggest that metabolic cost is linearly related to firing rate [28, 29] (i.e. p = 1). Prior knowledge that multiple retinal cell types do indeed exist, in addition to these energetic considerations, leads us to consider p = 1 in the following, corresponding to a penalty on the root-mean-squared (RMS) firing rates, summed over all cells.

Figure 3: Optimal cell types match properties of midget and parasol cells. A) Optimal RF power spectra for a single cell type, with darker shades denoting higher filter strength. Black lines are iso-contours of the power spectrum of natural movies. B) Optimal RF power spectra for two cell types with same conventions as A. C) Left: RF power spectrum of the two cell types along the temporal frequency axis at fixed spatial frequency (the blue dashed lines in B). Right: Measured sensitivity or contrast gain as a function of temporal frequency of real midget and parasol cells (reprinted from [36]). D) Reconstruction error as a function of the fraction of midget cells for the RMS firing rate budget at which the optimal fraction is 93% (red line), consistent with the fraction found at a certain retinal eccentricity [18]. E) RMS firing rate budget as a function of the fraction of midget cells, for a fixed reconstruction error.
Note the optimal one-type solution (last point on the right) requires a 50% higher firing rate than the optimal two-type solution. F) Fraction of midget cells as a function of the total density of cells. Red points: optimal fractions predicted by theory (only one parameter was fitted to the data, see text). Line: fit to the fraction of midget cells across the human retina estimated from [23].

4 Comparison of theoretically derived cell types to primate retinal cell types

Here we optimize the efficient coding objective in Eq. 3, with a metabolically motivated firing rate penalty corresponding to p = 1, using the procedure described in App. E. In particular, for the same total RMS firing rate budget (obtained by increasing λ in Eq. 3 until we first match this budget) we found both the optimal single (Fig. 3A) and two (Fig. 3B) cell type solutions. For the two-cell type solutions, we additionally scanned the two strides of each cell type. Given that for each cell type C, the number of cells NC and stride sC are related by sC NC = Np, varying the two strides is equivalent to varying the fractions of each cell type. All such two-cell type solutions had the same RMS rate, but varying encoding fidelity (the variance explained term V in Eq. 3). Thus for each firing rate budget, we find an optimal cell-type ratio with highest encoding fidelity. In Fig. 3AB we employed a budget that yielded an optimal cell-type ratio matching that of midget to parasol cells in the primate retina [18]. However, the general structure of the resultant RF power spectra in Fig. 3AB is robust to the choice of total RMS firing rate budget.
Remarkably, this general structure of the theoretically derived two-cell-type solution matches many properties of biologically observed primate retinal cell types.
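The stride scan described above can be made concrete: since sC NC = Np for every type, a pair of strides fixes both cell counts and hence the midget fraction. A toy enumeration (Np and the stride values are arbitrary illustrations, not the values used in the paper's scan):

```python
Np = 240                                   # photoreceptors (arbitrary toy value)

def midget_fraction(s_midget, s_parasol):
    # s_C * N_C = Np for each type C, so the strides fix the cell counts.
    n_midget, n_parasol = Np // s_midget, Np // s_parasol
    return n_midget / (n_midget + n_parasol)

# Increasing the parasol stride thins out parasols and raises the midget fraction.
fractions = [midget_fraction(1, s) for s in (2, 4, 8, 16)]
assert fractions == sorted(fractions)      # monotonically increasing
```

For instance, strides (1, 4) give a midget fraction of 0.8, and (1, 16) gives about 0.94.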
The first type corresponds to parasol cells, covering low spatial frequencies (implying large spatial RFs with large stride and low number density) and high temporal frequencies (implying fast temporal filtering). The second type corresponds to midget cells, covering a large number of high spatial frequencies (implying small spatial RFs with small stride and high number density) and low temporal frequencies (implying slow temporal filtering). Moreover, a single slice of the RF power spectrum along temporal frequency at a fixed spatial frequency (Fig. 3C, left) reveals that our theoretically derived "parasol" cell type has higher sensitivity (i.e. filter strength, or gain between input contrast and output response) compared to the "midget" cell type, consistent with observations from the primate retina (Fig. 3C, right [36]).
Thus as promised in Fig. 1C, the striking covariation of the four distinct features (cell density, spatial RF size, temporal filtering speed, and contrast sensitivity) across the two dominant primate retinal cell types arises as a simple emergent property of the two-tailed structure of the natural movie power spectrum. By specializing to these two tails, the two-cell type solution in Fig. 3B can achieve higher encoding fidelity at the same RMS firing rate budget compared to the single-cell type solution in Fig. 3A. Indeed for the common firing rate budget chosen in Fig. 3AB, the optimal two-cell type solution achieves a 34% reduction in reconstruction error compared to the single type solution (Fig. 3D). Conversely, at a fixed reconstruction error (of 0.5%), two cell types are 33% more efficient than one in terms of total RMS firing rate (Fig. 3E).
More generally, for any non-zero firing rate budget the two-type solution achieves higher encoding fidelity, and for any desired encoding fidelity, the two-type solution requires lower firing rates.

The fixed budget shown in Fig. 3D and the fixed reconstruction error shown in Fig. 3E were chosen such that the optimal fractions of midget and parasol cells were 93% and 7%, respectively, consistent with those found at certain eccentricities of the primate retina [23]. However, in our model the optimal fractions change as the firing rate budget is increased (or equivalently, as the reconstruction error is decreased). The total density of cells in the optimal solution computed by the model also varies with the firing rate budget. Thus, our model makes a specific numerical prediction relating total cell density to the ratio of midget to parasol cells. The total density of cells varies across eccentricity by 3 orders of magnitude in the primate retina. In Fig. 3F, we plot the predicted evolution of the percentage of midget cells with cell density and compare it to the evolution of this percentage estimated from biological data [23] (see App. F for the estimation method). Our model involves only one adjustable parameter, which accounts for our arbitrary choice of units of cell density. Remarkably, we find an excellent match between theory and experiment in Fig. 3F, providing further evidence that the principle of efficient encoding of natural movies under a limited firing rate budget may be driving the functional organization of the primate retina.

5 A neural network simulation for linear-non-linear neurons

While the linear theory accounts for several properties of midget and parasol cells, it suffers from two main deficiencies.
First, like previous efficient coding theories [1, 32], it only predicts the power of RF Fourier spectra, leaving the phase, and therefore the full space-time RF, unspecified. Second, it cannot account for rectifying nonlinearities, which lead to the partition of ganglion cells into ON and OFF types. Here we remedy these deficiencies through neural network simulations, in which we nonlinearly autoencode natural movies with two spatial dimensions and one temporal dimension using three-dimensional convolutional neurons (full simulation details are given in App. G).

The main simulation ingredients include: (1) enforcing nonnegativity of neural firing rates through a ReLU nonlinearity in the ganglion cell encoding layer, (2) an ℓ2 penalty on total weight magnitude, corresponding to a cost for synaptic connections [37], (3) encouraging decoding of input stimuli with a short but non-zero temporal lag [38, 39], (4) implementing a firing rate budget with an ℓ1 penalty on total firing rate. We assume four cell types, and optimize the number of cells allotted to each type. To match the fact that our image contrast distribution is zero-mean Gaussian, and therefore symmetric about the origin, we pair the types into two pairs of equal-size populations and keep the number of cells the same across each pair during this optimization, expecting that homologous ON and OFF types will emerge. It would be interesting to explore skewed image statistics and test whether these would yield ON-OFF asymmetries, as are found in biological retinas [13]. We confirmed that our equal pairing of types is a locally optimal cell type allocation (for our symmetric image statistics) by performing a stability analysis around the best paired solution found (see App. G).

We optimize the number of neurons allocated to each type by grid search and their corresponding RFs by gradient descent (Fig. 4, App. G).
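The four ingredients above can be combined into a single training objective. Below is a minimal NumPy sketch of such an objective; it is illustrative only, not the paper's actual model. In particular, the real simulation uses 3D convolutional encoders over space-time, whereas this sketch flattens the stimulus into a matrix, and the function names, shapes, and penalty weights are all assumptions.

```python
# Toy efficient-coding autoencoder loss, combining the four ingredients:
# (1) ReLU firing rates, (2) l2 weight penalty (synaptic cost),
# (3) decoding at a short non-zero temporal lag, (4) l1 firing-rate penalty.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def loss(movie, W_enc, W_dec, lag=2, l1=1e-3, l2=1e-4):
    """movie: (T, P) stimulus frames; W_enc: (P, N); W_dec: (N, P)."""
    rates = relu(movie @ W_enc)          # (1) nonnegative ganglion-cell rates
    recon = rates @ W_dec                # linear readout from the rates
    # (3) decode the stimulus `lag` frames earlier than the current output
    err = recon[lag:] - movie[:-lag]
    fidelity = np.mean(err ** 2)
    rate_cost = l1 * np.mean(np.abs(rates))                   # (4) rate budget
    wire_cost = l2 * (np.sum(W_enc ** 2) + np.sum(W_dec ** 2))  # (2) weights
    return fidelity + rate_cost + wire_cost

# Toy data: 100 frames of a 16-"photoreceptor" stimulus, 8 encoding cells.
movie = rng.standard_normal((100, 16))
W_enc = rng.standard_normal((16, 8)) * 0.1
W_dec = rng.standard_normal((8, 16)) * 0.1
print(loss(movie, W_enc, W_dec))
```

In the full simulation this scalar loss would be minimized by gradient descent over the encoder and decoder weights, with the cell-count allocation across types handled by the outer grid search described above.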
The four optimal cell type RFs are strikingly similar to those of real ON-OFF midget and parasol cells found in the primate retina (see Fig. 4A-D for representative examples of near-optimal neural network RFs, and Fig. 4E-H for macaque data; also see App. I). Both biphasic temporal filters and the characteristic center-surround RF shape are visible. Moreover, consistent with the linear theory, the RF Fourier power spectra of parasol (midget) cells, both in nonlinear simulations and experiments, specialize to cover low (high) spatial and high (low) temporal frequencies. Furthermore, we find that the parasol cells have a higher average firing rate than the midget cells (Fig. 4I), consistent with the greater sensitivity of parasol cells found both in biological data and in our linear theory (see Fig. 3C). Also consistent with the linear theory, the neural network optimization loss is lower for four cell types (two pairs) than for two (one pair) (Fig. 4J). Moreover, the dependence of performance on cell type ratio mirrors the predictions of our linear theory (compare Fig. 3DE with Fig. 4J, and see Appendix H).

6 Discussion

In summary, we first demonstrated mathematically that there is a metabolic advantage to encoding natural movies with more than one convolutional cell type (Fig. 1C).
By finding the optimal RFs, strides, and cell number ratios for the two populations, we showed this advantage is substantial: a 33% reduction in RMS firing rate at a fixed encoding fidelity (reconstruction error: 0.5%). Moreover, the corresponding cell types have similar RFs and densities to midget and parasol cells. We also predict with great accuracy how the ratio of midget to parasol cells varies with the total cell density (Fig. 3F). Finally, by training a nonlinear neural network on the same task of reconstructing natural movies with a limited firing rate budget, we again confirmed the advantage of having midget and parasol cells, and we found further differentiation into ON and OFF types.

Figure 4: A nonlinear convolutional autoencoder reproduces primate retinal cell types. A, B, C, D) ON parasol-like type, ON midget-like type, OFF parasol-like type, and OFF midget-like type, respectively. Within each panel, far left: spatial receptive field (RF) at the peak temporal slice of the spatiotemporal RF. Center: temporal RF, measured as the evolution of the central photoreceptor of the RF across time. Right: space-time power spectrum of the RF, where dark shades correspond to high power. E, F, G, H) Same quantities measured from macaque retina (see Appendix I). I) Average firing rate per cell of each autoencoder cell type. J) Optimal loss (see Eq. 3, p = 1) as a function of the ratio between cell type densities. Note that optimal reconstruction with two cell type pairs is approximately 2 times better than with one cell type pair (black arrow).

There are a number of other ganglion cell types found in the primate retina [17] (20 types). Our current model accounts for the four most common cell types (ON and OFF midget and parasol cells), but it could be extended to account for more. The next most common cell type found in the primate retina is the small bistratified type [5], which, unlike midget and parasol cells, pools from blue cones with an opposite polarity to red and green cones. Midget cells are color sensitive [40], a property that we do not account for in our current model, due to our focus on grayscale movies. By taking into account the spatiotemporal statistics of colors in natural movies, one could likely understand the division of labor between midget, parasol, and small bistratified cells observed in primates.

Our theory predicts primate cell types well, but interestingly we could not find a good match in other species, such as mouse.
The most numerous ganglion cell type in the mouse retina is a selective, non-linear feature detector (W3 cells [41]), thought to serve as an alarm system for overhead predators. Intriguingly, the retina may have evolved to detect behaviorally important predator cues in small animals [7] and to efficiently and faithfully encode natural movies in larger animals. A recent study using a deep convolutional model of the visual system suggests that retinal computations emerge either as linear, information-preserving encoders or, on the contrary, as non-linear feature detectors, depending on the degree of neural resources allocated to downstream visual circuitry [42].

Thus, overall, our work suggests that the retina has evolved to efficiently encode the translation-invariant statistics of natural movies through convolutional operations. Our model strikingly accounts for the 4 dominant cell types comprising 70% of all primate ganglion cells. Furthermore, promising extensions of this work to color statistics could expand the reach of this theory to encompass even greater cell-type diversity.

Acknowledgements

We thank Alexandra Kling and E.J. Chichilnisky for useful discussions, and for providing us with receptive field visualizations of real midget and parasol cells. We thank Gabriel Mel for a helpful insight about the two-cell proof. We thank the Karel Urbanek Postdoctoral Fellowship (S.O.) and the NIH Brain Initiative U01-NS094288 (S.D.), and the Burroughs-Wellcome, McKnight, James S. McDonnell, and Simons Foundations, and the Office of Naval Research (S.G.) for support.

References

[1] Joseph J. Atick and A. Norman Redlich. Towards a theory of early visual processing. Neural Computation, 2(3):308–320, 1990.

[2] Yan Karklin and Eero P. Simoncelli. Efficient coding of natural images with a population of noisy linear-nonlinear neurons.
In Advances in Neural Information Processing Systems, pages 999–1007, 2011.

[3] Eizaburo Doi, Jeffrey L. Gauthier, Greg D. Field, Jonathon Shlens, Alexander Sher, Martin Greschner, Timothy A. Machado, Lauren H. Jepson, Keith Mathieson, Deborah E. Gunning, Alan M. Litke, Liam Paninski, E. J. Chichilnisky, and Eero P. Simoncelli. Efficient coding of spatial information in the primate retina. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 32(46):16256–16264, November 2012.

[4] J. H. van Hateren. Theoretical predictions of spatiotemporal receptive fields of fly LMCs, and experimental validation. Journal of Comparative Physiology A, 171(2):157–170, September 1992.

[5] Heinz Wässle. Parallel processing in the mammalian retina. Nature Reviews Neuroscience, 5(10):747, 2004.

[6] H. B. Barlow. Possible Principles Underlying the Transformations of Sensory Messages. The MIT Press, September 1961.

[7] J. Y. Lettvin, H. R. Maturana, W. S. McCulloch, and W. H. Pitts. What the Frog's Eye Tells the Frog's Brain. Proceedings of the IRE, 47(11):1940–1951, 1959.

[8] Tim Gollisch and Markus Meister. Eye smarter than scientists believed: neural computations in circuits of the retina. Neuron, 65(2):150–164, January 2010.

[9] Maria Neimark Geffen, Saskia E. J. de Vries, and Markus Meister. Retinal Ganglion Cells Can Rapidly Change Polarity from Off to On. PLOS Biology, 5(3):e65, March 2007.

[10] Alexandra Tikidji-Hamburyan, Katja Reinhard, Hartwig Seitter, Anahit Hovhannisyan, Christopher A. Procyk, Annette E. Allen, Martin Schenk, Robert J. Lucas, and Thomas A. Münch. Retinal output changes qualitatively with every change in ambient illuminance. Nature Neuroscience, 18(1):66–74, January 2015.

[11] Stephane Deny, Ulisse Ferrari, Emilie Mace, Pierre Yger, Romain Caplette, Serge Picaud, Gašper Tkačik, and Olivier Marre.
Multiplexed computations in retinal ganglion cells of a single type. Nature Communications, 8(1):1964, 2017.

[12] Julijana Gjorgjieva, Haim Sompolinsky, and Markus Meister. Benefits of Pathway Splitting in Sensory Coding. The Journal of Neuroscience, 34(36):12127–12144, September 2014.

[13] Charles P. Ratliff, Bart G. Borghuis, Yen-Hong Kao, Peter Sterling, and Vijay Balasubramanian. Retina is structured to process an excess of darkness in natural scenes. Proceedings of the National Academy of Sciences, 107(40):17368–17373, 2010.

[14] David B. Kastner, Stephen A. Baccus, and Tatyana O. Sharpee. Critical and maximally informative encoding between neural populations in the retina. Proceedings of the National Academy of Sciences of the United States of America, 112(8):2533–2538, February 2015.

[15] J. Hans van Hateren. Spatiotemporal contrast sensitivity of early vision. Vision Research, 33(2):257–267, 1993.

[16] Dawei W. Dong. Spatiotemporal coupling and scaling of natural images and human visual sensitivities. In Advances in Neural Information Processing Systems, pages 859–865, 1997.

[17] D. M. Dacey. Origins of perception: retinal ganglion cell diversity and the creation of parallel visual pathways. The Cognitive Neurosciences, 3:281–301, 2004.

[18] Dennis M. Dacey. The mosaic of midget ganglion cells in the human retina. Journal of Neuroscience, 13(12):5334–5355, 1993.

[19] D. M. Dacey. Physiology, morphology and spatial densities of identified ganglion cell types in primate retina. Ciba Foundation Symposium, 184:12–28; discussion 28–34, 63–70, 1994.

[20] Jeffrey L. Gauthier, Greg D. Field, Alexander Sher, Jonathon Shlens, Martin Greschner, Alan M. Litke, and E. J. Chichilnisky. Uniform Signal Redundancy of Parasol and Midget Ganglion Cells in Primate Retina. Journal of Neuroscience, 29(14):4675–4680, April 2009.

[21] W. H. Merigan, C. E.
Byrne, and J. H. Maunsell. Does primate motion perception depend on the magnocellular pathway? The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 11(11):3422–3429, November 1991.

[22] E. Kaplan and R. M. Shapley. The primate retina contains two types of ganglion cells, with high and low contrast sensitivity. Proceedings of the National Academy of Sciences of the United States of America, 83(8):2755–2757, April 1986.

[23] D. M. Dacey and M. R. Petersen. Dendritic field size and morphology of midget and parasol ganglion cells of the human retina. Proceedings of the National Academy of Sciences of the United States of America, 89(20):9666–9670, October 1992.

[24] L. J. Croner, K. Purpura, and E. Kaplan. Response variability in retinal ganglion cells of primates. Proceedings of the National Academy of Sciences of the United States of America, 90(17):8128–8130, September 1993.

[25] David J. Field. Relations between the statistics of natural images and the response properties of cortical cells. JOSA A, 4(12):2379–2394, 1987.

[26] Dawei W. Dong and Joseph J. Atick. Statistics of natural time-varying images. Network: Computation in Neural Systems, 6(3):345–358, 1995.

[27] Dawei W. Dong. Spatiotemporal inseparability of natural images and visual sensitivities. In Motion Vision, pages 371–380. Springer, 2001.

[28] Simon B. Laughlin. Energy as a constraint on the coding and processing of sensory information. Current Opinion in Neurobiology, 11(4):475–480, August 2001.

[29] Vijay Balasubramanian and Michael J. Berry. A test of metabolically efficient coding in the retina. Network: Computation in Neural Systems, 13(4):531–552, 2002.

[30] Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607, 1996.

[31] Anthony J. Bell and Terrence J.
Sejnowski. The "independent components" of natural scenes are edge filters. Vision Research, 37(23):3327–3338, 1997.

[32] Joseph J. Atick and A. Norman Redlich. What Does the Retina Know about Natural Scenes? Neural Computation, 4(2):196–210, March 1992.

[33] Steven H. DeVries and Denis A. Baylor. Mosaic Arrangement of Ganglion Cell Receptive Fields in Rabbit Retina. Journal of Neurophysiology, 78(4):2048–2060, October 1997.

[34] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, 2nd edition. 2006.

[35] Alan Julian Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5(2):248–264, 1975.

[36] B. B. Lee, J. Pokorny, V. C. Smith, and J. Kremers. Responses to pulses and sinusoids in macaque ganglion cells. Vision Research, 34(23):3081–3096, December 1994.

[37] Benjamin T. Vincent and Roland J. Baddeley. Synaptic energy efficiency in retinal processing. Vision Research, 43(11):1285–1292, May 2003.

[38] Stephanie E. Palmer, Olivier Marre, Michael J. Berry, and William Bialek. Predictive information in a sensory population. Proceedings of the National Academy of Sciences, 112(22):6908–6913, 2015.

[39] Matthew Chalk, Olivier Marre, and Gašper Tkačik. Toward a unified theory of efficient, predictive, and sparse coding. Proceedings of the National Academy of Sciences, 115(1):186–191, 2018.

[40] Li Zhaoping. Understanding Vision: Theory, Models, and Data. Oxford University Press, USA, 2014.

[41] Yifeng Zhang, In-Jung Kim, Joshua R. Sanes, and Markus Meister. The most numerous ganglion cell type of the mouse retina is a selective feature detector. Proceedings of the National Academy of Sciences, 109(36):E2391–E2398, 2012.

[42] Anonymous. The effects of neural resource constraints on early visual representations.
Submitted to the International Conference on Learning Representations, 2019. Under review.

[43] Greg D. Field, Jeffrey L. Gauthier, Alexander Sher, Martin Greschner, Timothy A. Machado, Lauren H. Jepson, Jonathon Shlens, Deborah E. Gunning, Keith Mathieson, Wladyslaw Dabrowski, et al. Functional connectivity in the retina at the resolution of photoreceptors. Nature, 467(7316):673, 2010.

[44] Greg D. Field, Alexander Sher, Jeffrey L. Gauthier, Martin Greschner, Jonathon Shlens, Alan M. Litke, and E. J. Chichilnisky. Spatial properties and functional organization of small bistratified ganglion cells in primate retina. Journal of Neuroscience, 27(48):13261–13272, 2007.