{"title": "Recognizing retinal ganglion cells in the dark", "book": "Advances in Neural Information Processing Systems", "page_first": 2476, "page_last": 2484, "abstract": "Many neural circuits are composed of numerous distinct cell types that perform different operations on their inputs, and send their outputs to distinct targets. Therefore, a key step in understanding neural systems is to reliably distinguish cell types. An important example is the retina, for which present-day techniques for identifying cell types are accurate, but very labor-intensive. Here, we develop automated classifiers for functional identification of retinal ganglion cells, the output neurons of the retina, based solely on recorded voltage patterns on a large scale array. We use per-cell classifiers based on features extracted from electrophysiological images (spatiotemporal voltage waveforms) and interspike intervals (autocorrelations). These classifiers achieve high performance in distinguishing between the major ganglion cell classes of the primate retina, but fail in achieving the same accuracy in predicting cell polarities (ON vs. OFF). We then show how to use indicators of functional coupling within populations of ganglion cells (cross-correlation) to infer cell polarities with a matrix completion algorithm. This can result in accurate, fully automated methods for cell type classification.", "full_text": "Recognizing retinal ganglion cells in the dark\n\nEmile Richard\n\nStanford University\n\nemileric@stanford.edu\n\nGeorges Goetz\n\nStanford University\n\nggoetz@stanford.edu\n\nE.J. Chichilnisky\nStanford University\nej@stanford.edu\n\nAbstract\n\nMany neural circuits are composed of numerous distinct cell types that perform\ndifferent operations on their inputs, and send their outputs to distinct targets.\nTherefore, a key step in understanding neural systems is to reliably distinguish\ncell types. 
An important example is the retina, for which present-day techniques for identifying cell types are accurate, but very labor-intensive. Here, we develop automated classifiers for functional identification of retinal ganglion cells, the output neurons of the retina, based solely on recorded voltage patterns on a large-scale array. We use per-cell classifiers based on features extracted from electrophysiological images (spatiotemporal voltage waveforms) and interspike intervals (autocorrelations). These classifiers achieve high performance in distinguishing between the major ganglion cell classes of the primate retina, but fail to achieve the same accuracy in predicting cell polarities (ON vs. OFF). We then show how to use indicators of functional coupling within populations of ganglion cells (cross-correlation) to infer cell polarities with a matrix completion algorithm. This can result in accurate, fully automated methods for cell type classification.

1 Introduction

In the primate and human retina, roughly 20 distinct classes of retinal ganglion cells (RGCs) send distinct visual information to diverse targets in the brain [18, 7, 6]. Two complementary methods for identification of these RGC types have been pursued extensively. Anatomical studies have relied on indicators such as dendritic field size and shape, and stratification patterns in synaptic connections [8] to distinguish between cell classes. Functional studies have leveraged differences in responses to stimulation with a variety of visual stimuli [9, 3] for the same purpose. Although successful, these methods are difficult, time-consuming and require significant expertise. Thus, they are not suitable for automated analysis of existing large-scale physiological recording data. Furthermore, in some clinical settings, they are entirely inapplicable. 
At least two specific scientific and engineering goals demand the development of efficient methods for cell type identification:

• Discovery of new cell types. While ~20 morphologically distinct RGC types exist, only 7 have been characterized functionally. Automated means of detecting unknown cell types in electrophysiological recordings would make it possible to process massive amounts of existing large-scale physiological data that would take too long to analyze manually, in order to search for the poorly understood RGC types.

• Developing brain-machine interfaces of the future. In blind patients suffering from retinal degeneration, RGCs no longer respond to light. Advanced retinal prostheses, previously demonstrated ex vivo, aim to electrically restore the correct neural code in each RGC type of a diseased retina [11], which requires cell type identification without information about the light response properties of RGCs.

In the present paper, we introduce two novel and efficient computational methods for cell type identification in a neural circuit, using spatiotemporal voltage signals produced by spiking cells recorded with a high-density, large-scale electrode array [14]. We describe the data we used for our study in Section 2, and we show how the raw descriptors used by our classifiers are extracted from voltage recordings of a primate retina. We then introduce a classifier that leverages both hand-specified and random-projection based features of the electrical signatures of individual RGCs, as well as large unlabeled data sets, to identify cell types (Section 3). 
We evaluate its performance in distinguishing between midget, parasol and small bistratified cells on manually annotated datasets. Then, in Section 4, we show how matrix completion techniques can be used to identify populations of distinct cell types, and assess the accuracy of our algorithm by predicting the polarity (ON or OFF) of RGCs on datasets where a ground truth is available. Section 5 is devoted to numerical experiments that we designed to test our modeling choices. Finally, we discuss future work in Section 6.

2 Extracting descriptors from electrical recordings

In this section, we define the electrical signatures that we use for cell classification; the algorithms that perform the statistical inference of cell type are described in the subsequent sections. We exploit three electrical signatures of recorded neurons that are well measured in large-scale, high-density recordings. First, the electrical image (EI) of each cell, which is the average spatiotemporal pattern of voltage measured across the entire electrode array during the spiking of a cell. This measure provides information about the geometric and electrical conduction properties of the cell itself. Second, the inter-spike interval distribution (ISI), which summarizes the temporal separation between spikes emitted by the cell. This measure reflects the specific ion channels in the cell and their distribution across the cell. 
Third, the cross-correlation function (CCF) of firing between cells. This measure captures the degree and polarity of interactions between cells in the generation of a spike.

2.1 Electrophysiological image calculation, alignment and filtering

The raw data we used for our numerical experiments consist of extracellular voltage recordings of the electrical activity of retinas from male and female macaque monkeys, which were sampled and digitized at 20 kHz per channel over 512 channels laid out in a 60 µm hexagonal lattice (see Appendix for a 100 ms sample movie of an electrical recording). The emission of an action potential by a spiking neuron causes transient voltage fluctuations along its anatomical features (soma, dendritic tree, axon). By bringing an extracellular matrix of electrodes in contact with neural tissue, we capture the 2D projection of these voltage changes onto the plane of the recording electrodes (see Figure 1). With such dense multielectrode arrays, the voltage activity from a single cell is usually picked up on multiple electrodes. While the literature refers to this footprint as the electrophysiological or electrical image (EI) of the cell [13], it is an inherently spatiotemporal characteristic of the neuron, due to the transient nature of action potentials. In essence, it is a short movie (~1.5 ms) of the average electrical activity over the array during the emission of an action potential by a spiking neuron, which can include the properties of other cells whose firing is correlated with this neuron. We calculated the electrical images of each identified RGC in the recording as described in the literature [13]. In a 30-60 minute recording, we typically detected 1,000-100,000 action potentials per RGC. For each cell, we averaged the voltages recorded over the entire array in a 1.5 ms window starting 0.25 ms before the peak negative voltage sample of each action potential. 
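The spike-triggered averaging step above can be sketched as follows (a minimal NumPy sketch; function and argument names are illustrative, not the authors' code):

```python
import numpy as np

def electrical_image(recording, spike_samples, fs=20_000,
                     window_ms=1.5, pre_ms=0.25):
    """Spike-triggered average voltage across all electrodes.

    recording: (n_samples, n_electrodes) raw voltage trace.
    spike_samples: sample indices of the peak negative voltage of
                   each detected action potential.
    Returns a (window_length, n_electrodes) array: the EI.
    """
    pre = int(pre_ms * fs / 1000)        # samples before the spike peak (5 at 20 kHz)
    length = int(window_ms * fs / 1000)  # total window length (30 at 20 kHz)
    snippets = []
    for t in spike_samples:
        # keep only spikes whose window fits entirely inside the recording
        if t - pre >= 0 and t - pre + length <= recording.shape[0]:
            snippets.append(recording[t - pre : t - pre + length])
    return np.mean(snippets, axis=0)
```

At 20 kHz this yields the 30 time points per electrode used to build the 30 × 19 EI matrices described next.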
We cropped from the electrode array the subset of electrodes that falls within a 125 µm radius around the RGC soma (see Figure 1) in order to represent each EI by a 30 × 19 matrix (time points × number of electrodes in a 125 µm radius), or equivalently a 570-dimensional vector. We augment the training data by exploiting the symmetries of the (approximately) hexagonal grid of the electrode array: we form the training EIs from the original EIs by rotating them by iπ/3, i = 1, ..., 6, and taking the reflection of each (12 spatial symmetries in total). The characteristic radius (125 µm here) used to select the central portion of the EI is a hyper-parameter of our method which controls the signal-to-noise ratio in the input data (see Section 5, Figure 3, middle panel). In the Appendix of this paper we describe 3 families (subdivided into 7 sub-families) of filters we manually built to capture anatomical features of the cell. In particular, we included filters corresponding to various action potential propagation velocities at the level of the axon, and hard-coded a parameter which captures the soma size. These quantities are believed to be indicative of cell type.

Figure 1: EIs and cell morphology. (Top row) Multielectrode arrays record a 2D projection of spatiotemporal action potentials, schematically illustrated here for a midget (left) and a parasol (right) RGC. Midget cells have an asymmetric dendritic field, while parasol cells are more isotropic. (Bottom row) Temporal evolution of the voltage recorded on the electrodes located within a 125 µm radius around the electrode where the largest action potential was detected, which we use for cell type classification. Circle size indicates signal amplitude. 
Red circles: positive voltages; blue circles: negative voltages.

We filtered the spatiotemporally aligned RGC electrical images with our hand-defined filters to create a first feature set. In separate experiments we also filtered aligned EIs with iid Gaussian random filters (as many as our features) in the fashion of [17]; see Table 1 for a performance comparison.

2.2 Interspike Intervals

The statistics of the timing of action potential trains are another source of information about functional RGC types. Interspike intervals (ISIs) are an estimate of the probability of emission of two consecutive action potentials within a given time difference by a spiking neuron. We build histograms of the times elapsed between two consecutive action potentials for each cell to form its ISI. We estimate the interspike intervals over 100 ms, with a time granularity of 0.5 ms, resulting in 200-dimensional ISI vectors. ISIs always begin with a refractory period (i.e. a duration following an action potential over which no further action potentials occur), which lasts the first 1-2 ms. ISIs then increase before decaying back to zero at rates representative of the functional cell type (see Figure 2, left hand side). We describe each ISI using the values of the time differences Δt at which the smoothed ISI reaches 20, 40, 60, 80, 100% of its maximum value, as well as the slopes of the linear interpolations between each consecutive pair of points.

2.3 Cross-correlation functions and electrical coupling of cells

In the retina, there is a high probability of joint emission of action potentials between neighboring ganglion cells of the same type, while RGCs of antagonistic polarities (ON vs OFF cells) tend to exhibit strongly negatively correlated firing patterns [16, 10]. 
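Both the ISI and the CCF are histograms of spike-time differences. A minimal sketch of their construction (illustrative only; the bin width follows the 0.5 ms granularity used above, and the windowed all-pairs CCF shown here is one common way to estimate the correlation function):

```python
import numpy as np

def isi_histogram(spike_times_ms, t_max=100.0, bin_ms=0.5):
    """Histogram of consecutive inter-spike intervals (200 bins over 100 ms)."""
    dt = np.diff(np.sort(np.asarray(spike_times_ms, float)))
    counts, _ = np.histogram(dt, bins=np.arange(0.0, t_max + bin_ms, bin_ms))
    return counts / max(counts.sum(), 1)  # normalize to a probability estimate

def ccf_histogram(times_a_ms, times_b_ms, t_max=50.0, bin_ms=0.5):
    """Histogram of pairwise firing-time differences within +/- t_max ms."""
    a = np.asarray(times_a_ms, float)
    b = np.asarray(times_b_ms, float)
    diffs = b[None, :] - a[:, None]          # all pairwise latencies
    diffs = diffs[np.abs(diffs) <= t_max]    # keep only the window of interest
    counts, _ = np.histogram(diffs, bins=np.arange(-t_max, t_max + bin_ms, bin_ms))
    return counts
```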
In other words, the emission of an action potential in the ON pathway leads to a reduced probability of observing an action potential in the OFF pathway at the same time. The cross-correlation function of two RGCs characterizes the probability of joint emission of action potentials by this pair of cells at a given latency, and as such holds information about functional coupling between the two cells. Cross-correlations between different functional RGC types have been studied extensively in the literature, for example in [10]. Construction of CCFs follows the same steps as ISI computation: we obtain the CCF of a pair of cells by building histograms of the time differences between their firing times. A large CCF value near the origin is indicative of positive functional coupling, whereas negative coupling corresponds to a negative CCF at the origin (see Figure 2, the three panels on the right).

Figure 2: (Left panel) Interspike intervals for the 5 major RGC types of the primate retina. (Right panels) Cross-correlation functions between parasol cells. Purple traces: single pairwise CCFs. Red line: population average. Green arrow: strength of the correlation.

3 Learning electrical signatures of retinal ganglion cells

3.1 Learning dictionaries from slices of unlabeled data

Learning descriptors from unlabeled data, or dictionary learning [15], has been successfully used for classification tasks on high-dimensional data such as images, speech and texts [15, 4]. 
The methodology we used for learning discriminative features from a relatively large amount of unlabeled data closely follows the steps described in [4, 5].

Extracting independent slices from the data. The first step in our approach consists of extracting (as much as possible) independent slices from data points. One can think of a slice as a subset of the descriptors that is (nearly) independent from the other subsets; in image processing the analogous object is called a patch, i.e. a small sub-image. In our case, we used 8 data slices. The ISI descriptors form one such slice; the others are extracted from EIs. It is reasonable to assume ISI features and EI descriptors are independent quantities. After aligning the EIs and filtering them with a collection of 7 filter banks (see Appendix for a description of our biologically motivated filters), we group each set of filtered EIs. Each group of filters reacts to specific patterns in EIs: rotational motion driven by dendrites, radial propagation of the electrical signal along the axon, and the direction of propagation are behaviors captured by distinct filter banks. We therefore treat the response of the data to each one of them as a distinct data slice. Each slice is then whitened [5], and finally we perform sparse k-means on each slice separately, where k denotes an integer parameter of our algorithm. That is, letting X ∈ R^{n×p} denote a slice of data (n: number of data points, p: dimensionality of the slice) and C_{n,k} denote the set of cluster assignment matrices C_{n,k} = {U ∈ {0,1}^{n×k} : ∀i ∈ [n], ‖U_{i,·}‖_0 = 1}, we consider the optimization problem

    min ‖X − UV^T‖_F^2 + η‖V‖_1    s.t.  U ∈ C_{n,k},  V ∈ R^{p×k}.    (1)

Warm-starting k-means with warm-started NMF. In order to solve the optimization problem (1), we propose a coarse-to-fine strategy that consists in relaxing the constraint U ∈ C_{n,k} in two steps. We initially relax the constraint U ∈ C_{n,k} completely and set η = 0. That is, we consider problem (1) where we substitute C_{n,k} with the larger set R^{n×k} and run an alternating minimization for a few steps. Then, we replace the clustering constraint C_{n,k} with a nonnegativity constraint U ∈ R^{n×k}_+ while keeping η = 0. After a few steps of nonnegative alternating minimization we activate the constraint U ∈ C_{n,k} and finally raise the value of η. This warm-start strategy systematically resulted in lower values of the objective compared to random or k-means++ [1] initializations.

3.2 Building feature vectors for labeled data

In order to extract feature vectors from labeled data, we first slice each data point: we extract ISI features on the one hand, and filter each data point with all filter families on the other. Each slice is separately whitened and compared to the cluster centers of its slice. For this, we use the matrices V^(j) of cluster centroids computed for all slices j = 1, ..., 8. Letting s(·, κ) denote the soft-thresholding operator s(x, κ) = (sign(x_i) max{|x_i| − κ, 0})_i, we compute x̃^(j) = s(V^(j)T x^(j), κ) for each slice, which collects the soft-thresholded inner products of the corresponding slice x^(j) of the data point with all cluster centroids for the same slice j. 
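The per-slice encoding just described can be sketched as follows (a schematic NumPy version; names are illustrative):

```python
import numpy as np

def soft_threshold(x, kappa):
    """s(x, kappa) = sign(x) * max(|x| - kappa, 0), applied element-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def encode(slices, centroids, kappa=0.1):
    """Encode one data point given per-slice dictionaries.

    slices:    list of 1-D arrays x^(j), one per (whitened) slice.
    centroids: list of (p_j, k) matrices V^(j) of cluster centers.
    Returns the concatenated feature vector (x_tilde^(j))_j.
    """
    codes = [soft_threshold(V.T @ x, kappa) for x, V in zip(slices, centroids)]
    return np.concatenate(codes)
```

The concatenated output is what gets fed to the logistic regression or random forest classifier described next.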
We concatenate the x̃^(j)'s from the different slices into the encoded point x̃ = (x̃^(j))_j and use it to predict cell types. This last step is performed either by feeding the concatenated vectors x̃, together with the corresponding labels, to a logistic regression classifier which handles multiple classes in a one-versus-all fashion, or to a random forest classifier.

4 Predicting cell polarities by completing the RGC coupling matrix

We additionally exploit pairwise spike train cross-correlations to infer RGC polarities (ON vs OFF), estimating the polarity vector y from a measure of the pairwise functional coupling strength between cells. The rationale behind this approach is that neighboring cells of the same polarity tend to exhibit positive correlations between their action potential spike trains, corresponding to positive functional coupling. If the cells are of antagonistic polarities, the functional coupling strength will be negative. The coupling of two neighboring cells i, j can therefore be modeled as c_{i,j} ≈ y_i y_j, where y_i, y_j ∈ {+1, −1} denote cell polarities. Because far-apart cells do not excite or inhibit each other, to avoid incorporating noise in our model we choose to only include estimates of functional coupling strengths between neighboring cells. 
The neighborhood size is a hyper-parameter of this approach that we study in Section 5. If G denotes the graph of neighboring cells in a recording, we only use cross-correlations for spike trains of cells which are connected by an edge in G. Since we can estimate the position of each RGC in the lattice from its EI [13], we can form the graph G, which is a 2-dimensional regular geometric graph. If q is the number of edges in G, let P denote the linear map R^{n×n} → R^q returning the values P(C) = (C_{i,j})_{{i,j}∈E(G)} for cells i and j located within a critical distance, and let P* denote the adjoint (transpose) operator. The complete matrix of pairwise couplings can then be written, up to observation noise, as yy^T, where y ∈ {+1, −1}^n is the vector of cell polarities (+1 for ON and −1 for OFF cells). Therefore, the observation can be modeled as

    c = P(yy^T) + ε,  with ε observation noise,    (2)

and the recovery of yy^T is then formulated as a standard matrix completion problem.

4.1 Minimizing the nonconvex loss using warm-started Newton steps

In this section, we show how to estimate y given the observation c = P(yy^T) + ε by minimizing the nonconvex loss ℓ(z) = (1/2)‖P(zz^T) − c‖_2^2. Even though minimizing this degree-4 polynomial loss function is NP-hard in general, we propose a Newton method warm-started with a spectral heuristic for approaching the solution (see Algorithm 1). 
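A simplified, self-contained sketch of this approach on simulated noiseless couplings (illustrative only: it uses a random neighbor graph in place of the geometric graph G, and the scalar approximation α/‖z‖² of the Newton step rather than the full Hessian):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated ground truth: +1 (ON) / -1 (OFF) polarities for n cells.
n = 40
y = rng.choice([-1.0, 1.0], size=n)

# Random "neighbor" graph standing in for the geometric graph G.
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.3]
c = np.array([y[i] * y[j] for i, j in edges])  # noiseless couplings P(y y^T)

def P_adj(vals):
    """Adjoint P*: scatter edge values back into a symmetric n x n matrix."""
    M = np.zeros((n, n))
    for (i, j), v in zip(edges, vals):
        M[i, j] = M[j, i] = v
    return M

# Spectral initialization: leading eigenpair of P*(c).
w, V = np.linalg.eigh(P_adj(c))
lam, v = w[-1], V[:, -1]
z = n * np.sqrt(lam) * v / np.sqrt(len(edges))

# First-order refinement of the loss 0.5 * ||P(z z^T) - c||^2,
# replacing the inverse Hessian by the scalar step alpha / ||z||^2.
alpha = 0.2
for _ in range(300):
    resid = np.array([z[i] * z[j] for i, j in edges]) - c
    z = z - (alpha / (z @ z)) * (P_adj(resid) @ z)

polarity = np.sign(z)
# z is recovered only up to a global sign flip.
accuracy = max(np.mean(polarity == y), np.mean(-polarity == y))
```

On noiseless data of this size the spectral guess alone typically lands in the right basin, and the refinement recovers the polarity pattern.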
In similar contexts, when the sampling of entries is uniform, this type of spectral initialization followed by alternating minimization has been proven to converge to the global minimum of a least-squares loss analogous to ℓ [12]. While our sampling graph G is not an Erdos-Renyi graph, we empirically observed that its regular structure yields a reliable initial spectral guess that falls within the basin of attraction of the global minimum of ℓ. In the subsequent Newton scheme, we iterate using the shifted Hessian matrix H(z) = P*(2 P(zz^T) − c) + νI_n, where ν > 0 ensures positive definiteness H(z) ≻ 0. Whenever computing ν and H(z)^{-1} is expensive due to a potentially large number of cells n, replacing H(z)^{-1} by a diagonal or scalar approximation α/‖z‖_2^2 reduces the per-iteration cost at the price of slower convergence. We refer to this method as a first-order method for minimizing the nonconvex objective, while ISTA [2] is a first-order method applied to the convex relaxation of the problem as presented in the Appendix (see Figure 4, middle panel). Using the same convex relaxation, we prove in the Appendix that the proposed estimator has a classification accuracy of at least 1 − b‖ε‖_∞^2 with b ≈ 2.91.

Algorithm 1 Polarity matrix completion
Require: c observed couplings, P the projection operator
  Let λ, v be the leading eigenpair of P*(c)
  Initialize z_0 ← n√λ v / √(|#revealed entries|)
  for t = 0, 1, ... do
    z_{t+1} ← z_t − H^{-1}(z_t) P*(P(z_t z_t^T) − c) z_t    \\ H(z_t) is the Hessian or an approximation
  end for

Table 1: Comparing performance across input data sources and filters. T: cell type identification. P: polarity identification. T+P: cell type and polarity identification. EIs cropped within 125 µm of the central electrode.

| Task | EI & ISI, our filters, k = 30 | EI & ISI, rand. filters, k = 50 | EI & ISI, rand. filters, k = 10 | EI only, our filters, k = 30 | ISI only | CCF |
|------|------|------|------|------|------|------|
| T   | 93.5 % (1.1) | 88.3 % (1.9) | 93.1 % (1.3) | 86.0 % (2.6) | 80.6 % (2.6) | 75.7 % (4.9) |
| P   | 81.5 % (3.0) | 80.0 % (2.3) | 77.8 % (2.3) | 64.1 % (3.7) | 76.8 % (3.8) | – |
| T+P | 78.0 % (3.3) | 66.7 % (1.9) | 72.0 % (1.7) | 60.4 % (2.9) | 64.7 % (2.9) | – |

5 Numerical experiments

In this section, we benchmark the performance of the cell type classifiers introduced previously on datasets where the ground truth was available. For the RGCs in those datasets, experts manually hand-labeled the light response properties of the cells in the manner previously described in the literature [9, 3]. Our unlabeled data contained 17,457 × 12 (spatial symmetries) data points. The labeled data consist of 436 OFF midget, 652 OFF parasol, 964 ON midget, 607 ON parasol and 169 small bistratified cells assembled from 10 distinct recordings.

RGC classification from their electrical features. Our numerical experiment consists in hiding one out of the 10 labeled recordings, learning cell classifiers on the 9 others, and testing the classifier on the hidden recording. We chose to test performance against individual recordings for two reasons. First, we wanted to compare the polarity prediction accuracy obtained from electrical features with the prediction made by matrix completion (see Section 4), and the matrix completion algorithm takes as input pairwise data obtained from a single recording only. Second, experimental parameters likely to influence the EIs and ISIs, such as recording temperature, vary from recording to recording but remain consistent within a recording. 
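This leave-one-recording-out protocol can be sketched with a small helper (hypothetical, not from the paper's code):

```python
import numpy as np

def leave_one_recording_out(recording_ids):
    """Yield (held-out id, train indices, test indices), one fold per recording."""
    ids = np.asarray(recording_ids)
    for rec in np.unique(ids):
        test = np.flatnonzero(ids == rec)
        train = np.flatnonzero(ids != rec)
        yield rec, train, test
```

Each fold trains on 9 recordings and evaluates on the held-out one, so the reported score reflects generalization to an unseen preparation.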
Since we want the reported scores to reflect expected performance on new recordings, excluding points from the test distribution gives us a more realistic proxy for the true test error. In Table 1 we report classification accuracies on 3 different classification tasks:

1. Cell type identification (T): midget vs. parasol vs. small bistratified cells;
2. Polarity identification (P): ON versus OFF cells;
3. Cell type and polarity (T+P): ON-midget vs. ON-parasol vs. OFF-midget vs. OFF-parasol vs. small bistratified.

Each column of the table corresponds to a different input data source. The first column reports the results for the method where the dictionary learning step is performed with k = 30 and EIs are recorded within a radius of 125 µm from the central electrode (19 electrodes on our array). We compare our method with an identical method where we replaced the hand-specified filters by the random Gaussian filters of [17] (second column for k = 50 and third for k = 10). The performance of random filters opens perspectives for learning deeper predictors using random filters in the first layer. The impact of k on our filters can be seen in Figure 3, left-hand panel: larger k seems to bring further information for polarity prediction but not for cell type classification, which leads to an optimal choice k ≈ 20 in the 5-class problem. In the 4th and 5th columns, we used only part of the feature sets at our disposal: EIs only and ISIs only, respectively. These results confirm that the joint use of both EIs and ISIs for cell classification is beneficial. Globally, cell type identification turns out to be an easier task than polarity prediction using per-cell descriptors. Figure 3, middle panel, illustrates the impact of the EI diameter on classification accuracy. 
While a larger recording radius lets us make use of more signal, the amount of noise incorporated also increases with the number of electrodes taken into account, and we observe a trade-off in terms of signal-to-noise ratio on all three tasks. An interesting observation is the second jump in the accuracy of cell-type prediction around an EI diameter of 325 µm, at which point we attain a peak performance of 96.8% ± 1.0. We believe this jump takes place when axonal signals start being incorporated in the EI, and we believe these signals to be a strong indicator of cell type because of known differences in axonal conduction velocities [13]. Prediction variance is also relatively low for cell-type prediction compared to polarity prediction, and predicting polarity turns out to be significantly easier on some datasets than on others. On average, the logistic regression classifier we used performed slightly better (~ +1%) than random forests on the various tasks and data sets at our disposal.

Figure 3: (Left panel) Effect of the dictionary size k and (middle panel) EI radius on per-cell classification. (Right panel) Effect of the neighborhood size on polarity prediction using matrix completion.

Figure 4: (Left panel) Observed coupling matrix. (Middle panel) Convergence of matrix completion algorithms. (Right panel) k-means with our initialization (SP-NMF) versus other choices.

Matrix completion based polarity prediction. Matrix completion resulted in > 90% accuracy on three out of 10 datasets and in an average of 66.8% accuracy on the 7 other datasets. We report the average performance in Table 1 even though it is inferior to the simpler classification approach, for two reasons: (a) the idea of using matrix completion for this task is new, and (b) it has high potential, as demonstrated by Figure 3, right-hand panel. On some datasets, matrix completion achieves 100% accuracy. 
However, on other datasets, either because of fragile spike sorting or of other sources of noise, the approach does not do as well. In Figure 3 (right-hand side) we examine the effect of the neighborhood size on prediction accuracy. Colors correspond to different datasets; for the sake of readability, we only show the results for 4 out of the 10 datasets: the best, the worst and 2 intermediate ones. The sensitivity to the maximum cell distance is clear in this plot. Bold curves correspond to the prediction obtained after 100 steps of our Newton algorithm. Dashed curves correspond to predictions by the first-order (nonconvex) method stopped after 100 steps, and dots are the prediction accuracies of the leading singular vector, i.e. the spectral initialization of our algorithm. Overall, the Newton algorithm seems to perform better than its rivals, and there appears to be an optimal radius for each dataset, corresponding to the characteristic distance between pairs of cells (here only parasols). This parameter varies from dataset to dataset and hence requires tuning before extracting CCF data in order to get the best performance out of the algorithm.

Warm-start strategy for dictionary learning. We refer to Figure 4, right-hand panel, for an illustration of our warm-start strategy for minimizing (1) as described in Section 3.1. 
There, we compare dense (η = 0) k-means initialized with our double warm start (25 steps of unconstrained alternating minimization followed by 25 steps of nonnegative alternating minimization, referred to as SP-NMF), with a single spectral warm start of 50 steps of unconstrained alternating minimization (SP), with 50 steps of nonnegative alternating minimization (NMF), as well as with two standard baselines: random initialization and k-means++ initialization [1]. We postpone the theoretical study of this initialization choice to future work. Note that each step of the alternating minimization involves a few matrix-matrix products and element-wise operations on matrices. Using an NVIDIA Tesla K40 GPU drastically accelerated these steps, allowing us to scale up our experiments.

6 Discussion

We developed accurate cell-type classifiers using a unique collection of labeled and unlabeled electrical recordings, employing recent advances in several areas of machine learning. The results show strong empirical success of the methodology, which is highly scalable and adapted to the major applications discussed below. Matrix completion for binary classification is novel, and the two heuristics we used for minimizing our non-convex objectives show convincing superiority over existing baselines. Future work will be dedicated to studying the properties of these algorithms.

Recording Methods. 
Three major aspects of electrical recordings are critical for successful cell type identification from electrical signatures. First, high spatial resolution is required to detect the fine features of the EIs; more widely spaced electrode arrays, such as those often used in the cortex, may not perform as well. Second, high temporal resolution is required to measure the ISI accurately; this suggests that optical measurements using Ca++ sensors would not be as useful as electrical measurements. Third, large-scale recordings are required to detect many pairs of cells and estimate their functional interactions; electrode arrays with fewer channels may not suffice. Thus, large-scale, high-density electrophysiological recordings are uniquely well suited to the task of identifying cell types.
Future directions. A probable source of variability in cell type classification is differences between retinal preparations, including eccentricity in the retina, inter-animal variability, and experimental variables such as temperature and the signal-to-noise ratio of the recording. In the present data, features were defined and assembled across a dozen different recordings. This motivates transfer learning to account for such variability, exploiting the fact that although the features may change somewhat between preparations (target domains), the underlying cell types and the fundamental differences in electrical signatures are expected to remain. We expect future work to result in models that enjoy higher complexity thanks to training on larger datasets, thus achieving invariance to ambient conditions (eccentricity and temperature) automatically. The model we used can be interpreted as a single-layer neural network. A straightforward development would be to increase the number of layers.
The relative success of random filters on the first layer is a sign that one can hope to get further automated improvement by building richer representations from the data itself, with minimal incorporation of prior knowledge.
Application. Two major applications are envisioned. First, an extensive set of large-scale, high-density recordings from primate retina can now be mined for information on infrequently-recorded cell types. Manual identification of cell types using their light response properties is extremely labor-intensive; the present approach promises to facilitate automated mining. Second, the identification of cell types without light responses is fundamental for the development of high-resolution retinal prostheses of the future [11]. In such devices, it is necessary to identify which electrodes are capable of stimulating which cells, and to drive spiking in RGCs according to their type in order to deliver a meaningful visual signal to the brain. For this futuristic brain-machine interface application, our results solve a fundamental problem. Finally, it is hoped that these applications in the retina will also be relevant for other brain areas, where identification of neural cell types and customized electrical stimulation for high-resolution neural implants may be equally important in the future.

Acknowledgement

We are grateful to A. Montanari and D. Palanker for inspiring discussions and valuable comments, and C. Rhoades for labeling the data. ER acknowledges support from grants AFOSR/DARPA FA9550-12-1-0411 and FA9550-13-1-0036. We thank the Stanford Data Science Initiative for financial support and NVIDIA Corporation for the donation of the Tesla K40 GPU we used. Data collection was supported by National Eye Institute grants EY017992 and EY018003 (EJC). Please contact EJC (ej@stanford.edu) for access to the data.

References
[1] D. Arthur and S. Vassilvitskii.
k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.

[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[3] E. J. Chichilnisky and R. S. Kalmar. Functional asymmetries in ON and OFF ganglion cells of primate retina. The Journal of Neuroscience, 22(7):2737–2747, 2002.

[4] A. Coates and A. Y. Ng. The importance of encoding versus training with sparse coding and vector quantization. In International Conference on Machine Learning (ICML), 2011.

[5] A. Coates, A. Y. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), pages 215–223, 2011.

[6] D. M. Dacey. Origins of perception: retinal ganglion cell diversity and the creation of parallel visual pathways. In The Cognitive Neurosciences, pages 281–301. MIT Press, 2004.

[7] D. M. Dacey and O. S. Packer. Colour coding in the primate retina: diverse cell types and cone-specific circuitry. Current Opinion in Neurobiology, 13:421–427, 2003.

[8] D. M. Dacey and M. R. Petersen. Dendritic field size and morphology of midget and parasol cells of the human retina. PNAS, 89:9666–9670, 1992.

[9] S. H. DeVries and D. A. Baylor. Mosaic arrangement of ganglion cell receptive fields in rabbit retina. Journal of Neurophysiology, 78(4):2048–2060, 1997.

[10] M. Greschner, J. Shlens, C. Bakolitsa, G. D. Field, J. L. Gauthier, L. H. Jepson, A. Sher, A. M. Litke, and E. J. Chichilnisky. Correlated firing among major ganglion cell types in primate retina. The Journal of Physiology, 589:75–86, 2011.

[11] L. H. Jepson, P. Hottowy, G. A. Wiener, W. Dabrowski, A. M.
Litke, and E. J. Chichilnisky. High-fidelity reproduction of spatiotemporal visual signals for retinal prosthesis. Neuron, 83:87–92, 2014.

[12] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980–2998, 2010.

[13] P. H. Li, J. L. Gauthier, M. Schiff, A. Sher, D. Ahn, G. D. Field, M. Greschner, E. M. Callaway, A. M. Litke, and E. J. Chichilnisky. Anatomical identification of extracellularly recorded cells in large-scale multielectrode recordings. The Journal of Neuroscience, 35(11):4663–4675, 2015.

[14] A. M. Litke, N. Bezayiff, E. J. Chichilnisky, W. Cunningham, W. Dabrowski, A. A. Grillo, M. I. Grivich, P. Grybos, P. Hottowy, S. Kachiguine, R. S. Kalmar, K. Mathieson, D. Petrusca, M. Rahman, and A. Sher. What does the eye tell the brain? Development of a system for the large-scale recording of retinal output activity. IEEE Transactions on Nuclear Science, 51(4):1434–1440, 2004.

[15] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In International Conference on Machine Learning (ICML), pages 689–696, 2009.

[16] D. N. Mastronarde. Correlated firing of cat retinal ganglion cells. I. Spontaneously active inputs to X- and Y-cells. Journal of Neurophysiology, 49(2):303–324, 1983.

[17] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems (NIPS), pages 1177–1184, 2007.

[18] L. C. L. Silveira and V. H. Perry. The topography of magnocellular projecting ganglion cells (M-ganglion cells) in the primate retina.
Neuroscience, 40(1):217–237, 1991.