{"title": "A blind sparse deconvolution method for neural spike identification", "book": "Advances in Neural Information Processing Systems", "page_first": 1440, "page_last": 1448, "abstract": "We consider the problem of estimating neural spikes from extracellular voltage recordings. Most current methods are based on clustering, which requires substantial human supervision and produces systematic errors by failing to properly handle temporally overlapping spikes. We formulate the problem as one of statistical inference, in which the recorded voltage is a noisy sum of the spike trains of each neuron convolved with its associated spike waveform. Joint maximum-a-posteriori (MAP) estimation of the waveforms and spikes is then a blind deconvolution problem in which the coefficients are sparse. We develop a block-coordinate descent method for approximating the MAP solution. We validate our method on data simulated according to the generative model, as well as on real data for which ground truth is available via simultaneous intracellular recordings. In both cases, our method substantially reduces the number of missed spikes and false positives when compared to a standard clustering algorithm, primarily by recovering temporally overlapping spikes. The method offers a fully automated alternative to clustering methods that is less susceptible to systematic errors.", "full_text": "A blind deconvolution method for neural spike\n\nChaitanya Ekanadham\n\nCourant Institute\n\nNew York University\nNew York, NY 10012\n\nchaitu@math.nyu.edu\n\nidenti\ufb01cation\n\nDaniel Tranchina\nCourant Institute\n\nNew York University\nNew York, NY 10012\n\nAbstract\n\nCenter for Neural Science\n\nHoward Hughes Medical Institute\n\nEero P. Simoncelli\nCourant Institute\n\nNew York University\nNew York, NY 10012\n\nWe consider the problem of estimating neural spikes from extracellular voltage\nrecordings. 
Most current methods are based on clustering, which requires substantial human supervision and systematically mishandles temporally overlapping spikes. We formulate the problem as one of statistical inference, in which the recorded voltage is a noisy sum of the spike trains of each neuron convolved with its associated spike waveform. Joint maximum-a-posteriori (MAP) estimation of the waveforms and spikes is then a blind deconvolution problem in which the coefficients are sparse. We develop a block-coordinate descent procedure to approximate the MAP solution, based on our recently developed continuous basis pursuit method. We validate our method on simulated data as well as real data for which ground truth is available via simultaneous intracellular recordings. In both cases, our method substantially reduces the number of missed spikes and false positives when compared to a standard clustering algorithm, primarily by recovering overlapping spikes. The method offers a fully automated alternative to clustering methods that is less susceptible to systematic errors.\n\n1 Introduction\n\nThe identification of individual spikes in extracellularly recorded voltage traces is a critical step in the analysis of neural data for much of systems neuroscience. One or more electrodes are embedded in neural tissue, and the voltage(s) are recorded as a function of time, with the intention of recovering the spiking activity of one or more nearby cells. Each spike appears with a stereotyped waveform, whose shape depends on the cell morphology, the filtering properties of the medium and the electrode, and the cell's position relative to the electrode. The "spike sorting" problem is that of identifying distinct cells and their respective spike times. 
This is a difficult statistical inverse problem, since one typically does not know the number of cells, the shapes of their waveforms, or the frequency or temporal dynamics of their spike trains (see [1] for a review).\n\nThe observed voltage is well-described as a linear superposition of the spike waveforms [1, 2, 3, 4], and thus the problem bears resemblance to the classic sparse decomposition problem in signal processing and machine learning, where the neural waveforms are the "features" and the spike trains are the "coefficients", with the additional constraint that the features are unknown but convolutional, and the coefficients are mostly zero except for a few that are close to one. This sparse blind deconvolution problem arises in a variety of contexts other than spike sorting, including radar [5], seismology [6], and acoustic processing [7, 8].\n\nMost current approaches to spike sorting (with notable exceptions [9, 10]) can be summarized in three steps ([1, 2]): (1) identify segments of neural activity (e.g., by thresholding the voltage), (2) determine a low-dimensional feature representation for these segments (e.g., PCA), (3) cluster the segments in the feature space (e.g., k-means, mixture of Gaussians). Fig. 1 illustrates a simple version of this procedure. Segments within the same cluster are interpreted as spikes of a single neuron, whose waveform is estimated by the cluster centroid. This method works well in identifying temporally isolated spikes whose waveforms are easily distinguishable from background noise and each other. However, it generally fails for segments containing more than one spike (either from the same or different neurons), because these segments do not lie close to the clusters of any individual cell [1]. This is illustrated in Figs. 1(b), 1(c), and 1(d). 
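For concreteness, the three-step pipeline above can be sketched as follows. This is an illustrative reimplementation, not the code used in the paper; the threshold, window length, and k are arbitrary choices, and a tiny k-means is written out to keep the sketch self-contained:

```python
import numpy as np

def detect_segments(v, thresh, half_win):
    """Step 1: find threshold-crossing peaks and cut a window around each."""
    peaks = [t for t in range(half_win, len(v) - half_win)
             if v[t] > thresh and v[t] == v[t - half_win:t + half_win].max()]
    return np.array([v[t - half_win:t + half_win] for t in peaks])

def pca_project(segments, d):
    """Step 2: project the segments onto the top d principal components."""
    centered = segments - segments.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:d].T

def kmeans(x, k, iters=50, seed=0):
    """Step 3: Lloyd's algorithm; returns one cluster label per segment."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels
```

Each segment is then assigned to the cluster of its label, and the cluster centroid serves as the waveform estimate; as the text notes, segments containing superimposed spikes fall between clusters and are systematically mislabeled by this pipeline.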
Several state-of-the-art methods improve upon or combine one or more of these steps (e.g., [11, 12]), but remain susceptible to these errors because they still rely on clustering. These errors are systematic, and can have important scientific consequences. For example, an unresolved question in neuroscience is whether the occurrence of correlated or synchronous spikes carries specialized information [13, 14]. In order to experimentally address this question, one needs to record from multiple neurons, and to accurately obtain their joint spiking activity. A method that systematically fails for synchronous spikes (e.g., by missing them altogether, or by incorrectly assigning them to another neuron) will lead to erroneous conclusions.\n\nAlthough the limitations of clustering methods have been known within the neuroscience community for some time [1, 2, 15, 16], they remain ubiquitous. Practitioners have developed a wide range of manual adjustments to overcome these limitations, from adjusting the electrode position to isolate a single neuron, to manually performing the clustering for spike identification. However, previous studies have shown that there is great variability in manual sorting results [17], and that human choices for cluster parameters are often suboptimal [18]. As such, there is a need for a fully automated sorting method that avoids these errors. 
This need is becoming ever more urgent as the use of multi-electrode arrays increases ([19]): manual parameter selection for a multi-dimensional clustering problem becomes more difficult and time-consuming as the number of electrodes grows.\n\nWe formulate the spike sorting problem as a Bayesian estimation problem by incorporating a prior model for the spikes and assuming a linear-Gaussian model for the recording given the spikes [2, 4]. Although the generative model is simple, inferring the spike times and waveforms is challenging. We approximate the most likely spikes and waveform shapes given the recording (i.e. the maximum-a-posteriori, or MAP solution), by alternating between solving for the spike times while fixing the waveforms and vice versa. Solving for optimal spike times and amplitudes with fixed waveform shapes is itself an NP-hard problem, and we employ a novel method called continuous basis pursuit [20, 21], combined with iterative reweighting techniques, to approximate its solution. We compare our method with clustering on simulated and real data, demonstrating substantial reduction in spike identification errors (both misses and false positives), particularly when spikes overlap in the signal.\n\n2 Model of voltage trace\n\nThe major deficiency of clustering is that each time segment is modeled as a noisy version of a single centered waveform rather than a noisy superposition of multiple, time-shifted waveforms. A simple generative model for the observed voltage trace V(t) is summarized as follows:\n\n$V(t) = \sum_{n=1}^{N} \sum_{i=1}^{K_n} a_{ni} W_n(t - \tau_{ni}) + \eta(t)$   (1)\n\n$\{\tau_{ni}\}_{i=1}^{K_n} \sim \mathrm{PoissonProcess}(\lambda_n), \quad \{a_{ni}\}_{i=1}^{K_n} \sim \mathcal{N}(1, \epsilon_n^2), \quad n = 1, \ldots, N$   (2)\n\nIn words, the spikes are Poisson processes with known rates $\{\lambda_n\}$ and amplitudes independently normally distributed about unity. 
The trace is the sum of convolutions of the spikes with their respective waveforms $\mathbf{W} \equiv \{W_n(t)\}_{n=1}^N$, along with Gaussian noise $\eta(t)$ (note: other log-concave noise distributions can be used). Here, K_n is the (Poisson-distributed) number of spikes of the n'th waveform in the signal. Thus, the model accounts for superimposed spikes, variability in spike amplitude, as well as background noise. The model can easily be generalized to multielectrode recordings by making V(t) and the W_n(t)'s vector-valued, but to simplify notation we assume a single electrode. Note also that since the model describes the full voltage trace, it does not require a thresholding/windowing preprocessing stage, which can lead to additional artifacts (e.g., Fig 1(d)).\n\nFigure 1: Illustration of clustering on simulated data. (a) Threshold/windowing procedure. Peaks are identified using a threshold (horizontal lines) and windows are drawn about them (vertical lines) to identify segments. (b) Plot of the segments projected onto the first two principal components. Color indicates the output of k-means clustering (k = 3). (c) The top-left plot shows the true waveforms used in this example. The other plots indicate the waveforms whose projections are the black points in (b). (d) Another example of simulated data with a single biphasic waveform (not shown). The projections of the spikes can have a non-Gaussian distribution in PC space. Two clusters arise because the waveform has two peaks around which the segments can be centered.\n\nThe priors on the spike trains account for the observed variability in spike amplitudes and average spike rates with minimal assumptions. 
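A minimal simulation of this generative model can be sketched as follows. The rates, waveform shapes, and noise level are arbitrary illustrative values, and exact Poisson event times are approximated by drawing a Poisson count with uniformly placed events:

```python
import numpy as np

rng = np.random.default_rng(0)
T, dt = 2.0, 1e-3                       # 2 s trace sampled at 1 kHz
t = np.arange(0.0, T, dt)
rates = [10.0, 10.0]                    # lambda_n: Poisson rates (Hz)
eps = 0.1                               # epsilon_n: amplitude std about 1
waveforms = [np.diff(np.exp(-np.linspace(-3, 3, 30) ** 2)),  # biphasic shape
             np.exp(-np.linspace(-3, 3, 40) ** 2)]           # monophasic shape

v = np.zeros_like(t)
for lam, w in zip(rates, waveforms):
    k = rng.poisson(lam * T)                         # K_n ~ Poisson(lambda_n T)
    taus = rng.integers(0, len(t) - len(w), size=k)  # event times, in samples
    amps = rng.normal(1.0, eps, size=k)              # a_ni ~ N(1, eps^2)
    for tau, a in zip(taus, amps):
        v[tau:tau + len(w)] += a * w                 # superpose shifted waveforms
v += rng.normal(0.0, 0.02, size=len(t))              # additive Gaussian noise eta(t)
```

Because spikes are added into the trace rather than assigned to windows, events from different cells (or the same cell) may overlap freely, which is exactly the case that the clustering pipeline mishandles.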
We are interested in the maximum-a-posteriori (MAP) solution of the waveforms and spike times and amplitudes given the observed voltage trace V(t):\n\n$\arg\max_{\{a_{ni}\},\{\tau_{ni}\},\mathbf{W}} P(\{a_{ni}\}, \{\tau_{ni}\}, \mathbf{W} \mid V(t)) = \arg\max_{\{a_{ni}\},\{\tau_{ni}\},\mathbf{W}} \log P(V(t) \mid \{a_{ni}\}, \{\tau_{ni}\}, \mathbf{W}) + \log P(\{a_{ni}\}, \{\tau_{ni}\}, \mathbf{W})$   (3)\n\nIn the following sections, we describe a procedure to approximate this solution.\n\n3 Inference methods\n\n3.1 Objective function\n\nMAP estimation under the model described in Eq. (2) and Eq. (1) boils down to solving:\n\n$\min_{\{a_{ni}\},\{\tau_{ni}\},\mathbf{W}} \frac{1}{2} \| V(t) - \sum_{n,i} a_{ni} W_n(t - \tau_{ni}) \|_{2,\Sigma}^2 + \sum_{n,i} \left[ \frac{(a_{ni}-1)^2}{2\epsilon_n^2} + \frac{1}{2}\log(2\pi\epsilon_n^2) - \log(\lambda_n) \right]$   (4)\n\nwhere $\|\vec{x}\|_{2,\Sigma} = \|\Sigma^{-1/2}\vec{x}\|_2$ and $\Sigma$ is the noise covariance. Direct inference of the parameters is a highly nonlinear and intractable problem. However, we can make the problem tractable by using a linear representation for time-shifted waveforms. The simplest such representation uses a dictionary containing discretely time-shifted copies of the waveforms themselves $\{W_n(t - i\Delta)\}_{n,i}$. We chose to use a more accurate and efficient dictionary to represent continuously time-shifted waveforms in the context of sparse optimization, which relies on trigonometrically varying coefficients [21]:\n\n$\sum_{n=1}^{N} \sum_{i=1}^{K_n} a_{ni} W_n(t - \tau_{ni}) \approx \sum_{n,i} C_n(t - i\Delta)\, x_{ni1} + U_n(t - i\Delta)\, x_{ni2} + V_n(t - i\Delta)\, x_{ni3} = (\Phi_{\mathbf{W}}\vec{x})(t),$\n\n$\text{with } x_{ni1} = a_{ni}, \quad x_{ni2} = a_{ni} r_n \cos\!\left(\frac{2\tau_{ni}\theta_n}{\Delta}\right), \quad x_{ni3} = a_{ni} r_n \sin\!\left(\frac{2\tau_{ni}\theta_n}{\Delta}\right)$   (5)\n\nThe dictionary $\Phi_{\mathbf{W}}$ contains shifted copies of the functions C_n(t), U_n(t), V_n(t) that approximate the space of time-shifted waveforms. 
The functions C_n(t), U_n(t), and V_n(t), as well as the constants r_n and theta_n, depend on the waveform W_n(t) and are explained in Fig. 2(b). We can then solve the following optimization problem:\n\n$\min_{\vec{x},\mathbf{W}} F(\vec{x}, \mathbf{W}) \quad \text{such that} \quad x_{ni2} \geq r_n \cos(\theta_n)\, x_{ni1} \ \text{ and } \ \sqrt{x_{ni2}^2 + x_{ni3}^2} \leq r_n x_{ni1}, \quad \forall n, i$   (6)\n\n$\text{where } F(\vec{x}, \mathbf{W}) = \frac{1}{2} \| V(t) - (\Phi_{\mathbf{W}}\vec{x})(t) \|_{2,\Sigma}^2 - \sum_{n,i} \log\!\left( (1 - \lambda_n\Delta)\,\delta(x_{ni1}) + (\lambda_n\Delta)\,\phi_{1,\epsilon_n^2}(x_{ni1}) \right)$\n\nand $\phi_{\mu,\sigma^2}(\cdot)$ is the Gaussian density function. The constraints on $\vec{x}$ in Eq. (6) ensure that each triplet $(x_{ni1}, x_{ni2}, x_{ni3})$ is consistent with the mapping defined in Eq. 5, with $x_{ni1}$ being the amplitude and $\frac{\Delta}{2\theta_n} \mathrm{atan}(x_{ni3}/x_{ni2})$ being the time-shift associated with the waveform W_n(t) (see [21] for a detailed development of this approach). The constrained region, denoted by C, is convex and is illustrated as sections of cones in Fig. 2(c). Note that we have used a Bernoulli discrete-time process with spacing $\Delta$ (matching the interpolation dictionary spacing) to approximate the Poisson process described in Eq. (2). Even with this linear representation, the problem is not jointly convex in $\mathbf{W}$ and $\vec{x}$, and is not convex in $\vec{x}$ for fixed $\mathbf{W}$. The optimization of Eq. (6) resembles that of [22] and other sparse-coding objective functions, with the following important differences: (1) the dictionary is translation-invariant and interpolates continuous time-shifts, (2) there is a constraint on the coefficients $\vec{x}$ due to the interpolation, and (3) there is a nonconvex mixture prior on the coefficients to model the spike amplitudes. We propose a block coordinate descent procedure to solve Eq. (6). After initializing $\mathbf{W}$ randomly, we iterate the following steps:\n\n1. Given $\mathbf{W}$, approximately solve for $\vec{x}$.\n2. Perform a rescaling $x_{nij} \leftarrow x_{nij}/z_n$ and $W_n(t) \leftarrow z_n W_n(t)$, where the $z_n$'s are chosen to optimize $F([x_{nij}/z_n], \{z_n W_n(t)\})$.\n3. Given $\vec{x}$, solve for $\mathbf{W}$, constraining $\|W_n(t)\|_2$ to be less than or equal to its current value.\n\nThe first step minimizes successive convex approximations of F and is the most involved of the three. The second is guaranteed to decrease F and amounts to N scalar optimizations. The final step minimizes the first term with respect to the waveforms while keeping the second term constant, and amounts to an L2-constrained least squares problem (ridge regression) that can be solved very efficiently. The following sections provide details of each of the steps.\n\n3.2 Solve spikes given waveforms\n\nIn this step we wish to minimize the function $F(\cdot, \mathbf{W})$ while ensuring that the solution lies in the convex set C. However, this function is nonconvex and nonsmooth due to the second term in Eq. (6). This especially causes problems when the current estimates of $\mathbf{W}$ are far from the optimal values, since in this case there are many intermediate amplitudes between 0 and 1. To get around this, we replace each summand in the second term by a relaxation:\n\n$G(x_{ni1}) = -\log\!\left( (1 - \lambda_n\Delta) \int_0^\infty \frac{1}{\gamma} e^{-x_{ni1}/\gamma} P(\gamma)\, d\gamma + (\lambda_n\Delta)\,\phi_{1,\epsilon_n^2}(x_{ni1}) \right)$   (7)\n\nFigure 2: (a) Illustration of the circle approximation in [21]. The manifold M of translates of a function f(t) lies on the hypersphere since translation preserves norm (black curve). This can be locally approximated by a circle (red curve). The approximation is exact at 3 equally-spaced points (black dots). (b) Visualization in the plane on which the three translates of f(t) lie. The quantities r and theta can be derived analytically for a fixed f(t) and spacing $\Delta$. 
(c) These circle approximations can be linked together to form a piecewise-circular approximation of the entire manifold.\n\nThis relaxation replaces the delta function at 0 with a mixture of exponential distributions. We chose the parameter $\gamma$ to be Gamma-distributed about a fixed small value. We solve this approximation using an iterative reweighting scheme. The weights are initialized to be uniform, $w^{(0)}_{ni} = \lambda_n, \forall n, i$. Then the following updates are iterated:\n\n$\vec{x}^{(t+1)} \leftarrow \arg\min_{\vec{x} \in C} \frac{1}{2} \| V(t) - (\Phi_{\mathbf{W}}\vec{x})(t) \|_2^2 + \sum_{n,i} w^{(t)}_{ni} |x_{ni1}|$   (8)\n\n$w^{(t+1)}_{ni} \leftarrow \frac{G(x^{(t+1)}_{ni1})}{x^{(t+1)}_{ni1}}$   (9)\n\nEq. (8) is a convex optimization that can be solved efficiently. The weights are updated so that the second term in Eq. (8) is exactly the negative log prior probability of the previous solution $\vec{x}^{(t)}$. If a coefficient is 0, its weight is $\infty$ and the corresponding basis function is discarded. Such reweighting procedures have been used to optimize a nonconvex function by a series of convex optimizations [23, 24, 25]. Although there is no convergence guarantee, we find that it works well in practice.\n\n3.3 Solve rescaling factors\n\nThe first term of $F(\vec{x}, \{W_n(t)\})$ does not change by much if one divides the coefficients $x_{nij}$ by some $z_n$ and multiplies the corresponding waveform by $z_n$ (see footnote 1). The second term does change under such a rescaling. In order to avoid the solution where the waveforms/coefficients become arbitrarily large/small, respectively, we perform a rescaling in a separate step and then optimize the waveform shapes subject to a fixed norm constraint (described in the next section). 
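The reweighting loop of step 1 (Eqs. (8)-(9)) can be sketched in simplified form. This sketch is not the paper's solver: it drops the cone constraints and the prior-derived weight update $G(x)/x$, substituting the classic $1/(|x| + \epsilon)$ reweighting of [24] and a proximal-gradient (ISTA) inner solver in place of a general convex solver:

```python
import numpy as np

def ista_weighted_l1(A, y, w, n_iter=500):
    """Inner solve: min_x 0.5*||y - Ax||^2 + sum_i w_i*|x_i| by proximal gradient."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - w / L, 0.0)  # soft threshold
    return x

def reweighted_l1(A, y, lam=0.1, outer=5, eps=1e-3):
    """Outer loop: re-solve with weights sharpened around the last solution."""
    w = lam * np.ones(A.shape[1])          # uniform initial weights, as in Eq. (8)
    for _ in range(outer):
        x = ista_weighted_l1(A, y, w)
        w = lam / (np.abs(x) + eps)        # small coefficients get large weights
    return x
```

As in the text, coefficients driven to zero acquire very large weights and are effectively discarded from subsequent convex solves.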
Since the second term decomposes into terms that are each only dependent on one $z_n$, we can independently solve the following scalar optimizations numerically:\n\n$z_n \leftarrow \arg\max_{z>0} \sum_i \log\!\left( (1 - \Delta\lambda_n) \frac{1}{\gamma} e^{-x_{ni1}/(z\gamma)} + \Delta\lambda_n\, \phi_{1,\epsilon_n^2}\!\left(\frac{x_{ni1}}{z}\right) \right), \quad n = 1, \ldots, N$   (10)\n\nThese are essentially maximum likelihood estimates of the scale factors given fixed coefficients and waveform shapes. One then performs the updates:\n\n$x_{nij} \leftarrow \frac{x_{nij}}{z_n} \quad \forall n, i, j$   (11)\n\n$W_n(t) \leftarrow z_n W_n(t) \quad \forall n$   (12)\n\nFootnote 1: If $\Phi_{\mathbf{W}}$ is linear in $\mathbf{W}$, there is no change. For our choice of $\Phi_{\mathbf{W}}$, there is a small change of order $O(\Delta)$.\n\nThis step is guaranteed not to increase the objective in Eq. (6), since the first term is held constant (up to a small error term, see footnote) and the second term cannot increase.\n\n3.4 Solve waveforms given spikes\n\nGiven a set of coefficients $\vec{x}$, we can optimize waveform shapes by solving:\n\n$\min_{\mathbf{W}: \|W_i(t)\|_2 \leq k_i} \frac{1}{2} \| V(t) - (\Phi_{\mathbf{W}}\vec{x})(t) \|_2^2$   (13)\n\nwhere $k_i$ is the current norm of $W_i(t)$. The constraints ensure that only the waveform shapes change (ideally, we would like the norm to be held fixed, but we relax it to an inequality to retain convexity), leaving any changes in scale to the previous step. Since $(\Phi_{\mathbf{W}}\vec{x})(t)$ is approximately a linear function of the waveforms, Eq. (13) is a standard ridge regression problem. Efficient algorithms exist for solving this problem in its dual form ([26]). This step is guaranteed to decrease the objective in Eq. (6), since the second term is held constant and the first term can only decrease.\n\n4 Results\n\nWe applied our method to two data sets. The first was simulated according to the generative model described in Eqs. (1) and (2). The second is real data from Harris et al. 
([18]) consisting of simultaneous paired intracellular/extracellular recordings. The intracellular recording provides ground truth spikes for one of the cells in the extracellular recording.\n\n4.1 Simulated data\n\nWe obtained three waveforms from retinal recordings made in the Chichilnisky lab at the Salk Institute (shown in Fig. 3(a)). Three Poisson spike trains were sampled independently with rate $(1-\rho)\lambda_0$ with $\lambda_0 = 10$ Hz. To introduce a correlation of $\rho = 1/3$, we sampled another Poisson spike train with rate $\rho\lambda_0$ and added these spikes (with random jitter) to each of the previous three trains. Spike amplitudes were drawn from $\mathcal{N}(1, 0.1^2)$. The spikes were convolved with the waveforms and Gaussian white noise was added (with $\sigma$ six times the smallest waveform amplitude). For clustering, the original trace was thresholded to identify segments (the threshold was varied in order to see the error tradeoff). PCA was applied and the leading PCs explaining 95% of the total variance were retained. K-means clustering was then applied (with k = 3) in the reduced space.\n\nTo reduce computational cost, we applied our method to disjoint segments of the trace, which were split off whenever activity was less than $3\sigma$ for more than half the waveform duration (about 4 ms). The waveforms were initialized randomly and $P(\gamma)$ was Gamma-distributed with mean 0.0005 and coefficient of variation 0.25 (in Eq. (7)) for all experiments. The waveforms were allowed to change in length by adding (removing) padding on the ends on each iteration if the values exceeded (did not exceed) 5% of the peak amplitude (similar to [7]). Padding was added in increments of 10% of the current waveform length. Convex optimizations were performed using the CVX package ([27]). The learned waveforms and spike amplitude distributions are shown in Fig. 3. 
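The correlated spike-train construction described above can be sketched as follows; `poisson_times` and the 1 ms jitter scale are our illustrative choices, not taken from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(2)
T, lam0, rho = 100.0, 10.0, 1.0 / 3.0   # duration (s), base rate (Hz), correlation

def poisson_times(rate, T, rng):
    """Sample event times of a homogeneous Poisson process on [0, T]."""
    n = rng.poisson(rate * T)                   # number of events
    return np.sort(rng.uniform(0.0, T, size=n))  # uniform, then ordered

# independent component of each train, rate (1 - rho) * lam0
trains = [poisson_times((1 - rho) * lam0, T, rng) for _ in range(3)]

# shared component, rate rho * lam0, added to every train with small jitter
shared = poisson_times(rho * lam0, T, rng)
trains = [np.sort(np.concatenate([tr, shared + rng.normal(0.0, 1e-3, len(shared))]))
          for tr in trains]
```

Each resulting train is again Poisson with total rate lam0, and any pair of trains shares a fraction rho of (jittered) common events, producing the synchronous spikes that stress the clustering baseline.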
The amplitude distributions are well-matched to the generative distributions (shown in red). To evaluate performance, we counted missed spikes (relative to the number of true spikes) and false positives (relative to the number of predicted spikes) for clustering and our method. We varied the segment-finding threshold for clustering, and the amplitude threshold for our algorithm. The error tradeoff is shown in Fig. 4(a), and indicates that our method reduces both types of errors.\n\nTo visualize the errors, we chose optimal thresholds for each method (yielding the smallest number of misses and false positives), and then projected all segments used in clustering onto the first two principal components. We indicate by dots, open circles, and crosses the hits, misses, and false positives, respectively (with colors indicating the waveform). For the same segments, we illustrate the behavior of our method in the same space. Note that unlike clustering, our method is allowed to assign more than one spike to each segment. The visualization is shown in Figures 4(b) and 4(c), and shows how clustering fails to account for the superimposed spikes, while our method eliminates a large portion of these errors. We found that this improvement was robust to the amount of noise added to the original trace (not shown).\n\nFigure 3: (a) Three waveforms used in simulations. (b),(c),(d) Histograms of the spike amplitudes learned by our algorithm for the blue, green, and red waveforms, respectively. The amplitudes were converted into $\sigma$ units by multiplying them by the corresponding waveform amplitudes, then dividing by the noise standard deviation. The red line indicates the generative density, corresponding to a Gaussian with mean 1 and standard deviation 0.1.\n\nFigure 4: (a) Tradeoff of misses and false positives as the segment-identification threshold in clustering is varied (blue), and the amplitude threshold for our method (red) is varied. Diagonal lines indicate surfaces with equal total error. (b),(c) Visualization of spike sorting errors for clustering (b) and our method (c). Each point is a threshold-crossing segment in the signal, projected onto the first two principal components. Dots represent segments whose composite spikes were all correctly identified, with the color specifying the waveform (see Fig. 3(a)). Open circles and crosses represent misses and false positives, respectively. The thresholds were optimized for each method, and correspond to the enlarged dots in (a).\n\n4.2 Real data\n\nWe used one electrode from the tetrode data in [18] to simplify our analysis. The raw trace was high-pass filtered (800Hz) to remove slow drift. The noise standard deviation was estimated from regions not exceeding three times the overall standard deviation. We then repeated the same analysis as for the simulated data. 
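The preprocessing just described can be sketched with numpy alone. The moving-average subtraction below is a crude stand-in for the 800 Hz high-pass filter (an assumption, not the filter used in the paper), and `noise_std` follows the quiet-region heuristic in the text:

```python
import numpy as np

def highpass_ma(v, win):
    """Crude high-pass: subtract a centered moving average, removing slow drift."""
    kernel = np.ones(win) / win
    return v - np.convolve(v, kernel, mode="same")

def noise_std(v, factor=3.0):
    """Estimate noise sigma from samples not exceeding factor * overall std."""
    quiet = v[np.abs(v) < factor * v.std()]
    return quiet.std()
```

A proper implementation would use a causal or zero-phase Butterworth filter at the stated cutoff; the moving-average version only illustrates the drift-removal idea.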
The resulting waveforms and coefficient histograms are shown in Figure 5. Unlike the simulated example, the spike amplitude distributions are bimodal in nature, despite the prior amplitude distribution containing only one Gaussian. We first focus on the high-amplitude groups (2 and 4), both of which are well-separated from their low-amplitude counterparts (1 and 3), suggesting that an appropriately chosen threshold would provide accurate spike identification for the ground-truth cell (4). Figure 6(a) confirms this, showing that our method provides substantial reduction in misses/false positives. Figures 6(b) and 6(c) show that, as before, the majority of this reduction is accounted for by recovering spikes overlapping with those of another cell (group 2). The low-amplitude groups (1 and 3) could arise from background cells whose waveforms look like scaled-down versions of those of the foreground cells 2 and 4, thus creating secondary "lumps" in the amplitude distributions. The projections of the events in these groups are labeled in Figures 6(b) and 6(c), showing that it is unclear whether they arise from noise or one or two background cells. It is up to the user whether to interpret these badly-isolated groups as cells.\n\nFigure 5: (a) Two waveforms learned from CBP. (b),(c) Distributions of the amplitude values for the blue and green waveform, respectively. The numbers label distinct groups of amplitudes that could be treated as spikes of a single cell. Group 4 corresponds to the ground truth cell. Group 2 corresponds to another foreground cell. Groups 1 and 3 likely correspond to a mixture of background cell activity and noise. The groups are labeled in PC-space in Figures 6(b) and 6(c).\n\nFigure 6: (a) Error tradeoff as in Fig. 4(a). The blue, green, and red curves are results of k-means clustering for different k. (b) Illustration of clustering errors in PC-space, with k = 4 and a threshold corresponding to the large red dot in (a). (c) Errors for our method with threshold corresponding to the large black dot. The numbers show the approximate location in PC-space of the amplitude groups demarcated in Figures 5(b) and 5(c).\n\n5 Discussion\n\nWe have formulated the spike sorting problem as a maximum-a-posteriori (MAP) estimation problem, assuming a linear-Gaussian likelihood of the observed trace given the spikes and a Poisson process prior on the spikes. Unlike clustering methods, the model explicitly accounts for overlapping spikes, translation-invariance, and variability in spike amplitudes. Unlike other methods that handle overlapped spikes (e.g., [10]), our method jointly learns waveforms and spikes within a unified framework. We derived an iterative procedure based on block-coordinate descent to approximate the MAP solution. We showed empirically on simulated data that our method outperforms the standard clustering approach, particularly in the case of superimposed spikes. 
We also showed that our method yields an improvement on a real data set with ground truth, despite the fact that there are similar waveform shapes with different amplitudes. The majority of improvement in this case is also accounted for by identifying superimposed spikes. Our method has only a few parameters that are stable across a variety of conditions, thus addressing the need for an automated method for spike sorting that is not susceptible to systematic errors.\n\nReferences\n\n[1] M. S. Lewicki. A review of methods for spike sorting: the detection and classification of neural action potentials. Network, 9(4):R53\u2013R78, Nov 1998.\n\n[2] M. Sahani. Latent variable models for neural data analysis. PhD thesis, California Institute of Technology, Pasadena, California, 1999.\n\n[3] M. Wehr, J. S. Pezaris, and M. Sahani. Simultaneous paired intracellular and tetrode recordings for evaluating the performance of spike sorting algorithms. Neurocomputing, 26-27:1061\u20131068, 1999.\n\n[4] Maneesh Sahani, John S. Pezaris, and Richard A. Andersen. On the separation of signals from neighboring cells in tetrode recordings. In Advances in Neural Information Processing Systems 10, pages 222\u2013228. MIT Press, 1998.\n\n[5] P. H. van Cittert. Zum Einfluss der Spaltbreite auf die Intensit\u00e4tsverteilung in Spektrallinien. II. Zeitschrift f\u00fcr Physik A Hadrons and Nuclei, 69:298\u2013308, 1931. 10.1007/BF01391351.\n\n[6] J. Mendel. Optimal Seismic Deconvolution: An Estimation Based Approach. Academic Press, 1983.\n\n[7] Evan Smith and Michael S. Lewicki. Efficient coding of time-relative structure using spikes. Neural Computation, 17(1):19\u201345, Jan 2005.\n\n[8] Roger Grosse, Rajat Raina, Helen Kwong, and Andrew Y. Ng. Shift-invariant sparse coding for audio classification. In UAI, 2007.\n\n[9] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. 
Spatio-temporal correlations and visual signaling in a complete neuronal population. Nature, 454(7206):995–999, Aug 2008.

[10] Jason S. Prentice, Jan Homann, Kristina D. Simmons, Gašper Tkačik, Vijay Balasubramanian, and Philip C. Nelson. Fast, scalable, Bayesian spike identification for multi-electrode arrays. PLoS ONE, 6(7):e19884, July 2011.

[11] R. Quian Quiroga, Z. Nadasdy, and Y. Ben-Shaul. Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering. Neural Computation, 16:1661–1687, August 2004.

[12] Ki Yong Kwon and K. Oweiss. Wavelet footprints for detection and sorting of extracellular neural action potentials. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 609–612, May 2011.

[13] Markus Meister, Jerome Pine, and Denis A. Baylor. Multi-neuronal signals from the retina: acquisition and analysis. Journal of Neuroscience Methods, 51(1):95–106, 1994.

[14] S. Nirenberg, S. M. Carcieri, A. L. Jacobs, and P. E. Latham. Retinal ganglion cells act largely as independent encoders. Nature, 411:698–701, 2001.

[15] R. Segev, J. Goodhouse, J. Puchalla, and M. J. Berry. Recording spikes from a large fraction of the ganglion cells in a retinal patch. Nature Neuroscience, 7(10):1154–1161, October 2004.

[16] C. Pouzat, O. Mazor, and G. Laurent. Using noise signature to optimize spike-sorting and to assess neuronal classification quality. J Neurosci Methods, 122(1):43–57, 2002.

[17] Frank Wood, Michael J. Black, Carlos Vargas-Irwin, Matthew Fellows, and John P. Donoghue. On the variability of manual spike sorting. IEEE Transactions on Biomedical Engineering, 51:912–918, 2004.

[18] Kenneth D. Harris, Darrell A. Henze, Jozsef Csicsvari, and Hajime Hirase.
Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. J Neurophysiol, 84:401–414, 2000.

[19] Emery N. Brown, Robert E. Kass, and Partha P. Mitra. Multiple neural spike train data analysis: state-of-the-art and future challenges. Nature Neuroscience, 7(5):456–461, May 2004.

[20] C. Ekanadham, D. Tranchina, and E. P. Simoncelli. Sparse decomposition of transformation-invariant signals with continuous basis pursuit. In Proc. Int'l Conference on Acoustics, Speech and Signal Processing (ICASSP), Los Angeles, CA, May 22-27, 2011. IEEE Sig. Proc. Society.

[21] C. Ekanadham, D. Tranchina, and E. P. Simoncelli. Sparse decomposition of translation-invariant signals with continuous basis pursuit. IEEE Transactions on Signal Processing, 2011. Accepted for publication.

[22] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, Jun 1996.

[23] Ingrid Daubechies, Ronald DeVore, Massimo Fornasier, and C. Sinan Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 63(1):1–38, 2010.

[24] Emmanuel J. Candès, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by reweighted l1 minimization. Journal of Fourier Analysis and Applications, 14:877–905, 2008.

[25] R. Chartrand and Wotao Yin. Iteratively reweighted algorithms for compressive sensing. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3869–3872, 2008.

[26] Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems 19, pages 801–808, 2007.

[27] M. Grant and S. Boyd.
CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx, October 2010.