{"title": "Robust Estimation of Neural Signals in Calcium Imaging", "book": "Advances in Neural Information Processing Systems", "page_first": 2901, "page_last": 2910, "abstract": "Calcium imaging is a prominent technology in neuroscience research which allows for simultaneous recording of large numbers of neurons in awake animals. Automated extraction of neurons and their temporal activity from imaging datasets is an important step in the path to producing neuroscience results. However, nearly all imaging datasets contain gross contaminating sources which could originate from the technology used, or the underlying biological tissue. Although past work has considered the effects of contamination under limited circumstances, there has not been a general framework treating contamination and its effects on the statistical estimation of calcium signals. In this work, we proceed in a new direction and propose to extract cells and their activity using robust statistical estimation. Using the theory of M-estimation, we derive a minimax optimal robust loss, and also find a simple and practical optimization routine for this loss with provably fast convergence. We use our proposed robust loss in a matrix factorization framework to extract the neurons and their temporal activity in calcium imaging datasets. We demonstrate the superiority of our robust estimation approach over existing methods on both simulated and real datasets.", "full_text": "Robust Estimation of Neural Signals in Calcium\n\nImaging\n\nHakan Inan 1\n\ninanh@stanford.edu\n\nMurat A. Erdogdu 2,3\n\nerdogdu@cs.toronto.edu\n\nMark J. Schnitzer 1,4\n\nmschnitz@stanford.edu\n\n1Stanford University 2Microsoft Research 3Vector Institute 4Howard Hughes Medical Institute\n\nAbstract\n\nCalcium imaging is a prominent technology in neuroscience research which allows\nfor simultaneous recording of large numbers of neurons in awake animals. 
Automated extraction of neurons and their temporal activity from imaging datasets is an\nimportant step in the path to producing neuroscience results. However, nearly all\nimaging datasets contain gross contaminating sources which could originate from\nthe technology used, or the underlying biological tissue. Although past work has\nconsidered the effects of contamination under limited circumstances, there has not\nbeen a general framework treating contamination and its effects on the statistical\nestimation of calcium signals. In this work, we proceed in a new direction and\npropose to extract cells and their activity using robust statistical estimation. Using\nthe theory of M-estimation, we derive a minimax optimal robust loss, and also\nfind a simple and practical optimization routine for this loss with provably fast\nconvergence. We use our proposed robust loss in a matrix factorization framework\nto extract the neurons and their temporal activity in calcium imaging datasets.\nWe demonstrate the superiority of our robust estimation approach over existing\nmethods on both simulated and real datasets.\n\n1 Introduction\n\nCalcium imaging has become an indispensable tool in systems neuroscience research. It allows\nsimultaneous imaging of the activity of very large ensembles of neurons in awake and even freely\nbehaving animals [3, 4, 6]. It relies on fluorescence imaging of intracellular calcium activity reported\nby genetically encoded calcium indicators. A crucial task for a neuroscientist working with calcium\nimaging is to extract signals (i.e. temporal traces and spatial footprints of regions of interest) from\nthe imaging dataset. This allows abstraction of useful information from a large dataset in a highly\ncompressive manner, losing little to no information. 
Automating this process is highly desirable, as\nmanual extraction of cells and their activities in large-scale datasets is prohibitively laborious, and\nprone to \ufb02awed outcomes.\nA variety of methods have been proposed for automated signal extraction in calcium imaging datasets,\nincluding the ones based on matrix factorization [13, 14, 15, 16], and image segmentation [1, 10].\nSome of these tools were tailored to two-photon calcium imaging, for which signal-to-noise ratio is\ntypically high, and the \ufb02uorescence background is fairly stable [3], whereas some targeted one-photon\nand microendoscopic calcium imaging [4, 5], which are often characterized by low SNR and large\nbackground \ufb02uctuations. Interestingly, least squares estimation has been a predominant paradigm\namong previous methods; yet there is no previous work addressing statistically the generic nature\nof calcium imaging datasets, which includes non-gaussian noise, non-cell background activity (e.g.\nneuropil), and overlapping cells not captured by algorithms (out-of-focus or foreground). As a\nconsequence, the impact of such impurities inherent in calcium imaging on the accuracy of extracted\nsignals has not been thoroughly investigated previously. This lack of focus on signal accuracy is\nworrisome as cell extraction is a fairly early step in the research pipeline, and \ufb02awed signals may\nlead to incorrect scienti\ufb01c outcomes.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fIn this work, we propose an approach which takes into account the practical nature of calcium imaging,\nand solves the signal extraction problem through robust estimation. First, we offer a mathematical\nabstraction of imaging datasets, and arrive at an estimator which is minimax robust, in the sense\nthat is prevalent in the \ufb01eld of robust estimation. 
We then use this M-estimator to solve a matrix\nfactorization problem, jointly yielding the temporal and spatial components of the extracted signals.\nThe main insight behind our robust estimation framework is that the signals present in imaging data\nare the superposition of many positive amplitude sources, and a lower amplitude noise component\nwhich could be well modeled by a normal distribution. That the majority of the components is\npositive stems from the fact that the underlying signals in calcium imaging are all made up of photons,\nand they elicit activity above a baseline as opposed to \ufb02uctuating around it. However, not all positive\nsources are cells that could be extracted by an algorithm (some could be neuropil, other noise, or\nnon-captured cells); hence we model them as generic gross non-negative contamination sources. By\nusing the machinery of robust estimation [7], we propose an M-estimator which is asymptotically\nminimax optimal for our setting.\nWe also propose a fast \ufb01xed-point optimization routine for solving our robust estimation problem.\nWe show local linear convergence guarantees for our routine, and we demonstrate numerically that it\nconverges very fast while only having the same per-step cost with gradient descent. The fast optimizer\nallows for very fast automated cell extraction in large-scale datasets. 
Further, since the \ufb01nal form for\nour loss function is simple and optimization only depends on matrix algebra, it is highly amenable to\nGPU implementation providing additional improvements.\nWe validate our robust estimation-based cell extraction algorithm on both synthetic and real datasets.\nWe show that our method offers large accuracy improvements over non-robust techniques in realistic\nsettings, which include classical scenarios such as overlapping cells and neuropil contamination.\nParticularly, our method signi\ufb01cantly outperforms methods with non-robust reconstruction routines in\nmetrics such as signal \ufb01delity and crosstalk, which are crucial for steps subsequent to cell extraction.\n\n2 M-Estimation under Gross Non-negative Contamination\nIn this section, we introduce our signal estimation machinery, based on the literature of robust\nM-estimation. The theory of M-estimation is well-developed for symmetric and certain asymmetric\ncontamination regimes [2, 7, 9, 12]; however the existing theory does not readily suggest an optimal\nestimator suitable for \ufb01nding the kind of signals present in \ufb02uorescence imaging of calcium in the\nbrain. We \ufb01rst motivate and introduce a simple mathematical abstraction for this new regime, and\nthen derive a minimax optimal M-estimator.\n\n2.1 Noise Model & Mathematical Setting\n\nFor simplicity, we consider the setting of location estimation, which straightforwardly generalizes to\nmultivariate regression.\nConsidering the nature of contamination in calcium imaging datasets, we base our noise model on\nthe following observation: The signal background is dominated by the baseline activity which is well\nmodeled by a normal distribution. This type of noise stems from the random arrivals of photons from\nthe background in the imaging setup governed by a poisson process; this distribution very rapidly\nconverges to a normal distribution. 
However, the signal background also contains other sources\nof noise such as neuropil activity, out-of-focus cells, and residual activity of overlapping cells not\naccounted for by the cell extraction method. The latter kind of contamination is very distinct from\na normal-type noise; it is non-negative (or above the signal baseline), its characteristics are rather\nirregular and it may take on arbitrarily large values.\nConsequently, we model the data generation through an additive noise source which is normally\ndistributed a 1 \u2212 \u03b5 fraction of the time, and free to be any positive value greater than a threshold\notherwise:\n\nyi = \u03b2\u2217 + \u03c3i,    (1)\n\n\u03c3i \u223c N(0, 1) w.p. 1 \u2212 \u03b5,    \u03c3i \u223c H\u03b1 w.p. \u03b5,    (2)\n\nH\u03b1 \u2208 \u210b\u03b1 = {All distributions with support [\u03b1, \u221e)}, \u03b1 \u2265 0.\n\nFigure 1: One-sided Huber. (a) Loss function of one-sided Huber (\u03c1) and its derivative (\u03c8) for \u03ba = 2. (b)\nOne-sided Huber yields lower MSE compared to other known M-estimators under the distribution which causes\nthe worst-case variance for any given estimator (for \u03b5 = 0.1).\n\nIn the above, \u03b2\u2217 is the true parameter, and is corrupted additively as in (1); \u03c3i is a standard normal with\nprobability 1 \u2212 \u03b5, and distributed according to an unknown distribution H\u03b1 with probability \u03b5. In the\nspirit of full generality, we allow H\u03b1 to be any probability distribution with support greater than a set\nvalue \u03b1; particularly, it could be nonzero at arbitrarily large values. Therefore, \u03b5 could be interpreted\nas the gross contamination level. The parameter \u03b1 could be interpreted as the minimum observed\nvalue of the positive contamination, although its exact value is insignificant outside our theoretical\nanalysis. 
We denote the full noise distribution by F_{H\u03b1}, subscripted by H\u03b1.\nGiven the observations {yi}_{i=1}^{n}, we estimate the true parameter \u03b2\u2217 with \u02c6\u03b2 by considering an equivariant\nM-estimator as follows\n\n\u02c6\u03b2 = argmin_\u03b2 \u2211_{i=1}^{n} \u03c1(yi \u2212 \u03b2).    (3)\n\nTypically, M-estimators are characterized by \u03c8 \u225c \u03c1\u2032. In this paper, we are going to consider \u03c8\u2019s\nwith specific properties that allow for efficient optimization and more general theoretical guarantees.\nLet us define a set \u03a8 = {\u03c8 | \u03c8 is non-decreasing}. If we choose an estimator \u03c8 \u2208 \u03a8, finding a point\nestimate \u02c6\u03b2 through (3) becomes equivalent to solving the first order condition:\n\n\u2211_{i=1}^{n} \u03c8(yi \u2212 \u02c6\u03b2) = 0.    (4)\n\nThis is simply because the members of \u03a8 correspond to convex loss functions. Our focus is on such\nfunctions since they are typically easier to optimize, and offer global optimality guarantees.\n\n2.2 One-Sided Huber Estimator and its Asymptotic Minimax Optimality\n\nWe are interested in finding an M-estimator for our noise model which is robust to the variation in the\nnoise distribution (H\u03b1 in particular) in the sense of minimizing the worst-case deviation from the\ntrue parameter, as measured by the mean squared error. We first introduce our proposed estimator,\nand then show that it is exactly optimal in the aforementioned minimax sense.\nDefinition 1 (One-sided Huber). Define an estimator \u03c80 as follows:\n\n\u03c80(y, \u03ba) = y if y < \u03ba, and \u03c80(y, \u03ba) = \u03ba if y \u2265 \u03ba,    (5)\n\nwhere \u03ba is defined in terms of the contamination level, \u03b5, according to\n\n\u03a6(\u03ba) + g(\u03ba)/\u03ba = 1/(1 \u2212 \u03b5),\n\nwith \u03a6(\u00b7) and g(\u00b7) denoting the distribution and the density functions for a standard normal variable,\nrespectively.\nWe shall refer to \u03c80 as one-sided Huber, and denote with \u03c10(\u00b7, \u03ba) its loss function (see Figure 1\nfor visualization). Clearly, \u03c80 \u2208 \u03a8, and therefore the loss function \u03c10 is convex. Under the data\ngeneration model introduced in the previous section, we can now state an asymptotic minimax result\nfor \u03c80.\n\nAlgorithm 1 Fast Solver for one-sided Huber Loss\n\nfunction fp_solve(X, Y, \u03ba, \u03b4)\n\n// X = [x1, . . . , xn]^T , Y = [y1, . . . , yn]^T\n\n1. Compute: X+ = (X^T X)^{\u22121} X^T , \u03b2LS = X+ Y\n2. Initialize \u03b2(0) at random, set t = 0.\n3. while \u2016\u03b2(t+1) \u2212 \u03b2(t)\u20162 \u2265 \u03b4 do\n\n\u03b2(t+1) = \u03b2LS \u2212 X+ max(0, Y \u2212 X\u03b2(t) \u2212 \u03ba)\nt \u2190 t + 1.\n\n4. end while\n\nreturn \u03b2(t).\n\nProposition 2.1. One-sided Huber \u03c80 yields an asymptotically unbiased M-estimator for F_{H\u03ba} =\n{(1 \u2212 \u03b5)\u03a6 + \u03b5H\u03ba}. Further, \u03c80 minimizes the worst case asymptotic variance in F_{H\u03ba}, i.e.\n\n\u03c80 = arg inf_{\u03c8\u2208\u03a8} sup_{F\u2208F_{H\u03ba}} V (\u03c8, F ).\n\nA proof for Proposition 2.1 is given in the supplementary material. 
Proposition 2.1 establishes that\nthe one-sided Huber estimator has zero bias as long as the non-zero contamination is sufficiently\nlarger than zero, and it also achieves the best worst-case asymptotic variance.\nWe now compare one-sided Huber with some other\npopular M-estimators, such as the sample mean (\u21132 loss), the sample median (\u21131 loss), Huber [7],\nand the sample quantile. First of all, the sample mean, the sample median, and Huber estimators all\nhave symmetric loss functions and therefore suffer from bias. This is particularly detrimental for the\nsample mean and leads to unbounded MSE as the gross contamination tends to very large values.\nThe bias problem may be eliminated using a quantile estimator whose quantile level is set according\nto \u03b5. However, this estimator has higher asymptotic variance than the one-sided Huber. We present\nin Figure 1b a comparison of empirical mean squared errors for different estimators under the noise\ndistribution which causes the worst asymptotic variance among distributions in F_{H\u03ba}\n(see footnote 1). The MSEs of\nthe sample mean and the sample median quickly become dominated by their bias with increasing\nn (see footnote 2). Although the quantile estimator was set up to be unbiased, its MSE (or equivalently, variance) is\ngreater than that of the one-sided Huber. These results corroborate the theoretical properties of one-sided\nHuber, and affirm it as a good fit for our setting.\nAlthough we have not come across a previous study of the one-sided Huber estimator in this context, we\nshould note that it is related to the technique in [11], where samples are assumed to be nonnegative,\nand in the sample mean estimator summands are shrunk when they are above a certain threshold (this\ntechnique is called winsorizing). 
However, their model and application are quite different from what\nwe consider in this paper.\n\n2.3 Generalization to Regression Setting\n\nHere we introduce the regression setting which we will use for the remainder of the paper. We observe\n{yi, xi}_{i=1}^{n}, where xi \u2208 R^p could be either fixed or random, and yi\u2019s are generated according to\nyi = \u27e8xi, \u03b2\u2217\u27e9 + \u03c3i^g + \u03c3i^h, where \u03b2\u2217 \u2208 R^p is the true parameter, and \u03c3i^h and \u03c3i^g are as previously\ndefined. We estimate \u03b2\u2217 with\n\n\u02c6\u03b2 = argmin_\u03b2 f\u03ba(\u03b2) := \u2211_{i=1}^{n} \u03c10(yi \u2212 \u27e8xi, \u03b2\u27e9 , \u03ba).    (6)\n\nClassical M-estimation theory establishes, under certain regularity conditions, that the minimax\noptimality in Section 2.2 carries over to regression; we refer the reader to [8] for details.\n\nFootnote 1: Refer to the proof of Proposition 2.1 for the form of this distribution.\nFootnote 2: We omit Huber in this comparison since its MSE is also bias-dominated.\n\nAlgorithm 2 Tractable and Robust Automated Cell Extraction\n\nfunction EXTRACT(M, N, \u03ba, \u03b4)\n\n1. Initialize S(0), T(0), set t = 0.\n2. for t=1 to N do\n\nT(t+1) = fp_solve_nonneg(S(t), M, \u03ba, \u03b4)\nS(t+1) = fp_solve_nonneg(T(t)^T , M^T , \u03ba, \u03b4)^T\nS(t+1), T(t+1) = remove_redundant(S(t+1), T(t+1))\n\n3. end for\nreturn S(t), T(t).\n\n3 Fast Fixed-point Solver for One-Sided Huber Loss\n\nWe are interested in solving the robust regression problem in (6) in the large-scale setting due to the\nlarge field of view and length of most calcium imaging recordings. Hence, the solver for our problem\nshould ideally be tractable for large n and also give as accurate an output as possible. 
To this end,\nwe propose a fixed point optimization method (Algorithm 1), which has a step cost equal to that of\ngradient descent, while converging to the optimum at rates more similar to Newton\u2019s method. The\nfollowing proposition establishes the convergence of our solver.\nProposition 3.1. Let \u03b2\u2217 be the fixed point of Algorithm 1 for the problem (6), and let \u03bbmax and\n\u03bbmin > 0 denote the extreme eigenvalues of \u2211_{i=1}^{n} xi xi^T, and let max_i \u2016xi\u2016 \u2264 k. Assume that for\na subset of indices s \u2282 {1, 2, ..., n}, \u2203\u2206s > 0 such that yi \u2212 \u27e8xi, \u03b2\u2217\u27e9 \u2264 \u03ba \u2212 \u2206s and denote the\nextreme eigenvalues of \u2211_{i\u2208s} xi xi^T by \u03b3max and \u03b3min > 0 satisfying \u03bbmax\u03b3max/\u03bbmin^2 < 2. If the initial\npoint \u03b20 is close to the true minimizer, i.e., \u2016\u03b20 \u2212 \u03b2\u2217\u20162 \u2264 k/\u2206s, then Algorithm 1 converges linearly,\n\nf\u03ba(\u03b2t) \u2212 f\u03ba(\u03b2\u2217) \u2264 (1 \u2212 2 \u03b3min/\u03bbmax + \u03b3max\u03b3min/\u03bbmin^2)^t [f\u03ba(\u03b20) \u2212 f\u03ba(\u03b2\u2217)].    (7)\n\nA proof for Proposition 3.1 is given in the supplementary material.\nOur solver is second order in nature (see footnote 3), hence its convergence behavior should be close to that of\nNewton\u2019s method. However, there is one caveat: the second derivative of the one-sided Huber loss is\nnot continuous. Therefore, one cannot expect to achieve a quadratic rate of convergence; this issue is\ncommonly encountered in M-estimation. Nevertheless, Algorithm 1 converges very fast in practice.\nWe compare our solver to Newton\u2019s method and gradient descent by simulating a regression setting\nwhere we synthesize a 100 \u00d7 100 movie frame (Y) with 100 neurons (see Section 5 for details). 
Then,\ngiven the ground truth cell images (X), we optimize for the fluorescence traces for the single frame\n(\u03b2) using the three algorithms. For our fixed-point solver, we use \u03ba = 1. For gradient descent, we set\nthe step size to the reciprocal of the largest eigenvalue of the Hessian (while not taking into account\nthe time taken to compute it). Results are shown in Figure 2. Our solver has convergence\nbehavior close to that of Newton\u2019s method, while taking much less time to achieve the same accuracy due\nto its small per-step cost. We would like to also note that estimating the entire matrix of fluorescence\ntraces (or cell images) does not require any modification of Algorithm 1; hence, in practice estimating\nentire matrices of components at once does not cause much computational burden. For Newton\u2019s\nmethod, every frame (or every pixel) requires a separate Hessian; runtime in this case scales at least\nlinearly.\n\n4 Robust Automated Cell Extraction\n\nWe now introduce our proposed method for automated cell extraction via robust estimation. Our\nmethod is based on a matrix factorization framework, where we model the imaging data as the matrix\nproduct of a spatial and a temporal matrix with additive noise:\n\nM = ST + \u03a3.\n\nIn the above, M \u2208 R^{dS\u00d7dT} is the movie matrix, S \u2208 R_+^{dS\u00d7m} and T \u2208 R_+^{m\u00d7dT} are the nonnegative\nspatial and temporal matrices, respectively. \u03a3 \u2208 R^{dS\u00d7dT} is meant to model the normal noise\ncorrupted with non-negative contamination, and \u03a3ij has the same distribution as \u03c3 in (2) (up to\nthe noise standard deviation). Our main contribution in this work is that we offer a method which\nestimates S and T using the one-sided Huber estimator, which provides the optimal robustness against\nthe non-negative contamination inherent in calcium imaging, as discussed in Section 2.\n\nFootnote 3: The interested reader is referred to the supplementary material for a more rigorous argument.\n\nFigure 2: Our fixed point solver converges to the optimum at rates similar to Newton\u2019s method, while\nbeing more computationally efficient. (a) Optimality gap versus absolute time. (b) Optimality gap versus number\nof iterations. Fixed point solver achieves the same accuracy with a notably faster speed compared to Newton\u2019s\nmethod and gradient descent.\n\nOur cell extraction algorithm starts by computing initial estimates for the matrices S and T. This is\ndone by (1) detecting a cell peak from the time maximum of the movie one cell at a time, (2) solving\nfor the current cell\u2019s spatial and temporal components using the one-sided Huber estimator, and (3)\nrepeating until a stopping criterion is reached. We detail this step in the supplementary material.\nAfter initial guesses for S and T are computed, the main update algorithm proceeds in a straightforward\nmanner, where multiple alternating robust regression steps are performed using the one-sided\nHuber loss. At each step, new estimates of S and T are computed based on M and the current estimate\nof the other matrix. For computing the estimates, we use the fast fixed-point algorithm derived\nin Section 3. However, since we constrain S and T to be nonnegative matrices, the fixed-point solver\ncannot be used without constraints that enforce non-negativity. To this end, we combine our solver\nwith the alternating direction method of multipliers (ADMM), a dual ascent method which solves for\nmultiple objectives by consensus. We call the combined solver fp_solve_nonneg(). 
Note that,\ndue to the symmetry between the two alternating steps, we use the same solver for computing both S\nand T.\nWe do minimal post-processing at the end of each step to remove redundant components. Specifically,\nwe identify and remove near duplicate components in S or T, and we then eliminate components\nwhich have converged to zero. We repeat these steps alternatingly for a desired number of steps N.\nSelection of \u03ba depends on the positive contamination level; nevertheless, we have observed that\nprecise tuning of \u03ba is not necessary in practice. A range of [0.5, 1] times the standard deviation of\nthe normally distributed noise is reasonable for \u03ba in most applications. One should note, however,\nthat although the robust estimator has favorable mis-specification bias, it might become significant\nunder critically low SNR conditions. For instance, setting a small \u03ba in such cases will likely lead\nto detrimental under-estimation. On the other hand, setting high \u03ba values decreases the estimator\nrobustness (this makes the loss function approach the \u21132 loss). Consequently, the advantage of robust\nestimation is expected to diminish in extremely low SNR regimes.\nOur algorithm has a highly favorable runtime in practice owing to the simplicity of its form. Furthermore,\nsince the solver we use relies on basic matrix operations, we were able to produce a GPU\nimplementation, allowing for further reduction in runtime. Comparison of our GPU implementation\nto other algorithms in their canonical forms naturally causes bias; therefore, we defer our runtime\ncomparison results to the supplementary material.\nFrom here on, we shall call our algorithm EXTRACT.\n\n5 Experiments\n\nIn this section, we perform experiments on both simulated and real data in order to establish the\nimproved signal accuracy obtained using EXTRACT. 
We represent the signal accuracy with two\nquantities: (1) signal fidelity, which measures how closely a temporal (fluorescence trace) or spatial\n(cell image) signal matches its underlying ground truth, and (2) signal crosstalk, which quantifies\ninterference from other sources, or noise. We primarily focus on temporal signals since they typically\nrepresent the entirety of the calcium movie for the steps subsequent to cell extraction.\n\nFigure 3: Performance comparison of EXTRACT vs. CNMF for movies with overlapping image sources. (a)\nExamples where a captured cell (circled in white) is overlapping with non-captured neighbors (circled in red).\nGround truth traces are shown in black. EXTRACT finds images and traces that match closely with the ground\ntruth, where CNMF admits notable crosstalk from neighbors both in its found cell images and traces. (b) An\nexample maximum projection of an imaging movie in time. (c) An example ROC curve for X=0.4, computed by\nvarying event detection threshold and averaging TPR and FPR over single cells for each threshold. (d) Mean\narea under the ROC curve computed over 20 experiments for each initial fraction of true cells, X, and each\niteration. EXTRACT consistently outperforms CNMF, with the performance lead becoming significant for lower\nX. Error bars are 1 s.e.m.\n\nAs opposed\nto using simple correlation based metrics, we compute true and false positive detection rates based\non estimated calcium events found via simple amplitude thresholding. We then present receiver\noperating characteristics (ROC) based metrics. We compare EXTRACT to the two dominantly used\ncell extraction methods: CNMF [15], and spatio-temporal ICA [13], the latter of which we will\nsimply refer to as ICA. 
Both methods are matrix factorization methods like EXTRACT; CNMF\nestimates its temporal and spatial matrices alternatingly, and jointly estimates traces and their underlying\ncalcium event peaks, and ICA finds a single unmixing matrix which is then applied to the singular\nvalue decomposition (SVD) of the movie to jointly obtain traces and images. CNMF uses quadratic\nreconstruction loss with \u21131 penalty, whereas ICA uses a linear combination of movie data guided\nby high order pixel statistics for reconstruction; hence they both can be considered as non-robust\nestimation techniques.\nSimulated data. For simulated movies, we use a field of view of size 50 by 50 pixels, and produce\ndata with 1000 time frames. We simulate 30 neurons with Gaussian shaped images with standard\ndeviations drawn from [3, 4.8] uniformly. We simulate the fluorescence traces using a Poisson process\nwith rate 0.01 convolved with an exponential kernel with a time constant of 10 frames. We corrupt the\nmovie with independent and normally distributed noise whose power is matched to the power of the\nneural activity so that average pixel-wise SNR in cell regions is 1. We have re-run our experiments\nwith different SNR levels in order to establish the independence of our key results from noise level;\nwe report them in the supplementary material.\n\n5.1 Crosstalk between cells for robust vs. non-robust methods\n\nAs a first experiment, we demonstrate consequences of a common phenomenon, namely cells with\noverlapping spatial weights. Overlapping cells do not pose a significant problem when their spatial\ncomponents are correctly estimated; however, in reality, estimated images typically do not perfectly\nmatch their underlying excitation, or some overlapping cells might not even be captured by the\nextraction algorithm. 
In the latter two cases, crosstalk becomes a major issue, causing captured cells\nto carry false calcium activity in their fluorescence traces.\nWe try to reproduce the aforementioned scenarios by simulating movies, and initializing the algorithms\nof interest with a fraction of the ground truth cells. Our aim is to set up a controlled environment to\n(1) quantitatively investigate the crosstalk in the captured cell traces due to missing cells, (2) observe\nthe effect of alternating estimation on the final accuracy of estimates. In this case, the outputs of\nalternating estimation algorithms should deteriorate through the iteration loop since they estimate\ntheir components based on imperfect estimates of each other. We select EXTRACT and CNMF for\nthis experiment since they are both alternating estimation algorithms.\n\n[Figure 3 panels: (a) Example cases of cells with non-captured neighbors; (b) Example maximum projection image; (c) ROC curve by varying event detection threshold (EXTRACT, AUC = 0.99; CNMF, AUC = 0.92); (d) Mean area under the ROC curve when initialized with X fraction of true cells.]\n\nFigure 4: EXTRACT outperforms other algorithms in the presence of neuropil contamination. (a) Example\ntraces from algorithm outputs overlaid on the ground truth traces. EXTRACT produces traces closest to the\nground truth, admitting significantly less crosstalk compared to others. (b) An example ROC curve for an\ninstance with neuropil. (c) Mean area under the curve computed over 15 experiments, and separately for with\nand without neuropil. EXTRACT shows better performance, and its performance is the most robust against\nneuropil contamination. (d) Average cell finding statistics over 15 experiments, computed separately for with\nand without neuropil. EXTRACT achieves more competitive performance especially when there is neuropil\ncontamination.\n\nWe initialize the algorithms with 4 different fractions of ground truth cells: X = {0.2, 0.4, 0.6, 0.8}.\nWe carry out 20 experiments for each X, and we perform 3 alternating estimation iterations for\neach algorithm. This number was chosen with the consideration that CNMF canonically performs\n2 iterations on its initialized components. We report results for 6 iterations in the supplementary\nmaterial. At the end of each iteration, we detect calcium events from the algorithms\u2019 fluorescence\ntraces, and match them with the ground truth spikes to compute event true positive rate (TPR) and\nevent false positive rate (FPR).\nFigure 3 summarizes the results of this experiment. At the end of the 3 iterations, EXTRACT\nproduces images and traces that are visually closer to ground truth in the presence of non-captured\nneighboring cells with overlapping images (Figure 3a). Figure 3c shows the ROC curve from one\ninstance of the experiment, computed by varying the threshold amplitude for detecting calcium events,\nand plotting FPR against TPR for each threshold. We report quantitative performance by the area\nunder the ROC curve (AUC). We average the AUCs over all the experiments performed for each\ncondition, and report it separately for each iteration in Figure 3d. EXTRACT outperforms CNMF\nuniformly, and the performance gap becomes pronounced with very low fraction of initially provided\ncells. This boost in the signal accuracy over non-robust estimators (e.g. 
ones with quadratic penalty)\nsupports our proposed robust estimator and its underlying model assumptions.\n\n5.2 Cell extraction with neuropil contamination\n\nIn most calcium imaging datasets, data is contaminated with non-cellular calcium activity caused by\nneuropil. This may interfere with cell extraction by contaminating the cell traces and by making it\ndifficult to accurately locate the spatial components of cells. We study the effect of such contamination\nby simulating neural data and combining it with neuropil activity extracted from real two-photon\nimaging datasets. For this experiment, we use EXTRACT, CNMF and ICA.\nFor a fair comparison, we initialize all algorithms with the same set of initial estimates. We\nchoose to use the greedy initializer of CNMF to eliminate any competitive advantage EXTRACT\nmight have due to using its native initializer. We perform 15 experiments with no neuropil, and 15\nwith added neuropil. We match the variance of the neuropil activity to that of the Gaussian noise while\nkeeping SNR constant. For each experiment, we compute (1) cell trace statistics based on the ROC\ncurve as previously described, and (2) cell finding statistics based on precision, recall, and F1 metrics.\nEXTRACT produces qualitatively more accurate fluorescence traces (Figure 4a), and it outperforms\nboth CNMF and ICA quantitatively (Figure 4b,c), with the performance gap becoming more significant\nin the presence of neuropil contamination. 
Further, EXTRACT yields more true cells than the\nother methods with fewer false positives when there is neuropil (Figure 4d).\n\n[Figure 4 graphic. (a) Example fluorescence traces (TRUE, CNMF, ICA, EXTRACT); (b) ROC curve by varying event detection threshold (axes: false positive rate, true positive rate; EXTRACT, AUC = 0.96; CNMF, AUC = 0.91; ICA, AUC = 0.88); (c) Mean area under the ROC curve (w/o neuropil, w/ neuropil; EXTRACT, CNMF, ICA); (d) Cell finding statistics (Recall, Precision, F1; w/o neuropil, w/ neuropil).]\n\nFigure 5: EXTRACT better estimates neural signals in microendoscopic single-photon imaging data. (a) The\nmanually classified \"good\" cells for all 3 algorithms overlaid on the maximum of the imaging movie in time.\nLetter N refers to the total good cell count. (b) The fluorescence traces of the 3 algorithms belonging to the same\ncell. The cell has significantly lower SNR than a neighboring cell that is also captured by all the methods.\nThe time frames with arrows pointing to them are shown with the snapshot of the cell (circled in green) and its\nsurrounding area. EXTRACT correctly assigns temporal activity to the cell of interest, while other algorithms\nregister false calcium activity from the neighboring cell.\n\n5.3 Cell extraction from microendoscopic single-photon imaging data\n\nData generated using microendoscopic single-photon calcium imaging can be quite challenging due\nto low SNR and fluctuating background (out-of-focus fluorescence activity, etc.). We put EXTRACT\nto the test in this data regime, using an imaging dataset recorded from the dorsal CA1 region of the\nmouse hippocampus [17], an area known to have high cell density. We compare EXTRACT with\nCNMF and ICA. 
For this experiment, the output of each algorithm was checked by human annotators,\nand cells were manually classified as true cells or false positives based on the match of their\ntemporal signal to the activity in the movie.\nEXTRACT successfully extracts the majority of the cells apparent in the maximum projection of the\nmovie over time, and is able to capture highly overlapping cells (Figure 5a). EXTRACT also\naccurately estimates the temporal activity. Figure 5b shows an instance of a dim cell with a high SNR\nneighboring cell, both of which are captured by all three algorithms. While CNMF and ICA both\nfalsely show activity when the neighbor is active, the EXTRACT trace appears immune to this type of\ncontamination and is silent at such instants.\n6 Conclusion\nWe presented an automated cell extraction algorithm for calcium imaging which uses a novel robust\nestimator. We arrived at our estimator by defining a generic data model and optimizing its worst-case\nperformance. We proposed a fast solver for our estimation problem, which allows for tractable cell\nextraction in practice. As we have demonstrated in our experiments, our cell extraction algorithm,\nEXTRACT, is a powerful competitor to the existing methods, performing well under different\nimaging modalities due to its generic nature.\n\n[Figure 5 graphic. (a) Good cells for EXTRACT, CNMF, and ICA, with counts as extracted: N=476, N=272, N=329; (b) traces for ICA, CNMF, and EXTRACT; axis: time (seconds).]\n\nAcknowledgements\nWe gratefully acknowledge support from DARPA and technical assistance from Biafra Ahanonu,\nLacey Kitch, Yaniv Ziv, Elizabeth Otto and Margaret Carr.\nReferences\n[1] N. J. Apthorpe, A. J. Riordan, R. E. Aguilar, J. Homann, Y. Gu, D. W. Tank, and H. S. Seung. Automatic neuron detection in calcium imaging data using convolutional networks. arXiv preprint arXiv:1606.07372, 2016.\n\n[2] J. R. Collins. Robust estimation of a location parameter in the presence of asymmetry. The Annals of Statistics, pages 68\u201385, 1976.\n\n[3] W. Denk, J. H. Strickler, W. W. 
Webb, et al. Two-photon laser scanning fluorescence microscopy. Science, 248(4951):73\u201376, 1990.\n\n[4] B. A. Flusberg, A. Nimmerjahn, E. D. Cocker, E. A. Mukamel, R. P. Barretto, T. H. Ko, L. D. Burns, J. C. Jung, and M. J. Schnitzer. High-speed, miniaturized fluorescence microscopy in freely moving mice. Nature Methods, 5(11):935, 2008.\n\n[5] K. K. Ghosh, L. D. Burns, E. D. Cocker, A. Nimmerjahn, Y. Ziv, A. El Gamal, and M. J. Schnitzer. Miniaturized integration of a fluorescence microscope. Nature Methods, 8(10):871\u2013878, 2011.\n\n[6] F. Helmchen and W. Denk. Deep tissue two-photon microscopy. Nature Methods, 2(12):932\u2013940, 2005.\n\n[7] P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73\u2013101, 1964.\n\n[8] P. J. Huber. Robust regression: asymptotics, conjectures and Monte Carlo. The Annals of Statistics, pages 799\u2013821, 1973.\n\n[9] L. A. Jaeckel. Robust estimates of location: Symmetry and asymmetric contamination. The Annals of Mathematical Statistics, pages 1020\u20131034, 1971.\n\n[10] P. Kaifosh, J. D. Zaremba, N. B. Danielson, and A. Losonczy. SIMA: Python software for analysis of dynamic fluorescence imaging data. Frontiers in Neuroinformatics, 8:80, 2014.\n\n[11] P. Kokic and P. Bell. Optimal winsorizing cutoffs for a stratified finite population estimator. Journal of Official Statistics, 10(4):419, 1994.\n\n[12] R. D. Martin and R. H. Zamar. Efficiency-constrained bias-robust estimation of location. The Annals of Statistics, pages 338\u2013354, 1993.\n\n[13] E. A. Mukamel, A. Nimmerjahn, and M. J. Schnitzer. Automated analysis of cellular signals from large-scale calcium imaging data. Neuron, 63(6):747\u2013760, 2009.\n\n[14] M. Pachitariu, A. M. Packer, N. Pettit, H. Dalgleish, M. Hausser, and M. Sahani. Extracting regions of interest from biological images with convolutional sparse block coding. 
In Advances in Neural Information Processing Systems, pages 1745\u20131753, 2013.\n\n[15] E. A. Pnevmatikakis, D. Soudry, Y. Gao, T. A. Machado, J. Merel, D. Pfau, T. Reardon, Y. Mu, C. Lacefield, W. Yang, et al. Simultaneous denoising, deconvolution, and demixing of calcium imaging data. Neuron, 89(2):285\u2013299, 2016.\n\n[16] P. Zhou, S. L. Resendez, G. D. Stuber, R. E. Kass, and L. Paninski. Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data. arXiv preprint arXiv:1605.07266, 2016.\n\n[17] Y. Ziv, L. D. Burns, E. D. Cocker, E. O. Hamel, K. K. Ghosh, L. J. Kitch, A. El Gamal, and M. J. Schnitzer. Long-term dynamics of CA1 hippocampal place codes. Nature Neuroscience, 16(3):264\u2013266, 2013.", "award": [], "sourceid": 1663, "authors": [{"given_name": "Hakan", "family_name": "Inan", "institution": "Stanford University"}, {"given_name": "Murat", "family_name": "Erdogdu", "institution": "Microsoft Research"}, {"given_name": "Mark", "family_name": "Schnitzer", "institution": "Stanford University"}]}