{"title": "Sparse convolutional coding for neuronal assembly detection", "book": "Advances in Neural Information Processing Systems", "page_first": 3675, "page_last": 3685, "abstract": "Cell assemblies, originally proposed by Donald Hebb (1949), are subsets of neurons firing in a temporally coordinated way that gives rise to repeated motifs supposed to underly neural representations and information processing. Although Hebb's original proposal dates back many decades, the detection of assemblies and their role in coding is still an open and current research topic, partly because simultaneous recordings from large populations of neurons became feasible only relatively recently. Most current and easy-to-apply computational techniques focus on the identification of strictly synchronously spiking neurons. In this paper we propose a new algorithm, based on sparse convolutional coding, for detecting recurrent motifs of arbitrary structure up to a given length. Testing of our algorithm on synthetically generated datasets shows that it outperforms established methods and accurately identifies the temporal structure of embedded assemblies, even when these contain overlapping neurons or when strong background noise is present. Moreover, exploratory analysis of experimental datasets from hippocampal slices and cortical neuron cultures have provided promising results.", "full_text": "Sparse convolutional coding for neuronal assembly\n\ndetection\n\nSven Peter1,\u2217\n\nElke Kirschbaum1,\u2217\n\n{sven.peter,elke.kirschbaum}@iwr.uni-heidelberg.de\n\nMartin Both2\n\nmboth@physiologie.uni-heidelberg.de\n\nLee A. Campbell3\n\nlee.campbell@nih.gov\n\nBrandon K. Harvey3\n\nbharvey@mail.nih.gov\n\nConor Heins3,4,\u2020\n\nconor.heins@ds.mpg.de\n\nDaniel Durstewitz5\n\ndaniel.durstewitz@zi-mannheim.de\n\nFerran Diego Andilla6,\u2021\n\nferran.diegoandilla@de.bosch.com\n\nFred A. 
Hamprecht1\n\nfred.hamprecht@iwr.uni-heidelberg.de\n\n1Interdisciplinary Center for Scienti\ufb01c Computing (IWR), Heidelberg, Germany\n\n2Institute of Physiology and Pathophysiology, Heidelberg, Germany\n\n3National Institute on Drug Abuse, Baltimore, USA\n\n4Max Planck Institute for Dynamics and Self-Organization, G\u00f6ttingen, Germany\n\n5Dept. Theoretical Neuroscience, Central Institute of Mental Health, Mannheim, Germany\n\n6Robert Bosch GmbH, Hildesheim, Germany\n\nAbstract\n\nCell assemblies, originally proposed by Donald Hebb (1949), are subsets of neurons\n\ufb01ring in a temporally coordinated way that gives rise to repeated motifs supposed\nto underly neural representations and information processing. Although Hebb\u2019s\noriginal proposal dates back many decades, the detection of assemblies and their\nrole in coding is still an open and current research topic, partly because simultane-\nous recordings from large populations of neurons became feasible only relatively\nrecently. Most current and easy-to-apply computational techniques focus on the\nidenti\ufb01cation of strictly synchronously spiking neurons. In this paper we propose\na new algorithm, based on sparse convolutional coding, for detecting recurrent\nmotifs of arbitrary structure up to a given length. 
Testing of our algorithm on synthetically generated datasets shows that it outperforms established methods and accurately identifies the temporal structure of embedded assemblies, even when these contain overlapping neurons or when strong background noise is present. Moreover, exploratory analysis of experimental datasets from hippocampal slices and cortical neuron cultures has provided promising results.

*Both authors contributed equally.
†Majority of this work was done while co-author was at 3.
‡Majority of this work was done while co-author was at 1.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

(a) Synchronously firing neurons  (b) Synfire chain  (c) Temporal motif
Figure 1: Temporal motifs in neuronal spike trains. All three illustrations show the activity of four different neurons over time. The spikes highlighted in red are part of a repeating motif. In (a) the motif is defined by the synchronous activity of all neurons, while the synfire chain in (b) exhibits sequential spiking patterns. (c) shows a more complex motif with non-sequential temporal structure. (Figure adapted from [23].)

1 Introduction

The concept of a cell assembly (or cortical motif or neuronal ensemble) was originally introduced by Donald Hebb [1] and denotes subsets of neurons that by firing coherently represent mental objects and form the building blocks of cortical information processing. 
Numerous experimental studies within the past 30 years have attempted to address the neural assembly hypothesis from various angles in different brain areas and species, but the concept remains debated, and recent massively parallel single-unit recording techniques have opened up new opportunities for studying the role of spatio-temporal coordination in the nervous system [2–12].

A number of methods have been proposed to identify motifs in neuronal spike train data, but most of them are only designed for strictly synchronously firing neurons (see figure 1a), i.e. with zero phase-lag [13–17], or strictly sequential patterns as in synfire chains [18–21] (see figure 1b). However, some experimental studies have suggested that cortical spiking activity may harbor motifs with more complex structure [5, 22] (see figure 1c). Only quite recently were statistical algorithms introduced that can efficiently deal with arbitrary lag constellations among the units participating in an assembly [23], but the identification and validation of motifs with complex temporal structure remains an area of current research interest.

In this paper we present a novel approach to identify motifs with any of the temporal structures shown in figure 1 in a completely unsupervised manner. Based on the idea of convolutive Non-Negative Matrix Factorization (NMF) [24, 25], our algorithm reconstructs the neuronal spike matrix as a convolution of motifs and their activation time points. In contrast to convolutive NMF, we introduce an ℓ0 and an ℓ1 prior on the motif activation and appearance, respectively, instead of a single ℓ1 penalty. This ℓ0 regularization enforces more sparsity in the temporal domain and thus performs better at extracting motifs from neuronal spike data by reducing false positive activations. Adding the ℓ0 and ℓ1 penalty terms requires a novel optimization scheme. 
This replaces the multiplicative update rules by a combination of discrete and continuous optimizations, namely matching pursuit and LASSO regression. Additionally we added a sorting and non-parametric threshold estimation method to distinguish between real and spurious results of the optimization problem. We benchmark our approach on synthetic data against Principal Component Analysis (PCA) and Independent Component Analysis (ICA) as the most widely used methods for motif detection, and against convolutive NMF as the method most closely related to the proposed approach. Our algorithm outperforms the other methods especially when identifying long motifs with complex temporal structure. We close with results of our approach on two real-world datasets from hippocampal slices and cortical neuron cultures.

2 Related work

PCA is one of the simplest methods that has been used for a long time to track cell motifs [26]. Its biggest limitations are that different assembly patterns can easily be merged into a single 'large' component, and that neurons shared between motifs are assigned lower weights than they should have. Moreover, recovering individual neurons which belong to a single assembly is not reliably possible [27, 17], and the detected assemblies are not very robust to noise and rate fluctuations [23]. ICA with its assumption of non-Gaussian and statistically independent subcomponents [28] is able to recover individual neuron-assembly membership, and neurons belonging to multiple motifs are also correctly identified [17]. 

Figure 2: Sketch of convolutional coding. In this example the raw data matrix Y is described by a matrix which is an additive mixture of two motifs a1 (cyan) and a2 (salmon) convolved with their activities s1 and s2, respectively, plus background noise.

ICA provides a better estimate for synchronous motifs than PCA [17],\nbut motifs with more complicated temporal structure are not (directly) accommodated within this\nframework. An overview of PCA and ICA for identifying motifs is provided in [17].\nMore sophisticated statistical approaches have been developed, like unitary event analysis [13, 14], for\ndetecting coincident, joint spike events across multiple cells. More advanced methods and statistical\ntests were also designed for detecting higher-order correlations among neurons [15, 16], as well\nas syn\ufb01re chains [20]. However, none of these techniques is designed to detect more complex,\nnon-synchronous, non-sequential temporal structure. Only quite recently more elaborate statistical\nschemes for capturing assemblies with arbitrary temporal structure, and also for dealing with issues\nlike non-stationarity and different time scales, were advanced [23]. The latter method works by\nrecursively merging sets of units into larger groups based on their joint spike count probabilities\nevaluated across multiple different time lags. The method proposed in this paper, in contrast,\napproaches the detection of complex assemblies in a very different manner, attempting to detect\ncomplex patterns as a whole.\nNMF techniques have been widely applied to recover spike trains from calcium \ufb02uorescence record-\nings [29\u201335]. Building on these schemes, NMF has been used to decompose a binned spike matrix\ninto multiple levels of synchronous patterns which describe a hierarchical structuring of the motifs\n[36]. But these previous applications of NMF considered only neurons \ufb01ring strictly synchronously.\nIn audio processing, convolutive NMF has been successfully used to detect motifs with temporal\nstructure [24, 25, 37]. However, as we will show later, the constraints used in audio processing are\ntoo weak to extract motifs from neuronal spike data. 
For this reason we propose a novel optimization approach using sparsity constraints adapted to neuronal spike data.

3 Sparse convolutional coding

We formulate the identification of motifs with any of the temporal structures displayed in figure 1 as a convolutional matrix decomposition into motifs and their activity in time, based on the idea behind convolutive NMF [24, 25], combined with the sparsity constraints used in [34]. We use a novel optimization approach and minimize the reconstruction error while taking into account the sparsity constraints for both motifs and their activation time points.

Let Y ∈ R^{n×m}_+ be a matrix whose n rows represent individual neurons with their spiking activity binned to m columns. We assume that this raw signal is an additive mixture of l motifs a_i ∈ R^{n×τ}_+ with temporal length τ, convolved with a sparse activity signal s_i ∈ R^{1×m}_+, plus noise (see figure 2). We address the unsupervised problem of simultaneously estimating both the coefficients making up the motifs a_i and their activities s_i. To this end, we propose to solve the optimization problem

    min_{a,s} ‖ Y − ∑_{i=1}^{l} s_i ⊛ a_i ‖_F^2 + α ∑_{i=1}^{l} ‖s_i‖_0 + β ∑_{i=1}^{l} ‖a_i‖_1    (1)

with α and β controlling the regularization strength of the ℓ0 norm of the activations and the ℓ1 norm of the motifs, respectively. The convolution operator ⊛ is defined by

    s_i ⊛ a_i = ∑_{j=1}^{τ} a_{i,j} · S(j − 1) s_i    (2)

with a_{i,j} being the jth column of a_i. The column shift operator S(j) moves a matrix j places to the right while keeping the same size and filling missing values appropriately with zeros [24]. 
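As a concrete illustration, the decomposition in equations (1) and (2) can be sketched in a few lines of NumPy (a minimal sketch with hypothetical names such as `shift` and `convolve_motif`, not the authors' implementation): each motif column is combined with the correspondingly shifted activity signal via an outer product.

```python
import numpy as np

def shift(s, j):
    """Column shift operator S(j): move s right by j places, zero-filling."""
    out = np.zeros_like(s)
    if j == 0:
        out[:] = s
    else:
        out[j:] = s[:-j]
    return out

def convolve_motif(a, s):
    """s ⊛ a = sum_j outer(a[:, j], S(j-1) s), yielding an (n, m) matrix."""
    n, tau = a.shape
    m = s.shape[0]
    y = np.zeros((n, m))
    for j in range(tau):
        # each motif column is an outer product with the shifted activity
        y += np.outer(a[:, j], shift(s, j))
    return y

def reconstruct(motifs, activities):
    """Additive mixture of all motifs convolved with their activities."""
    return sum(convolve_motif(a, s) for a, s in zip(motifs, activities))
```

Here `shift` plays the role of the column shift operator S(j); an activity spike at frame t places the motif so that its first column lands in column t of the reconstruction.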
The product on the right-hand side of equation (2) is an outer product.

In [25] the activity of the learned motifs is regularized only with an ℓ1 prior, which is too weak to recover motifs in neuronal spike trains. Instead we choose the ℓ0 prior for s_i since it has been successfully used to learn spike trains of neurons [34]. For the motifs themselves an ℓ1 prior is used to enforce only a few non-zero coefficients while still allowing exact optimization [38].

3.1 Optimization

This problem is non-convex in general but can be approached by initializing the activities s_i randomly and using a block coordinate descent strategy [39, Section 2.7] to alternatingly optimize for the two variables.

When keeping the activations s_i fixed, the motif coefficients a_i are learned using LASSO regression with non-negativity constraints [40] by transforming the convolution with s_i into a linear set of equations using modified Toeplitz matrices s̃_i ∈ R^{mn×nτ}, which are then stacked column-wise [41, 38]:

    min_a ‖ vec(Y) − [s̃_1 … s̃_l] [vec(a_1); … ; vec(a_l)] ‖_2^2 + β ∑_{i=1}^{l} ‖a_i‖_1    (3)

with b = vec(Y) ∈ R^{mn}, A = [s̃_1 … s̃_l] ∈ R^{mn×lnτ} and x = [vec(a_1); … ; vec(a_l)] ∈ R^{lnτ}. The matrices s̃_i are constructed from the s_i with s̃_{i,j,k} = s̃_{i,j+1,k+1} = s_{i,j−k} for j ≥ k, s̃_{i,j,k} = 0 for j < k, and s̃_{i,j,k} = 0 for j > p · m and k < p · τ for p = 1, . . .
, n (where i denotes the ith matrix with element indices j and k).

When keeping the currently found motifs a_i fixed, their activation in time is learned using a convolutional matching pursuit algorithm [42–44] to approximate the ℓ0 norm. The greedy algorithm iteratively includes the assembly appearance that most reduces the reconstruction error. All details of the algorithm are outlined in the supplementary material for this paper.

3.2 Motif sorting and non-parametric threshold estimation

The list of identified motifs is expected to also contain false positives which do not appear repeatedly in the data. The main non-biological reason for this is that our algorithm only finds local minima of the optimization problem given by equation (1). Experiments on various synthetic datasets showed that motifs present at the global optimum should always have the same appearance, independent of the random initialization of the activities. The false positives which are only present in particular local minima, however, look different every time the initialization is changed. We therefore propose to run our algorithm multiple times on the same data with the same parameter settings but with different random initializations, and use the following sorting and non-parametric threshold estimation algorithm in order to distinguish between true (reproducible) and spurious motifs. The following is only a brief description; more details are given in the supplementary material.

In the first step, the motifs found in each run are sorted using pairwise matching. The sorting is necessary because the order of the motifs after learning is arbitrary, and it has to be assured that the motifs with the smallest difference between different runs are compared. Sorting the sets of motifs from all runs at the same time is an NP-hard multidimensional assignment problem [45]. Therefore, a greedy algorithm is used instead. 
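As a concrete illustration of the activation update in section 3.1, the convolutional matching pursuit could be sketched as follows (a simplified sketch with hypothetical names, not the authors' released code; it greedily adds the single motif occurrence that most reduces the residual, and stops when the improvement becomes negligible):

```python
import numpy as np

def matching_pursuit(Y, motifs, max_iters=100, tol=1e-6):
    """Greedy l0-style activation update: add one motif occurrence per
    step, chosen to maximally reduce the reconstruction error."""
    n, m = Y.shape
    activities = [np.zeros(m) for _ in motifs]
    residual = Y.astype(float).copy()
    prev_err = np.linalg.norm(residual) ** 2
    for _ in range(max_iters):
        best = None  # (error reduction, motif index, time, coefficient)
        for i, a in enumerate(motifs):
            tau = a.shape[1]
            norm_a = np.sum(a * a)
            if norm_a == 0:
                continue
            for t in range(m - tau + 1):
                # optimal coefficient for placing motif i at time t
                c = np.sum(a * residual[:, t:t + tau]) / norm_a
                if c <= 0:
                    continue
                gain = c * c * norm_a  # exact error reduction for this patch
                if best is None or gain > best[0]:
                    best = (gain, i, t, c)
        if best is None:
            break
        _, i, t, c = best
        tau = motifs[i].shape[1]
        activities[i][t] += c
        residual[:, t:t + tau] -= c * motifs[i]
        err = np.linalg.norm(residual) ** 2
        # stop when the error no longer improves appreciably
        if err > prev_err or prev_err - err < tol * prev_err:
            break
        prev_err = err
    return activities
```

The stopping rule mirrors the criterion described in section 3.3: terminate when adding a further appearance would increase the reconstruction error or when consecutive errors differ by less than a small threshold.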
It starts by sorting the two sets of motifs with the lowest assignment cost. Thereafter, the remaining sets of motifs are sorted one by one according to the order of motifs given by the already sorted sets.

Inspired by permutation tests, we estimate a threshold T by creating a shuffled spike matrix to determine which motifs are only spurious. In the shuffled matrix all temporal correlations between and within neurons have been destroyed. Hence, there are no real motifs in the shuffled matrix, and the motifs learned from this matrix will likely be different with each new initialization. We take the minimal difference of any two motifs from different runs of the algorithm on the shuffled matrix as the threshold. We assume that motifs that show a difference between different runs larger than this threshold are spurious and discard them.

3.3 Parameter selection

The sparse convolutional coding algorithm has only three parameters that have to be specified by the user: the maximal number of assemblies, the maximal temporal length of a motif, and the penalty β on the ℓ1 norm of the motifs. The number of assemblies to be learned can be set to a generous upper limit, since the sorting method assures that only the true motifs remain while all false positives are deleted. The temporal length of a motif can also be set to a generous upper bound. To find an adequate ℓ1 penalty for the assemblies, different values need to be tested; it should be set to a value where the motifs are neither completely empty nor all neurons active over the whole possible length of the motifs. In the tested cases the appearance of the found motifs did not change drastically while varying the ℓ1 penalty within one order of magnitude, so fine-tuning it is not necessary. 
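The shuffling and threshold estimation of section 3.2 can be sketched like this (an illustrative simplification with hypothetical helper names; the motif difference is taken here as the Frobenius distance between greedily matched motifs, which is one plausible choice, not necessarily the authors' exact metric):

```python
import numpy as np

def shuffle_spike_matrix(Y, rng):
    """Destroy temporal correlations between and within neurons by
    independently permuting each neuron's binned spike train."""
    Y_shuf = Y.copy()
    for row in Y_shuf:
        rng.shuffle(row)  # in-place permutation of one neuron's row
    return Y_shuf

def pairwise_motif_difference(motifs_a, motifs_b):
    """Greedy best-match differences between two sets of motifs."""
    diffs = []
    unused = list(range(len(motifs_b)))
    for a in motifs_a:
        j = min(unused, key=lambda k: np.linalg.norm(a - motifs_b[k]))
        diffs.append(np.linalg.norm(a - motifs_b[j]))
        unused.remove(j)
    return diffs

def estimate_threshold(runs_on_shuffled):
    """T = minimal difference of any two motifs from different runs on
    the shuffled matrix; motifs differing more than T across runs on
    the real data are treated as spurious."""
    diffs = []
    for i in range(len(runs_on_shuffled)):
        for j in range(i + 1, len(runs_on_shuffled)):
            diffs.extend(pairwise_motif_difference(runs_on_shuffled[i],
                                                   runs_on_shuffled[j]))
    return min(diffs)
```

Shuffling each row independently preserves every neuron's firing rate while removing all temporal structure, which is what makes the resulting minimum cross-run difference a usable null-level for reproducibility.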
Instead of specifying the penalty α on the ℓ0 norm of the activations directly, we chose to stop the matching pursuit algorithm when adding an additional assembly appearance increases the reconstruction error, or when the difference of reconstruction errors from two consecutive steps falls below a small threshold.

All code for the proposed method is available at: https://github.com/sccfnad/Sparse-convolutional-coding-for-neuronal-assembly-detection

4 Results

4.1 Synthetic data

Since ground truth datasets are not available, we have simulated different synthetic datasets to establish the accuracy of the proposed method and compare it to existing work.

For PCA- and ICA-based methods the number of motifs is estimated using the Marchenko-Pastur eigenvalue distribution [17]. The sparsity parameter in the sparse convolutive NMF (scNMF) that resulted in the best performance was chosen empirically [25].

An illustrative example dataset with twenty neurons, one hundred spurious spikes per neuron and three temporal motifs can be seen in figure 3. Consecutive activation times between motifs were modeled as Poisson renewal processes with a mean inter-event distance of twenty frames. When running our method from two different random initial states to identify a total of five motifs, all three original motifs were among those extracted from the data (figures 3c and 3d; the motifs have been sorted manually to match up with the ground truth; all parameters for the analysis can be found in table 1). While the two spurious motifs change depending on the random initialization, the three true motifs consistently show up in the search results. Neither PCA, ICA nor scNMF were able to extract the true motifs (see figures 3e, 3f and 3g).

For further analysis, various datasets consisting of fifty neurons observed over one thousand time frames were created. 
Details on the generation of these datasets can be found in the supplementary material. For each of the different motif lengths τ = 1, 7 and 21 frames, twenty different datasets were created, with different noise levels and numbers of neurons shared between assemblies.

To compare the performance of different methods, we use the functional association between neurons as an indicator [27, 46, 12]. For this a neuron association matrix (NAM) is calculated from the learned motifs. The NAM contains for each pair of neurons a 1 if the two neurons belong to the same assembly and a 0 otherwise. The tested methods, however, do not make binary statements about whether a neuron belongs to an assembly, but provide only the information to what degree the neuron was associated with an assembly. We apply multiple thresholds to binarize the output of the tested methods and compute the true positive rate and false positive rate between the ground truth NAM and the binarized NAM, leading to the ROC curves shown in figure 4. We chose this method since it works without limitations for synchronous motifs and also allows for comparisons in the more complex cases.

(a) Spike matrix  (b) Ground truth motifs  (c) Learned motifs (proposed method, first trial)  (d) Learned motifs (proposed method, second trial)  (e) Learned component (PCA)  (f) Learned component (ICA)  (g) Learned motifs (scNMF)
Figure 3: Results on a synthetic dataset. (a) shows a synthetic spike matrix. (b) shows the three motifs present in the data. By running our algorithm with two different random initial states the motifs seen in (c) and (d) are learned. (e), (f) and (g) show the results from PCA, ICA and scNMF, respectively.

(a) τ = 1  (b) τ = 7  (c) τ = 21
Figure 4: ROC curves of different methods on synthetic data for different temporal motif lengths. 
We show the mean ROC curve and its standard deviation averaged over all trials on different synthetic datasets. All methods were run ten times on each dataset with different random initializations.

In the synchronous case (i.e. τ = 1, figure 4a) our proposed method performs as well as the best competitor. As expected, PCA performance shows a huge variance, since some of the datasets contain neurons shared between multiple motifs and since extracting actual neuron-assembly assignments is not always possible [27, 17]. When temporal structure is introduced we are still able to identify associations between neurons with very high accuracy. For short temporal motifs (τ = 7, figure 4b) scNMF is able to identify associations, but only our method was able to accurately recover most associations in long motifs (τ = 21, figure 4c).

Table 1: Experimental parameters. We show the used maximal number of assemblies, maximal motif length in frames, ℓ1 penalty value β, and number of runs of the algorithm with different initializations for the performed experiments on synthetic and real datasets. We also display the estimated threshold T used for distinguishing between real and spurious motifs.

Experiment               #motifs  motif length in frames  β        #runs  T
synthetic example data   5        15                      5·10^-4  2      –
hippocampal CA1 region   5        10                      10^-6    5      5.7·10^-6
cortical neuron culture  5        10                      10^-6    5      6.5·10^-4

4.2 Real data

In vitro hippocampal CA1 region data. We analyzed spike trains of 91 cells from the hippocampal CA1 region recorded at high temporal and multiple single cell resolution using Ca2+ imaging. The acute mouse hippocampal slices were recorded in a so-called interface chamber [47].

On this dataset, our algorithm identified three motifs as real motifs. They are shown in figure 5a. The activity of each assembly has been calculated at every frame and is shown in figure 5b. In order to qualitatively show that the proposed method appropriately eliminates false positives from the list of found motifs also on real data, we plotted in figure 6 for each motif the difference to the best matching motif from every other run. 
We did this for the motifs identified in the original spike matrix (figure 6a), as well as for the motifs identified in the shuffled spike matrix (figure 6b). The motifs found in the shuffled matrix show much higher variability between runs than those found in the original matrix. For motifs 1 and 3 from the original matrix the difference between runs is on average about two to three times higher than for the other motifs, but still smaller than the average difference between runs for all of the motifs from the shuffled data. Nevertheless, these motifs are deleted as false positives, since the threshold for discarding a motif is set to the minimum difference of motifs from different runs on the shuffled matrix. This shows that the final set of motifs is unlikely to contain spurious motifs anymore.

The spontaneous hippocampal network activity is expected to appear under the applied recording conditions as sharp wave-ripple (SPW-R) complexes that support memory consolidation [48–50, 47]. Motif 5 in figure 5a shows the typical behavior of principal neurons firing single or two consecutive spikes at a low firing rate (≪ 1 Hz) during SPW-R in vitro [47]. This might be interpreted as the re-activation of a formerly established neuronal assembly.

In vitro cortical neuron culture data. Primary cortical neurons were prepared from E15 embryos of Sprague Dawley rats as described in [51] and approved by the NIH Animal Care and Usage Committee. Cells were transduced with an adeno-associated virus expressing the genetically-encoded calcium indicator GCaMP6f on DIV 7 (Addgene #51085). Wide-field epifluorescent videos of spontaneous calcium activity from individual wells (6 × 10^4 cells/well) were recorded on DIV 14 or 18 at an acquisition rate of 31.2 frames per second. 
The data for the shown example contains 400\nidenti\ufb01ed neurons imaged for 10 minutes on DIV 14.\nOur algorithm identi\ufb01ed two motifs in the used dataset, shown in \ufb01gure 5c. Their activity is plotted\nin \ufb01gure 5d. For each column of the two motifs, \ufb01gure 7 shows the percentage of active neurons\nat every time frame. The motifs were thresholded such that only neurons with a motif coef\ufb01cient\nabove 50% of the maximum coef\ufb01cient of the motif were counted. We show those columns of the\nmotifs which contained more than one neuron after thresholding. The fact that \ufb01gure 7 shows only\nfew motif activations that include all of the cells that are a part of the motif has less to do with the\nactual algorithm, but more with how the nervous system works: Only rarely all cells of an assembly\nwill spike [23], due to both the intrinsic stochasticity, like probabilistic synaptic release [52] and the\nfact that synaptic connectivity and thus assembly membership will be graded and strongly \ufb02uctuates\nacross time due to short-term synaptic plasticity [53]. Nevertheless, the plot shows that often several\ncolumns are active in parallel and there are some time points where a high percentage of the neurons\nin all columns is active together. This shows that the found motifs really contain temporal structure\nand are repeated multiple times in the data.\nAll parameters for the analysis of the shown experiments can be found in table 1.\n\n7\n\n\f(a) Motifs from hippocampal CA1 region data\n\n(b) Activity of motifs from hip-\npocampal CA1 region data\n\n(c) Motifs from cortical neuron culture data\n\n(d) Activity of motifs from cortical neuron culture data\n\nFigure 5: Results from real data. We show the results of our algorithm for two different real datasets.\nThe datasets vary in temporal length as well as number of observed cells. 
For each dataset we show the motifs that our algorithm identified as real motifs and their activity over time.

(a) Difference between runs for motifs learned on original matrix  (b) Difference between runs for motifs learned on shuffled matrix
Figure 6: Differences between the five runs for all five learned motifs from hippocampal CA1 region data. The plots show for each motif the difference to the best matching motif from every other run. We did this for the motifs identified in the original hippocampal CA1 region data (a), as well as for the motifs identified in the shuffled spike matrix (b). The motifs found in the shuffled matrix show much higher variability between runs than those found in the original matrix.

Figure 7: Percentage of active neurons per column over time, for all motifs identified in the cortical neuron culture dataset. For each column of the two motifs displayed in figure 5c, we show the percentage of active neurons at every time frame. Vertical grey bars indicate points in time at which all significantly populated columns of a motif fire with at least 30% of their neurons. Their recurrence shows that the motifs really contain temporal structure and are repeated multiple times in the dataset.

5 Discussion

We have presented a new approach for the identification of motifs that is not limited to synchronous activity. Our method leverages sparsity constraints on the activity and the motifs themselves to allow a simple and elegant formulation that is able to learn motifs with temporal structure. Our algorithm extends convolutional coding methods with a novel optimization approach to allow modeling of interactions between neurons. The proposed algorithm is designed to identify motifs in data with temporal stationarity. Non-stationarities in the data, which are expected to appear especially in in vivo recordings, are not yet taken into account. In cases where non-stationarities are expected to be strong, the method for stationarity segmentation introduced in [54] could be used before applying our algorithm to the data. 
Although our algorithm does not yet account for non-stationarities, results on simulated datasets show that the proposed method outperforms established methods, especially when identifying long motifs. Additionally, the algorithm shows stable performance on real datasets. Moreover, the results found on the cortical neuron culture dataset show that our method is able to detect assemblies within large sets of recorded neurons.

Acknowledgments

SP and EK thank Eleonora Russo for sharing her knowledge on generating synthetic data and Fynn Bachmann for his support. LAC, BKH and CH thank Lowella Fortuno for technical assistance with cortical cultures and acknowledge the support by the Intramural Research Program of the NIH, NIDA. DD acknowledges partial financial support by DFG Du 354/8-1. SP, EK, MB, DD, FD and FAH gratefully acknowledge partial financial support by DFG SFB 1134.

References
[1] D. Hebb, The Organization of Behaviour: A Neuropsychological Theory. Wiley, 1949.
[2] D. Marr, D. Willshaw, and B. McNaughton, Simple memory: a theory for archicortex. Springer, 1991.
[3] W. Singer, “Synchronization of cortical activity and its putative role in information processing and learning,” Annual review of physiology, vol. 55, no. 1, pp. 349–374, 1993.
[4] M. A. Nicolelis, E. E. Fanselow, and A. A. Ghazanfar, “Hebb’s dream: the resurgence of cell assemblies,” Neuron, vol. 19, no. 2, pp. 219–221, 1997.
[5] Y. Ikegaya, G. Aaron, R. Cossart, D. Aronov, I. Lampl, D. Ferster, and R. Yuste, “Synfire chains and cortical songs: temporal modules of cortical activity,” Science, vol. 304, no. 5670, pp. 559–564, 2004.
[6] P. Cossart and P. J. Sansonetti, “Bacterial invasion: The paradigms of enteroinvasive pathogens,” Science, vol. 304, no. 5668, pp. 242–248, 2004.
[7] G. Buzsáki, “Large-scale recording of neuronal ensembles,” Nature neuroscience, vol.
7, no. 5, pp. 446–451, 2004.
[8] A. Mokeichev, M. Okun, O. Barak, Y. Katz, O. Ben-Shahar, and I. Lampl, “Stochastic emergence of repeating cortical motifs in spontaneous membrane potential fluctuations in vivo,” Neuron, vol. 53, no. 3, pp. 413–425, 2007.
[9] E. Pastalkova, V. Itskov, A. Amarasingham, and G. Buzsáki, “Internally generated cell assembly sequences in the rat hippocampus,” Science, vol. 321, no. 5894, pp. 1322–1327, 2008.
[10] I. H. Stevenson and K. P. Kording, “How advances in neural recording affect data analysis,” Nature neuroscience, vol. 14, no. 2, pp. 139–142, 2011.
[11] M. B. Ahrens, M. B. Orger, D. N. Robson, J. M. Li, and P. J. Keller, “Whole-brain functional imaging at cellular resolution using light-sheet microscopy,” Nature methods, vol. 10, no. 5, pp. 413–420, 2013.
[12] L. Carrillo-Reid, J.-e. K. Miller, J. P. Hamm, J. Jackson, and R. Yuste, “Endogenous sequential cortical activity evoked by visual stimuli,” Journal of Neuroscience, vol. 35, no. 23, pp. 8813–8828, 2015.
[13] S. Grün, M. Diesmann, and A. Aertsen, “Unitary events in multiple single-neuron spiking activity: I. detection and significance,” Neural Computation, vol. 14, no. 1, pp. 43–80, 2002.
[14] S. Grün, M. Diesmann, and A. Aertsen, “Unitary events in multiple single-neuron spiking activity: II. nonstationary data,” Neural Computation, vol. 14, no. 1, pp. 81–119, 2002.
[15] B. Staude, S. Rotter, and S. Grün, “Cubic: cumulant based inference of higher-order correlations in massively parallel spike trains,” Journal of Computational Neuroscience, vol. 29, no. 1, pp. 327–350, 2010.
[16] B. Staude, S.
Gr\u00fcn, and S. Rotter, \u201cHigher-order correlations in non-stationary parallel spike trains:\n\nstatistical modeling and inference,\u201d Frontiers in Computational Neuroscience, vol. 4, p. 16, 2010.\n\n[17] V. Lopes-dos Santos, S. Ribeiro, and A. B. Tort, \u201cDetecting cell assemblies in large neuronal populations,\u201d\n\nJournal of neuroscience methods, vol. 220, no. 2, pp. 149\u2013166, 2013.\n\n[18] A. C. Smith and P. C. Smith, \u201cA set probability technique for detecting relative time order across multiple\n\nneurons,\u201d Neural Comput., vol. 18, no. 5, pp. 1197\u20131214, 2006.\n\n[19] A. C. Smith, V. K. Nguyen, M. P. Karlsson, L. M. Frank, and P. Smith, \u201cProbability of repeating patterns\n\nin simultaneous neural data,\u201d Neural Comput., vol. 22, no. 10, pp. 2522\u20132536, 2010.\n\n[20] G. L. Gerstein, E. R. Williams, M. Diesmann, S. Gr\u00fcn, and C. Trengove, \u201cDetecting syn\ufb01re chains in\n\nparallel spike data,\u201d Journal of Neuroscience Methods, vol. 206, no. 1, pp. 54 \u2013 64, 2012.\n\n[21] E. Torre, P. Quaglio, M. Denker, T. Brochier, A. Riehle, and S. Gr\u00fcn, \u201cSynchronous spike patterns in\nmacaque motor cortex during an instructed-delay reach-to-grasp task,\u201d Journal of Neuroscience, vol. 36,\nno. 32, pp. 8329\u20138340, 2016.\n\n[22] R. Yuste, J. N. MacLean, J. Smith, and A. Lansner, \u201cThe cortex as a central pattern generator,\u201d Nature\n\nReviews Neuroscience, vol. 6, no. 6, pp. 477\u2013483, 2005.\n\n[23] E. Russo and D. Durstewitz, \u201cCell assemblies at multiple time scales with arbitrary lag constellations,\u201d\n\neLife, vol. 6, p. e19428, 2017.\n\n[24] P. Smaragdis, \u201cNon-negative matrix factor deconvolution; extraction of multiple sound sources from\nmonophonic inputs,\u201d Lecture Notes in Computer Science (including subseries Lecture Notes in Arti\ufb01cial\nIntelligence and Lecture Notes in Bioinformatics), vol. 3195, pp. 494\u2013499, 2004.\n\n[25] P. D. O\u2019Grady and B. 
A. Pearlmutter, “Convolutive non-negative matrix factorisation with a sparseness constraint,” in 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, pp. 427–432, 2006.
[26] M. A. Nicolelis, L. A. Baccala, R. Lin, and J. K. Chapin, “Sensorimotor encoding by synchronous neural ensemble activity at multiple levels of the somatosensory system,” Science, vol. 268, no. 5215, pp. 1353–1358, 1995.
[27] V. Lopes-dos Santos, S. Conde-Ocazionez, M. A. L. Nicolelis, S. T. Ribeiro, and A. B. L. Tort, “Neuronal assembly detection and cell membership specification by principal component analysis,” PLOS ONE, vol. 6, no. 6, pp. 1–16, 2011.
[28] P. Comon, “Independent component analysis, a new concept?,” Signal processing, vol. 36, no. 3, pp. 287–314, 1994.
[29] A. Cichocki and R. Zdunek, “Multilayer nonnegative matrix factorisation,” Electronics Letters, vol. 42, no. 16, pp. 947–948, 2006.
[30] J. T. Vogelstein, A. M. Packer, T. A. Machado, T. Sippy, B. Babadi, R. Yuste, and L. Paninski, “Fast nonnegative deconvolution for spike train inference from population calcium imaging,” Journal of Neurophysiology, vol. 104, no. 6, pp. 3691–3704, 2010.
[31] R. Rubinstein, M. Zibulevsky, and M. Elad, “Double sparsity: Learning sparse dictionaries for sparse signal approximation,” IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1553–1564, 2010.
[32] E. A. Pnevmatikakis, T. A. Machado, L. Grosenick, B. Poole, J. T. Vogelstein, and L. Paninski, “Rank-penalized nonnegative spatiotemporal deconvolution and demixing of calcium imaging data,” in Computational and Systems Neuroscience (Cosyne) 2013, 2013.
[33] E. A. Pnevmatikakis and L.
Paninski, \u201cSparse nonnegative deconvolution for compressive calcium imaging:\n\nalgorithms and phase transitions,\u201d in NIPS, 2013.\n\n[34] F. Diego Andilla and F. A. Hamprecht, \u201cSparse space-time deconvolution for calcium image analysis,\u201d in\nAdvances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D.\nLawrence, and K. Q. Weinberger, eds.), pp. 64\u201372, Curran Associates, Inc., 2014.\n\n[35] E. A. Pnevmatikakis, Y. Gao, D. Soudry, D. Pfau, C. Lace\ufb01eld, K. Poskanzer, R. Bruno, R. Yuste, and\nL. Paninski, \u201cA structured matrix factorization framework for large scale calcium imaging data analysis,\u201d\narXiv:1409.2903 [q-bio, stat].\n\n[36] F. Diego and F. A. Hamprecht, \u201cLearning multi-level sparse representations,\u201d in NIPS, 2013.\n[37] R. J. Weiss and J. P. Bello, \u201cIdentifying repeated patterns in music using sparse convolutive non-negative\n\nmatrix factorization,\u201d in ISMIR, 2010.\n\n[38] H. Zou and T. Hastie, \u201cRegularization and variable selection via the elastic net,\u201d Journal of the Royal\n\nStatistical Society, Series B (Statistical Methodology), vol. 67, no. 2, pp. 301\u2013320, 2005.\n\n[39] D. P. Bertsekas, Nonlinear Programming. Athena Scienti\ufb01c, 1999.\n[40] R. Tibshirani, \u201cRegression shrinkage and selection via the lasso,\u201d Journal of the Royal Statistical Society.\n\nSeries B (Methodological), vol. 58, no. 1, pp. 267\u2013288, 1996.\n\n[41] P. C. Hansen, \u201cDeconvolution and regularization with Toeplitz matrices,\u201d Numerical Algorithms, vol. 29,\n\nno. 4, pp. 323\u2013378, 2002.\n\n[42] S. G. Mallat and Z. Zhang, \u201cMatching pursuits with time-frequency dictionaries,\u201d IEEE Transactions on\n\nSignal Processing, vol. 41, no. 12, pp. 3397\u20133415, 1993.\n\n[43] M. Protter and M. Elad, \u201cImage sequence denoising via sparse and redundant representations,\u201d IEEE\n\nTransactions on Image Processing, vol. 18, no. 1, pp. 
27\u201335, 2009.\n\n10\n\n\f[44] A. Szlam, K. Kavukcuoglu, and Y. LeCun, \u201cConvolutional matching pursuit and dictionary training,\u201d\n\n[45] W. P. Pierskalla, \u201cLetter to the editor \u2013 the multidimensional assignment problem,\u201d Operations Research,\n\nComputer Research Repository (arXiv), 2010.\n\nvol. 16, no. 2, pp. 422\u2013431, 1968.\n\n[46] Y. N. Billeh, M. T. Schaub, C. A. Anastassiou, M. Barahona, and C. Koch, \u201cRevealing cell assemblies at\n\nmultiple levels of granularity,\u201d Journal of Neuroscience Methods, vol. 236, pp. 92 \u2013 106, 2014.\n\n[47] T. Pfeiffer, A. Draguhn, S. Reichinnek, and M. Both, \u201cOptimized temporally deconvolved Ca2+ imaging\nallows identi\ufb01cation of spatiotemporal activity patterns of CA1 hippocampal ensembles,\u201d NeuroImage,\nvol. 94, pp. 239\u2013249, 2014.\n\n[48] G. Buzs\u00e1ki, \u201cMemory consolidation during sleep: A neurophysiological perspective,\u201d Journal of Sleep\n\nResearch, vol. 7 Suppl 1, pp. 17\u201323, 1998.\n\n[49] G. Girardeau, K. Benchenane, S. I. Wiener, G. Buzs\u00e1ki, and M. B. Zugaro, \u201cSelective suppression of\nhippocampal ripples impairs spatial memory,\u201d Nature Neuroscience, vol. 12, no. 10, pp. 1222\u20131223, 2009.\n[50] G. Girardeau and M. Zugaro, \u201cHippocampal ripples and memory consolidation,\u201d Current Opinion in\n\nNeurobiology, vol. 21, no. 3, pp. 452\u2013459, 2011.\n\n[51] D. B. Howard, K. Powers, Y. Wang, and B. K. Harvey, \u201cTropism and toxicity of adeno-associated viral\nvector serotypes 1, 2, 5, 6, 7, 8, and 9 in rat neurons and glia in vitro,\u201d Virology, vol. 372, no. 1, pp. 24 \u2013\n34, 2008.\n\n[52] C. F. Stevens, \u201cNeurotransmitter release at central synapses,\u201d Neuron, vol. 40, no. 2, pp. 381 \u2013 388, 2003.\n[53] H. Markram, Y. Wang, and M. Tsodyks, \u201cDifferential signaling via the same axon of neocortical pyramidal\n\nneurons,\u201d Proceedings of the National Academy of Sciences, vol. 95, no. 9, pp. 
5323\u20135328, 1998.\n\n[54] C. S. Quiroga-Lombard, J. Hass, and D. Durstewitz, \u201cMethod for stationarity-segmentation of spike\ntrain data with application to the pearson cross-correlation,\u201d Journal of Neurophysiology, vol. 110, no. 2,\npp. 562\u2013572, 2013.\n\n11\n\n\f", "award": [], "sourceid": 2050, "authors": [{"given_name": "Sven", "family_name": "Peter", "institution": "University Heidelberg"}, {"given_name": "Elke", "family_name": "Kirschbaum", "institution": "HCI/IWR, Heidelberg University"}, {"given_name": "Martin", "family_name": "Both", "institution": "Institute for Physiology and Pathophysiology Heidelberg University"}, {"given_name": "Lee", "family_name": "Campbell", "institution": "Intramural Research Program, National Institute on Drug Abuse"}, {"given_name": "Brandon", "family_name": "Harvey", "institution": "Intramural Research Program, National Institute on Drug Abuse"}, {"given_name": "Conor", "family_name": "Heins", "institution": "Intramural Research Program, National Institute on Drug Abuse"}, {"given_name": "Daniel", "family_name": "Durstewitz", "institution": "CIMH Heidelberg University"}, {"given_name": "Ferran", "family_name": "Diego", "institution": "Bosch"}, {"given_name": "Fred", "family_name": "Hamprecht", "institution": "Heidelberg University"}]}