{"title": "A coupled autoencoder approach for multi-modal analysis of cell types", "book": "Advances in Neural Information Processing Systems", "page_first": 9267, "page_last": 9276, "abstract": "Recent developments in high throughput profiling of individual neurons have spurred data driven exploration of the idea that there exist natural groupings of neurons referred to as cell types. The promise of this idea is that the immense complexity of brain circuits can be reduced, and effectively studied by means of interactions between cell types. While clustering of neuron populations based on a particular data modality can be used to define cell types, such definitions are often inconsistent across different characterization modalities. We pose this issue of cross-modal alignment as an optimization problem and develop an approach based on coupled training of autoencoders as a framework for such analyses. We apply this framework to a Patch-seq dataset consisting of transcriptomic and electrophysiological profiles for the same set of neurons to study consistency of representations across modalities, and evaluate cross-modal data prediction ability. We explore the problem where only a subset of neurons is characterized with more than one modality, and demonstrate that representations learned by coupled autoencoders can be used to identify types sampled only by a single modality.", "full_text": "A coupled autoencoder approach for multi-modal\n\nanalysis of cell types\n\nRohan Gala, Nathan Gouwens, Zizhen Yao, Agata Budzillo, Osnat Penn,\n\nBosiljka Tasic, Gabe Murphy, Hongkui Zeng, Uygar S\u00fcmb\u00fcl\n\nrohang@alleninstitute.org, uygars@alleninstitute.org\n\nAllen Institute, Seattle, WA 98109\n\nAbstract\n\nRecent developments in high throughput pro\ufb01ling of individual neurons have\nspurred data driven exploration of the idea that there exist natural groupings of\nneurons referred to as cell types. 
The promise of this idea is that the immense\ncomplexity of brain circuits can be reduced, and effectively studied by means of\ninteractions between cell types. While clustering of neuron populations based on\na particular data modality can be used to de\ufb01ne cell types, such de\ufb01nitions are\noften inconsistent across different characterization modalities. We pose this issue\nof cross-modal alignment as an optimization problem and develop an approach\nbased on coupled training of autoencoders as a framework for such analyses.\nWe apply this framework to a Patch-seq dataset consisting of transcriptomic and\nelectrophysiological pro\ufb01les for the same set of neurons to study consistency of\nrepresentations across modalities, and evaluate cross-modal data prediction ability.\nWe explore the problem where only a subset of neurons is characterized with\nmore than one modality, and demonstrate that representations learned by coupled\nautoencoders can be used to identify types sampled only by a single modality.\n\n1\n\nIntroduction\n\nComputation in the brain can involve complicated interactions between millions of different cells.\nIdentifying cell types and their stereotypical interactions based on functional and developmental\ncharacteristics of individual cells has the potential to reduce this complexity in service of our efforts\nto understand the brain. However, capturing the notion of a cell type identity that is consistent\nacross different single cell characterization modalities such as transcriptomics, electrophysiology,\nmorphology, and connectivity has been a challenging computational problem [1, 2, 3, 4, 5].\nA general approach to understand correspondence between cell type de\ufb01nitions based on different\nmodalities [3] is to evaluate the degree to which the observable cellular features themselves can\nbe aligned across the modalities. 
The existence of such alignment would allow one to determine\nan abstract, potentially low-dimensional representation for each cell. In such a scenario, different\ntransformations could be used to generate realizations of the features measured in the different\nmodalities from the abstract representation itself. Moreover, tasks such as clustering to de\ufb01ne cell\ntypes could be performed on such representations obtained for cell populations. Here, we propose\na method to reveal such abstract identities of cells by casting it as an optimization problem. We\ndemonstrate that (i) cell classes de\ufb01ned by a single data modality can be predicted with high accuracy\nfrom observations measuring seemingly very different aspects of neuronal identity, and (ii) the same\nframework enables cross-modal prediction of raw recordings.\nWell known approaches to obtain coordinated representations [6] from multi-modal datasets include\nthe canonical correlation analysis (CCA) and its nonlinear variants [7, 8]. These techniques involve\ncalculation of explicit transformation matrices and possibly parameters of multi-layer perceptrons.\n\n\fFigure 1: (A) Illustration of a k-coupled autoencoder. (B) 2D representations of the MNIST dataset obtained by\none agent of a 2-CAE for various forms of Ccoupling. Colors represent different digits. (i) Representations shrink\nto zero in the absence of scaling (Eq.2). (ii) Representations collapse to a line if the scaling is based on batch\nnormalization [11]. Reasonable representations are obtained with CFC (iii) and CMSV (iv). CMSV and CFC\nlead to identical Crecon when the full covariance matrix estimates are reliable. 
For large latent dimensionality\n(C) or small batch sizes (D), CMSV leads to lower Crecon (mean \u00b1 SE, n = 10).\nAnother recent approach for this problem is the correspondence autoencoder architecture [9],\nwherein individual agents are standard autoencoders that encode a high dimensional input into a\nlow dimensional latent space from which the input is reconstructed [10]. The trained network is\nexpected to align the representations without any explicit transformation matrices. However, in the\nabsence of any normalization of the representations, the individual agents can arbitrarily scale down\ntheir representations to minimize the coupling cost without a penalty on reconstruction accuracy.\nWhile Batch Normalization [11] prevents the representations from collapsing to zero by setting the\nscale for each latent dimension independently, it permits a different pathological solution wherein the\nrepresentations collapse onto a one dimensional manifold. We present a rigorous analysis of these\nproblems, and show that normalization with the full covariance matrix of the mini-batch is suf\ufb01cient,\nas expected [8], to obtain reasonable latent space representations. However, this calculation can\nbe prohibitively inaccurate depending on the latent space dimensionality and batch size (\u201ccurse of\ndimensionality\u201d). Therefore, we propose an alternative normalization that relies only on estimating\nthe minimum eigenvalue of this covariance matrix. 
Moreover, we derive a probabilistic setting for the cross-modal representation alignment problem and show that our optimization objective can be interpreted as the maximization of a likelihood function, which suggests multiple generalizations of our current implementation.\nWhile there is limited literature on analysis of multi-modal neuronal recordings from a cell types perspective, the advent of large transcriptomic datasets has led to a recent surge of interest in unimodal characterization methods for such data [12, 13, 14, 15, 16, 17]. In particular, Lopez et al. [17] propose a generative model for transcriptomic data using variational inference on an autoencoding architecture, and apply k-means clustering on the latent representation. While the commonly used Gaussian prior is in contrast with the search for discrete cell classes, mixture model priors [18] are not easily applicable to cases with potentially hundreds of categories. Here, we fit a Gaussian mixture on the latent space representation following the optimization of a discriminative model. We study cross-modal prediction of cell types and raw data with this approach.\nFinally, our method can work with partially paired datasets. This setting raises two problems of practical significance for cell type classification: (i) would types that are not sampled by some modalities be falsely aligned to other types? (ii) would types that are sampled by all modalities have consistent embeddings across the modalities in the absence of any pairing knowledge? We demonstrate the utility of our approach in addressing these problems by designing a controlled experiment.\n2 Theory\n\n2.1 Optimization framework\n\nAn illustration of the multi-agent autoencoder architecture is shown in Fig. 1A, where agent i receives input xsi for which it learns a latent representation zsi. This representation is used to obtain a reconstruction of the input, x\u0303si.
The representation learned by a given agent is compared to those learned by all other agents to which it is coupled through a dissimilarity measure. The agents minimize an overall cost function C that consists of penalties on reconstruction error, Crecon, and on mismatches compared to representations learned by other agents, Ccoupling. The trade-off between learning a representation that minimizes reconstruction error, and one that agrees with the representations learned by other agents, is controlled by a coupling constant \u03bb.\nFormally, we define the k-coupled autoencoding tuple (k-CAE) \u03a6 as\n\n\u03a6 = ({(Ei, Di, ri)}i\u2208K, c, \u03bb),\n\nwhere K is an ordered, finite index set, Ei, Di are continuous operators that can express any linear transformation, codomain(Ei) = domain(Dj), i, j \u2208 K, \u03bb \u2265 0, and ri and c are non-negative convex functions.\nFor a set of inputs X = {(xs1, xs2, . . . , xsk), s \u2208 S}, we define the loss of the k-CAE \u03a6 as\n\nC\u03a6(X) = Crecon,\u03a6(X) + \u03bbCcoupling,\u03a6(X),\n\n(1)\n\nwhere\n\nCrecon,\u03a6(X) = \u2211_{s\u2208S} \u2211_{i\u2208K} ri(xsi \u2212 Di(Ei(xsi))), Ccoupling,\u03a6(X) = \u2211_{s\u2208S} \u2211_{i,j\u2208K, i<j} c(Ei(xsi) \u2212 Ej(xsj)).\n\nHere, we choose ri as the weighted squared Euclidean distance, ri(u) = \u03b1i||u||_2^2 with \u03b1i > 0. When c is also chosen as the squared Euclidean distance and \u03b1i = 1 for all i, one obtains the cost function of Feng et al. [9], c(Ei(xsi) \u2212 Ej(xsj)) = ||Ei(xsi) \u2212 Ej(xsj)||_2^2:\n\nCcoupling = \u2211_{s\u2208S} \u2211_{i<j} ||zsi \u2212 zsj||_2^2.\n\n(2)\n\nIn the absence of any normalization, the encoding networks can arbitrarily scale down their representations to minimize this coupling cost without any penalty on reconstruction accuracy: there exist parameter choices for networks minimizing C\u03a6 with \u03bb > 0 that satisfy ||zsi|| < \u03b5, for any norm || \u00b7 ||, input set X, \u03b5 > 0, and all s, i. (Proof in supp. material)\n\n2.2 Scaling latent representation with batch normalization\n\nA way to alleviate the shrinking representation problem is to impose a length scale on the representation. Mini-batch statistics can be used to determine such a scale, as is the case with batch normalization [11]. In its conventional implementation, each dimension m is centered and scaled by empirical estimates of the population mean Es(zsi(m)) and standard deviation \u03c3s(zsi(m)) based on mini-batch samples:\n\nCcoupling = \u2211_{s\u2208S} \u2211_{i<j} ||\u00afzsi \u2212 \u00afzsj||_2^2, where \u00afzsi(m) = (zsi(m) \u2212 Es(zsi(m))) / \u03c3s(zsi(m)).\n\n(3)\n\nHowever, since this normalization sets the scale of each latent dimension independently, the representations can still collapse onto a one dimensional manifold: there exist parameter choices for networks minimizing C\u03a6 with \u03bb > 0 that satisfy |zsi(m) \u2212 zsi(\u00afm)| < \u03b5, for any 1 \u2264 m, \u00afm \u2264 p, s \u2208 S, 1 \u2264 i \u2264 k, \u03b5 > 0. (Proof in supp. material)\nThus, latent representations that do not collapse onto a single dimension do not have a stable training path in the sense that, under a continuous probability model for zsi|zsj (Section 2.4), such coupled representations are of measure zero.\n\n2.3 Mini-batch singular value based normalization\n\nEstimates of the covariance matrix are increasingly inaccurate for smaller batch sizes and larger latent dimensionalities. We propose an alternative that entails scaling the latent representation by the narrowest dimension. This can be formally evaluated as the smallest singular value of the batch matrix. Ccoupling can thus be written as\n\nCcoupling = \u2211_{s\u2208S} \u2211_{i<j} ||zsi/\u03c3min(Zi) \u2212 zsj/\u03c3min(Zj)||_2^2,\n\nwhere \u03c3min(Zi) denotes the smallest singular value of the mini-batch matrix of representations Zi.\n\nClassification based on the transcriptomic representation zt attains > 80% accuracy over more than 40 types with a 3D latent space (Fig. 3A, \u03bb = 0). As \u03bb is increased, the greater emphasis on minimizing mismatches with the electrophysiology representation leads to a slight degradation of transcriptomic type prediction. With \u03bb = 1 or 10, we were able to obtain highly consistent representations of multi-modal identity (Fig. 2C), as reflected by the high classification accuracy in Fig. 3A-B.
We performed this analysis using 3D representations obtained with CCA [7, 28] that use transcriptomic and electrophysiological data reduced by PCA (PC-CCA; tuples indicate the number of principal components of transcriptomic and electrophysiological data used for CCA). Transcriptomic and electrophysiological data were projected onto the top 3 CCA components, followed by a whitening transformation to ensure that the scale for the representations is the same. Red plots in Fig. 3A show that 3D projections obtained in this manner offer a weak alternative to analyze multi-modal cell identity.\n\nFigure 4: Cross-modal data prediction with 3D latent representations. Estimates of expression for a set of 37 peptidergic genes based on sPCA features (A), and of the sPCA features based on gene expression (B) for example test cells (\u03bb = 10) show qualitative agreement of the predictions with the observations. (C) Quantifying Crecon with a reference of \u03bb = 0 across the test set demonstrates the trade-off for \u03bb: increasing \u03bb makes the representations similar, leading to smaller differences between the same- (light colors) and cross-modal data (dark colors) prediction, and a higher Crecon.\n\nA similar analysis was performed using the electrophysiological representations, ze, to test cross-modal prediction of transcriptomic types. Fig. 3B shows that the classifier performance is worse compared to Fig. 3A when \u03bb = 0, which suggests that variations in the electrophysiology features do not completely overlap with variations in gene expression profiles. This is in line with the inconsistent clusters obtained in studies that consider single data modalities to define cell types.
As \u03bb increases, zt and ze become more similar, and therefore allow cross-modal prediction with better accuracy.\nUnsupervised cross-modal type prediction: To avoid being limited by the differential gene expression-based ground truth labels used for the supervised analysis, we used unsupervised clustering to test the consistency of clusters obtained by coupled autoencoders. We fitted Gaussian mixture models with different component counts (E-M algorithm, 100 initializations) to the training data zt and ze independently, for each cross-validation set. Labels for zt and ze of the validation data were assigned based on their respective fitted mixture models. Fig. 3C shows the adjusted mutual information (mean \u00b1 SE, n = 50 cross-validation sets) as a measure of consistency of the labels obtained by such independent, unsupervised clustering of the representations. As \u03bb increases, the clusters become more consistent across modalities. The 3D CCA-based representations do not show distinct clusters, and consequently the consistency of labels obtained by unsupervised clustering is low overall.\nAnalysis of reconstruction error as a function of \u03bb: The representations obtained by coupled autoencoders enable prediction of gene expression profiles from electrophysiological features and vice versa. Examples of such cross-modal data predictions (Fig. 4A-B) based on very low dimensional (d = 3) representations already capture salient features of the data. To quantify the effect of imposing a penalty on representation mismatches on the cross-modal data prediction task, we compared Crecon for data reconstructions based on coupled representations (\u03bb > 0) to that obtained by setting \u03bb = 0. Fig. 4C demonstrates that for the Patch-seq dataset, increasing \u03bb leads to worse reconstruction accuracy, as expected.
While the difference is small for predicting transcriptomic data, it is larger for electrophysiological feature prediction as a consequence of using \u03b1 < 1 (Section 2.4).\nCell type discovery: For partially paired datasets (Fig. 2A), an important problem is whether cell types not observed in some of the modalities can be uncovered by the alignment method. To test this, we split the FACS dataset into two subsets (A and B), where samples of four cell types were allowed to be in only one of the two subsets. From among the cell types shared across A and B, we considered 1/3 of the cells \u2018paired\u2019 based on (i) their cell type label, (ii) similarity of peptidergic gene expression [29], and (iii) distance in a representation obtained for the complete FACS dataset by a single autoencoder (see supp. methods for details). Fig. 5A shows the representations zA and zB obtained by the coupled autoencoder for the two subsets. Our results demonstrate that (i) types unique to subset A appear in zA in positions that are not occupied by other cell types in zB and vice versa, whereas (ii) a type present in both subsets for which no cells were marked as paired occupies similar positions in zA and zB. To quantify this observation, we calculated the nearest neighbor distance in zB for the types unique to subset A by using their positions from zA (and vice versa) (Fig. 5B). This simple quantification already shows that samples of types unique to subset A can easily be distinguished from other types in subset B.
This proof-of-principle experiment suggests that coupling representations in this manner can serve as a framework to discover shared and distinct cell types from aligned datasets, for data obtained from different modalities, brain regions, or species.\n\nFigure 5: Coupled autoencoders can facilitate discovery of cell types unique to a single modality. (A) 2D representations of two subsets created from the FACS dataset, with a sparse (\u223c1/3) fraction of samples marked as paired. Colors: cell type annotations of [22]. Arrows: selected types exclusively placed in only one of the two subsets, or present in both subsets but with no samples considered as paired. The representations are qualitatively similar, with types unique to each subset appearing in distinct, non-overlapping locations. The type shared across the subsets but not considered as paired appears in similar positions. (B) Nearest-neighbor distance distributions for test cells (\u2018paired\u2019 types are in the outlined distribution) in the 2D representation space support these observations (p < 0.01 for top four rows, p = 0.89 for bottom row, 2-sample K-S test).\n\n5 Discussion\n\nWe presented a method to identify the type of a cell based on observations from a single modality such that the identity would be consistent if the assignment was based on a different modality. While our method is applicable to cross-modal learning in general, our motivation stems from recent experimental developments in high-throughput, multi-modal profiling of neurons [30, 23]. In this study, we have demonstrated a surprising level of cross-modal predictive ability across transcriptomic and electrophysiological recordings.
Specifically, we showed that the transcriptomic class can be predicted with \u223c80% accuracy from electrophysiological recordings when the transcriptomic hierarchy is resolved into 15 classes, and with \u223c70% accuracy when it is resolved into 25 classes (\u03bb = 10 results). As datasets grow, we expect the performance to improve even in the absence of further technical development, since many cell types in our dataset have a small number of samples.\nWhile we focused on the correspondence problem between transcriptomics and electrophysiology (k = 2), we presented the technical development of k-coupled autoencoders in full generality. Therefore, our method is applicable to the joint alignment of additional modalities.\nThe utility of autoencoders to obtain low dimensional representations of transcriptomic data, as well as the biological interpretation of such representations, has been explored in recent works [17]. Here, we demonstrated the utility of the coupled autoencoder approach in obtaining such correspondence between modalities. We studied the potential pitfalls of coupling functions, and proposed a novel and practical function based on calculating the smallest singular value of the batch matrix.\nWe derived the distributions that establish an equivalence between our original deterministic approach and a discriminative probabilistic model. We also studied different generalizations of our objective function using this relationship. Finally, we proposed fitting a Gaussian mixture model to the latent representation after training, which provides an efficient generative model. Methodological improvements addressing potentially unshared variabilities across modalities, and joint, efficient learning of a generative model, are promising avenues for future research.\nFinally, we explored the ability of our method to identify cell types that are sampled only by a subset of characterization modalities.
Such problems are frequently encountered due to sampling biases of the different experimental modalities and protocols used to characterize cells. We demonstrated that our method can (i) disambiguate types that may not be observed in all modalities, and (ii) obtain a coherent, well-constrained embedding in the absence of pairing information for types that are sampled by multiple modalities (Fig. 5).\nCode and Data: Code repository: https://github.com/AllenInstitute/coupledAE. The MNIST and FACS datasets are publicly available; the Patch-seq dataset will be released by collaborators at a later date.\n\nAcknowledgements\n\nWe wish to thank the Allen Institute for Brain Science founder, Paul G. Allen, for his vision, encouragement and support.\n\nReferences\n[1] H Sebastian Seung and Uygar S\u00fcmb\u00fcl. Neuronal cell types and connectivity: lessons from the retina. Neuron, 83(6):1262\u20131272, 2014.\n\n[2] Henry Markram, Eilif Muller, Srikanth Ramaswamy, Michael W Reimann, Marwan Abdellah, Carlos Aguado Sanchez, Anastasia Ailamaki, Lidia Alonso-Nanclares, Nicolas Antille, Selim Arsever, et al. Reconstruction and simulation of neocortical microcircuitry. Cell, 163(2):456\u2013492, 2015.\n\n[3] Hongkui Zeng and Joshua R Sanes. Neuronal cell-type classification: challenges, opportunities and the path forward. Nature Reviews Neuroscience, 18(9):530, 2017.\n\n[4] Amit Zeisel, Hannah Hochgerner, Peter L\u00f6nnerberg, Anna Johnsson, Fatima Memic, Job Van Der Zwan, Martin H\u00e4ring, Emelie Braun, Lars E Borm, Gioele La Manno, et al. Molecular architecture of the mouse nervous system.
Cell, 174(4):999\u20131014, 2018.\n\n[5] Nathan W Gouwens, Staci A Sorensen, Jim Berg, Changkyu Lee, Tim Jarsky, Jonathan Ting, Susan M\nSunkin, David Feng, Costas Anastassiou, Eliza Barkan, et al. Classi\ufb01cation of electrophysiological and\nmorphological types in mouse visual cortex. bioRxiv, page 368456, 2018.\n\n[6] Tadas Baltru\u0161aitis, Chaitanya Ahuja, and Louis-Philippe Morency. Multimodal machine learning: A survey\n\nand taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.\n\n[7] Harold Hotelling. Relations between two sets of variates. In Breakthroughs in statistics, pages 162\u2013190.\n\nSpringer, 1992.\n\n[8] Weiran Wang, Raman Arora, Karen Livescu, and Jeff Bilmes. On deep multi-view representation learning.\n\nIn International Conference on Machine Learning, pages 1083\u20131092, 2015.\n\n[9] Fangxiang Feng, Xiaojie Wang, and Ruifan Li. Cross-modal retrieval with correspondence autoencoder. In\n\nProceedings of the 22nd ACM international conference on Multimedia, pages 7\u201316. ACM, 2014.\n\n[10] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks.\n\nscience, 313(5786):504\u2013507, 2006.\n\n[11] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing\n\ninternal covariate shift. arXiv preprint arXiv:1502.03167, 2015.\n\n[12] Emma Pierson and Christopher Yau. Zifa: Dimensionality reduction for zero-in\ufb02ated single-cell gene\n\nexpression analysis. Genome biology, 16(1):241, 2015.\n\n[13] Christopher Yau et al. pcareduce: hierarchical clustering of single cell transcriptional pro\ufb01les. BMC\n\nbioinformatics, 17(1):140, 2016.\n\n[14] Sandhya Prabhakaran, Elham Azizi, Ambrose Carr, and Dana Pe\u2019er. Dirichlet process mixture model for\ncorrecting technical variation in single-cell gene expression data. 
In International Conference on Machine Learning, pages 1070\u20131079, 2016.\n\n[15] Davide Risso, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, and Jean-Philippe Vert. ZINB-WaVE: A general and flexible method for signal extraction from single-cell RNA-seq data. bioRxiv, page 125112, 2017.\n\n[16] Christopher Heje Gr\u00f8nbech, Maximillian Fornitz Vording, Pascal N Timshel, Casper Kaae S\u00f8nderby, Tune Hannes Pers, and Ole Winther. scVAE: Variational auto-encoders for single-cell gene expression data. bioRxiv, page 318295, 2018.\n\n[17] Romain Lopez, Jeffrey Regier, Michael B Cole, Michael Jordan, and Nir Yosef. Bayesian inference for a generative model of transcriptome profiles from single-cell RNA sequencing. bioRxiv, page 292037, 2018.\n\n[18] Nat Dilokthanakul, Pedro AM Mediano, Marta Garnelo, Matthew CH Lee, Hugh Salimbeni, Kai Arulkumaran, and Murray Shanahan. Deep unsupervised clustering with Gaussian mixture variational autoencoders. arXiv preprint arXiv:1611.02648, 2016.\n\n[19] James Baglama, Daniela Calvetti, and Lothar Reichel. IRBL: An implicitly restarted block-Lanczos method for large-scale Hermitian eigenproblems. SIAM Journal on Scientific Computing, 24(5):1650\u20131677, 2003.\n\n[20] Davide Risso, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, and Jean-Philippe Vert. A general and flexible method for signal extraction from single-cell RNA-seq data. Nature Communications, 9(1):284, 2018.\n\n[21] Yann LeCun, L\u00e9on Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278\u20132324, 1998.\n\n[22] Bosiljka Tasic, Zizhen Yao, Lucas T Graybuck, Kimberly A Smith, Thuc Nghi Nguyen, Darren Bertagnolli, Jeff Goldy, Emma Garren, Michael N Economo, Sarada Viswanathan, et al. Shared and distinct transcriptomic cell types across neocortical areas.
Nature, 563(7729):72, 2018.\n\n[23] Cathryn R Cadwell, Athanasia Palasantza, Xiaolong Jiang, Philipp Berens, Qiaolin Deng, Marlene Yilmaz,\nJacob Reimer, Shan Shen, Matthias Bethge, Kimberley F Tolias, et al. Electrophysiological, transcriptomic\nand morphologic pro\ufb01ling of single neurons using patch-seq. Nature biotechnology, 34(2):199, 2016.\n\n[24] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint\n\narXiv:1412.6980, 2014.\n\n[25] Elham Azizi, Sandhya Prabhakaran, Ambrose Carr, and Dana Pe\u2019er. Bayesian inference for single-cell\n\nclustering and imputing. Genomics and Computational Biology, 3(1):e46\u2013e46, 2017.\n\n[26] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout:\nA simple way to prevent neural networks from over\ufb01tting. The Journal of Machine Learning Research,\n15(1):1929\u20131958, 2014.\n\n[27] Dazhi Zhao, Guozhu Yu, Peng Xu, and Maokang Luo. Equivalence between dropout and data augmentation:\n\nA mathematical check. Neural Networks, 2019.\n\n[28] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer,\nR. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay.\nScikit-learn: Machine Learning in Python . Journal of Machine Learning Research, 12:2825\u20132830, 2011.\n\n[29] Stephen J Smith, Uygar S\u00fcmb\u00fcl, Lucas Graybuck, Forrest Collman, Sharmishtaa Seshamani, Rohan Gala,\nOlga Gliko, Leila Elabbady, Jeremy A Miller, Trygve Bakken, et al. Single-cell transcriptomic evidence\nfor dense intracortical neuropeptide networks. bioRxiv, page 519694, 2019.\n\n[30] Kok Hao Chen, Alistair N Boettiger, Jeffrey R Mof\ufb01tt, Siyuan Wang, and Xiaowei Zhuang. Spatially\n\nresolved, highly multiplexed rna pro\ufb01ling in single cells. 
Science, 348(6233):aaa6090, 2015.\n", "award": [], "sourceid": 4966, "authors": [{"given_name": "Rohan", "family_name": "Gala", "institution": "Allen Institute"}, {"given_name": "Nathan", "family_name": "Gouwens", "institution": "Allen Institute"}, {"given_name": "Zizhen", "family_name": "Yao", "institution": "Allen Institute"}, {"given_name": "Agata", "family_name": "Budzillo", "institution": "Allen Institute"}, {"given_name": "Osnat", "family_name": "Penn", "institution": "Allen Institute"}, {"given_name": "Bosiljka", "family_name": "Tasic", "institution": "Allen Institute"}, {"given_name": "Gabe", "family_name": "Murphy", "institution": "Allen Institute"}, {"given_name": "Hongkui", "family_name": "Zeng", "institution": "Allen Institute"}, {"given_name": "Uygar", "family_name": "S\u00fcmb\u00fcl", "institution": "Allen Institute"}]}