{"title": "Manifold-regression to predict from MEG/EEG brain signals without source modeling", "book": "Advances in Neural Information Processing Systems", "page_first": 7323, "page_last": 7334, "abstract": "Magnetoencephalography and electroencephalography (M/EEG) can reveal neuronal dynamics non-invasively in real-time and are therefore appreciated methods in medicine and neuroscience. Recent advances in modeling brain-behavior relationships have highlighted the effectiveness of Riemannian geometry for summarizing the spatially correlated time-series from M/EEG in terms of their covariance. However, after artefact-suppression, M/EEG data is often rank deficient which limits the application of Riemannian concepts. In this article, we focus on the task of regression with rank-reduced covariance matrices. We study two Riemannian approaches that vectorize the M/EEG covariance between sensors through projection into a tangent space. The Wasserstein distance readily applies to rank-reduced data but lacks affine-invariance. This can be overcome by finding a common subspace in which the covariance matrices are full rank, enabling the affine-invariant geometric distance. We investigated the implications of these two approaches in synthetic generative models, which allowed us to control estimation bias of a linear model for prediction. We show that Wasserstein and geometric distances allow perfect out-of-sample prediction on the generative models. We then evaluated the methods on real data with regard to their effectiveness in predicting age from M/EEG covariance matrices. The findings suggest that the data-driven Riemannian methods outperform different sensor-space estimators and that they get close to the performance of biophysics-driven source-localization model that requires MRI acquisitions and tedious data processing. 
Our study suggests that the proposed Riemannian methods can serve as fundamental building blocks for automated large-scale analysis of M/EEG.", "full_text": "Manifold-regression to predict from MEG/EEG brain signals without source modeling

David Sabbagh ∗†‡, Pierre Ablin, Gaël Varoquaux, Alexandre Gramfort, Denis A. Engemann §

Université Paris-Saclay, Inria, CEA, Palaiseau, 91120, France

Abstract

Magnetoencephalography and electroencephalography (M/EEG) can reveal neuronal dynamics non-invasively in real time and are therefore valued methods in medicine and neuroscience. Recent advances in modeling brain-behavior relationships have highlighted the effectiveness of Riemannian geometry for summarizing the spatially correlated time series from M/EEG in terms of their covariance. However, after artefact suppression, M/EEG data are often rank deficient, which limits the application of Riemannian concepts. In this article, we focus on the task of regression with rank-reduced covariance matrices. We study two Riemannian approaches that vectorize the M/EEG covariance between sensors through projection into a tangent space. The Wasserstein distance readily applies to rank-reduced data but lacks affine invariance. This can be overcome by finding a common subspace in which the covariance matrices are full rank, enabling the affine-invariant geometric distance. We investigated the implications of these two approaches in synthetic generative models, which allowed us to control the estimation bias of a linear prediction model. We show that the Wasserstein and geometric distances allow perfect out-of-sample prediction on the generative models. We then evaluated the methods on real data with regard to their effectiveness in predicting age from M/EEG covariance matrices. 
The findings suggest that the data-driven Riemannian methods outperform different sensor-space estimators and come close to the performance of a biophysics-driven source-localization model that requires MRI acquisitions and tedious data processing. Our study suggests that the proposed Riemannian methods can serve as fundamental building blocks for automated large-scale analysis of M/EEG.

1 Introduction

Magnetoencephalography and electroencephalography (M/EEG) measure brain activity with millisecond precision from outside the head [23]. Both methods are non-invasive and expose rhythmic signals induced by coordinated neuronal firing, with characteristic periodicity between minutes and milliseconds [10]. These so-called brain rhythms can reveal cognitive processes as well as health status, and are quantified in terms of the spatial distribution of the power spectrum over the sensor array that samples the electromagnetic fields around the head [3].

Statistical learning from M/EEG commonly relies on covariance matrices estimated from band-pass filtered signals to capture the characteristic scale of the neuronal events of interest [7, 22, 16]. However, covariance matrices do not live in a Euclidean space but on a Riemannian manifold.

∗Additional affiliation: Inserm, UMRS-942, Paris Diderot University, Paris, France
†Additional affiliation: Department of Anaesthesiology and Critical Care, Lariboisière Hospital, Assistance Publique Hôpitaux de Paris, Paris, France
‡dav.sabbagh@gmail.com
§denis-alexander.engemann@inria.fr

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Fortunately, Riemannian geometry offers a principled mathematical approach for using standard linear learning algorithms, such as logistic or ridge regression, that assume a Euclidean geometry. 
This is achieved by projecting the covariance matrices into a vector space equipped with a Euclidean metric: the tangent space. The projection is defined by the Riemannian metric, for example the affine-invariant geometric metric [5] or the Wasserstein metric [6]. As a result, the prediction error can be substantially reduced when learning from covariance matrices with Riemannian methods [45, 14].

In practice, M/EEG data is often provided in a rank-deficient form by platform operators but also by curators of public datasets [32, 2]. Its contamination with high-amplitude environmental electromagnetic artefacts often renders aggressive offline processing mandatory to yield intelligible signals. Commonly used tools for artefact suppression project the signal linearly into a lower-dimensional subspace that is hoped to predominantly contain brain signals [40, 42, 34]. But this necessarily leads to inherently rank-deficient covariance matrices for which no affine-invariant distance is defined. One remedy consists in using anatomically informed source-localization techniques, which can typically deal with rank deficiencies [17] and can be combined with source-level estimators of neuronal interactions [31]. However, such approaches require domain-specific expert knowledge, imply processing steps that are hard to automate (e.g. anatomical coregistration), and yield pipelines in which excessive amounts of preprocessing are not under the control of the predictive model.

In this work, we focus on regression with rank-reduced covariance matrices. We propose two Riemannian methods for this problem. A first approach uses a Wasserstein metric that can handle rank-reduced matrices, yet is not affine-invariant. In a second approach, matrices are projected into a common subspace in which affine invariance can be provided. 
We show that both metrics can achieve perfect out-of-sample predictions in a synthetic generative model. Based on the SPoC method [15], we then present a supervised and computationally efficient approach to learn subspace projections informed by the target variable. Finally, we apply these models to the problem of inferring age from brain data [33, 31] on 595 MEG recordings from the Cambridge Center of Aging (Cam-CAN, http://cam-can.org), covering an age range from 18 to 88 years [41]. We compare the data-driven Riemannian approaches to simpler methods that extract power estimates from the diagonal of the sensor-level covariance, as well as to the cortically constrained minimum norm estimates (MNE), which we use to project the covariance into a subspace defined by anatomical prior knowledge.

Notations We denote scalars s ∈ R with regular lowercase font, vectors s = [s_1, …, s_N] ∈ R^N with bold lowercase font, and matrices S ∈ R^{N×M} with bold uppercase font. I_N is the identity matrix of size N. [·]⊤ represents vector or matrix transposition. The Frobenius norm of a matrix is denoted by ‖M‖²_F = Tr(MM⊤) = Σ_{i,j} |M_ij|², with Tr(·) the trace operator, and rank(M) is the rank of a matrix. The ℓ2 norm of a vector x is denoted by ‖x‖²_2 = Σ_i x_i². We denote by M_P the space of P×P square real-valued matrices, S_P = {M ∈ M_P, M⊤ = M} the subspace of symmetric matrices, S^{++}_P = {S ∈ S_P, x⊤Sx > 0, ∀x ∈ R^P} the subspace of P×P symmetric positive definite matrices, S^{+}_P = {S ∈ S_P, x⊤Sx ≥ 0, ∀x ∈ R^P} the subspace of P×P symmetric positive semi-definite (SPD) matrices, and S^{+}_{P,R} = {S ∈ S^{+}_P, rank(S) = R} the subspace of SPD matrices of fixed rank R. All matrices S ∈ S^{++}_P are full rank, invertible (with S⁻¹ ∈ S^{++}_P) and diagonalizable with real strictly positive eigenvalues: S = UΛU⊤, with U an orthogonal matrix of eigenvectors of S (UU⊤ = I_P) and Λ = diag(λ_1, …, λ_P) the diagonal matrix of its eigenvalues λ_1 ≥ … ≥ λ_P > 0. For a matrix M, diag(M) ∈ R^P is its diagonal. We also define the exponential and logarithm of a matrix: ∀S ∈ S^{++}_P, log(S) = U diag(log λ_1, …, log λ_P) U⊤ ∈ S_P, and ∀M ∈ S_P, exp(M) = U diag(exp λ_1, …, exp λ_P) U⊤ ∈ S^{++}_P. N(µ, σ²) denotes the normal (Gaussian) distribution of mean µ and variance σ². Finally, E_s[x] represents the expectation and Var_s[x] the variance of a random variable x w.r.t. the subscript s when needed.

Background and M/EEG generative model MEG or EEG data measured on P channels are multivariate signals x(t) ∈ R^P. For each subject i = 1 … N, the data form a matrix X_i ∈ R^{P×T}, where T is the number of time samples. For the sake of simplicity, we assume that T is the same for each subject, although this is not required by the following method. The linear instantaneous mixing model is a valid generative model for M/EEG data due to the linearity of Maxwell's equations [23]. Assuming the signal originates from Q < P locations in the brain, at any time t the measured signal vector of subject i = 1 … N is a linear combination of the Q source patterns a^s_j ∈ R^P, j = 1 … Q:

    x_i(t) = A_s s_i(t) + n_i(t) ,    (1)

where the patterns form the time- and subject-independent source mixing matrix A_s = [a^s_1, …, a^s_Q] ∈ R^{P×Q}, s_i(t) ∈ R^Q is the source vector formed by the Q time-dependent source amplitudes, and n_i(t) ∈ R^P is a contamination due to noise. Note that the mixing matrix A_s and the sources s_i are not known.

Following numerous learning models on M/EEG [7, 15, 22], we consider a regression setting where the target y_i is a function of the power of the sources, denoted p_{i,j} = E_t[s²_{i,j}(t)]. Here we consider the linear model:

    y_i = Σ_{j=1}^{Q} α_j f(p_{i,j}) ,    (2)

where α ∈ R^Q and f : R^+ → R is increasing. Possible choices of f that are relevant for neuroscience are f(x) = x, or f(x) = log(x) to account for log-linear relationships between brain signal power and cognition [7, 22, 11]. A first approach consists in estimating the sources before fitting such a linear model, for example using the Minimum Norm Estimator (MNE) approach [24]. This boils down to solving the so-called M/EEG inverse problem, which requires costly MRI acquisitions and tedious processing [3]. A second approach is to work directly with the signals X_i. To do so, models that enjoy some invariance property are desirable: these models are blind to the mixing A_s, and working with the signals x is similar to working directly with the sources s. Riemannian geometry is a natural setting where such invariance properties are found [18]. Besides, under Gaussian assumptions, model (1) is fully described by second-order statistics [37]. This amounts to working with covariance matrices, C_i = X_i X_i⊤ / T, for which Riemannian geometry is well developed. One specificity of M/EEG data is, however, that the signals used for learning have been rank-reduced. 
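As a concrete illustration, the mixing model and the empirical covariance estimate above can be sketched in a few lines of NumPy. The mixing matrix, source powers, and noise scale below are arbitrary toy values for illustration, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q, T = 5, 2, 10_000              # sensors, sources, time samples (toy sizes)

A_s = rng.standard_normal((P, Q))   # stand-in for the unknown source mixing matrix

def simulate_subject(powers, noise_scale=0.1):
    """Simulate x_i(t) = A_s s_i(t) + n_i(t) and return the P x T sensor signals."""
    s = rng.standard_normal((Q, T)) * np.sqrt(powers)[:, None]  # sources with given variances
    n = noise_scale * rng.standard_normal((P, T))               # additive sensor noise
    return A_s @ s + n

X = simulate_subject(np.array([1.0, 4.0]))
C = X @ X.T / T                     # empirical covariance C_i = X_i X_i^T / T
```

As T grows, C converges to A_s diag(p_i) A_s⊤ plus the noise covariance, which is the structure the propositions of Sec. 3 exploit.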
This leads to rank-deficient covariance matrices, C_i ∈ S^{+}_{P,R}, for which specific matrix manifolds need to be considered.

2 Theoretical background to model invariances on the S^{+}_{P,R} manifold

2.1 Riemannian matrix manifolds

Endowing a continuous set M of square matrices with a metric that defines a local Euclidean structure gives a Riemannian manifold with a solid theoretical framework. Let M be a K-dimensional Riemannian manifold and M ∈ M. For any matrix M′ ∈ M, as M′ → M, ξ_M = M′ − M belongs to a vector space T_M of dimension K called the tangent space at M.

The Riemannian metric defines an inner product ⟨·,·⟩_M : T_M × T_M → R for each tangent space T_M, and as a consequence a norm in the tangent space ‖ξ‖_M = √⟨ξ, ξ⟩_M. Integrating this metric between two points gives a geodesic distance d : M × M → R^+. It allows one to define means on the manifold:

    Mean_d(M_1, …, M_N) = argmin_{M ∈ M} Σ_{i=1}^{N} d(M_i, M)² .    (3)

The manifold exponential at M ∈ M, denoted Exp_M, is a smooth mapping from T_M to M that preserves local properties. In particular, d(Exp_M(ξ_M), M) = ‖ξ_M‖_M for ξ_M small enough. Its inverse is the manifold logarithm Log_M from M to T_M, with ‖Log_M(M′)‖_M = d(M, M′) for M, M′ ∈ M. Finally, since T_M is Euclidean, there is a linear invertible mapping φ_M : T_M → R^K such that for all ξ_M ∈ T_M, ‖ξ_M‖_M = ‖φ_M(ξ_M)‖_2. This allows one to define the vectorization operator at M ∈ M, P_M : M → R^K, defined by P_M(M′) = φ_M(Log_M(M′)). Fig. 1 illustrates these concepts.

Figure 1: Tangent space, exponential and logarithm on a Riemannian manifold (illustration).

The vectorization explicitly captures the local Euclidean properties of the Riemannian manifold:

    d(M, M′) = ‖P_M(M′)‖_2    (4)

Hence, if a set of matrices M_1, …, M_N is located in a small portion of the manifold, denoting M̄ = Mean_d(M_1, …, M_N), it holds:

    d(M_i, M_j) ≃ ‖P_M̄(M_i) − P_M̄(M_j)‖_2    (5)

For additional details on matrix manifolds, see [1], chap. 3.

Regression on matrix manifolds The vectorization operator is key for machine learning applications: it projects points in M onto R^K, and the distance d on M is approximated by the ℓ2 distance on R^K. Therefore, those vectors can be used as input for any standard regression technique, which often assumes a Euclidean structure of the data. More specifically, throughout the article we consider the following regression pipeline. Given a training set of samples M_1, …, M_N ∈ M and continuous target variables y_1, …, y_N ∈ R, we first compute the mean of the samples M̄ = Mean_d(M_1, …, M_N). This mean is taken as the reference point for the vectorization. After computing v_1, …, v_N ∈ R^K as v_i = P_M̄(M_i), a linear regression technique (e.g. ridge regression) with parameters β ∈ R^K can be employed, assuming y_i ≃ v_i⊤β.

2.2 Distances and invariances on positive matrix manifolds

We now introduce two important distances: the geometric distance on the manifold S^{++}_P (also known as the affine-invariant distance), and the Wasserstein distance on the manifold S^{+}_{P,R}.

The geometric distance Seeking properties of covariance matrices that are invariant under linear transformations of the signal leads to endowing the positive definite manifold S^{++}_P with the geometric distance [18]:

    d_G(S, S′) = ‖log(S⁻¹S′)‖_F = [Σ_{k=1}^{P} log² λ_k]^{1/2}    (6)

where λ_k, k = 1 … P, are the (real) eigenvalues of S⁻¹S′. The affine-invariance property writes:

    For W invertible, d_G(W⊤SW, W⊤S′W) = d_G(S, S′) .    (7)

This distance gives a Riemannian-manifold structure to S^{++}_P with the inner product ⟨P, Q⟩_S = Tr(P S⁻¹ Q S⁻¹) [18]. The corresponding manifold logarithm at S is Log_S(S′) = S^{1/2} log(S^{-1/2} S′ S^{-1/2}) S^{1/2}, and the vectorization operator of S′ w.r.t. S is P_S(S′) = Upper(S^{-1/2} Log_S(S′) S^{-1/2}) = Upper(log(S^{-1/2} S′ S^{-1/2})), where Upper(M) ∈ R^K is the vectorized upper-triangular part of M, with unit weights on the diagonal and √2 weights on the off-diagonal, and K = P(P+1)/2.

The Wasserstein distance Unlike for S^{++}_P, it is hard to endow the S^{+}_{P,R} manifold with a distance that yields tractable or cheap-to-compute logarithms [43]. This manifold is classically viewed as S^{+}_{P,R} = {YY⊤ | Y ∈ R^{P×R}_*}, where R^{P×R}_* is the set of P×R matrices of rank R [30]. This view allows one to write S^{+}_{P,R} as a quotient manifold R^{P×R}_* / O_R, where O_R is the orthogonal group of size R. This means that each matrix YY⊤ ∈ S^{+}_{P,R} is identified with the set {YQ | Q ∈ O_R}. It has recently been proposed [35] to use the standard Frobenius metric on the total space R^{P×R}_*. This metric in the total space is equivalent to the Wasserstein distance [6] on S^{+}_{P,R}:

    d_W(S, S′) = [Tr(S) + Tr(S′) − 2 Tr((S^{1/2} S′ S^{1/2})^{1/2})]^{1/2}    (8)

This provides cheap-to-compute logarithms:

    Log_{YY⊤}(Y′Y′⊤) = Y′Q* − Y ∈ R^{P×R}_* ,    (9)

where UΣV⊤ = Y⊤Y′ is a singular value decomposition and Q* = VU⊤. 
The vectorization operator is then given by P_{YY⊤}(Y′Y′⊤) = vect(Y′Q* − Y) ∈ R^{PR}, where the vect of a matrix is the vector containing all its coefficients.

This framework offers closed-form projections in the tangent space for the Wasserstein distance, which can be used to perform regression. Importantly, since S^{++}_P = S^{+}_{P,P}, we can also use this distance on positive definite matrices. This distance possesses the orthogonal-invariance property:

    For W orthogonal, d_W(W⊤SW, W⊤S′W) = d_W(S, S′) .    (10)

This property is weaker than the affine invariance of the geometric distance (7). A natural question is whether such an affine-invariant distance also exists on this manifold. Unfortunately, it is shown in [8] that the answer is negative for R < P (proof in appendix 6.3).

3 Manifold-regression models for M/EEG

3.1 Generative model and consistency of linear regression in the tangent space of S^{++}_P

Here, we consider a more specific generative model than (1) by assuming a specific structure on the noise. We assume that the additive noise is n_i(t) = A_n ν_i(t), with A_n = [a^n_1, …, a^n_{P−Q}] ∈ R^{P×(P−Q)} and ν_i(t) ∈ R^{P−Q}. This amounts to assuming that the noise is of rank P − Q and that it spans the same subspace for all subjects. Denoting A = [a^s_1, …, a^s_Q, a^n_1, …, a^n_{P−Q}] ∈ R^{P×P} and η_i(t) = [s_{i,1}(t), …, s_{i,Q}(t), ν_{i,1}(t), …, ν_{i,P−Q}(t)] ∈ R^P, this generative model can be compactly rewritten as x_i(t) = A η_i(t).

We assume that the sources s_i are decorrelated and independent from ν_i: with p_{i,j} = E_t[s²_{i,j}(t)] the powers, i.e. the variance over time, of the j-th source of subject i, we suppose E_t[s_i(t) s_i(t)⊤] = diag((p_{i,j})_{j=1…Q}) and E_t[s_i(t) ν_i(t)⊤] = 0. 
The covariances are then given by:

    C_i = A E_i A⊤ ,    (11)

where E_i = E_t[η_i(t) η_i(t)⊤] is a block-diagonal matrix whose upper Q×Q block is diag(p_{i,1}, …, p_{i,Q}).

In the following, we show that different functions f from (2) yield a linear relationship between the y_i's and the vectorization of the C_i's for different Riemannian metrics.

Proposition 1 (Euclidean vectorization). Assume f(p_{i,j}) = p_{i,j}. Then, the relationship between y_i and Upper(C_i) is linear.

Proof. Indeed, if f(p) = p, the relationship between y_i and the p_{i,j} is linear. Rewriting Eq. (11) as E_i = A⁻¹ C_i A⁻⊤, and since the p_{i,j} are on the diagonal of the upper block of E_i, the relationship between the p_{i,j} and the coefficients of C_i is also linear. This means that there is a linear relationship between the coefficients of C_i and the variable of interest y_i. In other words, y_i is a linear combination of the vectorization of C_i w.r.t. the standard Euclidean distance.

Proposition 2 (Geometric vectorization). Assume f(p_{i,j}) = log(p_{i,j}). Denote by C̄ = Mean_G(C_1, …, C_N) the geometric mean of the dataset, and by v_i = P_C̄(C_i) the vectorization of C_i w.r.t. the geometric distance. Then, the relationship between y_i and v_i is linear.

The proof is given in appendix 6.1. It relies crucially on the affine-invariance property, which means that using Riemannian embeddings of the C_i's is equivalent to working directly with the E_i's.

Proposition 3 (Wasserstein vectorization). Assume f(p_{i,j}) = √p_{i,j} and that A is orthogonal. Denote by C̄ = Mean_W(C_1, …, C_N) the Wasserstein mean of the dataset, and by v_i = P_C̄(C_i) the vectorization of C_i w.r.t. the Wasserstein distance. Then, the relationship between y_i and v_i is linear.

The proof is given in appendix 6.2. The restriction to the case where A is orthogonal stems from the orthogonal invariance of the Wasserstein distance. 
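Proposition 2 can be checked numerically. The sketch below builds covariances C_i = A E_i A⊤ with diagonal E_i, vectorizes them in the tangent space at the geometric mean (available in closed form here: the E_i commute, so by affine invariance the mean is A times the element-wise geometric mean of the E_i), and verifies that ordinary least squares fits y exactly. All sizes and coefficients are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(42)
P, Q, N = 4, 2, 50
A = rng.standard_normal((P, P)) + 3 * np.eye(P)   # well-conditioned invertible mixing

p = rng.uniform(0.5, 5.0, size=(N, P))            # diagonal entries of E_i (powers)
alpha = np.array([1.0, -2.0])                     # only the first Q sources drive y
y = np.log(p[:, :Q]) @ alpha                      # y_i = sum_j alpha_j log p_ij  (f = log)

C = np.stack([A @ np.diag(p[i]) @ A.T for i in range(N)])

def logm_spd(S):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

# Geometric mean in closed form: Mean_G(A E_i A^T) = A Mean_G(E_i) A^T by affine
# invariance, and commuting diagonal matrices average to the element-wise geometric mean.
C_bar = A @ np.diag(np.exp(np.log(p).mean(axis=0))) @ A.T
w, V = np.linalg.eigh(C_bar)
C_isqrt = (V * w ** -0.5) @ V.T                   # C_bar^{-1/2}

# Tangent-space vectorization v_i = Upper(log(C_bar^{-1/2} C_i C_bar^{-1/2})); the
# sqrt(2) off-diagonal weighting is dropped here, which does not affect linearity.
iu = np.triu_indices(P)
vecs = np.stack([logm_spd(C_isqrt @ C[i] @ C_isqrt)[iu] for i in range(N)])

# A plain linear model in the tangent space recovers y up to numerical error
F = np.c_[vecs, np.ones(N)]
beta, *_ = np.linalg.lstsq(F, y, rcond=None)
residual = y - F @ beta
```

The maximum absolute residual is at machine-precision level, illustrating the perfect out-of-sample prediction reported in the simulations.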
In the neuroscience literature, square-root rectifications are however not commonly used for M/EEG modeling. Nevertheless, it is interesting to see that the Wasserstein metric, which naturally copes with rank-reduced data, is consistent with this particular generative model.

These propositions show that the relationship between the samples and the variable y is linear in the tangent space, motivating the use of linear regression methods (see the simulation study in Sec. 4). The argumentation of this section relies on the assumption that the covariance matrices are full rank. However, this is rarely the case in practice.

Figure 2: Proposed regression pipeline. The considered choices for each sequential step are detailed below each box. Identity means no spatial filtering (W = I). Only the most relevant combinations are reported. For example, the Wasserstein vectorization does not need a projection, as it directly applies to rank-deficient matrices. The geometric vectorization is not influenced by the choice of projection due to its affine invariance. Choices for the vectorization are depicted by the colors used for visualizing subsequent analyses.

3.2 Learning projections on S^{++}_R

In order to use the geometric distance on the C_i ∈ S^{+}_{P,R}, we have to project them on S^{++}_R to make them full rank. In the following, we consider a linear operator W ∈ R^{P×R} of rank R which is common to all samples (i.e. subjects). For consistency with the M/EEG literature, we will refer to the columns of W as spatial filters. The covariance matrices of 'spatially filtered' signals W⊤x_i are obtained as Σ_i = W⊤ C_i W ∈ R^{R×R}. With probability one, rank(Σ_i) = min(rank(W), rank(C_i)) = R, hence Σ_i ∈ S^{++}_R. 
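A minimal sketch of such a projection, assuming W is taken as the eigenvectors of the mean covariance for its R largest eigenvalues (an unsupervised choice); the shared low-rank structure of the covariances is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
P, R, N = 6, 3, 20

# Rank-deficient covariances sharing a common R-dimensional image (toy construction)
B = rng.standard_normal((P, R))
Cs = [B @ np.diag(rng.uniform(0.5, 2.0, R)) @ B.T for _ in range(N)]

# Spatial filters W: eigenvectors of the mean covariance for its top-R eigenvalues
C_mean = np.mean(Cs, axis=0)
eigvals, eigvecs = np.linalg.eigh(C_mean)     # eigenvalues in ascending order
W = eigvecs[:, -R:]                           # P x R, rank R

# Filtered covariances Sigma_i = W^T C_i W are R x R and full rank
Sigmas = [W.T @ C @ W for C in Cs]
```

After the projection, the Σ_i are positive definite, so the geometric distance and its tangent-space vectorization become applicable.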
Since the C_i's do not span the same image, applying W destroys some information. Recently, geometry-aware dimensionality reduction techniques, both supervised and unsupervised, have been developed on covariance manifolds [28, 25]. Here we considered two distinct approaches to estimate W.

Unsupervised spatial filtering A first strategy is to project the data into a subspace that captures most of its variance. This is achieved by Principal Component Analysis (PCA) applied to the averaged covariance matrix computed across subjects: W_UNSUP = U, where U contains the eigenvectors corresponding to the top R eigenvalues of the average covariance matrix C̄ = (1/N) Σ_{i=1}^{N} C_i. This step is blind to the values of y and is therefore unsupervised. Note that under the assumption that the time series across subjects are independent, the average covariance C̄ is the covariance of the data over the full population.

Supervised spatial filtering We use a supervised spatial filtering algorithm [15], originally developed for intra-subject Brain-Computer Interface applications, and adapt it to our cross-person prediction problem. The filters W are chosen to maximize the covariance between the power of the filtered signals and y. Denoting by C_y = (1/N) Σ_{i=1}^{N} y_i C_i the weighted average covariance matrix, the first filter w_SUP is given by:

    w_SUP = argmax_w (w⊤ C_y w) / (w⊤ C̄ w) .

In practice, all the other filters in W_SUP are obtained by solving a generalized eigenvalue decomposition problem (see the proof in Appendix 6.4).

The proposed pipeline is summarized in Fig. 2.

4 Experiments

4.1 Simulations

We start by illustrating Prop. 2. Independent identically distributed covariance matrices C_1, …, C_N ∈ S^{++}_P and variables y_1, …, y_N are generated following the above generative model. The matrix A is taken as exp(µB), with B ∈ R^{P×P} a random matrix and µ ∈ R a scalar controlling the distance from A to the identity (µ = 0 yields A = I_P). We use the log function for f to link the source powers (i.e. the variances) to the y_i's. The model reads y_i = Σ_j α_j log(p_ij) + ε_i, with ε_i ∼ N(0, σ²) a small additive random perturbation.

We compare three methods of vectorization: the geometric distance, the Wasserstein distance, and the non-Riemannian method "log-diag", which extracts the log of the diagonal of C_i as features. Note that the diagonal of C_i contains the powers of each sensor for subject i. A linear regression model is used following the procedure presented in Sec. 2. We take P = 5, N = 100 and Q = 2. We measure the score of each method as the average mean absolute error (MAE) obtained with 10-fold cross-validation. Fig. 3 displays the scores of each method when the parameters σ, controlling the noise level, and µ, controlling the distance from A to I_P, are changed. We also investigated the realistic scenario where each subject has a mixing matrix deviating from a reference: A_i = A + E_i, with the entries of E_i sampled i.i.d. from N(0, σ²).

The same experiment with f(p) = √p yields comparable results, yet with the Wasserstein distance performing best and achieving perfect out-of-sample prediction when σ → 0 and A is orthogonal.

Figure 3: Illustration of Prop. 2. Data are generated following the generative model with f = log. The regression pipeline consists in projecting the data into the tangent space and then using a linear model. The left plot shows the evolution of the score when random noise of variance σ² is added to the variables y_i. The MAE of the geometric-distance pipeline goes to 0 in the limit of no noise, indicating perfect out-of-sample prediction. This illustrates the linearity in the tangent space for the geometric distance (Prop. 2). The middle plot explores the effect of the parameter µ controlling the distance between A and I_P. The Riemannian geometric method is not affected by µ due to its affine-invariance property. Although the Wasserstein distance is not affine-invariant, its performance does not change much with µ. On the contrary, the log-diag method is sensitive to changes in A. The right plot shows how the score changes when the mixing matrices become sample-dependent. Only when σ = 0 do supervised + log-diag and the Riemannian methods reach perfect performance. The geometric Riemannian method is uniformly better and indifferent to the choice of projection. The Wasserstein method, despite the model mismatch, outperforms supervised + log-diag at high σ.

4.2 MEG data

Predicting biological age from MEG on the Cambridge center of ageing dataset In the following, we apply our methods to infer age from brain signals. Age is a dominant driver of cross-person variance in neuroscience data and a serious confounder [39]. As a consequence of the globally increased average lifespan, ageing has become a central topic in public health that has stimulated neuropsychiatric research at large scales. The link between age and brain function is therefore of utmost practical interest in neuroscientific research.

To predict age from brain signals, we use the currently largest publicly available MEG dataset, provided by Cam-CAN [38]. We only considered the signals from magnetometer sensors (P = 102), as it turns out that once SSS is applied (detailed in Appendix 6.6), magnetometers and gradiometers are linear combinations of approximately 70 signals (65 ≤ R_i ≤ 73) and become redundant in practice [19]. We considered task-free recordings, during which participants were asked to sit still with eyes closed in the absence of systematic stimulation. We then drew T ≃ 520,000 time samples from N = 595 subjects. To capture age-related changes in cortical brain rhythms [4, 44, 12], we filtered the data into 9 frequency bands (Hz): low [0.1–1.5], δ [1.5–4], θ [4–8], α [8–15], β_low [15–26], β_high [26–35], γ_low [35–50], γ_mid [50–74] and γ_high [76–120]. 
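As an illustration of this step, the band-pass filtering and per-band covariance computation could look as follows. The sampling rate and filter order are illustrative assumptions, not the paper's actual preprocessing settings (which used the MNE software); only the band definitions come from the text:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

SFREQ = 1000.0  # assumed sampling rate in Hz (illustrative)
BANDS = {"low": (0.1, 1.5), "delta": (1.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 15.0), "beta_low": (15.0, 26.0), "beta_high": (26.0, 35.0),
         "gamma_low": (35.0, 50.0), "gamma_mid": (50.0, 74.0), "gamma_high": (76.0, 120.0)}

def band_covariances(X, sfreq=SFREQ, order=2):
    """Band-pass filter signals X (P x T) in each band; return one covariance per band."""
    covs = {}
    for name, (lo, hi) in BANDS.items():
        # Zero-phase Butterworth filtering in second-order sections for stability
        sos = butter(order, [lo, hi], btype="bandpass", fs=sfreq, output="sos")
        Xf = sosfiltfilt(sos, X, axis=1)
        covs[name] = Xf @ Xf.T / Xf.shape[1]
    return covs

X = np.random.default_rng(0).standard_normal((4, 4096))
covs = band_covariances(X)
```

The resulting per-band covariances are then vectorized independently and their features concatenated, as described in the text.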
These frequencies are compatible with conventional definitions used in the Human Connectome Project [32]. We verified that the covariance matrices all lie in a small portion of the manifold, justifying projection into a common tangent space. We then applied the covariance pipeline independently in each frequency band and concatenated the ensuing features.

Data-driven covariance projection for age prediction Three types of approaches are compared here: Riemannian methods (Wasserstein or geometric), methods extracting the log-diagonal of the matrices (with or without supervised spatial filtering, see Sec. 3.2), and a biophysics-informed method based on the MNE source imaging technique [24]. The MNE method essentially consists in a standard Tikhonov-regularized inverse solution and is therefore linear (see Appendix 6.5 for details). Here it serves as a gold standard informed by the individual anatomy of each subject. It requires a T1-weighted 
It requires a T1-weighted MRI and a precise measurement of the head position in the MEG device coordinate system [3], and the coordinate alignment is hard to automate. We configured MNE with Q = 8196 candidate dipoles. To obtain spatial smoothing and reduce dimensionality, we averaged the MNE solution using a cortical parcellation encompassing 448 regions of interest from [31, 21]. We then used ridge regression and tuned its regularization parameter by generalized cross-validation [20] on a logarithmic grid of 100 values in [10^-5, 10^3] on each training fold of a 10-fold cross-validation loop. All numerical experiments were run using the Scikit-Learn software [36], the MNE software for processing M/EEG data [21] and the PyRiemann package [13]. We also ported to Python parts of the Matlab code of the Manopt toolbox [9] for the computations involving the Wasserstein distance. The proposed method, including all data preprocessing, applied to the 500 GB of raw MEG data from the Cam-CAN dataset, runs in approximately 12 hours on a regular desktop computer with at least 16 GB of RAM. The preprocessing for the computation of the covariances is embarrassingly parallel and can therefore be significantly accelerated by using multiple CPUs. The actual predictive modeling can be performed in less than a minute on a standard laptop. The code used for data analysis can be found on GitHub5.

Figure 4: Age prediction on the Cam-CAN MEG dataset for different methods, ordered by out-of-sample MAE (x-axis, mean absolute error in years). The y-axis depicts the projection method, with identity denoting the absence of projection. Colors indicate the subsequent embedding.
The biophysics-driven MNE method (blue) performs best. The Riemannian methods (orange) follow closely and their performance depends little on the projection method. The non-Riemannian log-diag methods (green) perform worse, although the supervised projection clearly helps.

Riemannian projections are the leading data-driven methods  Fig. 4 displays the scores for each method. The biophysically motivated MNE projection yielded the best performance (7.4 y MAE), closely followed by the purely data-driven Riemannian methods (8.1 y MAE). The chance level was 16 y MAE. Interestingly, the Riemannian methods gave similar results and outperformed the non-Riemannian methods. When Riemannian geometry was not applied, the projection strategy turned out to be decisive. Here, the supervised method performed best: it reduced the dimension of the problem while preserving the age-related variance.

Rejecting the null hypothesis that differences between models are due to chance would require several independent datasets. Instead, for statistical inference, we considered uncertainty estimates of paired differences using 100 Monte Carlo splits (10% test set size). For each method, we counted how often it performed better than the baseline model obtained with identity projection and log-diag embedding: supervised log-diag improved over the baseline on 73% of the splits, identity Wasserstein on 85%, unsupervised geometric on 96% and biophysics on 95%. This suggests that the inferences will carry over to new data.

Importantly, the supervised spatial filters and MNE both support model inspection, which is not the case for the two Riemannian methods. Fig. 5 depicts the marginal patterns [27] from the supervised filters and the source-level ridge model, respectively. The sensor-level results suggest predictive dipolar patterns in the theta to beta range, roughly compatible with generators in visual, auditory and motor cortices.
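The Monte Carlo comparison of paired differences described above can be sketched with scikit-learn. This is a toy version on synthetic data, not the study's code: `X_base` and `X_alt` are hypothetical stand-ins for two competing feature sets (e.g. the log-diag baseline and a Riemannian embedding of the same subjects), while the 100 splits, the 10% test size and the ridge grid follow the text.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-ins for two competing feature matrices of the same subjects.
X_base, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_alt = X_base + 0.01 * np.random.default_rng(0).standard_normal(X_base.shape)

alphas = np.logspace(-5, 3, 100)  # logarithmic grid of 100 values in [1e-5, 1e3]
cv = ShuffleSplit(n_splits=100, test_size=0.1, random_state=42)

wins = 0
for train, test in cv.split(X_base):
    maes = []
    for X in (X_base, X_alt):
        # RidgeCV tunes alpha by (generalized) leave-one-out cross-validation.
        model = RidgeCV(alphas=alphas).fit(X[train], y[train])
        maes.append(mean_absolute_error(y[test], model.predict(X[test])))
    wins += maes[1] < maes[0]  # alternative beats baseline on this split
print(f"alternative better on {wins}/100 splits")
```

Counting wins over many random splits gives an uncertainty estimate of the paired difference without assuming the split-wise errors are independent samples.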
Note that differences in head position can make the sources appear deeper than they are (distance between the red positive and the blue negative poles). Similarly, the MNE-based model suggests localized predictive differences between frequency bands, highlighting auditory, visual and premotor cortices. While the MNE model supports more exhaustive inspection, the supervised patterns are still physiologically informative. For example, one can notice that the pattern is more anterior in the β-band than in the α-band, potentially revealing sources in the motor cortex.

5 https://www.github.com/DavidSabbagh/NeurIPS19_manifold-regression-meeg

Figure 5: Model inspection. Upper panel: sensor-level patterns from the supervised projection. One can notice dipolar configurations varying across frequencies. Lower panel: standard deviation of the patterns over frequencies from the MNE projection, highlighting bilateral visual, auditory and premotor cortices.

5 Discussion

In this contribution, we proposed a mathematically principled approach for regression on rank-reduced covariance matrices from M/EEG data. We applied this framework to the problem of inferring age from neuroimaging data, for which we made use of the currently largest publicly available MEG dataset. To the best of our knowledge, this is the first study to apply a covariance-based approach coupled with Riemannian geometry to a regression problem in which the target is defined across persons rather than within persons (as in brain-computer interfaces). Moreover, this study reports the first benchmark of age prediction from MEG resting-state data on Cam-CAN. Our results demonstrate that the Riemannian data-driven methods do not fall far behind the gold-standard methods with biophysical priors, which depend on manual data processing.
One limitation of the Riemannian methods, however, is their limited interpretability compared to models that allow reporting brain-region- and frequency-specific effects. These results suggest a trade-off between performance and explainability. Our study suggests that the Riemannian methods have the potential to support automated large-scale analysis of M/EEG data in the absence of MRI scans. Taken together, this potentially opens new avenues for biomarker development.

Acknowledgement

This work was supported by a 2018 “médecine numérique” (digital medicine) thesis grant issued by Inserm (French national institute of health and medical research) and Inria (French national research institute for the digital sciences). It was also partly supported by the European Research Council Starting Grant SLAB ERC-YStG-676943.

References

[1] P-A Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.

[2] Anahit Babayan, Miray Erbey, Deniz Kumral, Janis D. Reinelt, Andrea M. F. Reiter, Josefin Röbbig, H. Lina Schaare, Marie Uhlig, Alfred Anwander, Pierre-Louis Bazin, Annette Horstmann, Leonie Lampe, Vadim V. Nikulin, Hadas Okon-Singer, Sven Preusser, André Pampel, Christiane S. Rohr, Julia Sacher, Angelika Thöne-Otto, Sabrina Trapp, Till Nierhaus, Denise Altmann, Katrin Arelin, Maria Blöchl, Edith Bongartz, Patric Breig, Elena Cesnaite, Sufang Chen, Roberto Cozatl, Saskia Czerwonatis, Gabriele Dambrauskaite, Maria Dreyer, Jessica Enders, Melina Engelhardt, Marie Michele Fischer, Norman Forschack, Johannes Golchert, Laura Golz, C. Alexandrina Guran, Susanna Hedrich, Nicole Hentschel, Daria I. Hoffmann, Julia M. Huntenburg, Rebecca Jost, Anna Kosatschek, Stella Kunzendorf, Hannah Lammers, Mark E. Lauckner, Keyvan Mahjoory, Ahmad S.
Kanaan, Natacha Mendes, Ramona Menger, Enzo Morino, Karina Näthe, Jennifer Neubauer, Handan Noyan, Sabine Oligschläger, Patricia Panczyszyn-Trzewik, Dorothee Poehlchen, Nadine Putzke, Sabrina Roski, Marie-Catherine Schaller, Anja Schieferbein, Benito Schlaak, Robert Schmidt, Krzysztof J. Gorgolewski, Hanna Maria Schmidt, Anne Schrimpf, Sylvia Stasch, Maria Voss, Annett Wiedemann, Daniel S. Margulies, Michael Gaebler, and Arno Villringer. A mind-brain-body dataset of MRI, EEG, cognition, emotion, and peripheral physiology in young and old adults. Scientific Data, 6:180308, 2019.

[3] Sylvain Baillet. Magnetoencephalography for brain electrophysiology and imaging. Nature Neuroscience, 20:327, 2017.

[4] Luc Berthouze, Leon M James, and Simon F Farmer. Human EEG shows long-range temporal correlations of oscillation amplitude in theta, alpha and beta bands across a wide age range. Clinical Neurophysiology, 121(8):1187–1197, 2010.

[5] Rajendra Bhatia. Positive Definite Matrices. Princeton University Press, 2007.

[6] Rajendra Bhatia, Tanvi Jain, and Yongdo Lim. On the Bures–Wasserstein distance between positive definite matrices. Expositiones Mathematicae, 2018.

[7] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Müller. Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25(1):41–56, 2008.

[8] Silvere Bonnabel and Rodolphe Sepulchre. Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank. SIAM Journal on Matrix Analysis and Applications, 31(3):1055–1070, 2009.

[9] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15:1455–1459, 2014.

[10] György Buzsáki and Rodolfo Llinás. Space and time in the brain.
Science, 358(6362):482–485, 2017.

[11] György Buzsáki and Kenji Mizuseki. The log-dynamic brain: how skewed distributions affect network operations. Nature Reviews Neuroscience, 15(4):264, 2014.

[12] C Richard Clark, Melinda D Veltmeyer, Rebecca J Hamilton, Elena Simms, Robert Paul, Daniel Hermens, and Evian Gordon. Spontaneous alpha peak frequency predicts working memory performance across the age span. International Journal of Psychophysiology, 53(1):1–9, 2004.

[13] M. Congedo, A. Barachant, and A. Andreev. A new generation of brain-computer interface based on Riemannian geometry. arXiv e-prints, October 2013.

[14] Marco Congedo, Alexandre Barachant, and Rajendra Bhatia. Riemannian geometry for EEG-based brain-computer interfaces; a primer and a review. Brain-Computer Interfaces, 4(3):155–174, 2017.

[15] Sven Dähne, Frank C Meinecke, Stefan Haufe, Johannes Höhne, Michael Tangermann, Klaus-Robert Müller, and Vadim V Nikulin. SPoC: a novel framework for relating the amplitude of neuronal oscillations to behaviorally relevant parameters. NeuroImage, 86:111–122, 2014.

[16] Jacek Dmochowski, Paul Sajda, Joao Dias, and Lucas Parra. Correlated components of ongoing EEG point to emotionally laden attention – a possible marker of engagement? Frontiers in Human Neuroscience, 6:112, 2012.

[17] Denis A Engemann and Alexandre Gramfort. Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals. NeuroImage, 108:328–342, 2015.

[18] Wolfgang Förstner and Boudewijn Moonen. A metric for covariance matrices. In Geodesy–The Challenge of the 3rd Millennium, pages 299–309. Springer, 2003.

[19] Pilar Garcés, David López-Sanz, Fernando Maestú, and Ernesto Pereda. Choice of magnetometers and gradiometers after signal space separation. Sensors, 17(12):2926, 2017.

[20] Gene H.
Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979.

[21] Alexandre Gramfort, Martin Luessi, Eric Larson, Denis A. Engemann, Daniel Strohmeier, Christian Brodbeck, Lauri Parkkonen, and Matti S. Hämäläinen. MNE software for processing MEG and EEG data. NeuroImage, 86:446–460, 2014.

[22] M. Grosse-Wentrup and M. Buss. Multiclass common spatial patterns and information theoretic feature extraction. IEEE Transactions on Biomedical Engineering, 55(8):1991–2000, 2008.

[23] Matti Hämäläinen, Riitta Hari, Risto J Ilmoniemi, Jukka Knuutila, and Olli V Lounasmaa. Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65(2):413, 1993.

[24] MS Hämäläinen and RJ Ilmoniemi. Interpreting magnetic fields of the brain: minimum norm estimates. Technical Report TKK-F-A559, Helsinki University of Technology, 1984.

[25] Mehrtash Harandi, Mathieu Salzmann, and Richard Hartley. Dimensionality reduction on SPD manifolds: The emergence of geometry-aware methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(1):48–62, 2017.

[26] Riitta Hari and Aina Puce. MEG-EEG Primer. Oxford University Press, 2017.

[27] Stefan Haufe, Frank Meinecke, Kai Görgen, Sven Dähne, John-Dylan Haynes, Benjamin Blankertz, and Felix Bießmann. On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87:96–110, 2014.

[28] Inbal Horev, Florian Yger, and Masashi Sugiyama. Geometry-aware principal component analysis for symmetric positive definite matrices.
Machine Learning, 106, 2016.

[29] Mainak Jas, Denis A Engemann, Yousra Bekhti, Federico Raimondo, and Alexandre Gramfort. Autoreject: Automated artifact rejection for MEG and EEG data. NeuroImage, 159:417–429, 2017.

[30] Michel Journée, Francis Bach, P-A Absil, and Rodolphe Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010.

[31] Sheraz Khan, Javeria A Hashmi, Fahimeh Mamashli, Konstantinos Michmizos, Manfred G Kitzbichler, Hari Bharadwaj, Yousra Bekhti, Santosh Ganesan, Keri-Lee A Garel, Susan Whitfield-Gabrieli, et al. Maturation trajectories of cortical resting-state networks depend on the mediating frequency band. NeuroImage, 174:57–68, 2018.

[32] Linda J Larson-Prior, Robert Oostenveld, Stefania Della Penna, G Michalareas, F Prior, Abbas Babajani-Feremi, J-M Schoffelen, Laura Marzetti, Francesco de Pasquale, F Di Pompeo, et al. Adding dynamics to the Human Connectome Project with MEG. NeuroImage, 80:190–201, 2013.

[33] Franziskus Liem, Gaël Varoquaux, Jana Kynast, Frauke Beyer, Shahrzad Kharabian Masouleh, Julia M. Huntenburg, Leonie Lampe, Mehdi Rahim, Alexandre Abraham, R. Cameron Craddock, Steffi Riedel-Heller, Tobias Luck, Markus Loeffler, Matthias L. Schroeter, Anja Veronica Witte, Arno Villringer, and Daniel S. Margulies. Predicting brain-age from multimodal imaging data captures cognitive impairment. NeuroImage, 148:179–188, 2017.

[34] Scott Makeig, Anthony J. Bell, Tzyy-Ping Jung, and Terrence J. Sejnowski. Independent component analysis of electroencephalographic data. In Proceedings of the 8th International Conference on Neural Information Processing Systems, NIPS'95, pages 145–151, Cambridge, MA, USA, 1995. MIT Press.

[35] Estelle Massart and Pierre-Antoine Absil.
Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices. Technical report, UCLouvain, 2018. Preprint at http://sites.uclouvain.be/absil/2018.06.

[36] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[37] Pedro Luiz Coelho Rodrigues, Marco Congedo, and Christian Jutten. Multivariate time-series analysis via manifold learning. In 2018 IEEE Statistical Signal Processing Workshop (SSP), pages 573–577. IEEE, 2018.

[38] Meredith A Shafto, Lorraine K Tyler, Marie Dixon, Jason R Taylor, James B Rowe, Rhodri Cusack, Andrew J Calder, William D Marslen-Wilson, John Duncan, Tim Dalgleish, et al. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurology, 14(1):204, 2014.

[39] Stephen M Smith and Thomas E Nichols. Statistical challenges in “big data” human neuroimaging. Neuron, 97(2):263–268, 2018.

[40] Samu Taulu and Matti Kajola. Presentation of electromagnetic multichannel data: the signal space separation method. Journal of Applied Physics, 97(12):124905, 2005.

[41] Jason R Taylor, Nitin Williams, Rhodri Cusack, Tibor Auer, Meredith A Shafto, Marie Dixon, Lorraine K Tyler, Richard N Henson, et al. The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) data repository: structural and functional MRI, MEG, and cognitive data from a cross-sectional adult lifespan sample. NeuroImage, 144:262–269, 2017.

[42] Mikko A Uusitalo and Risto J Ilmoniemi. Signal-space projection method for separating MEG or EEG into components.
Medical and Biological Engineering and Computing, 35(2):135–140, 1997.

[43] Bart Vandereycken, P-A Absil, and Stefan Vandewalle. Embedded geometry of the set of symmetric positive semidefinite matrices of fixed rank. In 2009 IEEE/SP 15th Workshop on Statistical Signal Processing, pages 389–392. IEEE, 2009.

[44] Bradley Voytek, Mark A Kramer, John Case, Kyle Q Lepage, Zechari R Tempesta, Robert T Knight, and Adam Gazzaley. Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38):13257–13265, 2015.

[45] F. Yger, M. Berar, and F. Lotte. Riemannian approaches in brain-computer interfaces: A review. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(10):1753–1762, 2017.