{"title": "Identification of Recurrent Patterns in the Activation of Brain Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 674, "page_last": 682, "abstract": "Identifying patterns from the neuroimaging recordings of brain activity related to the unobservable psychological or mental state of an individual can be treated as a unsupervised pattern recognition problem. The main challenges, however, for such an analysis of fMRI data are: a) defining a physiologically meaningful feature-space for representing the spatial patterns across time; b) dealing with the high-dimensionality of the data; and c) robustness to the various artifacts and confounds in the fMRI time-series. In this paper, we present a network-aware feature-space to represent the states of a general network, that enables comparing and clustering such states in a manner that is a) meaningful in terms of the network connectivity structure; b)computationally efficient; c) low-dimensional; and d) relatively robust to structured and random noise artifacts. This feature-space is obtained from a spherical relaxation of the transportation distance metric which measures the cost of transporting ``mass'' over the network to transform one function into another. Through theoretical and empirical assessments, we demonstrate the accuracy and efficiency of the approximation, especially for large problems. While the application presented here is for identifying distinct brain activity patterns from fMRI, this feature-space can be applied to the problem of identifying recurring patterns and detecting outliers in measurements on many different types of networks, including sensor, control and social networks.", "full_text": "Identi\ufb01cation of Recurrent Patterns in the Activation\n\nof Brain Networks\n\nFirdaus Janoos\u2217 Weichang Li Niranjan Subrahmanya\n\nExxonMobil Corporate Strategic Research\n\nAnnandale, NJ 08801\n\nIstv\u00b4an \u00b4A. M\u00b4orocz William M. 
Wells (III)\n\nHarvard Medical School\n\nBoston, MA 02115\n\nAbstract\n\nIdentifying patterns from the neuroimaging recordings of brain activity related\nto the unobservable psychological or mental state of an individual can be treated\nas a unsupervised pattern recognition problem. The main challenges, however,\nfor such an analysis of fMRI data are: a) de\ufb01ning a physiologically meaningful\nfeature-space for representing the spatial patterns across time; b) dealing with\nthe high-dimensionality of the data; and c) robustness to the various artifacts and\nconfounds in the fMRI time-series.\nIn this paper, we present a network-aware feature-space to represent the states\nof a general network, that enables comparing and clustering such states in a\nmanner that is a) meaningful in terms of the network connectivity structure;\nb)computationally ef\ufb01cient; c) low-dimensional; and d) relatively robust to struc-\ntured and random noise artifacts. This feature-space is obtained from a spherical\nrelaxation of the transportation distance metric which measures the cost of trans-\nporting \u201cmass\u201d over the network to transform one function into another. Through\ntheoretical and empirical assessments, we demonstrate the accuracy and ef\ufb01ciency\nof the approximation, especially for large problems.\n\n1\n\nIntroduction\n\nIn addition to functional localization and integration, mapping the neural correlates of \u201cmental\nstates\u201d or \u201cbrain states\u201d (i.e. the distinct cognitive, affective or perceptive states of the human mind)\nis an important research topic for understanding the connection between mind and brain [2]. 
In functional neuroimaging, this problem is equivalent to identifying recurrent spatial patterns from the recorded activation of neural circuits and relating them with the mental state of the subject. Although clustering the data across time to identify the intrinsic state of an individual from EEG and MEG measurements is an established procedure in electrophysiology [19], the analysis of temporal patterns in functional MRI data has generally used supervised techniques such as multivariate regression and classification [18, 11, 9], which restrict analysis to observed behavioral correlates of mental state, ignoring any information about the intrinsic mental state that might be present in the data.

In contrast to clustering voxels based on the similarity of their functional activity (i.e. along the spatial dimension) [15], the problem of clustering fMRI data along the temporal dimension has not been widely explored in the literature, primarily because of the following challenges: a) the lack of a physiologically meaningful metric to compare the difference between the spatial distributions of recorded brain activity (i.e. brain states) at two different time-points; b) problems that arise because the number of voxels (i.e. dimensions) is orders of magnitude larger (N ~ O(10^5) vs. T ~ O(10^2)) than the number of scans (i.e. samples); and c) structured and systematic noise due to factors such as magnetic baseline drift, respiratory and cardiac activity, and head motion. The dimensionality problem in fMRI has typically been addressed through PCA [16], ICA [3], or by selection of a subset of voxels either manually or via regression against the stimulus [18, 11].

* Corresponding author: firdaus.janoos@exxonmobil.com
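For contrast with the network-aware representation developed below, the standard PCA reduction just mentioned can be sketched in a few lines. This is a minimal numpy illustration; the function name `pca_reduce` is our own:

```python
import numpy as np

def pca_reduce(X, d):
    """Project a T x N fMRI data matrix onto its d largest-variance principal
    components -- the standard reduction the text critiques (in fMRI the top
    components often track motion and physiological noise, not mental state)."""
    Xc = X - X.mean(axis=0)                            # center each voxel
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = PCs
    return Xc @ Vt[:d].T                               # T x d component scores
```

Note that nothing in this projection knows about the connectivity structure between voxels, which is exactly the gap the network-aware metric below is meant to fill.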
PCA has generally been found to be problematic in fMRI [18, 11, 13], since the largest-variance principal components usually correspond to motion and physiological noise such as respiration and pulsatile activity, while ICA does not provide an automated way of selecting components. On the other hand, supervised feature-spaces are inherently biased towards the experimental variables against which they were selected, or by the investigator's expectations, and may not capture unexpected patterns in the data.

In the first contribution of this paper, we address these problems by using a network-aware metric that captures the difference between the states zt1, zt2 at two different time-points t1, t2 of a temporally evolving function zG : V x [0, T] -> R defined on the vertices V of a network (i.e. a weighted undirected graph) G = (V, E), in a manner that is aware of the connectivity structure E of the underlying network. Intuitively, this network-aware metric assesses the distance between two states zt1, zt2 that differ mainly on proximally connected nodes to be less than the distance between states that differ on unconnected nodes. This concept is illustrated in Fig. 1.

In the context of neuroimaging, where the network measures the functional connectivity [4] between brain regions, this implies that two brain activation patterns that differ mainly on functionally similar regions are functionally closer than two that differ on functionally unrelated regions. For example, zt1 and zt2 that activated mainly in the cingulo-opercular network would be functionally more similar with each other than with zt3 that exhibited activity mainly in the fronto-parietal network.

Such network awareness is provided by the Kantorovich metric [20], also called the transportation distance (TD), which measures the minimum flow of "mass" over the network required to make zt1 at time t1 match zt2 at t2.
The cost of this flow is encoded by the weights of the edges of the graph. The Earth Mover's Distance (EMD), closely related to the transportation distance, is widely used for clustering and retrieval in computer vision, medical imaging, bio-informatics and data-mining [21, 22, 7]. One major strength of this family of metrics for neuroimaging applications, over voxel-wise image matching, is that it allows for partial matches, thereby mitigating the effect of small differences between the measurements that arise due to spatial displacement such as head-motion, or from random noise [21].

Figure 1: Shown are zt1, zt2 and zt3, three states of the function zG on the network G. Here, zt1 and zt2 activate on more proximal regions of the graph and are hence assessed to be more similar than zt1 and zt3. Similarly for zt2 and zt3.

The TD, however, has the following limitations. Firstly, it is computationally expensive, with worst-case complexity of O(NV^3 log NV), where NV is the number of nodes in the graph [17]. If the number of time-series observations is T, clustering requires O(T^2) comparisons, making computation prohibitively expensive for large data-sets. Secondly, and more importantly, the metric is the solution to an optimization problem and therefore does not have a tractable geometric structure. For example, there is no closed-form expression for the centroid of a cluster under this metric. As a result, determining the statistical properties of clusters obtained under this metric, let alone developing more sophisticated models, is not straightforward. Although linear embedding (i.e.
Euclidean) approximations have been proposed for the EMD [12, 22], they are typically defined for comparing probability distributions over regular grids, and their extension to functions over arbitrary networks is an open problem.

The second contribution of this paper is to address these issues through the development of a linear feature-space that provides a good approximation of the transportation distance. This feature-space is motivated by a spherical relaxation [14] of the dual polytope of the transportation problem, as described in Section 2. The network function zG is then embedded into a Euclidean space via a similarity transformation such that the transportation distance is well-approximated by the l2 distance in this space, as elucidated in Section 3. In contrast to existing linear approximations, the feature-space developed here has a very simple form closely related to the graph Laplacian [6]. Theoretical bounds on the error of the approximation are developed, and the accuracy of the method is validated empirically in Section 4.1. Here, we show that the feature-space does not deteriorate, but on the contrary, may improve as the size of the graph increases, making it highly suitable for dealing with large networks like the brain. Its application to extracting intrinsic mental-states, in an unsupervised manner, from an fMRI study of a visuo-spatial motor task is demonstrated in Section 4.2. Detailed proofs and descriptions are provided in the Supplemental to the manuscript.

2 Transportation Distance and Spherical Relaxation

Let zt1 and zt2 denote the states of zG at time-points t1, t2 on the graph G = (V, E), with nodes V = {1 . . . NV} and edges E = {(i, j) | i, j in V}. The symmetric distance matrix WG[i, j] in R+ encodes the cost of transport between nodes i and j. Also, define the difference between two states as dz = zt1 - zt2, and assume sum_{i in V} dz[i] = 0 without loss of generality.^1 The minimal cost TD(zt1, zt2) of a transport f : E -> R+ of "mass" over the network to convert zt1 into zt2 is posed as the following linear program (LP):

    TD(z_{t_1}, z_{t_2}) = \min_f \sum_{(i,j)} f[i,j]\, W_G[i,j], \quad \text{subject to} \quad \sum_j f[i,j] - \sum_j f[j,i] = dz[i], \;\forall i \in V. \tag{1}

The corresponding TP dual, formulated in the unrestricted dual variables g : V -> R, is:

    TD(z_{t_1}, z_{t_2}) = \max_g \langle g, dz \rangle \quad \text{subject to} \quad A g \le w_G, \tag{2}

where each row a_{i,j} of A contains a +1 entry in the i-th position and a -1 entry in the j-th position, and w_G = (W_G[1,2], W_G[1,3], \ldots, W_G[1,N], W_G[2,1], W_G[2,3], \ldots, W_G[2,N], \ldots)^\top stacks the corresponding edge costs.

The feasible set of the dual is a convex polytope formed by the intersection of the half-spaces specified by the constraints {a_{i,j}, i = 1 . . . NV, j = 1 . . . NV, i != j} corresponding to the rows of A. These constraints, which form the normals to the hyper-planes bounding this polytope, are symmetrically distributed in the +i x -j quadrant of R^NV for each combination of i and j.
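For small graphs, the primal LP of eqn. (1) can be solved directly with an off-the-shelf solver. The following is a minimal sketch using scipy's `linprog` (not the network-simplex solver used in the experiments later in the paper); the function name is ours:

```python
import itertools

import numpy as np
from scipy.optimize import linprog

def transportation_distance(dz, W):
    """Solve the LP of eqn. (1): the minimal cost of transporting 'mass'
    dz = z_t1 - z_t2 (which sums to zero) over a graph with edge costs W."""
    n = len(dz)
    edges = list(itertools.permutations(range(n), 2))   # all ordered pairs i != j
    cost = np.array([W[i, j] for i, j in edges])
    # Flow conservation: (outflow - inflow) at node i must equal dz[i].
    A_eq = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        A_eq[i, k] += 1.0   # f[i, j] leaves node i
        A_eq[j, k] -= 1.0   # f[i, j] enters node j
    res = linprog(cost, A_eq=A_eq, b_eq=dz, bounds=(0, None), method="highs")
    return res.fun

# Moving one unit of mass from node 0 to node 1 is cheaper via node 2
# (cost 1 + 1 = 2) than along the direct edge (cost 3).
W = np.array([[0.0, 3.0, 1.0],
              [3.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
dz = np.array([1.0, -1.0, 0.0])
```

For NV ~ 10^5 nodes this dense formulation is of course infeasible, which is precisely the motivation for the linear approximation developed in the remainder of this section.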
Moreover, A is totally uni-modular [5], and has rank NV - 1, with the LP polytope lying in an (NV - 1)-dimensional space orthogonal to 1_NV, the 1-vector in R^NV. In the discussion below, we operate in the original R^NV notation, by considering its restriction to the (NV - 1)-dimensional sub-space {g in R^NV | <g, 1_NV> = 0}, i.e. sum_{i in V} g[i] = 0. The optimal solution to this problem will lie on the (NV - 1) simplicial complex formed by intersections of the (NV - 1)-dimensional hyper-planes, each at a distance of WG[i, j]/sqrt(2) from the origin, and in the non-degenerate case will coincide with the extreme points of the polytope Ag <= wG.

Consider the special case of the fully-connected graph with WG[i, j] = 1, for all i, j in V. Here,

    TD(z_{t_1}, z_{t_2}) = \max_g \langle g, dz \rangle \quad \text{subject to} \quad A g \le \mathbf{1}_{N_V \times (N_V - 1)}. \tag{3}

Each hyper-plane of the LP polytope is at distance 1/sqrt(2) from the origin, and the maximum inscribed hyper-sphere, with center at the origin and radius 1/sqrt(2), touches all the polytope's hyper-planes. The main idea of the embedding is to use the regularity of this polytope, with 2^NV - 2 extreme points symmetrically distributed in R^{NV-1} (§ Proposition 2 in the Supplemental), and to approximate it by this hyper-sphere. Relaxing the feasible set of the TP dual from the convex polytope to this hyper-sphere, eqn. (2) becomes:

    \widehat{TD}(z_{t_1}, z_{t_2}) = \max_g \langle g, dz \rangle \quad \text{such that} \quad \|g\|_2 = \frac{1}{\sqrt{2}}, \tag{4}

which has the direct solution

    \widehat{TD}(z_{t_1}, z_{t_2}) = \frac{1}{\sqrt{2}} \|dz\| = \frac{1}{\sqrt{2}} \|z_{t_1} - z_{t_2}\|, \quad \text{with} \quad \widehat{g}^{*} = \frac{1}{\sqrt{2}} \frac{dz}{\|dz\|}. \tag{5}

The worst-case error of this approximation is O(||dz||) (§ Theorem 1 of the Supplemental), proving that the quality of the linear approximation for a graph where all nodes are equidistant neighbors of each other does not deteriorate as the size of the graph increases.

^1 Add a dummy node with index NV + 1, where dz[NV + 1] = -sum_{i in V} dz[i] and WG[i, NV + 1] = 0, for all i in V.

3 Linear Feature Space Embedding

In the case of an arbitrary distance matrix WG, however, the polytope loses its regular structure and has a variable number of extreme points. Also, in general, the maximal inscribed hyper-sphere does not touch all the bounding hyper-planes, resulting in a very poor approximation [14]. Therefore, to use the spherical relaxation for the general problem, we apply a similarity transformation M such that A · M = diag{wG}^{-1} A and M is positive semi-definite. Expressing eqn. (2) in terms of a new variable xi := Mg, we see that the general problem:

    TD(z_{t_1}, z_{t_2}) = \max_g \langle g, dz \rangle \quad \text{such that} \quad A g \le w_G \tag{6}

is equivalent to the special case given by eqn. (3), in a transformed space, as per:

    TD(z_{t_1}, z_{t_2}) = \max_{\xi} \langle M^{-} \xi, dz \rangle \quad \text{such that} \quad A \xi \le \mathbf{1}_{N_V \times (N_V - 1)}, \tag{7}

where M^- is the (pseudo-)inverse of M. Then, the approximation of eqn. (4) yields \widehat{TD}(z_{t_1}, z_{t_2}) = \frac{1}{\sqrt{2}} \|M^{-\top}(z_{t_1} - z_{t_2})\|.

As shown in Supplemental Section A, the transformation matrix is M = (1/NV) LG, where LG = D_{DeltaG} - DeltaG is the un-normalized Laplacian matrix of the graph. Here, DeltaG is the adjacency matrix such that DeltaG[i, j] = WG[i, j]^{-1} for all i != j, and D_{DeltaG} is the diagonal degree matrix with D_{DeltaG}[i, i] = sum_{j in V} DeltaG[i, j] and D_{DeltaG}[i, j] = 0 for i != j. Defining V Lambda V^T = LG as the eigen-system of the graph Laplacian, and the projection of zt onto the feature space V Lambda^- as zhat_t = Lambda^- V^T zt, this yields:

    \widehat{TD}(z_{t_1}, z_{t_2}) = \frac{1}{\sqrt{2}} \|\Lambda^{-} V^{\top} dz\| = \frac{1}{\sqrt{2}} \|\widehat{z}_{t_1} - \widehat{z}_{t_2}\|. \tag{8}

Consequently, the transportation distance can be approximated by an l2 metric through a similarity transformation of the original space. In this case, the error of the approximation is O(lambda_min^{-1} ||dz||^2) (§ Theorem 1 of the Supplemental), which implies that the approximation improves as the smallest eigenvalue of the graph Laplacian increases. Also, notice that the eigenvector v_NV of LG corresponding to the smallest eigenvalue lambda_NV = 0 is a constant vector, and therefore <v_NV, dz> = 0 by the requirement that sum_{i in V} dz[i] = 0, thereby automatically reducing the dimension of the projected space to NV - 1.

Dimensionality reduction of the feature-space can be achieved by discarding the eigenvectors of LG with the P largest eigenvalues, whose inverse sum contributes less than a certain percentage of the total inverse spectral energy. If the eigenvectors with eigenvalues lambda_1 >= lambda_2 >= . . . >= lambda_P are discarded, the additional error in \widehat{TD}(z_{t_1}, z_{t_2}) is equal to \sqrt{\sum_{k=1}^{P} \lambda_k^{-2}} \,/\, \sqrt{\sum_{k=P+1}^{N_V} \lambda_k^{-2}}.

4 Results

First, we provide an empirical validation of the approximation to the transportation distance in Section 4.1, and then the feature-space is used to find representative patterns (i.e.
brain states) in the dynamically changing activations of the brain during a visuo-motor task in Section 4.2.

4.1 Validation

To validate the linear approximation to the transportation distance on networks, like the brain, that exhibit a scale-free property [1], we simulated random graphs of NV vertices using the following procedure: a) create an edge between nodes i and j with probability proportional to beta(di + dj + eps), where di is the degree of node i, and beta, eps are constants that are varied across experiments; b) sample the weight of the edge from a chi-squared (1 d.o.f.) distribution scaled by a constant gamma, varied across experiments. For each instance G(n) of the graph, a set of T = 100 states zt : V -> R, t = 1 . . . T, were sampled from a standard normal distribution such that sum_i dz[i] = 0. The experiment was repeated 10 times at graph sizes of NV = 2^n, n = 4 . . . 12.

The transportation problem was solved using network simplex [17] in the IBM CPLEX(R) optimization package, while the linear approximation was implemented in Matlab(R). All experiments were run on a 2.6GHz Opteron cluster with 16 processors and 32GB RAM each. The amortized running time for one pair-wise comparison is shown in Fig. 2(a). While an individual run of the network simplex algorithm is much faster than the eigen-system computation of the linear feature-space, repeatedly solving TD(zt1, zt2) for all pairs of zt1, zt2 is orders of magnitude slower than a simple Euclidean distance, reducing its net efficiency.

The relative error, as shown in Fig. 2(b), reduces with increasing number of vertices, approximately as O(NV^{-1}). This is because the approximation error for an arbitrary graph is O(lambda_min^{-1} ||dz||^2), while for random graphs satisfying basic regularity conditions the eigenvalues of the graph Laplacian increase as O(NV) [8].
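As a concrete illustration of eqns. (6)-(8), the feature-space compared in these experiments can be sketched in numpy as follows. This is our own minimal re-implementation, not the Matlab code used above; we apply the pseudo-inverse to M = LG/NV explicitly so that the unit-cost sanity check of eqn. (5) holds exactly:

```python
import numpy as np

def feature_map(W):
    """Eqn.-(8) style embedding: Delta[i, j] = W[i, j]^(-1) off the diagonal,
    L = D - Delta the un-normalized graph Laplacian, M = L / N_V
    (Supplemental Section A); states are projected with pinv(M)."""
    n = W.shape[0]
    off = ~np.eye(n, dtype=bool)
    Delta = np.zeros_like(W)
    Delta[off] = 1.0 / W[off]
    L = np.diag(Delta.sum(axis=1)) - Delta
    return np.linalg.pinv(L / n)     # symmetric, so pinv(M).T == pinv(M)

def td_hat(z1, z2, P):
    """Approximate transportation distance (1/sqrt(2)) ||pinv(M) (z1 - z2)||."""
    return np.linalg.norm(P @ (z1 - z2)) / np.sqrt(2.0)

# Sanity check against eqn. (5): on the fully-connected graph with unit costs
# the approximation reduces to the Euclidean distance scaled by 1/sqrt(2).
n = 8
W = np.ones((n, n))                  # only off-diagonal entries are used
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
z1 -= z1.mean()
z2 -= z2.mean()                      # enforce sum(dz) = 0
P = feature_map(W)
```

Once `P` is computed (one eigen-decomposition), every pair-wise comparison is a single matrix-vector product and a norm, which is the source of the running-time advantage reported in Fig. 2(a).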
In comparison, the Euclidean metric ||zt1 - zt2||_2 starts with a much higher relative error with respect to the transportation distance, and although its error also reduces with graph size, the trend is slower. Secondly, the variance of its error is much higher than that of the linear embedding proposed here.

In the context of clustering, which is the motivation for this work, a more important property is that the approximation preserve the relative configuration (i.e. homomorphism) between observations rather than the numerical values of their distances (i.e. isomorphism), as characterized by its ability to preserve the relative ordering between points (i.e. a topological equivalence property). From Fig. 2(c), we observe that for data-points that are relatively close to each other, the ordering relationships are preserved with very high accuracy, and that accuracy reduces as the relative distance between the points increases.

Another important property for an embedding scheme, especially for non-linear manifolds like that induced by the TD, is its ability to preserve the relative distances between points that are in local neighborhoods (i.e. a coordinate chart property). This is quantified by a normalized neighborhood error, defined by:

    \mathrm{NormErr}(z_{t_1}, z_{t_2}) = \frac{|a - b|}{|a|}, \quad \text{where} \quad a = \frac{TD(z_{t_1}, z_{t_2})}{\sum_{n \in \mathcal{N}_{t_1}} TD(z_{t_1}, z_{t_n})} \quad \text{and} \quad b = \frac{\widehat{TD}(z_{t_1}, z_{t_2})}{\sum_{n \in \mathcal{N}_{t_1}} \widehat{TD}(z_{t_1}, z_{t_n})}.

The neighborhoods N_{t1} contain the 10 nearest neighbors of zt1 under the TD and \widehat{TD} metrics respectively. The formulation has the effect of normalizing the distance between zt1, zt2 with respect to the local neighborhood of zt1. It can be seen in Fig. 2(d) that the approximation error according to this measure is extremely low and almost constant with respect to NV for points that are close to each other. These plots indicate that although \widehat{TD} does not hold for distant points on the manifold induced by TD, it provides a good approximation of its topology.

Figure 2: (a) The amortized per-comparison running time in seconds for the transportation distance TD and its approximation \widehat{TD}, with respect to graph size NV. (b) The relative approximation error (\widehat{TD} - TD)/TD (±1 std. dev.); the error of the Euclidean approximation ||zt1 - zt2||_2 is also shown for comparison. (c) The quartile-wise ordering error (±1 std. dev.): for each zt1, the fraction of {zt2, t2 = 1 . . . T, t2 != t1} that are misordered by \widehat{TD}(zt1, zt2) with respect to the ordering induced by TD(zt1, zt2) is calculated. The set {zt2} is divided into quartiles according to their distance TD(zt1, zt2) from zt1, where the 25-percentile is the set of the first 25% closest points to zt1 (similarly for the 50 and 75%-iles). Also shown is the ordering error of the Euclidean metric with respect to TD; error-bars are omitted for clarity. (d) The quartile-wise approximation error normalized by the average distance of the 10 nearest neighbors; the dashed line shows the un-normalized approximation error (§ Fig. (b)) for reference.

4.2 Neuroimaging Data

Clustering using the feature-space described in this paper was applied to a data-set of fifteen subjects performing a visuo-motor task during functional MR imaging, to discover salient patterns of recurrent brain activation. The subjects were visually exposed to oriented wedges filled with high-contrast random noise patterns and displayed randomly in one of four quadrants. They were asked to focus on a center dot and to perform a finger-tapping motion with the right or left hand when the visual wedge was active in the upper right or lower left quadrants, respectively. The block length of each visual wedge stimulation varied from 5 to 15s, and the noise patterns changed at a frequency of 5Hz. A multi-shot 3D Gradient Echo Planar Imaging (EPI) sequence, accelerated in the slice encoding direction with GRAPPA and UNFOLD, was used on a GE 3T MRI scanner with a quadrature head coil, and T = 171 volumes were acquired at TR = 1.05s and an isotropic resolution of 3mm, with a total imaging time of 3min; the first five volumes were discarded from the analysis. High resolution anatomical scans were also acquired, bias-field corrected, normalized to an MNI atlas space, and segmented into gray and white matter regions. The fMRI scans were motion corrected using linear registration and co-registered with the structural scans using SPM8 [16]. Next, the time-series data were high-pass filtered (0.5Hz) to remove gross artifacts due to breathing, blood pressure changes and scanner drift. The data were first analyzed for task-related activity using a general linear model (GLM) with SPM8, for reference. The design matrix included a regressor for the presentation of the wedge in each quadrant, convolved with a canonical hemodynamic response function. These results are shown in Fig. 3(a).

Note that the data for each subject were processed separately. The mean volume of the time-series was then subtracted, white matter was masked out, and all further processing was performed on the gray matter. The functional networks for a subject were computed by estimating the correlations between voxels using the method described in Supplemental Section C, which is sparse, consistent and computationally efficient.
The distance matrix of the functional connectivity graph was constructed as WG[i, j] = -log(|rho[i, j]|/tau), where rho[i, j] is the correlation between voxels i and j, and tau is a user-defined scale parameter (typically set to 10). This mapping has the effect that WG[i, j] -> 0 as |rho[i, j]| -> 1 and WG[i, j] -> infinity as |rho[i, j]| -> 0.

The linear feature-space (§ eqn. (8)) was computed from the graph Laplacian of DeltaG, where DeltaG[i, j] = WG[i, j]^{-1}, retaining only those basis vectors corresponding to the top 80 eigenvalues (about 50% of the spectral energy), and the fMRI volumes were embedded into this low-dimensional space. For clustering, the state-space method (SSM) of Janoos, et al. [13] was used, which is a modified hidden Markov model with Gaussian emission probabilities that assigns a state (i.e. cluster) label to each scan while accounting for the temporal blurring caused by the hemodynamic response. This method associates each time-point t of the fMRI time-series with a vector pi_t = {pi_t[1] . . . pi_t[K] | pi_t[k] in [0, 1], sum_k pi_t[k] = 1}, giving the probability of belonging to states 1 . . . K. A multinomial logistic classifier (MLC) was then trained to predict the wedge position at time t from pi_t.
The number of clusters was determined by selecting a value of 5 <= K <= 15 that minimized the generalization error of the MLC, which acts as a statistic to assess the quality of the model-fit and perform model selection.

It should be noted here that the identification of patterns of recorded brain activity was performed in a purely unsupervised manner. Only model selection and model interpretation were done, post hoc, using observable correlates of the unobservable mental state of the subject. Spatial maps for each wedge orientation were computed as an average of cluster centroids weighted by the MLC weights for that orientation. The z-statistic spatial maps for the group from this analysis are shown in Fig. 3(b), and exhibit the classic contra-lateral retinotopic organization of the primary visual cortex, with the motor representation areas in both hemispheres. Fig. 3(c) shows the distribution of state probabilities for one subject corresponding to a sequence of wedges oriented in each quadrant for 4 TRs each. Here, we see that the probability of a particular state is highly structured with respect to the orientation of the wedge. For example, at the start of the presentation with the wedge in the lower-right quadrant, state 1 is most probable. But by the second interval, state 2 becomes more dominant, and this distribution remains stable for the rest of this presentation. Then, as the display transitions to the lower-left quadrant, states 3 and 4 become equiprobable. However, as this orientation is maintained, the probability distribution peaks about state 4 and remains stable.
A similar pattern is observed in the probability distributions for the other orientations.

For comparison, we also performed the same clustering using a low-dimensional PCA basis explaining about 50% of the variance of the data (d = 60), and the low-dimensional basis (CorrEig) proposed by [13], derived from the eigen-decomposition of the voxel-wise correlation matrix (d about 110). Multinomial logistic classifiers (MLC) were trained for each case, and the number of states was tuned using the same procedure as above. The spatial maps reconstructed from these two feature-spaces (not shown here) exhibited task-specific activation patterns, although the foci were much weaker and much more diffused as compared to those of the \widehat{TD} feature-space. The error of the MLC in predicting the stimulus at time t from the state probability vector pi_t, which reflects the model's ability to capture patterns in the data related to the mental state of the subject, is listed for these three feature-spaces in Table 1.

                   Lower right     Lower left      Upper left      Upper right     Overall
    \widehat{TD}   0.17 (±0.05)    0.13 (±0.02)    0.21 (±0.04)    0.12 (±0.03)    0.16 (±0.07)
    PCA            0.41 (±0.08)    0.37 (±0.10)    0.39 (±0.09)    0.36 (±0.08)    0.38 (±0.18)
    CorrEig        0.29 (±0.05)    0.22 (±0.04)    0.30 (±0.06)    0.23 (±0.05)    0.26 (±0.10)

Table 1: The generalization error of the multinomial logistic classifier to predict the orientation of the wedge from the distribution of state labels estimated by the SSM, trained on three low-dimensional representations of the fMRI data: a) the approximate transportation distance \widehat{TD}; b) the PCA basis; and c) the eigen basis of the voxel-wise correlation matrix (CorrEig).
Due to the random presentation of wedge orientations, the chance-level prediction error varied between 68% and 81% for each subject.

We see that the prediction error, and therefore the ability of the state-space model to identify mental-state related patterns in the data, is significantly better for the \widehat{TD} feature-space as compared to that of [13] (p < 10^{-6}, 1-sided 2-sample t-test), while PCA performs significantly worse than both the other feature-spaces, as expected. Moreover, the \widehat{TD} representation provides a significant difference (p < 0.001, 1-sided 2-sample t-test) between the prediction rates for the wedge orientations with and without the finger-tapping task, implying that the model is better able to detect brain patterns when both visual and motor regions are involved as compared to those involving only the visual regions, probably because of the more distinct functional signature of the former.

Figure 3: (a): Group-level maximum intensity projections of significantly activated voxels (p < 0.05, FWE corrected) at the four orientations of the wedge and the hand motor actions, computed using SPM8. (b): Group-level z-maps showing the activity for each orientation of the wedge, computed as an average of cluster centroids weighted by the MLC weights. Displayed are the posterio-lateral and posterio-medial views of the left and right hemispheres respectively. Values |z| <= 1 have been masked out for visual clarity. (c): The SSM state probability vector pi_t for one subject. The size of the circles corresponds to the marginal probability pi_t[k] of state k = 1 . . . 8 during the display of the wedge in the lower right, lower left, upper left and upper right quadrants for 4 TRs each.
States have been relabeled for expository purposes.

5 Conclusion

In this paper, we have presented an approach to compare and identify patterns of brain activation during a mental process using a distance metric that is aware of the connectivity structure of the underlying brain networks. This distance metric is obtained by a Euclidean approximation of the transportation distance between patterns via a spherical relaxation of the linear-programming dual polytope. The embedding is achieved by a transformation of the original space of the function with the graph Laplacian of the network. Intuitively, the eigen-system of the graph Laplacian indicates max-flow / min-cut partitions of the graph [10], and therefore projecting onto these bases increases the cost if the difference between two states of the function is concentrated on relatively distant or disconnected regions of the graph.

We provided theoretical bounds on the quality of the approximation and, through empirical validation, demonstrated low error that, importantly, decreases as the size of the problem increases.
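To make the Laplacian-based embedding concrete, the following is a minimal sketch in numpy. It assumes a hypothetical weighting of each eigen-component by the inverse square-root of its Laplacian eigenvalue; the paper's exact weights come from the spherical relaxation of the dual polytope and may differ, so this illustrates only the general mechanism: differences concentrated along weakly connected (small-eigenvalue) directions of the graph incur a larger Euclidean cost.

```python
import numpy as np

def laplacian_embedding(A, f):
    """Embed a function f on a network with adjacency matrix A using the
    eigen-system of the combinatorial graph Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    evals, evecs = np.linalg.eigh(L)      # symmetric eigen-decomposition
    nz = evals > 1e-10                    # drop the constant (zero) eigenvector
    # Hypothetical scaling: weight each eigen-component by 1/sqrt(eigenvalue),
    # so components along near-disconnected partitions are amplified.
    return (evecs[:, nz].T @ f) / np.sqrt(evals[nz])

def approx_td(A, f, g):
    """Euclidean approximation of the transportation distance between two
    states f and g of the network (a sketch, not the paper's exact form).
    The embedding is linear, so it suffices to embed the difference."""
    return float(np.linalg.norm(laplacian_embedding(A, f - g)))

# Example: a path graph on 4 nodes. Transporting unit mass between the two
# end nodes costs more than transporting it between adjacent nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
e = np.eye(4)
d_adjacent = approx_td(A, e[0], e[1])
d_distant = approx_td(A, e[0], e[3])
```

With this particular scaling the squared distance equals the effective-resistance quadratic form (f−g)ᵀL⁺(f−g), so on the path graph `d_adjacent` is 1 and `d_distant` is √3, matching the intuition that mass moved across distant regions of the graph costs more.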
We also showed the superior ability of this distance metric to identify salient patterns of brain activity, related to the internal mental state of the subject, from an fMRI study of visuo-motor tasks.

The framework presented here is applicable to the more general problem of identifying patterns in time-varying measurements distributed over a network that has an intrinsic notion of distance and proximity, such as social, sensor, communication, transportation, energy and other similar networks. Future work would include assessing the quality of the approximation for the sparse, restricted-topology, small-world and scale-free networks that arise in many real-world cases, and applying the method to detect patterns and outliers in these types of networks.

[Figure: (a) GLM regression; (b) SSM-based clustering; (c) Cluster membership probability vs. experimental stimulus.]

References

[1] Achard, S., Salvador, R., Whitcher, B., Suckling, J., Bullmore, E.: A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. J Neurosci 26(1), 63–72 (Jan 2006)

[2] Barrett, L.F.: The future of psychology: Connecting mind to brain. Perspect Psychol Sci 4(4), 326–339 (Jul 2009)

[3] Calhoun, V.D., Adali, T., Pearlson, G.D., Pekar, J.J.: Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms. Hum Brain Map 13(1), 43–53 (May 2001)

[4] Cecchi, G., Rish, I., Thyreau, B., Thirion, B., Plaze, M., Paillere-Martinot, M.L., Martelli, C., Martinot, J.L., Poline, J.B.: Discriminative network models of schizophrenia. In: Adv Neural Info Proc Sys (NIPS) 22, pp. 252–260 (2009)

[5] Chandrasekaran, R.: Total unimodularity of matrices. SIAM Journal on Applied Mathematics 17(6), 1032–1034 (1969)

[6] Chung, F.: Lectures on Spectral Graph Theory.
CBMS Reg Conf Series Math, Am Math Soc (1997)

[7] Deng, Y., Du, W.: The Kantorovich metric in computer science: A brief survey. Electronic Notes in Theoretical Computer Science 253(3), 73–82 (2009)

[8] Ding, X., Jiang, T.: Spectral distributions of adjacency and Laplacian matrices of random graphs. The Annals of Applied Probability 20(6), 2086–2117 (2010)

[9] Friston, K., Chu, C., Mourão-Miranda, J., Hulme, O., Rees, G., Penny, W., Ashburner, J.: Bayesian decoding of brain images. Neuroimage 39(1), 181–205 (Jan 2008)

[10] Grieser, D.: The first eigenvalue of the Laplacian, isoperimetric constants, and the max flow min cut theorem. Archiv der Mathematik 87, 75–85 (2006)

[11] Haynes, J.D., Rees, G.: Decoding mental states from brain activity in humans. Nature Rev: Neurosci 7(7), 523–534 (Jul 2006)

[12] Indyk, P., Thaper, N.: Fast color image retrieval via embeddings. ICCV (2003)

[13] Janoos, F., Singh, S., Wells III, W., Mórocz, I.A., Machiraju, R.: State-space models of mental processes from fMRI (2011)

[14] Khachiyan, L.G., Todd, M.J.: On the complexity of approximating the maximal inscribed ellipsoid for a polytope. Mathematical Programming 61, 137–159 (1993)

[15] Lashkari, D., Sridharan, R., Golland, P.: Categories and functional units: An infinite hierarchical model for brain activations. In: Advances in Neural Information Processing Systems, vol. 23, pp. 1252–1260 (2010)

[16] Multiple: Statistical Parametric Mapping: The Analysis of Functional Brain Images. Acad Press (2007)

[17] Orlin, J.B.: On the simplex algorithm for networks and generalized networks. In: Mathematical Programming Essays in Honor of George B. Dantzig Part I, Mathematical Programming Studies, vol. 24, pp. 166–178.
Springer Berlin Heidelberg (1985)

[18] O'Toole, A.J., Jiang, F., Abdi, H., Pénard, N., Dunlop, J.P., Parent, M.A.: Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data. J Cog Neurosci 19(11), 1735–1752 (Nov 2007)

[19] Pascual-Marqui, R.D., Michel, C.M., Lehmann, D.: Segmentation of brain electrical activity into microstates: model estimation and validation. IEEE Trans Biomed Eng 42(7), 658–665 (Jul 1995)

[20] Rachev, S.T., Ruschendorf, L.: Mass Transportation Problems: Volume I: Theory (Probability and Its Applications) (March 1998)

[21] Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover's distance as a metric for image retrieval (1998)

[22] Shirdhonkar, S., Jacobs, D.: Approximate earth mover's distance in linear time. In: Comp Vis Pat Recog., IEEE Conf., pp. 1–8 (2008)