{"title": "Discriminative Network Models of Schizophrenia", "book": "Advances in Neural Information Processing Systems", "page_first": 252, "page_last": 260, "abstract": "Schizophrenia is a complex psychiatric disorder that has eluded a characterization in terms of local abnormalities of brain activity, and is hypothesized to affect the collective, \u201cemergent\u201d working of the brain. We propose a novel data-driven approach to capture emergent features using functional brain networks [Eguiluz et al.] extracted from fMRI data, and demonstrate its advantage over traditional region-of-interest (ROI) and local, task-specific linear activation analyses. Our results suggest that schizophrenia is indeed associated with disruption of global, emergent brain properties related to its functioning as a network, which cannot be explained by alteration of local activation patterns. Moreover, further exploitation of interactions by sparse Markov Random Field classifiers shows a clear gain over linear methods, such as Gaussian Naive Bayes and SVM, reaching 86% accuracy (versus a 50% random-guess baseline), which is remarkable given that it is based on a single fMRI experiment using a simple auditory task.", "full_text": "Discriminative Network Models of Schizophrenia\n\nGuillermo A. Cecchi, Irina Rish\nIBM T. J. Watson Research Center\n\nYorktown Heights, NY, USA\n\nBenjamin Thyreau\n\nNeurospin\n\nCEA, Saclay, France\n\nBertrand Thirion\n\nINRIA\n\nSaclay, France\n\nMarion Plaze\n\nINSERM - CEA - Univ. Paris Sud\n\nResearch Unit U.797\n\nNeuroimaging & Psychiatry\n\nSHFJ & Neurospin, Orsay, France\n\nMarie-Laure Paillere-Martinot\n\nAP-HP, Adolescent Psychopathology\nand Medicine Dept., Maison de Solenn,\n\nCochin Hospital, University Paris Descartes\n\nF-75014 Paris, France\n\nCatherine Martelli\n\nDepartement de Psychiatrie\n\net d\u2019Addictologie\n\nCentre Hospitalier Paul Brousse\n\nVillejuif, France\n\nJean-Luc Martinot\n\nINSERM - CEA - Univ. 
Paris Sud\n\nResearch Unit U.797\n\nNeuroimaging & Psychiatry\n\nSHFJ & Neurospin, Orsay, France\n\nJean-Baptiste Poline\n\nNeurospin\n\nCEA, Saclay, France\n\nAbstract\n\nSchizophrenia is a complex psychiatric disorder that has eluded a characterization in terms of local abnormalities of brain activity, and is hypothesized to affect the collective, \u201cemergent\u201d working of the brain. We propose a novel data-driven approach to capture emergent features using functional brain networks [4] extracted from fMRI data, and demonstrate its advantage over traditional region-of-interest (ROI) and local, task-specific linear activation analyses. Our results suggest that schizophrenia is indeed associated with disruption of global brain properties related to its functioning as a network, which cannot be explained by alteration of local activation patterns. Moreover, further exploitation of interactions by sparse Markov Random Field classifiers shows a clear gain over linear methods, such as Gaussian Naive Bayes and SVM, reaching 86% accuracy (versus a 50% random-guess baseline), which is remarkable given that it is based on a single fMRI experiment using a simple auditory task.\n\n1 Introduction\n\nIt has long been recognized that extracting an informative set of application-specific features from the raw data is essential in practical applications of machine learning, and often contributes even more to the success of learning than the choice of a particular classifier. In biological applications, such as brain image analysis, proper feature extraction is particularly important, since the primary objective of such studies is to gain scientific insight rather than to learn a \u201cblack-box\u201d predictor; thus, the focus shifts towards the discovery of predictive patterns, or \u201cbiomarkers\u201d, forming a basis for interpretable predictive models. 
Conversely, biological knowledge can drive the definition of features and lead to more powerful classification.\nThe objective of this work is to identify biomarkers predictive of schizophrenia based on fMRI data collected for both schizophrenic and non-schizophrenic subjects performing a simple auditory task in the scanner [14]. Unlike some other brain disorders (e.g., stroke or Parkinson\u2019s disease), schizophrenia appears to be \u201cdelocalized\u201d, i.e., difficult to attribute to a dysfunction of some particular brain areas\u00b9. The failure to identify specific areas, as well as the controversy over which localized mechanisms are responsible for the symptoms associated with schizophrenia, have led us, among others [7, 1, 10], to hypothesize that this disease may be better understood as a disruption of the emergent, collective properties of normal brain states, which can be better captured by functional networks [4], based on inter-voxel correlation strength, as opposed (or limited) to activation failures localized to specific, task-dependent areas.\nTo test this hypothesis, we measured diverse topological features of the functional networks and compared them between the groups of normal subjects and schizophrenic patients. Specifically, we asked the following questions: (1) What specific effects does schizophrenia have on the functional connectivity of brain networks? (2) Does schizophrenia affect functional connectivity in ways that are congruent with the effect it has on area-specific, task-dependent activations? (3) Is it possible to use functional connectivity to improve the accuracy of classifying schizophrenic patients?\nIn answer to these questions, we will show that degree maps, which assign to each voxel the number of its neighbors in a network, identify spatially clustered groups of voxels with statistically significant group (i.e., normal vs. 
schizophrenic) differences; moreover, these highly significant voxel subsets are quite stable over different data subsets. In contrast, standard linear activation maps commonly used in fMRI analysis show much weaker group differences and lower stability. Moreover, degree maps yield very informative features, allowing for up to 86% classification accuracy (with a 50% baseline), as opposed to standard local voxel activations. The best accuracy is achieved by further exploiting non-local interactions with probabilistic graphical models such as Markov Random Fields, as opposed to linear classifiers.\nFinally, we demonstrate that traditional approaches based on a direct comparison of the correlation at the level of relevant regions of interest (ROIs), or using a functional parcellation technique [17], do not reveal any statistically significant differences between the groups. Indeed, a more data-driven approach that exploits properties of voxel-level networks appears to be necessary in order to achieve high discriminative power.\n2 Background and Related Work\n\nIn Functional Magnetic Resonance Imaging (fMRI), an MR scanner non-invasively records a subject\u2019s blood-oxygenation-level dependent (BOLD) signal, known to be correlated with neural activity, as the subject performs a task of interest (e.g., viewing a picture or reading a sentence). Such scans produce a sequence of 3D images, where each image typically has on the order of 10,000-100,000 subvolumes, or voxels, and the sequence typically contains a few hundred time points, or TRs (repetition times). Standard fMRI analysis approaches, such as the General Linear Model (GLM) [9], examine mass-univariate relationships between each voxel and the stimulus in order to build so-called statistical parametric maps that associate each voxel with a statistic reflecting its relationship to the stimulus. 
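To illustrate the mass-univariate idea, the sketch below fits one ordinary-least-squares linear model per voxel and returns a t-statistic map for a chosen contrast. It is a simplified illustration, not the SPM implementation. The function name glm_t_map, the block-design regressor (assumed to be already convolved with a hemodynamic response function) and the i.i.d. noise assumption are all hypothetical choices. Real pipelines additionally model drifts and temporal autocorrelation.

```python
import numpy as np


def glm_t_map(Y, X, contrast):
    '''Mass-univariate GLM: Y is (T x V) BOLD data, X is the (T x k) design matrix.

    Returns one t-statistic per voxel for the contrast vector (length k),
    assuming i.i.d. Gaussian noise (a simplification of real fMRI models).
    '''
    T = X.shape[0]
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)   # (k x V) coefficients
    resid = Y - X @ beta
    dof = T - np.linalg.matrix_rank(X)
    sigma2 = (resid ** 2).sum(axis=0) / dof              # per-voxel noise variance
    c = np.asarray(contrast, dtype=float)
    var_c = c @ np.linalg.pinv(X.T @ X) @ c              # contrast variance factor
    return (c @ beta) / np.sqrt(sigma2 * var_c)


# Usage on synthetic data: voxel 0 follows the task regressor, voxel 1 is pure noise.
rng = np.random.default_rng(0)
T = 200
task = (np.arange(T) % 40 < 20).astype(float)            # hypothetical block design
X = np.column_stack([np.ones(T), task])                  # intercept + task regressor
Y = np.column_stack([2.0 * task + 0.5 * rng.standard_normal(T),
                     rng.standard_normal(T)])
t_map = glm_t_map(Y, X, contrast=[0.0, 1.0])
```

The same design matrix is reused for every column of Y, that is, for every voxel. This is what makes the analysis mass-univariate, and it is also why the analysis is blind to interactions between voxels.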
Commonly used activation maps depict the \u201cactivity\u201d level of each voxel, determined by the linear correlation of its time course with the stimulus (see Supplemental Material for details).\nClearly, such univariate analysis can miss important information contained in the interactions among voxels. Indeed, as shown in [8], highly predictive models of mental states can be built from voxels with sub-maximal activation. Recently, applying multivariate predictive methods to fMRI has become an active area of research, focused on predicting \u201cmental states\u201d from fMRI data [11, 13, 2]. However, our focus herein is not just predictive modeling, but rather the discovery of interpretable features with high discriminative power. Also, our problem is much higher-dimensional, since each sample (labeled schizophrenic or non-schizophrenic) corresponds to a sequence of 3D images over about 400 time points, rather than to a single 3D image as in [11, 13, 2].\nWhile the importance of modeling brain connectivity and interactions has become widely recognized in the current fMRI-analysis literature [6, 19, 16], practical applications of the proposed approaches, such as dynamic causal modeling [6], dynamic Bayes nets [19], or structural equations [16], were usually limited to analyzing interactions among just a few (e.g., fewer than 15) known brain regions believed to be relevant to the task or phenomenon of interest. In this paper, we demonstrate that such model-based region-of-interest (ROI) analysis may fail to reveal informative interactions which, nevertheless, become visible at the finer-grained voxel level when using a purely data-driven, network-based approach [4]. Moreover, while recent publications have already indicated that functional networks in the schizophrenic brain display disrupted topological properties, we demonstrate, for the first time, that (1) specific topological properties (e.g., voxel degrees) of functional networks can help to construct highly predictive schizophrenia classifiers that generalize well, and (2) functional network differences cannot be attributed to alteration of local activation patterns, a hypothesis that was not ruled out by the results of [1, 10] and similar work.\n\n\u00b9This is often referred to as the disconnection hypothesis [5, 15], and can be traced back to early research on schizophrenia: in 1906, Wernicke [18] was the first to postulate that anatomical disruption of association fiber tracts is at the root of psychosis; in fact, the term schizophrenia was introduced by Bleuler [3] in 1911, and was meant to describe the separation (splitting) of different mental functions.\n\n# | ROI name | (x, y, z) position | Anatomical position\n1 | \u2019Temporal mid L\u2019 | -44, -48, 4 | Left temporal\n2 | \u2019Temporal mid et sup L\u2019 | -56, -36, 0 | Middle and superior left temporal\n3 | \u2019Frontal inf L\u2019 | -40, 28, 0 | Left inferior frontal\n4 | \u2019cuneus L\u2019 | -12, -72, 24 | Left cuneus\n5 | \u2019Temporal sup et mid L\u2019 | -52, -16, -8 | Middle and superior left temporal\n6 | \u2019Angular L\u2019 | -44, -48, 32 | Left angular gyrus\n7 | \u2019Temporal sup R\u2019 | 40, -64, 24 | Right superior temporal\n8 | \u2019Angular R\u2019 | 40, -64, 24 | Right angular gyrus\n9 | \u2019Cingulum post R\u2019 | 4, -32, 24 | Right posterior cingulum\n10 | \u2019ACC\u2019 | 0, 20, 30 | Anterior cingulate cortex\n\nFigure 1: Regions of Interest and their location on standard brain.\n3 Experimental Setup\nThe present study is a reanalysis of image datasets previously acquired according to the methodology described in [14]. Two groups of 12 subjects each underwent the same experimental paradigm involving language: schizophrenic patients and age-matched normal controls (the same experiment was performed with a third group of alcoholic patients, yielding similar results; see Suppl. Materials for details). 
The studies were performed with the approval of the local ethics committee, and all subjects gave written informed consent. The task is based on auditory stimuli: subjects listen to emotionally neutral sentences in either their native language (French) or a foreign language. The average length (3.5 s) and pitch of both kinds of sentences are normalized. To capture the subjects\u2019 attention, each trial begins with a short (200 ms) auditory tone, followed by the actual sentence. The subject\u2019s attention is verified through a simple validation task. After each sentence, a 750 ms pause is followed by a 500 ms two-syllable auditory cue, which may or may not belong to the preceding sentence. When the sentence was in the subject\u2019s own language, the subject answers with push-buttons: yes (the cue is part of the previous sentence) or no.\nFor each subject, two fMRI runs are acquired, each consisting of 420 scans (the first 4 are discarded to eliminate T1 effects). A full fMRI run contains 96 trials: 32 sentences in French (native), 32 sentences in foreign languages, and 32 silence intervals as controls.\nData were spatially realigned, warped into the MNI template, and smoothed (FWHM of 5 mm) using SPM5 (www.fil.ucl.ac.uk); standard SPM5 motion correction was also performed. Several subjects were excluded from the analysis due to excessive head motion in the scanner. This left 11 schizophrenic and 11 healthy subjects, i.e., a total of 44 samples (two samples per subject, corresponding to the two runs of the experiment). Each sample is associated with roughly 53,000 voxels (after removing out-of-brain voxels from the original 53 \u00d7 63 \u00d7 46 image) over 420 time points (TRs), i.e., more than 22,000,000 variables (voxel-time-point values). 
Thus, some kind of dimensionality reduction and/or feature extraction is necessary prior to learning a predictive model.\n4 Methods\nWe explored two different data analysis approaches aimed at discovering discriminative patterns. The first consists of (1) model-driven approaches, based either on prior knowledge about the regions of interest (ROIs) believed to be relevant to schizophrenia or on model-based functional clustering. The second consists of (2) data-driven approaches based on various features extracted from the fMRI data, such as standard activation maps and a set of topological features derived from functional networks.\n4.1 Model-Driven Approach using ROI\nFirst, we tested whether the interactions between several known regions of interest (ROIs) contain enough discriminative information about schizophrenic versus normal subjects. Ten regions of interest were defined using previous literature on schizophrenia and language studies, including inferior, middle and superior left temporal cortex, left inferior frontal cortex, left cuneus, left angular gyrus, right superior temporal cortex, right angular gyrus, right posterior cingulum, and anterior cingulate cortex (Figure 1). Each region was defined as a sphere of 12 mm diameter centered on the (x, y, z) coordinates of the corresponding ROI. Because predefined regions of interest may rely on too much a priori knowledge and miss important areas, we also ran a more exploratory analysis. A second set of 600 ROIs was defined automatically using a parcellation algorithm [17] that estimates, for each subject, a collection of regions based on task-based functional signal similarity and position in MNI space.\nTime series were extracted as the spatial mean over each ROI, yielding 10 time series per subject for the predefined ROIs and 600 for the parcellation technique. The connectivity measures were of two kinds. 
First, the correlation coefficient between ROI time series was computed over time, blind to the experimental paradigm. Additionally, we computed a psycho-physiological interaction (PPI) by contrasting correlation coefficients weighted by experimental conditions (i.e., correlation weighted by the \u201cLanguage French\u201d condition versus correlation weighted by the \u201cControl\u201d condition, after convolution with a standard hemodynamic response function). These connectivity measures were then tested for significance between groups using standard nonparametric tests (Wilcoxon signed-rank test), with p-values corrected for multiple comparisons.\n4.2 Data-driven Approach: Feature Extraction\nTopological Features and Degree Maps. To further investigate possible disruptions of global brain functioning associated with schizophrenia, we explored lower-level (as compared to ROI-level) functional brain networks [4] constructed at the voxel level. (1) Pairwise Pearson correlation coefficients are computed between all pairs of time series (v_i(t), v_j(t)), where v_i(t) is the BOLD signal of the i-th voxel. (2) An edge between a pair of voxels (i, j) is included in the network if the correlation between v_i and v_j exceeds a specified threshold (herein, we used the same Pearson correlation threshold of 0.7 for all voxel pairs).\nFor each subject and each run, a separate functional network was constructed. Next, we measured a number of its topological features, including the degree distribution, the mean degree, and the size of the largest connected subgraph (giant component), among others (see the supplemental material for the full list). 
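The two construction steps above, together with the degree maps defined in the next paragraph, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code:

- The function names are hypothetical.
- The time series are assumed to be a (V, T) array.
- Long-distance links are taken to mean a Euclidean distance of at least 5 voxel units between grid coordinates.
- The hemispheres are assumed to be split by a plane x = midline_x.

The sketch also materializes the full V x V correlation matrix. For roughly 53,000 voxels that matrix would hold about 2.8 billion entries, so a practical version would compute it in blocks.

```python
from collections import deque

import numpy as np


def functional_network(ts, threshold=0.7):
    '''ts: (V, T) voxel time series. Edge wherever Pearson correlation > threshold.'''
    corr = np.corrcoef(ts)                    # (V, V) pairwise Pearson correlations
    adj = corr > threshold
    np.fill_diagonal(adj, False)              # no self-loops
    return adj


def giant_component_size(adj):
    '''Size of the largest connected component, found by breadth-first search.'''
    seen = np.zeros(len(adj), dtype=bool)
    best = 0
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in np.flatnonzero(adj[node] & ~seen):
                seen[nb] = True
                queue.append(nb)
        best = max(best, size)
    return best


def degree_maps(adj, coords, min_dist=5.0, midline_x=0.0):
    '''Full, long-distance and inter-hemispheric degree of every voxel.

    coords: (V, 3) voxel-grid coordinates. The distance rule (Euclidean,
    >= min_dist) and the hemisphere split (x < midline_x is left) are
    illustrative assumptions.'''
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    left = coords[:, 0] < midline_x
    cross = left[:, None] != left[None, :]    # pairs in opposite hemispheres
    full = adj.sum(axis=1)
    long_range = (adj & (dist >= min_dist)).sum(axis=1)
    inter_hemi = (adj & cross).sum(axis=1)
    return full, long_range, inter_hemi


def normalize_map(m):
    '''Divide a map by its maximal value for the given sample.'''
    m = np.asarray(m, dtype=float)
    return m / m.max() if m.max() > 0 else m


# Usage on synthetic data: voxels 0-2 share one signal, 3-4 another, 5 is noise.
rng = np.random.default_rng(1)
T = 500
s1, s2 = rng.standard_normal(T), rng.standard_normal(T)
ts = np.vstack([s1 + 0.1 * rng.standard_normal(T) for _ in range(3)]
               + [s2 + 0.1 * rng.standard_normal(T) for _ in range(2)]
               + [rng.standard_normal(T)])
adj = functional_network(ts)
coords = np.array([[-8, 0, 0], [-1, 0, 0], [8, 0, 0],
                   [-3, 0, 0], [-2, 0, 0], [5, 5, 5]], dtype=float)
full, long_range, inter_hemi = degree_maps(adj, coords)
```

In this sketch, the mean degree used as a global feature is simply full.mean(). The per-subject fraction of inter-hemispheric links discussed in the Results is inter_hemi.sum() / full.sum(), since each link is counted once at each of its two endpoints.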
Besides global topological features, we also computed a series of degree maps based on the individual voxel degrees in the functional network. (1) In full degree maps, the value assigned to each voxel is the total number of links at the corresponding network node. (2) In long-distance degree maps, the value is the number of links making non-local connections (5 voxels apart or more). (3) In inter-hemispheric degree maps, only links reaching across the brain hemispheres are considered when computing each voxel\u2019s degree.\nActivation maps. To find out whether local task-dependent linear activations alone could explain the differences between schizophrenic and normal brains, we used as a baseline a set of features based on standard voxel activation maps. For each subject and each run, activation maps, as well as their differences, or activation contrast maps, were obtained using several regressors based on the language task, as described in the supplemental material (for simplicity, we will refer to all such maps as activation maps). The activation values of each voxel were subsequently used as features in the classification task. Analogously to the mean degree, we also computed a global feature, mean activation (mean-t-val), by taking the mean absolute value of the voxels\u2019 t-statistics. Both activation and degree maps for each sample were also normalized, i.e., divided by their maximal value for the given sample.\n4.3 Classification Approaches\nFirst, off-the-shelf methods such as Gaussian Naive Bayes (GNB) and Support Vector Machines (SVM) were used to compare the discriminative power of the different feature sets described above. Moreover, to further investigate our hypothesis that interactions among voxels contain highly discriminative information, we compared these linear classifiers against probabilistic graphical models that explicitly model such interactions. 
Specifically, we learn a classifier based on a sparse Gaussian Markov Random Field (MRF) model [12]. This leads to a convex problem with a unique optimal solution that can be solved efficiently; herein, we used the COVSEL procedure [12]. The weight on the l1-regularization penalty serves as a tuning parameter of the classifier, controlling the sparsity of the model, as described below.\n\nSparse Gaussian MRF classifier. Let $X = \\{X_1, \\ldots, X_p\\}$ be a set of p random variables (e.g., voxels), and let $G = (V, E)$ be an undirected graphical model (Markov network, or MRF) representing the conditional independence structure of the joint distribution $P(X)$. The set of vertices $V = \\{1, \\ldots, p\\}$ is in one-to-one correspondence with the set X. There is no edge between $X_i$ and $X_j$ if and only if the two variables are conditionally independent given all remaining variables. Let $x = (x_1, \\ldots, x_p)$ denote a random assignment to X. We will assume a multivariate Gaussian probability density\n\n$$p(x) = (2\\pi)^{-p/2} \\det(C)^{1/2} e^{-\\frac{1}{2} x^{T} C x},$$\n\nwhere $C = \\Sigma^{-1}$ is the inverse covariance matrix, and the variables are normalized to have zero mean. Let $x_1, \\ldots, x_n$ be a set of n i.i.d. samples from this distribution, and let\n\n$$S = \\frac{1}{n} \\sum_{i=1}^{n} x_i x_i^{T}$$\n\ndenote the empirical covariance matrix. Missing edges in the above graphical model correspond to zero entries in the inverse covariance matrix C. Thus the problem of learning the structure of the above probabilistic graphical model is equivalent to the problem of learning the zero pattern of the inverse covariance matrix\u00b2. A popular approach is to use l1-norm regularization, which is known to promote sparse solutions while still allowing (unlike non-convex lq-norm regularization with 0 < q < 1) for efficient optimization. From the Bayesian point of view, this is equivalent to assuming that the entries of the inverse covariance matrix $C = \\Sigma^{-1}$ are independent random variables $C_{ij}$ following the Laplace distributions\n\n$$p(C_{ij}) = \\frac{\\lambda_{ij}}{2} e^{-\\lambda_{ij} |C_{ij} - \\alpha_{ij}|}$$\n\nwith zero location parameters (means) $\\alpha_{ij}$ and equal scale parameters $\\lambda_{ij} = \\lambda$. Then\n\n$$p(C) = \\prod_{i=1}^{p} \\prod_{j=1}^{p} p(C_{ij}) = (\\lambda/2)^{p^2} e^{-\\lambda \\|C\\|_1},$$\n\nwhere $\\|C\\|_1 = \\sum_{ij} |C_{ij}|$ is the (vector) l1-norm of C. Assuming a fixed parameter $\\lambda$, our objective is to find $\\arg\\max_{C \\succ 0} p(C \\mid X)$, where X is the $n \\times p$ data matrix. Equivalently, since $p(C \\mid X) = P(X, C)/p(X)$ and $p(X)$ does not depend on C, the objective is to find $\\arg\\max_{C \\succ 0} P(X, C)$ over positive definite matrices C. This yields the following optimization problem, considered, for example, in [12]:\n\n$$\\max_{C \\succ 0} \\; \\ln\\det(C) - \\operatorname{tr}(SC) - \\lambda \\|C\\|_1,$$\n\nwhere det(A) and tr(A) denote the determinant and the trace (the sum of the diagonal elements) of a matrix A, respectively. For the classification task, we estimate on the training data the Gaussian conditional density $p(x \\mid y)$ (i.e., the (inverse) covariance matrix parameter) for each class $y \\in \\{0, 1\\}$ (schizophrenic vs. non-schizophrenic). We then choose the most likely class label $\\arg\\max_{c} p(x \\mid c) P(c)$ for each unlabeled test sample x.\nVariable Selection: We used variable selection as a preprocessing step before applying a particular classifier, in order to (1) reduce the computational complexity of classification (especially for sparse MRF, which, unlike GNB and SVM, could not be directly applied to over 50,000 variables), (2) reduce noise, and (3) identify relatively small predictive subsets of voxels. 
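Point (1) is easy to see in code: every solver iteration of the sparse Gaussian MRF classifier described above inverts a p x p matrix, so the cost grows roughly cubically with the number of voxels. The sketch below illustrates this under stated assumptions and is not the authors' implementation:

- Instead of COVSEL [12], it maximizes the same objective, ln det(C) - tr(SC) - lambda times the sum of |C_ij|, with a plain proximal-gradient method with backtracking, in the spirit of G-ISTA.
- It penalizes all entries of C, including the diagonal, as in the l1-norm above.
- It uses class-specific means and empirical class priors P(c).
- The names sparse_precision and SparseGaussianMRFClassifier are hypothetical.

```python
import numpy as np


def soft_threshold(A, tau):
    # Proximal operator of tau * sum_ij |A_ij|: elementwise shrinkage towards 0.
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def smooth_part(C, S):
    # g(C) = -ln det(C) + tr(SC); +inf outside the positive definite cone.
    if np.linalg.eigvalsh(C).min() <= 0:
        return np.inf
    return -np.linalg.slogdet(C)[1] + np.trace(S @ C)


def sparse_precision(S, lam, max_iter=500, tol=1e-8):
    '''Approximately maximize ln det(C) - tr(SC) - lam * sum_ij |C_ij| over C > 0.'''
    C = np.diag(1.0 / (np.diag(S) + lam))        # positive definite start
    step = 1.0
    for _ in range(max_iter):
        grad = S - np.linalg.inv(C)               # gradient of the smooth part g
        g_old = smooth_part(C, S)
        while True:                               # backtracking line search
            C_new = soft_threshold(C - step * grad, step * lam)
            C_new = (C_new + C_new.T) / 2         # remove round-off asymmetry
            diff = C_new - C
            bound = g_old + np.sum(grad * diff) + np.sum(diff ** 2) / (2 * step)
            if smooth_part(C_new, S) <= bound:    # PD and sufficient decrease
                break
            step /= 2
        C = C_new
        if np.linalg.norm(diff) <= tol * max(1.0, np.linalg.norm(C)):
            break
    return C


class SparseGaussianMRFClassifier:
    '''One sparse Gaussian MRF per class; predicts argmax_c p(x|c) P(c).'''

    def __init__(self, lam=0.1):
        self.lam = lam

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)                  # class mean (an assumption)
            S = (Xc - mu).T @ (Xc - mu) / len(Xc) # empirical covariance S
            C = sparse_precision(S, self.lam)
            logdet = np.linalg.slogdet(C)[1]
            self.params_.append((mu, C, logdet, np.log(len(Xc) / len(X))))
        return self

    def predict(self, X):
        scores = []
        for mu, C, logdet, log_prior in self.params_:
            D = X - mu
            quad = np.einsum('ij,jk,ik->i', D, C, D)   # (x - mu)^T C (x - mu)
            # ln p(x|c) + ln P(c), dropping the class-independent -p/2 ln(2 pi)
            scores.append(0.5 * logdet - 0.5 * quad + log_prior)
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]


# Usage on synthetic data (two well-separated hypothetical classes):
rng = np.random.default_rng(0)
X_train = np.vstack([rng.standard_normal((50, 5)), rng.standard_normal((50, 5)) + 10])
y_train = np.array([0] * 50 + [1] * 50)
clf = SparseGaussianMRFClassifier(lam=0.1).fit(X_train, y_train)
X_test = np.vstack([rng.standard_normal((10, 5)), rng.standard_normal((10, 5)) + 10])
y_pred = clf.predict(X_test)
```

Larger values of lam zero out more entries of each precision matrix. Once lam exceeds the largest off-diagonal magnitude of S, the estimate becomes exactly diagonal, with entries 1/(S_ii + lam), which makes a convenient sanity check.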
We applied a simple filter-based approach, selecting a subset of top-ranked voxels. The ranking criterion used p-values resulting from the paired t-test, with the null hypothesis that the voxel values corresponding to schizophrenic and non-schizophrenic subjects came from distributions with equal means. The variables were ranked in ascending order of their p-values (lower p = higher confidence in between-group differences), and classification results for the top k voxels are presented for a range of k values.\nEvaluation via Cross-validation. We used leave-one-subject-out rather than leave-one-sample-out cross-validation, since the two runs (two samples) of each subject are clearly not i.i.d. and must be handled together to avoid a bias towards overly optimistic results.\n5 Results\nModel-driven ROI analysis. First, we observed that correlations (blind to the experimental paradigm) between regions and within subjects were very strong and significant (at the 0.05 level, corrected for the number of comparisons) when tested against 0 for all subjects (mean correlation > 0.8 for every group). However, these inter-region correlations do not seem to differ significantly between the groups. The parcellation technique led to some smaller p-values, but also to a stricter correction for multiple comparisons, and no correlation came close to the corrected threshold. Concerning the psycho-physiological interaction, results were closer to significance, but did not survive correction for multiple comparisons. In conclusion, we could not detect significant differences between the schizophrenic patient data and normal subjects in either the BOLD signal correlation or the interaction between the signal and the main experimental contrast (native language versus silence).\n\n\u00b2Note that the inverse of the empirical covariance matrix, even if it exists, does not typically contain exact zeros. 
Therefore, an explicit sparsity constraint is usually added to the estimation process.\n\nFigure 2: (a) FDR-corrected 2-sample t-test results for (normalized) degree maps, where the null hypothesis at each voxel assumes no difference between the schizophrenic and normal groups. Red/yellow denotes the areas of low p-values passing FDR correction at the \u03b1 = 0.05 level (i.e., an expected false discovery rate of 5%). Note that the mean (normalized) degree at those voxels was always (significantly) higher for normals than for schizophrenics. (b) Direct comparison of voxel p-values and FDR threshold: p-values are sorted in ascending order, and the FDR test selects voxels with p < \u03b1 \u00b7 k/N (\u03b1 - target false discovery rate, k - the index of a p-value in the sorted sequence, N - the total number of voxels). Degree maps yield a large number (1033, 924 and 508 voxels in full, long-distance and inter-hemispheric degree maps, respectively) of highly significant (very low) p-values, staying far below the FDR cut-off line. In contrast, only a few voxels survive FDR in the case of activation maps: 7 and 2 voxels in activation maps 1 (contrast \u201cFrenchNative - Silence\u201d) and 6 (\u201cFrenchNative\u201d), respectively (the rest of the activation maps do not survive the FDR correction at all).\n\nData-driven analysis: topological vs. activation features. Empirical results are consistent with our hypothesis that schizophrenia disrupts the normal structure of functional networks in a way that is not derived from alterations in the activation. Moreover, they demonstrate that topological properties are highly predictive, consistently outperforming predictions based on activations.\n1. Voxel-wise statistical analysis. Degree maps show much stronger statistical differences between the schizophrenic and non-schizophrenic groups than the activation maps. 
Figure 2 shows the 2-sample t-test results for the full degree map and the activation maps, after False Discovery Rate (FDR) correction for multiple comparisons (standard in fMRI analysis), at the \u03b1 = 0.05 level (i.e., an expected false discovery rate of 5%). The degree map (Figure 2a) shows statistically significant differences bilaterally in auditory areas (specifically, the normal group has higher degrees than the schizophrenic group). The activation maps, by contrast, show almost no significant differences at all: practically no voxels survived the FDR correction (Figure 2b). This suggests (a) that the differences in collective behavior cannot be explained by differences in the linear task-related response, and (b) that the topology of voxel-interaction networks is more informative than task-related activations. It points to an abnormal degree distribution for schizophrenic patients, who appear to lack hubs in auditory cortex, i.e., have significantly lower (normalized) voxel degrees in that area than the normal group (possibly due to a more even spread of degrees in schizophrenic vs. normal networks). Moreover, degree maps demonstrate much higher stability than activation maps with respect to selecting a subset of top-ranked voxels over different subsets of data. Figure 3a shows that degree maps have up to almost 70% of top-ranked voxels in common over different training sets under leave-one-subject-out cross-validation, while activation maps have below 50% of voxels in common between different selected subsets. This stability of degree features, compared with activation features, is particularly important for the interpretability of predictive models.\n2. Inter-hemispheric degree distributions. A closer look at the degree distributions reveals that a large percentage of the differential connectivity appears to be due to long-distance, inter-hemispheric links. 
Figure 3b compares (normalized) histograms, for the schizophrenic (red) versus normal (blue) groups, of the fraction of inter-hemispheric connections over the total number of connections, computed for each subject within the group. The schizophrenic group shows a significant bias towards low relative inter-hemispheric connectivity. A t-test analysis of the distributions indicates that the differences are statistically significant (p = 2.5 \u00d7 10\u207b\u00b2). Moreover, it is evident that a major contributor to the degree difference discussed above is the presence of a large number of inter-hemispheric connections in the normal group, which are lacking in the schizophrenic group. Furthermore, we selected bilateral regions of interest (ROIs) corresponding to left and right Brodmann Area 22 (roughly, the clusters in Figure 2a), such that the linear activation for these ROIs was not significantly different between the groups, even without correction for multiple comparisons. For each subject, the link between the left and right ROIs was computed as the fraction of ROI-to-ROI connections over all connections; Figure 3c shows the normalized histograms. Clearly, the normal group displays a high density of inter-hemispheric connections, which are significantly disrupted in the schizophrenic group (p = 3.7 \u00d7 10\u207b\u2077). This provides a strong indication that the group differences in connectivity cannot be explained by differences in local activation.\n\n[Figure 2b plot: P values and FDR correction; axes: k/N and p value, both on log scales; curves: 0.05 \u00b7 k/N, activation 1 FrenchNative-Silence, activation 6 FrenchNative, degree (full), degree (long-distance), degree (inter-hemispheric)]\n\nFigure 3: (a) Stability of feature subset selection over CV folds, i.e., the percentage of voxels in common among the subsets of k top variables selected at all CV folds. (b) Disruption of global inter-hemispheric connectivity. For each subject, we compute the fraction of inter-hemispheric connections over the total number of connections, and plot a normalized histogram over all subjects in a particular group (normal - blue, schizophrenic - red). (c) Disruption of task-dependent inter-hemispheric connectivity between specific ROIs (Brodmann Area 22 selected bilaterally). The ROIs were defined by a 9 mm radius ball centered at [x=-42, y=-24, z=3] and [x=42, y=-24, z=3].\n\n(a)\nFeature | SVM | GNB | MRF(0.01)\ndegree (D) | 27.5% | 27.5% | 27.5%\nclustering coeff. (C) | 42.5% | 30.0% | 45.0%\ngeodesic dist. (G) | 45.0% | 67.5% | 45.0%\nmean activation (A) | 45.0% | 40.0% | 72.5%\nD + A | 27.5% | 27.5% | 32.5%\nC + A | 45.0% | 27.5% | 55.0%\nG + A | 45.0% | 45.0% | 72.5%\nG + D + C | 27.5% | 37.5% | 27.5%\nG + D + C + A | 27.5% | 30.0% | 32.5%\n\n(b)\nFeature | False Pos | False Neg | Error\ndegree (full) | 27% | 5% | 16%\ndegree (long-distance) | 32% | 9% | 21%\ndegree (inter-hemis) | 46% | 18% | 32%\nactivation 1 (and 3) | 29% | 82% | 54%\nactivation 2 (and 4) | 55% | 45% | 50%\nactivation 5 | 18% | 68% | 43%\nactivation 6 | 27% | 46% | 36%\nactivation 7 | 18% | 46% | 32%\nactivation 8 | 23% | 37% | 30%\n\nTable 1: Classification errors using (a) global features and (b) activation and degree maps (using SVM on the complete set of voxels, i.e., without voxel subset selection).\n\n3. Global features. For each global feature (full list in Suppl. Mat.), we computed its mean for each group and the p-value of a between-group t-test, as well as the classification errors of our classifiers. 
Details are presented in the supplemental material; here we outline the main observations. Mean activation (we used map 8, the best performer for SVM on the full set of voxels; see Table 1b) had a relatively low p-value of 5.5 \u00d7 10\u22124, compared with the less significant p = 5.3 \u00d7 10\u22122 for mean degree. Nevertheless, mean degree, alone or in combination with other features, had the best predictive power among global features, reaching 27.5% error in schizophrenic vs. normal classification (Table 1a), while mean activation yielded 40% or higher error with all classifiers.\n4. Classification results using degree vs. activation maps. While mean degree indicates the presence of discriminative information in voxel degrees, its generalization ability, though the best among global features and their combinations, is relatively poor. Voxel-level degree maps, however, turned out to be highly predictive features, often outperforming activation features by a wide margin. Table 1b compares predictions made by SVM on complete maps (without voxel subset selection): both full and long-distance degree maps greatly outperform all activation maps, achieving 16% and 21% error, respectively, vs. 30% or more for the activation maps, including the best-performing map 8. Next, in Figure 4, we compare the predictive power of different maps using all three classifiers, Support Vector Machines (SVM), Gaussian Naive Bayes (GNB) and sparse Gaussian Markov Random Fields (MRF), on subsets of the k top-ranked voxels (ranked by t-test) for a variety of k values. We used the best-performing activation map 8 from Table 1b, as well as maps 1 and 6 (which survived FDR correction); map 6 also outperformed the other activation maps in the low-voxel regime. To avoid clutter, we plot only the two best-performing of the three degree maps (full and long-distance). For sparse MRF, we experimented with a variety of \u03bb values, ranging from 0.0001 to 10, and present the best results. 
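The k-top-voxel protocol behind Figure 4 can be illustrated with a small pure-Python sketch. It is not the authors' implementation. Voxels are ranked by the absolute two-sample t statistic, computed on the training subjects only so that the held-out subjects do not influence the selection. The k best voxels are kept, and a Gaussian Naive Bayes classifier is fit on them. The function names, the Welch form of the statistic, the variance floor and the equal class priors are all assumptions. The sparse MRF classifier is not reproduced here. Presumably it replaces the diagonal per-class covariance of GNB with a sparse inverse covariance estimated per class, for example by the L1-penalized maximum-likelihood method of [12] with lambda as the penalty weight.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = subjects, columns = voxels


def abs_t_scores(xa: Matrix, xb: Matrix) -> List[float]:
    # Absolute Welch t statistic per voxel between group a and group b.
    scores = []
    for v in range(len(xa[0])):
        a = [row[v] for row in xa]
        b = [row[v] for row in xb]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
        vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
        se = math.sqrt(va / len(a) + vb / len(b))
        scores.append(abs(ma - mb) / se if se > 0 else 0.0)
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    # Indices of the k highest-scoring voxels, best first.
    return sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)[:k]


def fit_gnb(x: Matrix, y: Sequence[int],
            var_floor: float = 1e-6) -> Dict[int, Tuple[List[float], List[float]]]:
    # Per-class mean and (floored) variance of each feature: a diagonal Gaussian model.
    model = {}
    for c in sorted(set(y)):
        rows = [r for r, label in zip(x, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        vars_ = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
                 for col, m in zip(zip(*rows), means)]
        model[c] = (means, vars_)
    return model


def predict_gnb(model: Dict[int, Tuple[List[float], List[float]]], row: Sequence[float]) -> int:
    # Class with the highest Gaussian log-likelihood (equal class priors assumed).
    def loglik(means: List[float], vars_: List[float]) -> float:
        return sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                   for v, m, s in zip(row, means, vars_))
    return max(model, key=lambda c: loglik(*model[c]))


def classify_with_top_k(x_train: Matrix, y_train: Sequence[int],
                        x_test: Matrix, k: int) -> List[int]:
    # Rank voxels on the training fold only (label 1 vs label 0), keep k, fit and predict.
    xa = [r for r, label in zip(x_train, y_train) if label == 1]
    xb = [r for r, label in zip(x_train, y_train) if label == 0]
    keep = top_k(abs_t_scores(xa, xb), k)
    model = fit_gnb([[r[v] for v in keep] for r in x_train], y_train)
    return [predict_gnb(model, [r[v] for v in keep]) for r in x_test]
```

In a cross-validation loop, classify_with_top_k would be called once per fold with that fold's training and test subjects. Per-fold selection is also what makes the fold-to-fold stability of the selected voxel subset (Figure 3a) a meaningful quantity.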
Figure 4: Classification results comparing (a) GNB, (b) SVM and (c) sparse MRF on degree versus activation contrast maps; (d) all three classifiers compared on long-distance degree maps (best-performing for MRF).\n\nWe can see that: (a) degree maps frequently outperform activation maps for all classifiers we used, and the differences are particularly noticeable when the number of selected voxels is relatively low. The largest differences are observed for SVM in the low-voxel (approx. < 500 voxels) and full-map regimes, as well as for MRF classifiers. Degree maps achieve an error as low as 14% with only the 100 most significant voxels, while even the best activation map (map 6) requires more than 200-300 voxels to get just below 30% error; the other activation maps perform much worse, often with 30-40% error or more, and sometimes at chance level. (b) Full and long-distance degree maps perform similarly, with the long-distance map achieving the best result (14% error) using MRFs. (c) Among the activation maps, map 8 (\u201cSilence\u201d) outperforms the others on the full set of voxels using SVM, but performs poorly in the low-voxel regime (always above 30-35% error); in this regime, map 6 (\u201cFrenchNative\u201d) performs best among the activation maps.\u00b3 
(d) MRF classifiers clearly outperform SVM and GNB (see Figure 4d), possibly due to their ability to capture inter-voxel relationships that are highly discriminative between the two classes.\n\n6 Summary\n\nThe contributions of this paper are two-fold. From a machine-learning and fMRI-analysis perspective, we (a) introduced a novel feature-construction approach based on topological properties of functional networks, which is applicable to multivariate time-series classification problems in general and can outperform standard linear activation approaches in fMRI analysis; (b) demonstrated advantages of this data-driven approach over prior-knowledge-based (ROI) approaches; and (c) demonstrated advantages of network-based classifiers (Markov Random Fields) over linear models (SVM, Naive Bayes) on fMRI data, suggesting that voxel interactions should be exploited in fMRI analyses (i.e., that the brain should be treated as a network). From a neuroscience perspective, we provided strong support for the hypothesis that schizophrenia is associated with the disruption of global, emergent brain properties that cannot be explained just by alteration of local activation patterns. Moreover, while prior work has mainly focused on exploring the differences between the functional and anatomical networks of schizophrenic patients and healthy subjects [10, 1], this work is, to our knowledge, the first attempt to explore the generalization ability of predictive models of schizophrenia built on network features.\n\nFinally, a word of caution. The schizophrenia patients studied here were selected for their prominent, persistent, and pharmaco-resistant auditory hallucinations [14], which might have increased their clinical homogeneity. 
However, the patient group is not representative of the full spectrum of the disease; given the clinical characteristics and the size of the studied samples, our conclusions may not apply to all schizophrenia patients.\n\nAcknowledgements\n\nWe would like to thank Rahul Garg for his help with the data preprocessing and for many stimulating discussions that contributed to the ideas of this paper, and Drs. Andr\u00e9 Galinowski, Thierry Gallarda, and Frank Bellivier, who recruited and clinically rated the patients. We would also like to thank INSERM as promoter of the MR data acquisition (project RBM 01-26).\n\n\u00b3 We also observed that normalization substantially helped activation maps; without it, their performance could get much worse, especially with MRFs (those results are provided in the supplemental material).\n\n[Figure 4 panels: classification error vs. K top voxels (t-test), schizophrenic vs. normal, for Gaussian Naive Bayes, Support Vector Machine and Markov Random Field on activation 1 (FrenchNative-Silence), activation 6 (FrenchNative), activation 8 (Silence), degree (full) and degree (long-distance); MRF (0.1) vs. GNB vs. SVM on degree (long-distance).]\n\nReferences\n\n[1] D.S. Bassett, E.T. Bullmore, B.A. Verchinski, V.S. Mattay, D.R. Weinberger, and A. Meyer-Lindenberg. 
Hierarchical organization of human cortical networks in health and schizophrenia. J. Neuroscience, 28(37):9239\u20139248, 2008.\n\n[2] A. Battle, G. Chechik, and D. Koller. Temporal and cross-subject probabilistic models for fMRI prediction tasks. In B. Sch\u00f6lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 121\u2013128. MIT Press, Cambridge, MA, 2007.\n\n[3] E. Bleuler. Dementia Praecox or the Group of Schizophrenias. International Universities Press, New York, NY, 1911.\n\n[4] V.M. Eguiluz, D.R. Chialvo, G.A. Cecchi, M. Baliki, and A.V. Apkarian. Scale-free functional brain networks. Physical Review Letters, 94:018102, 2005.\n\n[5] K.J. Friston and C.D. Frith. Schizophrenia: A Disconnection Syndrome? Clinical Neuroscience, (3):89\u201397, 1995.\n\n[6] K.J. Friston, L. Harrison, and W.D. Penny. Dynamic Causal Modelling. Neuroimage, 19(4):1273\u20131302, Aug 2003.\n\n[7] A.G. Garrity, G.D. Pearlson, K. McKiernan, D. Lloyd, K.A. Kiehl, and V.D. Calhoun. Aberrant \u201cDefault Mode\u201d Functional Connectivity in Schizophrenia. Am J Psychiatry, 164:450\u2013457, March 2007.\n\n[8] J.V. Haxby, M.I. Gobbini, M.L. Furey, A. Ishai, J.L. Schouten, and P. Pietrini. Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex. Science, 293(5539):2425\u20132430, 2001.\n\n[9] K.J. Friston et al. Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping, 2:189\u2013210, 1995.\n\n[10] Y. Liu, M. Liang, Y. Zhou, Y. He, Y. Hao, M. Song, C. Yu, H. Liu, Z. Liu, and T. Jiang. Disrupted Small-World Networks in Schizophrenia. Brain, 131:945\u2013961, February 2008.\n\n[11] T.M. Mitchell, R. Hutchinson, R.S. Niculescu, F. Pereira, X. Wang, M. Just, and S. Newman. Learning to Decode Cognitive States from Brain Images. Machine Learning, 57:145\u2013175, 2004.\n\n[12] O. Banerjee, L. El Ghaoui, and A. d\u2019Aspremont. 
Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research, 9:485\u2013516, March 2008.\n\n[13] F. Pereira and G. Gordon. The Support Vector Decomposition Machine. In ICML 2006, pages 689\u2013696, 2006.\n\n[14] M. Plaze, D. Bartr\u00e9s-Faz, J.L. Martinot, D. Januel, F. Bellivier, R. De Beaurepaire, S. Chanraud, J. Andoh, J.P. Lefaucheur, E. Artiges, C. Pallier, and M.L. Paillere-Martinot. Left superior temporal gyrus activation during sentence perception negatively correlates with auditory hallucination severity in schizophrenia patients. Schizophrenia Research, 87(1-3):109\u2013115, 2006.\n\n[15] K.E. Stephan, K.J. Friston, and C.D. Frith. Dysconnection in Schizophrenia: From Abnormal Synaptic Plasticity to Failures of Self-monitoring. Schizophrenia Bulletin, 35(3):509\u2013527, 2009.\n\n[16] A.J. Storkey, E. Simonotto, H. Whalley, S. Lawrie, L. Murray, and D. McGonigle. Learning structural equation models for fMRI. In Advances in Neural Information Processing Systems 19, pages 1329\u20131336, 2007.\n\n[17] B. Thirion, G. Flandin, P. Pinel, A. Roche, P. Ciuciu, and J.-B. Poline. Dealing with the shortcomings of spatial normalization: Multi-subject parcellation of fMRI datasets. Human Brain Mapping, 27(8):678\u2013693, 2006.\n\n[18] C. Wernicke. Grundrisse der Psychiatrie. Thieme, 1906.\n\n[19] L. Zhang, D. Samaras, N. Alia-Klein, N. Volkow, and R. Goldstein. Modeling neuronal interactivity using dynamic Bayesian networks. In Advances in Neural Information Processing Systems 18, pages 1593\u20131600, 
2006.", "award": [], "sourceid": 717, "authors": [{"given_name": "Irina", "family_name": "Rish", "institution": null}, {"given_name": "Benjamin", "family_name": "Thyreau", "institution": null}, {"given_name": "Bertrand", "family_name": "Thirion", "institution": null}, {"given_name": "Marion", "family_name": "Plaze", "institution": null}, {"given_name": "Marie-laure", "family_name": "Paillere-martinot", "institution": null}, {"given_name": "Catherine", "family_name": "Martelli", "institution": null}, {"given_name": "Jean-luc", "family_name": "Martinot", "institution": null}, {"given_name": "Jean-baptiste", "family_name": "Poline", "institution": null}, {"given_name": "Guillermo", "family_name": "Cecchi", "institution": null}]}