{"title": "Hierarchical Mixture of Classification Experts Uncovers Interactions between Brain Regions", "book": "Advances in Neural Information Processing Systems", "page_first": 2178, "page_last": 2186, "abstract": "The human brain can be described as containing a number of functional regions. For a given task, these regions, as well as the connections between them, play a key role in information processing in the brain. However, most existing multi-voxel pattern analysis approaches either treat multiple functional regions as one large uniform region or as several independent regions, ignoring the connections between regions. In this paper, we propose to model such connections in a Hidden Conditional Random Field (HCRF) framework, where the classifier of one region of interest (ROI) makes predictions based on not only its voxels but also the classifier predictions from ROIs that it connects to. Furthermore, we propose a structural learning method in the HCRF framework to automatically uncover the connections between ROIs. Experiments on fMRI data acquired while human subjects viewed images of natural scenes show that our model can improve the top-level (the classifier combining information from all ROIs) and ROI-level prediction accuracy, as well as uncover some meaningful connections between ROIs.", "full_text": "Hierarchical Mixture of Classification Experts\nUncovers Interactions between Brain Regions\n\nBangpeng Yao1\n\nDirk B. Walther2\n\nDiane M. 
Beck2,3\u2217\n\nLi Fei-Fei1\u2217\n\n1Computer Science Department, Stanford University, Stanford, CA 94305\n\n2Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801\n\n3Psychology Department, University of Illinois at Urbana-Champaign, Champaign, IL 61820\n{bangpeng,feifeili}@cs.stanford.edu {walther,dmbeck}@illinois.edu\n\nAbstract\n\nThe human brain can be described as containing a number of functional regions. These regions, as well as the connections between them, play a key role in information processing in the brain. However, most existing multi-voxel pattern analysis approaches either treat multiple regions as one large uniform region or as several independent regions, ignoring the connections between them. In this paper we propose to model such connections in a Hidden Conditional Random Field (HCRF) framework, where the classifier of one region of interest (ROI) makes predictions based on not only its voxels but also the predictions from ROIs that it connects to. Furthermore, we propose a structural learning method in the HCRF framework to automatically uncover the connections between ROIs. We illustrate this approach with fMRI data acquired while human subjects viewed images of different natural scene categories and show that our model can improve the top-level (the classifier combining information from all ROIs) and ROI-level prediction accuracy, as well as uncover some meaningful connections between ROIs.\n\n1 Introduction\n\nIn recent years, machine learning approaches for analyzing fMRI data have become increasingly popular [15, 24, 18, 16]. In these multi-voxel pattern analysis (MVPA) approaches, patterns of voxels are associated with particular stimuli, leading to verifiable predictions about independent test data. 
Voxels are extracted from previously known regions of interest (ROIs) [15, 31], selected\nfrom the brain by some statistical criterion [24], or de\ufb01ned by a sliding window (\u201csearchlight\u201d)\npositioned at each location in the brain in turn [20]. All of these methods, however, ignore the\nhighly interconnected nature of the brain.\nNeuroanatomical evidence from macaque monkeys [10] indicates that brain regions involved in\nvisual processing are indeed highly interconnected. Since research on human subjects is largely\nlimited to non-invasive procedures, considerably less is known about interactions between visual\nareas in the human brain. Here we demonstrate a method of learning the interactions between\nregions from fMRI data acquired while human subjects view images of natural scenes.\nDetermining the category of a natural scene (e.g. classifying a scene as a beach, or a forest) is impor-\ntant for many human activities such as navigation or object perception [30]. Despite the large variety\nof images within and across categories, humans are very good at categorizing natural scenes [27, 9].\nIn our recent study of natural scene categorization in humans with functional magnetic resonance\nimaging (fMRI), we discovered that information about natural scene categories is represented in pat-\nterns of activity in the parahippocampal place area (PPA), the retrosplenial cortex (RSC), the lateral\noccipital complex (LOC), and the primary visual cortex (V1) [31]. We demonstrated that this infor-\nmation can be read out from fMRI activity with a linear support vector machine (SVM) classi\ufb01er.\n\n\u2217\n\nDiane M. 
Beck and Li Fei-Fei contributed equally to this work.\n\nGiven the highly interconnected nature of the brain, however, it is unlikely that these regions encode natural scene categories independently of each other.\nAs in previous ROI-based MVPA studies, in [31] we built predictors for each ROI independently, ignoring their interactions. The method in [31] neither explores connections among the ROIs nor uses the connections to build a classifier on top of all ROIs. In this work, we propose a method for simultaneously learning the voxel patterns associated with natural scene categories in several ROIs and their interactions in a Hidden Conditional Random Field (HCRF) [28] framework. In our model, the classifier of each ROI makes predictions based on not only its voxels, but also the prediction results of the ROIs that it connects to. Using the same fMRI data set, we also explore a mutual information based method to discover functional connectivity [5]. Our current model differs from [5], however, by applying a generative model to concurrently estimate the structure of connectivity as well as maximize performance on the end behavioral task (in this case, a scene classification task).\nFurthermore, we propose a structural learning method to automatically uncover the structure of the interactions between ROIs for natural scene categorization, i.e. to decide which ROIs should be and which ones should not be connected. Unlike existing models for functional connectivity, which mostly rely on the correlation of time courses of voxels [23], our approach makes use of the patterns of activity in ROIs as well as the category labels of the images presented to the subjects. Built in the hierarchical framework of HCRF, our structural learning method utilizes information in the voxel values at the bottom layer of the network as well as categorical labels at the top layer. 
In our method, the connections between each pair of ROIs are evaluated for their potential to improve prediction accuracy, and only those that show improvement will be added to the final structural map.\nIn the remainder of this paper, we first elaborate on our model and structural learning approach in Section 2. We discuss related work on MVPA and connectivity analysis in Section 3. Finally, we present experimental results in Section 4 and conclude the paper in Section 5.\n\n2 Modeling Interactions of Brain Regions: an HCRF Representation\n\nThe brain is highly interconnected, and the nature of the connections determines to a large extent how information is processed in the brain. We model the connections of brain regions in a Hidden Conditional Random Field (HCRF) framework for the task of natural scene categorization and propose a structural learning method to uncover the pattern of connectivity. In the first part of this section we assume that the structural connections between brain regions are already known. We will discuss in Section 2.2 how these connections are automatically learned.\n\n2.1 Integrating Information across Brain Regions\n\nSuppose we are given a set of regions of interest (ROIs) and connections between these regions (see the intermediate layer of Fig.1). Existing ROI-based MVPA approaches build a classifier for each ROI independently [15, 24, 18, 16, 31], neglecting the connections between ROIs. It is our objective here to explore the structure of the connections between ROIs to improve prediction accuracy for decoding the viewed scene category from fMRI data.\nIn order to achieve these goals, we propose a Hidden Conditional Random Field (HCRF) model (Fig.1) that allows each ROI to be influenced by the ROIs that it connects to and builds a top-level classifier which makes use of information in all ROIs. 
In this framework, the classifier for one ROI makes predictions based on the voxels in this region as well as the results of the classifiers of its connected ROIs, thereby improving the accuracy of each ROI. In the absence of evidence about the directionality of connections, we assume them to be symmetric, i.e., we allow the information between two ROIs to go in both directions to the same extent. On the technical side, using an undirected model avoids the difficulties of defining a coherent generative process for graph structures in directed models, thereby giving us more flexibility in representing complex patterns [29].\nOur model starts with independently trained classifiers for each ROI as in [31] (the bottom layer of Fig.1). Consider an fMRI data set whose individual brain acquisitions are associated with one of C class labels. For an acquisition sample i, the decision values of the C independent classifiers are represented as 𝒳^i = {X^i_1, ..., X^i_M}, where M is the number of ROIs. X^i_m = {x^i_{m,1}, ..., x^i_{m,C}} are the decision values for the m-th ROI, where x^i_{m,c} is the probability that region m assigns sample i to the c-th class, irrespective of the information in any other ROI.\n\nFigure 1: Illustration of the HCRF model for modeling connections between ROIs. Four ROIs, placed figuratively on a schematic brain, are shown here for illustration of the model. Superscripts indexing different samples are omitted in the figure. 
Z is the category label predicted from all ROIs. Y_m, the hidden variable of the model, is the prediction result of the classifier of ROI m. X_m is the output of an independently trained classifier for ROI m. Section 2.1 gives details about the three types of connections. In the figure thicker lines represent stronger connections, thinner lines weaker connections. The weights of all connections and the connectivity pattern of the type-II potentials are estimated by the model.\n\nGiven X^i_m as input, the classifier for ROI m can directly predict sample i as belonging to the c*-th class if x^i_{m,c*} is the largest component of X^i_m. However, this method ignores the dependencies between ROIs. To remedy this, our model allows collaborative error-correction over the ROIs by using the given structure of connections (the intermediate layer of Fig.1). Denoting the prediction results of the ROI classifiers as 𝒴 = {Y_1, ..., Y_M}, where Y_m ∈ {1, ..., C} is the classifier output for ROI m, our model allows the predictions Y_m and Y_l to interact if ROIs m and l are connected in the given structure (the intermediate layer in Fig.1).\nBased on the ROI-level prediction results 𝒴, our model outputs the category label of sample i: Z^i ∈ {1, ..., C} (the top layer of Fig.1). Furthermore, because we cannot directly observe the prediction of each ROI when acquiring the fMRI data, we treat 𝒴 as hidden variables. The underlying graphical model is shown in Fig.1. 
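Concretely, when M and C are small the hidden ROI predictions 𝒴 can be summed out by brute-force enumeration. The following is a toy sketch only (not the authors' implementation; the function and weight names are hypothetical, and the α/β indicator potentials follow the form introduced in Section 2.1):

```python
import itertools
import math

# Toy sketch: M ROIs, C classes. X[m][c] is the independent classifier's
# probability that ROI m assigns the sample to class c; `edges` lists the
# ROI pairs allowed to interact (the type-II connections).
def posterior_over_Z(X, edges, theta_I, theta_II, theta_III, alpha=0.5, beta=0.5):
    M, C = len(X), len(X[0])

    def potential(Z, Y):
        s = sum(theta_I[m] * X[m][Y[m]] for m in range(M))            # type-I
        s += sum(theta_II[(m, l)] * (alpha if Y[m] == Y[l] else 0.0)
                 for (m, l) in edges)                                  # type-II
        s += sum(theta_III[m] * (beta if Y[m] == Z else 0.0)
                 for m in range(M))                                    # type-III
        return s

    # marginalize the hidden ROI predictions Y over all C**M assignments
    scores = [sum(math.exp(potential(Z, Y))
                  for Y in itertools.product(range(C), repeat=M))
              for Z in range(C)]
    total = sum(scores)
    return [s / total for s in scores]

# two ROIs, two classes, one connection between them
X = [[0.9, 0.1], [0.6, 0.4]]
p = posterior_over_Z(X, edges=[(0, 1)],
                     theta_I=[1.0, 1.0], theta_II={(0, 1): 1.0}, theta_III=[1.0, 1.0])
print(p)  # p[0] > p[1]: both ROIs favor class 0
```

The exponential cost of the enumeration (C**M hidden assignments) is exactly what motivates the sampling approximation discussed in Section 2.3.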
To estimate the overall classification probability given the observed voxel values, we marginalize over all possible values of 𝒴. The HCRF model is therefore defined as\n\np(Z^i \mid \mathcal{X}^i; \theta) = \sum_{\mathcal{Y}} p(Z^i, \mathcal{Y} \mid \mathcal{X}^i; \theta) = \frac{\sum_{\mathcal{Y}} \exp(\Psi(Z^i, \mathcal{Y}, \mathcal{X}^i; \theta))}{\sum_{Z} \sum_{\mathcal{Y}} \exp(\Psi(Z, \mathcal{Y}, \mathcal{X}^i; \theta))}   (1)\n\nwhere θ are the parameters of the model, and Ψ(Z, 𝒴, 𝒳; θ) is a potential function parameterized by θ. We define the potential function Ψ(Z, 𝒴, 𝒳; θ) as the weighted sum of edge potential functions defined on every edge e (2-clique) of the model:\n\n\Psi(Z, \mathcal{Y}, \mathcal{X}; \theta) = \sum_{e} \theta_e \varphi_e(Z, \mathcal{Y}, \mathcal{X})   (2)\n\nAs shown in Fig.1, there are three types of potentials, which describe different edges in the model:\nType-I Potential: e = (X_m, Y_m). Such edges model the distribution of class labels of different ROIs conditioned on the observations X_m. The edge connects an X_m node and a Y_m node, where m = 1, ..., M. 
The edge potential function is defined by:\n\n\varphi_e(Z, \mathcal{Y}, \mathcal{X}) = f_{yx}(Y_m, X_m) = x_{m,Y_m}   (3)\n\nwhere x_{m,Y_m} is the Y_m-th component of the vector X_m. A large weight for (X_m, Y_m) implies that the independent classifier trained on voxels of ROI m is effective in giving correct predictions.\nType-II Potential: e = (Y_m, Y_l). Such edges model the dependencies between the ROIs. Note that not all pairs of ROIs are connected. The edge potential function is defined by:\n\n\varphi_e(Z, \mathcal{Y}, \mathcal{X}) = f_{yy}(Y_m, Y_l) = \begin{cases} \alpha, & Y_m = Y_l \\ 0, & Y_m \neq Y_l \end{cases}   (4)\n\nwhere α > 0. If two ROIs are connected, they tend to make similar predictions. A large weight for (Y_m, Y_l) means the connection between Y_m and Y_l is strong.\nType-III Potential: e = (Z, Y_m). Such edges define a joint distribution over the class label and the prediction result of each ROI. 
The edge connects a Y_m node and the Z node, where m = 1, ..., M. The edge potential function is defined by:\n\n\varphi_e(Z, \mathcal{Y}, \mathcal{X}) = f_{yz}(Y_m, Z) = \begin{cases} \beta, & Y_m = Z \\ 0, & Y_m \neq Z \end{cases}   (5)\n\nwhere β > 0. A large weight for (Z, Y_m) means that ROI m contributes strongly to the top-level prediction of the brain.\nAllowing connected ROIs to interact with each other makes our model significantly different from existing MVPA methods [15, 24, 18, 16], and can improve the prediction accuracy of each ROI. Intuitively, if the values of all components in X^i_m are similar, then ROI m is likely to make incorrect predictions if its classifier merely relies on X^i_m. In such situations it is possible for the classifier of one ROI to make better predictions if it can use the information in its connected ROIs.\n\n2.2 Learning the Structural Connections of the Hidden Layer in the HCRF Model\n\nWe have described a method that models the connections between ROIs to build a classification predictor on top of all ROIs. However, for many tasks (e.g. scene categorization), one critical scientific goal is to uncover which ROIs are functionally connected for that task. Automatic learning of the structures of graphical models is a difficult problem in machine learning. To illustrate the difficulty, let us assume that we have 4 ROIs and that we want to explore all possible models of connectivity between them. 
There are 6 possible connections between the ROIs, so in order to investigate whether all possible combinations of connections are present, we need to evaluate 2^6 = 64 different models. For 5 ROIs we have 10 potential connections, leading to 2^10 = 1024 models. In general, given M ROIs, there are 2^{M(M-1)/2} possible combinations of connections. In situations with many ROIs, evaluating all possible structures quickly becomes impractical because of computational constraints. Approximate approaches to learning the structures of directed graphs use the generative process in the model [21, 19, 32]. For undirected graphs, it is usually assumed that the structures are pre-defined [29]. Some incremental approaches [26, 22] have been proposed for random field construction. However, the computational complexity of these approaches is still high.\nIn our model shown in Fig.1, the potentials represented by solid lines are fixed (type-I and type-III). That is to say, each ROI always makes predictions based on the information in its voxels, and the response at the top level is always influenced by the prediction results of all ROIs. That leaves the dependencies between ROIs (type-II edges, the dashed lines in Fig.1) to be learned. Therefore, our structural learning starts from a graphical model containing only type-I and type-III potentials, without any interactions between ROIs. Based on this initial model, we evaluate each type-II potential individually to decide if it should be added to the model.\nAs we have described in Section 1, connections among ROIs play a key role in information processing. Executing a specific task (e.g., scene categorization) activates certain ROIs as well as relies on connections between some of them. 
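The counting argument above is easy to verify numerically (a throwaway sketch; M = 7 anticipates the seven ROIs used in the experiments of Section 4):

```python
# The number of candidate connectivity structures grows as 2**(M*(M-1)/2)
# with the number of ROIs M.
for M in (4, 5, 7):
    pairs = M * (M - 1) // 2
    print(M, pairs, 2 ** pairs)
# M=4 -> 6 pairs, 64 structures; M=5 -> 10 pairs, 1024 structures;
# M=7 -> 21 pairs, 2097152 structures, already impractical to enumerate.
```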
Inspired by this fact, we evaluate whether two ROIs, say ROIs m and l, should be connected by comparing two models with and without an edge between Y_m and Y_l. If allowing interactions between ROIs m and l helps to improve top-level recognition performance, thus more closely approximating human performance, then m and l should be connected. Furthermore, we ignore the information in all other ROIs when evaluating the connection between ROIs m and l (Fig.2). So the model will only contain 5 nodes: Z, Y_m, Y_l, X_m, and X_l. Although some useful information might be lost compared to evaluating all possible combinations of connections, approximating the algorithm in this way enables the evaluation of many possible connections in a reasonable amount of time, making this algorithm much more practical.\n\nFigure 2: An illustration for evaluating if ROIs 2 and 4 should be connected. All other ROIs are omitted. We compare the performance of two models with (left) and without (right) interactions between ROIs 2 and 4; Y_2 and Y_4 are connected if and only if the training accuracy with the connection, P_c, exceeds the training accuracy without it, P_n.\n\nInput: M ROIs and their feature vectors 𝒳 = {X_1, ..., X_M}. An HCRF model 𝒢 with nodes Z, Y_1, ..., Y_M, X_1, ..., X_M, and edges (Y_1, X_1), ..., (Y_M, X_M), (Z, Y_1), ..., (Z, Y_M).\nforeach pair of ROIs m and l do\n  Train an HCRF model with nodes Z, Y_m, Y_l, X_m, X_l, and edges (Y_m, X_m), (Y_l, X_l), (Z, Y_m), (Z, Y_l), (Y_m, Y_l). Obtain training accuracy P_c;\n  Train an HCRF model with nodes Z, Y_m, Y_l, X_m, X_l, and edges (Y_m, X_m), (Y_l, X_l), (Z, Y_m), (Z, Y_l). Obtain training accuracy P_n;\n  if P_c > P_n then add edge (Y_m, Y_l) to the input model 𝒢;\nOutput: The updated model 𝒢.\n\nAlgorithm 1: The algorithm for uncovering structural connections between ROIs in the HCRF model.\n\nThe structural learning algorithm is shown in Algorithm 1, and an illustration of evaluating the connection between ROIs 2 and 4 is given in Fig.2.\n\n2.3 Model Learning and Inference\n\nLearning In the step of structural learning, we need to estimate model parameters to compare the models with and without a type-II connection (see Fig.2 for an illustration). Once we have determined which ROIs should interact, i.e. which type-II potentials should be set, we would like to find out the strength of these connections as well as of the type-I and III potentials. 
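The pairwise search of Algorithm 1 (Section 2.2) can be sketched as follows. This is illustrative only: `train_hcrf` and `training_accuracy` are hypothetical stand-ins for fitting the restricted two-ROI model of Fig.2 and scoring it on the training data.

```python
from itertools import combinations

def learn_structure(rois, train_hcrf, training_accuracy):
    """Return the set of type-II edges (Y_m, Y_l) to add to the model."""
    edges = set()
    for m, l in combinations(rois, 2):
        # restricted model with nodes Z, Y_m, Y_l, X_m, X_l ...
        p_c = training_accuracy(train_hcrf((m, l), connect=True))   # with (Y_m, Y_l)
        p_n = training_accuracy(train_hcrf((m, l), connect=False))  # without it
        if p_c > p_n:
            edges.add((m, l))
    return edges

# toy stand-in: pretend that connecting ROIs 0 and 1 helps, nothing else does
scores = {((0, 1), True): 0.40, ((0, 1), False): 0.35}
train = lambda pair, connect: scores.get((pair, connect), 0.30)
print(learn_structure([0, 1, 2], train, lambda model: model))
# -> {(0, 1)}
```

Because each pair is judged in isolation, only M(M-1)/2 model fits are needed instead of 2^{M(M-1)/2}.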
Here the parameters θ = {θ_e}_e are learned by maximizing the conditional log-likelihood of the class label Z on training data 𝒳:\n\n\theta^* = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \sum_{i} \log p(Z^i \mid \mathcal{X}^i; \theta) = \arg\max_{\theta} \sum_{i} \log \frac{\sum_{\mathcal{Y}} \exp(\Psi(Z^i, \mathcal{Y}, \mathcal{X}^i; \theta))}{\sum_{Z} \sum_{\mathcal{Y}} \exp(\Psi(Z, \mathcal{Y}, \mathcal{X}^i; \theta))}   (6)\n\nThe objective function is not concave due to the hidden variables 𝒴. Although finding the global optimum is difficult, we can still find a local optimum by iteratively updating the values of θ using the gradient descent method. To be specific, we first set θ to initial values θ^(0), and in each iteration we adopt the following formula to update θ^(n) to θ^(n+1):\n\n\theta^{(n+1)} = \theta^{(n)} - \frac{G(\theta^{(n)})^\top G(\theta^{(n)})}{G(\theta^{(n)})^\top H(\theta^{(n)}) G(\theta^{(n)})} G(\theta^{(n)})   (7)\n\nwhere G(θ) and H(θ) are the gradient vector and Hessian matrix of L(θ), respectively. This iterative updating continues until reaching a maximum number of iterations or until ‖G(θ)‖ is smaller than a threshold. When the number of ROIs is large, marginalizing over all possible values of 𝒴 is time-consuming. 
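Each update of Equ.(7) is a gradient step whose length G^T G / (G^T H G) is the exact line-search step for a locally quadratic objective. A minimal numerical sketch of the same damped step, applied to minimizing a toy 2-D quadratic (all values illustrative, not from the model):

```python
def step(theta, G, H):
    """One update of the form theta - (G'G / (G'HG)) G, as in Equ.(7)."""
    n = len(theta)
    GG = sum(g * g for g in G)
    HG = [sum(H[i][j] * G[j] for j in range(n)) for i in range(n)]
    GHG = sum(G[i] * HG[i] for i in range(n))
    t = GG / GHG                      # curvature-normalized step length
    return [theta[i] - t * G[i] for i in range(n)]

# minimize L(theta) = 0.5 * theta' H theta, whose gradient is H theta
H = [[2.0, 0.0], [0.0, 8.0]]
theta = [4.0, 1.0]
for _ in range(60):
    G = [sum(H[i][j] * theta[j] for j in range(2)) for i in range(2)]
    if sum(g * g for g in G) < 1e-18:  # stop when the gradient is tiny
        break
    theta = step(theta, G, H)
print(theta)  # converges toward the minimizer [0, 0]
```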
In such situations we can use Gibbs sampling to compute the gradient vector and Hessian matrix of L(θ). In the case of natural scene categorization, evidence from neuroscience studies suggests that 7 regions are likely to play critical roles in this task [31]. We therefore consider 7 ROIs in our experiment, allowing us to marginalize over all possible values of 𝒴.\nInference Given the model parameters θ* and a sample 𝒳, the top-level prediction result is\n\nZ^* = \arg\max_{Z} p(Z \mid \mathcal{X}; \theta^*)   (8)\n\nAfter Z^* is obtained, we can get the prediction results corresponding to each ROI by\n\n\mathcal{Y}^* = \arg\max_{\mathcal{Y}} p(Z^*, \mathcal{Y} \mid \mathcal{X}; \theta^*)   (9)\n\n3 Related Work\n\nIn this paper, we model the dependencies between ROIs in an HCRF framework, which improves the ROI-level as well as the top-level decoding accuracy by allowing ROIs to exchange information. Other approaches to inferring connections between brain regions from fMRI data can be broadly separated into effective connectivity and functional connectivity [11]. Models for effective connectivity, such as Granger causality mapping [14] and dynamic causal modeling [13], model directed connections between brain regions. These approaches were developed to account for biological temporal dependencies, which is not the case in this work. Functional connectivity refers to undirected connections, which can be either model-driven or data-driven [23]. Model-driven methods usually test a prior hypothesis by correlating the time courses of a seed voxel and a target voxel [12]. 
Data-\ndriven methods, such as Independent Component Analysis [8], are typically used to identify spatial\nmodes of coherent activity in the brain at rest.\nNone of these methods, however, has the ability to use the speci\ufb01c relation between the patterns\nof voxel activations inside ROIs and the ground truth of the experimental condition. The structural\nlearning method proposed in this paper offers an entirely new way to assess the interactions between\nbrain regions based on the exchange of information between ROIs so that the accuracy of decoding\nexperimental conditions from the data is improved. Furthermore in contrast with the conventional\nmodel comparison approaches of trying to optimize the evidence of each model [2], our method\nrelates the connectivity structure to observed brain activities as well as the classes of stimuli that\nelicited the activities. Therefore the model proposed here provides a novel and natural way to model\nthe implicit dependencies between different ROIs.\n\n4 Experimental Evaluation\n\n4.1 Data Set and Experimental Design\n\nIn order to evaluate the proposed method we re-analyze the fMRI data set from our work in [31].\nIn this experiment, 5 subjects were presented with color images of 6 scene categories: beaches,\nbuildings, forests, highways, industry, and mountains. Photographs were chosen to capture the high\nvariability within each scene category. Images were presented in blocks of 10 images of the same\ncategory lasting for 16 seconds (8 brain acquisitions). Each subject performed 12 runs, with each\nrun containing one block for each of the six categories. Please refer to [31] for more details.\nWe use 7 ROIs that are likely to play critical roles for natural scene categorization. They were\ndetermined in separate localizer scans: V1, left/right LOC, left/right PPA, left/right RSC. The data\nfor two subjects were excluded, because not all of the ROIs could be found in the localizer scans\nfor these subjects. 
For the analysis we use two nested cross-validation loops over the 12 runs for each subject. In the outer loop we cross-validate on each subject to test the performance of the proposed method. For each subject, 11 runs out of 12 are selected as training samples and the remaining run is used as the testing set. For each subject this procedure is repeated 12 times, in turn leaving each run out for testing once. The average accuracy of the 36 experiments across all subjects is used to evaluate the performance of the model. In the inner loop, we use 10 of the 11 training runs to train an SVM classifier for each ROI and each subject, and the remaining run to learn the connections between ROIs and train the HCRF model using the outputs of the SVM classifiers. We repeat this procedure 11 times, giving us 11 models. Results of the 11 models on the test data in the inner loop are combined using bagging [4]. We empirically set both α in Equ.(4) and β in Equ.(5) to 0.5.\n\n4.2 Scene Classification Results and Analysis\n\nIn order to comprehensively evaluate the performance of the proposed structural learning and modeling approach, we consider different settings of the intermediate layer of our HCRF model. While always keeping all type-I and type-III potentials connected, we consider five different dependencies between the ROIs, as shown in Fig.3. The setting in Fig.3(e) possesses all properties of our method: the connections between ROIs are determined by structural learning, and the weights of the connections are obtained by estimating model parameters in Equ.(6). In order to estimate the effectiveness of our structural learning method, we compare this setting with the situations where no connections exist between any of the ROIs (Fig.3(a)), and where all ROIs are fully connected (Fig.3(b,c)). 
In each connectivity situation, we either use the same (Fig.3(b,d)) or different (Fig.3(c,e)) weights for type-II potentials. Note that the type-II potentials of the models in Fig.3(b,d) are also obtained by learning.\n\nFigure 3: Various settings of the intermediate layer of our model. Dashed lines represent type-II potentials. In each setting we keep all type-I and III potentials connected. For simplicity, we omit the visualizations of type-I and III potentials here. Different line widths represent different potential weights. (a) No connection exists between any pair of ROIs. (b,c) The ROIs are fully connected. (d,e) The connections between ROIs are obtained by structural learning. (b,d) All type-II potentials have equal weights. (c,e) The weights of different type-II potentials can be different. Note that (e) is the full model in this paper.\n\nTable 1: Recognition accuracy for predicting natural scene categories with different methods (chance is 1/6). "Overall classification" means the accuracy for predicting the categories by the top-level node in Fig.1. We carry out experiments on the HCRF models with different settings of the type-II potentials, as shown in Fig.3. Note that we always learn the weights of type-I and type-III potentials. We also list classification results of the SVM classifiers independently trained on each ROI as the baseline. The bolded numbers indicate superior performance compared to all other settings for each ROI. *p < 0.01; **p < 0.005.\n\nMethod | SVM | Fig.3(a) | Fig.3(b) | Fig.3(c) | Fig.3(d) | Fig.3(e)\nOverall classification | N/A | 31%* | 29%* | 33%** | 34%** | 36%**\nV1 | 22% | 21% | 25% | 24% | 27% | 28%*\nleft LOC | 22% | 23% | 27% | 29%* | 31%* | 32%**\nright LOC | 25% | 24% | 27% | 30%* | 29%* | 33%**\nleft PPA | 27% | 27% | 26% | 28%* | 31%* | 31%*\nright PPA | 26% | 28%* | 28%* | 31%* | 31%* | 32%**\nleft RSC | 30% | 30%* | 30%* | 32%* | 33%** | 35%**\nright RSC | 27% | 26%* | 29%* | 30%* | 30%* | 32%**\n\nClassification accuracy of the five different HCRF models, along with the individual SVM classification accuracy for each ROI, is shown in Tbl.1. Note that the model with no type-II potentials (Fig.3(a)) is different from the independent SVM classifiers because of the type-I potentials.\nFrom Table 1 it becomes clear that learning both the structure of the connections and their strengths leads to more improvement in decoding accuracy than either one of these alone. The overall, top-level classification rate increases from 31% for the variant of the model without any connections (Fig.3(a)) to 36% for the variant with the structure of the model as well as the connection strengths learned (Fig.3(e)). We see similar improvements for the individual ROIs: 4-5% for PPA and RSC, 6% for V1, and 9% for LOC. The fact that decoding from LOC benefits most from interacting with other ROIs is interesting and significant. 
We will discuss this finding in more detail below.

4.3 Structural Learning Results and Analysis

Having established that our full HCRF model outperforms the other comparison models in the recognition task, we now investigate how our model can shed light on learning connectivity between brain regions. In the nested cross-validation procedure, 12x11=132 structural maps are learned for each subject. Tbl.2 reports for each subject which connections are present in what fraction of these structural maps. A connection is regarded as strong for a subject if it is present in at least half of the models learned for this subject. In Tbl.2 we use a larger font size to denote connections that are strong on more subjects. Connections that are strong for all subjects are marked in bold.

We see that both LOC and PPA show strong interactions between their contralateral counterparts, which makes sense for integrating information across the visual hemifields. We also observe strong interactions between PPA and RSC across hemispheres, which underscores the importance of across-hemifield integration of visual information. We see a similar effect in the interactions between LOC and PPA: strong contralateral interactions. Left LOC also interacts strongly with right RSC.

Table 2: Statistics of structural connections. For each subject we have 132 learned structural maps (12-fold cross-validation, each fold with 11 models). This table shows the fraction of the 132 experiments in which the corresponding connection is learned. A larger font size denotes connections that are strong on more subjects; connections that are strong on all subjects are marked in bold (shown here with an asterisk).

  Connection           Sbj.1  Sbj.2  Sbj.3  |  Connection           Sbj.1  Sbj.2  Sbj.3
  V1-leftLOC           0.67   0.25   0.33   |  rightLOC-leftPPA*    0.58   0.58   0.66
  V1-rightLOC          0.50   0.29   0.54   |  rightLOC-rightPPA    0.36   0.58   0.89
  V1-leftPPA           0.44   0.29   0.36   |  rightLOC-leftRSC     0.63   0.38   0.31
  V1-rightPPA          0.38   0.33   0.69   |  rightLOC-rightRSC    0.36   0.30   0.87
  V1-leftRSC           0.29   0.30   0.23   |  leftPPA-rightPPA*    0.99   0.56   0.78
  V1-rightRSC          0.36   0.29   0.59   |  leftPPA-leftRSC      0.97   0.34   0.46
  leftLOC-rightLOC*    0.66   0.88   0.71   |  leftPPA-rightRSC     0.61   0.53   0.40
  leftLOC-leftPPA      0.46   0.64   0.76   |  rightPPA-leftRSC*    0.67   0.74   0.51
  leftLOC-rightPPA*    0.75   0.96   0.65   |  rightPPA-rightRSC    0.93   0.74   0.41
  leftLOC-leftRSC      0.41   0.78   0.61   |  leftRSC-rightRSC     0.65   0.20   0.45
  leftLOC-rightRSC*    0.75   0.83   0.76   |

The strong interactions between PPA and RSC are not surprising, since both are typically associated with the processing of natural scenes [25], albeit with slightly different roles [7]. The interactions between LOC and PPA are somewhat more surprising, since LOC is usually associated with the processing of isolated objects. Together with the strong improvement in decoding accuracy for natural scene categories from LOC when it is allowed to interact with other ROIs (see above), this suggests a role for LOC in scene categorization. It is conceivable that the detection of typical objects (e.g., a car) helps with determining the scene category (e.g., highway), as has been shown in [17, 6]. On the other hand, it is also possible that information flows the other way: scene-specific information in PPA and RSC feeds into LOC to bias object detection based on the scene category (see [3, 1]), and the classifier decodes this bias signal in LOC. 
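The strong-connection criterion described above is a simple frequency threshold over the learned structural maps. A minimal sketch of this aggregation step, assuming a hypothetical data layout in which each learned map is represented as a set of ROI-pair edges:

```python
from itertools import combinations

ROIS = ["V1", "leftLOC", "rightLOC", "leftPPA", "rightPPA", "leftRSC", "rightRSC"]

def connection_stats(structural_maps):
    """Fraction of learned maps containing each of the 21 possible ROI pairs.

    structural_maps: list of learned models, each a set of
    frozenset({roi_a, roi_b}) edges.
    """
    n = len(structural_maps)
    return {
        frozenset(pair): sum(frozenset(pair) in m for m in structural_maps) / n
        for pair in combinations(ROIS, 2)
    }

def strong_connections(stats, threshold=0.5):
    """A connection is 'strong' if it appears in at least half of the maps."""
    return {pair for pair, frac in stats.items() if frac >= threshold}
```

Applied to the 132 maps from the nested cross-validation of one subject, the returned fractions correspond to that subject's column of Tbl.2, and intersecting `strong_connections` across subjects gives the bolded entries.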
Fig.4 shows the connections which are strong on at least two subjects.

Figure 4: Schematic illustration of the connections between the seven ROIs obtained by our structural learning method. Activated regions for the seven ROIs are marked in red. The connections shown in this figure are strong on at least two of the three subjects. Connections that are strong for all three subjects (marked in bold in Table 2) are marked with thicker lines in this figure.

5 Conclusion

In this paper we modeled the interactions between brain regions in an HCRF framework. We also presented a structural learning method to automatically uncover the connections between ROIs. Experimental results showed that our approach can improve both top-level and ROI-level prediction accuracy and uncover some meaningful connections between ROIs. One direction for future work is to use an exploratory "searchlight" approach [20] to automatically discover ROIs, and apply our structural learning and modeling method to those ROIs.

Acknowledgements
This work is funded by National Institutes of Health Grant 1 R01 EY019429 (to L.F.-F., D.M.B., D.B.W.), a Beckman Postdoctoral Fellowship (to D.B.W.), a Microsoft Research New Faculty Fellowship (to L.F.-F.), and the Frank Moss Gift Fund (to L.F.-F.). The authors would like to thank Barry Chai, Linjie Luo, and Hao Su for helpful comments and discussions.

References
[1] M. Bar. Visual objects in context. Nature Rev Neurosci, 5(8):617-629, 2004.
[2] D. Barber and C. M. Bishop. Bayesian model comparison by Monte Carlo chaining. In NIPS, 1997.
[3] I. Biederman. Perceiving real-world scenes. Science, 177(4043):77-80, 1972.
[4] L. Breiman. Bagging predictors. Mach Learn, 24:123-140, 1996.
[5] B. Chai†, D. B. Walther†, D. M. Beck*, and L. Fei-Fei*. Exploring functional connectivities of the human brain using multivariate information analysis. In NIPS, 2009. (†, * indicate equal contribution.)
[6] J. L. Davenport and M. C. Potter. Scene consistency in object and background perception. Psychol Sci, 15(8):559-564, 2004.
[7] R. A. Epstein and J. S. Higgins. Differential parahippocampal and retrosplenial involvement in three types of scene recognition. Cereb Cortex, 17:1680-1693, 2007.
[8] F. Esposito, E. Formisano, E. Seifritz, R. Goebel, R. Morrone, G. Tedeschi, and F. D. Salle. Spatial independent component analysis of functional MRI time-series: To what extent do results depend on the algorithm used. Hum Brain Mapp, 16:146-157, 2002.
[9] L. Fei-Fei, A. Iyer, C. Koch, and P. Perona. What do we perceive in a glance of a real-world scene? J Vision, 7(1):1-29, 2007.
[10] D. J. Felleman and D. C. van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex, 1:1-47, 1991.
[11] K. J. Friston. Functional and effective connectivity in neuroimaging: a synthesis. Hum Brain Mapp, 2:56-78, 1995.
[12] K. J. Friston, C. Frith, F. P. Liddle, and R. Frackowiak. Functional connectivity: The principal-component analysis of large (PET) data sets. J Cerebr Blood F Met, 13:5-14, 1993.
[13] K. J. Friston, L. Harrison, and W. Penny. Dynamic causal modeling. NeuroImage, 19:1273-1302, 2003.
[14] R. Goebel, A. Roebroeck, D.-S. Kim, and E. Formisano. Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magn Reson Imaging, 21:1251-1261, 2003.
[15] J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. Schouten, and P. Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539):2425-2430, 2001.
[16] J.-D. Haynes and G. Rees. 
Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nat Neurosci, 8:686-691, 2005.
[17] A. Hollingworth and J. M. Henderson. Accurate visual memory for previously attended objects in natural scenes. J Exp Psychol Human, 28:113-136, 2002.
[18] Y. Kamitani and F. Tong. Decoding the visual and subjective contents of the human brain. Nat Neurosci, 8:679-685, 2005.
[19] C. Kemp and J. B. Tenenbaum. The discovery of structural form. P Natl Acad Sci USA, 105(31):10687-10692, 2008.
[20] N. Kriegeskorte, R. Goebel, and P. Bandettini. Information-based functional brain mapping. P Natl Acad Sci USA, 103(10):3863-3868, 2006.
[21] W. Lam and F. Bacchus. Learning Bayesian belief networks: An approach based on the MDL principle. Comput Intell, 10(4):269-293, 1994.
[22] S. Lee, V. Ganapathi, and D. Koller. Efficient structure learning of Markov networks using L1-regularization. In NIPS, 2006.
[23] K. Li, L. Guo, J. Nie, G. Li, and T. Liu. Review of methods for functional brain connectivity detection using fMRI. Comput Med Imag Grap, 33:131-139, 2009.
[24] D. Neill, A. Moore, F. Pereira, and T. Mitchell. Detecting significant multidimensional spatial clusters. In NIPS, 2004.
[25] K. O'Craven and N. Kanwisher. Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J Cognitive Neurosci, 12:1013-1023, 2000.
[26] S. D. Pietra, V. D. Pietra, and J. Lafferty. Inducing features of random fields. IEEE T Pattern Anal, 19(4):380-393, 1997.
[27] M. C. Potter. Short-term conceptual memory for pictures. J Exp Psychol - Hum L, 2(5):509-522, 1976.
[28] A. Quattoni, S. Wang, L.-P. Morency, M. Collins, and T. Darrell. Hidden conditional random fields. IEEE T Pattern Anal, 29(10):1848-1852, 2007.
[29] B. Taskar, P. Abbeel, and D. Koller. 
Discriminative probabilistic models for relational data. In UAI, 2002.
[30] B. Tversky and K. Hemenway. Categories of scenes. Cognitive Psychol, 15:121-149, 1983.
[31] D. B. Walther, E. Caddigan, L. Fei-Fei*, and D. M. Beck*. Natural scene categories revealed in distributed patterns of activity in the human brain. J Neurosci, 29(34):10573-10581, 2009. (* indicates equal contribution.)
[32] M. L. Wong, W. Lam, and K. S. Leung. Using evolutionary programming and minimum description length principle for data mining of Bayesian networks. IEEE T Pattern Anal, 21(2):174-178, 1999.