{"title": "A systematic approach to extracting semantic information from functional MRI data", "book": "Advances in Neural Information Processing Systems", "page_first": 2267, "page_last": 2275, "abstract": "This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure.", "full_text": "A systematic approach to extracting semantic\n\ninformation from functional MRI data\n\nFrancisco Pereira\n\nSiemens Corporation, Corporate Technology\n\nPrinceton, NJ 08540\n\nfrancisco.pereira@gmail.com\n\nPrinceton Neuroscience Institute and Department of Psychology\n\nMatthew Botvinick\n\nPrinceton University\nPrinceton NJ 08540\n\nmatthewb@princeton.edu\n\nAbstract\n\nThis paper introduces a novel classi\ufb01cation method for functional magnetic res-\nonance imaging datasets with tens of classes. The method is designed to make\npredictions using information from as many brain locations as possible, instead of\nresorting to feature selection, and does this by decomposing the pattern of brain\nactivation into differently informative sub-regions. 
We provide results over a com-\nplex semantic processing dataset that show that the method is competitive with\nstate-of-the-art feature selection and also suggest how the method may be used to\nperform group or exploratory analyses of complex class structure.\n\n1\n\nIntroduction\n\nFunctional Magnetic Resonance Imaging (fMRI) is a technique used in psychological experiments\nto measure the blood oxygenation level throughout the brain, which is a proxy for neural activity;\nthis measurement is called brain activation. The data resulting from such an experiment is a 3D grid\nof cells named voxels covering the brain (on the order of tens of thousands, usually), measured over\ntime as tasks are performed and thus yielding one time series per voxel (collected every 1-2 seconds\nand yielding hundreds to thousands of points).\n\nIn a typical experiment, brain activation is measured during a task of interest, e.g. reading words,\nand during a related control condition, e.g. reading nonsense words, with the goal of identifying\nbrain locations where the two differ. The most common analysis technique for doing this \u2013 statisti-\ncal parametric mapping [4] \u2013 tests each voxel individually by regressing its time series on a predicted\ntime series determined by the task contrast of interest. This \ufb01t is scored and thresholded at a given\nstatistical signi\ufb01cance level to yield a brain image with clusters of voxels that respond very differ-\nently to the two tasks (colloquially, these are the images that show parts of the brain that \u201clight up\u201d).\nNote, however, that for both tasks there are many other processes taking place in tandem with this\ntask-contrasting activation: visual processing to read the words, attentional processing due to task\ndemands, etc. The output of this process for a given experiment is a set of 3D coordinates of all the\nvoxel clusters that appear reliably across all the subjects in a study. 
This result is easy to interpret,\nsince there is a lot of information about what processes each brain area may be involved in. The\ncoordinates are comparable across studies, and thus result reproducibility is also an expectation.\n\nIn recent years, there has been increasing awareness of the fact that there is information in the entire\npattern of brain activation and not just in saliently active locations. Classi\ufb01ers have been the tool\nof choice for capturing this information and have been used to make predictions ranging from what stimulus a\nsubject is seeing to what kind of object they are thinking about or what decision they will make [12]\n[14] [8]. The most common situation is to have an example correspond to the average brain image\nduring one or a few performances of the task of interest, and voxels as the features, and we will\ndiscuss various issues with this scenario in mind.\n\nThe goal of this work is generally not (just) classi\ufb01cation accuracy per se, even in diagnostic appli-\ncations, but understanding where the information used to classify is present. If only two conditions\nare being contrasted this is relatively straightforward as information is, at its simplest, a difference in\nactivation of a voxel in the two conditions. It\u2019s thus possible to look at the magnitudes of the weights\na classi\ufb01er puts on voxels across the brain and thus locate the voxels with the largest weights 1; given\nthat there are typically two to three orders of magnitude more voxels than examples, though, classi-\n\ufb01ers are usually trained on a selection of voxels rather than the entire activation pattern. 
Often, this\nmeans the best accuracy is obtained using few voxels, from all across the brain, and that different\nvoxels will be chosen in different cross-validation folds; this presents a problem for interpretability\nof the locations in question.\n\nOne approach to this problem is to try and regularize classi\ufb01ers so that they include as many infor-\nmative voxels as possible [2], thus identifying localizable clusters of voxels that may overlap across\nfolds. A different approach is to cross-validate classi\ufb01ers over small sections of the grid covering the\nbrain, known as searchlights [10]. This can be used to produce a map of the cross-validated accu-\nracy in the searchlight around each voxel, taking advantage of the pattern of activation across all the\nvoxels contained in it. Such a map can then be thresholded to leave only locations where accuracy\nis signi\ufb01cantly above chance. While these approaches have been used successfully many times over\nthe last decade, they will become progressively less useful in face of the increasing commonality\nof datasets with tens to hundreds of stimuli, and a correspondingly high number of experimental\nconditions. Knowing the location of a voxel does not suf\ufb01ce to interpret what it is doing, as it could\nbe very different from stimulus to stimulus (rather than just active or not, as in the two condition\nsituation). 
It\u2019s also likely that no small brain regions will allow for a searchlight classi\ufb01er capable of\ndistinguishing between all possible conditions at the spatial resolution of fMRI, and hence de\ufb01ning\na searchlight size or shape is a trade-off between including voxels and making it harder to locate\ninformation or train a classi\ufb01er \u2013 as the number of features increases as the number of examples\nremains constant \u2013 and excluding voxels and thus the number of distinctions that can be made.\n\nThis paper introduces a method to address all of these issues while still yielding an interpretable,\nwhole-brain classi\ufb01er. The method starts by learning how to decompose the pattern of activation\nacross the brain into sub-patterns of activation, then it learns a whole-brain classi\ufb01er in terms of the\npresence and absence of certain subpatterns and \ufb01nally combines the classi\ufb01er and pattern informa-\ntion to generate brain maps indicating which voxels belong to informative patterns and what kind of\ninformation they contain. This method is partially based on the notion of pattern feature introduced\nin an earlier paper by us [15], but has been developed much further so as to dispense with most\nparameters and allow the creation of spatial maps usable for group or exploratory analyses, as will\nbe discussed later.\n\n2 Data and Methods\n\n2.1 Data\n\nThe grid covering the brain contains on the order of tens of thousands voxels, measured over time\nas tasks are performed, every 1-2 seconds, yielding hundreds to thousands of 3D images per experi-\nment. During an experiment a given task is performed a certain number of times \u2013 trials \u2013 and often\nthe images collected during one trial are collapsed or averaged together, giving us one 3D image\nthat can be clearly labeled with what happened in that trial, e.g. what stimulus was being seen or\nwhat decision a subject made. 
Although the grid covers the entire head, only a fraction of its voxels\ncontain cortex in a typical subject; hence we only consider these voxels as features.\n\n1 Interpretation is more complicated if nonlinear classi\ufb01ers are being used [6], [17], but this is far less common.\n\nA searchlight is a small section of the 3D grid, in our case a 3 \u00d7 3 \u00d7 3 = 27 voxel cube. Analyses\nusing searchlights generally entail computing a statistic [10] or cross-validating a classi\ufb01er over the\ndataset containing just those voxels [16], and do so for the searchlight around each voxel in the brain,\ncovering it in its entirety. The intuition for this is that individual voxels are very noisy features, and\nan effect observed across a group of voxels is more trustworthy.\n\nIn the experiment performed to obtain our dataset 2 [13], subjects observed a word and a line drawing\nof an item, displayed on a screen for 3 seconds and followed by 8 seconds of a blank screen. The\nitems named/depicted belonged to one of 12 categories: animals, body parts, buildings, building\nparts, clothing, furniture, insects, kitchen, man-made objects, tools, vegetables and vehicles. The\nexperimental task was to think about the item and its properties while it was displayed. There were\n5 different exemplars of each of the 12 categories and 6 experimental epochs. 
In each epoch all 60\nexemplars were shown in random order without repetition, and all epochs had the same exemplars.\nDuring an experiment the task was repeated a total of 360 times, and a 3D image of the fMRI-measured\nbrain activation was acquired every second.\n\nEach example for classi\ufb01cation purposes is the average image during a 4 second span while the\nsubject was thinking about the item shown a few seconds earlier (a period which contains the peak\nof the signal during the trial); the dataset thus contains 360 examples, as many as there were trials.\nThe voxel size was 3 \u00d7 3 \u00d7 5 mm, with the number of voxels being between 20000 and 21000\ndepending on which of the 9 subjects was considered. The features in each example are voxels, and\nthe example labels are the category of the item being shown in the trial each example came from.\n\n[Figure 1 panels: (1) for each classi\ufb01cation task (e.g. animals vs insects), cross-validate a classi\ufb01er in all of the searchlights, each a 3x3x3 voxel cube centered around a voxel in cortex and overlapping its neighbours; (2) test the result at each searchlight, which yields a binary signi\ufb01cance image; (3) this is done for all 66 pairwise classi\ufb01cation tasks (animals vs insects, animals vs tools, ..., vegetables vs vehicles); (4) the binary vector of signi\ufb01cance for each searchlight is rearranged into a binary confusion matrix; (5) adjacent searchlights supporting similar pairwise distinctions are clustered together using modularity.]\n\nFigure 1: Construction of data-driven searchlights.\n\n2 The data were kindly shared with us by Tom Mitchell and Marcel Just, from Carnegie Mellon University.\n\n2.2 Method\n\nThe goal of the experiment our dataset comes from is to understand how a certain semantic category\nis represented throughout the brain (e.g. do \u201cInsects\u201d and \u201cAnimals\u201d share part of their representa-\ntion because both kinds of things are alive?). Intuitively, there is information in a given location if\nat least two categories can be distinguished looking at their respective patterns of activation there;\notherwise, the pattern of activation is noise or common to all categories. Our method is based upon\nthis intuition, and comprises three stages:\n\n1. the construction of data-driven searchlights, parcels of the 3D grid where the same dis-\ncriminations between pairs of categories can be made (these are generally larger than the\n3 \u00d7 3 \u00d7 3 basic searchlight)\n\n2. the synthesis of pattern features from each data-driven searchlight, corresponding to the\npresence or absence of a certain pattern of activation across it\n\n3. 
the training and use of a classi\ufb01er based on pattern features and the generation of an anatomical\nmap of the impact of each voxel on classi\ufb01cation\n\nThese stages are described in detail in the following sections.\n\n2.2.1 Construction of data-driven searchlights\n\nCreate pairwise searchlight maps\nIn order to identify informative locations we start by consid-\nering whether a given pair of categories can be distinguished in each of the thousands of 3 \u00d7 3 \u00d7 3\nsearchlights covering the brain:\n\n1. For each searchlight cross-validate a classi\ufb01er using the voxels belonging to it, obtaining\nan accuracy value which will be assigned to the voxel at the center of the searchlight,\nas shown in part 1 of Figure 1. The classi\ufb01er used in this case was Linear Discriminant\nAnalysis (LDA, [7]), with a shrinkage estimator for the covariance matrix [18], as this was\nshown to be effective at both modeling the joint activation of voxels in a searchlight and\nclassi\ufb01cation [16].\n\n2. Transform the resulting brain image with the accuracy of each voxel into a p-value brain\nimage (the probability of obtaining an accuracy as high or higher under the null hypothesis that the classes\nare not distinguishable, see [11]), as shown in part 1 of Figure 1.\n\n3. Threshold the p-value brain image using False Discovery Rate [5] (q = 0.01) to correct\nfor multiple comparisons and get a binary brain image with candidate locations\nwhere this pair of categories can be distinguished, as shown in part 2 of Figure 1.\n\nThe outcome for each pair of categories is a binary signi\ufb01cance image, where a voxel is 1 if the\ncategories can be distinguished in the searchlight surrounding it or 0 if not; this is shown for all\npairs of categories in part 3 of Figure 1. 
This can also be viewed per-searchlight, yielding a binary\nvector encoding which category pairs can be distinguished and which can be rearranged into a binary\nmatrix, as shown in part 4 of Figure 1.\n\nAggregate adjacent searchlights Examining each small searchlight makes sense if we consider\nthat, a priori, we don\u2019t know where the information is or how big a pattern of activation would have\nto be considered (with some exceptions, notably areas that respond to faces, houses or body parts, see\n[9] for a review). That said, if the same categories are distinguishable in two adjacent searchlights\n\u2013 which overlap \u2013 then it is reasonable to assume that all their voxels put together would still be\nable to make the same distinctions. Doing this repeatedly allows us to \ufb01nd data-driven searchlights,\nnot bound by shape or size assumptions. At the same time we would like to constrain data-driven\nsearchlights to the boundaries of known, large, anatomically determined regions of interest (ROI),\nboth for computational ef\ufb01ciency and for interpretability, as will be described later.\n\nAt the start of the aggregation process, each searchlight is by itself and has an associated binary\ninformation vector with 66 entries corresponding to which pairs of classes can be distinguished in\nits surrounding searchlight (part 3 of Figure 1). For each searchlight we compute the similarity\nof its information vector with those of all its neighbours, which yields a 3D grid similarity graph.\nWe then take the portion of the graph corresponding to each ROI in the AAL brain atlas [19], and\nuse modularity [1] to divide it into a number of clusters of adjacent searchlights supporting similar\ndistinctions, as shown in panel 5 of Figure 1. After this is done for all ROIs we obtain a partition of\nthe brain into a few hundred clusters, the data-driven searchlights. 
Figure 2 depicts the granularity\nof a typical clustering across multiple brain slices of one of the participants.\n\nFigure 2: Data-driven searchlights for participant P1 (brain slices range from inferior to superior).\n\nThe similarity measure between two vectors $v_i$ and $v_j$ is obtained by computing the number of\n1-entries present in both vectors, $\\sum_{\\text{pairs}} \\mathrm{AND}(v_i, v_j)$, the number of 1-entries present in only one\nof them, $\\sum_{\\text{pairs}} \\mathrm{XOR}(v_i, v_j)$, and then the measure\n\n$$\\mathrm{similarity}(v_i, v_j) = \\frac{\\sum_{\\text{pairs}} \\mathrm{AND}(v_i, v_j) - \\frac{1}{2} \\sum_{\\text{pairs}} \\mathrm{XOR}(v_i, v_j)}{\\sum_{\\text{pairs}} \\mathrm{AND}(v_i, v_j)}$$\n\nThe measure was chosen because it peaks at 1, if the two vectors match exactly, and decreases \u2013\npossibly into negative values \u2013 if there are mismatches; it will tolerate more mismatches if there are\nmore distinctions being made. It will also deem sparse vectors similar as long as there are very few\nmismatches. The number of entries present in only one is divided by 2 so that the differences do not\nget twice the weight of the similarities.\n\nThe centroid for each cluster encodes the pairs of categories that can be distinguished in that data-\ndriven searchlight. The centroid is obtained by combining the binary information vectors for each\nof the searchlights in it using a soft-AND function, and is itself a binary information vector. 
A given\nentry is 1 \u2013 the respective pair of categories is distinguishable \u2013 if it is 1 in at least q% of the cluster\nmembers (where q is the false discovery rate used earlier to threshold the binary image for that pair\nof categories).\n\n2.2.2 Generation of pattern features from each data-driven searchlight\n\n[Figure 3 labels: training data (voxels, examples); part 1, clusters (across class pairs), e.g. animals vs insects, animals vs tools, body parts vs buildings, vegetables vs vehicles; part 2, clusters (across all examples), pattern features; part 3, SVD, singular vectors.]\n\nFigure 3: Construction of pattern detectors and pattern features from data-driven searchlights.\n\nConstruct two-way classi\ufb01ers from each data-driven searchlight Each data-driven searchlight\nhas a set of pairs of categories that can be distinguished in it. This indicates that there are particular\npatterns of activation across the voxels in it which are characteristic of one or more categories, and\nabsent in others. We can leverage this to convert the pattern of activation across the brain into a\nseries of sub-patterns, one from each data-driven searchlight.\n\nFor each data-driven searchlight, and for each pairwise category distinction in its information vector,\nwe train a classi\ufb01er using examples of the two categories and just the voxels in the searchlight (a\nlinear SVM with \u03bb = 1, [3]); these will be pattern detectors, outputting a probability estimate for\nthe prediction (which we transform to the [\u22121, 1] range), shown in part 1 of Figure 3.\n\nUse two-way classi\ufb01ers to generate pattern features The set of pattern-detectors learned from\neach data-driven searchlight can be applied to any example, not just the ones from the categories\nthat were used to learn them. 
The output of each pattern-detector is then viewed as representing the\ndegree to which the detector thinks that either of the patterns it is sensitive to is present. For each\ndata-driven searchlight, we apply all of its detectors to all the examples in the training set, over the\nvoxels belonging to the searchlight, as illustrated in part 2 of Figure 3. The output of each detector\nacross all examples becomes a new, synthetic pattern feature. The number of these pattern features\nvaries per searchlight, as does the number of searchlights per subject, but at the end we will typically\nhave between 10K and 20K of them.\n\nNote that there may be multiple classi\ufb01ers for a given cluster which produce very similar outputs\n(e.g. ones that captured a pattern present in all animate object categories versus one present in all\ninanimate object ones); these will be highly correlated and redundant. We address this by using\nSingular Value Decomposition (SVD, [7]) to reduce the dimensionality of the matrix of pattern\nfeatures to the same as the number of examples (180), keeping all singular vectors; this is shown in\npart 3 of Figure 3. 
The detectors and the SVD transformation matrix learned from the training set\nare also applied to the test set.\n\n2.2.3 Classi\ufb01cation and impact maps for each class\n\n[Figure 4 panels: (1) invert SVD, taking the singular vector classi\ufb01er for \"tools\"-vs-rest to the pattern feature classi\ufb01er for \"tools\"-vs-rest; (2) multiply by the \"tools\" pattern features to obtain pattern feature impact values; (3) aggregate impact of pattern features belonging to each cluster into per-cluster impact values, then assign the per-cluster impact value to the voxels that belong to it, giving voxelwise impact values.]\n\nFigure 4: The process of going from the weights of a one-versus-rest category classi\ufb01er over a\nlow-dimensional pattern feature representation to the impact of each voxel in that classi\ufb01cation.\n\nGiven the low-dimensional pattern feature dataset, we train a one-versus-rest classi\ufb01er (a linear\nSVM with \u03bb = 1, [3]) for each category; these are then applied to each example in the test set, with\nthe label prediction corresponding to the class with the highest class probability.\n\nThe classi\ufb01ers can also be used to determine the extent to which each data-driven searchlight was\nresponsible for correctly predicting each class. A one-versus-rest category classi\ufb01er consists of a\nvector of 180 weights, which can be converted into an equivalent classi\ufb01er over pattern features by\ninverting the SVD, as shown in part 1 of Figure 4. The impact of each pattern feature in correctly\npredicting this category can be calculated by multiplying each weight by the values taken by the\ncorresponding pattern feature over examples in the category, and averaging across all examples; this\nis shown in part 2 of Figure 4. These pattern-feature impact values can then be aggregated by the\ndata-driven searchlight they came from, yielding a net impact value for that searchlight. 
This is the\nvalue that is propagated to each voxel in the data-driven searchlight (part 3 of Figure 4) in order to\ngenerate an impact map.\n\n3 Experiments and Discussion\n\n3.1 Classi\ufb01cation\n\nOur goal in this experiment is to determine whether transforming the data from voxel features to\npattern features preserves information, and how competitive the results are with a classi\ufb01er com-\nbined with voxel selection. In all experiments we use a split-half cross-validation loop, where the\nhalves contain examples from even and odd epochs, respectively, 180 examples in each (15 per\ncategory). If cross-validation inside a split-half training set is required, we use leave-one-epoch-out\ncross-validation.\n\nBaseline We contrasted experimental results obtained with our method with a baseline of classi-\n\ufb01cation using voxel selection. The scoring criterion used to rank each voxel was the accuracy of an\nLDA classi\ufb01er \u2013 same as described above \u2013 using the 3 \u00d7 3 \u00d7 3 searchlight around each voxel to\ndo 12-category classi\ufb01cation. The number of voxels to use was selected by nested cross-validation\ninside the training set 3. The classi\ufb01er used was a linear SVM (\u03bb = 1, [3]), same as the whole brain\nclassi\ufb01er in our method.\n\nResults The results are shown in the \ufb01rst line of Table 1; across subjects, our method is better than\nvoxel selection, with the p-value of a sign-test of this being < 0.01. It is substantially better than a\nclassi\ufb01er using all the voxels in the brain directly.\n\nWhereas the accuracy is above chance (0.08) for all subjects, it is rather low for some. There\nare at least two factors responsible for this. The \ufb01rst is that some classes give rise to very similar\npatterns of activation (e.g. \u201cbuildings\u201d and \u201cbuilding parts\u201d), and hence examples in these classes are\nconfusable (confusion matrices bear this out). 
The second factor is that subjects vary in their ability\nto stay focused on the task and avoid stray thoughts or remembering other parts of the experiment,\nhence examples may not belong to the class corresponding to the label or even any class at all. [13]\nalso points out that accuracy is correlated with a subject\u2019s ability to stay still during the experiment.\n\nTable 1: Classi\ufb01cation accuracy for the 9 subjects using our method, as well as two baselines.\n\n                              P1    P2    P3    P4    P5    P6    P7    P8    P9\nour method                    0.54  0.34  0.33  0.42  0.15  0.19  0.22  0.21  0.16\nbaseline (voxel selection)    0.53  0.33  0.24  0.34  0.14  0.16  0.21  0.20  0.15\nbaseline (using all voxels)   0.31  0.21  0.19  0.27  0.13  0.09  0.14  0.13  0.15\n#voxels selected (fold 1)     1200  400   200   1600  800   800   800   400   2000\n#voxels selected (fold 2)     800   200   100   800   50    8000  100   1200  100\n\n3.2 Impact maps\n\nFigure 5: Average example for categories \u201ctool\u201d and \u201cbuilding\u201d in participant P1 (slices ordered\nfrom inferior to superior, red is activation above the image mean, blue below).\n\nAs described in Section 2.2.3, an impact map can be produced for each category, showing the extent\nto which each data-driven searchlight helped classify that category correctly.\nIn order to better\nunderstand how impact works, consider two categories, \u201ctools\u201d and \u201cbuildings\u201d, where we\nknow where some of the information resides (for \u201ctools\u201d around the central sulcus, visible on the\nright of slices to the right, for \u201cbuildings\u201d around the parahippocampal gyrus, visible on the lower\nside of slices to the left). 
Figure 5 shows the average example for the two categories; note how\nsimilar the two examples are across the slices, indicating that most activation is shared between the\ntwo categories.\n\nThe impact maps for the same participant in Figure 6 show that much of the common activation is\neliminated, and that the areas known to be informative are assigned high impact in their respective\nmaps. Impact is positive, regardless of whether activation in each voxel involved is above or below\nthe mean of the image; the activation of each voxel in\ufb02uences the classi\ufb01er only in the context of\nits neighbours in each data-driven searchlight. Note, also, that unlike a simple one-vs-rest classi\ufb01er\nor searchlight map, the notion of impact can accommodate the situation where the same location is\nuseful, with either different or the same pattern of activation, for two separate classes (rather than\nhave it be downweighted relative to others that might be unique to that particular class).\n\nFigure 6: Impact map for categories \u201ctool\u201d and \u201cbuilding\u201d in participant P1.\n\nFigure 7: Average impact map for categories \u201ctool\u201d and \u201cbuilding\u201d across the nine participants.\n\n3 Possible choices were 50, 100, 200, 400, 800, 1200, 1600, 2000, 4000, 8000, 16000 or all voxels.\n\nFinally, consider that impact maps can be averaged across subjects, as shown in Figure 7, or un-\ndergo t-tests or a more complex second-level group analysis. 
A more exploratory analysis can be\nperformed by considering locations that are high impact for every participant and, through their\ndata-driven searchlight, examining the corresponding cluster centroids to get a complete picture of\nwhich subsets of the classes can be distinguished there (similar to the bottom-up process in part 5 of\nFigure 1, but now done top-down and given a cross-validated classi\ufb01cation result and impact value).\n\nReferences\n\n[1] VD Blondel, JL Guillaume, R Lambiotte, and E Lefebvre. Fast unfolding of communities in large\nnetworks. Journal of Statistical Mechanics: Theory and Experiment, (10):1\u201312, 2008.\n\n[2] Melissa K Carroll, Guillermo A Cecchi, Irina Rish, Rahul Garg, and A Ravishankar Rao. Prediction and\ninterpretation of distributed neural activity with sparse models. NeuroImage, 44(1):112\u201322, January 2009.\n\n[3] C.C. Chang and C.J. Lin. LIBSVM: a library for support vector machines. Technical report, 2001.\n\n[4] Karl J Friston, John Ashburner, Stefan J Kiebel, Thomas E Nichols, and W D Penny. Statistical Parametric\nMapping: The Analysis of Functional Brain Images. Academic Press, 2006.\n\n[5] Christopher R Genovese, Nicole A Lazar, and Thomas Nichols. Thresholding of statistical maps in\nfunctional neuroimaging using the false discovery rate. NeuroImage, 15(4):870\u20138, 2002.\n\n[6] Stephen Jos\u00e9 Hanson, Toshihiko Matsuka, and James V Haxby. Combinatorial codes in ventral temporal\nlobe for object recognition: Haxby (2001) revisited: is there a \u201cface\u201d area? NeuroImage, 23(1):156\u201366,\n2004.\n\n[7] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining,\ninference and prediction. Springer-Verlag, 2001.\n\n[8] J. Haynes and G. Rees. Decoding mental states from brain activity in humans. Nature Reviews\nNeuroscience, 7(7):523\u201334, 2006.\n\n[9] Marcel Adam Just, Vladimir L Cherkassky, Sandesh Aryal, and Tom M Mitchell. 
A neurosemantic theory\nof concrete noun representation based on the underlying brain codes. PloS one, 5(1):e8622, 2010.\n\n[10] N Kriegeskorte, R. Goebel, and P. Bandettini. Information-based functional brain mapping. Proceedings\nof the National Academy of Sciences, 103(10):3863, 2006.\n\n[11] John Langford. Tutorial on Practical Prediction Theory for Classi\ufb01cation. Journal of Machine Learning\nResearch, 6:273\u2013306, 2005.\n\n[12] T. M. Mitchell, R. Hutchinson, R. S. Niculescu, F. Pereira, X. Wang, M. Just, and S. Newman. Learning\nto Decode Cognitive States from Brain Images. Machine Learning, 57(1/2):145\u2013175, October 2004.\n\n[13] T. M. Mitchell, S. V. Shinkareva, A. Carlson, K. Chang, V. L. Malave, R. A. Mason, and M. A. Just.\nPredicting human brain activity associated with the meanings of nouns. Science, 320(5880):1191\u20135,\n2008.\n\n[14] K. A. Norman, S. M. Polyn, G. J. Detre, and J. V. Haxby. Beyond mind-reading: multi-voxel pattern\nanalysis of fMRI data. Trends in cognitive sciences, 10(9):424\u201330, 2006.\n\n[15] F Pereira and M Botvinick. Classi\ufb01cation of functional magnetic resonance imaging data using infor-\nmative pattern features. Proceedings of the 17th ACM SIGKDD international conference on Knowledge\ndiscovery and data mining - KDD \u201911, page 940, 2011.\n\n[16] F. Pereira and M. Botvinick. Information mapping with pattern classi\ufb01ers: a comparative study.\nNeuroImage, 56(2):835\u2013850, 2011.\n\n[17] Peter Mondrup Rasmussen, Kristoffer Hougaard Madsen, Torben Ellegaard Lund, and Lars Kai Hansen.\nVisualization of nonlinear kernel models in neuroimaging by sensitivity maps. NeuroImage, 55(3):1120\u2013\n31, April 2011.\n\n[18] Juliane Sch\u00e4fer and Korbinian Strimmer. A shrinkage approach to large-scale covariance matrix estima-\ntion and implications for functional genomics. 
Statistical applications in genetics and molecular biology,\n4:Article32, January 2005.\n\n[19] N Tzourio-Mazoyer, B Landeau, D Papathanassiou, F Crivello, O Etard, N Delcroix, B Mazoyer, and\nM Joliot. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcel-\nlation of the MNI MRI single-subject brain. NeuroImage, 15(1):273\u201389, 2002.\n", "award": [], "sourceid": 1118, "authors": [{"given_name": "Francisco", "family_name": "Pereira", "institution": null}, {"given_name": "Matthew", "family_name": "Botvinick", "institution": null}]}