{"title": "Categories and Functional Units: An Infinite Hierarchical Model for Brain Activations", "book": "Advances in Neural Information Processing Systems", "page_first": 1252, "page_last": 1260, "abstract": "We present a model that describes the structure in the responses of different brain areas to a set of stimuli in terms of \"stimulus categories\" (clusters of stimuli) and \"functional units\" (clusters of voxels). We assume that voxels within a unit respond similarly to all stimuli from the same category, and design a nonparametric hierarchical model to capture inter-subject variability among the units. The model explicitly captures the relationship between brain activations and fMRI time courses. A variational inference algorithm derived based on the model can learn categories, units, and a set of unit-category activation probabilities from data. When applied to data from an fMRI study of object recognition, the method finds meaningful and consistent clusterings of stimuli into categories and voxels into units.", "full_text": "Categories and Functional Units: An In\ufb01nite Hierarchical Model for Brain Activations\n\nDanial Lashkari\n\nRamesh Sridharan\n\nPolina Golland\n\nComputer Science and Arti\ufb01cial Intelligence Laboratory\n\nMassachusetts Institute of Technology\n\nCambridge, MA 02139\n\n{danial, rameshvs, polina}@csail.mit.edu\n\nAbstract\n\nWe present a model that describes the structure in the responses of different brain\nareas to a set of stimuli in terms of stimulus categories (clusters of stimuli) and\nfunctional units (clusters of voxels). We assume that voxels within a unit respond\nsimilarly to all stimuli from the same category, and design a nonparametric hier-\narchical model to capture inter-subject variability among the units. 
The model ex-\nplicitly encodes the relationship between brain activations and fMRI time courses.\nA variational inference algorithm derived based on the model learns categories,\nunits, and a set of unit-category activation probabilities from data. When applied\nto data from an fMRI study of object recognition, the method \ufb01nds meaningful\nand consistent clusterings of stimuli into categories and voxels into units.\n\n1 Introduction\n\nThe advent of functional neuroimaging techniques, in particular fMRI, has for the \ufb01rst time provided\nnon-invasive, large-scale observations of brain processes. Functional imaging techniques allow us to\ndirectly investigate the high-level functional organization of the human brain. Functional speci\ufb01city\nis a key aspect of this organization and can be studied along two separate dimensions: 1) which sets\nof stimuli or cognitive tasks are treated similarly by the brain, and 2) which areas of the brain have\nsimilar functional properties. For instance, in the studies of visual object recognition the \ufb01rst ques-\ntion de\ufb01nes object categories intrinsic to the visual system, while the second characterizes regions\nwith distinct pro\ufb01les of selectivity. To answer these questions, fMRI studies examine the responses\nof all relevant brain areas to as many stimuli as possible within the domain under study. Novel\nmethods of analysis are needed to extract the patterns of functional speci\ufb01city from the resulting\nhigh-dimensional data.\n\nClustering is a natural choice for answering questions we pose here regarding functional speci\ufb01city\nwith respect to both stimuli and voxels. Applying clustering in the space of stimuli identi\ufb01es stimuli\nthat induce similar patterns of response and has been recently used to discover object categories\nfrom responses in the human inferior temporal cortex [1]. 
Applying clustering in the space of brain\nlocations seeks voxels that show similar functional responses [2, 3, 4, 5]. We will refer to a cluster\nof voxels with similar responses as a functional unit.\n\nFigure 1: Co-clustering fMRI data across subjects. The \ufb01rst row shows a hypothetical data set of\nbrain activations. The second row shows the same data after co-clustering, where rows and columns\nare re-ordered based on the membership in categories and functional units.\n\nIn this paper, we present a model to investigate the interactions between these two aspects of func-\ntional speci\ufb01city. We make the natural assumptions that functional units are organized based on\ntheir responses to the categories of stimuli and the categories of stimuli can be characterized by the\nresponses they induce in the units. Therefore, categories and units are interrelated and informative\nabout each other. Our generative model simultaneously learns the speci\ufb01city structure in the space of\nboth stimuli and voxels. We use a block co-clustering framework to model the relationship between\nclusters of stimuli and brain locations [6]. In order to account for variability across subjects in a\ngroup study, we assume a hierarchical model where a group-level structure generates the clustering\nof voxels in different subjects (Fig. 1). A nonparametric prior enables the model to search the space\nof different numbers of clusters. Furthermore, we tailor the method speci\ufb01cally to brain imaging\nby including a model of fMRI signals [7]. Most prior work applies existing machine learning algo-\nrithms to functional neuroimaging data. In contrast, our Bayesian integration of the co-clustering\nmodel with the model of fMRI signals informs each level of the model about the uncertainties of\ninference in the other levels. 
As a result, the algorithm is better suited to handling the high levels of\nnoise in fMRI observations.\n\nWe apply our method to a group fMRI study of visual object recognition where 8 subjects are\npresented with 69 distinct images. The algorithm \ufb01nds a clustering of the set of images into a\nnumber of categories along with a clustering of voxels in different subjects into units. We \ufb01nd that\nthe learned categories and functional units are indeed meaningful and consistent.\n\nRelated Work Different variants of co-clustering algorithms have found applications in biological\ndata analysis [8, 9, 10]. Our model is closely related to the probabilistic formulations of co-clustering\n[11, 12] and the application of In\ufb01nite Relational Models to co-clustering [13]. Prior work in the\napplications of advanced machine learning techniques to fMRI has mainly focused on supervised\nlearning, which requires prior knowledge of stimulus categories [14]. Unsupervised learning meth-\nods such as Independent Component Analysis (ICA) have also been applied to fMRI data to de-\ncompose it into a set of spatial and temporal (functional) components [15, 16]. ICA assumes an\nadditive model for the data and allows spatially overlapping components. However, neither of these\nassumptions is appropriate for studying functional speci\ufb01city. For instance, an fMRI response that\nis a weighted combination of a component selective for category A and another component selective\nfor category B may be better described by selectivity for a new category (the union of both). We\nalso note that Formal Concept Analysis, which is closely related to the idea of block co-clustering,\nhas been recently applied to neural data from visual studies in monkeys [17].\n\n2 Model\n\nOur model consists of three main components:\n\nI. Co-clustering structure expressing the relationship between the clustering of stimuli (cate-\n\ngories) and the clustering of brain voxels (functional units),\n\nII. 
Hierarchical structure expressing the variability among functional units across subjects,\nIII. Signal model expressing the relationship between voxel activations and observed fMRI time courses.\n\nFigure 2: The graphical representation of our model where the set of voxel response variables\n(a_{ji}, e_{jih}, \u03bb_{ji}) and their corresponding prior parameters (\u00b5^a_j, \u03c3^a_j, \u00b5^e_{jh}, \u03c3^e_{jh}, \u03ba_j, \u03b8_j) are denoted by \u03b7_{ji}\nand \u03d1_j, respectively. The variables are:\nx_{jis}: activation of voxel i in subject j to stimulus s\nz_{ji}: unit membership of voxel i in subject j\nc_s: category membership of stimulus s\n\u03c6_{k,l}: activation probability of unit k to category l\n\u03b2_j: unit prior weight in subject j\n\u03c0: group-level unit prior weight\n\u03b1, \u03b3: unit HDP scale parameters\n\u03c1: category prior weight\n\u03c7: category DP scale parameters\n\u03c4: prior parameters for activation probabilities \u03c6\ny_{jit}: fMRI signal of voxel i in subject j at time t\ne_{jih}: nuisance effect h for voxel i in subject j\na_{ji}: amplitude of activation of voxel i in subject j\n\u03bb_{ji}: variance reciprocal of noise for voxel i in subject j\n\u00b5^a_j, \u03c3^a_j: prior parameters for response amplitudes\n\u00b5^e_{jh}, \u03c3^e_{jh}: prior parameters for nuisance factors\n\u03ba_j, \u03b8_j: prior parameters for noise variance\n\nThe co-clustering level is the key element of the model that encodes the interactions between stim-\nulus categories and functional units. Due to the differences in the level of noise among subjects, we\ndo not expect to \ufb01nd the same set of functional units in all subjects. We employ the structure of the\nHierarchical Dirichlet Processes (HDP) [18] to account for this fact. The \ufb01rst two components of the\nmodel jointly explain how different brain voxels are activated by each stimulus in the experiment.\nThe third component of the model links these binary activations to the observed fMRI time courses\nof 
voxels. Sec. 2.1 presents the hierarchical co-clustering part of the model that includes both the\n\ufb01rst and the second components above. Sec. 2.2 presents the fMRI signal model that integrates the\nestimation of voxel activations with the rest of the model. Sec. 2.3 outlines the variational algorithm\nthat we employ for inference. Fig. 2 shows the graphical model for the joint distribution of the\nvariables in the model.\n\n2.1 Nonparametric Hierarchical Co-clustering Model\n\nLet xjis \u2208 {0, 1} be an activation variable that indicates whether stimulus s activates voxel i in\nsubject j. The co-clustering model describes the distribution of voxel activations xjis based on\nthe category and the functional units to which stimulus s and voxel i belong. We assume that all\nvoxels within functional unit k have the same probability \u03c6k,l of being activated by a particular\ncategory l of stimuli. Let z = {zji}, (zji \u2208 {1, 2, \u00b7 \u00b7 \u00b7 }) be the set of unit memberships of voxels\nand c = {cs}, (cs \u2208 {1, 2, \u00b7 \u00b7 \u00b7 }) the set of category memberships of the stimuli. Our model of\nco-clustering assumes:\n\nx_{jis} | z_{ji}, c_s, \u03c6  i.i.d. \u223c  Bernoulli(\u03c6_{z_{ji}, c_s}).    (1)\n\nThe set \u03c6 = {\u03c6k,l} of the probabilities of activation of functional units to different categories\nsummarizes the structure in the responses of voxels to stimuli.\n\nWe use the stick-breaking formulation of HDP [18] to construct an in\ufb01nite hierarchical prior for\nvoxel unit memberships:\n\nz_{ji} | \u03b2_j  i.i.d. \u223c  Mult(\u03b2_j),    (2)\n\u03b2_j | \u03c0  i.i.d. \u223c  Dir(\u03b1\u03c0),    (3)\n\u03c0 | \u03b3  \u223c  GEM(\u03b3).    (4)\n\nHere, GEM(\u03b3) is a distribution over in\ufb01nitely long vectors \u03c0 = [\u03c0_1, \u03c0_2, \u00b7 \u00b7 \u00b7 ]^T, named after Grif\ufb01ths,\nEngen and McCloskey [19]. 
This distribution is de\ufb01ned as:\n\n\u03c0_k = v_k \u220f_{k\u2032=1}^{k\u22121} (1 \u2212 v_{k\u2032}),    v_k | \u03b3  i.i.d. \u223c  Beta(1, \u03b3),    (5)\n\nwhere the components of the generated vectors \u03c0 sum to one with probability 1. In subject j,\nvoxel memberships are distributed according to subject-speci\ufb01c weights of functional units \u03b2j. The\nweights \u03b2j are in turn generated by a Dirichlet distribution centered around \u03c0 with a degree of\nvariability determined by \u03b1. Therefore, \u03c0 acts as the group-level expected value of the subject-\nspeci\ufb01c weights. With this prior over the unit memberships of voxels z, the model in principle\nallows an in\ufb01nite number of functional units; however, for any \ufb01nite set of voxels, a \ufb01nite number\nof units is suf\ufb01cient to include all voxels.\n\nWe do not impose a similar hierarchical structure on the clustering of stimuli among subjects.\nConceptually, we assume that stimulus categories re\ufb02ect how the human brain has evolved to\norganize the processing of stimuli within a system and are therefore identical across subjects. Even\nif any variability exists, it will be hard to learn such a complex structure from data since we can\npresent relatively few stimuli in each experiment. Hence, we assume identical clustering c in the\nspace of stimuli for all subjects, with a Dirichlet process prior:\n\nc_s | \u03c1  i.i.d. \u223c  Mult(\u03c1),    \u03c1 | \u03c7  \u223c  GEM(\u03c7).    (6)\n\nFinally, we construct the prior distribution for unit-category activation probabilities \u03c6:\n\n\u03c6_{k,l}  i.i.d. \u223c  Beta(\u03c4_1, \u03c4_2).    (7)\n\n2.2 Model of fMRI Signals\n\nFunctional MRI yields a noisy measure of average neuronal activation in each brain voxel at different\ntime points. 
The standard linear time-invariant model of fMRI signals expresses the contribution of\neach stimulus by the convolution of the spike train of stimulus onsets with a hemodynamic response\nfunction (HRF) [20]. The HRF peaks at about 6-9 seconds, modeling an intrinsic delay between\nthe underlying neural activity and the measured fMRI signal. Accordingly, the measured signal yjit in\nvoxel i of subject j at time t is modeled as:\n\ny_{jit} = \u2211_s b_{jis} G_{st} + \u2211_h e_{jih} F_{ht} + \u01eb_{jit},    (8)\n\nwhere Gst is the model regressor for stimulus s, Fht represents nuisance factor h, such as a baseline\nor a linear temporal trend, at time t and \u01ebjit is Gaussian noise. We use the simplifying assumption\nthroughout that \u01eb_{jit} i.i.d. \u223c Normal(0, \u03bb_{ji}^{\u22121}). In the absence of any priors, the response bjis of voxel i\nto stimulus s can be estimated by solving the least squares regression problem.\n\nUnfortunately, the fMRI signal does not have a meaningful scale and may vary greatly across trials and\nexperiments. In order to use this data for inferences about brain function across subjects, sessions,\nand stimuli, we need to transform it into a standard and meaningful space. The binary activation\nvariables x, introduced in the previous section, achieve this transformation by assuming that in\nresponse to any stimulus a voxel is either in an active or a non-active state, similar to [7]. If voxel\ni is activated by stimulus s, i.e., if xjis = 1, its response takes positive value aji that speci\ufb01es the\nvoxel-speci\ufb01c amplitude of response; otherwise, its response remains 0. We can write bjis = aji xjis\nand assume that aji represents uninteresting variability in fMRI signal. When making inference on\nbinary activation variable xjis, we consider not only the response, but also the level of noise and\nresponses to other stimuli. 
Therefore, the binary activation variables can be directly compared across\ndifferent subjects, sessions, and experiments.\n\nWe assume the following priors on voxel response variables:\n\ne_{jih} \u223c Normal(\u00b5^e_{jh}, \u03c3^e_{jh}),    (9)\na_{ji} \u223c Normal+(\u00b5^a_j, \u03c3^a_j),    (10)\n\u03bb_{ji} \u223c Gamma(\u03ba_j, \u03b8_j),    (11)\n\nwhere Normal+ de\ufb01nes a normal distribution constrained to only take positive values.\n\n2.3 Algorithm\n\nThe size of common fMRI data sets and the space of hidden variables in our model make stochastic\ninference methods, such as Gibbs sampling, prohibitively slow. Currently, there is no faster split-\nmerge-type sampling technique that can be applied to hierarchical nonparametric models [18]. We\ntherefore choose a variational Bayesian inference scheme, which is known to yield faster algorithms.\n\nTo formulate the inference for the hierarchical unit memberships, we closely follow the derivation\nof the Collapsed Variational HDP approximation [21]. We integrate over the subject-speci\ufb01c unit\nweights \u03b2 = {\u03b2j} and introduce a set of auxiliary variables r = {rjk} that represent the number\nof tables corresponding to unit (dish) k in subject (restaurant) j according to the Chinese restaurant\nfranchise formulation of HDP [18]. Let h = {x, z, c, r, a, \u03c6, e, \u03bb, v, u} denote the set of all un-\nobserved variables. Here, v = {vk} and u = {ul} are the stick-breaking fractions corresponding\nto distributions \u03c0 and \u03c1, respectively. We approximate the posterior distribution on the hidden vari-\nables given the observed data p(h|y) by a factorizable distribution q(h). The variational method\nminimizes the Gibbs free energy function F[q] = E[log q(h)] \u2212 E[log p(y, h)] where E[\u00b7] indicates\nexpected value with respect to distribution q. 
We assume a distribution q of the form:\n\nq(h) = q(r|z) \u220f_k q(v_k) \u220f_l q(u_l) \u220f_{k,l} q(\u03c6_{k,l}) \u220f_s q(c_s) \u00b7 \u220f_{j,i} [ q(a_{ji}) q(\u03bb_{ji}) q(z_{ji}) \u220f_s q(x_{jis}) \u220f_h q(e_{jih}) ].\n\nWe apply coordinate descent in the space of q(\u00b7) to minimize the free energy. Since we explicitly\naccount for the dependency of the auxiliary variables on unit memberships in the posterior, we can\nderive closed form update rules for all hidden variables. Due to space constraints in this paper, we\npresent the update rules and their derivations in the Supplementary Material.\n\nIterative application of the update rules leads to a local minimum of the Gibbs free energy. Since\nvariational solutions are known to be biased toward their initial con\ufb01gurations, the initialization\nphase becomes critical to the quality of the results. For initialization of the activation variables xjis,\nwe estimate bjis in Eq. (8) using least squares regression and for each voxel normalize the estimates\nto values between 0 and 1 using the voxel-wise maximum and minimum. We use the estimates\nof b to also initialize \u03bb and e. For memberships, we initialize q(z) by introducing the voxels one\nby one in random order to the collapsed Gibbs sampling scheme [18] constructed for our model\nwith each stimulus as a separate category and the initial x assumed known. We initialize category\nmemberships c by clustering the voxel responses across all subjects. Finally, we set the hyperparam-\neters of the fMRI model such that they match the corresponding statistics computed by least squares\nregression on the data.\n\n3 Results\n\nWe demonstrate the performance of the\nmodel and the inference algorithm on\nboth synthetic and real data. 
As a base-\nline algorithm for comparison, we use the\nBlock Average Co-clustering (BAC) al-\ngorithm [6] with the Euclidean distance.\nFirst, we show that the hierarchical struc-\nture of our algorithm enables us to retrieve\nthe cluster membership more accurately in\nsynthetic group data. Then, we present the\nresults of our method in an fMRI study of\nvisual object recognition.\n\n3.1 Synthetic Data\n\nFigure 3: Comparison between our nonparametric\nBayesian co-clustering algorithm (NBC) and Block\nAverage Co-clustering (BAC) on synthetic data. Both\nclassi\ufb01cation accuracy (CA) and normalized mutual in-\nformation (NMI) are reported for voxels and stimuli in Datasets 1 to 5.\n\nWe generate synthetic data from a stochastic process de\ufb01ned by our model with the set of parameters\n\u03b3 = 3, \u03b1 = 100, \u03c7 = 1, and \u03c41 = \u03c42 = 1, Nj = 1000 voxels, S = 100 stimuli, and J = 4\nsubjects. For the model of the fMRI signals, we use parameters that are representative of our\nexperimental setup and the corresponding hyperparameters estimated from the data. We generate 5\ndata sets with these parameters; they have between 5 and 7 categories and between 13 and 21 units. We apply\nour algorithm directly to the time courses in these 5 data sets. To\napply BAC to the same data sets, we need to \ufb01rst turn the time courses into voxel-stimulus data.\nWe use the least squares estimates of voxel responses (bjis) normalized in the same way as we\ninitialize our fMRI model. We run each algorithm 20 times with different initializations. 
The BAC\nalgorithm is initialized by the result of a soft k-means clustering in the space of voxels. Our method\nis initialized as explained in the previous section. For BAC, we use the true number of clusters while\nour algorithm is always initialized with 15 clusters.\n\nWe evaluate the results of clustering with respect to both voxels and stimuli by comparing cluster-\ning results with the ground truth. Since there is no consensus on the best way to compare different\nclusterings of the same set, here we employ two different clustering distance measures. Let P(k, k\u2032)\ndenote the fraction of data points (voxels or stimuli) assigned to cluster k in the ground truth and k\u2032\nin the estimated clustering. The \ufb01rst measure is the so-called classi\ufb01cation accuracy (CA), which\nis de\ufb01ned as the fraction of data points correctly assigned to the true clusters [22]. To compute this\nmeasure, we need to \ufb01rst match the cluster indices in our results with the true clustering. We \ufb01nd\na one-to-one matching between the two sets of clusters by solving a bipartite graph matching prob-\nlem. We de\ufb01ne the graph such that the two sets of cluster indices represent the nodes and P(k, k\u2032)\nrepresents the weight of the edge between node k and k\u2032. As the second measure, we use the normal-\nized mutual information (NMI), which expresses the proportion of the entropy (information) of the\nground truth clustering that is shared with the estimated clustering. We de\ufb01ne two random variables\nX and Y that take values in the spaces of the true and the estimated cluster indices, respectively.\nAssuming a joint distribution P(X=k, Y=k\u2032) = P(k, k\u2032), we set NMI = I(X; Y)/H(X). Both\nmeasures take values between 0 and 1, with 1 corresponding to perfect clustering.\n\nFig. 
3 presents the clustering quality measures for the two algorithms on the 5 generated data sets.\nAs expected, our method performs consistently better in \ufb01nding the true clustering structure on data\ngenerated by the co-clustering process. Since the two algorithms share the same block co-clustering\nstructure, the advantage of our method is in its model for the hierarchical structure and fMRI signals.\n\n3.2 Experiment\n\nWe apply our method to data from an fMRI study where 8 subjects view 69 distinct images. Each\nimage is repeated on average about 40 times in one of the two sessions in the experiment. The data\nincludes 42 slices of 1.65 mm thickness with in-plane voxel size of 1.5 mm, aligned with the tempo-\nral lobe (ventral visual pathway). As part of the standard preprocessing stream, the data was \ufb01rst\nmotion-corrected separately for the two sessions [23], and then spatially smoothed with a Gaussian\nkernel of 3 mm width. The time course data included 120 volumes per run and from 24 to 40 runs\nfor each subject. We registered the data from the two sessions to the subject\u2019s native anatomical\nspace [24]. We removed noisy voxels from the analysis by performing an ANOVA test and only\nkeeping the voxels for which the stimulus regressors signi\ufb01cantly explained the variation in the time\ncourse (threshold p = 10^{\u22124}, uncorrected). This procedure selects on average about 6,000 voxels for\neach subject. Finally, to remove the idiosyncratic aspects of responses in different subjects, such as\nattention to particular stimuli, we regressed out the subject-average time course from voxel signals\nafter removing the baseline and linear trend. We split trials of each image into two groups of equal\nsize and consider each group as an independent stimulus forming a total of 138 stimuli. 
Hence, we\ncan examine the consistency of our stimulus categorization with respect to identical trials.\n\nWe use \u03b1 = 100, \u03b3 = 5, \u03c7 = 0.1, and \u03c41 = \u03c42 = 1 for the nonparametric prior. We initialize our\nalgorithm 20 times and choose the solution that achieves the lowest Gibbs free energy. Fig. 4 shows\nthe categories that the algorithm \ufb01nds on the data from all 8 subjects. First, we note that stimulus\npairs corresponding to the same image are generally assigned to the same category, con\ufb01rming the\nconsistency of the results across trials. Category 1 corresponds to the scene images and, interestingly,\nalso includes all images of trees. This may suggest a high-level category structure that is not merely\ndriven by low-level features. Such a structure is even more evident in the 4th category where images\nof a tiger that has a large face join human faces. Some other animals are clustered together with\nhuman bodies in categories 2 and 9. Shoes and cars, which have similar shapes, are clustered\ntogether in category 3 while tools are mainly found in category 6.\n\nThe interaction between the learned categories and the functional units is summarized in the poste-\nrior unit-category activation probabilities E[\u03c6k,l] (Fig. 4, right). The algorithm \ufb01nds 18 units across\nall subjects. The largest unit does not show preference for any of the categories. Functional unit 2\nis the most selective one and shows high activation for category 4 (faces). This \ufb01nding agrees with\nprevious studies that have discovered face-selective areas in the brain [25]. Other units show selec-\ntivity for different combinations of categories. 
For instance, Unit 6 prefers categories that mostly\ninclude body parts and animals, unit 8 prefers category 1 (scenes and trees), while the selectivity of\nunit 5 seems to be correlated with the pixel-size of the image.\n\nFigure 4: Categories (left) and activation probabilities of functional units (E[\u03c6k,l]) (right) estimated\nby the algorithm from all 8 subjects in the study.\n\nOur method further learns sets of variables {q(z_{ji}=k)}_{i=1}^{N_j} that represent the probabilities that dif-\nferent voxels in subject j belong to functional unit k. Although the algorithm does not use any\ninformation about the spatial location of voxels, we can visualize the posterior membership proba-\nbilities in each subject as a spatial map. To see whether there is any degree of spatial consistency in\nthe locations of the learned units across subjects, we align the brains of all subjects with the Montreal\nNeurological Institute coordinate space using af\ufb01ne registration [26]. Fig. 5 (left) shows the average\nmaps across subjects for units 2, 5, and 6 in the normalized space. Despite the relative sparsity of\nthe maps, they have signi\ufb01cant overlap across subjects.\n\nFigure 5: (Left) Spatial maps of functional unit overlap across subjects in the normalized space. For\neach voxel, we show the fraction of subjects in the group for which the voxel was assigned to the\ncorresponding functional unit. We see that functional units with similar pro\ufb01les between the two\ndatasets show similar spatial extent as well. (Right) Comparison between the clustering robustness\nin the results of our algorithm (NBC) and the best results of Block Average Co-clustering (BAC) on\nthe real data.\n\nFigure 6: Categories found by our algorithm in group 1 (left) and by BAC in all subjects for (l, k) =\n(14, 14) (right).\n\n
As with many other real world applications of clustering, the validation of results is challenging\nin the absence of ground truth. In order to assess the reliability of the results, we examine their\nconsistency across subjects. We split the 8 subjects into two groups of 4 and perform the analysis\non the two group data separately. Fig. 6 (left) shows the categories found for one of the two groups\n(group 1), which show good agreement with the categories found in the data from all subjects (cat-\negories are indexed based on the result of graph matching). As a way to quantify the stability of\nclustering across subjects, we compute the measures CA and NMI for the results in the two groups\nrelative to the results in the 8 subjects. We also apply the BAC algorithm to response values esti-\nmated via least squares regression in all 8 subjects and the two groups. 
Since the number of units\nand categories is not known a priori, we perform the BAC algorithm for all pairs of (l, k) such that\n5 \u2264 l \u2264 15 and k \u2208 {10, 12, 14, 16, 18, 20}. Fig. 5 (right) compares the clustering measures for\nour method with those found by the best BAC results in terms of average CA and NMI measures\n(achieved with (l, k) = (6, 14) for CA, and (l, k) = (14, 14) for NMI). Fig. 6 (right) shows the\ncategories for (l, k) = (14, 14), which appear to lack some of the structures found in our results.\nWe also obtain better measures of stability compared to the best BAC results for clustering stimuli,\nwhile the measures are similar for clustering voxels. We note that in contrast to the results of BAC,\nour \ufb01rst unit is always considerably larger than all the others including about 70% of voxels. This\nseems neuroscienti\ufb01cally plausible since we expect large areas of the visual cortex to be involved in\nprocessing low level features and therefore incapable of distinguishing different objects.\n\n4 Conclusion\n\nThis paper proposes a model for learning large-scale functional structures in the brain responses of\na group of subjects. We assume that the structure can be summarized in terms of functional units\nwith similar responses to categories of stimuli. We derive a variational Bayesian inference scheme\nfor our hierarchical nonparametric Bayesian model and apply it to both synthetic and real data. In\nan fMRI study of visual object recognition, our method \ufb01nds meaningful structures in both object\ncategories and functional units.\n\nThis work is a step toward devising models for functional brain imaging data that explicitly en-\ncode our hypotheses about the structure in the brain functional organization. The assumption that\nfunctional units, categories, and their interactions are suf\ufb01cient to describe the structure, although\nproved successful here, may be too restrictive in general. 
A more detailed characterization may\nbe achieved through a feature-based representation where a stimulus can simultaneously be part of\nseveral categories (features). Likewise, a more careful treatment of the structure in the organization\nof brain areas may require incorporating spatial information. In this paper, we show that we can turn\nsuch basic insights into principled models that allow us to investigate the structures of interest in\na data-driven fashion. By incorporating the properties of brain imaging signals into the model, we\nbetter utilize the data for making relevant inferences across subjects.\n\n8\n\n\fAcknowledgments\nWe thank Ed Vul, Po-Jang Hsieh, and Nancy Kanwisher for the insight they have offered us throughout our\ncollaboration, and also for providing the fMRI data. This research was supported in part by the NSF grants\nIIS/CRCNS 0904625, CAREER 0642971, the MIT McGovern Institute Neurotechnology Program grant, and\nNIH grants NIBIB NAMIC U54-EB005149 and NCRR NAC P41-RR13218.\n\nReferences\n[1] N. Kriegeskorte, M. Mur, D.A. Ruff, R. Kiani, J. Bodurka, H. Esteky, K. Tanaka, and P.A. Bandettini.\nMatching categorical object representations in inferior temporal cortex of man and monkey. Neuron,\n60(6):1126\u20131141, 2008.\n\n[2] B. Thirion and O. Faugeras. Feature characterization in fMRI data: the Information Bottleneck approach.\n\nMedIA, 8(4):403\u2013419, 2004.\n\n[3] D. Lashkari and P. Golland. Exploratory fMRI analysis without spatial normalization. In IPMI, 2009.\n[4] D. Lashkari, E. Vul, N. Kanwisher, and P. Golland. Discovering structure in the space of fMRI selectivity\n\npro\ufb01les. NeuroImage, 50(3):1085\u20131098, 2010.\n\n[5] D. Lashkari, R. Sridharan, E. Vul, P.J. Hsieh, N. Kanwisher, and P. Golland. Nonparametric hierarchical\n\nBayesian model for functional brain parcellation. In MMBIA, 2010.\n\n[6] A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D.S. Modha. 
A generalized maximum entropy approach\nto Bregman co-clustering and matrix approximation. JMLR, 8:1919\u20131986, 2007.\n\n[7] S. Makni, P. Ciuciu, J. Idier, and J.-B. Poline. Joint detection-estimation of brain activity in functional\nMRI: a multichannel deconvolution solution. TSP, 53(9):3488\u20133502, 2005.\n\n[8] Y. Cheng and G.M. Church. Biclustering of expression data. In ISMB, 2000.\n\n[9] S.C. Madeira and A.L. Oliveira. Biclustering algorithms for biological data analysis: a survey. TCBB,\n1(1):24\u201345, 2004.\n\n[10] Y. Kluger, R. Basri, J.T. Chang, and M. Gerstein. Spectral biclustering of microarray data: coclustering\ngenes and conditions. Genome Research, 13(4):703\u2013716, 2003.\n\n[11] B. Long, Z.M. Zhang, and P.S. Yu. A probabilistic framework for relational clustering. In ACM SIGKDD,\n2007.\n\n[12] D. Lashkari and P. Golland. Coclustering with generative models. CSAIL Technical Report, 2009.\n\n[13] C. Kemp, J.B. Tenenbaum, T.L. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with\nan infinite relational model. In AAAI, 2006.\n\n[14] K.A. Norman, S.M. Polyn, G.J. Detre, and J.V. Haxby. Beyond mind-reading: multi-voxel pattern analysis\nof fMRI data. Trends in Cognitive Sciences, 10(9):424\u2013430, 2006.\n\n[15] C.F. Beckmann and S.M. Smith. Probabilistic independent component analysis for functional magnetic\nresonance imaging. TMI, 23(2):137\u2013152, 2004.\n\n[16] M.J. McKeown, S. Makeig, G.G. Brown, T.P. Jung, S.S. Kindermann, A.J. Bell, and T.J. Sejnowski.\nAnalysis of fMRI data by blind separation into independent spatial components. Hum Brain Mapp,\n6(3):160\u2013188, 1998.\n\n[17] D. Endres and P. F\u00f6ldi\u00e1k. Interpreting the neural code with Formal Concept Analysis. In NIPS, 2009.\n\n[18] Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei. Hierarchical Dirichlet processes. JASA, 101(476):1566\u20131581, 2006.\n\n[19] J. Pitman. 
Poisson\u2013Dirichlet and GEM invariant distributions for split-and-merge transformations of an\ninterval partition. Combinatorics, Prob, Comput, 11(5):501\u2013514, 2002.\n\n[20] K.J. Friston, A.P. Holmes, K.J. Worsley, J.P. Poline, C.D. Frith, R.S.J. Frackowiak, et al. Statistical parametric\nmaps in functional imaging: a general linear approach. Hum Brain Mapp, 2(4):189\u2013210, 1994.\n\n[21] Y.W. Teh, K. Kurihara, and M. Welling. Collapsed variational inference for HDP. In NIPS, 2008.\n\n[22] M. Meil\u0103 and D. Heckerman. An experimental comparison of model-based clustering methods. Machine\nLearning, 42(1):9\u201329, 2001.\n\n[23] R.W. Cox and A. Jesmanowicz. Real-time 3D image registration for functional MRI. Magn Reson Med,\n42(6):1014\u20131018, 1999.\n\n[24] D.N. Greve and B. Fischl. Accurate and robust brain image alignment using boundary-based registration.\nNeuroImage, 48(1):63\u201372, 2009.\n\n[25] N. Kanwisher and G. Yovel. The fusiform face area: a cortical region specialized for the perception of\nfaces. R Soc Lond Phil Trans, Series B, 361(1476):2109\u20132128, 2006.\n\n[26] J. Talairach and P. Tournoux. Co-planar Stereotaxic Atlas of the Human Brain. Thieme, New York, 1988.\n", "award": [], "sourceid": 513, "authors": [{"given_name": "Danial", "family_name": "Lashkari", "institution": null}, {"given_name": "Ramesh", "family_name": "Sridharan", "institution": null}, {"given_name": "Polina", "family_name": "Golland", "institution": null}]}