{"title": "Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images", "book": "Advances in Neural Information Processing Systems", "page_first": 1337, "page_last": 1344, "abstract": "We propose a novel hierarchical, nonlinear model that predicts brain activity in area V1 evoked by natural images. In the study reported here, brain activity was measured by means of functional magnetic resonance imaging (fMRI), a noninvasive technique that provides an indirect measure of neural activity pooled over a small volume (~ 2mm cube) of brain tissue. Our model, which we call the SpAM V1 model, is based on the reasonable assumption that fMRI measurements reflect the (possibly nonlinearly) pooled, rectified output of a large population of simple and complex cells in V1. It has a hierarchical filtering stage that consists of three layers: model simple cells, model complex cells, and a third layer in which the complex cells are linearly pooled (called \u201cpooled-complex\u201d cells). The pooling stage then obtains the measured fMRI signals as a sparse additive model (SpAM) in which a sparse nonparametric (nonlinear) combination of model complex cell and model pooled-complex cell outputs is summed. Our results show that the SpAM V1 model predicts fMRI responses evoked by natural images better than a benchmark model that only provides linear pooling of model complex cells. Furthermore, the spatial receptive fields, frequency tuning and orientation tuning curves of the SpAM V1 model estimated for each voxel appear to be consistent with the known properties of V1, and with previous analyses of this data set. A visualization procedure applied to the SpAM V1 model shows that most of the nonlinear pooling consists of simple compressive or saturating nonlinearities.", "full_text": "Nonparametric sparse hierarchical models\n\ndescribe V1 fMRI responses to natural images\n\nPradeep Ravikumar, Vincent Q. 
Vu and Bin Yu\n\nDepartment of Statistics\n\nUniversity of California, Berkeley\n\nBerkeley, CA 94720-3860\n\nThomas Naselaris, Kendrick N. Kay and Jack L. Gallant\n\nDepartment of Psychology\n\nUniversity of California, Berkeley\n\nBerkeley, CA\n\nAbstract\n\nWe propose a novel hierarchical, nonlinear model that predicts brain activity in\narea V1 evoked by natural images. In the study reported here, brain activity was\nmeasured by means of functional magnetic resonance imaging (fMRI), a nonin-\nvasive technique that provides an indirect measure of neural activity pooled over\na small volume (\u2248 2mm cube) of brain tissue. Our model, which we call the\nV-SPAM model, is based on the reasonable assumption that fMRI measurements\nre\ufb02ect the (possibly nonlinearly) pooled, recti\ufb01ed output of a large population of\nsimple and complex cells in V1. It has a hierarchical \ufb01ltering stage that consists\nof three layers: model simple cells, model complex cells, and a third layer in\nwhich the complex cells are linearly pooled (called \u201cpooled-complex\u201d cells). The\npooling stage then obtains the measured fMRI signals as a sparse additive model\n(SpAM) in which a sparse nonparametric (nonlinear) combination of model com-\nplex cell and model pooled-complex cell outputs is summed. Our results show\nthat the V-SPAM model predicts fMRI responses evoked by natural images bet-\nter than a benchmark model that only provides linear pooling of model complex\ncells. Furthermore, the spatial receptive \ufb01elds, frequency tuning and orientation\ntuning curves of the V-SPAM model estimated for each voxel appear to be con-\nsistent with the known properties of V1, and with previous analyses of this data\nset. 
A visualization procedure applied to the V-SPAM model shows that most of\nthe nonlinear pooling consists of simple compressive or saturating nonlinearities.\n\n1 Introduction\n\nAn important step toward understanding the neural basis of vision is to develop computational mod-\nels that describe how complex visual stimuli are mapped onto evoked neuronal responses. This task\nis made challenging in part by the inherent dif\ufb01culty of obtaining neurophysiological recordings\nfrom single neurons in vivo. An alternative approach is to base models on brain activity measured\nby means of functional magnetic resonance imaging (fMRI). fMRI measures changes in blood oxy-\ngenation and \ufb02ow throughout the brain that occur as a consequence of metabolic demands. Although\nthe relationship between measured fMRI activity and the spiking activity of neurons is rather com-\nplex, as a \ufb01rst-order approximation the fMRI signal can be considered to be monotonically related\nto the pooled activity of the underlying neural population.\nIn this paper we consider the task of predicting fMRI brain activity evoked by a series of gray-\nscale natural images. Natural images are a useful stimulus set for ef\ufb01ciently probing the visual\nsystem, because they are likely to evoke responses from both early visual areas and from more central,\nhighly nonlinear visual areas. The fMRI scanner provides a three-dimensional image of the brain\nwith a spatial resolution of a few cubic millimeters and fairly low temporal resolution (about 0.5\u20131\nHz). After pre-processing, the fMRI signals are represented as a vector of three-dimensional volume\nelements called voxels. Here we restrict our analysis to voxels sampled from visual area V1, the\nprimary visual area in humans.\nThere are two problems that make predicting evoked responses of fMRI voxels dif\ufb01cult. First, fMRI\nsignals are noisy and non-stationary in time. 
Second, each voxel re\ufb02ects the combined in\ufb02uence of\nhundreds of thousands of neurons [4]. fMRI scans of a single voxel in human V1 likely re\ufb02ect the\nnonlinearly-pooled, recti\ufb01ed outputs of two functionally distinct classes of neurons: simple cells that\nare sensitive to spatial phase, and phase-invariant complex cells [2]. Even if an accurate predictive\nmodel is obtained, there remains the issue of interpretability. It is not suf\ufb01cient to construct a model\nthat provides good predictions but whose function remains opaque (i.e., a black box). In order for\na predictive model to advance our understanding of the brain, the function of any predictive model\nmust be conceptually interpretable.\nIn this paper we propose a new model that aims to overcome some of these problems. Our V-SPAM\nmodel is a hierarchical and sparse nonparametric additive model. It combines a biologically-inspired\nhierarchical \ufb01ltering scheme with a nonlinear (nonparametric) pooling of the outputs from various\nlevels of the hierarchical \ufb01ltering stage. The model is estimated separately for each recorded fMRI\nvoxel using a \ufb01t data set, and then its predictions are evaluated against an entirely separate data set\nreserved for this purpose.\nThe \ufb01ltering component of the model consists of three distinct layers: simple cells, complex cells,\nand linear combinations of the complex cells (here called pooled-complex cells). The fMRI response\nis then modeled as a sparse additive combination of nonlinear (nonparametric) functions of the\ncomplex and pooled-complex cell model outputs. This last step automatically learns the optimal\ncombinatorial output nonlinearity of the hierarchical \ufb01ltering stage, and so permits us to model\nnonlinear V1 responses not captured by the simple and complex cell model components alone [6].\nThe fMRI dataset used in this paper was collected as part of an earlier study by [5]. 
That study\nalso used a \ufb01ltering model to describe the relationship between natural images and evoked fMRI\nsignals, and used the estimated models in turn to decode (identify) images. However, the earlier\nstudy only provided linear pooling of model complex cell \ufb01lters. Our results show that the V-SPAM\nmodel predicts fMRI responses evoked by natural images better than does the earlier linear pooling\nmodel. Furthermore, the spatial receptive \ufb01elds, frequency tuning and orientation tuning curves of\nthe V-SPAM model estimated for each voxel appear to be consistent with the known properties of\nV1, and with the previous results [5].\n\n2 Background\n\n2.1 Sparse Additive Models\nThe regression task consists of estimating the regression function $E(Y \\mid X)$ for a real-valued response\n$Y \\in \\mathbb{R}$ and a predictor vector $X = (X_1, \\ldots, X_p) \\in \\mathbb{R}^p$ from data $\\{(X_i, Y_i),\\ i = 1, \\ldots, n\\}$.\nIn the nonparametric regression model, the response is $Y_i = m(X_i) + \\epsilon_i$, where $m$ is a general\nsmooth function. Estimating this function (i.e., smoothing) becomes challenging when the number\nof predictors $p$ is large. Even estimating linear models of the form $Y_i = \\beta^\\top X_i + \\epsilon_i$ is\nchallenging in these high-dimensional settings. For linear models, however, when the vector $\\beta$ is\nsparse, Tibshirani [8] and others have shown that the $\\ell_1$-penalized estimator (also called the Lasso),\n$\\hat{\\beta} = \\arg\\min_{\\beta} \\sum_i (Y_i - \\beta^\\top X_i)^2 + \\lambda \\sum_{j=1}^p |\\beta_j|$, can estimate a sparse model and has strong theoretical\nproperties.\nThe sparse additive model (SpAM) framework of Ravikumar et al. [7] extends these sparse linear\nmodels to the nonparametric domain. 
In additive models, introduced by Hastie and Tibshirani [3],\nthe response $Y$ is an additive combination of functions of the predictors, $Y = \\sum_{j=1}^p f_j(X_j) + \\epsilon$.\nHere the functions $\\{f_j\\}$ are constrained to lie in a class of smooth functions, such as the space of\nfunctions with square integrable double derivatives (i.e., the Sobolev space of order two). A sparse\nadditive model then imposes a sparsity constraint on the set $J = \\{j : f_j \\not\\equiv 0\\}$ of functions $f_j$ that\nare nonzero.\n\n2.2 Fitting Algorithm for Sparse Additive Models\n\nThe paper [7] proposes a \ufb01tting procedure for sparse additive models that has good statistical properties\neven in the large p small n regime. Their SpAM \ufb01tting algorithm is summarized in Figure 1.\nIt performs a coordinate descent (in the $L_2(P_n)$ space, with $P_n$ the sample distribution). At each\nstep the algorithm performs nonparametric regression of the current residual onto a single predictor,\nand then does a soft threshold.\n\nInput: Data $(X_i, Y_i)$, regularization parameter $\\lambda$.\nInitialize $f_j = f_j^{(0)}$, for $j = 1, \\ldots, p$.\nIterate until convergence:\n  For each $j = 1, \\ldots, p$:\n    Compute the residual: $R_j = Y - \\sum_{k \\neq j} f_k(X_k)$;\n    Estimate the conditional expectation $P_j = E[R_j \\mid X_j]$ by smoothing: $\\hat{P}_j = S_j R_j$;\n    Set $\\hat{s}_j^2 = n^{-1} \\sum_{i=1}^n \\hat{P}_j^2(i)$;\n    Soft-threshold: $f_j = [1 - \\lambda/\\hat{s}_j]_+ \\hat{P}_j$;\n    Center: $f_j \\leftarrow f_j - \\mathrm{mean}(f_j)$.\nOutput: Component functions $f_j$ and estimator $\\hat{m}(X_i) = \\sum_j f_j(X_{ij})$.\n\nFigure 1: THE SPAM BACKFITTING ALGORITHM\n\n3 A model for pooled neural activity of voxels\n\nOur V-SPAM model combines a biologically-inspired \ufb01ltering scheme and a novel algorithm that\npermits nonlinear pooling of the outputs of the \ufb01ltering stage. 
The \ufb01ltering stage itself consists of\nthree distinct layers, arranged hierarchically: simple cells, complex cells, and linear combinations\nof the complex cells (here called pooled-complex cells). The output of this \ufb01ltering operation is then\nfed to an algorithm that estimates a nonlinear pooling function that optimizes predictive power.\n\n3.1 Simple Cell Model\n\nThe \ufb01rst stage of the hierarchical \ufb01lter is inspired by simple cells that are known to exist in area V1.\nThe receptive \ufb01elds of V1 simple cells are known to be generally consistent with a Gabor wavelet\nmodel [6]. Most importantly, they are spatially localized, oriented, spatial frequency band-pass and\nphase selective. (See Figure 2.)\n\nFigure 2: Gabor wavelets. Each row shows a family of Gabor wavelets that share a common spatial\nlocation and frequency, but differ in orientation. This is only a small fraction of all of the wavelets\nin the pyramid.\n\nIn our model the simple cell \ufb01lter bank was implemented as a Gabor wavelet pyramid, as follows.\nLet $I$ denote an image, and $d$ the number of pixels. It can thus be represented as a pixel vector\nin $\\mathbb{R}^d$. Denote by $\\phi_j$ a Gabor wavelet sampled on a grid the size of the image, so that it too can\nbe represented as a vector in $\\mathbb{R}^d$. Then our simple cell model, for the activation given the image $I$\nas stimulus, is given by $X_j(I) = [\\langle \\phi_j, I \\rangle]_+$, where $\\langle \\cdot, \\cdot \\rangle$ is the Euclidean inner product, and $[\\cdot]_+$\nis a non-negative recti\ufb01cation. (See Figure 3.) Correspondingly, $X_j(I) = [\\langle -\\phi_j, I \\rangle]_+$ gives the\nactivation of the 180\u00b0 spatial phase counterpart.\n\n[Diagram stages: image, Gabor wavelet, non-negative recti\ufb01cation, output]\n\nFigure 3: Simple cell model. 
The activation of a model simple cell given an image is the inner\nproduct of the image with a Gabor wavelet, followed by a non-negative recti\ufb01cation.\n\n3.2 Complex Cell Model\n\nThe second stage of the hierarchical \ufb01lter is inspired by complex cells that are also known to exist in\narea V1. Complex cells are similar to simple cells, except they are not sensitive to spatial phase. In\nour model the complex cell \ufb01lter bank was implemented by taking the sum of squares of the outputs\nof four simple cells (corresponding to the wavelet pairs that are identical up to phase), followed by\na \ufb01xed output nonlinearity. The activation of the model complex cell given an image $I$ is given by\n\n$X_j(I) = \\log\\left(1 + \\sqrt{[\\langle \\phi_j, I \\rangle]_+^2 + [\\langle -\\phi_j, I \\rangle]_+^2 + [\\langle \\phi'_j, I \\rangle]_+^2 + [\\langle -\\phi'_j, I \\rangle]_+^2}\\right)$ (1)\n\n$= \\log\\left(1 + \\sqrt{\\langle \\phi_j, I \\rangle^2 + \\langle \\phi'_j, I \\rangle^2}\\right)$, (2)\n\nwhere $\\phi_j$ and $\\phi'_j$ are Gabor wavelets identical up to phase (also called a quadrature pair; see Figure 4).\n\n[Diagram stages: image, Gabor wavelet quadrature pair, squaring, summation, \ufb01xed nonlinearity, output]\n\nFigure 4: Complex cell model. The activation of a model complex cell given an image is the sum of\nsquares of the inner products of the image with a quadrature pair of Gabor wavelets followed by a\nnonlinearity. This is equivalently modeled by summing the squares of 4 simple cell model outputs,\nfollowed by a nonlinearity.\n\n3.3 Pooled-complex Cell Model\n\nThe hierarchical \ufb01ltering component of our model also includes a third \ufb01ltering stage, linear pooling\nof complex cells sharing a common spatial location and frequency. 
This stage has no direct biolog-\nical interpretation in terms of area V1, but has been included to improve representational power of\nthe model: a linear combination of complex cells (the pooled-complex cell), followed by a nonlin-\nearity, cannot be expressed as an additive combination of nonlinear functions of individual complex\ncells. Note that this element might be particularly useful for modeling responses in higher visual\nareas beyond V1.\nIf $\\{X_{j_1}, \\ldots, X_{j_k}\\}$ correspond to complex cells with the same spatial location and frequency, then\nthe corresponding pooled-complex cell (which thus sums over different orientations) is given by\n\n$Z_{j_1 \\ldots j_k} = \\sum_{l=1}^k X_{j_l}$. (See Figure 5.)\n\n[Diagram: image, complex cells, summation, output]\n\nFigure 5: Pooled-complex cell model. Subsets of complex cells that share a common spatial location\nand frequency are summed.\n\n3.4 V-SPAM model\n\nFinally, the predicted fMRI response $Y$ is obtained as a sparse additive combination of complex\ncell and pooled-complex cell outputs. Denote the complex cell outputs by $\\{X_1, \\ldots, X_p\\}$, and the\npooled-complex cell outputs by $\\{Z_1, \\ldots, Z_q\\}$. Then the fMRI response $Y$ is modeled as a sparse\nadditive (nonparametric) model, $Y = \\sum_{j=1}^p f_j(X_j) + \\sum_{l=1}^q g_l(Z_l) + \\epsilon$. Figure 6 summarizes the\nentire V-SPAM model, including both \ufb01ltering and pooling components.\n\n[Diagram stages: image, simple cell outputs, complex cell outputs, pooled-complex cell outputs, nonlinearities, summation, fMRI voxel response]\n\nFigure 6: V-SPAM model. The fMRI voxel response is modeled as the summation of nonlinear\nfunctions of complex and pooled-complex cell outputs. 
The connections and components in the\ndashed region are to be estimated from the data under the assumption that many of them are null.\n\n4 Experiments\n\n4.1 Data description\n\nThe data set analyzed in this paper consists of a total of 1,294 voxels recorded from area V1 of\none human observer. A 4T Varian MRI scanner provided voxels of size 2mm x 2mm x 2.5mm\nat a frequency of 1Hz. The visual stimuli used in the experiment consisted of 1,750 20-by-20\ndegree grayscale natural images, masked by a circular aperture. A two-stage procedure was used for\ndata collection. In the \ufb01rst stage, 1,750 natural images were presented to the subject 2 times each.\nThis data set was used to \ufb01t the model. In the second stage, 120 additional natural images were\npresented 13 times each. This data set was used for model validation. (Note that the images used for\nestimation and validation were distinct.) In all cases images were \ufb02ashed brie\ufb02y 3 times during a 1\nsecond display period, and there was a blank period of 3 seconds between successive images. After\nacquisition the fMRI signals were pre-processed to reduce temporal non-stationarity and increase\nsignal-to-noise [5]. Complete details of the fMRI experiment can be found in [5].\n\n4.2 V-SPAM model \ufb01tting\n\nThe V-SPAM model was \ufb01tted separately for each of the 1,294 voxels using the training set of\n1,750 images and the evoked fMRI responses. The \ufb01tting procedure can be conceptualized in four\nsuccessive stages that roughly parallel the hierarchical layers of the model itself.\nIn the \ufb01rst stage, the model complex cell outputs are computed according to equation (2) using a\npyramid (or family) of Gabor wavelets sampled on a grid of 128 x 128 pixels. The pyramid includes\n6 spatial frequencies (or scales): 1, 2, 4, 8, 16, and 32 cycles/\ufb01eld of view. At each spatial frequency\n\u03c9 the wavelets are positioned evenly on a \u03c9 \u00d7 \u03c9 grid covering the image. 
All combinations of 8\norientations and 2 phases occur at each of the \u03c9 \u00d7 \u03c9 positions. In total, the pyramid consists of\n10,920 quadrature pairs plus 1 constant wavelet (corresponding to mean luminance).\nIn the second stage, the model complex cell outputs are pre-screened in order to eliminate complex\ncell outputs that are unrelated to a voxel\u2019s response, and to reduce the computational complexity\nof successive stages of \ufb01tting. This is accomplished by considering the squared-correlation of the\nresponse of each complex cell with the evoked voxel response, using the 1,750 images in the training\nset. Only the top k complex cells are retained. In pilot studies we found empirically that k = 100\nwas enough to give good statistical and computational performance (data not shown).\nIn the third stage, pooled-complex cells (see Section 3) are formed from the complex cell outputs\nthat passed the pre-screening in \ufb01tting stage 2.\nIn the fourth and \ufb01nal stage, the complex and pooled-complex cell responses to the images in the\ntraining set are used as predictors in the SpAM \ufb01tting algorithm (see Figure 1), and this is optimized\nto \ufb01t the voxel responses evoked by the same 1,750 images in the training set. The smoothing is done\nby means of Gaussian kernel regression with plug-in bandwidth, and the regularization parameter is\nselected by the Akaike information criterion (AIC).\n\n4.3 Model validation\n\nFor each voxel, we evaluate the \ufb01tted V-SPAM models by computing the predictive R2 (squared\ncorrelation) of the predicted and actual fMRI responses evoked by each of the 120 images in the\nvalidation set.\nTo permit a more complete evaluation of the V-SPAM model, we used the same data to \ufb01t a simpler\nmodel more directly comparable to the one used in earlier work with this data set [5]. 
The sparse\nlinear pooling model aims to predict each voxel\u2019s response as a linear combination of all 10,921\nestimated complex cell outputs. This model has the form $Y(I) = \\beta_0 + \\sum_{j=1}^p \\beta_j X_j(I) + \\epsilon$, where\nthe $X_j(I)$ are the complex cell outputs estimated according to (2), with the p = 10,921 Gabor\nwavelets described in Section 4.2. The coef\ufb01cients $\\beta_j$, $j = 0, \\ldots, p$, were estimated by L2 Boosting\n[1] with the stopping criterion determined by 5-fold cross-validation within the same data set. This\nmodel is a sparsi\ufb01ed version of the one used in [5], and has comparable prediction performance.\n\n5 Results\n\nFigure 7 (left) shows a scatterplot comparing the performance of the V-SPAM model with that of\nthe sparse linear pooling model for all 1,294 voxels. The vertical axis gives performance of the\nV-SPAM model, and the horizontal axis the sparse linear pooling model. Each point corresponds\nto a single voxel. The inset region contains 429 voxels for which both models had some predictive\npower (R2 \u2265 0.1). For these voxels, the relative improvement of the V-SPAM model over the sparse\nlinear pooling model is shown in the histogram to the right. The predictions of the V-SPAM model\nwere on average 14% better than those of the sparse linear pooling model (standard deviation 17%).\n\n5.1 Estimated receptive \ufb01elds and tuning curves\n\nFigure 8 shows the spatial receptive-\ufb01elds (RF\u2019s) and joint frequency and orientation tuning curves\nestimated using the V-SPAM model for 3 voxels. These voxels were chosen because they had high\npredictive power (R2\u2019s of 0.65, 0.59, and 0.63, respectively from left to right) and so were modeled\naccurately. The upper row of the \ufb01gure shows the spatial RF of each voxel. The intensity at each\n\nFigure 7: Predictive R2 of the \ufb01tted V-SPAM model compared against the \ufb01tted sparse linear pooling\nmodel. 
(Left) Each of the 1,294 points in the scatterplot corresponds to a single voxel. 
(Right)\nRelative performance for the 429 voxels contained in the inset region on the left.\n\nlocation in the spatial RF represents the standardized predicted response of the voxel to an image\nstimulus consisting of a single pixel at that location. The spatial RF\u2019s of these voxels are clearly\nlocalized in space, consistent with the known retinotopic organization of V1 and previous fMRI\nresults [9]. The lower row of Figure 8 shows the joint frequency and orientation tuning properties\nof these same 3 voxels. Here the tuning curves were estimated by computing the predicted response\nof the \ufb01tted voxel model to cosine gratings of varying orientation (degrees) and spatial frequency\n(cycles/\ufb01eld of view). All of the voxels are tuned to spatial frequencies above about 8 cycles/\ufb01eld of\nview, while orientation tuning varies from voxel to voxel. The joint spatial frequency and orientation\ntuning of all 3 voxels appears to be non-separable (i.e.\ntheir orientation tuning is not a constant\nfunction of frequency).\n\nFigure 8: (upper) Spatial receptive-\ufb01elds (RF\u2019s) and (lower) joint frequency and orientation tuning\ncurves estimated by the V-SPAM model for 3 voxels with high predictive power (R2\u2019s of 0.65, 0.59,\n0.63, left to right). Each location in the spatial RF shows the standardized predicted response of\nthe voxel to an image consisting of a single pixel at that location. The tuning curves show the\nstandardized predicted response of the voxel to cosine gratings of varying orientation (degrees) and\nspatial frequency (cycles/\ufb01eld of view).\n\n5.2 Nonlinearities\n\nOne of the potential advantages of the V-SPAM model over other approaches is that it can reveal\nnovel nonlinear tuning and pooling properties, as revealed by the nonlinear summation occurring\nin the \ufb01nal stage of the V-SPAM model. Figure 9 illustrates some of these functions estimated for\na typical voxel with high predictive power (R2 of 0.63). 
These correspond to the nonlinearities\nappearing in the \ufb01nal stage of the V-SPAM model (see Figure 6). Here the horizontal axis is the\ninput in standard units of the corresponding model complex or pooled-complex cell outputs, and\nthe vertical axis is the output in standard units of predicted responses. For this voxel, these are the\n4 largest 
(ranked by L2 norm) nonlinearities. All 4 of these nonlinearities are compressive. The\nremaining 75 nonlinearities present in the voxel\u2019s \ufb01tted model have similar shapes, but are much\nsmaller and hence contribute less to the predicted response. They are overlaid in the \ufb01nal panel of\nFigure 9.\n\nFigure 9: Nonlinearities estimated in the V-SPAM model for a voxel with high predictive power\n(R2: 0.63). The 4 largest (ranked by L2 norm) are shown left to right by the thick lines. The other\n75 nonlinearities for this voxel (overlaid in the right panel) are smaller and contribute less to the\npredicted response.\n\n6 Discussion and conclusions\n\nOur V-SPAM model provides better predictions of fMRI activity evoked by natural images than\ndoes a sparse linear model similar to that used in an earlier study of this data set [5]. This increased\npredictive power of the V-SPAM model re\ufb02ects the fact that it can describe explicitly the nonlinear\npooling that likely occurs among the many neurons whose pooled activity contributes to measured\nfMRI signals. These pooled output nonlinearities are likely a critical component of nonlinear com-\nputation across the visual hierarchy. Therefore, the SpAM framework may be particularly useful for\nmodeling neurons or fMRI signals recorded in higher and more nonlinear stages of visual processing\nbeyond V1.\n\nReferences\n[1] Peter B\u00fchlmann and Bin Yu. Boosting with the L2 loss: Regression and classi\ufb01cation. Journal of the\nAmerican Statistical Association, 98(462):324\u2013339, 2003.\n\n[2] R. L. De Valois and K. K. De Valois. Spatial Vision. Oxford University Press, 1990.\n[3] Trevor Hastie and Robert Tibshirani. Generalized additive models. Chapman & Hall Ltd., 1999.\n[4] D. J. Heeger, A. C. Huk, W. S. Geisler, and D. G. Albrecht. Spikes versus BOLD: what does neuroimaging\ntell us about neuronal activity? Nat Neurosci, 3(7):631\u2013633, 2000.\n\n[5] Kendrick N. Kay, Thomas Naselaris, Ryan J. 
Prenger, and Jack L. Gallant. Identifying natural images from\nhuman brain activity. Nature, 452(7185):352\u2013355, 2008.\n\n[6] Bruno A. Olshausen and David J. Field. Emergence of simple-cell receptive \ufb01eld properties by learning a\nsparse code for natural images. Nature, 381(6583):607\u2013609, June 1996.\n\n[7] Pradeep Ravikumar, Han Liu, John Lafferty, and Larry Wasserman. SpAM: Sparse additive models. Neural\nInformation Processing Systems, 2007.\n\n[8] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., 58, No. 1:267\u2013288,\n1996.\n\n[9] Brian A. Wandell, Serge O. Dumoulin, and Alyssa A. Brewer. Visual \ufb01eld maps in human cortex. Neuron,\n56(2):366\u2013383, 2007.\n", "award": [], "sourceid": 963, "authors": [{"given_name": "Vincent", "family_name": "Vu", "institution": null}, {"given_name": "Bin", "family_name": "Yu", "institution": null}, {"given_name": "Thomas", "family_name": "Naselaris", "institution": null}, {"given_name": "Kendrick", "family_name": "Kay", "institution": null}, {"given_name": "Jack", "family_name": "Gallant", "institution": null}, {"given_name": "Pradeep", "family_name": "Ravikumar", "institution": null}]}