{"title": "Adaptive stimulus selection for optimizing neural population responses", "book": "Advances in Neural Information Processing Systems", "page_first": 1396, "page_last": 1406, "abstract": "Adaptive stimulus selection methods in neuroscience have primarily focused on maximizing the firing rate of a single recorded neuron. When recording from a population of neurons, it is usually not possible to find a single stimulus that maximizes the firing rates of all neurons. This motivates optimizing an objective function that takes into account the responses of all recorded neurons together. We propose \u201cAdept,\u201d an adaptive stimulus selection method that can optimize population objective functions. In simulations, we first confirmed that population objective functions elicited more diverse stimulus responses than single-neuron objective functions. Then, we tested Adept in a closed-loop electrophysiological experiment in which population activity was recorded from macaque V4, a cortical area known for mid-level visual processing. To predict neural responses, we used the outputs of a deep convolutional neural network model as feature embeddings. Images chosen by Adept elicited mean neural responses that were 20% larger than those for randomly-chosen natural images, and also evoked a larger diversity of neural responses. Such adaptive stimulus selection methods can facilitate experiments that involve neurons far from the sensory periphery, for which it is often unclear which stimuli to present.", "full_text": "Adaptive stimulus selection for optimizing neural\n\npopulation responses\n\nBenjamin R. Cowley1,2, Ryan C. Williamson1,2,5, Katerina Acar2,6,\n\nMatthew A. Smith\u2217,2,7, Byron M. Yu\u2217,2,3,4\n\n1Machine Learning Dept., 2Center for Neural Basis of Cognition, 3Dept. of Electrical\n\n\u2217denotes equal contribution.\n\nand Computer Engineering, 4Dept. of Biomedical Engineering, Carnegie Mellon University\n\n5School of Medicine, 6Dept. 
of Neuroscience, 7Dept. of Ophthalmology, University of Pittsburgh\n\nbcowley@cs.cmu.edu, {rcw30, kac216, smithma}@pitt.edu, byronyu@cmu.edu\n\nAbstract\n\nAdaptive stimulus selection methods in neuroscience have primarily focused on\nmaximizing the \ufb01ring rate of a single recorded neuron. When recording from\na population of neurons, it is usually not possible to \ufb01nd a single stimulus that\nmaximizes the \ufb01ring rates of all neurons. This motivates optimizing an objective\nfunction that takes into account the responses of all recorded neurons together.\nWe propose \u201cAdept,\u201d an adaptive stimulus selection method that can optimize\npopulation objective functions. In simulations, we \ufb01rst con\ufb01rmed that population\nobjective functions elicited more diverse stimulus responses than single-neuron\nobjective functions. Then, we tested Adept in a closed-loop electrophysiological\nexperiment in which population activity was recorded from macaque V4, a cortical\narea known for mid-level visual processing. To predict neural responses, we used\nthe outputs of a deep convolutional neural network model as feature embeddings.\nNatural images chosen by Adept elicited mean neural responses that were 20%\nlarger than those for randomly-chosen natural images, and also evoked a larger di-\nversity of neural responses. Such adaptive stimulus selection methods can facilitate\nexperiments that involve neurons far from the sensory periphery, for which it is\noften unclear which stimuli to present.\n\n1\n\nIntroduction\n\nA key choice in a neurophysiological experiment is to determine which stimuli to present. Often, it\nis unknown a priori which stimuli will drive a to-be-recorded neuron, especially in brain areas far\nfrom the sensory periphery. 
Most studies either choose from a class of parameterized stimuli (e.g., sinusoidal gratings or pure tones) or present many randomized stimuli (e.g., white noise) to find the stimulus that maximizes the response of a neuron (i.e., the preferred stimulus) [1, 2]. However, the first approach limits the range of stimuli explored, and the second approach may not converge in a finite amount of recording time [3]. To efficiently find a preferred stimulus, studies have employed adaptive stimulus selection (also known as \u201cadaptive sampling\u201d or \u201coptimal experimental design\u201d) to determine the next stimulus to show given the responses to previous stimuli in a closed-loop experiment [4, 5]. Many adaptive methods have been developed to find the smallest number of stimuli needed to fit parameters of a model that predicts the recorded neuron\u2019s activity from the stimulus [6, 7, 8, 9, 10, 11]. When no encoding model exists for a neuron (e.g., neurons in higher visual cortical areas), adaptive methods rely on maximizing the neuron\u2019s firing rate via genetic algorithms [12, 13, 14] or gradient ascent [15, 16] to home in on the neuron\u2019s preferred stimulus. To our knowledge, all current adaptive stimulus selection methods focus solely on optimizing the firing rate of a single neuron.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nFigure 1: Responses of two macaque V4 neurons. A. Different neurons prefer different stimuli. Displayed images evoked 5 of the top 25 largest responses. B. Images placed according to their responses. Gray dots represent responses to other images. Same neurons as in A.\n\nDevelopments in neural recording technologies now enable the simultaneous recording of tens to hundreds of neurons [17], each of which has its own preferred stimulus. 
For example, consider two\nneurons recorded in V4, a mid-level visual cortical area (Fig. 1A). Whereas neuron 1 responds most\nstrongly to teddy bears, neuron 2 responds most strongly to arranged circular fruit. Both neurons\nmoderately respond to images of animals (Fig. 1B). Given that different neurons have different\npreferred stimuli, how do we select which stimuli to present when simultaneously recording from\nmultiple neurons? This necessitates de\ufb01ning objective functions for adaptive stimulus selection that\nare based on a population of neurons rather than any single neuron. Importantly, these objective\nfunctions can go beyond simply maximizing the \ufb01ring rates of neurons and instead can be optimized\nfor other attributes of the population response, such as maximizing the scatter of the responses in a\nmulti-neuronal response space (Fig. 1B).\nWe propose Adept, an adaptive stimulus selection method that \u201cadeptly\u201d chooses the next stimulus to\nshow based on a population objective function. Because the neural responses to candidate stimuli\nare unknown, Adept utilizes feature embeddings of the stimuli to predict to-be-recorded responses.\nIn this work, we use the feature embeddings of a deep convolutional neural network (CNN) for\nprediction. We \ufb01rst con\ufb01rmed with simulations that Adept, using a population objective function,\nelicited larger mean responses and a larger diversity of responses than optimizing the response of\neach neuron separately. Then, we ran Adept on V4 population activity recorded during a closed-loop\nelectrophysiological experiment. Images chosen by Adept elicited higher mean \ufb01ring rates and more\ndiverse population responses compared to randomly-chosen images. 
This demonstrates that Adept is effective at finding stimuli to drive a population of neurons in brain areas far from the sensory periphery.\n\n2 Population objective functions\n\nDepending on the desired outcomes of an experiment, one may favor one objective function over another. Here we discuss different objective functions for adaptive stimulus selection and the resulting responses r \u2208 R^p, where the ith element r_i is the response of the ith neuron (i = 1, . . . , p) and p is the number of neurons recorded simultaneously. To illustrate the effects of different objective functions, we ran an adaptive stimulus selection method on the activity of two simulated neurons (see details in Section 5.1). We first consider a single-neuron objective function employed by many adaptive methods [12, 13, 14]. Using this objective function f(r) = r_i, which maximizes the response of the ith neuron of the population, the adaptive method for i = 1 chose stimuli that maximized neuron 1\u2019s response (Fig. 2A, red dots). However, images that produced large responses for neuron 2 were not chosen (Fig. 2A, top left gray dots).\nA natural population-level extension of this objective function is to maximize the responses of all neurons by defining the objective function to be f(r) = ||r||_2. This objective function led to choosing stimuli that maximized responses for neurons 1 and 2 individually, as well as large responses for both neurons together (Fig. 2B). Another possible objective function is to maximize the scatter of the responses. In particular, we would like to choose the next stimulus such that the response vector r is far away from the previously-seen response vectors r_1, . . . , r_M after M chosen stimuli. One way to achieve this is to maximize the average Euclidean distance between r and r_1, . . . 
, r_M, which leads to the objective function f(r, r_1, . . . , r_M) = (1/M) \u2211_{j=1}^{M} ||r \u2212 r_j||_2. This objective function led to a large scatter in responses for neurons 1 and 2 (Fig. 2C, red dots near and far from origin). This is because choosing stimuli that yield small and large responses produces the largest distances between responses.\nFinally, we considered an objective function that favored large responses that are far away from one another. To achieve this, we summed the objectives in Fig. 2B and 2C. The objective function f(r, r_1, . . . , r_M) = ||r||_2 + (1/M) \u2211_{j=1}^{M} ||r \u2212 r_j||_2 was able to uncover large responses for both neurons (Fig. 2D, red dots far from origin). It also led to a larger scatter than maximizing the norm of r alone (e.g., compare red dots in bottom right of Fig. 2B and Fig. 2D). For these reasons, we use this objective function in the remainder of this work. However, the Adept framework is general and can be used with many different objective functions, including all presented in this section.\n\nFigure 2: Different objective functions for adaptive stimulus selection yield different observed population responses (red dots). Blue * denote responses to stimuli used to initialize the adaptive method (the same for each panel).\n\n3 Using feature embeddings to predict norms and distances\n\nWe now formulate the optimization problem using the last objective function in Section 2. Consider a pool of N candidate stimuli s_1, . . . , s_N. After showing (t \u2212 1) stimuli, we are given previously-recorded response vectors r_{n_1}, . . . , r_{n_{t\u22121}} \u2208 R^p, where n_1, . . . , n_{t\u22121} \u2208 {1, . . . , N}. In other words, r_{n_j} is the vector of responses to the stimulus s_{n_j}. 
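To make the combined objective from Section 2 concrete, here is a minimal NumPy sketch of f(r, r_1, . . . , r_M) = ||r||_2 + (1/M) \u2211_j ||r \u2212 r_j||_2, evaluated for a candidate response vector against previously observed responses (function and variable names are ours, for illustration only):

```python
import numpy as np

def combined_objective(r, R_prev):
    """f(r, r_1..r_M) = ||r||_2 + (1/M) * sum_j ||r - r_j||_2.

    r      : (p,) candidate population response vector
    R_prev : (M, p) previously observed response vectors
    """
    norm_term = np.linalg.norm(r)                           # favors large responses
    avg_dist = np.mean(np.linalg.norm(R_prev - r, axis=1))  # favors scatter
    return norm_term + avg_dist

# toy example: p = 2 neurons, M = 3 observed responses
R_prev = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(combined_objective(np.array([3.0, 4.0]), R_prev))
```

Dropping `norm_term` recovers the average-distance objective of Fig. 2C, and dropping `avg_dist` recovers the norm objective of Fig. 2B.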
At the tth iteration of adaptive stimulus selection, we choose the index n_t of the next stimulus to show by the following:\n\nn_t = \arg\max_{s \in \{1,\dots,N\} \setminus \{n_1,\dots,n_{t-1}\}} \|r_s\|_2 + \frac{1}{t-1} \sum_{j=1}^{t-1} \|r_s - r_{n_j}\|_2 \quad (1)\n\nwhere r_s is the unseen population response vector to stimulus s_s.\nIf the r_s were known, we could directly optimize Eqn. 1. However, in an online setting, we do not have access to the r_s. Instead, we can directly predict the norm and average distance terms in Eqn. 1 by relating distances in neural response space to distances in a feature embedding space. The key idea is that if two stimuli have similar feature embeddings, then the corresponding neural responses will have similar norms and average distances. Concretely, consider feature embedding vectors x_1, . . . , x_N \u2208 R^q corresponding to candidate stimuli s_1, . . . , s_N. For example, we can use the activity of q neurons from a CNN as a feature embedding vector for natural images [18]. To predict the norm of unseen response vector r_s \u2208 R^p, we use kernel regression with the previously-recorded response vectors r_{n_1}, . . . , r_{n_{t\u22121}} as training data [19]. To predict the distance between r_s and a previously-recorded response vector r_{n_j}, we extend kernel regression to account for the paired nature of distances. Thus, the norm and average distance in Eqn. 1 for the unseen response vector r_s to the sth candidate stimulus are predicted by the following:\n\n\widehat{\|r_s\|_2} = \sum_k \frac{K(x_s, x_{n_k})}{\sum_\ell K(x_s, x_{n_\ell})} \|r_{n_k}\|_2, \qquad \widehat{\|r_s - r_{n_j}\|_2} = \sum_k \frac{K(x_s, x_{n_k})}{\sum_\ell K(x_s, x_{n_\ell})} \|r_{n_k} - r_{n_j}\|_2 \quad (2)\n\nwhere k, \u2113 \u2208 {1, . . . , t \u2212 1}. Here we use the radial basis function kernel K(x_j, x_k) = exp(\u2212||x_j \u2212 x_k||_2^2 / h^2) with kernel bandwidth h, although other kernels can be used.\nWe tested the performance of this approach versus three other possible prediction approaches. The first two approaches use linear ridge regression and kernel regression, respectively, to predict r_s. Their prediction \u02c6r_s is then used to evaluate the objective in place of r_s. The third approach is a linear ridge regression version of Eqn. 2 to directly predict ||r_s||_2 and ||r_s \u2212 r_{n_j}||_2. To compare the performance of these approaches, we developed a testbed in which we sampled two distinct populations of neurons from the same CNN, and asked how well one population can predict the responses of the other population using the different approaches described above. Formally, we let x_1, . . . , x_N be feature embedding vectors of q = 500 CNN neurons, and response vectors r_{n_1}, . . . , r_{n_{800}} be the responses of p = 200 different CNN neurons to 800 natural images. CNN neurons were from the same GoogLeNet CNN [18] (see CNN details in Results). To compute performance, we took the Pearson\u2019s correlation \u03c1 between the predicted and actual objective values on a held-out set of responses not used for training. We also tracked the computation time \u03c4 (computed on an Intel Xeon 2.3GHz CPU with 36GB RAM) because these computations need to occur between stimulus presentations in an electrophysiological experiment. The approach in Eqn. 2 performed the best (\u03c1 = 0.64) and was the fastest (\u03c4 = 0.2 s) compared to the other prediction approaches (\u03c1 = 0.39, 0.41, 0.23 and \u03c4 = 12.9 s, 1.5 s, 48.4 s, for the three other approaches, respectively). The remarkably faster speed of Eqn. 
2 over other\napproaches comes from the evaluation of the objective function (fast matrix operations), the fact\nthat no training of linear regression weight vectors is needed, and the fact that distances are directly\npredicted (unlike the approaches that \ufb01rst predict \u02c6rs and then must re-compute distances between \u02c6rs\nand rn1 , . . . , rnt\u22121 for each candidate stimulus s). Due to its performance and fast computation time,\nwe use the prediction approach in Eqn. 2 for the remainder of this work.\n\n4 Adept algorithm\n\nWe now combine the optimization problem in Eqn. 1 and prediction approach in Eqn. 2 to formulate\nthe Adept algorithm. We \ufb01rst discuss the adaptive stimulus selection paradigm (Fig. 3, left) and then\nthe Adept algorithm (Fig. 3, right).\nFor the adaptive stimulus selection paradigm (Fig. 3, left), the experimenter \ufb01rst selects a candidate\nstimulus pool s1, . . . , sN from which Adept chooses, where N is large. For a vision experiment,\nthe candidate stimulus pool could comprise natural images, textures, or sinusoidal gratings. For\nan auditory experiment, the stimulus pool could comprise natural sounds or pure tones. Next,\nfeature embedding vectors x1, . . . , xN \u2208 Rq are computed for each candidate stimulus, and the\npre-computed N \u00d7 N kernel matrix K(xj, xk) (i.e., similarity matrix) is input into Adept. For\nvisual neurons, the feature embeddings could come from a bank of Gabor-like \ufb01lters with different\norientations and spatial frequencies [20], or from a more expressive model, such as CNN neurons in\na middle layer of a pre-trained CNN. Because Adept only takes as input the kernel matrix K(xj, xk)\nand not the feature embeddings x1, . . . , xN , one could alternatively use a similarity matrix computed\nfrom psychophysical data to de\ufb01ne the similarity between stimuli if no model exists. The previously-\nrecorded response vectors rn1, . . . 
, rnt\u22121 are also input into Adept, which then outputs the next\nchosen stimulus snt to show. While the observer views snt, the response vector rnt is recorded and\nappended to the previously-recorded response vectors. This procedure is iteratively repeated until the\nend of the recording session. To show as many stimuli as possible, Adept does not choose the same\nstimulus more than once.\nFor the Adept algorithm (Fig. 3, right), we initialize by randomly choosing a small number of stimuli\n(e.g., Ninit = 5) from the large pool of N candidate stimuli and presenting them to the observer.\nUsing the responses to these stimuli R(:, 1:Ninit), Adept then adaptively chooses a new stimulus\nby \ufb01nding the candidate stimulus that yields the largest objective (in this case, using the objective\nde\ufb01ned by Eqns. 1 and 2). This search is carried out by evaluating the objective for every candidate\nstimulus. There are three primary reasons why Adept is computationally fast enough to consider all\ncandidate stimuli. First, the kernel matrix KX is pre-computed, which is then easily indexed. Second,\nthe prediction of the norm and average distance is computed with fast matrix operations. Third, Adept\nupdates the distance matrix DR, which contains the pairwise distances between recorded response\nvectors, instead of re-computing DR at each iteration.\n\n5 Results\n\nWe tested Adept in two settings. First, we tested Adept on a surrogate for the brain\u2014a pre-trained\nCNN. This allowed us to perform comparisons between methods with a noiseless system. Second, in\na closed-loop electrophysiological experiment, we performed Adept on population activity recorded\nin macaque V4. 
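Returning to the prediction step, the fast matrix-operation evaluation of Eqn. 2 can be sketched in NumPy as follows (a sketch under our own naming conventions; the paper's actual implementation may differ):

```python
import numpy as np

def predict_norms_and_dists(X_cand, X_obs, R_obs, h=200.0):
    """Kernel-regression predictions of ||r_s||_2 and ||r_s - r_nj||_2 (Eqn. 2).

    X_cand : (q, S) feature embeddings of S candidate stimuli
    X_obs  : (q, M) embeddings of the M already-shown stimuli
    R_obs  : (p, M) recorded response vectors to those stimuli
    Returns predicted norms (S,) and predicted distances (S, M).
    """
    # RBF kernel between each candidate and each observed stimulus
    sq = ((X_cand[:, :, None] - X_obs[:, None, :]) ** 2).sum(axis=0)  # (S, M)
    K = np.exp(-sq / h**2)
    W = K / K.sum(axis=1, keepdims=True)       # normalized kernel weights
    obs_norms = np.linalg.norm(R_obs, axis=0)  # ||r_nk||_2 for each observed k
    # (M, M) pairwise distances between recorded responses
    D = np.linalg.norm(R_obs[:, :, None] - R_obs[:, None, :], axis=0)
    pred_norms = W @ obs_norms                 # predicted ||r_s||_2
    pred_dists = W @ D                         # predicted ||r_s - r_nj||_2
    return pred_norms, pred_dists
```

Because the predictions are weighted averages of already-computed norms and pairwise distances, no regression weights need to be fit online, which is the source of the speed advantage reported above.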
In both settings, we used the same candidate image pool of N \u2248 10,000 natural images from the McGill natural image dataset [21] and Google image search [22].\n\nAlgorithm 1: Adept algorithm\nInput: N candidate stimuli, feature embeddings X (q \u00d7 N), kernel bandwidth h (hyperparameter)\nInitialization:\n  K_X(j, k) = exp(\u2212||X(:, j) \u2212 X(:, k)||_2^2 / h^2) for all j, k\n  R(:, 1:N_init) \u2190 responses to N_init initial stimuli\n  D_R(j, k) = ||R(:, j) \u2212 R(:, k)||_2 for j, k = 1, . . . , N_init\n  ind_obs \u2190 indices of N_init observed stimuli\nOnline algorithm:\nfor tth stimulus to show do\n  for sth candidate stimulus do\n    k_X = K_X(ind_obs, s) / \u2211_{\u2113 \u2208 ind_obs} K_X(\u2113, s)\n    % predict norm from recorded responses\n    norms(s) \u2190 predicted ||r_s||_2 = k_X^T diag(\u221a(R^T R))\n    % predict average distance from recorded responses\n    avgdists(s) \u2190 predicted (1/(t\u22121)) \u2211_\u2113 ||r_s \u2212 r_{n_\u2113}||_2 = mean(k_X^T D_R)\n  end\n  ind_obs(N_init + t) \u2190 argmax(norms + avgdists)\n  R(:, N_init + t) \u2190 recorded responses to chosen stimulus\n  update D_R with ||R(:, N_init + t) \u2212 R(:, \u2113)||_2 for all \u2113\nend\n\nFigure 3: Flowchart of the adaptive sampling paradigm (left) and the Adept algorithm (right).\n\nFor the predictive feature embeddings in both settings, we used responses from a pre-trained CNN different from the CNN used as a surrogate for the brain in the first setting. The motivation to use CNNs was inspired by the recent successes of CNNs in predicting neural activity in V4 [23].\n\n5.1 Testing Adept on CNN neurons\n\nThe testbed for Adept involved two different CNNs. One CNN is the surrogate for the brain. For this CNN, we took responses of p = 200 neurons in a middle layer of the pre-trained ResNet CNN [24] (layer 25 of 50, named \u2018res3dx\u2019). 
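A self-contained NumPy sketch of the full Adept loop in Algorithm 1 follows (simplified: `responses_fn` stands in for the recorded neural responses, and the hyperparameter values are illustrative, not the paper's):

```python
import numpy as np

def adept(X, responses_fn, T, n_init=5, h=200.0, seed=0):
    """Greedy Adept loop over N candidate stimuli with embeddings X (q x N).

    responses_fn(i) returns the (p,) recorded response to stimulus i.
    Returns indices of all chosen stimuli in presentation order.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    # precomputed N x N kernel (similarity) matrix, as in Algorithm 1
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    K = np.exp(-sq / h**2)
    chosen = list(rng.choice(N, n_init, replace=False))     # random initialization
    R = np.column_stack([responses_fn(i) for i in chosen])  # p x M responses
    for _ in range(T):
        obs = np.array(chosen)
        cand = np.setdiff1d(np.arange(N), obs)  # never repeat a stimulus
        W = K[np.ix_(cand, obs)]
        W = W / W.sum(axis=1, keepdims=True)    # kernel-regression weights
        norms = W @ np.linalg.norm(R, axis=0)   # predicted ||r_s||_2
        D = np.linalg.norm(R[:, :, None] - R[:, None, :], axis=0)
        avgdists = (W @ D).mean(axis=1)         # predicted mean distance
        s = cand[np.argmax(norms + avgdists)]   # Eqn. 1
        chosen.append(s)
        R = np.column_stack([R, responses_fn(s)])
    return chosen
```

For efficiency with a large pool, `D` would be updated incrementally between iterations (as Algorithm 1 does) rather than recomputed; it is recomputed here for brevity.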
A second CNN is used for feature embeddings to predict\nresponses of the \ufb01rst CNN. For this CNN, we took responses of q = 750 neurons in a middle layer of\nthe pre-trained GoogLeNet CNN [18] (layer 5 of 10, named \u2018icp4_out\u2019). Both CNNs were trained for\nimage classi\ufb01cation but had substantially different architectures. Pre-trained CNNs were downloaded\nfrom MatConvNet [25], with the PVT version of GoogLeNet [26]. We ran Adept for 2,000 out of\nthe 10,000 candidate images (with Ninit = 5 and kernel bandwidth h = 200\u2014similar results were\nobtained for different h), and compared the CNN responses to those of 2,000 randomly-chosen\nimages. We asked two questions pertaining to the two terms in the objective function in Eqn. 1. First,\nare responses larger for Adept than for randomly-chosen images? Second, to what extent does Adept\nproduce larger scatter of responses than if we had chosen images at random? A larger scatter implies\na greater diversity in evoked population responses (Fig. 1B).\nTo address the \ufb01rst question, we computed the mean response across all 2,000 images for each CNN\nneuron. The mean responses using Adept were on average 15.5% larger than the mean responses to\nrandomly chosen images (Fig. 4A, difference in means was signi\ufb01cantly greater than zero, p < 10\u22124).\nFor the second question, we assessed the amount of response scatter by computing the amount of\nvariance captured by each dimension. We applied PCA separately to the responses to images chosen\nby Adept and those to images selected randomly. For each dimension, we computed the ratio between\nthe Adept eigenvalue divided by the randomly-chosen-image eigenvalue. In this way, we compared\nthe dimensions of greatest variance, followed by the dimensions of the second-most variance, and so\non. Ratios above 1 indicate that Adept explored a dimension more than the corresponding ordered\ndimension of random selection. 
We found that Adept produced larger response scatter compared to randomly-chosen images for many dimensions (Fig. 4B). Ratios for dimensions of lesser variance (e.g., dimensions 10 to 75) are nearly as meaningful as those of the dimensions of greatest variance (i.e., dimensions 1 to 10), as the top 10 dimensions explained only 16.8% of the total variance (Fig. 4B, inset).\n\nFigure 4: CNN testbed for Adept. A. Mean responses (arbitrary units) to images chosen by Adept were greater than to randomly-chosen images. B. Adept produced higher response variance for each PC dimension than when randomly choosing images. Inset: Percent variance explained. C. Relative to the full objective function in Eqn. 1, population objective functions (green) yielded higher response mean and variance than those of single-neuron objective functions (blue). D. Feature embeddings for all CNN layers were predictive. Error bars are \u00b1 s.d. across 10 runs.\n\nNext, we asked to what extent optimizing a population objective function performs better than optimizing a single-neuron objective function. For the single-neuron case, we implemented three different methods. First, we ran Adept to optimize the response of a single CNN neuron with the largest mean response (\u2018Adept-1\u2019). Second, we applied Adept in a sequential manner to optimize the response of 50 randomly-chosen CNN neurons individually. After optimizing a CNN neuron for 40 images, optimization switched to the next CNN neuron (\u2018Adept-50\u2019). Third, we sequentially optimized 50 randomly-chosen CNN neurons individually using a genetic algorithm (\u2018genetic-50\u2019), similar to the ones proposed in previous studies [12, 13, 14]. 
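The per-dimension scatter comparison described above can be computed from PCA eigenvalues; a NumPy sketch (our implementation choices, using eigenvalues of the sample covariance, which equal the PCA variances):

```python
import numpy as np

def eigenvalue_ratios(R_adept, R_random):
    """Ratio of sorted PCA eigenvalues (response variance per PC dimension).

    R_adept, R_random : (n_stimuli, p) response matrices.
    Returns lam_adept[d] / lam_random[d], largest-variance dimension first.
    """
    def sorted_eigs(R):
        C = np.cov(R, rowvar=False)                # p x p response covariance
        return np.sort(np.linalg.eigvalsh(C))[::-1]
    return sorted_eigs(R_adept) / sorted_eigs(R_random)
```

Ratios above 1 for dimension d indicate that the adaptively-chosen stimuli explored that dimension more than the corresponding dimension of random selection.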
We found that Adept produced higher mean responses than the three single-neuron methods (Fig. 4C, blue points in left panel), likely because Adept chose images that evoked large responses across neurons together. All methods produced higher mean responses than randomly choosing images (Fig. 4C, black point above blue points in left panel). Adept also produced higher mean eigenvalue ratios across the top 75 PCA dimensions than the three single-neuron methods (Fig. 4C, blue points in right panel). This indicates that Adept, using a population objective, is better able to optimize population responses than using a single-neuron objective to optimize the response of each neuron in the population.\nWe then modified the Adept objective function to include only the norm term (\u2018Adept-norm\u2019, Fig. 2B) and only the average distance term (\u2018Adept-avgdist\u2019, Fig. 2C). Both of these population methods performed better than single-neuron methods (Fig. 4C, green points below blue points). While their performance was comparable to Adept using the full objective function, upon closer inspection, we observed differences in performance that matched our intuition about the objective functions. The mean response ratio for Adept using the full objective function and Adept-norm was close to 1 (Fig. 4C, left panel, Adept-norm on red-dashed line, p = 0.65), but the eigenvalue ratio was greater than 1 (Fig. 4C, right panel, Adept-norm above red-dashed line, p < 0.005). Thus, Adept-norm maximizes mean responses at the expense of less scatter. On the other hand, Adept-avgdist produced a lower mean response than that of Adept using the full objective function (Fig. 4C, left panel, Adept-avgdist above red-dashed line, p < 10\u22124), but an eigenvalue ratio of 1 (Fig. 4C, right panel, Adept-avgdist on red-dashed line, p = 0.62). 
Thus, Adept-avgdist increases the response scatter at\nthe expense of a lower mean response.\nThe results in this section were based on middle layer neurons in the GoogLeNet CNN predicting\nmiddle layer neurons in the ResNet CNN. However, it is possible that CNN neurons in other layers\nmay be better predictors than those in a middle layer. To test for this, we asked which layers of the\nGoogLeNet CNN were most predictive of the objective values of the middle layer of the ResNet CNN.\nFor each layer of increasing depth, we computed the correlation between the predicted objective\n(using 750 CNN neurons from that layer) and the actual objective of the ResNet responses (200\nCNN neurons) (Fig. 4D). We found that all layers were predictive (\u03c1 \u2248 0.6), although there was\nvariation across layers. Middle layers were slightly more predictive than deeper layers, likely because\n\n6\n\nABD* mean responsefraction of CNN neurons-0.400.40.80.20\u00b5Adept\u00b5randomdimension index\u03c32Adept/\u03c32random1204060750.81.01.41.80%5%075%\u03c32Adeptrandomdimequal to random selectionCrandomAdept-1Adept-50genetic-50nAdept-ormAdept-avgdist\u03c32Adept\u03c32method/Adept better\u00b5Adept/\u00b5methodsingle neuronmulti-neuron1.01.21.01.4randomAdept-1Adept-50genetic-50nAdept-ormAdept-avgdist0.00.20.40.6CNN layer index12345678910corr predicted vs. actualbetter predictionequal to Adeptequal to Adept\fdeeper layers of GoogLeNet have a different embedding of natural images than the middle layer of\nthe ResNet CNN.\n\n5.2 Testing Adept on V4 population recordings\n\nNext, we tested Adept in a closed-loop neurophysiological experiment. We implanted a 96-electrode\narray in macaque V4, whose neurons respond differently to a wide range of image features, including\norientation, spatial frequency, color, shape, texture, and curvature, among others [27]. 
Currently, no\nexisting parametric encoding model fully captures the stimulus-response relationship of V4 neurons.\nThe current state-of-the-art model for predicting the activity of V4 neurons uses the output of middle\nlayer neurons in a CNN previously trained without any information about the responses of V4 neurons\n[23]. Thus, we used a pre-trained CNN (GoogLeNet) to obtain the predictive feature embeddings.\nThe experimental task \ufb02ow proceeded as follows. On each trial, a monkey \ufb01xated on a central\ndot while an image \ufb02ashed four times in the aggregate receptive \ufb01elds of the recorded V4 neurons.\nAfter the fourth \ufb02ash, the monkey made a saccade to a target dot (whose location was unrelated to\nthe shown image), for which he received a juice reward. During this task, we recorded threshold\ncrossings on each electrode (referred to as \u201cspikes\u201d), where the threshold was de\ufb01ned as a multiple\nof the RMS voltage set independently for each channel. This yielded 87 to 96 neural units in each\nsession. The spike counts for each neural unit were averaged across the four 100 ms \ufb02ashes to\nobtain mean responses. The mean response vector for the p neural units was then appended to the\npreviously-recorded responses and input into Adept. Adept then output an image to show on the\nnext trial. For the predictive feature embeddings, we used q = 500 CNN neurons in the \ufb01fth layer\nof GoogLeNet CNN (kernel bandwidth h = 200). In each recording session, the monkey typically\nperformed 2,000 trials (i.e., 2,000 of the N =10,000 natural images would be sampled). Each Adept\nrun started with Ninit = 5 randomly-chosen images.\nWe \ufb01rst recorded a session in which we used Adept during one block of trials and randomly chose\nimages in another block of trials. To qualitatively compare Adept and randomly selecting images,\nwe \ufb01rst applied PCA to the response vectors of both blocks, and plotted the top two PCs (Fig. 
5A,\nleft panel). Adept uncovers more responses that are far away from the origin (Fig. 5A, left panel,\nred dots farther from black * than black dots). For visual clarity, we also computed kernel density\nestimates for the Adept responses (pAdept) and responses to randomly-chosen images (prandom), and\nplotted the difference pAdept \u2212 prandom (Fig. 5A, right panel). Responses for Adept were denser than\nfor randomly-chosen images further from the origin, whereas the opposite was true closer to the\norigin (Fig. 5A, right panel, red region further from origin than black region). These plots suggest\nthat Adept uncovers large responses that are far from one another. Quantitatively, we veri\ufb01ed that\nAdept chose images with larger objective values in Eqn. 1 than randomly-chosen images (Fig. 5B).\nThis result is not trivial because it relies on the ability of the CNN to predict V4 population responses.\nIf the CNN predicted V4 responses poorly, the objective evaluated on the V4 responses to images\nchosen by Adept could be lower than that evaluated on random images.\nWe then compared Adept and random stimulus selection across 7 recording sessions, including the\nabove session (450 trials per block, with three sessions with the Adept block before the random\nselection block, three sessions with the opposite ordering, and one session with interleaved trials).\nWe found that the images chosen by Adept produced on average 19.5% higher mean responses than\nrandomly-chosen images (Fig. 5C, difference in mean responses were signi\ufb01cantly greater than zero,\np < 10\u22124). We also found that images chosen by Adept produced greater response scatter than for\nrandomly-chosen images, as the mean ratios of eigenvalues were greater than 1 (Fig. 5D, dimensions\n1 to 5). Yet, there were dimensions for which the mean ratios of eigenvalues were less than 1 (Fig. 5D,\ndimensions 9 and 10). 
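The kernel-density comparison of Fig. 5A can be reproduced with standard tools; a sketch using scipy.stats.gaussian_kde (grid resolution and the default bandwidth rule are our choices, not necessarily the paper's):

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_difference(pc_adept, pc_random, grid_n=50):
    """Evaluate p_Adept - p_random on a grid over the top-2 PC plane.

    pc_adept, pc_random : (n, 2) responses projected onto the top two PCs.
    Returns the grid axes and the (grid_n, grid_n) density difference.
    """
    both = np.vstack([pc_adept, pc_random])
    gx = np.linspace(both[:, 0].min(), both[:, 0].max(), grid_n)
    gy = np.linspace(both[:, 1].min(), both[:, 1].max(), grid_n)
    XX, YY = np.meshgrid(gx, gy)
    pts = np.vstack([XX.ravel(), YY.ravel()])  # (2, grid_n**2) query points
    diff = gaussian_kde(pc_adept.T)(pts) - gaussian_kde(pc_random.T)(pts)
    return gx, gy, diff.reshape(grid_n, grid_n)
```

Positive regions of the returned grid mark response regions visited more densely under adaptive selection than under random selection.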
These dimensions explained little overall variance (< 5% of the total response variance).
Finally, we asked to what extent the different CNN layers predict the objective of V4 responses, as in Fig. 4D. We found that, using 500 CNN neurons for each layer, all layers had some predictive ability (Fig. 5E, ρ > 0). Deeper layers (5 to 10) tended to have better prediction than superficial layers (1 to 4). To establish a noise level for the V4 responses, we also predicted the norm and average distance for one session (day 1) with the V4 responses of another session (day 2), where the same images were shown each day. In other words, we used the V4 responses of day 2 as feature embeddings to predict the V4 responses of day 1.

Figure 5: Closed-loop experiments in V4. A. Top 2 PCs of V4 responses to stimuli chosen by Adept and random selection (500 trials each). Left: scatter plot, where each dot represents the population response to one stimulus. Right: difference of kernel densities, pAdept − prandom. Black * denotes a zero response for all neural units. B. Objective function evaluated across trials (one stimulus per trial) using V4 responses. Same data as in A. C. Difference in mean responses across neural units from 7 sessions. D. Ratio of eigenvalues for different PC dimensions. Error bars: ± s.e.m. E. Ability of different CNN layers to predict V4 responses. For comparison, we also used V4 responses from a different day to predict the same V4 responses. Error bars: ± s.d. across 100 runs.

The correlation of this day-to-day prediction was much higher (ρ ≈ 0.5) than that of any CNN layer (ρ < 0.25).
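The prediction scheme above, using feature embeddings of previously-shown images to predict the objective for new ones, can be sketched with Nadaraya–Watson kernel regression, the smoother cited by the paper [19]. This is a hedged illustration rather than the authors' implementation: the Gaussian kernel form, the variable layout (images as rows), and the equal weighting of norm and average distance in the objective are assumptions.

```python
import numpy as np

def predict_objective(F_shown, R_shown, F_cand, h=200.0):
    """Predict the population objective (response norm plus average
    distance to previously-recorded responses) for candidate images.

    F_shown: (n, q) feature embeddings of already-shown images
    R_shown: (n, p) recorded mean responses to those images
    F_cand:  (m, q) feature embeddings of candidate images
    h:       Gaussian kernel bandwidth
    """
    # Nadaraya-Watson weights from distances in embedding space
    d2 = ((F_cand[:, None, :] - F_shown[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * h ** 2))
    w /= w.sum(axis=1, keepdims=True)
    R_pred = w @ R_shown                    # (m, p) predicted responses
    norms = np.linalg.norm(R_pred, axis=1)
    # average distance from each predicted response to all prior responses
    avg_dist = np.linalg.norm(
        R_pred[:, None, :] - R_shown[None, :, :], axis=2).mean(axis=1)
    return norms + avg_dist
```

On each trial, Adept would evaluate this objective over the remaining candidate pool and show the argmax image. Note the gap reported above: same-day neural responses used as embeddings predict the objective far better (ρ ≈ 0.5) than any CNN layer (ρ < 0.25).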
This discrepancy indicates that finding feature embeddings that are more predictive of V4 responses is one way to improve Adept's performance.

5.3 Testing Adept for robustness to neural noise and overfitting

A potential concern for an adaptive method is that stimulus responses are susceptible to neural noise. Specifically, spike counts are subject to Poisson-like variability, which might not be entirely averaged away based on a finite number of stimulus repeats. Moreover, adaptation to stimuli and changes in attention or motivation may cause a gain factor to scale responses dynamically across a session [9]. To examine how Adept performs in the presence of noise, we first recorded a "ground-truth", spike-sorted dataset in which 2,000 natural images were presented (100 ms flashes, 5 to 30 repeats per image, randomly presented throughout the session). We then re-ran Adept on simulated responses under three different noise models (whose parameters were fit to the ground-truth data): a Poisson model ('Poisson noise'), a model that scales each response by a gain factor that varies independently from trial to trial [28] ('trial-to-trial gain'), and the same gain model but where the gain varies smoothly across trials ('slowly-drifting gain'). Because the drift in gain was randomly generated and may not match the actual drift in the recorded dataset, we also considered responses in which the drift was estimated across the recording session and added to the mean responses as their corresponding images were chosen ('recorded drift'). For reference, we also ran Adept on responses with no noise ('no noise'). To compare performance across the different settings, we computed the mean-response and variance ratios between responses based on Adept and random selection (Fig. 6A). All settings showed better performance using Adept than random selection (Fig.
6A, all points above the red dashed line), and Adept performed best with no noise (Fig. 6A, 'no noise' point at or above the others). For a fair comparison, ratios were computed with the ground-truth responses, where only the chosen images could differ across settings. These results indicate that, although Adept would benefit from removing neural noise, Adept continues to outperform random selection in the presence of noise.

Figure 6: A. Adept is robust to neural noise. B. Adept shows no overfitting when responses are shuffled across images. Error bars: ± s.d. across 10 runs.

Another concern for an adaptive method is overfitting. For example, when no relationship exists between the CNN feature embeddings and neural responses, Adept may overfit to a spurious stimulus-response mapping and perform worse than random selection. To address this concern, we performed two analyses using the same ground-truth dataset as in Fig. 6A. For the first analysis, we ran Adept on the ground-truth responses (choosing 500 of the 2,000 candidate images), which yielded on average a 6% larger mean response and a 21% larger response scatter (averaged over the top 5 PCs) than random selection (Fig. 6B, unshuffled responses). Next, to break any stimulus-response relationship, we shuffled all of the ground-truth responses across images and re-ran Adept. Adept performed no worse than random selection (Fig.
6B, shuffled responses, blue points on the red dashed line). For the second analysis, we asked if Adept focuses on the most predictable neurons to the detriment of other neurons. We shuffled all of the ground-truth responses across images for half of the neurons, and ran Adept on the full population. Adept performed better than random selection for the subset of neurons with unshuffled responses (Fig. 6B, unshuffled subset), but no worse than random selection for the subset with shuffled responses (Fig. 6B, shuffled subset, green points on the red dashed line). Adept showed no overfitting in either scenario, likely because Adept cannot choose exceedingly similar images (i.e., differing by only a few pixels) from its discrete candidate pool.

6 Discussion

Here we proposed Adept, an adaptive method for selecting stimuli to optimize neural population responses. To our knowledge, this is the first adaptive method to consider a population of neurons together. We found that Adept, using a population objective, is better able to optimize population responses than using a single-neuron objective to optimize the response of each neuron in the population (Fig. 4C). While Adept can flexibly incorporate different feature embeddings, we took advantage of recent breakthroughs in deep learning and applied them to adaptive stimulus selection. Adept does not try to predict the response of each V4 neuron; rather, it uses the similarity of CNN feature embeddings for different images to predict the similarity of the V4 population responses to those images.
Widely studied neural phenomena, such as changes in responses due to attention [29] and trial-to-trial variability [30, 31], likely depend on mean response levels [32]. When recording from a single neuron, one can optimize to produce large mean responses in a straightforward manner.
For example, one can optimize the orientation and spatial frequency of a sinusoidal grating to maximize a neuron's firing rate [9]. However, when recording from a population of neurons, identifying stimuli that optimize the firing rate of each neuron can be infeasible due to limited recording time. Moreover, neurons far from the sensory periphery tend to be more responsive to natural stimuli [33], and the search space for natural stimuli is vast. Adept is a principled way to efficiently search through a space of natural stimuli to optimize the responses of a population of neurons. Experimenters can run Adept for a recording session, and then present the Adept-chosen stimuli in subsequent sessions when probing neural phenomena.
A future challenge for adaptive stimulus selection is to generate natural images rather than selecting from a pre-existing pool of candidate images. For Adept, one could use a parametric model to generate natural images, such as a generative adversarial network [34], and optimize Eqn. 1 with gradient-based or Bayesian optimization.

Acknowledgments

B.R.C. was supported by a BrainHub Richard K. Mellon Fellowship. R.C.W. was supported by NIH T32 GM008208, T90 DA022762, and the Richard K. Mellon Foundation. K.A. was supported by NSF GRFP 1747452. M.A.S. and B.M.Y. were supported by NSF-NCS BCS-1734901/1734916. M.A.S. was supported by NIH R01 EY022928 and NIH P30 EY008098. B.M.Y.
was supported by NSF-NCS BCS-1533672, NIH R01 HD071686, NIH R01 NS105318, and Simons Foundation 364994.

References

[1] D. Ringach and R. Shapley, “Reverse correlation in neurophysiology,” Cognitive Science, vol. 28, no. 2, pp. 147–166, 2004.
[2] N. C. Rust and J. A. Movshon, “In praise of artifice,” Nature Neuroscience, vol. 8, no. 12, pp. 1647–1650, 2005.
[3] O. Schwartz, J. W. Pillow, N. C. Rust, and E. P. Simoncelli, “Spike-triggered neural characterization,” Journal of Vision, vol. 6, no. 4, pp. 13–13, 2006.
[4] J. Benda, T. Gollisch, C. K. Machens, and A. V. Herz, “From response to stimulus: adaptive sampling in sensory physiology,” Current Opinion in Neurobiology, vol. 17, no. 4, pp. 430–436, 2007.
[5] C. DiMattina and K. Zhang, “Adaptive stimulus optimization for sensory systems neuroscience,” Closing the Loop Around Neural Systems, p. 258, 2014.
[6] C. K. Machens, “Adaptive sampling by information maximization,” Physical Review Letters, vol. 88, no. 22, p. 228104, 2002.
[7] C. K. Machens, T. Gollisch, O. Kolesnikova, and A. V. Herz, “Testing the efficiency of sensory coding with optimal stimulus ensembles,” Neuron, vol. 47, no. 3, pp. 447–456, 2005.
[8] L. Paninski, “Asymptotic theory of information-theoretic experimental design,” Neural Computation, vol. 17, no. 7, pp. 1480–1507, 2005.
[9] J. Lewi, R. Butera, and L. Paninski, “Sequential optimal design of neurophysiology experiments,” Neural Computation, vol. 21, no. 3, pp. 619–687, 2009.
[10] M. Park, J. P. Weller, G. D. Horwitz, and J. W. Pillow, “Bayesian active learning of neural firing rate maps with transformed Gaussian process priors,” Neural Computation, vol. 26, no. 8, pp. 1519–1541, 2014.
[11] J. W. Pillow and M.
Park, “Adaptive Bayesian methods for closed-loop neurophysiology,” in Closed Loop Neuroscience (A. E. Hady, ed.), Elsevier, 2016.
[12] E. T. Carlson, R. J. Rasquinha, K. Zhang, and C. E. Connor, “A sparse object coding scheme in area V4,” Current Biology, vol. 21, no. 4, pp. 288–293, 2011.
[13] Y. Yamane, E. T. Carlson, K. C. Bowman, Z. Wang, and C. E. Connor, “A neural code for three-dimensional object shape in macaque inferotemporal cortex,” Nature Neuroscience, vol. 11, no. 11, pp. 1352–1360, 2008.
[14] C.-C. Hung, E. T. Carlson, and C. E. Connor, “Medial axis shape coding in macaque inferotemporal cortex,” Neuron, vol. 74, no. 6, pp. 1099–1113, 2012.
[15] P. Földiák, “Stimulus optimisation in primary visual cortex,” Neurocomputing, vol. 38, pp. 1217–1222, 2001.
[16] K. N. O’Connor, C. I. Petkov, and M. L. Sutter, “Adaptive stimulus optimization for auditory cortical neurons,” Journal of Neurophysiology, vol. 94, no. 6, pp. 4051–4067, 2005.
[17] I. H. Stevenson and K. P. Kording, “How advances in neural recording affect data analysis,” Nature Neuroscience, vol. 14, no. 2, pp. 139–142, 2011.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2015.
[19] G. S. Watson, “Smooth regression analysis,” Sankhyā: The Indian Journal of Statistics, Series A, pp. 359–372, 1964.
[20] E. P. Simoncelli and W. T. Freeman, “The steerable pyramid: a flexible architecture for multi-scale derivative computation,” in Proceedings of the International Conference on Image Processing, vol. 3, pp. 444–447, IEEE, 1995.
[21] A. Olmos and F. A.
Kingdom, “A biologically inspired algorithm for the recovery of shading and reflectance images,” Perception, vol. 33, no. 12, pp. 1463–1473, 2004.
[22] “Google Image Search.” http://images.google.com. Accessed: 2017-04-25.
[23] D. L. Yamins and J. J. DiCarlo, “Using goal-driven deep learning models to understand sensory cortex,” Nature Neuroscience, vol. 19, no. 3, pp. 356–365, 2016.
[24] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[25] A. Vedaldi and K. Lenc, “MatConvNet – convolutional neural networks for MATLAB,” in Proceedings of the ACM International Conference on Multimedia, 2015.
[26] J. Xiao, “Princeton vision and robotics toolkit,” 2013. Available from: http://3dvision.princeton.edu/pvt/GoogLeNet/.
[27] A. W. Roe, L. Chelazzi, C. E. Connor, B. R. Conway, I. Fujita, J. L. Gallant, H. Lu, and W. Vanduffel, “Toward a unified theory of visual area V4,” Neuron, vol. 74, no. 1, pp. 12–29, 2012.
[28] I.-C. Lin, M. Okun, M. Carandini, and K. D. Harris, “The nature of shared cortical variability,” Neuron, vol. 87, no. 3, pp. 644–656, 2015.
[29] M. R. Cohen and J. H. Maunsell, “Attention improves performance primarily by reducing interneuronal correlations,” Nature Neuroscience, vol. 12, no. 12, pp. 1594–1600, 2009.
[30] A. Kohn, R. Coen-Cagli, I. Kanitscheider, and A. Pouget, “Correlations and neuronal population information,” Annual Review of Neuroscience, vol. 39, pp. 237–256, 2016.
[31] M. Okun, N. A. Steinmetz, L. Cossell, M. F. Iacaruso, H. Ko, P. Barthó, T. Moore, S. B. Hofer, T. D. Mrsic-Flogel, M. Carandini, et al., “Diverse coupling of neurons to populations in sensory cortex,” Nature, vol.
521, no. 7553, pp. 511–515, 2015.
[32] M. R. Cohen and A. Kohn, “Measuring and interpreting neuronal correlations,” Nature Neuroscience, vol. 14, no. 7, pp. 811–819, 2011.
[33] G. Felsen, J. Touryan, F. Han, and Y. Dan, “Cortical sensitivity to visual features in natural scenes,” PLoS Biology, vol. 3, no. 10, p. e342, 2005.
[34] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.