{"title": "Modeling Complex Cells in an Awake Macaque during Natural Image Viewing", "book": "Advances in Neural Information Processing Systems", "page_first": 236, "page_last": 242, "abstract": null, "full_text": "Modeling Complex Cells in an Awake Macaque During Natural Image Viewing\n\nWilliam E. Vinje\nvinje@socrates.berkeley.edu\nDepartment of Molecular and Cellular Biology, Neurobiology Division\nUniversity of California, Berkeley\nBerkeley, CA, 94720\n\nJack L. Gallant\ngallant@socrates.berkeley.edu\nDepartment of Psychology\nUniversity of California, Berkeley\nBerkeley, CA, 94720\n\nAbstract\n\nWe model the responses of cells in visual area V1 during natural vision. Our model consists of a classical energy mechanism whose output is divided by nonclassical gain control and texture contrast mechanisms. We apply this model to review movies, a stimulus sequence that replicates the stimulation a cell receives during free viewing of natural images. Data were collected from three cells using five different review movies, and the model was fit separately to the data from each movie. For the energy mechanism alone we find modest but significant correlations ($r_E$ = 0.41, 0.43, 0.59, 0.35) between model and data. These correlations are improved somewhat when we allow for suppressive surround effects ($r_{E+G}$ = 0.42, 0.56, 0.60, 0.37). In one case the inclusion of a delayed suppressive surround dramatically improves the fit to the data by modifying the time course of the model's response.\n\n1  INTRODUCTION\n\nComplex cells in the primary visual cortex (area V1 in primates) are tuned to localized visual patterns of a given spatial frequency, orientation, color, and drift direction (De Valois & De Valois, 1990). 
These cells have been modeled as linear spatio-temporal filters whose output is rectified by a static nonlinearity (Adelson & Bergen, 1985); more recent models have also included a divisive contrast gain control mechanism (Heeger, 1992; Wilson & Humanski, 1993; Geisler & Albrecht, 1997). We apply a modified form of these models to a stimulus that simulates natural vision. Our model uses relatively few parameters yet incorporates the cells' temporal response properties and suppressive influences from beyond the classical receptive field (CRF).\n\n2  METHODS\n\nData Collection: Data were collected from one awake, behaving macaque monkey, using single unit recording techniques described elsewhere (Connor et al., 1997).¹ First, the cell's receptive field size and location were estimated manually, and tuning curves were objectively characterized using two-dimensional sinusoidal gratings. Next, a static color image of a natural scene was presented to the animal and his eye position was recorded continuously as he freely scanned the image for 9 seconds (Gallant et al., 1998).² Image patches centered on the position of the cell's CRF (and spanning 2-4 times the CRF diameter) were then extracted using an automated procedure. The sequence of image patches formed a continuous 9 second review movie that simulated all of the stimulation that had occurred in and around the CRF during free viewing.³ Although the original image was static, the review movies contain the temporal dynamics of the saccadic eye movements made by the animal during free viewing. Finally, the review movies were played in and around the CRF while the animal performed a fixation task.
\n\nDuring free viewing each eye position is unique, so each image patch is likely to enter the CRF only once. The review movies were therefore replayed several times and the cell's average response with respect to the movie timestream was computed from the peri-stimulus time histogram (PSTH). These review movies also form the model's stimulus input, while its output is relative spike probability versus time (the model cell's PSTH).\n\nBefore applying the model each review movie was preprocessed by converting to gray scale (since the model does not consider color tuning), setting the average luminance level to zero (on a frame by frame basis) and prefiltering with the human contrast sensitivity function to more accurately reflect the information reaching cells in V1.\n\nDivisive Normalization Model: The model consists of a classical receptive field energy mechanism, $E_{CRF}$, whose output is divided by two nonclassical suppressive mechanisms, a gain control field, $G$, and a texture contrast field, $T$.\n\n$$PSTH_{model}(t) \\propto \\frac{E_{CRF}(t)}{1 + \\alpha\\, G(t - \\delta) + \\beta\\, T(t - \\delta)} \\qquad (1)$$\n\nWe include a delay parameter for suppressive effects, consistent with the hypothesis that these effects may be mediated by local cortical interactions (Heeger, 1992; Wilson & Humanski, 1993). Any latency difference between the central energy mechanism and the suppressive surround will be reflected as a positive delay offset ($\\delta > 0$ in Equation 1).\n\nClassical Receptive Field Energy Mechanism: The energy mechanism, $E_{CRF}$, is composed of four phase-dependent subunits, $U^{\\phi}$. Each subunit computes an inner product in space and a convolution in time between the model cell's space-time classical receptive field, $CRF^{\\phi}(x, y, \\tau)$, and the image, $I(x, y, t)$.\n\n$$U^{\\phi}(t) = \\iiint CRF^{\\phi}(x, y, \\tau)\\, I(x, y, t - \\tau)\\, dx\\, dy\\, d\\tau \\qquad (2)$$\n\n¹ Recording was performed under a university-approved protocol and conformed to all relevant NIH and USDA guidelines.\n² Images were taken from a Corel Corporation photo-CD library at 1280x1024 resolution.\n³ Eye position data were collected at 1 kHz, whereas the monitor display rate was 72.5 Hz (14 ms per frame). Therefore each review movie frame was composed of the average stimulation occurring during the corresponding 13.8 ms of free viewing.\n\nThe model presented here incorporates the simplifying assumption of a space-time separable receptive field structure, $CRF^{\\phi}(x, y, \\tau) = CRF^{\\phi}(x, y)\\, CRF(\\tau)$.\n\n$$U^{\\phi}(t) = \\sum_{\\tau} CRF(\\tau) \\left( \\sum_{x} \\sum_{y} CRF^{\\phi}(x, y)\\, I(x, y, t - \\tau) \\right) \\qquad (3)$$\n\nTime is discretized into frames and space is discretized into pixels that match the review movie input. $CRF^{\\phi}(x, y)$ is modeled as a sinusoidal grating that is spatially weighted by a Gaussian envelope (i.e. a Gabor function). In this paper $CRF(\\tau)$ is approximated as a delta function following a constant latency. This minimizes model parameters and highlights the model's responses to the stimulus present at each fixation. The latency, orientation and spatial frequency of the grating, and the size of the CRF envelope, are all determined empirically by maximizing the fit between model and data.⁴\n\nA static non-linearity ensures that the model PSTH does not become negative. We have examined both half-wave rectification, $\\tilde{U}^{\\phi}(t) = \\max[U^{\\phi}(t), 0]$, and half-squaring, $\\tilde{U}^{\\phi}(t) = (\\max[U^{\\phi}(t), 0])^2$; here we present the results from half-wave rectification. Half-squaring produces small changes in the model PSTH but does not improve the fit to the data.
\n\nThe energy mechanism is made phase invariant by averaging over the rectified phase-dependent subunits:\n\n$$E_{CRF}(t) = \\frac{1}{4} \\sum_{\\phi = 0, 90, 180, 270} \\tilde{U}^{\\phi}(t) \\qquad (4)$$\n\nGain Control Field: Cells in V1 incorporate a contrast gain control mechanism that compensates for changes in local luminance. The gain control field, $G$, models this effect as the total image power in a region encompassing the CRF and surround.\n\n$$G(t - \\delta) = \\sum_{\\tau} CRF(\\tau) \\left( \\sum_{k_x} \\sum_{k_y} \\sqrt{P(k_x, k_y, \\tau)} \\right) \\qquad (5)$$\n\n$$P(k_x, k_y, \\tau) = FFT[\\mu_G(x, y, \\tau)]\\; FFT^{*}[\\mu_G(x, y, \\tau)] \\qquad (6)$$\n\n$$\\mu_G(x, y, \\tau) = \\nu_G(x, y)\\, I(x, y, (t - \\delta) - \\tau) \\qquad (7)$$\n\n$P(k_x, k_y, \\tau)$ is the spatial Fourier power of $\\mu_G(x, y, \\tau)$ and $\\nu_G$ is a two dimensional Gaussian weighting function whose width sets the size of the gain control field.\n\nHeeger's (1992) divisive gain control term sums over many discrete energy mechanisms that tile space in and around the area of the CRF. Equation 5 approximates Heeger's approach in the limiting case of dense tiling.\n\nTexture Contrast Field: Cells in area V1 can be affected by the image surrounding the region of the CRF (Knierim & Van Essen, 1992). The responses of many V1 cells are highest when the optimal stimulus is presented alone within the CRF, and lowest when that stimulus is surrounded with a texture of similar orientation and frequency. The texture contrast field, $T$, models this effect as the image power in the spatial region surrounding the CRF that matches the CRF's orientation and spatial frequency.\n\n$$T(t - \\delta) = \\frac{1}{4} \\sum_{\\phi = 0, 90, 180, 270} \\left[ \\sum_{\\tau} CRF(\\tau) \\left( \\sum_{k_x} \\sum_{k_y} \\sqrt{P^{\\phi}(k_x, k_y, \\tau)} \\right) \\right] \\qquad (8)$$\n\n$$P^{\\phi}(k_x, k_y, \\tau) = FFT[\\mu_T^{\\phi}(x, y, \\tau)]\\; FFT^{*}[\\mu_T^{\\phi}(x, y, \\tau)] \\qquad (9)$$\n\n$$\\mu_T^{\\phi}(x, y, \\tau) = \\Psi^{*}(x, y)\\, (1 - \\nu_{CRF}(x, y))\\, I(x, y, (t - \\delta) - \\tau) \\qquad (10)$$\n\n$\\Psi^{*}$ is a Gabor function whose orientation and spatial frequency match those of the best fit $CRF^{\\phi}(x, y)$. The envelope of $\\Psi^{*}$ defines the size of the texture contrast field. $\\nu_{CRF}$ is a two dimensional Gaussian weighting function whose width matches the CRF envelope, and which suppresses the image center. Thus the texture contrast term picks up oriented power from an annular region of the image surrounding the CRF envelope. $T$ is made phase invariant by averaging over phase.\n\n⁴ As a fit statistic we use the linear correlation coefficient (Pearson's r) between model and data. Fitting is done with a gradient ascent algorithm. Our choice of correlation as a statistic eliminates the need to explicitly consider model normalization as a variable, and is very sensitive to latency mismatches between model and data. However, linear correlation is more prone to noise contamination than is $\\chi^2$.\n\n3  RESULTS\n\nThus far our model has been evaluated on a small data set collected as part of a different study (Gallant et al., 1998). Two cells, 87A and 98C, were examined with one review movie each, while cell 97A was examined with three review movies. Using this data set we compare the model's response in two interesting situations: cell 97A, which had high orientation-selectivity, versus cell 87A, which had poor orientation-selectivity; and cell 98C, which was directionally-selective, versus cell 97A, which was not directionally-selective.\n\nCRF Energy Mechanism: We separately fit the energy mechanism parameters to each of the three different cells. For cell 97A the three review movies were fit independently to test for consistency of the best fit parameters.
\n\nTable 1 shows the correlation between model and data using only the CRF energy mechanism ($\\alpha = \\beta = 0$ in Equation 1). The significance of the correlations was assessed via a permutation test. The correlation values for cells 97A and 98C, though modest, are significant ($p < 0.01$). For these cells the 95% confidence intervals on the best fit parameter values are consistent with estimates from the flashed grating tests. The best fit parameter values for cell 97A are also consistent across the three independently fit review movies.\n\nThe model best accounts for the data from cell 97A. This cell was highly selective for vertical gratings and was not directionally-selective. Figure 1 compares the PSTH obtained from cell 97A with movie B to the model PSTH. The model generally responds to the same features that drive the real cell, though the match is imperfect. Much of the discrepancy between the model and data arises from our approximation of $CRF(\\tau)$ as a delta function. The model's response is roughly constant during each fixation, which causes the model PSTH to appear stepped. In contrast the data PSTH shows a strong phasic response at the beginning of each fixation when a new stimulus patch enters the cell's CRF.\n\nCell         87A   97A   97A   97A   98C\nMovie        A     A     B     C     A\nOriented     No    Yes   Yes   Yes   Yes\nDirectional  No    No    No    No    Yes\nr_E          NA    0.41  0.43  0.59  0.35\n\nTable 1: Correlations between model and data PSTHs. Oriented cells showed orientation-selectivity in the flashed grating test while Directional cells showed directional-selectivity during manual characterization. $r_E$ is the correlation between $E_{CRF}$ and the data. No fit was obtained for cell 87A.\n\n[Figure 1 plot: model and data PSTHs versus Time (seconds), 0 to 9.]\n\nFigure 1: CRF energy mechanism versus data (Cell 97A, Movie B). White indicates that the model response is greater than the data, while black indicates the data is greater than the model and gray indicates regions of overlap. A perfect match between model and data would result in the entire area under the curve being gray. Our approximation of $CRF(\\tau)$ leads to a relatively constant model PSTH during each fixation. In contrast the real cell generally gives a phasic response as each saccade brings a new stimulus into the CRF. In general the same movie features drive both model and cell.\n\nThe model is less successful at accounting for the responses of the directionally-selective cell, 98C. This is probably because the model's space-time separable receptive field misses motion energy cues that drive the cell. The model completely failed to fit the data from cell 87A. This cell was not orientation-selective, so the fitting procedure was unable to find an appropriate orientation for the $CRF^{\\phi}(x, y)$ Gabor function.⁵\n\nCRF Energy Mechanism with Suppressive Surround: Table 2 lists the improvements in correlation obtained by adding the gain control term ($\\alpha > 0$, $\\beta = 0$ in Equation 1). For cell 97A (all three movies) the best correlations are obtained when the surround effects are delayed by 56 ms relative to the center. The best correlation for cell 98C is obtained when the surround is not delayed.
\n\nIn three out of four cases the correlation values are barely improved when the surround effects are included, suggesting that the cells were not strongly surround-inhibited by these review movies. However, the improvement is quite striking in the case of cell 97A, movie B. Figure 2 compares the data with a model using both $E_{CRF}$ and $G$ in Equation 1. Here the delayed surround suppresses the sustained responses seen in Figure 1 and results in a more phasic model PSTH that closely matches the data.\n\nCell      97A    97A    97A    98C\nMovie     A      B      C      A\nr_{E+G}   0.42   0.56   0.60   0.37\nDelta r   +0.01  +0.13  +0.01  +0.02\n\nTable 2: Correlation improvements due to surround gain control mechanism. $r_{E+G}$ gives the correlation value between the best fit model and the data. $\\Delta r$ gives the improvement over $r_E$. Including $G$ in Equation 1 leads to a dramatic correlation increase for cell 97A, movie B, but not for the other review movies.\n\nWe consider the $G$ and $T$ fields both independently and in combination. For each we independently fit for $\\alpha$, $\\beta$, $\\delta$, and the size of the suppressive fields. However, the oriented Fourier power correlates with the total Fourier power for our sample of natural images, so that $G$ and $T$ are highly correlated. Combined fitting of the $G$ and $T$ terms leads to competition and dominance by $G$ (i.e. $\\beta \\to 0$). In this paper we only report the effects of the gain control mechanism; the texture contrast mechanism produces similar (though slightly degraded) results.\n\n⁵ For cell 87A the correlation values in the orientation and spatial frequency parameter subspace contained three roughly equivalent maxima. Contamination by multiple cells was unlikely due to this cell's excellent isolation.
\n\n[Figure 2 plot: model and data PSTHs versus Time (seconds), 0 to 9.]\n\nFigure 2: CRF energy mechanism with delayed surround gain control versus data (Cell 97A, Movie B). Color scheme as in Figure 1. The inclusion of the delayed $G$ term results in a more phasic model response which greatly improves the match between model and data.\n\n4  DISCUSSION\n\nThis preliminary study suggests that models of the form outlined here show great promise for describing the responses of area V1 cells during natural vision. For comparison consider the correlation values obtained from an earlier neural network model that attempted to reproduce V1 cells' responses to a variety of spatial patterns (Lehky et al., 1992). They report a median correlation value of 0.65 for complex stimuli, whereas the average correlation score from Table 2 is 0.49. This is remarkable considering that our model has only 7 free parameters, was fit to a very limited data set, does not yet consider color tuning or directional-selectivity, and considers the response across time.\n\nFuture implementations of the model will use a more sophisticated energy mechanism that allows for nonseparable space-time receptive field structure and more realistic temporal response dynamics. We will also incorporate more detail into the surround mechanisms, such as asymmetric surround structure and a broadband texture contrast term.\n\nBy abstracting physiological observation into approximate functional forms our model balances explanatory power against parametric complexity. 
A cascaded series of these models may form the foundation for future modeling of cells in extrastriate areas V2 and V4. Natural image stimuli may provide an appropriate stimulus set for development and validation of these extrastriate models.\n\nAcknowledgements\n\nWe thank Joseph Rogers for assistance in this study, Maneesh Sahani for the extremely useful suggestion of fitting the CRF parameters, Charles Connor for help with data collection and David Van Essen for support of data collection.\n\nReferences\n\nAdelson, E. H. & Bergen, J. R. (1985) Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America, A, 2, 284-299.\n\nConnor, C. E., Preddie, D. C., Gallant, J. L. & Van Essen, D. C. (1997) Spatial attention effects in macaque area V4. Journal of Neuroscience, 17, 3201-3214.\n\nDe Valois, R. L. & De Valois, K. K. (1990) Spatial Vision. New York: Oxford University Press.\n\nGallant, J. L., Connor, C. E. & Van Essen, D. C. (1998) Neural Activity in Areas V1, V2 and V4 During Free Viewing of Natural Scenes Compared to Controlled Viewing. NeuroReport, 9.\n\nGeisler, W. S. & Albrecht, D. G. (1997) Visual cortex neurons in monkeys and cats: Detection, discrimination, and identification. Visual Neuroscience, 14, 897-919.\n\nHeeger, D. J. (1992) Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9, 181-198.\n\nKnierim, J. J. & Van Essen, D. C. (1992) Neuronal responses to static texture patterns in area V1 of the alert macaque monkey. Journal of Neurophysiology, 67, 961-980.\n\nLehky, S. R., Sejnowski, T. J. & Desimone, R. (1992) Predicting Responses of Nonlinear Neurons in Monkey Striate Cortex to Complex Patterns. Journal of Neuroscience, 12, 3568-3581.\n\nWilson, H. R. 
& Humanski, R. (1993) Spatial frequency adaptation and contrast gain control. Vision Research, 33, 1133-1149.", "award": [], "sourceid": 1403, "authors": [{"given_name": "William", "family_name": "Vinje", "institution": null}, {"given_name": "Jack", "family_name": "Gallant", "institution": null}]}