{"title": "An Information-Theoretic Framework for Understanding Saccadic Eye Movements", "book": "Advances in Neural Information Processing Systems", "page_first": 834, "page_last": 840, "abstract": null, "full_text": "An Information-Theoretic Framework for \nUnderstanding Saccadic Eye Movements \n\nTai  Sing Lee * \n\nDepartment of Computer Science \n\nCarnegie Mellon  University \n\nPittsburgh,  PA  15213 \n\ntai@es.emu.edu \n\nStella X.  Yu \n\nRobotics Institute \n\nCarnegie Mellon  University \n\nPittsburgh,  PA  15213 \nstella@enbe.emu.edu \n\nAbstract \n\nIn  this paper,  we  propose that information maximization can  pro(cid:173)\nvide  a  unified  framework  for  understanding  saccadic  eye  move(cid:173)\nments.  In  this framework,  the mutual information among the cor(cid:173)\ntical  representations  of the  retinal  image,  the  priors  constructed \nfrom  our  long  term  visual  experience,  and  a  dynamic  short-term \ninternal  representation  constructed  from  recent  saccades  provides \na  map for  guiding eye  navigation .  By  directing  the  eyes  to loca(cid:173)\ntions  of maximum complexity in  neuronal  ensemble  responses  at \neach  step,  the  automatic saccadic  eye  movement system  greedily \ncollects information about the external world, while modifying the \nneural  representations  in  the  process.  This  framework  attempts \nto  connect  several  psychological  phenomena,  such  as  pop-out  and \ninhibition of return,  to long term visual experience  and short term \nworking  memory.  It  also  provides  an  interesting  perspective  on \ncontextual computation and formation of neural  representation  in \nthe visual system. \n\n1 \n\nIntroduction \n\nWhen  we  look  at a  painting or  a  visual  scene,  our  eyes  move  around  rapidly and \nconstantly to look at different parts of the scene.  Are there rules and principles that \ngovern  where  the  eyes  are  going  to  look  next  at  each  moment?  
In this paper, we sketch a theoretical framework based on information maximization to reason about the organization of saccadic eye movements. \n\n*Both authors are members of the Center for the Neural Basis of Cognition - a joint center between the University of Pittsburgh and Carnegie Mellon University. Address: Rm 115, Mellon Institute, Carnegie Mellon University, Pittsburgh, PA 15213. \n\nVision is fundamentally a Bayesian inference process. Given the measurement by the retinas, the brain's memory of eye positions and its prior knowledge of the world, our brain has to make an inference about what is where in the visual scene. The retina, unlike a camera, has a peculiar design. It has a small foveal region dedicated to high-resolution analysis and a large low-resolution peripheral region for monitoring the rest of the visual field. At about 2.5° of visual angle away from the center of the fovea, visual acuity is already reduced by half. When we 'look' (foveate) at a certain location in the visual scene, we direct our high-resolution fovea to analyze information in that location, taking a snapshot of the scene using our retina. Figure 1A-C illustrates what a retina would see at each fixation. It is immediately obvious that our retinal image is severely limited - it is clear only in the fovea and is very blurry in the surround, posing a severe constraint on the information available to our inference system. Yet, in our subjective experience, the world seems to be stable, coherent and complete in front of us. This is a paradox that has engaged philosophical and scientific debates for ages. 
To overcome the constraint of the retinal image, during perception the brain actively moves the eyes around (1) to gather information to construct a mental image of the world, and (2) to make inferences about the world based on this mental image. Understanding the forces that drive saccadic eye movements is important for elucidating the principles of active perception. \n\nFigure 1. A-C: retinal images in three separate fixations. D: a mental mosaic created by integrating the retinal images from these three and three other fixations. \n\nIt is intuitive to think that eye movements are used to gather information. Eye movements have been suggested to provide a means for measuring the allocation of attention or the values of each kind of information in a particular context [16]. The basic assumption of our theory is that we move our eyes around to maximize our information intake from the world, for constructing the mental image and for making inferences about the scene. Therefore, the system should always look for and attentively fixate at a location in the retinal image that is the most unusual or the most unexplained - and hence carries the maximum amount of information. \n\n2 Perceptual Representation \n\nHow can the brain decide which part of the retinal image is more unusual? First of all, we know the responses of V1 simple cells, modeled well by the Gabor wavelet pyramid [3,7], can be used to completely reconstruct the retinal image. It is also well established that the receptive fields of these neurons developed in such a way as to provide a compact code for natural images [8,9,13,14]. 
The idea of a compact code or sparse code, originally proposed by Barlow [2], is that early visual neurons capture the statistical correlations in natural scenes so that only a small number of cells out of a large set will be activated to represent a particular scene at each moment. Extending this logic, we suggest that the complexity or the entropy of the neuronal ensemble response of a hypercolumn in V1 is therefore closely related to the strangeness of the image features being analyzed by the machinery in that hypercolumn. A frequent event will have a more compact representation in the neuronal ensemble response. Entropy is an information measure that captures the complexity or the variability of signals. The entropy of a neuronal ensemble in a hypercolumn can therefore be used to quantify the strangeness of a particular event. \n\nA hypercolumn in the visual cortex contains roughly 200,000 neurons, dedicated to analyzing different aspects of the image in its 'visual window'. These cells are tuned to different spatial positions, orientations, spatial frequencies, color, disparity and other cues. There might also be a certain degree of redundancy, i.e. a number of neurons are tuned to the same feature. Thus a hypercolumn forms the fundamental computational unit for image analysis within a particular window in visual space. Each hypercolumn contains cells with receptive fields of different sizes, many significantly smaller than the aggregated 'visual window' of the hypercolumn. 
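This logic (a frequent event evokes a compact, low-entropy ensemble response; an unusual event evokes a spread-out, high-entropy one) can be illustrated numerically. The following is a toy sketch, not from the original paper; the spike-count distributions are invented for illustration:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

# Hypothetical spike-count distributions for one channel's cells.
# A frequent, well-coded event drives a few cells hard: peaked distribution.
p_frequent = np.array([0.85, 0.10, 0.03, 0.02])
# An unusual event spreads activity across many response levels.
p_unusual = np.array([0.25, 0.25, 0.25, 0.25])

print(entropy(p_frequent))  # low entropy: compact representation
print(entropy(p_unusual))   # high entropy: a 'strange' event
```

The peaked distribution comes out well under 1 bit while the uniform one gives the maximal 2 bits, matching the intuition that sparse coding of frequent events lowers ensemble entropy.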
The entropy of a hypercolumn's ensemble response at a certain time t is the sum of the entropies of all the channels, given by \n\nH(u(Rx, t)) = - Σ_{θ,σ} Σ_v p(u(Rx, v, σ, θ, t)) log2 p(u(Rx, v, σ, θ, t)) \n\nwhere u(Rx, t) denotes the responses of all complex cell channels inside the visual window Rx of a hypercolumn at time t, computed within a 20 msec time window. u(i, σ, θ, t) is the response of a V1 complex cell channel of a particular scale σ and orientation θ at spatial location i at time t. p(u(Rx, v, σ, θ, t)) is the probability of cells in that channel within the visual window Rx of the hypercolumn firing v number of spikes. v can be computed as the power modulus of the corresponding simple cell channels, modeled by Gabor wavelets [see 7]. Σ_v p(u(Rx, v, σ, θ, t)) = 1. The probability p(u(Rx, v, σ, θ, t)) can be computed at each moment in time because of the variations in spatial position of the receptive fields of similar cells within the hypercolumn - hence the 'same' cells in the hypercolumn are analyzing different image patches - and also because of the redundancy of cells coding similar features. \n\nThe neurons' responses in a hypercolumn are subject to contextual modulation from other hypercolumns, partly in the form of lateral inhibition from cells with similar tunings. The net observed effect is that the later part of V1 neurons' response, starting at about 80 msec, exhibits differential suppression depending on the spatial extent and the nature of the surround stimulus. The more similar the surround stimulus is to the center stimulus, and the larger the spatial extent of the 'similar surround', the stronger is the suppressive effect [e.g. 6]. 
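The ensemble entropy defined above sums per-channel entropies of spike-count probabilities estimated across the cells tiling the visual window. A minimal numerical sketch (the channel counts, number of cells, and spike-count range are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_entropy(counts, n_levels):
    """Entropy (bits) of the spike-count distribution of one
    (scale, orientation) channel, estimated over the cells whose
    receptive fields tile the hypercolumn's visual window."""
    p = np.bincount(counts, minlength=n_levels) / counts.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def ensemble_entropy(responses, n_levels=16):
    """Sum of channel entropies, as in
    H(u(Rx, t)) = - sum_{theta,sigma} sum_v p(...) log2 p(...)."""
    return sum(channel_entropy(c, n_levels) for c in responses)

# Toy hypercolumn: 4 scales x 4 orientations = 16 channels,
# 50 cells per channel, spike counts in [0, 16) within a 20 ms window.
responses = [rng.integers(0, 16, size=50) for _ in range(16)]
print(ensemble_entropy(responses))
```

Each channel contributes at most log2(16) = 4 bits here, so the ensemble entropy of this toy hypercolumn is bounded by 64 bits; more redundant (compact) channel responses pull the sum down.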
Simoncelli and Schwartz [15] have proposed that the steady state responses of the cells can be modeled by dividing the response of the cell (i.e. modeled by the wavelet coefficient or its power modulus) by a weighted combination of the responses of its spatial neighbors, in order to remove the statistical dependencies between the responses of spatial neighbors. These weights are found by minimizing a predictive error between the center signal and the surround signals. In our context, this idea of predictive coding [see also 14] is captured by the concept of mutual information between the ensemble responses of the different hypercolumns, as given below, \n\nI(u(Rx, t); u(Ox, t - δt1)) = H(u(Rx, t)) - H(u(Rx, t) | u(Ox, t - δt1)) \n= Σ_{σ,θ} Σ_{vR,vO} p(u(Rx, vR, σ, θ, t), u(Ox, vO, σ, θ, t)) log2 [ p(u(Rx, vR, σ, θ, t), u(Ox, vO, σ, θ, t)) / ( p(u(Rx, vR, σ, θ, t)) p(u(Ox, vO, σ, θ, t)) ) ] \n\nwhere u(Rx, t) is the ensemble response of the hypercolumn in question, and u(Ox, t) is the ensemble response of the surrounding hypercolumns. p(u(Rx, vR, σ, θ, t)) is the probability that cells of a channel in the center hypercolumn assume the response value vR, and p(u(Ox, vO, σ, θ, t)) is the probability that cells of a similar channel in the surrounding hypercolumns assume the response value vO. δt1 is the delay by which the surround information exerts its effect on the center hypercolumn. The mutual information I can be computed from the joint probability of the ensemble responses of the center and the surround. \n\nThe steady state responses of the V1 neurons, as a result of this contextual modulation, are said to be more correlated with perceptual pop-out than the neurons' initial responses [5,6]. 
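The mutual information above is defined over the joint response probabilities of center and surround channels; in practice it can be estimated from paired response samples with a joint histogram. A minimal sketch (the Gaussian 'responses' and the bin count are invented for illustration, not the authors' procedure):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate (bits) of I(X; Y) from paired samples,
    e.g. a center channel's responses and the delayed responses
    of the matching surround channel."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y
    nz = pxy > 0                           # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
surround = rng.normal(size=5000)
center_dep = surround + 0.3 * rng.normal(size=5000)  # statistically dependent
center_ind = rng.normal(size=5000)                   # independent of surround

print(mutual_information(center_dep, surround))  # substantially above zero
print(mutual_information(center_ind, surround))  # near zero (up to estimator bias)
```

When the surround predicts the center well, the mutual information is large and the conditional entropy of the center ensemble is correspondingly reduced, which is exactly the discounting used in the text.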
The complexity of the steady state response in the early visual cortex is described by the following conditional entropy, \n\nH(u(Rx, t) | u(Ox, t - δt1)) = H(u(Rx, t)) - I(u(Rx, t); u(Ox, t - δt1)). \n\nHowever, the computation in V1 is not limited to the creation of a compact representation through surround inhibition. In fact, we have suggested that V1 plays an active role in scene interpretation, particularly when such inference involves high resolution details [6]. Visual tasks such as the inference of contours and surfaces likely involve V1 heavily. These computations could further modify the steady state responses of V1, and hence the control of saccadic eye movements. \n\n3 Mental Mosaic Representation \n\nThe perceptual representation provides the basic force for the brain to steer the high resolution fovea to locations of maximum uncertainty or maximum signal complexity. Foveation captures the maximum amount of available information in a location. Once a location is examined by the fovea, its information uncertainty is greatly reduced. The eyes should move on and not return to the same spot within a certain period of time. This is called the 'inhibition of return'. \n\nHow can we model this reduction of interest? We propose that the mind creates a mental mosaic of the scene in order to keep track of the information that has been gathered. By mosaic, we mean that the brain can assemble successive retinal images obtained from multiple fixations into a coherent mental picture of the scene. Figure 1D provides an example of a mental mosaic created by combining information from the retinal images from 6 fixations. Whether the brain actually keeps such a mental mosaic of the scene is currently under debate. 
McConkie and Rayner [10] suggested the idea of an integrative visual buffer to integrate information across multiple saccades. However, numerous experiments demonstrated that we actually remember relatively little across saccades [4]. This led to the idea that the brain may not need an explicit internal representation of the world. Since the world is always out there, the brain can access whatever information it needs, at the appropriate level of detail, by moving the eyes to the appropriate place at the appropriate time. The subjective feeling of a coherent and complete world in front of us is a mere illusion [e.g. 1]. \n\nThe mental mosaic represented in Figure 1D might resemble McConkie and Rayner's theory superficially. But the existence of such a detailed high-resolution buffer with a large spatial support in the brain is rather biologically implausible. Rather, we think that the information corresponding to the mental mosaic is stored in an interpreted and semantic form in a mesh of Bayesian belief networks in the brain (e.g. involving PO, IT and area 46). This distributed semantic representation of the mental mosaic, however, is capable of generating detailed (sometimes false) imagery in early visual cortex using the massive recurrent convergent feedback from the higher areas to V1. However, because of the limited support provided by V1 machinery, the instantiation of mental imagery in V1 has to be done sequentially, one 'retinal image' frame at a time, presumably in conjunction with eye movements, even when the eyes are closed. This might explain why vivid visual dreaming is always accompanied by rapid eye movements in REM sleep. 
The mental mosaic accumulates information from the retinal images up to the last fixation and can provide a prediction of what the retina will see in the current fixation. For each u(i, σ, θ) cell, there is a corresponding effective prediction signal m(i, σ, θ) fed back from the mental mosaic. \n\nThis prediction signal can reduce the conditional entropy or complexity of the ensemble response in the perceptual representation by discounting the mutual information between the ensemble response to the retinal image and the mental mosaic prediction, as follows, \n\nH(u(Rx, t) | m(Rx, t - δt2)) = H(u(Rx, t)) - I(u(Rx, t); m(Rx, t - δt2)) \n\nwhere δt2 is the transmission delay from the mental mosaic back to V1. \nAt places the fovea has visited, the mental mosaic representation has high resolution information and m(i, σ, θ, t - δt2) can explain u(i, σ, θ, t) fully. Hence, the mutual information is high at those hypercolumns and the conditional entropy H(u(Rx, t) | m(Rx, t - δt2)) is low, with two consequences: (1) the system will not get the eyes stuck at a particular location; once the information at i is updated to the mental mosaic, the system will lose interest and move on; (2) the system will exhibit 'inhibition of return', as the information in the visited locations is fully predicted by the mental mosaic. Also, from this standpoint, the 'habituation dynamics' often observed in visual neurons when the same stimulus is presented multiple times might not be simply due to neuro-chemical fatigue, but might be understood in terms of the mental mosaic being updated and then fed back to explain the perceptual representation in V1. The mental mosaic is in effect our short-term memory of the scene. 
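These two consequences can be reproduced in a toy simulation (an illustrative sketch with invented numbers, not the authors' implementation): at each step the eyes fixate the hypercolumn of maximum discounted complexity H(u) - I(u; m), and foveation makes the mental mosaic fully predict that location.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 'complexity map': raw ensemble entropy per hypercolumn location.
raw_entropy = rng.uniform(1.0, 4.0, size=(8, 8))
# Mental-mosaic prediction: information already explained at each
# location (zero before any fixation).
explained = np.zeros_like(raw_entropy)
FORGET = 0.9  # assumed per-step decay of the mosaic's predictions

fixations = []
for t in range(10):
    # Discounted complexity H(u) - I(u; m); the surround term is
    # omitted in this sketch.
    score = raw_entropy - explained
    loc = np.unravel_index(np.argmax(score), score.shape)
    fixations.append(loc)
    # Foveation: the mosaic now fully predicts this location,
    # yielding inhibition of return.
    explained[loc] = raw_entropy[loc]
    # Predictions fade over time, allowing an eventual return.
    explained *= FORGET

# The greedy rule never refixates the location it just left.
assert all(a != b for a, b in zip(fixations, fixations[1:]))
print(fixations)
```

Once foveated, a location is fully explained by the mental mosaic, so the greedy rule moves on rather than getting stuck; the decay factor lets the mosaic's predictions fade so that the eyes may eventually revisit a location.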
It has a forgetting dynamics and needs to be periodically updated. Otherwise, it will rapidly fade away. \n\n4 Overall Reactive Saccadic Behaviors \n\nNow, we can combine the influence of the two predictive processes to arrive at a discounted complexity measure of the hypercolumn's ensemble response: \n\nH(u(Rx, t)) - I(u(Rx, t); u(Ox, t - δt1)) - I(u(Rx, t); m(Rx, t - δt2)) + I(u(Ox, t - δt1); m(Rx, t - δt2)) \n\nIf we can assume that the long range surround priors and the mental mosaic short term memory are independent processes, we can leave out the last term, I(u(Ox, t - δt1); m(Rx, t - δt2)), of the equation. \nThe system, after each saccade, will evaluate the new retinal scene and select the location where the perceptual representation has the maximum conditional entropy. To maximize the information gain, the system must constantly search for and make a saccade to the locations of maximum uncertainty (or complexity) computed from the hypercolumn ensemble responses in V1 at each fixation. Unless the number of saccades is severely limited, this locally greedy algorithm, coupled with the inhibition of return mechanism, will likely steer the system to a relatively optimal global sampling of the world - in the sense that the average information gain per saccade is maximized, and the mental mosaic's dissonance with the world is minimized. \n\n5 Task-Dependent Schema Representation \n\nHowever, human eye movements are not simply controlled by generic information in a bottom-up fashion. Yarbus [16] has shown that, when staring at a face, subjects' eyes tend to go back to the same locations (eyes, mouth) over and over again. 
Further, he showed that when asked different questions, subjects exhibited different kinds of scan-paths when looking at the same picture. Norton and Stark [12] also showed that eye movements are not random, but often exhibit repetitive or even idiosyncratic path patterns. \n\nTo capture these ideas, we propose a third representation, called the task schema, to provide the necessary top-down information to bias eye movement control. It specifies the learned or habitual scan-paths for a particular task in a particular context, or assigns weights to different types of information. Given that we are mostly unconscious of the scan-path patterns we are making, these task-sensitive or context-sensitive habitual scan-patterns might be encoded at the level of motor programs, and be downloaded when needed without our conscious control. These motor programs for scan-paths can be trained by reinforcement learning. For example, since the eyes and the mouth convey most of the emotional content of a facial expression, a successful interpretation of another person's emotion could provide the reward signal to reinforce the motor programs just executed, or the fixations to certain facial features. These unconscious scan-path motor programs could provide additional modulation to automatic saccadic eye movement generation. \n\n6 Discussion \n\nIn this paper, we propose that information maximization might provide a theoretical framework for understanding automatic saccadic eye movement behaviors in humans. In this proposal, each hypercolumn in V1 is considered a fundamental computational unit. 
The relative complexity or entropy of the neuronal ensemble response in the V1 hypercolumns, discounted by the predictive effects of the surround, higher order representations and working memory, creates a force field to guide eye navigation. \n\nThe framework we sketched here bridges natural scene statistics to eye movement control via the more established ideas of sparse coding and predictive coding in neural representation. Information maximization has been suggested as a possible explanation for the shaping of receptive fields in the early visual cortex according to the statistics of natural images [8,9,13,14], to create a minimum-entropy code [2,3]. As a result, a frequent event is represented efficiently by the responses of a few neurons in a large set, resulting in a lower hypercolumn ensemble entropy, while unusual events provoke ensemble responses of higher complexity. We suggest that higher complexity in ensemble responses will arouse attention and draw scrutiny by the eyes, forcing the neural representation to continue adapting to the statistics of natural scenes. The formulation here also suggests that information maximization might provide an explanation for the formation of the horizontal predictive network in V1 as well as higher order internal representations, consistent with the ideas of predictive coding [11, 14, 15]. Our theory hence predicts that the adaptation of the neural representations to the statistics of natural scenes will lead to the adaptation of saccadic eye movement behaviors. \n\nAcknowledgements \n\nThe authors have been supported by a grant from the McDonnell Foundation and an NSF grant (LIS 9720350). Yu is also supported in part by a grant to Takeo Kanade. \n\nReferences \n\n[1] Ballard, D.H., Hayhoe, M.M., 
Pook, P.K. & Rao, R.P.N. (1997). Deictic codes for the embodiment of cognition. Behavioral and Brain Sciences, 20(4), 723-767. \n\n[2] Barlow, H.B. (1989). Unsupervised learning. Neural Computation, 1, 295-311. \n\n[3] Daugman, J.G. (1989). Entropy reduction and decorrelation in visual coding by oriented neural receptive fields. IEEE Transactions on Biomedical Engineering, 36, 107-114. \n\n[4] Irwin, D.E. (1991). Information integration across saccadic eye movements. Cognitive Psychology, 23(3), 420-456. \n\n[5] Knierim, J. & Van Essen, D.C. (1992). Neural responses to static texture patterns in area V1 of the macaque monkey. J. Neurophysiology, 67, 961-980. \n\n[6] Lee, T.S., Mumford, D., Romero, R. & Lamme, V.A.F. (1998). The role of primary visual cortex in higher level vision. Vision Research, 38, 2429-2454. \n\n[7] Lee, T.S. (1996). Image representation using 2D Gabor wavelets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(10), 959-971. \n\n[8] Lewicki, M. & Olshausen, B. (1998). Inferring sparse, overcomplete image codes using an efficient coding framework. In Advances in Neural Information Processing Systems 10, M. Jordan, M. Kearns and S. Solla (eds). MIT Press. \n\n[9] Linsker, R. (1989). How to generate ordered maps by maximizing the mutual information between input and output signals. Neural Computation, 1, 402-411. \n\n[10] McConkie, G.W. & Rayner, K. (1976). Identifying the span of the effective stimulus in reading: literature review and theories of reading. In H. Singer and R.B. Ruddell (Eds), Theoretical Models and Processes of Reading, 137-162. Newark, DE: International Reading Association. \n\n[11] Mumford, D. (1992). On the computational architecture of the neocortex II. Biological Cybernetics, 66, 241-251. \n\n[12] Norton, D. 
and Stark, L. (1971). Eye movements and visual perception. Scientific American, 224, 34-43. \n\n[13] Olshausen, B.A. & Field, D.J. (1996). Emergence of simple cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609. \n\n[14] Rao, R. & Ballard, D.H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive field effects. Nature Neuroscience, 2(1), 79-87. \n\n[15] Simoncelli, E.P. & Schwartz, O. (1999). Modeling surround suppression in V1 neurons with a statistically-derived normalization model. In Advances in Neural Information Processing Systems 11, M.S. Kearns, S.A. Solla, and D.A. Cohn (eds). MIT Press. \n\n[16] Yarbus, A.L. (1967). Eye Movements and Vision. Plenum Press. \n", "award": [], "sourceid": 1647, "authors": [{"given_name": "Tai Sing", "family_name": "Lee", "institution": null}, {"given_name": "Stella", "family_name": "Yu", "institution": null}]}