{"title": "Learning the Solution to the Aperture Problem for Pattern Motion with a Hebb Rule", "book": "Advances in Neural Information Processing Systems", "page_first": 468, "page_last": 476, "abstract": null, "full_text": "468 \n\nLEARNING THE SOLUTION TO THE APERTURE PROBLEM FOR PATTERN MOTION WITH A HEBB RULE \n\nMartin I. Sereno \n\nCognitive Science C-015 \n\nUniversity of California, San Diego \n\nLa Jolla, CA 92093-0115 \n\nABSTRACT \n\nThe primate visual system learns to recognize the true direction of pattern motion using local detectors only capable of detecting the component of motion perpendicular to the orientation of the moving edge. A multilayer feedforward network model similar to Linsker's model was presented with input patterns, each consisting of randomly oriented contours moving in a particular direction. Input layer units are granted component direction and speed tuning curves similar to those recorded from neurons in primate visual area V1 that project to area MT. The network is trained on many such patterns until most weights saturate. A proportion of the units in the second layer solve the aperture problem (e.g., show the same direction-tuning curve peak to plaids as to gratings), resembling pattern-direction selective neurons, which first appear in area MT. \n\nINTRODUCTION \n\nSupervised learning schemes have been successfully used to learn a variety of input-output mappings. Explicit neuron-by-neuron error signals and the apparatus for propagating them across layers, however, are not realistic in a neurobiological context. 
On the other hand, there is ample evidence in real neural networks for conductances sensitive to correlation of pre- and post-synaptic activity, as well as for multiple areas connected by topographic, somewhat divergent feedforward projections. The present project was to try to learn the solution to the aperture problem for pattern motion using a simple Hebb rule and a layered feedforward network. \n\nSome of the connections responsible for the selectivity of cortical neurons to local stimulus features develop in the absence of patterned visual experience. For example, newborn cats and primates already have orientation-selective neurons in primary visual cortex (area 17 or V1) before they open their eyes. The prenatally generated orientation selectivity is sharpened by subsequent visual experience. Linsker (1986) has shown that feedforward networks with somewhat divergent, topographic interlayer connections, linear summation, and simple Hebb rules develop units in tertiary and higher layers that have parallel, elongated excitatory and inhibitory subfields when trained solely on random inputs to the first layer. \n\nBy contrast, the development of the circuitry in secondary and tertiary visual cortical areas necessary for processing more complex, non-local features of visual arrays (e.g., orientation gradients, shape from shading, pattern translation, dilation, rotation) is probably much more dependent on patterned visual experience. Parietal visual cortical areas, for example, are almost totally unresponsive in dark-reared monkeys, despite the fact that these monkeys have a normal-appearing V1 (Hyvarinen, 1984). Behavioral indices suggest that the development of some perceptual abilities may require months of experience. Human babies, for example, only show evidence of seeing the transition between randomly moving dots and circular 2-D motion at 6 months, while the transition from horizontally moving dots with random x-axis velocities to dots with sinusoidally varying x-axis velocities (the latter gives the percept of a rotating 3-D cylinder) is only detected after 7 months (Spitz, Stiles-Davis, & Siegel, 1988) (see Fig. 1). During the first 6 months of its life, a human baby typically makes approximately 30 million saccades, experiencing in the process many views which contain large moving fields and smaller moving objects. The importance of these millions of glances for the development of the ability to recognize complex visual objects has often been acknowledged. Brute visual experience may, however, be just as important in developing a solution to the simpler problem of detecting pattern motion using local cues. \n\nFigure 1. Motion field transitions [panels: direction noise to 2-D rotation; horizontal direction noise to 3-D cylinder] \n\nNETWORK ARCHITECTURE \n\nMoving visual stimuli are processed in several stages in the primate visual system. The first cortical stage is layer 4C-alpha of area V1, which receives its main ascending input from the magnocellular layers of the lateral geniculate nucleus. Layer 4C-alpha projects to layer 4B, which contains many tightly-tuned direction-selective neurons (Movshon et al., 1985). 
These neurons, however, respond to moving contours as if these contours were moving perpendicular to their local orientation; that is, their response falls off with the difference between the orthogonal component of motion and their best direction (for a bar). A direction series run for a layer 4B neuron using a plaid (2 orthogonal moving gratings) thus results in two peaks in the direction tuning curve, displaced 45 degrees to either side of the peak for a single grating (Movshon et al., 1985). The aperture problem for pattern motion (see, e.g., Horn & Schunck, 1981) thus exists for cells in area V1 of the adult (and presumably infant) primate. \n\nLayer 4B neurons project topographically via direct and indirect pathways to area MT, a small extrastriate area specialized for processing moving stimuli. A subset of neurons in MT show a single peak in their direction tuning curves for a plaid that is lined up with the peak for a single grating; that is, their response falls off with the difference between the true pattern direction and their best direction (for a bar). These neurons therefore solve the aperture problem presented to them by the local translational motion detectors in layer 4B of V1. The excitatory receptive fields of all MT neurons are much larger than those in V1, as a result of divergence in the V1-to-MT projection as well as the smaller areal extent of MT compared to V1. \n\nFigure 2. Network Architecture [layers: second layer (= MT); input layer (= V1, layer 4B)] \n\nM.E. Sereno (1987) showed, using a supervised learning rule, that a linear two-layer network can satisfactorily solve the aperture problem characterized above. 
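To make the aperture problem concrete, the Python sketch below computes the component velocity that a local detector would report for each grating of a plaid. It then recovers the pattern velocity with the intersection-of-constraints construction. That construction is a standard geometric solution, not the learning method of this paper. The helper names (component_velocity, direction_deg, intersection_of_constraints) are hypothetical. Angles are in degrees and velocities are (vx, vy) tuples.

```python
import math

# Minimal sketch with hypothetical helper names (not from the paper).
# Angles are in degrees; velocities are (vx, vy) tuples.


def component_velocity(pattern_dir, speed, normal_dir):
    # A local detector sees only the motion along the contour normal:
    # its speed is speed * cos(pattern_dir - normal_dir).
    c = speed * math.cos(math.radians(pattern_dir - normal_dir))
    n = math.radians(normal_dir)
    return (c * math.cos(n), c * math.sin(n))


def direction_deg(v):
    # Direction of a velocity vector in degrees, in (-180, 180].
    return math.degrees(math.atan2(v[1], v[0]))


def intersection_of_constraints(v1, v2):
    # Each component velocity v limits the pattern velocity V to the line
    # V . v = |v|^2; two non-parallel lines meet in one point (Cramer's rule).
    a1, b1 = v1
    a2, b2 = v2
    r1 = a1 * a1 + b1 * b1
    r2 = a2 * a2 + b2 * b2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError('component directions are parallel or zero')
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


if __name__ == '__main__':
    # Plaid of two orthogonal gratings; the pattern moves rightward (0 deg) at speed 1.
    v1 = component_velocity(0.0, 1.0, 45.0)
    v2 = component_velocity(0.0, 1.0, -45.0)
    print(round(direction_deg(v1)), round(direction_deg(v2)))  # 45 -45
    print(intersection_of_constraints(v1, v2))  # approximately (1.0, 0.0)
```

Consider a rightward-moving plaid made of two orthogonal gratings. Its two component directions come out at +45 and -45 degrees. This is why a component-tuned layer 4B cell shows two direction-tuning peaks, displaced 45 degrees to either side of its grating peak. Intersecting the two constraints, by contrast, recovers the true rightward motion.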
The present task was to see if unsupervised learning might suffice. A simple caricature of the V1-to-MT projection was constructed. At each x-y location in the first layer of the network, there is a set of units tuned to a range of local directions and speeds. The input layer thus has four dimensions (x, y, direction, and speed). The sample network illustrated above (Fig. 2) has 5 different directions and 3 speeds at each x-y location. Input units are granted tuning curves resembling those found for neurons in layer 4B of area V1. The tuning curves are linear, with half-height overlap for both direction and speed (see Fig. 3, drawn for 12 directions and 4 speeds), and direction and speed tuning interact linearly. Inhibition is either tuned or untuned (see Fig. 4), and scaled to balance excitation. Since direction tuning wraps around, there is a trough in the tuned inhibition condition. Speed tuning does not wrap around. The relative effect of direction and speed tuning on the output of first layer units is set by a parameter. \n\nFigure 3. Excitatory Tuning Curves (1st layer) [axes: velocity component orthogonal to contour; speed component orthogonal to contour] \n\nFigure 4. Tuned vs. Untuned Inhibition [panels: untuned, tuned; axes: response vs. velocity component orthogonal to contour; response vs. speed component orthogonal to contour (no wrap-around)] \n\nAs with Linsker, the probability that a unit in the first layer will connect with a unit in the second layer falls off as a gaussian centered on the retinotopically equivalent point in the second layer (see Fig. 2). New random numbers are drawn to generate the divergent gaussian projection pattern for each first layer unit (i.e., all of the units at a single x-y location have different, overlapping projection patterns). There are no local connections within a layer. \n\nThe network update rule is similar to that of Linsker, except that there is no term like a decay of synaptic weights ($k_1$) and no offset parameter for the correlations ($k_2$). Also, all of the units in each layer are modeled explicitly. The activation, $y_j$, of each unit is a linear weighted sum of its inputs $u_i$, scaled by $\\alpha$, and clipped to a maximum or minimum value: \n\n$$y_j = \\min\\left(y_{\\max},\\, \\max\\left(y_{\\min},\\, \\alpha \\sum_i u_i w_{ij}\\right)\\right)$$ \n\nWeights are also clipped to maximum and minimum values. The change in each weight, $\\Delta w_{ij}$, is a simple fraction, $a$, of the product of the pre- and post-synaptic values: \n\n$$\\Delta w_{ij} = a\\, u_i\\, y_j$$ \n\nRESULTS \n\nThe network is trained with a set of fullfield texture movements. Each stimulus consists of a set of randomly oriented contours (one at each x-y point), all moving in the same, randomly chosen pattern direction. A typical stimulus is drawn in Figure 5 as the set of component motions visible to neurons in V1 (i.e., direction components perpendicular to the local contour). The local speed component varies as the cosine of the angle between the pattern direction and the perpendicular to the local contour. The single component motion at each point is run through the first layer tuning curves. 
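The first-layer encoding described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of the original simulation. The function names are hypothetical. The 4 preferred speeds are assumed to be evenly spaced from 0 to 1. The linear interaction of direction and speed tuning is read as a weighted sum, with direction weighted 2.5 times speed as in Figure 6. Untuned inhibition is modeled as subtracting the mean response at each x-y location, so that excitation and inhibition balance.

```python
import math
import random

# Hypothetical sketch of the first-layer (V1, layer 4B) encoding.
N_DIR = 12                       # preferred directions, 30 degrees apart
N_SPEED = 4                      # preferred speeds
DIR_STEP = 360.0 / N_DIR
SPEED_STEP = 1.0 / (N_SPEED - 1)
PREF_DIRS = [i * DIR_STEP for i in range(N_DIR)]
PREF_SPEEDS = [j / (N_SPEED - 1) for j in range(N_SPEED)]  # assumption: 0, 1/3, 2/3, 1


def angle_diff(a, b):
    # smallest absolute difference between two directions (direction wraps around)
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def linear_tuning(dist, step):
    # linear tuning that reaches zero one step from the peak, so neighbouring
    # curves cross at half height
    return max(0.0, 1.0 - dist / step)


def component_motion(pattern_dir, pattern_speed, contour_orient):
    # A V1 unit sees motion only along the contour normal; pick the normal that
    # points with the motion, whose speed is pattern_speed * cos(angle between them).
    normal = (contour_orient + 90.0) % 360.0
    if math.cos(math.radians(pattern_dir - normal)) < 0.0:
        normal = (normal + 180.0) % 360.0
    return normal, pattern_speed * math.cos(math.radians(pattern_dir - normal))


def encode_location(comp_dir, comp_speed, dir_weight=2.5):
    # 12 x 4 outputs at one x-y point: direction and speed tuning add linearly
    # (direction weighted by dir_weight, an assumed reading of the text); then
    # untuned inhibition (assumed: mean subtraction) balances excitation.
    resp = [[dir_weight * linear_tuning(angle_diff(comp_dir, d), DIR_STEP)
             + linear_tuning(abs(comp_speed - s), SPEED_STEP)
             for s in PREF_SPEEDS] for d in PREF_DIRS]
    mean = sum(sum(row) for row in resp) / (N_DIR * N_SPEED)
    return [[r - mean for r in row] for row in resp]


def random_grating_stimulus(width, height, pattern_dir, pattern_speed=1.0, rng=random):
    # one randomly oriented contour per x-y point, all moving in pattern_dir
    return [[encode_location(*component_motion(pattern_dir, pattern_speed,
                                               rng.uniform(0.0, 180.0)))
             for _ in range(width)] for _ in range(height)]


if __name__ == '__main__':
    stim = random_grating_stimulus(4, 3, pattern_dir=0.0, rng=random.Random(0))
    print(len(stim), len(stim[0]), len(stim[0][0]), len(stim[0][0][0]))  # 3 4 12 4
```

Each x-y location yields a 12 x 4 block of signed outputs. This is the same layout as one rectangular box in Figure 6, with directions along one axis and speeds along the other.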
\n\nFigure 5. Local Component Motions from a Training Stimulus (pattern direction is toward right) \n\nFigure 6. Output of Portion of First Layer to a Training Stimulus (untuned inhibition) \n\nThe response of the input layer to such a pattern is shown in Figure 6. Each rectangular box represents a single x-y location, containing 48 units tuned to different combinations of direction and speed (12 directions run horizontally and 4 speeds run vertically). 
Open and filled squares indicate positive and negative outputs. Inhibition is untuned here. The Hebb sensitivity, $a$, was set so that 1,000 such patterns could be presented before most weights saturated at maximum values. Weights initially had small random values drawn from a flat distribution centered around zero. The scale parameter for the weighted sum, $\\alpha$, was set low enough to prevent second layer units from saturating all the time. In Figure 6, direction tuning is 2.5 times as important as speed tuning in determining the output of a unit. \n\nSelectivity of second layer units for pattern direction was examined both before and after training using four stimulus conditions: 1) grating, with contours perpendicular to the pattern direction; 2) random grating, with contours randomly oriented with respect to the pattern direction (same as the training condition); 3) plaid, with contours oriented 45 or 67 degrees from perpendicular to the pattern direction; 4) random plaid, with contours randomly oriented but avoiding angles nearly perpendicular to the pattern direction. The pre-training direction tuning curves for the grating conditions usually showed some weak direction selectivity. Pre-training direction tuning curves for the plaid conditions, however, were often twin-peaked, exhibiting component responses displaced to either side of the grating peak. After training, by contrast, the direction tuning peaks in all test conditions were single and sharp, and the plaid condition peaks were usually aligned with the grating peaks. \n\nAn example of the weights onto a mature pattern direction selective unit is shown in Figure 7. 
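The four test conditions listed above can be sketched as stimulus generators, as in the hypothetical Python below. Each stimulus is described only by the offsets of its contour normals from the pattern direction. The 20 degree exclusion zone used for the random plaid is an assumed value, since the text gives none. The readout is an idealized component-direction unit with linear 30 degree tuning. It stands in for a component-like unit rather than reproducing the paper's untrained network.

```python
import random

# Hypothetical sketch of the four test conditions. A stimulus is described by
# the offsets (degrees) of its contour normals from the pattern direction; the
# local component speed would be speed * cos(offset).


def grating(n):
    # all contours perpendicular to the pattern direction
    return [0.0] * n


def random_grating(n, rng):
    # random orientations (the training condition); offsets lie in [-90, 90]
    return [rng.uniform(-90.0, 90.0) for _ in range(n)]


def plaid(n, half_angle=45.0):
    # two contour families, +/- half_angle from perpendicular (45 or 67 in the text)
    return [half_angle if i % 2 == 0 else -half_angle for i in range(n)]


def random_plaid(n, rng, min_offset=20.0):
    # random orientations avoiding near-grating contours; min_offset is an assumption
    out = []
    while len(out) < n:
        d = rng.uniform(-90.0, 90.0)
        if abs(d) >= min_offset:
            out.append(d)
    return out


def component_unit_response(pref_dir, pattern_dir, offsets, step=30.0):
    # Idealized component-direction unit: linear direction tuning (zero one step
    # from the peak) summed over all contours; speed is ignored for simplicity.
    total = 0.0
    for off in offsets:
        d = abs((pattern_dir + off - pref_dir + 180.0) % 360.0 - 180.0)
        total += max(0.0, 1.0 - d / step)
    return total


def tuning_curve(offsets, step_deg=15):
    # Rotate the whole stimulus through 360 degrees; the unit prefers 0 degrees.
    return [(t, component_unit_response(0.0, float(t), offsets))
            for t in range(-180, 180, step_deg)]


if __name__ == '__main__':
    rng = random.Random(0)
    conditions = [('grating', grating(20)), ('random grating', random_grating(20, rng)),
                  ('plaid', plaid(20)), ('random plaid', random_plaid(20, rng))]
    # Expected: grating peaks at [0]; plaid peaks at [-45, 45].
    for name, offsets in conditions:
        curve = tuning_curve(offsets)
        peak = max(v for _, v in curve)
        print(name, [t for t, v in curve if v == peak])
```

For the grating, the idealized unit peaks when the pattern moves in its preferred direction. For the 45 degree plaid it is silent at that direction. Instead it peaks with the pattern displaced 45 degrees to either side, the kind of twin-peaked curve the text reports for the plaid conditions before training.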
As before, each rectangular box contains 48 units representing one point in x-y space of the input layer (the tails of the 2-D gaussian are cropped in this illustration), except that the black and white boxes now represent negative and positive weights onto a single second layer unit. Within each box, 12 directions run horizontally and 4 speeds run vertically. The peaks in the direction tuning curves for gratings and 135 degree plaids for this unit were sharp and aligned. \n\nFigure 7. Mature Weights Onto Pattern Direction-Selective Unit \n\nPattern direction selective units such as this comprised a significant fraction of the second layer when direction tuning was set to be 2 to 4 times as important as speed tuning in determining the output of first layer units. Post-training weight structures under these conditions actually formed a continuum: from units with component direction selectivity, to units with pattern direction selectivity, to units with component speed selectivity. Not surprisingly, varying the relative effects of direction and speed in the V1 tuning curves generated more direction-tuned-only or speed-tuned-only units. In all conditions, units showed clear boundaries between maximum and minimum weights in the direction-speed subspace at each x-y point, and a single best direction. The location of these boundaries was always correlated across different x-y input points. Most units showing unambiguous pattern direction selectivity were characterized by two oppositely sloping diagonal boundaries between maximum and minimum weights in direction-speed subspace (see, e.g., Fig. 7). 
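The training procedure can be sketched end to end in Python. This is a minimal and hypothetical illustration, not the author's code. It uses a compact version of the linear first-layer encoding, with untuned inhibition assumed to be mean subtraction. It draws a separate Gaussian-probability connection for every input unit. It then applies the clipped linear activation and the clipped Hebb update, with no decay or offset term. The symmetric clipping limits and all parameter values (alpha, a, sigma, grid size, number of patterns) are assumptions, chosen only so the example runs quickly.

```python
import math
import random

# Hypothetical end-to-end sketch; parameter values and symmetric clipping
# limits are illustrative assumptions, not values from the paper.
N_DIR, N_SPEED = 12, 4
PER_LOC = N_DIR * N_SPEED
PREF_SPEEDS = [j / (N_SPEED - 1) for j in range(N_SPEED)]


def tri(dist, step):
    # linear tuning curve reaching zero one step from its peak
    return max(0.0, 1.0 - dist / step)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def encode_stimulus(width, height, pattern_dir, rng, dir_weight=2.5):
    # Flat input vector for a random-grating stimulus (one contour per x-y point).
    u = []
    for _ in range(width * height):
        normal = rng.uniform(0.0, 360.0)
        c = math.cos(math.radians(pattern_dir - normal))
        if c < 0.0:                      # use the normal that points with the motion
            normal, c = normal + 180.0, -c
        loc = []
        for d in range(N_DIR):
            dd = abs((normal - d * 360.0 / N_DIR + 180.0) % 360.0 - 180.0)
            for s in PREF_SPEEDS:
                loc.append(dir_weight * tri(dd, 360.0 / N_DIR)
                           + tri(abs(c - s), 1.0 / (N_SPEED - 1)))
        mean = sum(loc) / PER_LOC
        u.extend(r - mean for r in loc)  # assumed untuned inhibition: mean subtraction
    return u


def gaussian_connections(width, height, cx, cy, sigma, rng):
    # Every input unit gets its own random draw; the connection probability
    # falls off as a gaussian of distance from the retinotopic centre (cx, cy).
    conns = []
    for y in range(height):
        for x in range(width):
            p = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            base = (y * width + x) * PER_LOC
            conns.extend(base + k for k in range(PER_LOC) if rng.random() < p)
    return conns


class SecondLayerUnit:
    def __init__(self, conns, rng, w_init=0.01):
        self.conns = conns
        # small random initial weights from a flat distribution centred on zero
        self.w = [rng.uniform(-w_init, w_init) for _ in conns]

    def activate(self, u, alpha, y_lim):
        # y_j = clip(alpha * sum_i u_i w_ij, -y_lim, y_lim)
        s = sum(u[i] * w for i, w in zip(self.conns, self.w))
        return clip(alpha * s, -y_lim, y_lim)

    def hebb_update(self, u, y, a, w_lim):
        # delta w_ij = a * u_i * y_j, then clip; no decay term, no offset term
        self.w = [clip(w + a * u[i] * y, -w_lim, w_lim)
                  for i, w in zip(self.conns, self.w)]


def train(units, width, height, n_patterns, rng, alpha=0.05, a=0.01,
          y_lim=1.0, w_lim=1.0):
    for _ in range(n_patterns):
        u = encode_stimulus(width, height, rng.uniform(0.0, 360.0), rng)
        for unit in units:
            unit.hebb_update(u, unit.activate(u, alpha, y_lim), a, w_lim)


def fraction_saturated(unit, w_lim=1.0, tol=1e-9):
    # share of weights sitting at either clipping limit
    return sum(abs(abs(w) - w_lim) < tol for w in unit.w) / max(1, len(unit.w))


if __name__ == '__main__':
    rng = random.Random(0)
    width = height = 5
    units = [SecondLayerUnit(gaussian_connections(width, height, 2, 2, 1.5, rng), rng)
             for _ in range(3)]
    train(units, width, height, n_patterns=300, rng=rng)
    print([round(fraction_saturated(unit), 2) for unit in units])
```

The helper fraction_saturated reports the share of a unit's weights sitting at a clipping limit. In the text, the Hebb sensitivity was chosen so that about 1,000 patterns could be presented before most weights saturated. In this sketch, the same calibration would be done by adjusting a and n_patterns.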
\n\nThe stimuli used to train the network above (fullfield movements of a rigid texture field of randomly oriented contours) are unnatural; generally, there may be one or more objects in the field moving in different directions and at different speeds than the surround. Weight distributions needed to solve the aperture problem appear when the network is trained on occluding moving objects against moving backgrounds (object and background velocities chosen randomly on each trial), as long as the object is made small or large relative to the receptive field size. The solution breaks down when the moving objects occupy a significant fraction of the area of a second layer receptive field. \n\nFor comparison, the network was also trained using two different kinds of noise stimuli. In the first condition (unit noise), each new stimulus consisted of random input values on each input unit. With other network parameters held the same, the typical mature weight pattern onto a second layer unit showed an intimate intermixture of maximum and minimum weights in the direction-speed subspace at each x-y location. In the second condition (direction noise), each new stimulus consisted of a random direction at each x-y location. The mature weight patterns now showed continuous regions of all-maximum or all-minimum weights in the speed-direction subspace at each x-y point. In contrast to the situation with fullfield texture movement stimuli, however, the best directions at each of the x-y points providing input to a given unit were uncorrelated. In addition, multiple best directions at a single x-y point sometimes appeared. \n\nDISCUSSION \n\nThis simple model suggests that it may be possible to learn the solution to the aperture problem for pattern motion using only biologically realistic unsupervised learning and minimally structured motion fields. Using a similar network architecture, M.E. Sereno had previously shown that supervised learning on the problem of detecting pattern motion direction from local cues leads to the emergence of chevron-shaped weight structures in direction-speed space (M.E. Sereno, 1986). The weight structures generated here are similar, except that the inside or outside of the chevron is filled in, and upside-down chevrons are more common. This results in decreased selectivity to pattern speed in the second layer. \n\nThe model needs to be extended to more complex motion correlations in the input, e.g., rotation, dilation, shear, multiple objects, and flexible objects. MT in primates does not respond selectively to rotation or dilation, while its target area MST does. Thus, biological estimates of rotation and dilation are made in two stages: rotation and dilation are not detected locally, but are instead constructed from estimates of local translation. Higher layers in the present model may be able to learn interesting 'second-order' things about rotation, dilation, segmentation, and transparency. \n\nThe real primate visual system, of course, has a great many more parts than this model. There are a large number of interconnected cortical visual areas, perhaps as many as 25. A substantial portion of the 600 possible between-area connections may be present (for review, see M.I. Sereno, 1988). 
There are at least 6 map-like visual structures, and several more non-retinotopic visual structures in the thalamus (beyond the dLGN), that interconnect with the cortical visual areas. Each visual cortical area then has its own set of layers and interlayer connections. The most unbiological aspects of this model are the lack of time and the crude methods of gain control (clipped synaptic weights and input/output functions). Future models should employ within-area connections and time-dependent Hebb rules. \n\nMaking a biologically realistic model of intermediate and higher level visual processing is difficult, since it ostensibly requires making a biologically realistic model of earlier, yet often no less complex, stations in the system (e.g., the retina, dLGN, and layer 4C of primary visual cortex in the present case). One way to avoid having to model all of the stations up to the one of interest is to use physiological data about how the earlier stations respond to various stimuli, as was done in the present model. This shortcut is applicable to many other problems in modeling the visual system. For it to be most effective, physiologists and modelers need to cooperate in generating useful libraries of response profiles to arbitrary stimuli. Many stimulus parameters interact, often nonlinearly, to produce the final output of a cell. In the case of simple moving stimuli in V1 and MT, we minimally need to know the interaction between stimulus size, stimulus speed, stimulus direction, surround speed, surround direction, and the x-y starting point of the movement relative to the classical excitatory receptive field. 
Collecting this many response combinations from single cells requires faster serial presentation of stimuli than is customary in visual physiology experiments. There is no obvious reason, however, why the rate of stimulus presentation need be any less than the rate at which the visual system normally operates, namely 3-5 new views per second. \n\nWe also need a better understanding of the 'stimulus set'. The very large set of stimuli on which the real visual system is trained (millions of views) is still very poorly characterized. It would nevertheless be worthwhile and practical to collect a naturalistic corpus of perhaps 1000 views (several hours of viewing). \n\nAcknowledgements \nI thank M.E. Sereno and U. Wehmeier for discussions and comments. Supported by NIH grant F32 EY05887. Networks and displays were constructed on the Rochester Connectionist Simulator. \n\nReferences \nB.K.P. Horn & B.G. Schunck. Determining optical flow. Artif. Intell. 17, 185-203 (1981). \n\nJ. Hyvarinen. The Parietal Cortex. Springer-Verlag (1984). \n\nR. Linsker. From basic network principles to neural architecture: emergence of orientation-selective cells. Proc. Nat. Acad. Sci. 83, 8390-8394 (1986). \n\nJ.A. Movshon, E.H. Adelson, M.S. Gizzi & W.T. Newsome. Analysis of moving visual patterns. In Pattern Recognition Mechanisms. Springer-Verlag, pp. 117-151 (1985). \n\nM.E. Sereno. Modeling stages of motion processing in neural networks. Proc. 9th Ann. Conf. Cog. Sci. Soc., pp. 405-416 (1987). \n\nM.I. Sereno. The visual system. In I.W.v. Seelen, U.M. Leinhos, & G. Shaw (eds.), Organization of Neural Networks. VCH, pp. 176-184 (1988). \n\nR.V. Spitz, J. Stiles-Davis & R.M. Siegel. Infant perception of rotation from rigid structure-from-motion displays. Neurosci. Abstr. 14, 1244 (1988). 
", "award": [], "sourceid": 121, "authors": [{"given_name": "Martin", "family_name": "Sereno", "institution": null}]}