{"title": "Connectionist Models for Auditory Scene Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 1069, "page_last": 1076, "abstract": null, "full_text": "Connectionist Models for\nAuditory Scene Analysis\n\nRichard O. Duda\n\nDepartment of Electrical Engineering\n\nSan Jose State University\n\nSan Jose, CA 95192\n\nAbstract\n\nAlthough the visual and auditory systems share the same basic\ntasks of informing an organism about its environment, most connectionist\nwork on hearing to date has been devoted to the very\ndifferent problem of speech recognition. We believe that the most\nfundamental task of the auditory system is the analysis of acoustic\nsignals into components corresponding to individual sound sources,\nwhich Bregman has called auditory scene analysis. Computational\nand connectionist work on auditory scene analysis is reviewed, and\nthe outline of a general model that includes these approaches is\ndescribed.\n\n1 INTRODUCTION\n\nThe primary task of any perceptual system is to tell us about the external world.\nThe primary problem is that the sensory inputs provide too much data and too little\ninformation. A perceptual system must glean from the flood of incomplete, noisy,\nredundant and constantly changing streams of data those invariant properties that\nreveal important objects and events in the environment. For humans, the perceptual\nsystems with the widest bandwidths are the visual system and the auditory system.\nThere are many obvious similarities and differences between these modalities, and\nin addition to using them to perceive different aspects of the physical world, we also\nuse them in quite different ways to communicate with one another.
\n\nThe earliest neural-network models for vision and hearing addressed problems in\npattern recognition, with optical character recognition and isolated word recognition\namong the first engineering applications. However, about twenty years ago\nthe research goals in vision and hearing began to diverge. In particular, the need\nfor computers to perceive the external environment motivated vision researchers to\nseek the principles and procedures for recovering information about the physical\nworld from visual data (Marr, 1982; Ballard and Brown, 1982). By contrast, the\nvast majority of work on machine audition remained focused on the communication\nproblem of speech recognition (Morgan and Scofield, 1991; Rabiner and Juang,\n1993). While this focus has produced considerable progress, the resulting systems\nare still not very robust, and perform poorly in uncontrolled environments. Furthermore,\nas Richards (1988) has noted, \"... Speech, like writing and reading, is\na specialized skill of advanced animals, and understanding speech need not be the\nbest route to understanding how we interpret the patterns of natural sounds that\ncomprise most of the acoustic spectrum about us.\"\n\nIn recent years, some researchers concerned with modeling audition have begun to\nshift their attention from speech understanding to sound understanding. The inspiration\nfor much of this activity has come from the work of Bregman, whose book\non auditory scene analysis documents experimental evidence for important gestalt\nprinciples that summarize the ways that people group elementary events in frequency/time\ninto sound objects or streams (Bregman, 1990). 
In this survey paper,\nwe briefly review this activity and consider its implications for the development of\nconnectionist models for auditory scene analysis.\n\n2 AUDITORY SCENE ANALYSIS\n\nIn vision, Marr (1982) emphasized the importance of identifying the tasks of the\nvisual system and developing a computational theory that is distinct from particular\nalgorithms or implementations. The computational theory had to specify the\nproblems to be solved, the sensory data that is available, and the additional knowledge\nor assumptions required to solve the problems. Among the various tasks of\nthe visual system, Marr believed that the recovery of the three-dimensional shapes\nof the surfaces of objects from the sensory image data was the most fundamental.\n\nThe auditory system also has basic tasks that are more primitive than the recognition\nof speech. These include (1) the separation of different sound sources, (2)\nthe localization of the sources in space, (3) the suppression of echoes and reverberation,\n(4) the decoupling of sources from the environment, (5) the characterization\nof the sources, and (6) the characterization of the environment. Unfortunately, the\nrelation between physical sound sources and perceived sound streams is not a simple\none-to-one correspondence. Distributed sound sources, echoes, and synthetic\nsounds can easily confuse auditory perception. Nevertheless, humans still do much\nbetter at these six basic tasks than any machine hearing system that exists today.\n\nFrom the standpoint of physics, the raw data available for performing these tasks\nis the pair of acoustic signals arriving at the two ears. From the standpoint of\nneurophysiology, the raw data is the activity on the auditory nerve. 
The nonlinear,\nmechano-neural spectral analysis performed by the cochlea converts sound pressure\nfluctuations into auditory nerve firings. For better or for worse, the cochlea\ndecomposes the signal into many frequency components, transforming it into a frequency/time\n(or, more accurately, a place/time) spectrogram-like representation.\nThe auditory system must find the underlying order in this dynamic flow of data.\n\nFor a specific case, consider a simple musical mixture of several periodic signals.\nWithin its limits of resolution, the cochlea decomposes each individual signal into\nits discrete harmonic components. Yet, under ordinary circumstances, we do not\nhear these components as separate sounds, but rather we fuse them into a single\nsound having, as musicians say, its particular timbre or tone color. However, if\nthere is something distinctive about the different signals (such as different pitch or\ndifferent modulation), we do not fuse all of the sounds together, but rather hear the\nseparate sources, each with its own timbre.\n\nWhat information is available to group the spectral components into sound streams?\nHartmann (1988) identifies the following factors that influence grouping: (1) common\nonset/offset, (2) common harmonic relations, (3) common modulation, (4)\ncommon spatial origin, (5) continuity of spectral envelope, (6) duration, (7) sound\npressure level, and (8) context. These properties are easier to name than to precisely\nspecify, and it is not surprising that no current model incorporates them all.
\nHowever, several auditory scene analysis systems have been built that exploit some\nsubset of these cues (Weintraub, 1985; Cooke, 1993; Mellinger, 1991; Brown, 1992;\nBrown and Cooke, 1993; Ellis, 1993). Although these are computational rather\nthan connectionist models, most of them at least find inspiration in the structure\nof the mammalian auditory system.\n\n3 NEURAL AND CONNECTIONIST MODELS\n\nThe neural pathways from the cochlea through the brainstem nuclei to the auditory\ncortex are complex, but have been extensively investigated. Although this system\nis far from completely understood, neurons in the brainstem nuclei are known to be\nsensitive to various acoustic features: onsets, offsets and modulation in the dorsal\ncochlear nucleus, interaural time differences (ITD's) in the medial superior olive\n(MSO), interaural intensity differences (IID's) in the lateral superior olive (LSO),\nand spatial location maps in the inferior colliculus (Pickles, 1988).\n\nBoth functional and connectionist models have been developed for all of these functions.\nBecause it is both important and relatively well understood, the cochlea\nhas received by far the most attention (Allen, 1985). As a result of this work, we\nnow have real-time implementations for some of these models as analog VLSI chips\n(Lyon and Mead, 1988; Lazzaro et al., 1993). Connectionist models for sound localization\nhave also been extensively explored. Indeed, one of the earliest of all neural\nnetwork models was Jeffress's classic crosscorrelation model (Jeffress, 1948), which\nwas hypothesized forty years before neural crosscorrelation structures were actually\nfound in the barn owl (Carr and Konishi, 1988). 
Models have subsequently been\nproposed for both the LSO (Reed and Blum, 1990) and the MSO (Han and Colburn,\n1991). Mathematically, both the ITD and IID cues for binaural localization\nare exposed by crosscorrelation. Lyon showed that crosscorrelation can also be used\nto separate as well as localize the signals (Lyon, 1983). VLSI crosscorrelation chips\ncan provide this information in real time (Lazzaro and Mead, 1989; Bhadkamkar\nand Fowler, 1993).\n\nWhile interaural crosscorrelation can determine the azimuth to a sound source,\nfull three-dimensional localization also requires the determination of elevation and\nrange. Because of a lack of symmetry in the orientation of its ears, the barn owl can\nactually determine azimuth from the ITD and elevation from the IID. This at least\nin part explains why it has been such a popular choice for connectionist modeling\n(Spence and Pearson, 1990; Moiseff et al., 1991; Palmieri et al., 1991; Rosen, Rumelhart\nand Knudsen, 1993). Unfortunately, the localization mechanisms used by humans\nare more complicated.\n\nIt is well known that humans use monaural, spectral shape cues to estimate elevation\nin the median sagittal plane (Blauert, 1983; Middlebrooks and Green, 1991), and\nsource localization models based on this approach have been developed (Neti, Young\nand Schneider, 1992; Zakarauskas and Cynader, 1993). The author has shown that\nthere are strong binaural cues for elevation at short distances away from the median\nplane, and has used statistical methods to estimate both azimuth and elevation\naccurately from IID data alone (Duda, 1994). 
In addition, backprop models have\nbeen developed that can estimate azimuth and elevation from IID and ITD inputs\njointly (Backman and Karjalainen, 1993; Anderson, Gilkey and Janko, 1994).\n\nFinally, psychologists have long been aware of an important reverberation-suppression\nphenomenon known as the precedence effect or the law of the first\nwavefront (Zurek, 1987). It is usually summarized by saying that echoes of a sound\nsource have little effect on its localization, and are not even consciously heard if\nthey are not delayed more than the so-called echo threshold, which ranges from\n5-10 ms for sharp clicks to more than 50 ms for music. It is generally believed\nthat the precedence effect can be accounted for by contralateral inhibition in the\ncrosscorrelation process, and Lindemann has accounted for many of the phenomena\nby a conceptually simple connectionist model (Lindemann, 1986).\n\nHowever, Clifton and her colleagues have found that the echoes are indeed heard\nif the timing of the echoes suddenly changes, as might happen when one moves\nfrom one acoustic environment into another one (Clifton, 1987; Freyman, Clifton\nand Litovsky, 1991). Clifton conjectures that the auditory system is continually\nanalyzing echo patterns to model the acoustic environment, and that the resulting\nexpectations modify the echo threshold. This suggests that simple crosscorrelation\nmodels will not be adequate when the listener is moving, and thus that even the\nlocalization problem is still unsolved.\n\n4 ARCHITECTURE OF AN AUDITORY MODEL\n\nIf we look back at the six basic tasks for the auditory system, we see that only one\n(source localization) has received much attention from connectionist researchers,\nand its solution is incomplete. 
In particular, current localization models cannot\nhandle multiple sources and cannot cope with significant room echoes and reverberation.\nThe common problem for all of the basic tasks is that of source separation,\nwhich only the auditory scene analysis systems have addressed.\n\nFig. 1 shows a functional block diagram for a hypothetical auditory model that\ncombines the computational and connectionist models and has the potential of\ncoping with a multisource environment. The inputs to the model are the left-ear\nand right-ear signals, and the main output is a dynamic set of streams. The system\nis primarily data driven, although low-bandwidth efferent paths could be added for\ntasks such as automatic gain control.\n\nData flow on the left half of the diagram is monaural, and data flow on the right\nhalf is binaural. The binaural processing is based on crosscorrelation analysis of\nthe cochlear outputs. The author has shown that interaural differences are not only\neffective in determining azimuth, but can also be used to determine elevation\n(Duda, 1994). We have chosen to follow Slaney and Lyon (1993)\nin basing the monaural analysis on autocorrelation analysis. Originally proposed\nby Licklider (1951) to explain pitch phenomena, autocorrelation is a biologically\nplausible operation that supports the common onset, modulation and harmonicity\nanalysis needed for stream formation (Duda, Lyon and Slaney, 1990; Brown and\nCooke, 1993).\n\nWhile the processes at lower levels of this diagram are relatively well understood,\nthe process of stream formation is problematic. 
Bregman (1990) has posed this\nproblem in terms of grouping the components of the \"neural spectrogram\" in both\nfrequency and time. He has identified two principles that seem to be employed\nin stream formation: exclusive allocation (a component may not be used in more\nthan one description at a time) and accounting (all incoming components must be\nassigned to some source). The various auditory scene analysis systems that we\nmentioned earlier provide different mechanisms for exploiting these principles to\nform auditory streams. Unfortunately, the principles admit of exceptions, and the\nexisting implementations seem rather ad hoc and arbitrary. The development of a\nbiologically plausible model for stream formation is the central unsolved problem\nfor connectionist research in audition.\n\n[Figure 1 blocks: Short-Term Auditory Memory; Stream Formation; Monaural Maps; Auto-Correlation Analysis; Cross-Correlation Analysis; Spectral Analysis (Cochlear Model), Left Input; Spectral Analysis (Cochlear Model), Right Input]\n\nFigure 1: Block diagram for a basic auditory model\n\nAcknowledgements\n\nThis work was supported by the National Science Foundation under NSF Grant\nNo. IRI-9214233. This paper could not have been written without the many discussions\non these topics with Al Bregman, Dick Lyon, David Mellinger, Bernard\nMont-Reynaud, John R. Pierce, Malcolm Slaney and J. Martin Tenenbaum, and\nwithout the stimulating CCRMA Hearing Seminar at Stanford University that Bernard\ninitiated and that Malcolm has maintained and invigorated.\n\nReferences\n\nAllen, J. B. (1985). \"Cochlear modeling,\" IEEE ASSP Magazine, vol. 2, pp. 3-29.\n\nAnderson, T. R., R. H. Gilkey and J. A. Janko (1994). 
\"Using neural networks to\nmodel human sound localization,\" in T. Anderson and R. H. Gilkey (eds.), Binaural\nand Spatial Hearing. Hillsdale, NJ: Lawrence Erlbaum Associates.\n\nBackman, J. and M. Karjalainen (1993). \"Modelling of human directional and\nspatial hearing using neural networks,\" ICASSP93, pp. I-125-I-128 (Minneapolis,\nMN).\n\nBallard, D. H. and C. M. Brown (1982). Computer Vision. Englewood Cliffs, NJ:\nPrentice-Hall.\n\nBhadkamkar, N. and B. Fowler (1993). \"A sound localization system based on\nbiological analogy,\" 1993 IEEE International Conference on Neural Networks,\npp. 1902-1907 (San Francisco, CA).\n\nBlauert, J. P. (1983). Spatial Hearing. Cambridge, MA: MIT Press.\n\nBregman, A. S. (1990). Auditory Scene Analysis. Cambridge, MA: MIT Press.\n\nBrown, G. J. (1992). \"Computational auditory scene analysis: A representational\napproach,\" PhD dissertation, Department of Computer Science, University\nof Sheffield, Sheffield, England, UK.\n\nBrown, G. J. and M. Cooke (1993). \"Physiologically-motivated signal representations\nfor computational auditory scene analysis,\" in M. Cooke, S. Beet and M. Crawford\n(eds.), Visual Representations of Speech Signals, pp. 181-188. Chichester, England:\nJohn Wiley and Sons.\n\nCarr, C. E. and M. Konishi (1988). \"Axonal delay lines for time measurement in\nthe owl's brainstem,\" Proc. Nat. Acad. Sci. USA, vol. 85, pp. 8311-8315.\n\nClifton, R. K. (1987). \"Breakdown of echo suppression in the precedence effect,\"\nJ. Acoust. Soc. Am., vol. 82, pp. 1834-1835.\n\nCooke, M. P. (1993). Modelling Auditory Processing and Organisation. Cambridge,\nUK: Cambridge University Press.\n\nDuda, R. O., R. F. Lyon and M. Slaney (1990). 
\"Correlograms and the separation\nof sounds,\" Proc. 24th Asilomar Conf. on Signals, Systems and Computers, pp. 457-461\n(Asilomar, CA).\n\nDuda, R. O. (1994). \"Elevation dependence of the interaural transfer function,\" in\nT. Anderson and R. H. Gilkey (eds.), Binaural and Spatial Hearing. Hillsdale, NJ:\nLawrence Erlbaum Associates.\n\nEllis, D. P. W. (1993). \"Hierarchic models of hearing for sound separation and reconstruction,\"\n1993 IEEE Workshop on Applications of Signal Processing to Audio\nand Acoustics.\n\nFreyman, R. L., R. K. Clifton and R. Y. Litovsky (1991). \"Dynamic processes in\nthe precedence effect,\" J. Acoust. Soc. Am., vol. 90, pp. 874-884.\n\nHan, Y. and H. S. Colburn (1991). \"A neural cell model of MSO,\" Proc. 1991 IEEE\nSeventeenth Annual Northeast Bioengineering Conference, pp. 121-122 (Hartford,\nCT).\n\nHartmann, W. A. (1988). \"Pitch perception and the segregation and integration of\nauditory entities,\" in G. M. Edelman, W. E. Gall and W. M. Cowan (eds.), Auditory\nFunction. New York, NY: John Wiley and Sons, Inc.\n\nJeffress, L. A. (1948). \"A place theory of sound localization,\" J. Comp. Physiol.\nPsychol., vol. 41, pp. 35-39.\n\nLazzaro, J. and C. A. Mead (1989). \"A silicon model of auditory localization,\"\nNeural Computation, vol. 1, pp. 47-57.\n\nLazzaro, J., J. Wawrzynek, M. Mahowald, M. Sivilotti and D. Gillespie (1993). \"Silicon\nauditory processors as computer peripherals,\" IEEE Transactions on Neural\nNetworks, vol. 4, pp. 523-528.\n\nLicklider, J. C. R. (1951). \"A duplex theory of pitch perception,\" Experientia, vol. 7,\npp. 128-133.\n\nLindemann, W. (1986). \"Extension of a binaural cross-correlation model\nby contralateral inhibition. I. Simulation of lateralization for stationary signals,\"\nJ. Acoust. Soc. Am., vol. 80, pp. 
1608-1622; \"II. The law of the first wave\nfront,\" J. Acoust. Soc. Am., vol. 80, pp. 1623-1630.\n\nLyon, R. F. (1983). \"A computational model of binaural localization and separation,\"\nICASSP83, pp. 1148-1151 (Boston, MA).\n\nLyon, R. F. and C. Mead (1988). \"An analog electronic cochlea,\" IEEE\nTrans. Acoustics, Speech and Signal Processing, vol. 36, pp. 1119-1134.\n\nMarr, D. (1982). Vision. San Francisco, CA: W. H. Freeman and Company.\n\nMellinger, D. K. (1991). \"Event formation and separation of musical sound,\"\nPhD dissertation, Department of Music, Stanford University, Stanford, CA; Report\nNo. STAN-M-77, Center for Computer Research in Music and Acoustics, Stanford\nUniversity, Stanford, CA.\n\nMiddlebrooks, J. C. and D. M. Green (1991). \"Sound localization by human listeners,\"\nAnnu. Rev. Psychol., vol. 42, pp. 135-159.\n\nMoiseff, A. et al. (1991). \"An artificial neural network for studying binaural sound\nlocalization,\" Proc. 1991 IEEE Seventeenth Annual Northeast Bioengineering Conference,\npp. 1-2 (Hartford, CT).\n\nMorgan, D. P. and C. L. Scofield (1991). Neural Networks and Speech Processing.\nBoston, MA: Kluwer Academic Publishers.\n\nNeti, C., E. D. Young and M. H. Schneider (1992). \"Neural network models of\nsound localization based on directional filtering by the pinna,\" J. Acoust. Soc. Am.,\nvol. 92, pp. 3140-3156.\n\nPalmieri, F., M. Datum, A. Shah and A. Moiseff (1991). \"Sound localization\nwith a neural network trained with the multiple extended Kalman algorithm,\"\nProc. Int. Joint Conf. on Neural Networks, pp. 1125-1131 (Seattle, WA).
\n\nPickles, J. O. (1988). An Introduction to the Physiology of Hearing, 2nd edition.\nLondon: Academic Press.\n\nRabiner, L. and B.-H. Juang (1993). Fundamentals of Speech Recognition. Englewood\nCliffs, NJ: Prentice-Hall.\n\nReed, M. C. and J. J. Blum (1990). \"A model for the computation and encoding of\nazimuthal information by the lateral superior olive,\" J. Acoust. Soc. Am., vol. 88,\npp. 1442-1453.\n\nRichards, W. (1988). \"Sound interpretation,\" in W. Richards (ed.), Natural Computation,\npp. 301-308. Cambridge, MA: MIT Press.\n\nRosen, D., D. Rumelhart and E. Knudsen (1993). \"A connectionist model of the\nowl's localization system,\" in J. D. Cowan, G. Tesauro and J. Alspector (eds.),\nAdvances in Neural Information Processing Systems 6. San Francisco, CA: Morgan\nKaufmann Publishers.\n\nSlaney, M. and R. F. Lyon (1993). \"On the importance of time - A temporal\nrepresentation of sound,\" in M. Cooke, S. Beet and M. Crawford (eds.), Visual\nRepresentations of Speech Signals, pp. 95-116. Chichester, England: John Wiley\nand Sons.\n\nSpence, C. D. and J. C. Pearson (1990). \"The computation of sound source elevation\nin the barn owl,\" in D. S. Touretzky (ed.), Advances in Neural Information\nProcessing Systems 2, pp. 10-17. San Mateo, CA: Morgan Kaufmann.\n\nWeintraub, M. (1985). \"A theory and computational model of auditory monaural\nsound separation,\" PhD dissertation, Department of Electrical Engineering, Stanford\nUniversity, Stanford, CA.\n\nZakarauskas, P. and M. S. Cynader (1993). \"A computational theory of spectral\ncue localization,\" J. Acoust. Soc. Am., vol. 94, pp. 1323-1331.\n\nZurek, P. M. (1987). \"The precedence effect,\" in W. A. Yost and G. Gourevitch\n(eds.), 
Directional Hearing, pp. 85-106. New York, NY: Springer-Verlag.", "award": [], "sourceid": 827, "authors": [{"given_name": "Richard", "family_name": "Duda", "institution": null}]}