{"title": "Connectionism for Music and Audition", "book": "Advances in Neural Information Processing Systems", "page_first": 1163, "page_last": 1164, "abstract": null, "full_text": "Connectionism for Music and Audition\n\nAndreas S. Weigend\n\nDepartment of Computer Science\nand Institute of Cognitive Science\nUniversity of Colorado\nBoulder, CO 80309-0430\n\nAbstract\n\nThis workshop explored machine learning approaches to three topics: (1) finding structure in music (analysis, continuation, and completion of an unfinished piece), (2) modeling perception of time (extraction of musical meter, explanation of human data on timing), and (3) interpolation in timbre space.\n\nIn recent years, NIPS has heard neural networks generate tunes and harmonize chorales. With a large amount of music becoming available in computer-readable form, real data can be used to train connectionist models. At the beginning of this workshop, Andreas Weigend focused on architectures to capture structure on multiple time scales. J. S. Bach's last (unfinished) fugue from Die Kunst der Fuge served as an example (Dirst & Weigend, 1994).1 The prediction approach to continuation and completion, as well as to modeling expectations, can be characterized by the question \"What's next?\". Moving to time as the primary medium of musical communication, the inquiry in music perception and cognition shifted to the question \"When next?\".\n\nIn other words, so far we have considered patterns in time. Such patterns assume prior identification and subsequent processing of events. Bob Port, coming from the speech community, considered patterns of time, discussing timing in linguistic polyrhythms (e.g., hot cup of tea). 
He also drew parallels between timing in the Japanese language and timing in music, supporting the hypothesis that perceptual rhythms entrain attentional rhythms. As a mechanism for entrainment, Devin McAuley presented adaptive oscillators: the oscillators adapt their frequencies such that their \"firing\" coincides with the beat of the music (McAuley, 1994).\n\nAs the beat can be viewed as entrainment of an individual oscillator, the meter can be viewed as entrainment of multiple oscillators. Ed Large described human perception of metrical structure in analogy to two pendulum clocks that synchronize their motions by hanging on the same wall. An advantage of these entrainment approaches (which focus on time as time) over traditional approaches (which focus on music notation and treat time symbolically) is their ability to model phenomena in music performance, such as expressive timing.\n\n1 This fugue is available via anonymous ftp from ftp.santafe.edu as data set F.dat of the Santa Fe Time Series Analysis and Prediction Competition.\n\nTaking a Gibsonian perspective, Fred Cummins emphasized the relevance of ecological constraints on audition: perceptually relevant features are not easily spotted in the wave form or the spectrum. Among the questions he posed were: what \"higher-order\" features might be useful for audition, and whether recurrent networks could be useful to extract such features.\n\nThe last contribution also addressed the issue of representation, but with sound synthesis in mind: wouldn't a musician like to control sound in a perceptually relevant space, rather than fiddling with non-intuitive coefficients of an FM algorithm? 
Such a space was constructed with human input: subjects were asked to judge the similarity of sounds from different instruments (normalized in pitch, duration and volume). Multidimensional scaling was used to define a low-dimensional subspace that preserves the distance relations. Michael Lee first trained a network to find a map from timbre space to the space of the first 33 harmonics (Lee, 1994). He then used the network to generate rich new sounds by interpolating in this perceptually relevant space, controlled either through physical gestures, such as those captured by a data glove, or through an interface musicians might be comfortable with, such as a cello.\n\nThe discussion turned to the importance of working with perceptually adequate, \"ecologically sound\" representations (e.g., by using a cochlea model as pre-processor, or a speech model as post-processor for sonification applications). Finally, to probe human cognition, we discussed synthetic sounds, designed to reveal fundamental characteristics of the auditory system, independent of our daily experience. Returning to the title, the workshop turned out to be problem driven: people presented a problem or a finding and searched for a solution (connectionist or otherwise) rather than applying canned connectionist ideas to music and cognition.\n\nI thank the speakers, Fred Cummins (fcummins@indiana.edu), Ed Large (large@cis.ohio-state.edu), Michael Lee (lee@cnmat.berkeley.edu), Devin McAuley (mcauley@cs.indiana.edu), Robert Port (port@indiana.edu), as well as all participants. I also thank Tom Ngo (ngo@interval.com) for sending me the notes he took at the workshop, and Eckhard Kahle (kahle@ircam.fr) for discussing this summary.\n\nReferences\n\nDirst, M., and A. S. Weigend (1994) \"Baroque Forecasting: On Completing J. S. 
Bach's Last Fugue.\" In Time Series Prediction: Forecasting the Future and Understanding the Past, edited by A. S. Weigend and N. A. Gershenfeld, pp. 151-172. Addison-Wesley.\n\nLee, M., and D. Wessel (1992) \"Connectionist Models for Real-Time Control of Synthesis and Compositional Algorithms.\" In Proceedings of the International Computer Music Conference, pp. 277-280. San Francisco, CA: International Computer Music Association.\n\nMcAuley, J. D. (1994) \"Finding metrical structure in time.\" In Proceedings of the 1993 Connectionist Models Summer School, edited by M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman and A. S. Weigend, pp. 219-227. Lawrence Erlbaum.\n", "award": [], "sourceid": 859, "authors": [{"given_name": "Andreas", "family_name": "Weigend", "institution": null}]}