{"title": "Bayesian Modeling and Classification of Neural Signals", "book": "Advances in Neural Information Processing Systems", "page_first": 590, "page_last": 597, "abstract": null, "full_text": "Bayesian Modeling and Classification of Neural Signals\n\nMichael S. Lewicki\n\nComputation and Neural Systems Program\nCalifornia Institute of Technology 216-76\nPasadena, CA 91125\n\nlewicki@cns.caltech.edu\n\nAbstract\n\nSignal processing and classification algorithms often have limited applicability resulting from an inaccurate model of the signal's underlying structure. We present here an efficient Bayesian algorithm for modeling a signal composed of the superposition of brief, Poisson-distributed functions. This methodology is applied to the specific problem of modeling and classifying extracellular neural waveforms, which are composed of a superposition of an unknown number of action potentials (APs). Previous approaches have had limited success due largely to the problems of determining the spike shapes, deciding how many shapes are distinct, and decomposing overlapping APs. A Bayesian solution to each of these problems is obtained by inferring a probabilistic model of the waveform. This approach quantifies the uncertainty of the form and number of the inferred AP shapes and is used to obtain an efficient method for decomposing complex overlaps. This algorithm can extract many times more information than previous methods and facilitates the extracellular investigation of neuronal classes and of interactions within neuronal circuits.\n\n1 INTRODUCTION\n\nExtracellular electrodes typically record the activity of several neurons in the vicinity of the electrode tip (figure 1). 
Most electrophysiological data is collected by isolating action potentials (APs) from a single neuron by using a level detector or window discriminator. Methods for extracting APs from multiple neurons can, in addition to the obvious advantage of providing more data, provide the means to investigate local neuronal interactions and response properties of neuronal populations. Determining from the voltage waveform which cell fired when is a difficult, ill-posed problem, compounded by the fact that cells frequently fire simultaneously, resulting in large variations in the observed shapes.\n\nThere are three major difficulties in identifying and classifying action potentials (APs) in a neural waveform. The first is determining the AP shapes, the second is deciding the number of distinct shapes, and the third is decomposing overlapping spikes into their component parts. In general, these problems cannot be solved independently, since the solution of one will affect the solution of the others.\n\nFigure 1: Each neuron generates a stereotyped action potential (AP) which is observed through the electrode as a voltage fluctuation. This shape is primarily a function of the position of a neuron relative to the tip. The extracellular waveform shows several different APs generated by an unknown number of neurons. Note the frequent presence of overlapping APs which can completely obscure individual spikes.\n\nThe approach summarized here is to model the waveform directly to obtain a probabilistic description of each action potential and, in turn, of the whole waveform. This method allows us to compute the class conditional probabilities of each AP. 
In addition, it is possible to quantify the certainty of both the form and number of spike shapes. Finally, we can use this description to decompose overlapping APs efficiently and assign probabilities to alternative spike sequences.\n\n2 MODELING SINGLE ACTION POTENTIALS\n\nThe data from the event observed (at time zero) is modeled as resulting from a fixed underlying spike function, s(t), plus noise:\n\n(1)\n\nwhere v is the parameter vector that defines the spike function. The noise, η, is modeled as Gaussian with zero mean and standard deviation σ_η.\n\nFrom the Bayesian perspective, the task is to infer the posterior distribution of the spike function parameters (assuming, for the moment, that σ_η and σ_w are known):\n\nP(v | D, σ_η, σ_w, M) = P(D | v, σ_η, M) P(v | σ_w, M) / P(D | σ_η, σ_w, M)   (2)\n\nThe two terms specifying the posterior distribution of v are 1) the probability of the data given the model:\n\n(3)\n\nand 2) the prior assumptions of the structure of s(t), which are assumed to be of the form:\n\n(4)\n\nThe superscript (m) denotes differentiation, which for these demonstrations we assumed to be m = 1, corresponding to linear splines. The smoothness of s(t) is controlled through σ_w, with small values of σ_w penalizing large fluctuations.\n\nThe final step in determining the posterior distribution is to eliminate the dependence of P(v | D, σ_η, σ_w, M) on σ_η and σ_w. Here, we use the approximation:\n\n(5)\n\nThe most probable values of σ_η and σ_w were obtained using the methods of MacKay (1992), in which reestimation formulas are obtained from a Gaussian approximation of the posterior distribution for σ_η and σ_w, P(σ_η, σ_w | D, M). 
Correct inference of σ_w prevents the spike function from overfitting the data.\n\n3 MODELING MULTIPLE ACTION POTENTIALS\n\nWhen a waveform contains multiple types of APs, determining the component spike shapes is more difficult because the classes are not known a priori. The uncertainty of which class an event belongs to can be incorporated with a mixture distribution. The probability of a particular event, D_n, given all spike models, M_{1:K}, is\n\nP(D_n | v_{1:K}, π, σ_η, M_{1:K}) = Σ_{k=1}^{K} π_k P(D_n | v_k, σ_η, M_k),   (6)\n\nwhere π_k is the a priori probability that a spike will be an instance of M_k, and Σ_k π_k = 1.\n\nAs before, the objective is to determine the posterior distribution for the parameters defining a set of spike models, P(v_{1:K}, π | D_{1:N}, σ_η, σ_w, M_{1:K}), which is obtained again using Bayes' rule.\n\nFinding the conditions satisfied at a posterior maximum leads to the equation:\n\n(7)\n\nwhere τ_n is the inferred occurrence time (typically to sub-sample period accuracy) of the event D_n. This equation is solved iteratively to obtain the most probable values of v_{1:K}. Note that the error for each event, D_n, is weighted by P(M_k | D_n, v_k, π, σ_η), which is the probability that the event is an instance of the kth spike model. This is a soft clustering procedure, since the events are not explicitly assigned to particular classes. Maximizing the posterior yields accurate estimates of the spike functions even when the clusters are highly overlapping.\n\nThe techniques described in the previous section are used to determine the most probable values for σ_η and σ_w and, in turn, the most probable values of v_{1:K} and π. 
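The weighting used in the soft clustering step above can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: it assumes each event is already aligned and sampled at the same points as every spike model, that the noise is independent Gaussian with standard deviation sigma_eta on each sample, and the names `responsibilities`, `event`, `models`, and `priors` are hypothetical.

```python
import math

def responsibilities(event, models, priors, sigma_eta):
    # Posterior probability that `event` is an instance of each spike model:
    # P(M_k | D_n) is proportional to pi_k * P(D_n | v_k, sigma_eta, M_k).
    # With a shared sigma_eta and equal sample counts, the Gaussian
    # normalizing constants cancel, so only the squared error is needed.
    log_post = []
    for shape, prior in zip(models, priors):
        sq_err = sum((d - s) ** 2 for d, s in zip(event, shape))
        log_post.append(math.log(prior) - sq_err / (2.0 * sigma_eta ** 2))
    # Normalize in log space for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical, similar spike shapes and one noisy event (toy values).
models = [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
priors = [0.5, 0.5]
event = [0.0, 0.95, 0.05]
resp = responsibilities(event, models, priors, sigma_eta=0.1)
```

Each event then contributes to every model's shape estimate in proportion to its entry in `resp`, instead of being assigned outright to the closest model as a hard clustering procedure would do.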
\n\n4 DETERMINING THE NUMBER OF SPIKE SHAPES\n\nChoosing the set of spike models that best fits the data would eventually result in a model for each event in the waveform. Heuristics might indicate whether two spike models are identical or distinct, but ad hoc criteria are notoriously dependent on particular circumstances, and it is difficult to state precisely what information the rules take into account.\n\nTo determine the most probable number of spike models, we apply probability theory. Let S_j = {M_{1:j}} denote a set of spike models and H denote information known a priori. The probability of S_j, conditioned only on H and the data, is obtained using Bayes' rule:\n\n(8)\n\nThe only data-dependent term is P(D_{1:N} | S_j, H), which is the evidence for S_j (MacKay, 1992). With the assumption that all hypotheses S_{1:J} are equally probable a priori, P(D_{1:N} | S_j, H) ranks alternative spike sets in terms of their probability. The evidence term P(D_{1:N} | S_j, H) is convenient because it is the normalizing constant for the posterior distribution of the parameters defining the spike set. Although calculation of P(D_{1:N} | S_j, H) is analytically intractable, it is often well approximated with a Gaussian integral, which was the approximation used for these demonstrations.\n\nA convenient way of collapsing the spike set is to compare spike models pairwise. Two models in the spike set are selected along with a sampled set of events fit by each model. We then evaluate P(D | S_1) and P(D | S_2), where S_1 is the hypothesis that the data is modeled by a single spike shape and S_2 is the hypothesis that there are two spike shapes. If P(D | S_1) > P(D | S_2), we replace both models in S_2 by the one in S_1. The procedure terminates when no more pairs can be combined to increase the evidence. 
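The pairwise collapsing procedure described above can be sketched as a greedy loop. The sketch below is a toy under stated assumptions: events are reduced to scalars, each model is a single mean with known noise sigma, and the evidence is replaced by a simple BIC-style score (best-fit log-likelihood minus half the log of the event count per fitted mean). That score is a stand-in and is not the Gaussian-integral approximation of the evidence used in the paper; all function names are hypothetical.

```python
import math

def log_lik(events, mean, sigma):
    # Gaussian log-likelihood of scalar events around a single mean.
    return sum(-0.5 * ((x - mean) / sigma) ** 2
               - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in events)

def approx_log_evidence(groups, sigma):
    # Stand-in for log P(D | S): best-fit log-likelihood minus a penalty of
    # 0.5 * log(N) per fitted mean (a BIC-style substitute, an assumption).
    n = sum(len(g) for g in groups)
    fit = sum(log_lik(g, sum(g) / len(g), sigma) for g in groups)
    return fit - 0.5 * len(groups) * math.log(n)

def collapse(groups, sigma):
    # Greedily merge a pair whenever the one-model hypothesis S1 scores
    # higher than the two-model hypothesis S2 on the events fit by the pair.
    groups = [list(g) for g in groups]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                one = approx_log_evidence([groups[i] + groups[j]], sigma)
                two = approx_log_evidence([groups[i], groups[j]], sigma)
                if one > two:
                    groups[i] = groups[i] + groups[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return groups

# Toy data: the first two groups share a mean, the third is distinct.
groups = [[1.0, 1.1, 0.9], [1.05, 0.95, 1.0], [3.0, 3.1, 2.9]]
result = collapse(groups, sigma=0.1)
```

On this toy input the first two groups are merged and the third is kept separate, and the loop stops once no remaining pair scores higher as a single model.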
\n\n5 DECOMPOSING OVERLAPPING SPIKES\n\nOverlaps must be decomposed into their component spikes for accurate inference of the spike functions and accurate classification of the events. Determining the best-fitting decomposition is difficult because of the enormous number of possible spike sequences: not only all possible model combinations for each event but also all possible event times.\n\nA brute-force approach to this problem is to perform an exhaustive search of the space of overlapping spike functions and event times to find the sequence with maximum probability. This approach was used by Atiya (1992) in the case of two overlapping spikes with the times optimized to one sample period. Unfortunately, this is often computationally too demanding even for off-line analysis.\n\nWe make this search efficient by utilizing dynamic programming and k-dimensional trees (Friedman et al., 1977). Even once the best-fitting decomposition has been obtained, however, it may not be optimal, since adding more spike shapes can overfit the data. This problem is minimized by evaluating the probability for alternative decompositions to determine the most probable spike sequence (figure 2).\n\nFigure 2: Many spike function sequences can account for the same region of data. The thick lines show the data, thin lines show individual spike functions. In this case, the best-fitting overlap solution is not the most probable: the sequence with 4 spike functions is more than 8 times more probable than the other solutions, even though these have smaller mean squared error. Using the best-fitting overlap solution may increase the classification error. Classification error is minimized by using the overlap solution that is most probable. 
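The difference between the best-fitting and the most probable decomposition can be illustrated with a small sketch. The paper does not spell out the exact form of the sequence probability in this section, so the scoring below is an assumption: a Gaussian residual term plus a fixed prior log-probability cost for every spike in the sequence. The helper name `log_prob`, the cost value, and the candidate numbers are hypothetical toy values.

```python
def log_prob(sq_error, n_spikes, sigma_eta, log_prior_per_spike):
    # Unnormalized log probability of one candidate decomposition:
    # Gaussian residual term plus an assumed prior cost for each spike used.
    return -sq_error / (2.0 * sigma_eta ** 2) + n_spikes * log_prior_per_spike

# (summed squared error, number of spikes) for three hypothetical candidates.
candidates = {'a': (4.0, 4), 'b': (3.5, 5), 'c': (3.2, 6)}
scores = {name: log_prob(err, n, sigma_eta=1.0, log_prior_per_spike=-3.0)
          for name, (err, n) in candidates.items()}

# The best fit minimizes the error; the most probable maximizes the score.
best_fit = min(candidates, key=lambda name: candidates[name][0])
most_probable = max(scores, key=scores.get)
```

With these toy numbers the six-spike candidate has the smallest error, but the four-spike candidate has the highest score, which is the kind of situation figure 2 describes.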
\n\n6 PERFORMANCE\n\nThe algorithm was tested on 40 seconds of neurophysiological data. The task is to determine the form and number of spike shapes in a waveform and to infer the occurrence times of each spike shape. The output of the algorithm is shown in figure 3. The uniformity of the residual error indicates that the six inferred spike shapes account for the entire 40 seconds of data. The spike functions M_2 and M_3 appear similar by eye, but the probabilities calculated with the methods in section 4 indicate that the two functions are significantly different. When plotted against each other, the two populations of APs are distinctly separated in the region around the peak, with M_3 being wider than M_2.\n\nFigure 3: The solid lines are the inferred spike models. The data overlying each model is a sample of at most 40 events with overlapping spikes subtracted out. The residual errors are plotted below each model. This spike set was obtained after three iterations of the algorithm, decomposing overlaps and determining the most probable number of spike functions after each iteration. The whole inference procedure used 3 minutes of CPU time on a Sparc IPX. Once the spike set is inferred, classification of the same 40 second waveform takes about 10 seconds.\n\nThe accuracy of the algorithm was tested by generating an artificial data set composed of the six inferred shapes shown in figure 3. The event times were Poisson distributed with frequency equal to the inferred firing rate of the real data set. Gaussian noise was then added with standard deviation equal to σ_η. The classification results are summarized in the tables below.\n\nTable 1: Results of the spike model inference algorithm on the synthesized data set. 
\n\n| Model | 1 | 2 | 3 | 4 | 5 | 6 |\n| Δmax/σ_η | 0.44 | 0.36 | 1.07 | 0.78 | 0.84 | 0.40 |\n\nThe number of spike models was correctly determined by the algorithm: the six-model spike set was preferred over the most probable five-model spike set by exp(34) : 1 and over the most probable seven-model spike set by exp(19) : 1. The inferred shapes were accurate to within a maximum error of 1.07σ_η. The row elements show the maximum absolute difference, normalized by σ_η, between each true spike function and the corresponding inferred function.\n\nTable 2: Classification results for the synthesized data set (non-overlapping events).\n\n| True model | Inferred 1 | Inferred 2 | Inferred 3 | Inferred 4 | Inferred 5 | Inferred 6 | Missed events | Total events |\n| 1 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 17 |\n| 2 | 0 | 25 | 1 | 0 | 0 | 0 | 0 | 26 |\n| 3 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 15 |\n| 4 | 0 | 0 | 0 | 116 | 0 | 0 | 1 | 117 |\n| 5 | 0 | 0 | 0 | 0 | 56 | 0 | 17 | 73 |\n| 6 | 0 | 0 | 0 | 0 | 0 | 393 | 254 | 647 |\n\nTable 3: Classification results for the synthesized data set (overlapping events).\n\n| True model | Inferred 1 | Inferred 2 | Inferred 3 | Inferred 4 | Inferred 5 | Inferred 6 | Missed events | Total events |\n| 1 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 22 |\n| 2 | 0 | 36 | 1 | 0 | 0 | 0 | 0 | 37 |\n| 3 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 20 |\n| 4 | 1 | 0 | 0 | 116 | 1 | 0 | 3 | 121 |\n| 5 | 1 | 0 | 0 | 1 | 61 | 0 | 19 | 82 |\n| 6 | 0 | 0 | 0 | 3 | 2 | 243 | 160 | 408 |\n\nTables 2 and 3: Each matrix component indicates the number of times true model i was classified as inferred model j. Events were missed if the true spikes were not detected in an overlap sequence or if all sample values for the spike fell below the event detection threshold (4σ_η). 
There was 1 false positive for M_5 and 7 for M_6.\n\n7 DISCUSSION\n\nFormulating the task as having to infer a probabilistic model made clear what was necessary to obtain accurate spike models. The soft clustering procedure accurately determines the spike shapes even when the true underlying shapes are similar. Unless the spike shapes are well-separated, commonly used hard clustering procedures will lead to inaccurate estimates.\n\nProbability theory also allowed for an objective means of determining the number of spike models, which is an essential reason for the success of this algorithm. With the wrong number of spike models, overlap decomposition becomes especially difficult. The evidence has proved to be a sensitive indicator of when two classes are distinct.\n\nProbability theory is also essential to accurate overlap decomposition. Simply fitting data with compositions of spike models leads to the same overfitting problem encountered in determining the number of spike models and in determining the spike shapes. Previous approaches have been able to handle only a limited class of overlaps, mainly due to the difficulty in making the fit efficient. The algorithm used here can fit an overlap sequence of virtually arbitrary complexity in milliseconds.\n\nIn practice, the algorithm extracts many times more information from a neural waveform than previous methods. Moreover, this information is qualitatively different from a simple list of spike times. Having reliable estimates of the action potential shapes makes it possible to study the properties of these classes, since distinct neuronal types can have distinct neuronal spikes. 
Finally, accurate overlap decomposition makes it possible to investigate interactions among local neurons which were previously very difficult to observe.\n\nAcknowledgements\n\nI thank David MacKay for helpful discussions and Jamie Mazer for many conversations and extensive help with the development of the software. This work was supported by Caltech fellowships and an NIH Research Training Grant.\n\nReferences\n\nA.F. Atiya. (1992) Recognition of multiunit neural signals. IEEE Transactions on Biomedical Engineering 39(7):723-729.\n\nJ.H. Friedman, J.L. Bentley, and R.A. Finkel. (1977) An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Software 3(3):209-226.\n\nD.J.C. MacKay. (1992) Bayesian interpolation. Neural Computation 4(3):415-445.", "award": [], "sourceid": 777, "authors": [{"given_name": "Michael", "family_name": "Lewicki", "institution": null}]}