{"title": "Predicting Speech Intelligibility from a Population of Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1409, "page_last": 1416, "abstract": "", "full_text": "Predicting Speech Intelligibility from a Population of Neurons \n\nJeff Bondy \nDept. of Electrical Engineering \nMcMaster University \nHamilton, ON \njeff@soma.crl.mcmaster.ca \n\nIan C. Bruce \nDept. of Electrical Engineering \nMcMaster University \nHamilton, ON \nibruce@ieee.org \n\nSuzanna Becker \nDept. of Psychology \nMcMaster University \nbecker@mcmaster.ca \n\nSimon Haykin \nDept. of Electrical Engineering \nMcMaster University \nhaykin@mcmaster.ca \n\nAbstract \n\nA major issue in evaluating speech enhancement and hearing compensation algorithms is to come up with a suitable metric that predicts intelligibility as judged by a human listener. Previous methods such as the widely used Speech Transmission Index (STI) fail to account for masking effects that arise from the highly nonlinear cochlear transfer function. We therefore propose a Neural Articulation Index (NAI) that estimates speech intelligibility from the instantaneous neural spike rate over time, produced when a signal is processed by an auditory neural model. By using a well-developed model of the auditory periphery and detection theory we show that human perceptual discrimination closely matches the modeled distortion in the instantaneous spike rates of the auditory nerve. In highly rippled frequency transfer conditions the NAI\u2019s prediction error is 8% versus the STI\u2019s prediction error of 10.8%. \n\n1  Introduction \n\nA wide range of intelligibility measures in current use rest on the assumption that the intelligibility of a speech signal is the sum of the contributions of intelligibility within individual frequency bands, as first proposed by French and Steinberg [1]. This basic method applies a function of the Signal-to-Noise Ratio (SNR) in a set of bands, then averages across these bands to come up with a prediction of intelligibility. French and Steinberg\u2019s original Articulation Index (AI) is based on 20 equally contributing bands, and produces an intelligibility score between zero and one: \n\n$$\\mathrm{AI} = \\frac{1}{20} \\sum_{i=1}^{20} \\mathrm{TI}_i , \\qquad (1)$$ \n\nwhere TIi (Transmission Index i) is the normalized intelligibility in the ith band. The TI per band is a function of the signal-to-noise ratio: \n\n$$\\mathrm{TI}_i = \\frac{\\mathrm{SNR}_i + 12}{30} \\qquad (2)$$ \n\nfor SNRs between \u201312 dB and 18 dB. An SNR greater than 18 dB means that the band has perfect intelligibility and its TI equals 1, while an SNR under \u201312 dB means that the band is not contributing at all and its TI equals 0. The overall intelligibility is then a function of the AI, but this function changes depending on the semantic context of the signal. \n\nKryter validated many of the underlying AI principles [2]. Kryter also presented the mechanics for calculating the AI for different numbers of bands (5, 6, 15, or the original 20) as well as important correction factors [3]. Some of the most important correction factors account for the effects of modulated noise, peak clipping, and reverberation. Even with the application of various correction factors, the AI does not predict intelligibility in the presence of some time-domain distortions. 
Consequently, the Modulation Transfer Function (MTF) has been utilized to measure the loss of intelligibility due to echoes and reverberation [4]. Steeneken and Houtgast later extended this approach to include nonlinear distortions, giving a new name to the predictor: the Speech Transmission Index (STI) [5]. These metrics proved more valid for a larger range of environments and interferences. \n\nThe STI test signal is a Gaussian random signal with a long-term average speech spectrum, amplitude modulated by a 0.63 Hz to 12.5 Hz tone. Acoustic components within different frequency bands are switched on and off over the testing sequence to come up with an intelligibility score between zero and one. Interband intermodulation sources can be discerned, as long as the product does not fall into the testing band. Therefore, the STI allows for standard AI-frequency band weighted SNR effects, MTF-time domain effects, and some limited measurements of nonlinearities. The STI shows a high correlation with empirical tests, and has been codified as an ANSI standard [6]. For general acoustics it is very good. However, the STI does not accurately model intraband masker non-linearities, phase distortions or the underlying auditory mechanisms (outside of independent frequency bands). \n\nWe therefore sought to extend the AI/STI concepts to predict intelligibility, on the assumption that the closest physical variable we have to the perceptual variable of intelligibility is the auditory nerve response. Using a spiking model of the auditory periphery [7] we form the Neural Articulation Index (NAI) by describing distortions in the spike trains of different frequency bands. The spiking over time of an auditory nerve fiber for an undistorted speech signal (control case) is compared to the neural spiking over time for the same signal after undergoing some distortion (test case). The difference in the estimated instantaneous discharge rate for the two cases is used to calculate a neural equivalent to the TI, the Neural Distortion (ND), for each frequency band. Then the NAI is calculated with a weighted average of NDs at different Best Frequencies (BFs). In general detection theory terms, the control neuronal response sets some locus in a high-dimensional space; the distorted neuronal response will then project near that locus if it is perceptually equivalent, or very far away if it is not. Thus, the distance between the control neuronal response and the distorted neuronal response is a function of intelligibility. Given the limitations of the STI mentioned above, we predict that a measure of the neural coding error will be a better predictor than SNR for human intelligibility word-scores. Our method also has the potential to shed light on the underlying neurobiological mechanisms. \n\n2  Method \n\n2.1  Model \n\nThe auditory periphery model used throughout (and hereafter referred to as the Auditory Model) is from [7]. The system is shown in Figure 1. \n\nFigure 1  Block diagram of the computational model of the auditory periphery from the middle ear to the Auditory Nerve. Reprinted from Fig. 1 of [7] with permission from the Acoustical Society of America \u00a9 (2003). \n\nThe auditory periphery model comprises several sections, each providing a phenomenological description of a different part of the cat auditory periphery function. \n\nThe first section models middle ear filtering. 
The second section, labeled the \u201ccontrol path,\u201d captures the Outer Hair Cell (OHC) modulatory function, and includes a wideband, nonlinear, time-varying, band-pass filter followed by an OHC nonlinearity (NL) and low-pass (LP) filter. This section controls the time-varying, nonlinear behavior of the narrowband signal-path basilar membrane (BM) filter. The control-path filter has a wider bandwidth than the signal-path filter to account for wideband nonlinear phenomena such as two-tone rate suppression. \n\nThe third section of the model, labeled the \u201csignal path,\u201d describes the filter properties and traveling wave delay of the BM (time-varying, narrowband filter); the nonlinear transduction and low-pass filtering of the Inner Hair Cell (IHC NL and LP); spontaneous and driven activity and adaptation in synaptic transmission (synapse model); and spike generation and refractoriness in the auditory nerve (AN). In this model, C_IHC and C_OHC are scaling constants that control IHC and OHC status, respectively. \n\nThe parameters of the synapse section of the model are set to produce adaptation and discharge-rate versus level behavior appropriate for a high-spontaneous-rate/low-threshold auditory nerve fiber. In order to avoid having to generate many spike trains to obtain a reliable estimate of the instantaneous discharge rate over time, we instead use the synaptic release rate as an approximation of the discharge rate, ignoring the effects of neural refractoriness. \n\n2.2  Neural articulation index \n\nThe simulations here emulate most of those described in Chapter 2 of Steeneken\u2019s thesis [8], as it describes the full development of an STI metric from inception to end. 
For those interested, the following simulations try to map most of the second chapter, but instead of basing the distortion metric on an SNR calculation, we use the neural distortion. \n\nThere are two sets of experiments. The first, in section 3.1, deals with applying a frequency weighting structure to combine the band distortion values, while section 3.2 also introduces redundancy factors. The bands, chosen to match [8], are octave bands centered at [125, 250, 500, 1000, 2000, 4000, 8000] Hz. Only seven bands are used here. The Neural AI (NAI) for this is: \n\n$$\\mathrm{NAI} = \\alpha_1 \\cdot \\mathrm{NTI}_1 + \\alpha_2 \\cdot \\mathrm{NTI}_2 + \\dots + \\alpha_7 \\cdot \\mathrm{NTI}_7 , \\qquad (3)$$ \n\nwhere \u03b1i is the ith band\u2019s contribution and NTIi is the Neural Transmission Index in the ith band. Here all the \u03b1s sum to one, so each \u03b1 factor can be thought of as the percentage contribution of a band to intelligibility. Since NTI lies in [0,1], it can also be thought of as the percentage of acoustic features that are intelligible in a particular band. The ND per band is the projection of the distorted (Test) instantaneous spike rate against the clean (Control) instantaneous spike rate: \n\n$$\\mathrm{ND} = 1 - \\frac{\\mathrm{Test} \\cdot \\mathrm{Control}^{T}}{\\mathrm{Control} \\cdot \\mathrm{Control}^{T}} , \\qquad (4)$$ \n\nwhere Control and Test are vectors of the instantaneous spike rate over time, sampled at 22050 Hz. This type of error metric can only deal with steady-state channel distortions, such as the ones used in [8]. ND was then linearly fit to resemble the TI of equations (1) and (2), after normalizing the ND in each of the seven bands to zero mean and unit standard deviation. The NTI in the 
The NTI in the \nith band was calculated as \n\nNTI\ni\n\n=\n\nm\n\n\u00b5\ni\n\n\u2212\nND\ni\n\u03c3\ni\n\n+\n\n.b\n\n \n\n(5) \n\nNTIi is then thresholded to be no less then 0 and no greater then 1, following the TI \nthresholding. In equation (5) the factors, m = 2.5, b = -1, were the best linear fit to \nproduce NTIi\u2019s in bands with SNR greater then 15 dB of 1, bands with 7.5 dB SNR \nproduce  NTIi\u2019s  of  0.75,  and  bands  with  0  dB  SNR  produced  NTIi\u2019s  of  0.5.  This \nclosely  followed  the  procedure  outlined  in  section  2.3.3  of  [8].  As  the  TI  is  a  best \nlinear fit of SNR to intelligibility, the NTI is a best linear fit of neural distortion to \nintelligibility. \nThe  input  stimuli  were  taken  from  a  Dutch  corpus  [9],  and  consisted  of  10 \nConsonant-Vowel-Consonant  (CVC)  words,  each  spoken  by  four  males  and  four \nfemales  and  sampled  at  44100  Hz.  The  Steeneken  study  had  many  more,  but  the \nexact  corpus  could  not  be  found.  80  total  words  is  enough  to  produce  meaningful \nfrequency weighting factors. There were 26 frequency channel distortion conditions \nused for male speakers, 17 for female and three SNRs (+15 dB, +7.5 dB and 0 dB). \nThe channel conditions were split into four groups given in Tables 1 through 4 for \nmales, since females have negligible signal in the 125 Hz band, they used a subset, \nmarked with an asterisk in Table 1 through Table 4. 
\n\nTable 1: Rippled Envelope \n\nOCTAVE-BAND CENTRE FREQUENCY (Hz) \nID #   125   250   500   1K    2K    4K    8K \n1*     1     1     1     1     0     0     0 \n2*     0     0     0     0     1     1     1 \n3*     1     1     0     0     0     1     1 \n4*     0     0     1     1     1     0     0 \n5*     1     1     0     0     1     1     0 \n6*     0     0     1     1     0     0     1 \n7*     1     0     1     0     1     0     1 \n8*     0     1     0     1     0     1     0 \n\nTable 2: Adjacent Triplets \n\nOCTAVE-BAND CENTRE FREQUENCY (Hz) \nID #   125   250   500   1K    2K    4K    8K \n9      1     1     1     0     0     0     0 \n10     0     1     1     1     0     0     0 \n11*    0     0     0     1     1     1     0 \n\nTable 3: Isolated Triplets \n\nOCTAVE-BAND CENTRE FREQUENCY (Hz) \nID #   125   250   500   1K    2K    4K    8K \n12     1     0     1     0     1     0     0 \n13     1     0     1     0     0     1     0 \n14     1     0     0     1     0     1     0 \n15*    0     1     0     1     0     0     1 \n16*    0     1     0     0     1     0     1 \n17     0     0     1     0     1     0     1 \n\nTable 4: Contiguous Bands \n\nOCTAVE-BAND CENTRE FREQUENCY (Hz) \nID #   125   250   500   1K    2K    4K    8K \n18*    0     1     1     1     1     0     0 \n19*    0     0     1     1     1     1     0 \n20*    0     0     0     1     1     1     1 \n21     1     1     1     1     1     0     0 \n22*    0     1     1     1     1     1     0 \n23*    0     0     1     1     1     1     1 \n24     1     1     1     1     1     1     0 \n25     0     1     1     1     1     1     1 \n26*    1     1     1     1     1     1     1 \n\nIn the above tables a one represents a passband and a zero a stop band. A 1353-tap FIR filter was designed for each envelope condition. The female envelopes are a subset of these because female speech has no appreciable energy in the 125 Hz octave band. Using the 40 male utterances and 40 female utterances under distortion and calculating the NAI following equation (3) produces only a value in [0,1]. 
To produce a word-score intelligibility prediction between zero and 100 percent, the NAI value was fit to a third-order polynomial that produced the lowest standard deviation of error from empirical data. While Fletcher and Galt [10] state that the relation between AI and intelligibility is exponential, [8] fits with a third-order polynomial, and we have chosen to compare to [8]. The empirical word-score intelligibility was from [8]. \n\n3  Results \n\n3.1  Determining frequency weighting structure \n\nFor the first tests, the optimal frequency weights (the values of \u03b1i from equation 3) were designed through minimizing the difference between the predicted intelligibility and the empirical intelligibility. At each iteration one of the values was dithered up or down, and then the sum of the \u03b1i was normalized to one. This is very similar to [5], whose final standard deviation of prediction error was 12.8% for males and 8.8% for females. The NAI\u2019s final standard deviation of prediction error was 8.9% for males and 7.1% for females. \n\nFigure 2  Relation between NAI and empirical word-score intelligibility for male (left) and female (right) speech with bandpass limiting and noise. The vertical spread from the best fitting polynomial for males has a s.d. = 8.9% versus the STI [5] s.d. = 12.8%; for females the fit has a s.d. = 7.1% versus the STI [5] s.d. = 8.8%. \n\nThe frequency weighting factors are similar for the NAI and the STI. The STI weighting factors from [8], which produced the optimal prediction of empirical data (male s.d. = 6.8%, female s.d. = 6.0%), and those of the NAI are plotted in Figure 3. 
\n\nFigure 3  Frequency weighting factors for the optimal predictor of male and female intelligibility calculated with the NAI and published by Steeneken [8]. \n\nAs one can see, the low frequency information is strongly suppressed in the NAI, while the high frequencies are emphasized. This may be an effect of the stimulus corpus. The corpus has a high percentage of stops and fricatives in the initial and final consonant positions. Since these have a comparatively large amount of high frequency signal they may explain this discrepancy at the cost of the low frequency weights. [8] does state that these frequency weights are dependent upon the conditions used for evaluation. \n\n3.2  Determining frequency weighting with redundancy factors \n\nIn experiment two, rather than using equation (3), which assumes each frequency band contributes independently, we introduce redundancy factors. There is correlation between the different frequency bands of speech [11], which tends to make the STI over-predict intelligibility. The redundancy factors attempt to remove correlated signals between bands. Equation (3) then becomes: \n\n$$\\mathrm{NAI}_r = \\alpha_1 \\cdot \\mathrm{NTI}_1 - \\beta_1 \\cdot \\mathrm{NTI}_1 \\, \\mathrm{NTI}_2 + \\alpha_2 \\cdot \\mathrm{NTI}_2 - \\beta_2 \\cdot \\mathrm{NTI}_2 \\, \\mathrm{NTI}_3 + \\dots + \\alpha_7 \\cdot \\mathrm{NTI}_7 , \\qquad (6)$$ \n\nwhere the r subscript denotes a redundant NAI and \u03b2 is the correlation factor. Only adjacent bands are used here to reduce complexity. We replicated Section 3.1 except using equation (6). The same testing and adaptation strategy from Section 3.1 was used to find the optimal \u03b1s and \u03b2s. \n\nFigure 4  Relation between NAIr and empirical word-score intelligibility for male speech (right) and female speech (left) with bandpass limiting and noise with Redundancy Factors. The vertical spread from the best fitting polynomial for males has a s.d. = 6.9% versus the STIr [8] s.d. = 4.7%; for females the best fitting polynomial has a s.d. = 5.4% versus the STIr [8] s.d. = 4.0%. \n\nThe frequency weighting and redundancy factors given as optimal in Steeneken [8], versus those calculated through optimizing the NAIr, are given in Figure 5. \n\nFigure 5  Frequency and redundancy factors for the optimal predictor of male and female intelligibility calculated with the NAIr and published in [8]. \n\nThe frequency weights for the NAIr and STIr are more similar than in Section 3.1. The redundancy factors are very different though. The NAI redundancy factors show no real frequency dependence, unlike the convex STI redundancy factors. This may be due to differences in optimization that were not clear in [8]. \n\nTable 5: Standard Deviation of Prediction Error \n\n            MALE EQ. 3   FEMALE EQ. 3   MALE EQ. 6   FEMALE EQ. 6 \nNAI         8.9 %        7.1 %          6.9 %        5.4 % \nSTI [5]     12.8 %       8.8 %          -            - \nSTI [8]     6.8 %        6.0 %          4.7 %        4.0 % \n\nThe mean difference in error between the STIr, as given in [8], and the NAIr is 1.7%. This difference may be from the limited CVC word choice. It is well within the range of normal speaker variation, about 2%, so we believe that the NAI and NAIr are comparable to the STI and STIr in predicting speech intelligibility. 
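\n\nAs a check on the arithmetic, one reading that reproduces the 1.7% figure averages all four NAI-minus-STI [8] differences in Table 5. This is an assumption: the text does not state how the mean was taken. \n\n$$\\frac{(8.9 - 6.8) + (7.1 - 6.0) + (6.9 - 4.7) + (5.4 - 4.0)}{4} = \\frac{2.1 + 1.1 + 2.2 + 1.4}{4} = \\frac{6.8}{4} = 1.7$$ \n\nAveraging only the two equation (6) columns would instead give (2.2 + 1.4) / 2 = 1.8, which is why the four-column reading is used in this sketch. 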
\n\n4  Conclusions \n\nThese results are very encouraging. The NAI provides a modest improvement over STI in predicting intelligibility. We do not propose this as a replacement for the STI for general acoustics since the NAI is much more computationally complex than the STI. The NAI\u2019s end applications are in predicting hearing impairment intelligibility and using statistical decision theory to describe the auditory system\u2019s feature extractors, tasks which the STI cannot do but which are available to the NAI. \n\nWhile the AI and STI can take into account threshold shifts in a hearing impaired individual, neither can account for sensorineural, suprathreshold degradations [12]. The accuracy of this model, based on cat anatomy and physiology, in predicting human speech intelligibility provides strong validation of attempts to design hearing aid amplification schemes based on physiological data and models [13]. By quantifying the hearing impairment in an intelligibility metric by way of a damaged auditory model one can provide a more accurate assessment of the distortion, probe how the distortion is changing the neuronal response and provide feedback for preprocessing via a hearing aid before the impairment. The NAI may also give insight into how the ear codes stimuli for the very robust human auditory system. \n\nReferences \n\n[1] French, N.R. & Steinberg, J.C. (1947) Factors governing the intelligibility of speech sounds. J. Acoust. Soc. Am. 19:90-119. \n[2] Kryter, K.D. (1962) Validation of the articulation index. J. Acoust. Soc. Am. 34:1698-1702. \n[3] Kryter, K.D. (1962b) Methods for the calculation and use of the articulation index. J. Acoust. Soc. Am. 34:1689-1697. \n[4] Houtgast, T. & Steeneken, H.J.M. 
(1973) The modulation transfer function in room acoustics as a predictor of speech intelligibility. Acustica 28:66-73. \n[5] Steeneken, H.J.M. & Houtgast, T. (1980) A physical method for measuring speech-transmission quality. J. Acoust. Soc. Am. 67(1):318-326. \n[6] ANSI (1997) ANSI S3.5-1997 Methods for calculation of the speech intelligibility index. American National Standards Institute, New York. \n[7] Bruce, I.C., Sachs, M.B. & Young, E.D. (2003) An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses. J. Acoust. Soc. Am. 113(1):369-388. \n[8] Steeneken, H.J.M. (1992) On measuring and predicting speech intelligibility. Ph.D. Dissertation, University of Amsterdam. \n[9] van Son, R.J.J.H., Binnenpoorte, D., van den Heuvel, H. & Pols, L.C.W. (2001) The IFA corpus: a phonemically segmented Dutch \u201copen source\u201d speech database. Eurospeech 2001 Poster. http://145.18.230.99/corpus/index.html \n[10] Fletcher, H. & Galt, R.H. (1950) The perception of speech and its relation to telephony. J. Acoust. Soc. Am. 22:89-151. \n[11] Houtgast, T. & Verhave, J. (1991) A physical approach to speech quality assessment: correlation patterns in the speech spectrogram. Proc. Eurospeech 1991, Genova:285-288. \n[12] van Schijndel, N.H., Houtgast, T. & Festen, J.M. (2001) Effects of degradation of intensity, time, or frequency content on speech intelligibility for normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am. 110(1):529-542. \n[13] Sachs, M.B., Bruce, I.C., Miller, R.L. & Young, E.D. (2002) Biological basis of hearing-aid design. Ann. Biomed. Eng. 30:157\u2013168. ", "award": [], "sourceid": 2346, "authors": [{"given_name": "Jeff", "family_name": "Bondy", "institution": null}, {"given_name": "Ian", "family_name": "Bruce", "institution": null}, {"given_name": "Suzanna", "family_name": "Becker", "institution": null}, {"given_name": "Simon", "family_name": "Haykin", "institution": null}]}