{"title": "Basis Selection for Wavelet Regression", "book": "Advances in Neural Information Processing Systems", "page_first": 627, "page_last": 633, "abstract": null, "full_text": "Basis  Selection For  Wavelet  Regression \n\nKevin R.  Wheeler \n\nCaelum  Research  Corporation \nNASA  Ames  Research  Center \n\nMail  Stop  269-1 \n\nMoffett  Field,  CA  94035 \n\nkwheeler@mail.arc.nasa.gov \n\nAtam P.  Dhawan \n\nCollege of Engineering \nUniversity of Toledo \n\n2801  W.  Bancroft Street \n\nToledo,  OH  43606 \n\nadhawan@eng.utoledo.edu \n\nAbstract \n\nA  wavelet  basis  selection  procedure  is  presented  for  wavelet  re(cid:173)\ngression.  Both  the  basis  and  threshold  are  selected  using  cross(cid:173)\nvalidation.  The  method  includes  the  capability  of  incorporating \nprior knowledge on the smoothness (or shape of the basis functions) \ninto  the  basis  selection  procedure.  The  results  of the  method  are \ndemonstrated  using  widely  published  sampled  functions.  The  re(cid:173)\nsults of the method are contrasted with other basis function  based \nmethods. \n\n1 \n\nINTRODUCTION \n\nWavelet  regression  is  a  technique  which  attempts  to  reduce  noise  in  a  sampled \nfunction  corrupted with  noise.  This  is  done  by  thresholding  the small  wavelet  de(cid:173)\ncomposition coefficients which represent mostly noise.  Most of the papers published \non  wavelet  regression  have  concentrated  on  the  threshold  selection  process.  This \npaper  focuses  on  the  effect  that  different  wavelet  bases  have  on  cross-validation \nbased threshold selection, and the error in  the final  result.  This paper also suggests \nhow prior information may be incorporated into the basis selection process, and the \neffects  of choosing  a  wrong prior.  Both orthogonal and biorthogonal wavelet  bases \nwere explored. \n\nWavelet regression is  performed in  three steps.  
The first step is to apply a discrete wavelet transform to the sampled data to produce decomposition coefficients. Next, a threshold is applied to the coefficients. Then an inverse discrete wavelet transform is applied to these modified coefficients. \n\n\f628 \n\nK. R. Wheeler and A. P. Dhawan \n\nThe basis selection procedure is demonstrated to perform better than other wavelet regression methods even when the wrong prior on the space of basis selections is specified. \n\nThis paper is organized into the following sections. The background section gives a brief summary of the mathematical requirements of the discrete wavelet transform. It is followed by a methodology section which outlines the basis selection algorithms and the process for obtaining the presented results. This is followed by a results section and then a conclusion. \n\n2 BACKGROUND \n\n2.1 DISCRETE WAVELET TRANSFORM \n\nThe Discrete Wavelet Transform (DWT) [Daubechies, 92] is implemented as a series of projections onto scaling functions in L²(R). The initial assumption is that the original data samples lie in the finest space V_0, which is spanned by the scaling function φ ∈ V_0 such that the collection {φ(x - t) | t ∈ Z} is a Riesz basis of V_0. The first level of the dyadic decomposition then consists of projecting the data samples onto scaling functions which have been dilated to be twice as wide as the original φ. These span the coarser space V_{-1}: {φ(x/2 - t) | t ∈ Z}. The information that is lost going from the finer to the coarser scale is retained in what is known as the wavelet coefficients. 
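The finer-to-coarser step just described can be made concrete with the simplest case, the Haar filter, where the coarse (scaling) coefficients are pairwise averages and the retained information is the pairwise differences. This is a minimal sketch for illustration, not the paper's implementation; the function names are ours.

```python
# One level of a Haar discrete wavelet transform: the scaling (approximation)
# coefficients are pairwise averages, and the wavelet (detail) coefficients
# hold exactly the information lost by the averaging.

def haar_level(samples):
    """Split samples (even length) into scaling and wavelet coefficients."""
    n = len(samples) // 2
    scaling = [(samples[2 * k] + samples[2 * k + 1]) / 2 for k in range(n)]
    wavelet = [(samples[2 * k] - samples[2 * k + 1]) / 2 for k in range(n)]
    return scaling, wavelet

def haar_inverse(scaling, wavelet):
    """Exactly reconstruct the original samples from the two coefficient sets."""
    out = []
    for s, d in zip(scaling, wavelet):
        out.extend([s + d, s - d])
    return out

data = [4.0, 2.0, 5.0, 7.0]
s, d = haar_level(data)
assert haar_inverse(s, d) == data  # no information is lost
```

Iterating `haar_level` on the scaling coefficients produces the dyadic pyramid of decomposition levels described above.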
Instead of taking the difference, the wavelet coefficients can be obtained via a projection onto the wavelet basis functions ψ, which span a space known as W_0. The projections are typically implemented using Quadrature Mirror Filters (QMF), which are implemented as Finite Impulse Response (FIR) filters. The next level of decomposition is obtained by again doubling the scaling functions and projecting the first scaling decomposition coefficients onto these functions. The difference in information between this level and the last one is contained in the wavelet coefficients for this level. In general, the scaling function for level j and translation m may be represented by: φ_{j,m}(t) = 2^{-j/2} φ(2^{-j} t - m), where t ∈ [0, 2^k - 1], k ≥ 1, 1 ≤ j ≤ k, 0 ≤ m ≤ 2^{k-j} - 1. \n\n2.1.1 Orthogonal \n\nAn orthogonal wavelet decomposition is defined such that the difference space W_j is the orthogonal complement of V_j in V_{j+1}: W_0 ⊥ V_0, which means that the projection of the wavelet functions onto the scaling functions on a level is zero: ⟨ψ, φ(· - t)⟩ = 0, t ∈ Z. \n\nThis results in the wavelet spaces W_j with j ∈ Z being all mutually orthogonal. The refinement relations for an orthogonal decomposition may be written as: φ(x) = 2 Σ_k h_k φ(2x - k) and ψ(x) = 2 Σ_k g_k φ(2x - k). \n\n2.1.2 Biorthogonal \n\nSymmetry is an important property when the scaling functions are used as interpolatory functions. Most commonly used interpolatory functions are symmetric. It is well known in the subband filtering community that symmetry and exact reconstruction are incompatible if the same FIR filters are used for reconstruction and decomposition (except for the Haar filter) [Daubechies, 92]. 
\fBasis Selection for Wavelet Regression \n\n629 \n\nIf we are willing to use different filters for the analysis and synthesis banks, then symmetry and exact reconstruction are possible using biorthogonal wavelets. Biorthogonal wavelets have a dual scaling function φ̃ and a dual wavelet function ψ̃. These generate a dual multiresolution analysis with subspaces Ṽ_j and W̃_j so that Ṽ_j ⊥ W_j and V_j ⊥ W̃_j, and the orthogonality conditions can now be written as: \n\n⟨φ, ψ̃(· - l)⟩ = ⟨ψ, φ̃(· - l)⟩ = 0 \n\n⟨φ_{j,l}, φ̃_{k,m}⟩ = δ_{j-k} δ_{l-m} for l, m, j, k ∈ Z \n\n⟨ψ_{j,l}, ψ̃_{k,m}⟩ = δ_{j-k} δ_{l-m} for l, m, j, k ∈ Z \n\nwhere δ_{j-k} = 1 when j = k, and zero otherwise. \n\nThe refinement relations for biorthogonal wavelets can be written: \n\nφ(x) = 2 Σ_k h_k φ(2x - k) and ψ(x) = 2 Σ_k g_k φ(2x - k) \n\nφ̃(x) = 2 Σ_k h̃_k φ̃(2x - k) and ψ̃(x) = 2 Σ_k g̃_k φ̃(2x - k) \n\nBasically, this means that the scaling functions at one level are composed of linear combinations of scaling functions at the next finer level. The wavelet functions at one level are also composed of linear combinations of the scaling functions at the next finer level. \n\n2.2 LIFTING AND SECOND GENERATION WAVELETS \n\nSweldens' lifting scheme [Sweldens, 95a] is a way to transform a biorthogonal wavelet decomposition obtained from low-order filters into one that could be obtained from higher-order filters (more FIR filter coefficients), without applying the longer filters, thus saving computations. This method can be used to increase the number of vanishing moments of the wavelet, or to change the shape of the wavelet. This means that several different filters (i.e., sets of basis functions) with properties relevant to the problem domain may be applied more efficiently than by applying each filter directly. 
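The core lifting mechanics can be sketched as a split/predict pair: split the samples into even and odd, predict each odd sample from its even neighbours using the lifting parameters, and keep the prediction error as the detail coefficient. This is an illustrative toy with a two-tap predictor and names of our own choosing, not Sweldens' implementation.

```python
# Lifting sketch: any choice of lifting parameters a gives an invertible
# transform, because the prediction step can always be undone by adding the
# prediction back. With a = (0.5, 0.5) a linear signal is predicted exactly,
# so its interior detail coefficients vanish (vanishing-moment behaviour).

def lift_forward(samples, a=(0.5, 0.5)):
    even, odd = samples[0::2], samples[1::2]
    detail = []
    for k in range(len(odd)):
        left = even[k]
        right = even[k + 1] if k + 1 < len(even) else even[k]  # repeat at boundary
        detail.append(odd[k] - (a[0] * left + a[1] * right))
    return even, detail

def lift_inverse(even, detail, a=(0.5, 0.5)):
    samples = []
    for k in range(len(detail)):
        left = even[k]
        right = even[k + 1] if k + 1 < len(even) else even[k]
        samples.extend([even[k], detail[k] + (a[0] * left + a[1] * right)])
    return samples

line = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
even, detail = lift_forward(line)
assert detail[:-1] == [0.0, 0.0]           # interior details vanish on a line
assert lift_inverse(even, detail) == line  # perfect reconstruction
```

Changing the lifting parameters changes the shape (and vanishing moments) of the resulting wavelet while invertibility is preserved by construction, which is what makes a search over lifting parameters cheap.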
This is beneficial for performing a search over the space of admissible basis functions meeting the problem domain requirements. \n\nSweldens' Second Generation Wavelets [Sweldens, 95b] are the result of applying lifting to simple interpolating biorthogonal wavelets, and redefining the refinement relation of the dual wavelet to be: \n\nψ̃(x) = φ(2x - 1) - Σ_k a_k φ(x - k) \n\nwhere the a_k are the lifting parameters. The lifting parameters may be selected to achieve desired properties in the basis functions relevant to the problem domain. \n\nPrior information for a particular application domain may now be incorporated into the basis selection for wavelet regression. For example, if a particular application requires a certain degree of smoothness (or a certain number of vanishing moments in the basis), then only those lifting parameters which result in a number of vanishing moments within this range are used. Another way to think about this is to form a probability distribution over the space of lifting parameters. The most likely lifting parameters will be those which most closely match one's intuition for the given problem domain. \n\n2.3 THRESHOLD SELECTION \n\nSince the wavelet transform is a linear operator, the decomposition coefficients will have the same form of noise as the sampled data. The idea behind wavelet regression is that the decomposition coefficients that have a small magnitude are substantially representative of the noise component of the sampled data. A threshold is selected and then all coefficients which are below the threshold in magnitude are either set to zero (a hard threshold) or moved towards zero (a soft threshold). 
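These two thresholding rules can be sketched directly. This is a minimal illustration; the function names are ours, not from the paper.

```python
# Hard and soft thresholding of wavelet coefficients. Coefficients whose
# magnitude falls below t are assumed to be mostly noise.

def hard_threshold(coeffs, t):
    """Set small-magnitude coefficients to zero, keep the rest unchanged."""
    return [c if abs(c) > t else 0.0 for c in coeffs]

def soft_threshold(coeffs, t):
    """Shrink every coefficient towards zero by t, clipping at zero."""
    return [(1 if c > 0 else -1) * max(abs(c) - t, 0.0) for c in coeffs]

coeffs = [3.0, -0.2, 0.5, -2.5]
assert hard_threshold(coeffs, 1.0) == [3.0, 0.0, 0.0, -2.5]
assert soft_threshold(coeffs, 1.0) == [2.0, 0.0, 0.0, -1.5]
```

Note that the soft rule also shrinks the large coefficients by t, trading a small bias for continuity of the estimator.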
The soft threshold η_t(Y) = sgn(Y)(|Y| - t)_+ is used in this study. \n\nThere are two basic methods of threshold selection: 1. Donoho's [Donoho, 95] analytic method, which relies on knowledge of the noise distribution (such as a Gaussian noise source with a certain variance); 2. a cross-validation approach (many of which are reviewed in [Nason, 96]). It is beyond the scope of this paper to review these methods. Leave-one-out cross-validation with padding was used in this study. \n\n3 METHODOLOGY \n\nThe test functions used in this study are the four functions published by Donoho and Johnstone [Donoho and Johnstone, 94]. These functions have been adopted by the wavelet regression community to aid in the comparison of algorithms across publications. \n\nEach function was uniformly sampled to contain 2048 points. Gaussian white noise was added so that the signal-to-noise ratio (SNR) was 7.0. Fifty replicates of each noisy function were created, of which four instantiations are depicted in Figure 1. \n\nThe noise removal process involved three steps. The first step was to perform a discrete wavelet transform using a particular basis. A threshold was selected for the resulting decomposition coefficients using leave-one-out cross-validation with padding. The soft threshold was then applied to the decomposition. Next, the inverse wavelet transform was applied to obtain a cleaner version of the original signal. These steps were repeated for each basis set or for each set of lifting parameters. \n\n3.1 WAVELET BASIS SELECTION \n\nTo demonstrate the effect of basis selection on the threshold found and the error in the resulting recovered signal, the following experiments were conducted. 
In the first trial, two well-studied orthogonal wavelet families were used: Daubechies most compactly supported (DMCS), and Symlets (S) [Daubechies, 92]. For the DMCS family, filters of order 1 (which corresponds to the Haar wavelet) through 7 were used. For the Symlets, filters of order 2 through 8 were used. For each filter, leave-one-out cross-validation was used to find a threshold which minimized the mean square error for each of the 50 replicates of the four test functions. The median threshold found was then applied to the decomposition of each of the replicates for each test function. The resulting reconstructed signals are compared to the ideal function (the original before noise was added) and the Normalized Root Mean Square Error (NRMSE) is presented. \n\n3.2 INCORPORATING PRIOR INFORMATION: LIFTING PARAMETERS \n\nIf the function that we are sampling is known to have certain smoothness properties, then a distribution of the admissible lifting coefficients representing a similar smoothness characteristic can be formed. However, it is not necessary to cautiously pick a prior. The performance of this method with a piecewise linear prior (the (2,2) biorthogonal wavelet of Cohen-Daubechies-Feauveau [Cohen, 92]) has been evaluated on the non-linear smooth test functions Bumps, Doppler, and Heavysin. This method has been compared with several standard techniques [Wheeler, 96]: the Smoothing Spline method (SS) [Wahba, 90], Donoho's SureShrink method [Donoho, 95], and an optimized Radial Basis Function Neural Network (RBFNN). 
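The transform-threshold-invert-score loop of the methodology can be sketched end to end. This is an illustrative simplification: the Haar filter stands in for the paper's basis library, and a grid search over thresholds scored against the known clean signal replaces leave-one-out cross-validation with padding.

```python
# Denoising loop sketch: one-level Haar transform, soft threshold on the
# detail band, inverse transform, and threshold selection by grid search.
import random

def haar_forward(x):
    s = [(x[2 * k] + x[2 * k + 1]) / 2 for k in range(len(x) // 2)]
    d = [(x[2 * k] - x[2 * k + 1]) / 2 for k in range(len(x) // 2)]
    return s, d

def haar_inverse(s, d):
    out = []
    for a, b in zip(s, d):
        out.extend([a + b, a - b])
    return out

def soft(coeffs, t):
    return [(1 if c > 0 else -1) * max(abs(c) - t, 0.0) for c in coeffs]

def denoise(x, t):
    s, d = haar_forward(x)
    return haar_inverse(s, soft(d, t))  # threshold only the detail band

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

random.seed(0)
clean = [1.0] * 32 + [4.0] * 32                    # a tiny "Blocks"-like signal
noisy = [v + random.gauss(0.0, 0.3) for v in clean]
grid = [0.0, 0.1, 0.2, 0.4, 0.8]
best_t = min(grid, key=lambda t: mse(denoise(noisy, t), clean))
assert mse(denoise(noisy, best_t), clean) <= mse(noisy, clean)
```

Repeating the search over several filter banks instead of just Haar, and scoring by cross-validation instead of against the clean signal, gives the basis selection procedure evaluated below.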
\n\n4 RESULTS \n\nIn the first experiment, the procedure was only allowed to select between two well-known bases (Daubechies most compactly supported and Symlet wavelets) with the desired filter order. Table 1 shows the filter order resulting in the lowest cross-validation error for each wavelet family and test function. The NRMSE is presented with respect to the original noise-free functions for comparison. As expected, the best basis for the noisy Blocks function was the piecewise linear basis (Daubechies, order 1). The Doppler function, which had very high frequency components, required the highest filter order. Figure 2 shows typical denoised versions of the functions recovered by the filters listed in bold in the table. \n\nThe method selected the basis having properties similar to the underlying function without knowing the original function. When higher-order filters were applied to the noisy Blocks data, the resulting NRMSE was higher. \n\nThe basis selection procedure (labelled CV-Wavelets in Table 2) was compared with Donoho's SureShrink, Wahba's Smoothing Splines (SS), and an optimized RBFNN [Wheeler, 96]. The prior information was incorrectly specified to the procedure, preferring bases near piecewise linear. The remarkable observation is that the method still did better than the others as measured by Mean Square Error. \n\n5 CONCLUSION \n\nA basis selection procedure for wavelet regression was presented. The method was shown to select bases appropriate to the characteristics of the underlying functions. The shape of the basis was determined with cross-validation, selecting from either a pre-set library of filters or from previously calculated lifting coefficients. The lifting coefficients were calculated to be appropriate for the particular problem domain. 
\nThe method was compared for various bases and against other popular methods. Even with the wrong lifting parameters, the method was able to reduce error better than other standard algorithms. \n\n[Figure 1 panels: the noisy Blocks, Bumps, Heavysin, and Doppler functions.] \n\nFigure 1: Noisy Test Functions \n\n[Figure 2 panels: the recovered Blocks, Bumps, Heavysin, and Doppler functions.] \n\nFigure 2: Recovered Functions \n\nTable 1: Effects of Basis Selection \n\nFunction | Family | Filter Order | Median Thr. (MT) | NRMSE Using MT | Median True Thr. (MTT) | NRMSE Using MTT \nBlocks | Daubechies | 1 | 1.33 | 0.038 | 1.61 | 0.036 \nBlocks | Symmlets | 2 | 1.245 | 0.045 | 1.40 | 0.045 \nBumps | Daubechies | 4 | 1.11 | 0.059 | 1.47 | 0.056 \nBumps | Symmlets | 5 | 1.13 | 0.058 | 1.48 | 0.055 \nDoppler | Daubechies | 8 | 1.27 | 0.058 | 1.65 | 0.054 \nDoppler | Symmlets | 8 | 1.36 | 0.054 | 1.74 | 0.050 \nHeavysin | Daubechies | 2 | 1.97 | 0.039 | 2.17 | 0.038 \nHeavysin | Symmlets | 5 | 1.985 | 0.039 | 2.16 | 0.038 \n\nTable 2: Methods Comparison Table of MSE \n\nFunction | SS | SureShrink | RBFNN | CV-Wavelets \nBlocks | 0.546 | 0.398 | 1.281 | 0.362 \nHeavysin | 0.075 | 0.062 | 0.113 | 0.051 \nDoppler | 0.205 | 0.145 | 0.287 | 0.116 \n\nReferences \n\nA. Cohen, I. Daubechies, and J. C. Feauveau (1992), \"Biorthogonal bases of compactly supported wavelets,\" Communications on Pure and Applied Mathematics, vol. 45, no. 5, pp. 485-560, June. \n\nI. Daubechies (1992), Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, PA. \n\nD. L. 
Donoho (1995), \"De-noising by soft-thresholding,\" IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613-627, May. \n\nD. L. Donoho and I. M. Johnstone (1994), \"Ideal spatial adaptation by wavelet shrinkage,\" Biometrika, vol. 81, no. 3, pp. 425-455, September. \n\nG. P. Nason (1996), \"Wavelet shrinkage using cross-validation,\" Journal of the Royal Statistical Society, Series B, vol. 58, pp. 463-479. \n\nW. Sweldens (1995), \"The lifting scheme: a custom-design construction of biorthogonal wavelets,\" Technical Report IMI 1994:7, Dept. of Mathematics, University of South Carolina. \n\nW. Sweldens (1995), \"The lifting scheme: a construction of second generation wavelets,\" Technical Report IMI 1995:6, Dept. of Mathematics, University of South Carolina. \n\nG. Wahba (1990), Spline Models for Observational Data, SIAM, Philadelphia, PA. \n\nK. Wheeler (1996), Smoothing Non-uniform Data Samples With Wavelets, Ph.D. Thesis, Dept. of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, OH. \n", "award": [], "sourceid": 1623, "authors": [{"given_name": "Kevin", "family_name": "Wheeler", "institution": null}, {"given_name": "Atam", "family_name": "Dhawan", "institution": null}]}