{"title": "Support Vector Method for Function Approximation, Regression Estimation and Signal Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 281, "page_last": 287, "abstract": null, "full_text": "Support  Vector Method for  Function \nApproximation,  Regression Estimation, \n\nand  Signal Processing\u00b7 \n\nVladimir Vapnik \n\nAT&T  Research \n\n101  Crawfords Corner \n\nHolmdel,  N J  07733 \n\nvlad@research.att .com \n\nSteven E.  Golowich \n\nBell  Laboratories \n700  Mountain Ave. \n\nMurray Hill,  NJ  07974 \ngolowich@bell-Iabs.com \n\nAlex Smola\u00b7 \nGMD First \n\nRudower Shausee  5 \n\n12489  Berlin \n\nasm@big.att.com \n\nAbstract \n\nThe  Support  Vector  (SV)  method  was  recently  proposed  for  es(cid:173)\ntimating  regressions,  constructing  multidimensional  splines,  and \nsolving linear operator equations  [Vapnik,  1995].  In  this  presenta(cid:173)\ntion we report results of applying the SV method to these problems. \n\n1 \n\nIntroduction \n\nThe Support Vector method is a universal tool for solving multidimensional function \nestimation problems.  Initially it was designed to solve pattern recognition problems, \nwhere  in  order  to  find  a  decision  rule  with  good  generalization  ability  one  selects \nsome (small) subset of the training data, called the Support Vectors (SVs).  Optimal \nseparation of the SV s  is  equivalent to optimal separation  the entire  data. \n\nThis  led  to  a  new  method  of  representing  decision  functions  where  the  decision \nfunctions  are  a  linear expansion  on  a  basis  whose elements  are  nonlinear functions \nparameterized  by  the  SVs  (we  need  one  SV  for  each  element  of the  basis).  This \ntype of function representation is especially useful for high dimensional input space: \nthe number of free  parameters in  this representation  is  equal to the  number of SVs \nbut does  not  depend on the  dimensionality of the space. \n\nLater  the  SV  method  was  extended  to  real-valued  functions.  This  allows  us  to \nexpand high-dimensional functions  using  a small basis  constructed  from  SVs.  This \n\n\u00b7smola@prosun.first.gmd.de \n\n\f282 \n\nv.  Vapnik,  S.  E.  Golowich and A.  Smola \n\nnovel  type  of function  representation  opens  new  opportunities  for  solving  various \nproblems of function  approximation and estimation. \n\nIn  this paper  we  demonstrate that using  the SV technique one  can  solve  problems \nthat in  classical techniques  would  require estimating a  large number of free  param(cid:173)\neters.  In particular we  construct one and two dimensional splines with an arbitrary \nnumber  of grid  points.  Using  linear  splines  we  approximate non-linear  functions . \nWe show that by  reducing requirements on the accuracy  of approximation, one  de(cid:173)\ncreases  the number of SVs which  leads to data compression.  We also show  that the \nSV  technique  is  a useful  tool for  regression  estimation.  Lastly we  demonstrate that \nusing the SV function representation for  solving inverse ill-posed problems provides \nan  additional opportunity for  regularization. \n\n2  SV method for  estimation of real functions \n\nLet  x  E  Rn  and Y E Rl.  Consider  the following set  of real functions:  a  vector  x  is \nmapped into some a  priori chosen  Hilbert space,  where we  define functions that are \nlinear in  their parameters \n\nY = I(x,w) = L  Wi<Pi(X),  W = (WI, ... ,WN, ... 
In [Vapnik, 1995] the following method for estimating functions in the set (1) based on training data $(x_1, y_1), \ldots, (x_\ell, y_\ell)$ was suggested: find the function that minimizes the following functional:

$$ R(w) = \frac{1}{\ell} \sum_{i=1}^{\ell} |y_i - f(x_i, w)|_{\varepsilon} + \gamma (w, w), \quad (2) $$

where

$$ |y - f(x, w)|_{\varepsilon} = \begin{cases} 0 & \text{if } |y - f(x, w)| < \varepsilon, \\ |y - f(x, w)| - \varepsilon & \text{otherwise,} \end{cases} \quad (3) $$

$(w, w)$ is the inner product of two vectors, and $\gamma$ is some constant. It was shown that the function minimizing this functional has the form

$$ f(x, \alpha, \alpha^*) = \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i)\, (\Phi(x_i), \Phi(x)) + b, \quad (4) $$

where $\alpha_i^*, \alpha_i \ge 0$ with $\alpha_i^* \alpha_i = 0$, and $(\Phi(x_i), \Phi(x))$ is the inner product of two elements of the Hilbert space.

To find the coefficients $\alpha_i^*$ and $\alpha_i$ one has to solve the following quadratic optimization problem: maximize the functional

$$ W(\alpha^*, \alpha) = -\varepsilon \sum_{i=1}^{\ell} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{\ell} y_i (\alpha_i^* - \alpha_i) - \frac{1}{2} \sum_{i,j=1}^{\ell} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)\, (\Phi(x_i), \Phi(x_j)), \quad (5) $$

subject to the constraints

$$ \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \qquad i = 1, \ldots, \ell. \quad (6) $$

The important feature of the solution (4) of this optimization problem is that only some of the coefficients $(\alpha_i^* - \alpha_i)$ differ from zero. The corresponding vectors $x_i$ are called Support Vectors (SVs). Therefore (4) describes an expansion on SVs.

It was shown in [Vapnik, 1995] that to evaluate the inner products $(\Phi(x_i), \Phi(x))$ both in the expansion (4) and in the objective function (5) one can use the general form of the inner product in Hilbert space. According to Hilbert space theory, to guarantee that a symmetric function $K(u, v)$ has an expansion

$$ K(u, v) = \sum_{k=1}^{\infty} a_k \psi_k(u) \psi_k(v) \quad (7) $$

with positive coefficients $a_k > 0$, i.e. to guarantee that $K(u, v)$ is an inner product in some feature space $\Phi$, it is necessary and sufficient that the condition

$$ \int K(u, v)\, g(u)\, g(v)\, du\, dv > 0 \quad (8) $$

be valid for any non-zero function $g$ on the Hilbert space (Mercer's theorem).

Therefore, in the SV method, one can replace (4) with

$$ f(x, \alpha, \alpha^*) = \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i) K(x, x_i) + b, $$

where the inner product $(\Phi(x_i), \Phi(x))$ is defined through a kernel $K(x_i, x)$. To find the coefficients $\alpha_i^*$ and $\alpha_i$ one has to maximize the function

$$ W(\alpha^*, \alpha) = -\varepsilon \sum_{i=1}^{\ell} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{\ell} y_i (\alpha_i^* - \alpha_i) - \frac{1}{2} \sum_{i,j=1}^{\ell} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j) \quad (9) $$

subject to the constraints (6).
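The dual problem (9) with box constraints (6) is a standard quadratic program. The following minimal sketch (not part of the original experiments) illustrates it with a general-purpose solver; the precomputed kernel matrix K, the choice of SLSQP, the starting point, and the tolerance for declaring a coefficient non-zero are all assumptions made only for illustration.

    import numpy as np
    from scipy.optimize import minimize

    def sv_regression_dual(K, y, eps=0.1, C=1.0):
        # Maximize W(alpha*, alpha) of (9) subject to (6), given a precomputed
        # kernel matrix K (e.g. one of the kernels of Section 3).
        K = np.asarray(K, dtype=float)
        y = np.asarray(y, dtype=float)
        l = len(y)

        def neg_W(z):
            a_star, a = z[:l], z[l:]
            beta = a_star - a
            return eps * np.sum(a_star + a) - y @ beta + 0.5 * beta @ K @ beta

        cons = [{"type": "eq", "fun": lambda z: np.sum(z[:l] - z[l:])}]
        bounds = [(0.0, C)] * (2 * l)
        res = minimize(neg_W, np.zeros(2 * l), method="SLSQP",
                       bounds=bounds, constraints=cons)
        a_star, a = res.x[:l], res.x[l:]
        beta = a_star - a                         # coefficients of the expansion (4)
        sv = np.flatnonzero(np.abs(beta) > 1e-6)  # indices of the Support Vectors
        return beta, sv

The bias $b$ can then be recovered from the Karush-Kuhn-Tucker conditions at any support vector with a coefficient strictly between the bounds; for large $\ell$ a dedicated QP solver is preferable to a general-purpose routine.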
3 Constructing kernels for inner products

To define a set of approximating functions one has to define a kernel $K(x_i, x)$ that generates the inner product in some feature space and solve the corresponding quadratic optimization problem.

3.1 Kernels generating splines

We start with the spline functions. According to their definition, splines are piecewise polynomial functions, which we will consider on the set $[0, 1]$. Splines of order $n$ have the following representation:

$$ f_n(x) = \sum_{r=0}^{n} a_r x^r + \sum_{j=1}^{N} w_j (x - t_j)_+^n, \quad (10) $$

where $(x - t)_+ = \max\{(x - t), 0\}$, $t_1, \ldots, t_N \in [0, 1]$ are the nodes, and $a_r, w_j$ are real values.

One can consider the spline function (10) as a linear function in the $n + N + 1$ dimensional feature space spanned by

$$ 1, x, \ldots, x^n, (x - t_1)_+^n, \ldots, (x - t_N)_+^n. $$

Therefore the inner product that generates splines of order $n$ in one dimension is

$$ K(x_i, x_j) = \sum_{r=0}^{n} x_i^r x_j^r + \sum_{s=1}^{N} (x_i - t_s)_+^n (x_j - t_s)_+^n. \quad (11) $$

Two dimensional splines are linear functions in the $(N + n + 1)^2$ dimensional space spanned by

$$ 1, x, \ldots, x^n, y, \ldots, y^n, \ldots, (x - t_1)_+^n (y - t_1)_+^n, \ldots, (x - t_N)_+^n (y - t_N)_+^n. \quad (12) $$

Let us denote by $u_i = (x_i, y_i)$, $u_j = (x_j, y_j)$ two two-dimensional vectors. Then the generating kernel for two dimensional spline functions of order $n$ is the product of the one-dimensional kernels,

$$ K(u_i, u_j) = K(x_i, x_j)\, K(y_i, y_j). $$

It is easy to check that the generating kernel for the $m$-dimensional splines is the product of $m$ one-dimensional generating kernels.

In applications of the SV method the number of nodes does not play an important role. Therefore, we introduce splines of order $n$ with an infinite number of nodes, $S_n^{(\infty)}$. To do this in the $R^1$ case, we map any real value $x_i$ to the element $(1, x_i, \ldots, x_i^n, (x_i - t)_+^n),\ t \in [0, 1]$, of the Hilbert space. The inner product becomes

$$ K(x_i, x_j) = \sum_{r=0}^{n} x_i^r x_j^r + \int_0^1 (x_i - t)_+^n (x_j - t)_+^n\, dt. \quad (13) $$

For linear splines $S_1^{(\infty)}$ we therefore have the following generating kernel (writing $x_{\min} = \min(x_i, x_j)$):

$$ K(x_i, x_j) = 1 + x_i x_j + x_i x_j\, x_{\min} - \frac{x_i + x_j}{2}\, x_{\min}^2 + \frac{x_{\min}^3}{3}. $$

In many applications expansions in $B_n$-splines [Unser & Aldroubi, 1992] are used, where

$$ B_n(x) = \sum_{r=0}^{n+1} \frac{(-1)^r}{n!} \binom{n+1}{r} \left( x + \frac{n+1}{2} - r \right)_+^n. $$

One may use $B_n$-splines to perform a construction similar to the above, yielding the kernel

$$ K(x_i, x_j) = B_{2n+1}(x_i - x_j). $$

3.2 Kernels generating Fourier expansions

Lastly, a Fourier expansion can be considered as a hyperplane in the following $2N + 1$ dimensional feature space:

$$ \frac{1}{\sqrt{2}}, \cos x, \sin x, \ldots, \cos Nx, \sin Nx. $$

The inner product in this space is defined by the Dirichlet formula:

$$ K(x_i, x_j) = \frac{1}{2} + \sum_{r=1}^{N} \cos r(x_i - x_j) = \frac{\sin\frac{(2N+1)(x_i - x_j)}{2}}{2 \sin\frac{x_i - x_j}{2}}. $$

4 Function estimation and data compression

In this section we approximate functions on the basis of observations at $\ell$ points

$$ (x_1, y_1), \ldots, (x_\ell, y_\ell). \quad (16) $$

We demonstrate that to construct an approximation within an accuracy of $\pm\varepsilon$ at the data points, one can use only the subsequence of the data containing the SVs.

We consider approximating the one and two dimensional functions

$$ f(x) = \mathrm{sinc}\,|x| = \frac{\sin |x|}{|x|} \quad (17) $$

and $f(x, y) = \mathrm{sinc}\sqrt{x^2 + y^2}$ on the basis of a sequence of measurements (without noise) on a uniform lattice (100 points for the one dimensional case and 2,500 for the two-dimensional case). For different $\varepsilon$ we approximate these functions by linear splines from $S_1^{(\infty)}$.
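A minimal way to reproduce the flavor of this experiment is sketched below. It combines the closed-form linear-spline kernel obtained from (13) with scikit-learn's SVR, which solves the same $\varepsilon$-insensitive dual as (9) for a precomputed kernel matrix; the rescaling of the sinc target to $[0, 1]$ and all constants are illustrative assumptions, not the settings used for the figures.

    import numpy as np
    from sklearn.svm import SVR

    def linear_spline_kernel(u, v):
        # Generating kernel (13) with n = 1 (infinite number of nodes on [0, 1]).
        m = np.minimum(u, v)
        return 1.0 + u * v + u * v * m - (u + v) * m**2 / 2.0 + m**3 / 3.0

    # 100 noise-free samples of a sinc-like target on a uniform lattice;
    # the argument is rescaled to [0, 1] because the kernel is defined there.
    x = np.linspace(0.0, 1.0, 100)
    y = np.sinc(8.0 * (x - 0.5))          # np.sinc(z) = sin(pi z) / (pi z)
    K = linear_spline_kernel(x[:, None], x[None, :])

    for eps in (0.02, 0.1):
        model = SVR(kernel="precomputed", C=100.0, epsilon=eps).fit(K, y)
        print("epsilon =", eps, "-> number of SVs:", len(model.support_))

As in Figure 1, loosening the required accuracy $\varepsilon$ reduces the number of SVs and hence the size of the compressed representation.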
Figure 1: Approximations with different levels of accuracy require different numbers of SVs: 31 SVs for $\varepsilon = 0.02$ (left) and 9 SVs for $\varepsilon = 0.1$ (right). Large dots indicate SVs.

Figure 2: Approximation of $f(x, y) = \mathrm{sinc}\sqrt{x^2 + y^2}$ by two dimensional linear splines with accuracy $\varepsilon = 0.01$ (left) required 157 SVs (right).

Figure 3: The sinc $x$ function corrupted by different levels of noise ($\sigma = 0.2$ left, $0.5$ right) and its regression. Black dots indicate SVs, circles non-SV data.

5 Solution of linear operator equations

In this section we consider the problem of solving linear equations in the set of functions defined by SVs. Consider the problem of solving a linear operator equation

$$ A f(t) = F(x), \qquad f(t) \in \mathcal{Z},\ F(x) \in \mathcal{W}, \quad (18) $$

where we are given measurements of the right hand side

$$ (x_1, F_1), \ldots, (x_\ell, F_\ell). \quad (19) $$

Consider the set of functions $f(t, w) \in \mathcal{Z}$ linear in some feature space $\{\Phi(t) = (\phi_0(t), \ldots, \phi_N(t), \ldots)\}$:

$$ f(t, w) = \sum_{r=0}^{\infty} w_r \phi_r(t) = (w, \Phi(t)). \quad (20) $$

The operator $A$ maps this set of functions into

$$ F(x, w) = A f(t, w) = \sum_{r=0}^{\infty} w_r A \phi_r(t) = \sum_{r=0}^{\infty} w_r \psi_r(x) = (w, \Psi(x)), \quad (21) $$

where $\psi_r(x) = A \phi_r(t)$ and $\Psi(x) = (\psi_0(x), \ldots, \psi_N(x), \ldots)$. Let us define the generating kernel in image space

$$ K(x_i, x_j) = \sum_{r=0}^{\infty} \psi_r(x_i) \psi_r(x_j) = (\Psi(x_i), \Psi(x_j)) \quad (22) $$

and the corresponding cross-kernel function

$$ \kappa(x_i, t) = \sum_{r=0}^{\infty} \psi_r(x_i) \phi_r(t) = (\Psi(x_i), \Phi(t)). \quad (23) $$

The problem of solving (18) in the set of functions $f(t, w) \in \mathcal{Z}$ (finding the vector $w$) is equivalent to the problem of regression estimation (21) using the data (19).

To estimate the regression on the basis of the kernel $K(x_i, x_j)$ one can use the methods described in Section 2. The obtained parameters $(\alpha_i^* - \alpha_i,\ i = 1, \ldots, \ell)$ define the approximation to the solution of equation (18) based on the data (19):

$$ f(t, \alpha) = \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i)\, \kappa(x_i, t). $$

We have applied this method to the solution of the Radon equation

$$ \int_{-a(m)}^{a(m)} f(m \cos\mu + u \sin\mu,\ m \sin\mu - u \cos\mu)\, du = p(m, \mu), \qquad -1 \le m \le 1,\ 0 < \mu < \pi,\ a(m) = \sqrt{1 - m^2}, \quad (24) $$

using noisy observations $(m_1, \mu_1, p_1), \ldots, (m_\ell, \mu_\ell, p_\ell)$, where $p_i = p(m_i, \mu_i) + \xi_i$ and the $\xi_i$ are independent with $E\xi_i = 0$, $E\xi_i^2 < \infty$.

For two-dimensional linear splines $S_1^{(\infty)}$ we obtained analytical expressions for the kernel (22) and cross-kernel (23). We have used these kernels for solving the corresponding regression problem and reconstructing images based on data that is similar to what one might get from a Positron Emission Tomography scan [Shepp, Vardi & Kaufman, 1985].

A remarkable feature of this solution is that it avoids a pixel representation of the function, which would require the estimation of 10,000 to 60,000 parameters. The spline approximation shown here required only 172 SVs.

Figure 4: Original image (dashed line) and its reconstruction (solid line) from 2,048 observations (left). 172 SVs (support lines) were used in the reconstruction (right).
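To make the kernel and cross-kernel construction of (22) and (23) concrete, the sketch below works through a toy one-dimensional problem in which a simple integration operator stands in for the Radon transform of (24); the truncated monomial basis, the use of scikit-learn's SVR, and every constant are assumptions chosen only for illustration, not the analytical spline kernels used in the experiment.

    import numpy as np
    from sklearn.svm import SVR

    R = 9                                  # truncate the feature basis at R + 1 terms
    t_grid = np.linspace(0.0, 1.0, 200)    # grid on which the solution is evaluated

    def Phi(t):                            # phi_r(t) = t**r (toy basis, not the paper's splines)
        return np.array([t**r for r in range(R + 1)])

    def Psi(x):                            # psi_r = A phi_r for (A f)(x) = integral of f over [0, x]
        return np.array([x**(r + 1) / (r + 1) for r in range(R + 1)])

    # Noisy measurements (19) of the right-hand side F(x) for f(t) = cos(2 pi t),
    # whose integral is known in closed form.
    rng = np.random.default_rng(0)
    x_obs = np.linspace(0.05, 1.0, 30)
    F_obs = np.sin(2 * np.pi * x_obs) / (2 * np.pi) + 0.01 * rng.standard_normal(30)

    Psi_mat = np.array([Psi(x) for x in x_obs])               # rows Psi(x_i)
    K = Psi_mat @ Psi_mat.T                                   # kernel (22)
    kappa = Psi_mat @ np.array([Phi(t) for t in t_grid]).T    # cross-kernel (23)

    # Regression in image space, then expansion of the solution on the SVs.
    model = SVR(kernel="precomputed", C=1e3, epsilon=0.01).fit(K, F_obs)
    beta = np.zeros(len(x_obs))
    beta[model.support_] = model.dual_coef_[0]   # (alpha_i* - alpha_i) up to sign convention
    f_hat = beta @ kappa                         # f(t, alpha) on t_grid, bias term ignored

For the Radon problem the same recipe applies, with $\Phi$ a two-dimensional linear-spline basis and $\Psi$ its image under (24); the bias term and the analytical kernel expressions are omitted here for brevity.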
\n\n6  Conclusion \n\nIn  this  article  we  present  a  new  method  of function  estimation  that  is  especially \nuseful  for  solving  multi-dimensional problems.  The  complexity  of the  solution  of \nthe  function  estimation problem using  the  SV representation  depends  on the  com(cid:173)\nplexity of the desired  solution  (i.e.  on the  required  number of SVs for  a  reasonable \napproximation  of the  desired  function)  rather  than  on  the  dimensionality of the \nspace.  Using the SV  method one  can solve various problems of function estimation \nboth in statistics  and  in  applied  mathematics. \n\nAcknowledgments \n\nWe would like to thank Chris Burges (Lucent Technologies) and Bernhard Scholkopf \n(MPIK Tiibingen)  for  help  with  the  code  and useful  discussions. \n\nThis  work  was  supported  in  part  by  NSF  grant  PHY  95-12729  (Steven  Golowich) \nand by ARPA grant N00014-94-C-0186 and the German National Scholarship Foun(cid:173)\ndation (Alex Smola). \n\nReferences \n\n1.  Vladimir Vapnik,  \"The  Nature  of Statistical  Learning  Theory\",  1995,  Springer \nVerlag  N.Y.,  189  p. \n\n2.  Michael Unser and Akram Aldroubi, \"Polynomial Splines and Wevelets - A Signal \nPerspectives\",  In  the  book:  \"Wavelets  -A  tutorial  in  Theory  and  Applications\" , \nC.K.  Chui  (ed)  pp.  91  - 122,  1992  Academic Press,  Inc. \n\n3.  1. Shepp,  Y.  Vardi, and L.  Kaufman,  \"A statistical model for  Positron  Emission \nTomography,\"  J.  Amer.  Stat.  Assoc.  80:389 pp.  8-37  1985. \n\n\f", "award": [], "sourceid": 1187, "authors": [{"given_name": "Vladimir", "family_name": "Vapnik", "institution": null}, {"given_name": "Steven", "family_name": "Golowich", "institution": null}, {"given_name": "Alex", "family_name": "Smola", "institution": null}]}