{"title": "Support Vector Method for Function Approximation, Regression Estimation and Signal Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 281, "page_last": 287, "abstract": null, "full_text": "Support Vector Method for Function \nApproximation, Regression Estimation, \n\nand Signal Processing\u00b7 \n\nVladimir Vapnik \n\nAT&T Research \n\n101 Crawfords Corner \n\nHolmdel, N J 07733 \n\nvlad@research.att .com \n\nSteven E. Golowich \n\nBell Laboratories \n700 Mountain Ave. \n\nMurray Hill, NJ 07974 \ngolowich@bell-Iabs.com \n\nAlex Smola\u00b7 \nGMD First \n\nRudower Shausee 5 \n\n12489 Berlin \n\nasm@big.att.com \n\nAbstract \n\nThe Support Vector (SV) method was recently proposed for es(cid:173)\ntimating regressions, constructing multidimensional splines, and \nsolving linear operator equations [Vapnik, 1995]. In this presenta(cid:173)\ntion we report results of applying the SV method to these problems. \n\n1 \n\nIntroduction \n\nThe Support Vector method is a universal tool for solving multidimensional function \nestimation problems. Initially it was designed to solve pattern recognition problems, \nwhere in order to find a decision rule with good generalization ability one selects \nsome (small) subset of the training data, called the Support Vectors (SVs). Optimal \nseparation of the SV s is equivalent to optimal separation the entire data. \n\nThis led to a new method of representing decision functions where the decision \nfunctions are a linear expansion on a basis whose elements are nonlinear functions \nparameterized by the SVs (we need one SV for each element of the basis). This \ntype of function representation is especially useful for high dimensional input space: \nthe number of free parameters in this representation is equal to the number of SVs \nbut does not depend on the dimensionality of the space. \n\nLater the SV method was extended to real-valued functions. 
This allows us to expand high-dimensional functions using a small basis constructed from SVs. This novel type of function representation opens new opportunities for solving various problems of function approximation and estimation.\n\n*smola@prosun.first.gmd.de\n\nIn this paper we demonstrate that using the SV technique one can solve problems that with classical techniques would require estimating a large number of free parameters. In particular we construct one- and two-dimensional splines with an arbitrary number of grid points. Using linear splines we approximate non-linear functions. We show that by reducing the required accuracy of approximation one decreases the number of SVs, which leads to data compression. We also show that the SV technique is a useful tool for regression estimation. Lastly we demonstrate that using the SV function representation for solving inverse ill-posed problems provides an additional opportunity for regularization.\n\n2 SV method for estimation of real functions\n\nLet x ∈ R^n and y ∈ R^1. Consider the following set of real functions: a vector x is mapped into some a priori chosen Hilbert space by x → Φ(x), where we define functions that are linear in their parameters. The SV method constructs approximations of the form\n\nf(x, α, α*) = Σ_{i=1}^ℓ (α_i* − α_i) (Φ(x_i), Φ(x)) + b,   (4)\n\nwhere α_i, α_i* ≥ 0 with α_i α_i* = 0, and (Φ(x_i), Φ(x)) is the inner product of two elements of the Hilbert space.\n\nTo find the coefficients α_i* and α_i one has to solve the following quadratic optimization problem: maximize the functional\n\nW(α*, α) = −ε Σ_{i=1}^ℓ (α_i* + α_i) + Σ_{i=1}^ℓ y_i (α_i* − α_i) − (1/2) Σ_{i,j=1}^ℓ (α_i* − α_i)(α_j* − α_j) (Φ(x_i), Φ(x_j)),   (5)\n\nsubject to the constraints\n\nΣ_{i=1}^ℓ (α_i − α_i*) = 0,   0 ≤ α_i, α_i* ≤ C,   i = 1, ..., ℓ.   (6)
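For small data sets the quadratic program above can be handed to a generic solver. The sketch below is our illustration, not the authors' implementation: it maximizes the functional with SciPy's SLSQP on a noise-free sinc sample, using a Gaussian kernel as an arbitrary stand-in for the Hilbert-space inner product.

```python
import numpy as np
from scipy.optimize import minimize

# Noise-free sample of the sinc function used later in Section 4.
l = 20
x = np.linspace(-3.0, 3.0, l)
y = np.sinc(x / np.pi)                 # sin(x)/x

# Stand-in Mercer kernel for the inner products (an assumption of this sketch).
K = np.exp(-np.subtract.outer(x, x) ** 2)

eps, C = 0.1, 10.0

def neg_W(beta):
    """Negative of the functional to be maximized; beta = (alpha*, alpha)."""
    a_star, a = beta[:l], beta[l:]
    d = a_star - a
    return eps * np.sum(a_star + a) - y @ d + 0.5 * d @ K @ d

res = minimize(
    neg_W, np.zeros(2 * l), method="SLSQP",
    bounds=[(0.0, C)] * (2 * l),
    constraints=[{"type": "eq", "fun": lambda b: np.sum(b[:l] - b[l:])}],
)
coef = res.x[:l] - res.x[l:]           # the coefficients (alpha_i* - alpha_i)
sv = np.abs(coef) > 1e-6               # only the Support Vectors survive

# Recover b from points whose multiplier is active but strictly below C:
# there the fit touches the edge of the eps-tube.
f0 = K @ coef
up = (res.x[:l] > 1e-6) & (res.x[:l] < C - 1e-6)
lo = (res.x[l:] > 1e-6) & (res.x[l:] < C - 1e-6)
b_cand = np.concatenate([(y - eps - f0)[up], (y + eps - f0)[lo]])
b = b_cand.mean() if b_cand.size else 0.0
f_hat = f0 + b                         # the SV expansion at the training points
```

Only a fraction of the coefficients come out non-zero, and the approximation stays within ±ε of the data at every sample point, which is the data-compression effect discussed in Section 4.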
The important feature of the solution (4) of this optimization problem is that only some of the coefficients (α_i* − α_i) differ from zero. The corresponding vectors x_i are called Support Vectors (SVs). Therefore (4) describes an expansion on SVs.\n\nIt was shown in [Vapnik, 1995] that to evaluate the inner products (Φ(x_i), Φ(x)), both in the expansion (4) and in the objective function (5), one can use the general form of the inner product in Hilbert space. According to Hilbert space theory, to guarantee that a symmetric function K(u, v) has an expansion\n\nK(u, v) = Σ_{k=1}^∞ a_k ψ_k(u) ψ_k(v)   (7)\n\nwith positive coefficients a_k > 0, i.e. to guarantee that K(u, v) is an inner product in some feature space Φ, it is necessary and sufficient that the condition\n\n∫∫ K(u, v) g(u) g(v) du dv > 0   (8)\n\nbe valid for any non-zero function g on the Hilbert space (Mercer's theorem).\n\nTherefore, in the SV method, one can replace (4) with\n\nf(x, α, α*) = Σ_{i=1}^ℓ (α_i* − α_i) K(x, x_i) + b,\n\nwhere the inner product (Φ(x_i), Φ(x)) is defined through a kernel K(x_i, x). To find the coefficients α_i* and α_i one has to maximize the function\n\nW(α*, α) = −ε Σ_{i=1}^ℓ (α_i* + α_i) + Σ_{i=1}^ℓ y_i (α_i* − α_i) − (1/2) Σ_{i,j=1}^ℓ (α_i* − α_i)(α_j* − α_j) K(x_i, x_j)   (9)\n\nsubject to the constraints (6).\n\n3 Constructing kernels for inner products\n\nTo define a set of approximating functions one has to define a kernel K(x_i, x) that generates the inner product in some feature space and solve the corresponding quadratic optimization problem.\n\n3.1 Kernels generating splines\n\nWe start with spline functions. According to their definition, splines are piecewise polynomial functions, which we will consider on the set [0, 1].
Splines of order n with N nodes have the representation\n\nf_n(x) = Σ_{r=0}^n a_r x^r + Σ_{s=1}^N w_s (x − t_s)_+^n,   (10)\n\nwhere (x − t)_+ = max{(x − t), 0}, the t_1, ..., t_N ∈ [0, 1] are the nodes, and the a_r, w_s are real values. One can consider the spline function (10) as a linear function in the (n + N + 1)-dimensional feature space spanned by\n\n1, x, ..., x^n, (x − t_1)_+^n, ..., (x − t_N)_+^n.\n\nTherefore the inner product that generates splines of order n in one dimension is\n\nK(x_i, x_j) = Σ_{r=0}^n x_i^r x_j^r + Σ_{s=1}^N (x_i − t_s)_+^n (x_j − t_s)_+^n.   (11)\n\nTwo-dimensional splines are linear functions in the (N + n + 1)^2-dimensional space spanned by the products\n\n1, x, ..., x^n, y, ..., y^n, ..., (x − t_1)_+^n (y − t_1)_+^n, ..., (x − t_N)_+^n (y − t_N)_+^n.   (12)\n\nLet us denote by u_i = (x_i, y_i), u_j = (x_j, y_j) two two-dimensional vectors. Then the generating kernel for two-dimensional spline functions of order n is the product\n\nK(u_i, u_j) = K(x_i, x_j) K(y_i, y_j).\n\nIt is easy to check that the generating kernel for m-dimensional splines is the product of m one-dimensional generating kernels.\n\nIn applications of the SV method the number of nodes does not play an important role. Therefore, we introduce splines of order n with an infinite number of nodes, S_n^(∞). To do this in the R^1 case, we map any real value x_i to the element (1, x_i, ..., x_i^n, (x_i − t)_+^n) of the Hilbert space. The inner product becomes\n\nK(x_i, x_j) = Σ_{r=0}^n x_i^r x_j^r + ∫_0^1 (x_i − t)_+^n (x_j − t)_+^n dt.   (13)\n\nFor linear splines S_1^(∞) we therefore have the following generating kernel:\n\nK(x_i, x_j) = 1 + x_i x_j + x_i x_j min(x_i, x_j) − ((x_i + x_j)/2) min(x_i, x_j)^2 + (1/3) min(x_i, x_j)^3.   (14)\n\nIn many applications expansions in B_n-splines [Unser & Aldroubi, 1992] are used, where\n\nB_n(x) = Σ_{r=0}^{n+1} ((−1)^r / n!) (n+1 choose r) (x + (n+1)/2 − r)_+^n.
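For linear splines the integral in (13) can be evaluated in closed form. The sketch below is our illustration (not code from the paper): it evaluates that closed form, checks it against a direct quadrature of the defining integral, and verifies that the resulting Gram matrix is positive semidefinite, as Mercer's theorem requires.

```python
import numpy as np

def spline_kernel(xi, xj):
    """Closed form of the linear-spline kernel with infinitely many nodes:
    1 + xi*xj + integral_0^1 (xi - t)_+ (xj - t)_+ dt, evaluated exactly."""
    m = np.minimum(xi, xj)
    return 1 + xi * xj + xi * xj * m - (xi + xj) * m ** 2 / 2 + m ** 3 / 3

def spline_kernel_quad(xi, xj, nt=200001):
    """Same kernel via direct trapezoidal quadrature of the defining integral."""
    t = np.linspace(0.0, 1.0, nt)
    g = np.maximum(xi - t, 0.0) * np.maximum(xj - t, 0.0)
    return 1 + xi * xj + np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t))

# Closed form agrees with quadrature ...
assert abs(spline_kernel(0.3, 0.7) - spline_kernel_quad(0.3, 0.7)) < 1e-6

# ... and the Gram matrix on a grid is positive semidefinite (Mercer).
x = np.linspace(0.0, 1.0, 25)
G = spline_kernel(x[:, None], x[None, :])
assert np.linalg.eigvalsh(G).min() > -1e-8
```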
One may use B_n-splines to perform a construction similar to the above, yielding the kernel\n\nK(x_i, x_j) = B_{2n+1}(x_i − x_j).   (15)\n\n3.2 Kernels generating Fourier expansions\n\nLastly, a Fourier expansion can be considered as a hyperplane in the following (2N + 1)-dimensional feature space:\n\n1/√2, cos x, sin x, ..., cos Nx, sin Nx.\n\nThe inner product in this space is defined by the Dirichlet formula:\n\nK(x_i, x_j) = 1/2 + Σ_{k=1}^N cos k(x_i − x_j) = sin((2N + 1)(x_i − x_j)/2) / (2 sin((x_i − x_j)/2)).\n\n4 Function estimation and data compression\n\nIn this section we approximate functions on the basis of observations at ℓ points\n\n(x_1, y_1), ..., (x_ℓ, y_ℓ).   (16)\n\nWe demonstrate that to construct an approximation within an accuracy of ±ε at the data points, one can use only the subsequence of the data containing the SVs.\n\nWe consider approximating the one- and two-dimensional functions\n\nf(x) = sinc |x| = sin |x| / |x|   (17)\n\non the basis of a sequence of measurements (without noise) on a uniform lattice (100 points in the one-dimensional case and 2,500 in the two-dimensional case). For different ε we approximate this function by linear splines from S_1^(∞).\n\nFigure 1: Approximations with different levels of accuracy require different numbers of SVs: 31 SVs for ε = 0.02 (left) and 9 SVs for ε = 0.1 (right). Large dots indicate SVs.\n\nFigure 2: Approximation of f(x, y) = sinc √(x^2 + y^2) by two-dimensional linear splines with accuracy ε = 0.01 (left) required 157 SVs (right).
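The closed form given by the Dirichlet formula of Section 3.2 can be verified against the explicit (2N + 1)-dimensional feature map; the check below is our illustration, with N = 5 an arbitrary choice.

```python
import math

def fourier_features(x, N):
    """Explicit feature map: 1/sqrt(2), cos x, sin x, ..., cos Nx, sin Nx."""
    v = [1.0 / math.sqrt(2.0)]
    for k in range(1, N + 1):
        v.extend([math.cos(k * x), math.sin(k * x)])
    return v

def dirichlet_kernel(xi, xj, N):
    """Closed-form inner product: sin((2N+1)d/2) / (2 sin(d/2)), d = xi - xj."""
    d = xi - xj
    if abs(math.sin(d / 2.0)) < 1e-12:   # the limit d -> 0 gives N + 1/2
        return N + 0.5
    return math.sin((2 * N + 1) * d / 2.0) / (2.0 * math.sin(d / 2.0))

N = 5
xi, xj = 0.7, 0.2
dot = sum(a * b for a, b in zip(fourier_features(xi, N), fourier_features(xj, N)))
assert abs(dot - dirichlet_kernel(xi, xj, N)) < 1e-9
```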
Figure 3: The sinc x function corrupted by different levels of noise (σ = 0.2 left, 0.5 right) and its regression. Black dots indicate SVs, circles non-SV data.\n\n5 Solution of linear operator equations\n\nIn this section we consider the problem of solving linear equations in the set of functions defined by SVs. Consider the problem of solving a linear operator equation\n\nA f(t) = F(x),   f(t) ∈ Θ, F(x) ∈ Ω,   (18)\n\nwhere we are given measurements of the right-hand side\n\n(x_1, F_1), ..., (x_ℓ, F_ℓ).   (19)\n\nConsider the set of functions f(t, w) ∈ Θ linear in some feature space {Φ(t) = (φ_0(t), ..., φ_N(t), ...)}:\n\nf(t, w) = Σ_{r=0}^∞ w_r φ_r(t) = (W, Φ(t)).   (20)\n\nThe operator A maps this set of functions into\n\nF(x, w) = A f(t, w) = Σ_{r=0}^∞ w_r A φ_r(t) = Σ_{r=0}^∞ w_r ψ_r(x) = (W, Ψ(x)),   (21)\n\nwhere ψ_r(x) = A φ_r(t) and Ψ(x) = (ψ_0(x), ..., ψ_N(x), ...). Let us define the generating kernel in image space\n\nK(x_i, x_j) = Σ_{r=0}^∞ ψ_r(x_i) ψ_r(x_j) = (Ψ(x_i), Ψ(x_j))   (22)\n\nand the corresponding cross-kernel function\n\n𝒦(x_i, t) = Σ_{r=0}^∞ ψ_r(x_i) φ_r(t) = (Ψ(x_i), Φ(t)).   (23)\n\nThe problem of solving (18) in the set of functions f(t, w) ∈ Θ (finding the vector W) is equivalent to the problem of regression estimation (21) using the data (19).\n\nTo estimate the regression on the basis of the kernel K(x_i, x_j) one can use the methods described in Section 2. The obtained parameters (α_i* − α_i, i = 1, ..., ℓ) define the approximation to the solution of equation (18) based on the data (19):\n\nf(t, α) = Σ_{i=1}^ℓ (α_i* − α_i) 𝒦(x_i, t).
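To make the construction concrete, here is a small numerical sketch of the kernel (22) and cross-kernel (23); it is our illustration, not the paper's PET experiment. We take A to be the integration operator (Af)(x) = ∫_0^x f(t) dt with a truncated cosine feature space, and, for brevity, replace the ε-insensitive SV fit by plain ridge regression on the image-space kernel.

```python
import numpy as np

R = 10        # truncation level of the feature space (an assumption of this sketch)
lam = 1e-8    # small ridge term standing in for the eps-insensitive SV machinery

def phi(t):
    """Features phi_r(t) = cos(r*pi*t), r = 0, ..., R (phi_0 = 1)."""
    r = np.arange(R + 1)
    return np.cos(np.pi * np.outer(t, r))

def psi(x):
    """Images psi_r = A phi_r under (Af)(x) = integral_0^x f(t) dt:
    psi_0(x) = x, psi_r(x) = sin(r*pi*x)/(r*pi) for r >= 1."""
    r = np.arange(1, R + 1)
    return np.column_stack([x, np.sin(np.pi * np.outer(x, r)) / (np.pi * r)])

# Noise-free right-hand-side data for the true solution f(t) = cos(2*pi*t),
# whose image under A is F(x) = sin(2*pi*x)/(2*pi).
x_obs = np.linspace(0.0, 1.0, 30)
F_obs = np.sin(2 * np.pi * x_obs) / (2 * np.pi)

Psi = psi(x_obs)
K = Psi @ Psi.T                                   # kernel (22) in image space
c = np.linalg.solve(K + lam * np.eye(len(x_obs)), F_obs)

# Approximate solution via the cross-kernel (23): f(t) = sum_i c_i K*(x_i, t).
t = np.linspace(0.0, 1.0, 101)
cross = Psi @ phi(t).T                            # cross-kernel matrix, ell x 101
f_hat = cross.T @ c
```

Because the true solution lies in the span of the features, f_hat reproduces cos(2πt) up to the perturbation introduced by the ridge term; the same two-kernel recipe, with spline kernels and the Radon operator, is what the paper applies to tomographic reconstruction.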
We have applied this method to the solution of the Radon equation\n\n∫_{−a(m)}^{a(m)} f(m cos μ + u sin μ, m sin μ − u cos μ) du = p(m, μ),\n\n−1 ≤ m ≤ 1,  0 < μ < π,  a(m) = √(1 − m^2),   (24)\n\nusing noisy observations (m_1, μ_1, p_1), ..., (m_ℓ, μ_ℓ, p_ℓ), where p_i = p(m_i, μ_i) + ξ_i and the ξ_i are independent with Eξ_i = 0, Eξ_i^2 < ∞.\n\nFor two-dimensional linear splines S_1^(∞) we obtained analytical expressions for the kernel (22) and cross-kernel (23). We have used these kernels for solving the corresponding regression problem and reconstructing images based on data similar to what one might get from a Positron Emission Tomography scan [Shepp, Vardi & Kaufman, 1985].\n\nA remarkable feature of this solution is that it avoids a pixel representation of the function, which would require the estimation of 10,000 to 60,000 parameters. The spline approximation shown here required only 172 SVs.\n\nFigure 4: Original image (dashed line) and its reconstruction (solid line) from 2,048 observations (left). 172 SVs (support lines) were used in the reconstruction (right).\n\n6 Conclusion\n\nIn this article we present a new method of function estimation that is especially useful for solving multidimensional problems. The complexity of the solution of the function estimation problem using the SV representation depends on the complexity of the desired solution (i.e. on the number of SVs required for a reasonable approximation of the desired function) rather than on the dimensionality of the space. Using the SV method one can solve various problems of function estimation both in statistics and in applied mathematics.\n\nAcknowledgments\n\nWe would like to thank Chris Burges (Lucent Technologies) and Bernhard Schölkopf (MPIK Tübingen) for help with the code and useful discussions.
This work was supported in part by NSF grant PHY 95-12729 (Steven Golowich), by ARPA grant N00014-94-C-0186, and by the German National Scholarship Foundation (Alex Smola).\n\nReferences\n\n1. Vladimir Vapnik, \"The Nature of Statistical Learning Theory\", Springer Verlag, N.Y., 1995, 189 p.\n\n2. Michael Unser and Akram Aldroubi, \"Polynomial Splines and Wavelets - A Signal Processing Perspective\", in \"Wavelets - A Tutorial in Theory and Applications\", C. K. Chui (ed.), pp. 91-122, Academic Press, Inc., 1992.\n\n3. L. Shepp, Y. Vardi, and L. Kaufman, \"A statistical model for Positron Emission Tomography\", J. Amer. Stat. Assoc. 80:389, pp. 8-37, 1985.", "award": [], "sourceid": 1187, "authors": [{"given_name": "Vladimir", "family_name": "Vapnik", "institution": null}, {"given_name": "Steven", "family_name": "Golowich", "institution": null}, {"given_name": "Alex", "family_name": "Smola", "institution": null}]}