{"title": "Support Vector Method for Multivariate Density Estimation", "book": "Advances in Neural Information Processing Systems", "page_first": 659, "page_last": 665, "abstract": null, "full_text": "Support Vector Method for Multivariate \n\nDensity Estimation \n\nVladimir N. Vapnik \n\nRoyal Halloway College and \nAT &T Labs, 100 Schultz Dr. \n\nRed Bank, NJ 07701 \nvlad@research.att.com \n\nSayan Mukherjee \nCBCL, MIT E25-201 \nCambridge, MA 02142 \n\nsayan@ai.mit.edu \n\nAbstract \n\nA new method for multivariate density estimation is developed \nbased on the Support Vector Method (SVM) solution of inverse \nill-posed problems. The solution has the form of a mixture of den(cid:173)\nsities. This method with Gaussian kernels compared favorably to \nboth Parzen's method and the Gaussian Mixture Model method. \nFor synthetic data we achieve more accurate estimates for densities \nof 2, 6, 12, and 40 dimensions. \n\n1 \n\nIntroduction \n\nThe problem of multivariate density estimation is important for many applications, \nin particular, for speech recognition [1] [7]. When the unknown density belongs to a \nparametric set satisfying certain conditions one can estimate it using the maximum \nlikelihood (ML) method. Often these conditions are too restrictive. Therefore, \nnon-parametric methods were proposed. \n\nThe most popular of these, Parzen's method [5], uses the following estimate given \ndata Xl, ... , Xl: \n\n(1) \n\nwhere K'Yl(t - Xi) is a smooth function such that J K'Yl(t - xi)dt = 1. Under some \nconditions on \"Yl and K'Yl (t - Xi), Parzen's method converges with a fast asymptotic \nrate. An example of such a function is a Gaussian with one free parameter \"Y; (the \nwidth) \n\nThe structure of the Parzen estimator is too complex: the number of terms in (1) \nis equal to the number of observations (which can be hundreds of thousands). \n\n\f660 \n\nV. N. Vapnik and S. 
Researchers believe that for practical problems densities can be approximated by a mixture with few elements (Gaussians for Gaussian Mixture Models (GMM)). Therefore, the following parametric density model was introduced \n\n$$P(x, a, \Sigma) = \sum_{i=1}^{m} \alpha_i P(x, a_i, \Sigma_i), \qquad \alpha_i \ge 0, \quad \sum_{i=1}^{m} \alpha_i = 1, \qquad (3)$$ \n\nwhere $P(x, a_i, \Sigma_i)$ are Gaussians with different vectors $a_i$ and different diagonal covariance matrices $\Sigma_i$; $\alpha_i$ is the proportion of the $i$-th Gaussian in the mixture. \n\nIt is known [9] that for general forms of Gaussian mixtures the ML estimate does not exist. To use the ML method two values must be specified: a lower bound on the diagonal elements of the covariance matrices and an upper bound on the number of mixture elements. Under these constraints one can estimate the mixture parameters using the EM algorithm. This solution, however, depends on the predefined parameters. \n\nIn this article we use an SVM approach to obtain an estimate in the form of a mixture of densities. The approach has no free parameters. In our experiments it performs better than the GMM method. \n\n2 Density estimation is an ill-posed problem \n\nA density $p(t)$ is defined as the solution of the equation \n\n$$\int_{-\infty}^{x} p(t)\,dt = F(x), \qquad (4)$$ \n\nwhere $F(x)$ is the probability distribution function. Estimating a density from data involves solving equation (4) on a given set of densities when the distribution function $F(x)$ is unknown but a random i.i.d. sample $x_1, \ldots, x_\ell$ is given. The empirical distribution function \n\n$$F_\ell(x) = \frac{1}{\ell} \sum_{i=1}^{\ell} \theta(x - x_i)$$ \n\nis a good approximation of the actual distribution, where $\theta(u)$ is the step function. In the univariate case, for sufficiently large $\ell$ the distribution of the supremum error between $F(x)$ and $F_\ell(x)$ is given by the Kolmogorov-Smirnov distribution \n\n$$P\left\{\sup_x |F(x) - F_\ell(x)| < c/\sqrt{\ell}\right\} = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1} \exp\{-2c^2 k^2\}. \qquad (5)$$ 
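Both the empirical distribution function and the Kolmogorov-Smirnov quantities above are easy to compute in the univariate case. The sketch below is illustrative (names are ours, and the infinite series is truncated at a finite number of terms):

```python
import numpy as np

def empirical_cdf(x, data):
    """F_ell(x): fraction of the samples not exceeding x (univariate)."""
    data = np.sort(data)
    return np.searchsorted(data, x, side="right") / len(data)

def ks_statistic(data, cdf):
    """sup_x |F(x) - F_ell(x)|, evaluated at the jump points of F_ell."""
    data = np.sort(data)
    ell = len(data)
    f = cdf(data)
    d_plus = np.max(np.arange(1, ell + 1) / ell - f)
    d_minus = np.max(f - np.arange(0, ell) / ell)
    return max(d_plus, d_minus)

def ks_cdf(c, terms=100):
    """P{ sqrt(ell) * sup|F - F_ell| < c }: the Kolmogorov-Smirnov series,
    truncated after `terms` terms (it converges very fast for moderate c)."""
    k = np.arange(1, terms + 1)
    return 1.0 - 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * c ** 2 * k ** 2))

rng = np.random.default_rng(0)
sample = rng.uniform(size=2000)
d = ks_statistic(sample, lambda x: x)   # true CDF of U[0,1] is F(x) = x
```

For a well-specified model the statistic `d` is of order $c/\sqrt{\ell}$, which is the convergence rate used throughout the argument that follows.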
Hence, the problem of density estimation can be restated as solving equation (4) with the distribution function $F(x)$ replaced by the empirical distribution function $F_\ell(x)$, which converges to the true one with the (fast) rate $O(1/\sqrt{\ell})$ in both the univariate and multivariate cases. \n\nThe problem of solving the linear operator equation $Ap = F$ with the approximation $F_\ell(x)$ is ill-posed. \n\nIn the 1960's methods were proposed for solving ill-posed problems using approximations $F_\ell$ converging to $F$ as $\ell$ increases. The idea of these methods was to introduce a regularizing functional $\Omega(p)$ (a semi-continuous, positive functional for which the set $\{p : \Omega(p) \le c\}$ is a compactum for every $c > 0$) and to define the solution $p_\ell$ as a trade-off between $\Omega(p)$ and $\|Ap - F_\ell\|$. \n\nThe following two methods, which are asymptotically equivalent [11], were proposed by Tikhonov [8] and Phillips [6]: \n\n$$\min_p \left[\|Ap - F_\ell\|^2 + \gamma_\ell \Omega(p)\right], \qquad \gamma_\ell > 0, \quad \gamma_\ell \to 0, \qquad (6)$$ \n\n$$\min_p \Omega(p) \ \ \text{s.t.} \ \ \|Ap - F_\ell\| < \varepsilon_\ell, \qquad \varepsilon_\ell > 0, \quad \varepsilon_\ell \to 0. \qquad (7)$$ \n\nFor the stochastic case it can be shown for both methods that if $F_\ell(x)$ converges in probability to $F(x)$ and $\gamma_\ell \to 0$, then for sufficiently large $\ell$ and arbitrary $\nu$ and $\mu$ the following inequality holds [10] [9] [3] \n\n$$P\{\rho_{E_1}(p_\ell, p) > \nu\} \le P\{\rho_{E_2}(F_\ell, F) > \mu\sqrt{\gamma_\ell}\}, \qquad (8)$$ \n\nwhere $\ell > \ell_0(\nu, \mu)$ and $\rho_{E_1}(p, p_\ell)$, $\rho_{E_2}(F, F_\ell)$ are metrics in the spaces $p$ and $F$. Since $F_\ell(x) \to F(x)$ in probability with the rate $O(1/\sqrt{\ell})$, it follows from equation (8) that if $\gamma_\ell > O(1/\ell)$ the solutions of equation (4) are consistent. \n\n3 Choice of regularization parameters \n\nFor the deterministic case the residual method [2] can be used to set the regularization parameters ($\gamma_\ell$ in (6) and $\varepsilon_\ell$ in (7)): the parameter is chosen such that $p_\ell$ satisfies \n\n$$\|Ap_\ell - F_\ell\| = \|F(x) - F_\ell(x)\| = \sigma_\ell, \qquad (9)$$ \n\nwhere $\sigma_\ell$ is the known accuracy of approximation of $F(x)$ by $F_\ell(x)$. 
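The residual method can be illustrated on a generic discretized linear problem: choose the regularization parameter of the Tikhonov functional by bisection until the residual matches the prescribed accuracy. This is a sketch of the idea on an arbitrary well-conditioned matrix, not the paper's density solver; all names and the test problem are ours.

```python
import numpy as np

def tikhonov_solve(A, F, gamma):
    """Minimizer of ||A p - F||^2 + gamma * ||p||^2 (normal equations)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ F)

def residual_method(A, F, sigma, lo=1e-16, hi=1e3, iters=80):
    """Pick gamma so that ||A p_gamma - F|| = sigma.  The residual is
    nondecreasing in gamma, so bisection on a log scale applies."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        res = np.linalg.norm(A @ tikhonov_solve(A, F, mid) - F)
        if res < sigma:
            lo = mid        # residual too small: allow more smoothing
        else:
            hi = mid
    gamma = np.sqrt(lo * hi)
    return tikhonov_solve(A, F, gamma), gamma

# illustrative problem with a known noise level sigma
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 10))
noise = rng.normal(size=10)
noise *= 1e-2 / np.linalg.norm(noise)            # ||noise|| = sigma exactly
F = A @ np.ones(10) + noise
p_hat, gamma = residual_method(A, F, sigma=1e-2)
```

The returned solution reproduces the data exactly to the accuracy of the data itself, which is the content of the residual criterion: fitting any better would only fit the noise.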
We use this idea for the stochastic case. The Kolmogorov-Smirnov distribution is used to set $\sigma_\ell$: $\sigma_\ell = c/\sqrt{\ell}$, where $c$ corresponds to an appropriate quantile. For the multivariate case one can evaluate the appropriate quantile either analytically [4] or by simulations. \n\nThe density estimation problem can be solved using either regularization method (6) or (7). Using method (6) with an $L_2$ norm in the image space $F$ and the regularization functional $\Omega(p) = (Tp, Tp)$, where $T$ is a convolution operator, one obtains Parzen's method [10] [9] with kernels defined by the operator $T$. \n\n4 SVM for density estimation \n\nWe apply the SVM technique to equation (7) for density estimation. We use the $C$ norm in (7) and solve equation (4) in a set of functions belonging to a Reproducing Kernel Hilbert Space (RKHS). We use the regularization functional \n\n$$\Omega(p) = \|p\|_{\mathcal{H}}^2 = (p, p)_{\mathcal{H}}. \qquad (10)$$ \n\nAn RKHS can be defined by a positive definite kernel $K(x, y)$ and an inner product $(f, g)_{\mathcal{H}}$ in Hilbert space $\mathcal{H}$ such that \n\n$$(f(x), K(x, y))_{\mathcal{H}} = f(y) \quad \forall f \in \mathcal{H}. \qquad (11)$$ \n\nNote that any positive definite function $K(x, y)$ has an expansion \n\nK(x, y) = \sum_i \lambda_i