{"title": "Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform", "book": "Advances in Neural Information Processing Systems", "page_first": 225, "page_last": 232, "abstract": "Covariance estimation for high dimensional vectors is a classically difficult problem in statistical analysis and machine learning due to limited sample size. In this paper, we propose a new approach to covariance estimation, which is based on constrained maximum likelihood (ML) estimation of the covariance. Specifically, the covariance is constrained to have an eigen decomposition which can be represented as a sparse matrix transform (SMT). The SMT is formed by a product of pairwise coordinate rotations known as Givens rotations. Using this framework, the covariance can be efficiently estimated using greedy minimization of the log likelihood function, and the number of Givens rotations can be efficiently computed using a cross-validation procedure. The estimator obtained using this method is always positive definite and well-conditioned even with limited sample size. Experiments on hyperspectral data show that SMT covariance estimation results in consistently better estimates of the covariance for a variety of different classes and sample sizes compared to traditional shrinkage estimators.", "full_text": "Covariance Estimation for High Dimensional Data Vectors Using the Sparse Matrix Transform

Guangzhi Cao
Charles A. Bouman
School of Electrical and Computer Engineering
Purdue University
West Lafayette, IN 47907
{gcao, bouman}@purdue.edu

Abstract

Covariance estimation for high dimensional vectors is a classically difficult problem in statistical analysis and machine learning. In this paper, we propose a maximum likelihood (ML) approach to covariance estimation, which employs a novel sparsity constraint.
More specifically, the covariance is constrained to have an eigen decomposition which can be represented as a sparse matrix transform (SMT). The SMT is formed by a product of pairwise coordinate rotations known as Givens rotations. Using this framework, the covariance can be efficiently estimated using greedy minimization of the log likelihood function, and the number of Givens rotations can be efficiently computed using a cross-validation procedure. The resulting estimator is positive definite and well-conditioned even when the sample size is limited. Experiments on standard hyperspectral data sets show that the SMT covariance estimate is consistently more accurate than both traditional shrinkage estimates and recently proposed graphical lasso estimates for a variety of different classes and sample sizes.

1 Introduction

Many problems in statistical pattern recognition and analysis require the classification and analysis of high dimensional data vectors. However, covariance estimation for high dimensional vectors is a classically difficult problem because the number of coefficients in the covariance grows as the dimension squared [1, 2]. This problem, sometimes referred to as the curse of dimensionality [3], presents a classic dilemma in statistical pattern analysis and machine learning.

In a typical application, one measures n versions of a p dimensional vector. If n < p, then the sample covariance matrix will be singular with p - n eigenvalues equal to zero. Over the years, a variety of techniques have been proposed for computing a nonsingular estimate of the covariance; regularized and shrinkage covariance estimators [4, 5, 6] are examples of such techniques.

In this paper, we propose a new approach to covariance estimation, which is based on constrained maximum likelihood (ML) estimation of the covariance [7].
In particular, the covariance is constrained to have an eigen decomposition which can be represented as a sparse matrix transform (SMT) [8, 9]. The SMT is formed by a product of pairwise coordinate rotations known as Givens rotations [10]. Using this framework, the covariance can be efficiently estimated using greedy minimization of the log likelihood function, and the number of Givens rotations can be efficiently computed using a cross-validation procedure. The estimator obtained using this method is always positive definite and well-conditioned even when the sample size is limited.

In order to validate our model, we perform experiments using a standard set of hyperspectral data [11], and we compare against both traditional shrinkage estimates and recently proposed graphical lasso estimates [12] for a variety of different classes and sample sizes. Our experiments show that, for this example, the SMT covariance estimate is consistently more accurate. The SMT method also has a number of other advantages. It seems to be particularly good when estimating small eigenvalues and their associated eigenvectors. The cross-validation procedure used to estimate the SMT model order requires little additional computation, and the resulting eigen decomposition can be computed with very little computation (i.e., ≪ p^2 operations).

2 Covariance estimation for high dimensional vectors

In the general case, we observe a set of n vectors, y_1, y_2, ..., y_n, where each vector, y_i, is p dimensional. Without loss of generality, we assume y_i has zero mean.
We can represent this data as the following p × n matrix:

Y = [y_1, y_2, ..., y_n] .   (1)

If the vectors y_i are identically distributed, then the sample covariance is given by

S = (1/n) Y Y^t ,   (2)

and S is an unbiased estimate of the true covariance matrix, R = E[y_i y_i^t] = E[S]. While S is an unbiased estimate of R, it is also singular when n < p. This is a serious deficiency since, as the dimension p grows, the number of vectors needed to estimate R also grows. In practical applications, n may be much smaller than p, which means that most of the eigenvalues of R are erroneously estimated as zero.

A variety of methods have been proposed to regularize the estimate of R so that it is not singular. Shrinkage estimators are a widely used class of estimators which regularize the covariance matrix by shrinking it toward some target structure [4, 5, 13]. Shrinkage estimators generally have the form R̂ = α D + (1 - α) S, where D is some positive definite matrix. Some popular choices for D are the identity matrix (or a scaled version of it) [5, 13] and the diagonal entries of S, i.e., diag(S) [5, 14]. In both cases, the shrinkage intensity α can be estimated using cross-validation or bootstrap methods. Recently, a number of methods have been proposed for regularizing the estimate by making either the covariance or its inverse sparse [6, 12]. For example, the graphical lasso method enforces sparsity by imposing an L1 norm constraint on the inverse covariance [12]. Banding or thresholding can also be used to obtain a sparse estimate of the covariance [15].

2.1 Maximum likelihood covariance estimation

Our approach will be to compute a constrained maximum likelihood (ML) estimate of the covariance R, under the modeling assumption that the eigenvectors of R may be represented as a sparse matrix transform (SMT) [8, 9].
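The shrinkage estimator with target diag(S) described above can be sketched in a few lines of NumPy (a minimal illustration; the function name is ours, and the cross-validation search for α is omitted):

```python
import numpy as np

def shrinkage_covariance(Y, alpha):
    # Y: p x n matrix of zero-mean data vectors (one column per observation).
    p, n = Y.shape
    S = (Y @ Y.T) / n                      # sample covariance, eq. (2)
    D = np.diag(np.diag(S))                # shrinkage target diag(S)
    return alpha * D + (1.0 - alpha) * S   # R_hat = alpha*D + (1 - alpha)*S
```

For α in (0, 1], the estimate is positive definite whenever the diagonal of S is strictly positive, even when n < p and S itself is singular.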
To do this, we first decompose R as

R = E Λ E^t ,   (3)

where E is the orthonormal matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues. Then we will estimate the covariance by maximizing the likelihood of the data Y subject to the constraint that E is an SMT. By varying the order, K, of the SMT, we may then reduce or increase the regularizing constraint on the covariance.

If we assume that the columns of Y are independent and identically distributed Gaussian random vectors with mean zero and positive-definite covariance R, then the likelihood of Y given R is given by

p_R(Y) = (2π)^(-np/2) |R|^(-n/2) exp{ -(1/2) tr{Y^t R^(-1) Y} } .   (4)

The log-likelihood of Y is then given by [7]

log p_(E,Λ)(Y) = -(n/2) tr{diag(E^t S E) Λ^(-1)} - (n/2) log |Λ| - (np/2) log(2π) ,   (5)

where R = E Λ E^t is specified by the orthonormal eigenvector matrix E and diagonal eigenvalue matrix Λ. Jointly maximizing the likelihood with respect to E and Λ then results in the ML estimates

[Figure 1 flow diagrams appear here: (a) an 8-point FFT butterfly network; (b) the unstructured SMT butterfly network E_0, E_1, ..., E_{K-1}.]

Figure 1: (a) 8-point FFT. (b) The SMT implementation of ỹ = Ey.
The SMT can be viewed as a generalization of the FFT and of orthonormal wavelet transforms.

of E and Λ given by [7]

Ê = arg min_{E ∈ Ω} |diag(E^t S E)| ,   (6)

Λ̂ = diag(Ê^t S Ê) ,   (7)

where Ω is the set of allowed orthonormal transforms. So we may compute the ML estimate by first solving the constrained optimization of (6), and then computing the eigenvalue estimates from (7).

2.2 ML estimation of eigenvectors using SMT model

The ML estimate of E can be improved if the feasible set of eigenvector transforms, Ω, can be constrained to a subset of all possible orthonormal transforms. By constraining Ω, we effectively regularize the ML estimate by imposing a model. However, as with any model-based approach, the key is to select a feasible set, Ω, which is as small as possible while still accurately modeling the behavior of the data.

Our approach is to select Ω to be the set of all orthonormal transforms that can be represented as an SMT of order K [9]. More specifically, a matrix E is an SMT of order K if it can be written as a product of K sparse orthonormal matrices, so that

E = ∏_{k=0}^{K-1} E_k = E_0 E_1 ··· E_{K-1} ,   (8)

where every sparse matrix, E_k, is a Givens rotation operating on a pair of coordinate indices (i_k, j_k) [10]. Every Givens rotation E_k is an orthonormal rotation in the plane of the two coordinates, i_k and j_k, which has the form

E_k = I + Θ(i_k, j_k, θ_k) ,   (9)

where Θ(i_k, j_k, θ_k) is defined as

[Θ]_ij = { cos(θ_k) - 1 if i = j = i_k or i = j = j_k ; sin(θ_k) if i = i_k and j = j_k ; -sin(θ_k) if i = j_k and j = i_k ; 0 otherwise } .   (10)

Figure 1(b) shows the flow diagram for the application of an SMT to a data vector y.
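As a concrete illustration of (8)-(10), the following NumPy sketch builds a single Givens rotation E_k and applies an SMT product E = E_0 E_1 ··· E_{K-1} to a vector using only two multiplies per butterfly (the function names are ours; this is a minimal sketch, not the authors' implementation):

```python
import numpy as np

def givens_matrix(p, i, j, theta):
    # E_k = I + Theta(i_k, j_k, theta_k) from (9)-(10): identity except in the
    # (i, j) plane, where it acts as a rotation by theta.
    E = np.eye(p)
    c, s = np.cos(theta), np.sin(theta)
    E[i, i], E[j, j] = c, c
    E[i, j], E[j, i] = s, -s
    return E

def apply_smt(rotations, y):
    # y_tilde = E y with E = E_0 E_1 ... E_{K-1} (eq. (8)), so E_{K-1} acts first.
    # Each butterfly touches only two coordinates: O(K) work instead of O(p^2).
    y = np.asarray(y, dtype=float).copy()
    for (i, j, theta) in reversed(rotations):
        c, s = np.cos(theta), np.sin(theta)
        y[i], y[j] = c * y[i] + s * y[j], -s * y[i] + c * y[j]
    return y
```

Applying the inverse transform E^t y simply runs the same rotations in the opposite order with negated angles.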
Notice that each 2D rotation, E_k, plays a role analogous to a “butterfly” used in a traditional fast Fourier transform (FFT) [16] in Fig. 1(a). However, unlike an FFT, the organization of the butterflies in an SMT is unstructured, and each butterfly can have an arbitrary rotation angle θ_k. This more general structure allows an SMT to implement a larger set of orthonormal transformations. In fact, the SMT can be used to represent any orthonormal wavelet transform because, using the theory of paraunitary wavelets, orthonormal wavelets can be represented as a product of Givens rotations and delays [17]. More generally, when K = p(p-1)/2, the SMT can be used to exactly represent any p × p orthonormal transformation [7].

Using the SMT model constraint, the ML estimate of E is given by

Ê = arg min_{E = ∏_{k=0}^{K-1} E_k} |diag(E^t S E)| .   (11)

Unfortunately, evaluating the constrained ML estimate of (11) requires the solution of an optimization problem with a nonconvex constraint, so evaluation of the globally optimal solution is difficult. Therefore, our approach will be to use greedy minimization to compute a locally optimal solution to (11). The greedy minimization approach works by selecting each new butterfly E_k to minimize the cost, while fixing the previous butterflies, E_l for l < k.

This greedy optimization algorithm can be implemented with the following simple recursive procedure. We start by setting S_0 = S to be the sample covariance, and initialize k = 0. Then we apply the following two steps for k = 0 to K - 1:

E*_k = arg min_{E_k} |diag(E_k^t S_k E_k)| ,   (12)

S_{k+1} = E*_k^t S_k E*_k .   (13)

The resulting values of E*_k are the butterflies of the SMT.

The problem remains of how to compute the solution to (12).
In fact, this can be done quite easily by first determining the two coordinates, i_k and j_k, that are most correlated,

(i_k, j_k) ← arg min_{(i,j)} ( 1 - [S_k]_ij^2 / ([S_k]_ii [S_k]_jj) ) .   (14)

It can be shown that this coordinate pair, (i_k, j_k), can most reduce the cost in (12) among all possible coordinate pairs [7]. Once i_k and j_k are determined, we apply the Givens rotation E*_k that minimizes the cost in (12), which is given by

E*_k = I + Θ(i_k, j_k, θ_k) ,   (15)

where

θ_k = (1/2) atan(-2 [S_k]_{i_k j_k}, [S_k]_{i_k i_k} - [S_k]_{j_k j_k}) .   (16)

By iterating (12) and (13) K times, we obtain the constrained ML estimate of E given by

Ê = ∏_{k=0}^{K-1} E*_k .   (17)

The model order, K, can be determined by a simple cross-validation procedure. For example, we can partition the data into three subsets, and K is chosen to maximize the average likelihood of the left-out subsets given the covariance estimated from the other two subsets. Once K is determined, the proposed covariance estimator is re-computed using all the data and the estimated model order.

The SMT covariance estimator obtained as above has some interesting properties. First, it is positive definite even for limited sample size n < p. Also, it is permutation invariant; that is, the covariance estimator does not depend on the ordering of the data. Finally, the eigen decomposition E^t y can be computed very efficiently by applying the K sparse rotations in sequence.

2.3 SMT Shrinkage Estimator

In some cases, the accuracy of the SMT estimator can be improved by shrinking it towards the sample covariance. Let R̂_SMT represent the SMT covariance estimator.
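Putting steps (12)-(17) together, the greedy recursion can be sketched as follows (an unoptimized illustration using dense matrices; the names are ours, and both the fast implementation and the cross-validation choice of K are omitted):

```python
import numpy as np

def smt_covariance(S, K):
    # Greedy, locally optimal solution of (11): choose K Givens rotations.
    p = S.shape[0]
    Sk = S.copy()
    rotations = []
    for _ in range(K):
        # (14): select the most correlated pair (i_k, j_k), i.e. the pair
        # minimizing 1 - S_ij^2 / (S_ii S_jj).
        C = Sk**2 / np.outer(np.diag(Sk), np.diag(Sk))
        np.fill_diagonal(C, -np.inf)
        i, j = np.unravel_index(np.argmax(C), C.shape)
        # (16): rotation angle that decorrelates coordinates i and j.
        theta = 0.5 * np.arctan2(-2.0 * Sk[i, j], Sk[i, i] - Sk[j, j])
        c, s = np.cos(theta), np.sin(theta)
        E = np.eye(p)                      # (15): E_k = I + Theta(i, j, theta)
        E[i, i], E[j, j] = c, c
        E[i, j], E[j, i] = s, -s
        Sk = E.T @ Sk @ E                  # (13): S_{k+1} = E_k^t S_k E_k
        rotations.append((i, j, theta))
    # (7): eigenvalue estimates are the diagonal of the fully rotated covariance.
    return rotations, np.diag(Sk).copy()
```

Each rotation zeroes the selected off-diagonal entry, so the product of the diagonal entries (the cost in (12)) never increases; the estimated eigenvector matrix is the ordered product of the stored rotations.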
Then the SMT shrinkage estimate (SMT-S) can be obtained as

R̂_SMT-S = α R̂_SMT + (1 - α) S ,   (18)

where the parameter α can be computed using cross validation. Notice that

p_{R̂_SMT-S}(Y) = p_{Ê^t R̂_SMT-S Ê}(Ê^t Y) = p_{α Λ̂ + (1-α) Ê^t S Ê}(Ê^t Y) .   (19)

So cross validation can be efficiently implemented as in [5].

3 Experimental results

The effectiveness of SMT covariance estimation depends on how well the SMT model can capture the behavior of real data vectors. Therefore, in this section we compare the performance of the SMT covariance estimator to commonly used shrinkage and graphical lasso estimators. We do this comparison using hyperspectral remotely sensed data as our high dimensional data vectors.

The hyperspectral data we use is available with the recently published book [11]. Figure 2(a) shows a simulated color IR view of an airborne hyperspectral data flightline over the Washington DC Mall. The sensor system measured the pixel response in 191 effective bands in the 0.4 to 2.4 µm region of the visible and infrared spectrum. The data set contains 1208 scan lines with 307 pixels in each scan line. The image was made using bands 60, 27 and 17 for the red, green and blue colors, respectively. The data set also provides ground truth pixels for five classes designated as grass, water, roof, street, and tree. In Fig. 2(a), the ground-truth pixels of the grass class are outlined with a white rectangle. Figure 2(b) shows the spectra of the grass pixels, and Fig. 2(c) shows multivariate Gaussian vectors that were generated using the measured sample covariance for the grass class.

For each class, we computed the “true” covariance by using all the ground truth pixels to calculate the sample covariance.
The covariance is computed by first subtracting the sample mean vector for each class, and then computing the sample covariance of the zero mean vectors. The numbers of pixels for the ground-truth classes of grass, water, roof, street, and tree are 1928, 1224, 3579, 416, and 388, respectively. In each case, the number of ground truth pixels is much larger than 191, so the true covariance matrices are nonsingular and accurately represent the covariance of the hyperspectral data for that class.

3.1 Review of alternative estimators

A popular choice of the shrinkage target is the diagonal of S [5, 14]. In this case, the shrinkage estimator is given by

R̂ = α diag(S) + (1 - α) S .   (20)

We use an efficient implementation of the leave-one-out likelihood (LOOL) cross-validation method to choose α, as suggested in [5].

An alternative estimator is the graphical lasso (glasso) estimate recently proposed in [12], which is an L1-regularized maximum likelihood estimate, such that

R̂ = arg max_{R ∈ Ψ} { log p(Y | R) - ρ ‖R^(-1)‖_1 } ,   (21)

where Ψ denotes the set of p × p positive definite matrices and ρ the regularization parameter. We used the R code for glasso that is publicly available online. We found cross-validation estimation of ρ to be difficult, so in each case we manually selected the value of ρ to minimize the Kullback-Leibler distance to the known covariance.

3.2 Gaussian case

First, we compare how the different estimators perform when the data vectors are samples from an ideal multivariate Gaussian distribution. To do this, we first generated zero mean multivariate vectors with the true covariance for each of the five classes. Next we estimated the covariance using the four methods: the shrinkage estimator, glasso, SMT, and SMT shrinkage estimation.
In order to determine the effect of sample size, we also performed each experiment for sample sizes of n = 80, 40, and 20. Every experiment was repeated 10 times.

In order to get an aggregate assessment of the effectiveness of SMT covariance estimation, we compared the estimated covariance for each method to the true covariance using the Kullback-Leibler (KL) distance [7]. The KL distance is a measure of the error between the estimated and true distributions. Figures 3(a), (b), and (c) show plots of the KL distances as a function of sample size for the four estimators. The error bars indicate the standard deviation of the KL distance due to random variation in the sample statistics. Notice that the SMT shrinkage (SMT-S) estimator is consistently the best of the four.

[Figure 2 panels (a)-(c) appear here.]

Figure 2: (a) Simulated color IR view of an airborne hyperspectral data flightline over the Washington DC Mall [11]. (b) Ground-truth pixel spectra of the grass class, outlined with the white rectangle in (a). (c) Synthesized data spectra drawn from the Gaussian distribution.

Figure 4(a) shows the estimated eigenvalues for the grass class with n = 80. Notice that the eigenvalues of the SMT and SMT-S estimators are much closer to the true values than those of the shrinkage and glasso methods. The SMT estimators generate good estimates especially for the small eigenvalues.

Table 1 compares the computational complexity, CPU time, and model order for the four estimators. The CPU time and model order were measured for the Gaussian case of the grass class with n = 80. Notice that even with cross validation, the SMT and SMT-S estimators are much faster than glasso.
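For reference, the KL distance between two zero-mean Gaussian models, used as the error measure above, has a standard closed form, sketched below (the function name is ours, and the exact normalization used in [7] may differ):

```python
import numpy as np

def gaussian_kl(R_true, R_hat):
    # KL( N(0, R_true) || N(0, R_hat) )
    #   = 0.5 * ( tr(R_hat^{-1} R_true) - p + log det(R_hat) - log det(R_true) )
    p = R_true.shape[0]
    A = np.linalg.solve(R_hat, R_true)        # R_hat^{-1} R_true
    _, logdet_h = np.linalg.slogdet(R_hat)
    _, logdet_t = np.linalg.slogdet(R_true)
    return 0.5 * (np.trace(A) - p + logdet_h - logdet_t)
```

The distance is nonnegative and is zero exactly when the two covariances agree.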
This is because the SMT is a sparse operator. In this case, the SMT uses an average of K = 495 rotations, which is equal to K/p = 495/191 = 2.59 rotations (or, equivalently, multiplies) per spectral sample.

3.3 Non-Gaussian case

In practice, the sample vectors may not come from an ideal multivariate Gaussian distribution. In order to see the effect of non-Gaussian statistics on the accuracy of the covariance estimate, we performed a set of experiments which used random samples from the ground truth pixels as input. Since these samples come from the actual measured data, their distribution is not precisely Gaussian. Using these samples, we computed the covariance estimates for the five classes using the four different methods with sample sizes of n = 80, 40, and 20.

Plots of the KL distances for the non-Gaussian grass case¹ are shown in Figs. 3(d), (e), and (f); and Figure 4(b) shows the estimated eigenvalues for grass with n = 80. Note that the results are similar to those found for the ideal Gaussian case.

4 Conclusion

We have proposed a novel method for covariance estimation of high dimensional data. The new method is based on constrained maximum likelihood (ML) estimation in which the eigenvector transformation is constrained to be the composition of K Givens rotations. This model seems to capture the essential behavior of the data with a relatively small number of parameters. The constraint set is a K dimensional manifold in the space of orthonormal transforms, but since it is not a linear space, the resulting ML estimation optimization problem does not yield a closed form global optimum. However, we show that a recursive local optimization procedure is simple, intuitive, and yields good results.

We also demonstrate that the proposed SMT covariance estimation methods substantially reduce the error in the covariance estimate as compared to current state-of-the-art estimates for a standard hyperspectral data set.
The MATLAB code for SMT covariance estimation is available at: https://engineering.purdue.edu/~bouman/publications/pub_smt.html.

¹In fact, these are the KL distances between the estimated covariance and the sample covariance computed from the full set of training data, under the assumption of a multivariate Gaussian distribution.

[Figure 3 panels appear here: KL distance versus sample size for the shrinkage, glasso, SMT, and SMT-S estimators, with panels (a) Grass, (b) Water, (c) Street for the Gaussian case and (d) Grass, (e) Water, (f) Street for the non-Gaussian case.]
Figure 3: Kullback-Leibler distance from the true distribution versus sample size for various classes: (a) (b) (c) Gaussian case; (d) (e) (f) non-Gaussian case.

[Figure 4 panels appear here: true and estimated eigenvalues versus index for the shrinkage, glasso, SMT, and SMT-S estimators.]

Figure 4: The distribution of estimated eigenvalues for the grass class with n = 80: (a) Gaussian case; (b) non-Gaussian case.

                  Complexity (without cross-validation)   CPU time (seconds)                 Model order
Shrinkage Est.    p                                       8.6 (with cross-validation)        1
glasso            p^3 I                                   422.6 (without cross-validation)   4939
SMT               p^2 + Kp                                6.5 (with cross-validation)        495
SMT-S             p^2 + Kp                                7.2 (with cross-validation)        496

Table 1: Comparison of computational complexity, CPU time, and model order for various covariance estimators. The complexity is without cross validation and does not include the computation of the sample covariance (order np^2). The CPU time and model order were measured for the Gaussian case of the grass class with n = 80. I is the number of cycles used in glasso.

Acknowledgments

This work was supported by the National Science Foundation under Contract CCR-0431024. We would also like to thank James Theiler (J.T.) and Mark Bell for their insightful comments and suggestions.

References

[1] C. Stein, B. Efron, and C. Morris, “Improving the usual estimator of a normal covariance matrix,” Dept. of Statistics, Stanford University, Report 37, 1972.

[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. Boston, MA: Academic Press, 1990.

[3] A. K.
Jain, R. P. Duin, and J. Mao, “Statistical pattern recognition: A review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.

[4] J. H. Friedman, “Regularized discriminant analysis,” Journal of the American Statistical Association, vol. 84, no. 405, pp. 165-175, 1989.

[5] J. P. Hoffbeck and D. A. Landgrebe, “Covariance matrix estimation and classification with limited training data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 763-767, 1996.

[6] P. J. Bickel and E. Levina, “Regularized estimation of large covariance matrices,” Annals of Statistics, vol. 36, no. 1, pp. 199-227, 2008.

[7] G. Cao and C. A. Bouman, “Covariance estimation for high dimensional data vectors using the sparse matrix transform,” Purdue University, Technical Report ECE 08-05, 2008.

[8] G. Cao, C. A. Bouman, and K. J. Webb, “Fast reconstruction algorithms for optical tomography using sparse matrix representations,” in Proceedings of 2007 IEEE International Symposium on Biomedical Imaging, April 2007.

[9] G. Cao, C. A. Bouman, and K. J. Webb, “Non-iterative MAP reconstruction using sparse matrix representations,” submitted to IEEE Trans. on Image Processing.

[10] W. Givens, “Computation of plane unitary rotations transforming a general matrix to triangular form,” Journal of the Society for Industrial and Applied Mathematics, vol. 6, no. 1, pp. 26-50, March 1958.

[11] D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. New York: Wiley-Interscience, 2005.

[12] J. Friedman, T. Hastie, and R. Tibshirani, “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, vol. 9, no. 3, pp. 432-441, Jul. 2008.

[13] M. J. Daniels and R. E.
Kass, “Shrinkage estimators for covariance matrices,” Biometrics, vol. 57, no. 4, pp. 1173-1184, 2001.

[14] J. Schafer and K. Strimmer, “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics,” Statistical Applications in Genetics and Molecular Biology, vol. 4, no. 1, 2005.

[15] P. J. Bickel and E. Levina, “Covariance regularization by thresholding,” Department of Statistics, UC Berkeley, Technical Report 744, 2007.

[16] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol. 19, no. 90, pp. 297-301, April 1965.

[17] A. Soman and P. Vaidyanathan, “Paraunitary filter banks and wavelet packets,” in Proceedings of ICASSP-92, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 397-400, March 1992.", "award": [], "sourceid": 569, "authors": [{"given_name": "Guangzhi", "family_name": "Cao", "institution": null}, {"given_name": "Charles", "family_name": "Bouman", "institution": null}]}