{"title": "Robust Principal Component Analysis with Adaptive Neighbors", "book": "Advances in Neural Information Processing Systems", "page_first": 6961, "page_last": 6969, "abstract": "Suppose certain data points are overly contaminated, then the existing principal component analysis (PCA) methods are frequently incapable of filtering out and eliminating the excessively polluted ones, which potentially lead to the functional degeneration of the corresponding models. To tackle the issue, we propose a general framework namely robust weight learning with adaptive neighbors (RWL-AN), via which adaptive weight vector is automatically obtained with both robustness and sparse neighbors. More significantly, the degree of the sparsity is steerable such that only exact k well-fitting samples with least reconstruction errors are activated during the optimization, while the residual samples, i.e., the extreme noised ones are eliminated for the global robustness. Additionally, the framework is further applied to PCA problem to demonstrate the superiority and effectiveness of the proposed RWL-AN model.", "full_text": "Robust Principal Component Analysis with Adaptive\n\nNeighbors\n\nRui Zhang\n\nArizona State University\n\nTempe, AZ, U.S.A.\n\nruizhang8633@gmail.com\n\nHanghang Tong\u2217\n\nUniversity of Illinois at Urbana-Champaign\n\nUrbana, Illinois, U.S.A.\n\nhtong@illinois.edu\n\nAbstract\n\nSuppose certain data points are overly contaminated, then the existing principal\ncomponent analysis (PCA) methods are frequently incapable of \ufb01ltering out and\neliminating the excessively polluted ones, which potentially lead to the functional\ndegeneration of the corresponding models. To tackle the issue, we propose a general\nframework namely robust weight learning with adaptive neighbors (RWL-AN), via\nwhich adaptive weight vector is automatically obtained with both robustness and\nsparse neighbors. 
More significantly, the degree of sparsity is steerable such that exactly k well-fitting samples with the least reconstruction errors are activated during optimization, while the residual samples, i.e., the extremely noised ones, are eliminated for global robustness. Additionally, the framework is further applied to the PCA problem to demonstrate the superiority and effectiveness of the proposed RWL-AN model.\n\n1 Introduction\n\nFor high-quality data reconstruction, principal component analysis (PCA) [16, 4, 7] has been widely investigated. To deal with high-dimensional data, conventional PCA methods usually include a data preprocessing step, i.e., vectorization of each data point. Nonetheless, vectorizing the data points can easily incur the curse of dimensionality. Therefore, two-dimensional reconstruction has been introduced in the field of image analysis. In sum, equipped with PCA methods [17, 18, 19], the statistical properties of the input data can be retained in the obtained subspace.\nIn reality, the presence of outliers in the data largely reduces the performance of PCA approaches. Existing reconstruction methods usually promote robustness by exploiting robust norms as their loss functions [10], e.g., the L1-norm and the non-squared F-norm. More specifically, L1-norm based approaches [5, 14, 9] are developed to alleviate the negative effects of local outliers. For instance, Li et al. [5] proposed the L1-norm based 2DPCA (2DPCA-L1), which optimizes multiple projection directions sequentially. L1-norm based methods approximate the related optimization problem and therefore often lead to a greedy strategy, which potentially gets stuck in heuristic solutions and incurs a large computational cost. Luo et al. [6, 15] proposed a non-greedy algorithm for an approximate solution to the L1-norm based maximization problem. 
Moreover, non-squared F-norm based methods [10] are developed, where the sum of non-squared F-norm reconstruction errors is minimized. Zhang et al. [20] optimized the robust non-squared F-norm based objective by virtue of a dual problem, where a transitional weight is assigned to each term of the objective.\nHowever, the aforementioned robust approaches have several limitations. Firstly, all of them depend on particular types of loss functions, which remain potentially sensitive to outliers. For instance, L1-norm based methods are usually utilized to handle occluded data with local outliers, while non-squared F-norm based approaches are effective for data with global noise. Secondly, when certain samples are excessively polluted, weakly robust methods may be incapable of preventing the degeneration of the reconstruction. Zhang et al. [20] addressed this problem by learning a sparse weight via a capped model [8, 2], where a threshold is pre-given to eliminate the terms with larger reconstruction errors. Consequently, the performance is sensitive to the choice of threshold; nonetheless, it is strenuous to search for the optimal threshold, and the search is frequently inaccurate. Accordingly, the performance of the existing reconstruction methods is unsatisfactory.\nIn this paper, we propose a general framework named RWL-AN for learning an adaptive weight vector with both robustness and sparse neighbors. RWL-AN can be further applied to a spectrum of subspace learning approaches via the adaptive-weight strategy. Specifically, RWL-AN automatically assigns a smaller weight to a term with a larger reconstruction error, which reduces the negative effect of local outliers. \n\n\u2217Hanghang Tong is the Corresponding Author\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\f
Besides the local robustness, the weight vector is sparse to prevent excessively noised terms from degrading the performance of the model. In other words, the degree of sparsity is steerable such that only a specified k samples with the least reconstruction errors are active, which eliminates the extremely noised data points for global robustness. By applying the proposed RWL-AN framework to the PCA problem, the superiority and effectiveness of the proposed method are demonstrated both theoretically and empirically.\nNotations: In this paper, all matrices are written in uppercase. For a matrix M, the ij-th element of M is denoted by m_ij. The trace of M is denoted by Tr(M). The l2-norm of a vector v is denoted by ||v||_2. M^T denotes the transpose of M. The Frobenius norm of M is denoted by ||M||_F. M^\\perp denotes the orthogonal complement space of M.\n\n2 Robust Principal Component Analysis Revisited\n\nGiven a dataset X = {x_1, x_2, ..., x_N}, x_i \\in R^d represents the i-th sample, and X \\in R^{d x N} denotes the associated matrix of the dataset. To obtain the optimal mean automatically instead of directly centering the data, Nie et al. [10] proposed a robust PCA model from the perspective of low-rank approximation, i.e., minimizing the reconstruction errors with the optimal mean as\n\nmin_{m, rank(Z)=k} \\sum_{i=1}^N ||x_i - m - z_i||_2,   (1)\n\nwhere the variable m \\in R^d serves as the optimal mean in Eq. (1), and Z = [z_1, ..., z_N] \\in R^{d x N} represents the low-rank approximation of X upon the orthogonal subspace W \\in R^{d x m}. Via the rank factorization of z_i on the subspace W, we have z_i = W(v_i)^T, where v_i \\in R^{1 x m}. Therefore, problem (1) can be reformulated into\n\nmin_{m, v_i, W^T W = I} \\sum_{i=1}^N ||x_i - m - W(v_i)^T||_2,   (2)\n\nwhose third term within the l2-norm is the low-rank reconstructed data. 
Accordingly, the solution of v_i can be obtained from the Karush-Kuhn-Tucker (KKT) condition of problem (2) with respect to (w.r.t.) v_i as\n\n\\partial (\\sum_{i=1}^N ||x_i - m - W(v_i)^T||_2) / \\partial v_i = 0  =>  v_i = (x_i - m)^T W.\n\nTherefore, problem (2) can be addressed by solving the following dual problem:\n\nmin_{m, W^T W = I} \\sum_{i=1}^N p_i ||(I - W W^T)(x_i - m)||_2^2,   (3)\n\nwhere p_i <- 1 / (2 ||(I - W W^T)(x_i - m)||_2) serves as a transitional weight to be iteratively updated. In other words, a smaller weight is automatically assigned to a term with larger outliers, and vice versa, for robustness.\nMotivated by problem (2), Zhang et al. [20] extended it to a 2D version to enhance the robustness of 2DPCA. Denote an image dataset A = {A_1, A_2, ..., A_N}, where A_i \\in R^{u_1 x u_2} represents the i-th image matrix. The robust 2DPCA method is formulated as\n\nmin_{M, B_i, U_1^T U_1 = I, U_2^T U_2 = I} \\sum_{i=1}^N ||A_i - M - U_1 B_i U_2^T||_F,   (4)\n\nwhere U_1 \\in R^{u_1 x d_1} and U_2 \\in R^{u_2 x d_2} are the left and right orthogonal subspaces for dimensionality reduction, respectively, B_i \\in R^{d_1 x d_2} denotes a low-dimensional representation of A_i, and M \\in R^{u_1 x u_2} serves as the optimal mean of the input data. 
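The closed-form coefficient v_i = (x_i - m)^T W from the KKT condition above is a standard least-squares projection and can be spot-checked numerically. A minimal sketch on synthetic data (all sizes and names are illustrative, not from the paper):

```python
import numpy as np

# For a fixed orthonormal W (W^T W = I) and mean m, the coefficient
# v = (x - m)^T W minimizes ||x - m - W v^T||_2, as derived from the
# KKT condition of problem (2).
rng = np.random.default_rng(0)
d, m_dim = 8, 3
W, _ = np.linalg.qr(rng.standard_normal((d, m_dim)))  # orthonormal subspace
x = rng.standard_normal(d)
m = rng.standard_normal(d)

v_star = (x - m) @ W                       # closed-form v_i = (x_i - m)^T W
err_star = np.linalg.norm(x - m - W @ v_star)

# Any perturbed coefficient yields a reconstruction error at least as large.
v_other = v_star + rng.standard_normal(m_dim)
err_other = np.linalg.norm(x - m - W @ v_other)
assert err_star <= err_other + 1e-12
```

The residual x - m - W v_star^T is orthogonal to the columns of W, which is exactly why the remaining objective in problem (3) involves only the projector I - W W^T.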
Since B_i is free from any constraint, problem (4) can be rewritten as\n\nmin_{M, U_1^T U_1 = I, U_2^T U_2 = I} \\sum_{i=1}^N ||A_i - M - U_1 U_1^T (A_i - M) U_2 U_2^T||_F.   (5)\n\n3 Framework of Robust Weight Learning with Adaptive Neighbors\n\nThe robust PCA methods mentioned above frequently pursue robustness and reduce the impact of outliers by developing different metrics, which leads to various limitations. Although sparsity can also be obtained via a capped model, the performance of such models is often sensitive to the preset threshold, which is difficult to determine.\nIn this paper, a framework for adaptive weight learning is developed that applies to various reconstruction approaches. The adaptive weight vector achieved by the proposed framework has 1) robustness, i.e., a term with a larger reconstruction error is assigned a smaller weight to prevent outliers from dominating the model; and 2) sparsity, i.e., images with excessive noise are eliminated to prevent ill samples from decreasing the performance. Accordingly, the proposed framework for Robust Weight Learning with Adaptive Neighbors (RWL-AN) is formulated as\n\nmin_{p >= 0, p^T 1 = 1} \\sum_{i=1}^N p_i g(x_i) + \\gamma p_i^2,   (6)\n\nwhere g(x_i) \\in R_+ denotes the reconstruction function under the i-th data point x_i, with trade-off parameter \\gamma. p = [p_1, p_2, ..., p_N]^T is the weight vector, where p_i is the weight assigned to the i-th reconstruction term. The first term in Eq. (6) indicates that a sample with a large reconstruction error should be assigned a small weight, while the second term is a regularization to avoid trivial solutions and over-fitting. 
It is worth mentioning that an efficient technique is further applied to solving problem (6), such that the weight vector p has k adaptive neighbors (nonzero entries), i.e., only the k best-fitting samples are activated.\nIn particular, the following derivation yields the closed-form solution to problem (6). Denote g(x_i) by g_i; then problem (6) is equivalent to\n\nmin_{p >= 0, p^T 1 = 1} \\sum_{i=1}^N (1/2) (p_i + g_i / (2\\gamma))^2.   (7)\n\nDenote g = [g_1, g_2, ..., g_N]^T; then problem (7) can be further rewritten as\n\nmin_{p >= 0, p^T 1 = 1} (1/2) ||p + g / (2\\gamma)||_2^2,   (8)\n\nwhere 0 = [0, 0, ..., 0]^T \\in R^N and 1 = [1, 1, ..., 1]^T \\in R^N. Due to the l1-ball constraints p >= 0 and p^T 1 = 1, the Lagrangian function is\n\nL(p, \\lambda, \\sigma) = (1/2) ||p + g / (2\\gamma)||_2^2 - \\lambda (p^T 1 - 1) - \\sigma^T p,   (9)\n\nwhere \\lambda \\in R and \\sigma \\in R^N, \\sigma >= 0, are the Lagrangian multipliers. According to the KKT conditions, the optimal solution to problem (8) satisfies\n\n\\partial L(p, \\lambda, \\sigma) / \\partial p = 0  =>  p_i + g_i / (2\\gamma) - \\lambda - \\sigma_i = 0;  p_i >= 0;  \\sigma_i >= 0;  p_i \\sigma_i = 0.   (10)\n\nFrom the KKT conditions in (10), p_i (i = 1, 2, ..., N) can be summarized as\n\np_i = (\\lambda - g_i / (2\\gamma))_+,   (11)\n\nwhere the operator (.)_+ = max(., 0). According to Eq. (11), p_i is non-negative and decreasing in g_i.\nFurthermore, we attempt to determine \\lambda and \\gamma in Eq. (11). 
Without loss of generality, assume g_1 <= g_2 <= ... <= g_N, and thus p_1 >= p_2 >= ... >= p_N >= 0 based on the decreasing relationship between p_i and g_i in Eq. (11). When only k neighbors of p are retained, we have\n\np_k > 0  =>  \\lambda - g_k / (2\\gamma) > 0;    p_{k+1} = 0  =>  \\lambda - g_{k+1} / (2\\gamma) <= 0.   (12)\n\nBy combining Eq. (12) with the constraint p^T 1 = 1, we have\n\n\\sum_{i=1}^k (\\lambda - g_i / (2\\gamma)) = 1  =>  \\lambda = 1/k + (1 / (2\\gamma k)) \\sum_{i=1}^k g_i.   (13)\n\nBased on the constraints in Eq. (12) and the result in Eq. (13), the following inequality w.r.t. \\gamma can be inferred:\n\n1/k > g_k / (2\\gamma) - (1 / (2\\gamma k)) \\sum_{i=1}^k g_i  and  1/k <= g_{k+1} / (2\\gamma) - (1 / (2\\gamma k)) \\sum_{i=1}^k g_i\n=>  (k/2) g_k - (1/2) \\sum_{i=1}^k g_i < \\gamma <= (k/2) g_{k+1} - (1/2) \\sum_{i=1}^k g_i.   (14)\n\nTo achieve exactly k nonzero weights, the upper bound \\gamma = (k/2) g_{k+1} - (1/2) \\sum_{j=1}^k g_j is selected. With \\lambda and \\gamma from Eqs. (13) and (14) respectively, p_i in (11) can eventually be formulated as\n\np_i = (\\lambda - g_i / (2\\gamma))_+ = (1/k + (1 / (2\\gamma k)) \\sum_{j=1}^k g_j - g_i / (2\\gamma))_+ = ((2\\gamma + \\sum_{j=1}^k g_j - k g_i) / (2\\gamma k))_+ = ((g_{k+1} - g_i) / (k g_{k+1} - \\sum_{j=1}^k g_j))_+.   (15)\n\nFrom Eq. 
(15) regarding the weight p_i, we can observe that: 1) p_i is non-negative and decreasing in g_i, which ensures the local robustness of reconstruction problem (6), i.e., a term with a larger reconstruction error is assigned a smaller weight; 2) if i > k, then p_i = 0, which ensures the sparsity of p in problem (6), such that only the k terms with the smallest reconstruction errors are activated; 3) k is a steerable integer parameter that directly controls the number of activated samples, which provides global robustness to outliers. According to Eq. (15), Algorithm 1 is developed for solving the proposed RWL-AN framework in (6).\n\nAlgorithm 1: Algorithm for solving RWL-AN in (6)\nInput: a vector g = [g_1, g_2, ..., g_N]^T that preserves the reconstruction errors of the samples; the integer parameter k (k < N) that controls the number of activated samples.\nOutput: a weight vector p = [p_1, p_2, ..., p_N]^T assigned to the terms in objective (6).\n1 Sort g such that g_1 <= g_2 <= ... <= g_N;\n2 Calculate p_i = ((g_{k+1} - g_i) / (k g_{k+1} - \\sum_{j=1}^k g_j))_+, (i = 1, 2, ..., N).\n\n4 Robust PCA under RWL-AN\n\nEquipped with the RWL-AN framework in (6), we propose the robust PCA model under RWL-AN as\n\nmin_{m, v_i, p, W} \\sum_{i=1}^N p_i ||x_i - m - W(v_i)^T||_2^2 + \\gamma p_i^2\ns.t. p >= 0, p^T 1 = 1, W^T W = I,   (16)\n\nwhere W \\in R^{d x m} is the orthogonal subspace and v_i \\in R^{1 x m} denotes a low-dimensional representation of x_i. As in problem (2), the optimal solution v_i to problem (16) can be derived as v_i = (x_i - m)^T W. 
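Algorithm 1 admits a direct vectorized implementation. The sketch below (function name and the sample errors are ours, not from the paper) evaluates Eq. (15) after sorting the errors, then restores the original sample order:

```python
import numpy as np

def rwl_an_weights(g, k):
    """Adaptive weights of Eq. (15) / Algorithm 1 (illustrative sketch).

    g : array of N reconstruction errors g_i >= 0.
    k : number of activated samples, 1 <= k < N (so that g_{k+1} exists).
    Assumes the k+1 smallest errors are not all identical, so the
    denominator of Eq. (15) is strictly positive.
    """
    g = np.asarray(g, dtype=float)
    order = np.argsort(g)                # step 1: sort errors ascending
    gs = g[order]
    # denominator k * g_{k+1} - sum_{j=1}^k g_j (0-based: gs[k] is g_{k+1})
    denom = k * gs[k] - gs[:k].sum()
    p_sorted = np.maximum((gs[k] - gs) / denom, 0.0)   # the (.)_+ operator
    p = np.empty_like(p_sorted)
    p[order] = p_sorted                  # undo the sort
    return p

p = rwl_an_weights([3.0, 1.0, 100.0, 2.0, 4.0], k=3)
# exactly k entries of p are nonzero, they sum to one, and the extreme
# error (100.0) receives zero weight
```

By construction the k active weights sum to one, which is exactly the simplex constraint p^T 1 = 1 of problem (6).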
Specifically, the term ||x_i - m - W(v_i)^T||_2^2 exactly evaluates the reconstruction error of the i-th data point and thus satisfies the definition of g_i in framework (6). To solve problem (16), we utilize an alternating optimization strategy, i.e., the coordinate-block descent method [13].\nOptimize W & m with p fixed: When p is fixed, problem (16) degenerates to\n\nmin_{m, W^T W = I} \\sum_{i=1}^N p_i ||(I - W W^T)(x_i - m)||_2^2,   (17)\n\nwhere m serves as the mean variable.\nTheorem 1. The optimal mean m* in problem (17) satisfies the form\n\nm* = Xp = \\sum_{i=1}^N p_i x_i.   (18)\n\nProof. By taking the derivative of Eq. (17) w.r.t. m and setting it to zero, we have\n\n(I - W W^T)(m 1^T - X) diag(p) 1 = 0.\n\nNote that (m 1^T - X) diag(p) 1 = W\\xi + W^\\perp\\eta via the associated orthogonal decomposition; thus we have\n\nW\\xi - W\\xi + W^\\perp\\eta - 0 = 0  =>  \\eta = 0.\n\nDue to the constraint p^T 1 = 1 and diag(p) 1 = p, we further obtain\n\nm = Xp + W\\xi,   (19)\n\nwhere \\xi is an arbitrary vector. By substituting Eq. (19), problem (17) can be rewritten as\n\nmin_{W^T W = I} \\sum_{i=1}^N p_i ||(I - W W^T)(x_i - Xp)||_2^2,   (20)\n\nwhich is totally independent of \\xi. Therefore, we can select \\xi as the zero vector for convenience, such that the optimal mean m* is represented as in Eq. (18).\nAccording to Theorem 1, the optimal solution of m to problem (17) takes the form derived in (18). 
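Theorem 1 can be spot-checked numerically as a side exercise: the objective of problem (17) is convex in m, so no mean can beat m* = Xp, and shifting m* inside the column space of W leaves the objective unchanged. A sketch with hypothetical sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N, m_dim = 6, 30, 2
X = rng.standard_normal((d, N))
W, _ = np.linalg.qr(rng.standard_normal((d, m_dim)))   # W^T W = I
p = rng.random(N); p /= p.sum()                        # p >= 0, p^T 1 = 1

def objective(m):
    # sum_i p_i ||(I - W W^T)(x_i - m)||_2^2, the objective of problem (17)
    R = (np.eye(d) - W @ W.T) @ (X - m[:, None])
    return (p * (R ** 2).sum(axis=0)).sum()

m_star = X @ p                                          # Theorem 1: m* = X p
# any other mean cannot do better (the objective is convex in m) ...
assert objective(m_star) <= objective(rng.standard_normal(d)) + 1e-12
# ... and shifting m* inside the subspace W leaves the objective unchanged,
# matching the free vector xi in m = Xp + W xi
xi = rng.standard_normal(m_dim)
assert np.isclose(objective(m_star), objective(m_star + W @ xi))
```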
Therefore, problem (17) can be further reformulated into\n\nmin_{W^T W = I} Tr(diag(p) (X - X diag(p) 1 1^T)^T (I - W W^T) (X - X diag(p) 1 1^T))\n=> max_{W^T W = I} Tr(W^T X (I - diag(p) 1 1^T) diag(p) (I - 1 1^T diag(p)) X^T W) = max_{W^T W = I} Tr(W^T X D X^T W),   (21)\n\nwhere D = diag(p) - p p^T. Hence, W is the eigenvector matrix corresponding to the m largest eigenvalues of X D X^T [18].\nOptimize p with W & m fixed: Denote r_i = ||(I - W W^T)(x_i - m)||_2^2; then problem (16) can be rewritten as\n\nmin_{p >= 0, p^T 1 = 1} \\sum_{i=1}^N p_i r_i + \\gamma p_i^2.   (22)\n\nAs with problem (6), problem (22) can be solved with the closed-form solution in Eq. (15), where g_i (i = 1, 2, ..., N) is replaced by r_i (i = 1, 2, ..., N), and k is the integer parameter that determines the number of nonzero weights in p. Similarly, the i-th weight p_i decreases with the associated reconstruction error r_i to promote local robustness. In addition, for the i-th term with i >= k + 1, the related weight vanishes, such that the excessive outliers, which might potentially sabotage the model, are entirely excluded. In other words, the sparsity promotes the global robustness of reconstruction problem (16). According to Eqs. (18), (21), and (22), an efficient algorithm is summarized in Algorithm 2 for solving problem (16). Since the coordinate-block descent method is utilized and achieves closed-form solutions w.r.t. 
W, m, and p, Algorithm 2 converges monotonically.\n\nAlgorithm 2: Algorithm for solving robust problem (16)\nInput: a data matrix X = [x_1, x_2, ..., x_N]; the number of effective samples k.\nOutput: orthogonal subspace W \\in R^{d x m}.\n1 Initialize random p satisfying p^T 1 = 1;\n2 while not converged do\n3   Update D <- diag(p) - p p^T;\n4   Update W <- arg max_{W^T W = I} Tr(W^T X D X^T W);\n5   Update r_i <- ||(I - W W^T)(x_i - Xp)||_2^2, (i = 1, 2, ..., N);\n6   Update {p_i}_{i=1}^N by Algorithm 1 with input {r_i}_{i=1}^N;\n7 end\n\n5 Experiment\n\nDiverse experiments are conducted to evaluate the performance of our method. Firstly, the experimental settings are provided; then, the experimental results on different tasks are reported.\n\n5.1 Experimental Settings\n\nThe proposed robust PCA with RWL-AN is compared to reconstruction methods including conventional PCA (denoted by PCA) [4], robust PCA with optimal mean (denoted by RPCA-OM) [10], generalized low-rank approximations of matrices (denoted by GLRAM) [18], robust 2DPCA with optimal mean (denoted by R2DPCA) [20], and capped robust 2DPCA with optimal mean (denoted by capped R2DPCA) [20]. The integer parameter k of our method is set to [0.85N] (N is the total number of data points), such that 85% of the samples are assigned non-zero weights. For capped R2DPCA, the capped threshold is searched over the grid {10, 20, ..., 50} and the best results are recorded.\nFour benchmark face image datasets, AT&T [1], UMIST [3], FEI, and FERET [12], are utilized in the experiments. Table 1 reports the information of the benchmark datasets. In each dataset, occlusions of random size (over 25% of the area) are placed on part of the images (number of noised samples = total number of samples x noise rate). 
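The alternating procedure of Algorithm 2 can be sketched end-to-end in a few lines of NumPy. The helper reproduces the weight update of Algorithm 1; all function names and default values are illustrative, not taken from the paper's MATLAB implementation:

```python
import numpy as np

def rwl_an_weights(r, k):
    """Weight update of Algorithm 1 / Eq. (15) for errors r (requires k < N)."""
    order = np.argsort(r)
    rs = np.asarray(r, dtype=float)[order]
    p_sorted = np.maximum((rs[k] - rs) / (k * rs[k] - rs[:k].sum()), 0.0)
    p = np.empty_like(p_sorted)
    p[order] = p_sorted
    return p

def robust_pca_rwl_an(X, m_dim, k, n_iter=50, seed=0):
    """Alternating optimization of problem (16), following Algorithm 2."""
    d, N = X.shape
    rng = np.random.default_rng(seed)
    p = rng.random(N); p /= p.sum()            # random init with p^T 1 = 1
    for _ in range(n_iter):
        D = np.diag(p) - np.outer(p, p)        # D = diag(p) - p p^T
        # W <- top-m eigenvectors of X D X^T (symmetric, so eigh applies)
        evals, evecs = np.linalg.eigh(X @ D @ X.T)
        W = evecs[:, np.argsort(evals)[::-1][:m_dim]]
        mean = X @ p                           # optimal mean m* = X p (Theorem 1)
        R = X - mean[:, None]
        R -= W @ (W.T @ R)                     # residual (I - W W^T)(x_i - m)
        r = (R ** 2).sum(axis=0)               # reconstruction errors r_i
        p = rwl_an_weights(r, k)               # sparse adaptive weights
    return W, mean, p
```

On typical continuous data, the returned p has exactly k nonzero entries summing to one, so the N - k worst-fitting samples contribute nothing to the learned subspace.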
Note that all the experiments are run in MATLAB R2015b on a Windows 7 PC with a 3.20 GHz i5-3470 CPU and 16.0 GB main memory.\n\nTable 1: The information of the benchmark datasets\nDataset: AT&T | UMIST | FEI | FERET\nNo. of images: 400 | 575 | 2600 | 1400\nSize of images: 64 x 64 | 64 x 64 | 32 x 32 | 80 x 80\nClass: 40 | 20 | 200 | 200\n\nAll the methods are evaluated on two tasks, image reconstruction and clustering. For the reconstruction task, numerical results are recorded and compared. For the clustering task, we employ k-means. Moreover, we run each experiment 50 times with random initialization.\n\nTable 2: Reconstruction error comparison. The best is bolded and the runner-up is underlined.\nDataset (noise rate): ours | GLRAM | R2DPCA | capped R2DPCA | PCA | RPCA-OM\nAT&T (raw): 3.56 | 21.87 | 13.44 | 13.44 | 1197.25 | 5.27\nAT&T (0.2): 6.65 | 59.45 | 26.91 | 21.88 | 1291.10 | 19.19\nUMIST (raw): 4.31 | 23.45 | 16.16 | 16.16 | 674.67 | 6.43\nUMIST (0.2): 8.50 | 59.18 | 28.13 | 24.11 | 720.49 | 26.66\nFEI (raw): 1.35 | 21.26 | 4.29 | 4.29 | 533.63 | 1.99\nFEI (0.2): 2.19 | 24.81 | 7.45 | 6.52 | 505.60 | 10.30\nFERET (raw): 14.70 | 44.89 | 30.90 | 30.90 | 1661.45 | 19.21\nFERET (0.2): 23.26 | 99.61 | 51.32 | 33.20 | 1603.78 | 67.75\n\nFigure 1: Clustering accuracy of occluded images and their reconstructed images on (a) AT&T, (b) FEI, (c) FERET, and (d) UMIST. The x-axis represents the reduced dimensionality d_1 of subspace U_1, with the dimensionality d_2 of U_2 satisfying d_2 = d_1, while W has dimensionality d_1 x d_2.\n\n5.2 Comparison of Reconstruction Error\n\nThe reconstruction problem seeks the optimal subspace upon which low-rank images are reconstructed. The performance of the reconstruction methods is measured by \\sum_{i=1}^N p_i ||x_i^r - x_i^o||_2^2, where x_i^r represents the i-th reconstructed image and x_i^o is the original image. For a fair comparison, the weights are normalized. 
The reduced dimensionality for the 2D methods is d_1 = 9, d_2 = 10, and the 1D methods accordingly use the reduced dimensionality m = 90. Table 2 records the reconstruction error comparison. From Table 2, we conclude that:\n1) On the noised datasets, the proposed method achieves the best performance.\n2) On the raw datasets, RPCA-OM achieves the runner-up performance, while ours and R2DPCA outperform GLRAM and PCA. The results also illustrate the superiority of the optimal-mean based PCA methods.\n3) By applying RWL-AN, the reconstruction performance of PCA is largely improved, outperforming all the other competitors. Therefore, the effectiveness of the proposed RWL-AN framework is verified.\n\nFigure 2: Reconstruction errors of the proposed method on (a) AT&T and (b) FERET w.r.t. the varying parameter k = N x k_rate (N is the total number of samples).\n\nTable 3: CPU time comparison (seconds) when the iteration number is fixed at 50 for each algorithm.\nMethod: AT&T | UMIST | FEI | FERET\nOurs: 8.14 | 11.40 | 16.76 | 49.03\nGLRAM: 6.81 | 8.78 | 14.71 | 45.22\nR2DPCA: 8.55 | 12.55 | 17.49 | 49.69\nRPCA-OM: 848.46 | 925.73 | 266.58 | 2902.15\n\n4) When severe occlusions are involved in the datasets, robust methods such as our proposed method, R2DPCA, capped R2DPCA, and RPCA-OM have better 
performance than the conventional methods, GLRAM and PCA.\nFrom Figure 1, it can be seen that the robust methods, i.e., our proposed method, R2DPCA, and capped R2DPCA, are superior to GLRAM. Capped R2DPCA slightly outperforms R2DPCA, while the proposed method shows outstanding performance in most cases.\nTable 3 reports the CPU time of the compared algorithms except capped R2DPCA, which is time-consuming due to tuning an appropriate threshold. We conclude that the optimal-mean based methods, including ours, R2DPCA, and RPCA-OM, are slower than GLRAM due to the calculation of the optimal mean in each iteration. Besides that, the time consumption of ours and R2DPCA is similar. In fact, the computation of our weight in Eq. (15) is more complicated than that of R2DPCA; nonetheless, due to the sparse weight in the proposed method, ours often runs faster.\n\n5.3 Comparison of Clustering\n\nTo demonstrate the discriminative ability of the reconstruction algorithms, we further compare the clustering results on the reconstructed images via k-means, where the clustering accuracy [11] is computed as ACC = (1/N) \\sum_{i=1}^N \\delta(l_i, map(c_i)). Here, l_i denotes the real label of the i-th instance, and c_i is the corresponding clustering index. map(.) denotes a function that maps each cluster index to the best class label. \\delta(.,.) is the delta function, whose value is 1 when the two input parameters are the same and 0 otherwise. Figure 1 shows the clustering results on the reconstructed images of the different algorithms. Since twenty percent of the input images of each dataset are occluded by noise, the superior clustering performance of the proposed method implies its stronger robustness to outliers.\n\n5.4 Sensitivity Analysis w.r.t. Parameter k\n\nIn this part, experiments are conducted to investigate the sensitivity of our model (16) regarding the parameter k. 
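The clustering accuracy metric ACC of Section 5.3 can be sketched as follows; the brute-force search over label permutations for map(.) is our simplification (fine for small class counts, factorial cost otherwise):

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(labels, clusters):
    """ACC = (1/N) * sum_i delta(l_i, map(c_i)).

    Assumes true labels and cluster indices are both coded as 0..K-1;
    map(.) is found by exhaustive search over permutations (O(K!)).
    """
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    K = int(max(labels.max(), clusters.max())) + 1
    best = 0
    for perm in permutations(range(K)):      # candidate cluster -> class maps
        mapped = np.array(perm)[clusters]
        best = max(best, int((mapped == labels).sum()))
    return best / labels.size

acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])  # -> 1.0
```

The usage line shows why the mapping step matters: the cluster indices are a permutation of the true labels, so the best map scores a perfect 1.0 even though the raw index values disagree.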
We utilize two of the benchmark datasets, AT&T and FERET, with 20% of the samples contaminated as previously described. We vary the degree of sparsity by setting k_rate from 0.5 to 0.95, where the parameter k is calculated as N x k_rate. The corresponding reconstruction errors of the proposed method are shown in Figure 2.\n1) The curves in Figure 2 are steady when k_rate is less than 0.8. Afterwards, the curves increase rapidly, since the 20% polluted samples become included.\n2) Our model is insensitive to the parameter k_rate when k_rate <= 0.8, which is the pivotal point. Therefore, we can either determine k_rate by tuning or simply set it to a medium value such as 0.5.\n\n6 Conclusion\n\nIn this paper, a general framework entitled RWL-AN is proposed, such that an adaptive weight vector is obtained automatically with local robustness. In particular, the weight vector is sparse with adaptive neighbors, i.e., the degree of sparsity is steerable with only the k samples of least reconstruction errors activated. In other words, the sparsity is steerable to eliminate the excessively noised samples for global robustness. The framework is further applied to the PCA problem to achieve both local and global robustness. Finally, theoretical analysis and extensive experimental results are presented to validate the superiority of the proposed method.\n\nAcknowledgment\n\nThis work is supported by NSF (IIS-1651203, IIS-1715385), and DHS (2017-ST-061-QA0001).\n\nReferences\n\n[1] Parameterisation of a stochastic model for human face identification. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html, 1994.\n\n[2] Hongchang Gao, Feiping Nie, Weidong Cai, and Heng Huang. Robust capped norm nonnegative matrix factorization: Capped norm NMF. In ACM International Conference on Information and Knowledge Management, pages 871-880, 2015.\n\n[3] Daniel B Graham and Nigel M Allinson. 
Face recognition: From theory to applications. NATO ASI Series F, Computer and Systems Sciences, (163):446-456, 1998.\n\n[4] I. T. Jolliffe. Principal component analysis. 2002.\n\n[5] Xuelong Li, Yanwei Pang, and Yuan Yuan. L1-norm-based 2DPCA. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 40(4):1170-1175, 2010.\n\n[6] Minnan Luo, Feiping Nie, Xiaojun Chang, Yi Yang, Alexander Hauptmann, and Qinghua Zheng. Avoiding optimal mean robust PCA/2DPCA with non-greedy L1-norm maximization. In International Joint Conference on Artificial Intelligence, pages 1802-1808, 2016.\n\n[7] Feiping Nie, Heng Huang, Chris Ding, Dijun Luo, and Hua Wang. Robust principal component analysis with non-greedy L1-norm maximization. In International Joint Conference on Artificial Intelligence, pages 1433-1438, 2011.\n\n[8] Feiping Nie, Zhouyuan Huo, and Heng Huang. Joint capped norms minimization for robust matrix recovery. In Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 2557-2563, 2017.\n\n[9] Feiping Nie, Hua Wang, Cheng Deng, Xinbo Gao, Xuelong Li, and Heng Huang. New L1-norm relaxations and optimizations for graph clustering. In Thirtieth AAAI Conference on Artificial Intelligence, pages 1962-1968, 2016.\n\n[10] Feiping Nie, Jianjun Yuan, and Heng Huang. Optimal mean robust principal component analysis. In International Conference on Machine Learning, pages 1062-1070, 2014.\n\n[11] Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial optimization: algorithms and complexity. 1998.\n\n[12] I. Philips, H. Wechsler, J. Huang, and P. Rauss. The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing, (16):295-306, 1998.\n\n[13] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. 
Journal of Optimization Theory and Applications, 109(3):475-494, 2001.\n\n[14] Haixian Wang and Jing Wang. 2DPCA with L1-norm for simultaneously robust and sparse modelling. Neural Networks, 46:190, 2013.\n\n[15] Rong Wang, Feiping Nie, Xiaojun Yang, Feifei Gao, and Minli Yao. Robust 2DPCA with non-greedy L1-norm maximization for image analysis. IEEE Transactions on Cybernetics, 45(5):1108-1112, 2017.\n\n[16] Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1):37-52, 1987.\n\n[17] Jian Yang, David Zhang, Alejandro F. Frangi, and Jingyu Yang. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131-137, 2004.\n\n[18] Jieping Ye. Generalized low rank approximations of matrices. Machine Learning, 61(1-3):167-191, 2004.\n\n[19] Daoqiang Zhang and Zhi-Hua Zhou. (2D)2PCA: Two-directional two-dimensional PCA for efficient face representation and recognition. Neurocomputing, 69(1):224-231, 2005.\n\n[20] Rui Zhang, Feiping Nie, and Xuelong Li. Auto-weighted two-dimensional principal component analysis with robust outliers. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6065-6069, 2017.\n", "award": [], "sourceid": 3774, "authors": [{"given_name": "Rui", "family_name": "Zhang", "institution": "Arizona State University"}, {"given_name": "Hanghang", "family_name": "Tong", "institution": "University of Illinois at Urbana-Champaign"}]}