{"title": "Recovery of Sparse Probability Measures via Convex Programming", "book": "Advances in Neural Information Processing Systems", "page_first": 2420, "page_last": 2428, "abstract": "We consider the problem of cardinality-penalized optimization of a convex function over the probability simplex with additional convex constraints. It is well known that the classical L1 regularizer fails to promote sparsity on the probability simplex, since the L1 norm is trivially constant there. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which case our formulation becomes a linear program and hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second-order cone programs. The proposed method outperforms known heuristics based on the L1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well-known soft k-means algorithm.", "full_text": "Recovery of Sparse Probability Measures via Convex Programming

Mert Pilanci and Laurent El Ghaoui
Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, CA 94720
{mert,elghaoui}@eecs.berkeley.edu

Venkat Chandrasekaran
Department of Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA 91125
venkatc@caltech.edu

Abstract

We consider the problem of cardinality-penalized optimization of a convex function over the probability simplex with additional convex constraints.
The classical ℓ1 regularizer fails to promote sparsity on the probability simplex, since the ℓ1 norm is trivially constant there. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which case our formulation becomes a linear program and hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second-order cone programs. The proposed method outperforms known rescaling heuristics based on the ℓ1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well-known soft k-means algorithm.

1 Introduction

We consider optimization problems of the following form:

    p* = min_{x ∈ C, 1^T x = 1, x ≥ 0}  f(x) + λ card(x)

where f is a convex function, C is a convex set, card(x) denotes the number of nonzero elements of x, and λ ≥ 0 is a given tradeoff parameter adjusting the desired sparsity. Since the cardinality penalty is inherently combinatorial in nature, these problems are in general not solvable in polynomial time. In recent years ℓ1-norm penalization as a proxy for penalizing cardinality has attracted a great deal of attention in machine learning, statistics, engineering, and applied mathematics [1], [2], [3], [4]. However, the aforementioned sparse probability optimization problems are not amenable to the ℓ1 heuristic, since ||x||_1 = 1^T x = 1 is constant on the probability simplex.
Numerous problems in machine learning, statistics, finance, and signal processing fall into this category; however, to the authors' knowledge there is no known general convex optimization strategy for such problems constrained to the probability simplex. The aim of this paper is to show that the reciprocal of the infinity norm, i.e., 1/max_i x_i, can be used as a convex heuristic for penalizing cardinality on the probability simplex, and that the resulting relaxations can be solved via convex optimization. Figures 1(a) and 1(b) depict the level sets and an example of a sparse probability measure which has maximal infinity norm.

Figure 1: Probability simplex and the reciprocal of the infinity norm 1/max_i x_i. (a) Level sets of the regularization function 1/max_i x_i on the probability simplex. (b) The sparsest probability distribution on the set C is x* (green), which also minimizes 1/max_i x_i on the intersection (red).

In the following sections we expand our discussion by exploring two specific problems: recovering a measure from given moments, where f = 0 and C is affine, and convex clustering, where f is a log-likelihood and C is the whole space. For the former case we give a sufficient condition for this convex relaxation to exactly recover the minimal cardinality solution of p*.
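As a quick sanity check of the idea above, the following snippet (a toy illustration, not taken from the paper's experiments) verifies that the ℓ1 norm is uninformative on the simplex while 1/max_i x_i lower-bounds the cardinality:

```python
# Toy illustration: on the probability simplex the l1 norm is constant,
# so it cannot distinguish sparse from dense vectors, while 1/max_i x_i
# tracks the cardinality from below (and is tight for uniform supports).

def l1(x):
    return sum(abs(v) for v in x)

def card(x):
    return sum(1 for v in x if v != 0)

def recip_max(x):
    return 1.0 / max(x)

sparse = [0.5, 0.5, 0.0, 0.0, 0.0]   # cardinality 2
dense = [0.2] * 5                     # cardinality 5

for x in (sparse, dense):
    assert abs(l1(x) - 1.0) < 1e-12   # l1 is trivially 1 on the simplex
    assert recip_max(x) <= card(x)    # the lower bound 1/max_i x_i <= card(x)

print(recip_max(sparse), recip_max(dense))  # 2.0 5.0: smaller for sparser x
```

Here the bound is tight because both vectors are uniform on their support; in general 1/max_i x_i can be strictly smaller than card(x).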
We then present numerical simulations for both problems, which suggest that the proposed scheme offers a very efficient convex relaxation for penalizing cardinality on the probability simplex.

2 Optimizing over sparse probability measures

We begin the discussion by first taking an alternative approach to the cardinality-penalized optimization, directly lower-bounding the original hard problem using the following relation:

    ||x||_1 = Σ_{i=1}^n |x_i| ≤ card(x) max_i |x_i| ≤ card(x) ||x||_∞

which is essentially one of the core motivations for using the ℓ1 penalty as a proxy for cardinality. When constrained to the probability simplex, the lower bound for the cardinality simply becomes 1/max_i x_i ≤ card(x). Using this bound on the cardinality, we immediately have a lower bound on our original NP-hard problem, which we denote by p*_∞:

    p* ≥ p*_∞ := min_{x ∈ C, 1^T x = 1, x ≥ 0}  f(x) + λ / max_i x_i    (1)

The function 1/max_i x_i is concave, and hence the above lower-bounding problem is not a convex optimization problem. However, below we show that it can be solved exactly using convex programming.

Proposition 2.1. The lower-bounding problem defined by p*_∞ can be globally solved using the following n convex programs in n + 1 dimensions:

    p* ≥ p*_∞ = min_{i=1,...,n}  min_{x ∈ C, 1^T x = 1, x ≥ 0, t ≥ 0} { f(x) + t : x_i ≥ λ/t }.    (2)

Note that the constraint x_i ≥ λ/t is jointly convex, since 1/t is convex in t ∈ R_+, and it can be handled in most general-purpose convex optimizers, e.g.
cvx, using either the positive inverse function or rotated cone constraints.

Proof.

    p*_∞ = min_{x ∈ C, 1^T x = 1, x ≥ 0}  f(x) + min_i λ/x_i    (3)
         = min_i  min_{x ∈ C, 1^T x = 1, x ≥ 0}  f(x) + λ/x_i    (4)
         = min_i  min_{x ∈ C, 1^T x = 1, x ≥ 0, t ≥ 0}  f(x) + t  s.t.  λ/x_i ≤ t    (5)

The above formulation can be used to efficiently approximate the original cardinality-constrained problem by lower-bounding it, for arbitrary convex f and C. In the next section we show how to compute the quality of the approximation.

2.1 Computing a bound on the quality of approximation

By virtue of being a relaxation of the original cardinality problem, we have the following remarkable property. Let x̂ be an optimal solution to the convex program p*_∞; then we have the relation

    f(x̂) + λ card(x̂) ≥ p* ≥ p*_∞.    (6)

Since the left-hand side and the right-hand side of the above bound are readily available once p*_∞ defined in (2) is solved, we immediately have a bound on the quality of the relaxation. More specifically, the relaxation is exact, i.e., we find a solution of the original cardinality-penalized problem, if the following holds:

    f(x̂) + λ card(x̂) = p*_∞.

It should be noted that for general cardinality-penalized problems the ℓ1 heuristic does not yield such a quality bound, since it is neither a lower nor an upper bound in general. Moreover, most of the known equivalence conditions for ℓ1 heuristics, such as the Restricted Isometry Property and its variants, are NP-hard to check.
Therefore a remarkable property of the proposed scheme is that it comes with a simple computable bound on the quality of approximation.

3 Recovering a Sparse Measure

Suppose that μ is a discrete probability measure and we would like to find the sparsest measure satisfying some arbitrary moment constraints:

    p* = min_μ  card(μ)  :  E_μ[X_i] = b_i,  i = 1, ..., m

where the X_i are random variables and E_μ denotes expectation with respect to the measure μ. One motivation for the above problem is the fact that it upper-bounds the minimum entropy power problem:

    p* ≥ min_μ  exp H(μ)  :  E_μ[X_i] = b_i,  i = 1, ..., m

where H(μ) := −Σ_i μ_i log μ_i is the Shannon entropy. Both of the above problems are non-convex and in general very hard to solve.

When viewed as a finite-dimensional optimization problem, the minimum cardinality problem can be cast as a linear sparse recovery problem:

    p* = min_{1^T x = 1, x ≥ 0}  card(x)  :  Ax = b    (7)

As noted previously, applying the ℓ1 heuristic does not work, and it does not even yield a unique solution when the problem is underdetermined, since it simply solves a feasibility problem:

    p*_1 = min_{1^T x = 1, x ≥ 0}  ||x||_1  :  Ax = b    (8)
         = min_{1^T x = 1, x ≥ 0}  1  :  Ax = b    (9)

and recovers the true minimum cardinality solution if and only if the set {x : 1^T x = 1, x ≥ 0, Ax = b} is a singleton. This condition may hold in some cases, e.g. when the first 2k − 1 moments are available, i.e., A is a Vandermonde matrix, where k = card(x) [6]. However, in general this set is a polyhedron containing dense vectors.
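For this affine setting, the general relaxation (2) with f = 0 reduces to maximizing each coordinate x_i over the feasible polytope and keeping the sparsest of the n solutions. A small sketch of that procedure follows; the toy data (n = 8, a 2-sparse measure) and the use of scipy's LP solver are assumptions for illustration, not the paper's n = 50 experiment:

```python
# Sketch: for each index i, maximize x_i subject to Ax = b, 1'x = 1, x >= 0,
# then keep the sparsest of the n LP solutions (the selection scheme idea).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 8, 5
x_true = np.zeros(n)
x_true[[1, 4]] = [0.7, 0.3]          # a 2-sparse probability measure
A = rng.standard_normal((m, n))      # random Gaussian "moment" matrix
b = A @ x_true                       # noiseless measurements

A_eq = np.vstack([A, np.ones(n)])    # stack Ax = b with the simplex equality
b_eq = np.concatenate([b, [1.0]])

solutions = []
for i in range(n):
    c = np.zeros(n)
    c[i] = -1.0                      # linprog minimizes, so maximize x_i
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    if res.success:
        solutions.append(res.x)

card = lambda x: int(np.sum(x > 1e-8))
x_hat = min(solutions, key=card)     # pick the lowest-cardinality solution

print(card(x_hat), np.round(x_hat, 3))
```

Whether x_hat equals the true measure depends on the sufficient condition discussed in this section; the returned vector is always a feasible probability measure consistent with the moments.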
Below we show how the proposed scheme applies to this problem. Using the general form in (2), the proposed relaxation is given by the following:

    (p*)^{-1} ≤ (p*_∞)^{-1} = max_{i=1,...,n}  max_{1^T x = 1, x ≥ 0} { x_i : Ax = b },    (10)

which can be solved very efficiently by solving n linear programs in n variables. The total complexity is at most O(n^4) using a primal-dual LP solver.

It is easy to check that strong duality holds, and the dual problems are given by the following:

    (p*_∞)^{-1} = max_{i=1,...,n}  min_{w, λ} { w^T b + λ : A^T w + λ1 ≥ e_i },    (11)

where 1 is the all-ones vector and e_i is the vector of all zeros with a one in the i-th coordinate.

3.1 An alternative minimal cardinality selection scheme

When the desired criterion is to find a minimum-cardinality probability vector satisfying Ax = b, the following alternative selection scheme offers a further refinement, by picking the lowest-cardinality solution among the n linear programming solutions. Define

    x̂^(i) := arg max_{1^T x = 1, x ≥ 0} { x_i : Ax = b }    (12)
    x̂_min := arg min_{i=1,...,n}  card(x̂^(i))    (13)

The following theorem gives a sufficient condition for the recovery of a sparse measure using the above method.

Theorem 3.1. Assume that the solution to p* in (7) is unique and given by x*. If the following condition holds

    min_{1^T x = 1, y ≥ 0, 1^T y = 1}  x_i  s.t.  A_S x = A_{S^c} y   >  0

where b = Ax*, A_S is the submatrix containing the columns of A corresponding to the nonzero elements of x*, and A_{S^c} is the submatrix of the remaining columns, then the convex linear program

    max_{1^T x = 1, x ≥ 0} { x_i : Ax = b }

has a unique solution given by x*.

Let Conv(a_1, ..., a_m) denote the convex hull of the m vectors {a_1, ..., a_m}.
The following corollary gives a geometric condition for recovery.

Corollary 3.2. If Conv(A_{S^c}) does not intersect an extreme point of Conv(A_S), then x̂_min = x*, i.e., we recover the minimum cardinality solution using n linear programs.

Proof outline: Consider the k-th inner linear program defined in the problem p*_∞. Using the optimality conditions of the primal-dual linear program pair in (10) and (11), it can be shown that the existence of a pair (w, λ) satisfying

    A_S^T w + λ1 = e_k    (14)
    A_{S^c}^T w + λ1 > 0    (15)

implies that the support of the solution of the linear program is exactly equal to the support of x*, and in particular they have the same cardinality. Since the solution of p* is unique and has minimum cardinality, we conclude that x* is indeed the unique solution of the k-th linear program. Applying Farkas' lemma and duality theory, we arrive at the conditions of Theorem 3.1. The corollary follows by first observing that the condition of Theorem 3.1 is satisfied if Conv(A_{S^c}) does not intersect an extreme point of Conv(A_S). Finally, observe that if any of the n linear programs recovers the minimal cardinality solution, then x̂_min = x*, since card(x̂_min) ≤ card(x̂^(k)) for all k.

3.2 Noisy measure recovery

When the data contain noise and inaccuracies, as is the case when using empirical moments instead of exact moments, we propose the following noise-aware robust version, which follows from the general recipe given in the first section:

    min_{i=1,...,n}  min_{1^T x = 1, x ≥ 0, t ≥ 0} { ||Ax − b||_2^2 + t : x_i ≥ λ/t },    (16)

where λ ≥ 0 is a penalty parameter for encouraging sparsity.
The above problem can be solved using n second-order cone programs in n + 1 variables, and hence has O(n^4) worst-case complexity. The proposed measure recovery algorithms are investigated and compared with a known suboptimal heuristic in Section 6.

4 Convex Clustering

In this section we base our discussion on the exemplar-based convex clustering framework of [8]. Given a set of data points {z_1, ..., z_n} of d-dimensional vectors, the task of clustering is to fit a mixture probability model maximizing the log-likelihood function

    L := (1/n) Σ_{i=1}^n log [ Σ_{j=1}^k x_j f(z_i; m_j) ]

where f(z; m) is an exponential family distribution on Z with parameter m, and x is a k-dimensional vector on the probability simplex denoting the mixture weights. For the standard multivariate normal distribution we have f(z_i; m_j) = exp(−β ||z_i − m_j||_2^2) for some parameter β > 0. As in [8], we further assume that the mean parameter m_j is one of the examples z_i, which is unknown a priori. This assumption helps to simplify the log-likelihood, whose data dependence is now only through a kernel matrix K_ij := exp(−β ||z_i − z_j||_2^2), as follows:

    L = (1/n) Σ_{i=1}^n log [ Σ_{j=1}^k x_j exp(−β ||z_i − z_j||_2^2) ] = (1/n) Σ_{i=1}^n log [ Σ_{j=1}^k x_j K_ij ]    (17), (18)

Applying our convexification strategy to the cardinality-penalized clustering problem p*_c defined in (19), we arrive at an upper bound which can be computed via convex optimization:

    p*_c ≤ p*_∞ := max_{1^T x = 1, x ≥ 0}  Σ_{i=1}^n log [ Σ_{j=1}^k x_j K_ij ] − λ / max_i x_i    (21)

We investigate the above approach in a numerical example in Section 6 and compare it with the well-known soft k-means algorithm.

Partitioning the data {z_1, ..., z_n} into a few clusters is equivalent to having a sparse mixture x, i.e., each example is assigned to a few centers (which are some other examples).
Therefore, to cluster the data we propose to approximate the following cardinality-penalized problem:

    p*_c := max_{1^T x = 1, x ≥ 0}  Σ_{i=1}^n log [ Σ_{j=1}^k x_j K_ij ] − λ card(x)    (19)

As hinted previously, the above problem can be seen as a lower bound for the entropy-penalized problem

    p*_c ≤ max_{1^T x = 1, x ≥ 0}  Σ_{i=1}^n log [ Σ_{j=1}^k x_j K_ij ] − λ exp H(x)    (20)

where H(x) is the Shannon entropy of the mixture probability vector.

5 Algorithms

5.1 Exponentiated Gradient

Exponentiated gradient [7] is a proximal algorithm for optimizing over the probability simplex which employs the Kullback-Leibler divergence D(x, y) = Σ_i x_i log(x_i / y_i) between two probability distributions.
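A minimal numpy sketch of a KL-proximal multiplicative update of this kind is given below; the toy objective ψ(x) = −Σ_j w_j log x_j, the data w, the step size α, and the iteration count are all assumptions for illustration, standing in for the inner objectives of p*_∞:

```python
# Exponentiated-gradient sketch on the simplex: multiply by exponentiated
# negative gradients and renormalize. For psi(x) = -sum_j w_j log x_j the
# minimizer over the simplex is w / sum(w), which the iterates approach.
import numpy as np

w = np.array([3.0, 1.0, 1.0])

def grad_psi(x):
    # gradient of the toy convex objective psi(x) = -sum_j w_j log x_j
    return -w / x

x = np.full(3, 1.0 / 3.0)             # start at the uniform distribution
alpha = 0.1                           # assumed step size
for _ in range(200):
    r = np.exp(-alpha * grad_psi(x))  # exponentiated negative gradients
    x = r * x / np.sum(r * x)         # multiplicative update, renormalized

print(np.round(x, 4))                 # approaches w / sum(w) = [0.6, 0.2, 0.2]
```

The iterates stay strictly inside the simplex by construction, which is what makes this update attractive for the simplex-constrained problems considered here.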
For minimizing a convex function ψ, the exponentiated gradient updates are given by the following:

    x^{k+1} = arg min_x  ψ(x^k) + ∇ψ(x^k)^T (x − x^k) + (1/α) D(x, x^k)

When applied to the general form of (2), this yields the following updates for solving the i-th problem of p*_∞:

    x_j^{k+1} = r_j^k x_j^k / Σ_l r_l^k x_l^k

where the weights r_j^k are exponentiated negative gradients,

    r_j^k = exp( −α ( ∇_j f(x^k) − (λ / (x_i^k)^2) 1{j = i} ) ),

the term −λ/(x_i^k)^2 being the gradient of the penalty λ/x_i. We also note that the above updates can be performed in parallel for the n convex programs, and they are guaranteed to converge to the optimum.

6 Numerical Results

6.1 Recovering a Measure from Gaussian Measurements

Here we show that the proposed recovery scheme is able to recover a sparse measure exactly with overwhelming probability when the matrix A ∈ R^{m×n} is chosen from the independent Gaussian ensemble, i.e., A_ij ~ N(0, 1) i.i.d.

As an alternative method we consider a commonly employed simple heuristic for optimizing over a probability measure, which first drops the constraint 1^T x = 1, solves the corresponding ℓ1-penalized problem, and finally rescales the optimal x such that 1^T x = 1. In the worst case, this procedure recovers the true solution whenever minimizing the ℓ1 norm recovers the solution, i.e., when there is only one feasible vector satisfying Ax = b and x ≥ 0, 1^T x = 1. This is clearly a suboptimal approach, and we will refer to it as the rescaling heuristic. We set n = 50, randomly pick a probability vector x* which is k-sparse, let b = Ax* be m noiseless measurements, and then check the probability of recovery, i.e.,
x̂ = x*, where x̂ is the solution to

    max_{i=1,...,n}  max_{1^T x = 1, x ≥ 0} { x_i : Ax = b }.    (22)

Figure 2(a) shows the probability of exact recovery as a function of m, the number of measurements, over 100 independent realizations of A, for the proposed LP formulation and the rescaling heuristic. As can be seen in Figure 2(a), the proposed method recovers the correct measure with probability almost 1 when m ≥ 5. Quite interestingly, the rescaling heuristic does not succeed in recovering the true measure with high probability even for a cardinality-2 vector.

We then add normally distributed noise with standard deviation 0.1 to the observations and solve

    min_{i=1,...,n}  min_{1^T x = 1, x ≥ 0, t ≥ 0} { ||Ax − b||_2^2 + t : x_i ≥ λ/t }.    (23)

We compare the above approach with the corresponding rescaling heuristic, which first solves a non-negative Lasso,

    min_{x ≥ 0}  ||Ax − b||_2^2 + λ ||x||_1,    (24)

and then rescales x such that 1^T x = 1. For each realization of A and measurement noise we run both methods using a primal-dual interior point solver for 30 equally spaced values of λ ∈ [0, 10] and record the minimum error ||x̂ − x*||_1. The average error over 100 realizations is shown in Figure 2(b).
As can be seen in the figure, the proposed scheme clearly outperforms the rescaling heuristic, since it can utilize the fact that x is on the probability simplex without trivializing its complexity regularizer.

Figure 2: A comparison of the exact recovery probability in the noiseless setting (top) and the estimation error in the noisy setting (bottom) for the proposed approach and the rescaled ℓ1 heuristic. (a) Probability of exact recovery as a function of m, the number of measurements (moment constraints). (b) Average error for noisy recovery as a function of m.

6.2 Convex Clustering

We generate synthetic data using a Gaussian mixture of 10 components with identity covariances and cluster the data using the proposed method; the resulting clusters given by the mixture density are presented in Figure 3. The centers of the circles represent the means of the mixture components, and the radii are proportional to the respective mixture weights. We then repeat the clustering procedure using the well-known soft k-means algorithm and present the results in Figure 4.

As can be seen from the figures, the proposed convex relaxation is able to penalize the cardinality of the mixture probability vector and produce clusters significantly better than the soft k-means algorithm. Note that soft k-means is a non-convex procedure whose performance depends heavily on the initialization; the proposed approach is convex and hence insensitive to initialization. Note that in [8] the number of clusters is adjusted indirectly by varying the β parameter of the distribution. In contrast, our approach implicitly optimizes the likelihood/cardinality tradeoff by varying λ.
Hence, when the number of clusters is unknown, choosing a value of λ is usually easier than specifying a value of k for the k-means algorithms.

7 Conclusions and Future Directions

We presented a convex cardinality penalization scheme for problems constrained on the probability simplex. We then derived a sufficient condition for recovering the sparsest probability measure in an affine space using the proposed method. The geometric interpretation suggests that it holds for a large class of matrices. An open theoretical question is to analyze the probability of exact recovery for a normally distributed A. Another interesting direction is to extend the recovery analysis to the noisy setting and to arbitrary functions such as the log-likelihood in the clustering example. There might also be other problems where the proposed approach could be practically useful, such as portfolio optimization, where a sparse convex combination of assets is sought, or sparse multiple kernel learning.

Figure 3: Proposed convex clustering scheme. (a) λ = 1000, (b) λ = 300, (c) λ = 100, (d) λ = 45.

Figure 4: Soft k-means algorithm. (a) k = 3, (b) k = 4, (c) k = 8, (d) k = 10.

Acknowledgements: This work is partially supported by the National Science Foundation under Grants No. CMMI-0969923, FRG-1160319, and SES-0835531, as well as by a University of California CITRIS seed grant and a NASA grant No. NAS2-03144.
The authors would like to thank the Area Editor and the reviewers for their careful review of our submission.

References

[1] E. J. Candès and T. Tao, "Decoding by linear programming". IEEE Trans. Inform. Theory, 51 (2005), 4203–4215.
[2] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit". SIAM Review, 43(1):129–159, 2001.
[3] A. Bruckstein, D. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images". SIAM Review, 2007.
[4] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex algebraic geometry of linear inverse problems". In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 699–703, 2010.
[5] S. Boyd and L. Vandenberghe, "Convex Optimization". Cambridge, U.K.: Cambridge Univ. Press, 2004.
[6] A. Cohen and A. Yeredor, "On the use of sparsity for recovering discrete probability distributions from their moments". In Statistical Signal Processing Workshop (SSP), 2011 IEEE.
[7] J. Kivinen and M. Warmuth, "Exponentiated gradient versus gradient descent for linear predictors".
Information and Computation, 132(1):1–63, 1997.
[8] D. Lashkari and P. Golland, "Convex clustering with exemplar-based models". In NIPS, 2008.
", "award": [], "sourceid": 1166, "authors": [{"given_name": "Mert", "family_name": "Pilanci", "institution": null}, {"given_name": "Laurent", "family_name": "Ghaoui", "institution": null}, {"given_name": "Venkat", "family_name": "Chandrasekaran", "institution": null}]}