{"title": "Structured Matrix Recovery via the Generalized Dantzig Selector", "book": "Advances in Neural Information Processing Systems", "page_first": 3252, "page_last": 3260, "abstract": "In recent years, structured matrix recovery problems have gained considerable attention for its real world applications, such as recommender systems and computer vision. Much of the existing work has focused on matrices with low-rank structure, and limited progress has been made on matrices with other types of structure. In this paper we present non-asymptotic analysis for estimation of generally structured matrices via the generalized Dantzig selector based on sub-Gaussian measurements. We show that the estimation error can always be succinctly expressed in terms of a few geometric measures such as Gaussian widths of suitable sets associated with the structure of the underlying true matrix. Further, we derive general bounds on these geometric measures for structures characterized by unitarily invariant norms, a large family covering most matrix norms of practical interest. Examples are provided to illustrate the utility of our theoretical development.", "full_text": "Structured Matrix Recovery via the Generalized\n\nDantzig Selector\n\nSheng Chen\nArindam Banerjee\nDept. of Computer Science & Engineering\n\nUniversity of Minnesota, Twin Cities\n\n{shengc,banerjee}@cs.umn.edu\n\nAbstract\n\nIn recent years, structured matrix recovery problems have gained considerable\nattention for its real world applications, such as recommender systems and computer\nvision. Much of the existing work has focused on matrices with low-rank structure,\nand limited progress has been made on matrices with other types of structure. 
In this paper we present non-asymptotic analysis for estimation of generally structured matrices via the generalized Dantzig selector based on sub-Gaussian measurements. We show that the estimation error can always be succinctly expressed in terms of a few geometric measures such as Gaussian widths of suitable sets associated with the structure of the underlying true matrix. Further, we derive general bounds on these geometric measures for structures characterized by unitarily invariant norms, a large family covering most matrix norms of practical interest. Examples are provided to illustrate the utility of our theoretical development.

1 Introduction

Structured matrix recovery has found a wide spectrum of applications in the real world, e.g., recommender systems [22], face recognition [9], etc. The recovery of an unknown structured matrix Θ* ∈ R^{d×p} essentially needs to consider two aspects: the measurement model, i.e., what kind of information about the unknown matrix is revealed by each measurement, and the structure of the underlying matrix, e.g., sparse, low-rank, etc. In the context of structured matrix estimation and recovery, a widely used measurement model is the linear measurement, i.e., one has access to n observations of the form y_i = ⟨⟨Θ*, X_i⟩⟩ + ω_i for Θ*, where ⟨⟨·,·⟩⟩ denotes the matrix inner product, i.e., ⟨⟨A, B⟩⟩ = Tr(A^T B) for any A, B ∈ R^{d×p}, and the ω_i's are additive noise. In the literature, various types of measurement matrices X_i have been investigated, for example, the Gaussian ensemble where X_i consists of i.i.d. standard Gaussian entries [11], and the rank-one projection model where X_i is randomly generated with the constraint rank(X_i) = 1 [7]. 
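To make the measurement model concrete, here is a minimal numpy sketch of y_i = ⟨⟨Θ*, X_i⟩⟩ + ω_i under the Gaussian ensemble; the sizes d, p, n, r and the noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n, r = 10, 15, 200, 2  # illustrative sizes (assumptions)

# Rank-r ground truth Theta*.
Theta_star = rng.standard_normal((d, r)) @ rng.standard_normal((r, p))

# Gaussian ensemble: each X_i has i.i.d. N(0, 1) entries.
X = rng.standard_normal((n, d, p))

# Linear measurements y_i = <<Theta*, X_i>> + omega_i, where
# <<A, B>> = Tr(A^T B) is the matrix inner product.
omega = 0.1 * rng.standard_normal(n)
y = np.einsum('idp,dp->i', X, Theta_star) + omega
```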
A special case of rank-one projection is the matrix completion model [8], in which X_i has a single entry equal to 1 with all the rest set to 0, i.e., y_i takes the value of one entry of Θ* at each measurement. Other measurement models include row-and-column affine measurement [34], exponential family matrix completion [21, 20], etc.

Previous work has shown that low-complexity structure of Θ*, often captured by a small value of some norm R(·), can significantly benefit its recovery [11, 26]. For instance, one of the popular structures of Θ* is low-rank, which can be approximated by a small value of the trace norm (i.e., nuclear norm) ‖·‖_tr. Under the low-rank assumption on Θ*, recovery guarantees have been established for different measurement matrices using convex programs, e.g., the trace-norm regularized least-squares estimator [10, 27, 26, 21],
$$\min_{\Theta \in \mathbb{R}^{d\times p}} \ \frac{1}{2}\sum_{i=1}^n \left(y_i - \langle\langle X_i, \Theta\rangle\rangle\right)^2 + \beta_n \|\Theta\|_{tr}\,, \quad (1)$$

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

and constrained trace-norm minimization estimators [10, 27, 11, 7, 20], such as
$$\min_{\Theta \in \mathbb{R}^{d\times p}} \ \|\Theta\|_{tr} \quad \text{s.t.} \quad \Big\|\sum_{i=1}^n \left(\langle\langle X_i, \Theta\rangle\rangle - y_i\right) X_i\Big\|_{op} \le \lambda_n\,, \quad (2)$$
where β_n, λ_n are tuning parameters, and ‖·‖_op denotes the operator (spectral) norm. Among the convex approaches, the exact recovery guarantee of a matrix-form basis-pursuit [14] estimator was analyzed for the noiseless setting in [27], under a certain matrix-form restricted isometry property (RIP). 
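The regularized estimator (1) can be solved by proximal gradient descent, whose proximal step for the trace norm is singular value soft-thresholding. The sketch below makes this concrete under illustrative assumptions: the problem sizes, step size, β_n and iteration count are placeholders, not tuned choices from the paper.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: prox operator of tau * trace norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def trace_reg_ls(X, y, beta, step=None, iters=300):
    """Proximal gradient for min 0.5*sum_i (y_i - <<X_i, Theta>>)^2 + beta*||Theta||_tr."""
    n, d, p = X.shape
    if step is None:
        # crude 1/L step from the operator norm of the measurement map (assumption)
        step = 1.0 / np.linalg.norm(X.reshape(n, -1), 2) ** 2
    Theta = np.zeros((d, p))
    for _ in range(iters):
        resid = np.einsum('idp,dp->i', X, Theta) - y   # <<X_i, Theta>> - y_i
        grad = np.einsum('i,idp->dp', resid, X)        # sum_i resid_i * X_i
        Theta = svt(Theta - step * grad, step * beta)
    return Theta

# quick demo on synthetic rank-1 data (noiseless for simplicity)
rng = np.random.default_rng(1)
d, p, n = 8, 8, 400
Theta_star = np.outer(rng.standard_normal(d), rng.standard_normal(p))
X = rng.standard_normal((n, d, p))
y = np.einsum('idp,dp->i', X, Theta_star)
Theta_hat = trace_reg_ls(X, y, beta=0.5)
```

With n well above the degrees of freedom of a rank-1 matrix, the iterate should land close to Θ*.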
In the presence of noise, [10] also used matrix RIP to establish the recovery error bound for both the regularized and constrained estimators, i.e., (1) and (2). In [7], a variant of estimator (2) was proposed and its recovery guarantee was built on a so-called restricted uniform boundedness (RUB) condition, which is more suitable for the rank-one projection based measurement model. Despite the fact that the low-rank structure has been well studied, only a few works extend to more general structures. In [26], the regularized estimator (1) was generalized by replacing the trace norm with a decomposable norm R(·) for other structures. [11] extended the estimator in [27] with ‖·‖_tr replaced by a norm from a broader class called atomic norms, but the consistency of the estimator is only available when the noise vector is bounded.

In this work, we make two key contributions. First, we present a general framework for estimation of structured matrices via the generalized Dantzig selector (GDS) [12, 6] as follows:
$$\hat{\Theta} = \operatorname*{argmin}_{\Theta \in \mathbb{R}^{d\times p}} \ R(\Theta) \quad \text{s.t.} \quad R^*\Big(\sum_{i=1}^n \left(\langle\langle X_i, \Theta\rangle\rangle - y_i\right) X_i\Big) \le \lambda_n\,, \quad (3)$$
in which R(·) can be any norm and its dual norm is R*(·). The GDS has been studied in the context of structured vectors [12], so (3) can be viewed as a natural generalization to matrices. Note that the estimator (2) is a special case of the formulation above, as the operator norm is dual to the trace norm. Our deterministic analysis of the estimation error ‖Θ̂ − Θ*‖_F relies on a condition based on a suitable choice of λ_n and the restricted strong convexity (RSC) condition [26, 3]. By assuming sub-Gaussian X_i and ω_i, we show that these conditions are satisfied with high probability, and the recovery error can be expressed in terms of certain geometric measures of sets associated with Θ*. 
Such a geometric characterization is inspired by related advances in recent years [26, 11, 3]. One key ingredient in such a characterization is the Gaussian width [18], which measures the size of sets in R^{d×p}. Related advances can be found in [11, 12, 6], but they all rely on Gaussian measurements, to which classical concentration results [18] are directly applicable. In contrast, our work allows general sub-Gaussian measurement matrices and noise, by suitably using ideas from generic chaining [30], a powerful geometric approach to bounding stochastic processes. Our results can also be extended to heavy-tailed measurements and noise, following recent advances [28]. Recovery guarantees of the GDS were analyzed for general norms in the matrix completion setting [20], but that work differs from ours since its measurement model is not sub-Gaussian as we consider here.

Our second contribution is motivated by the fact that, though certain existing analyses end up with geometric measures such as Gaussian widths, limited attention has been paid to bounding these measures in terms of more easily understandable quantities, especially for matrix norms. Here our key novel contribution is deriving general bounds on those geometric measures for the class of unitarily invariant norms, which are invariant under any unitary transformation, i.e., for any matrix Θ ∈ R^{d×p}, its norm value is equal to that of UΘV if both U ∈ R^{d×d} and V ∈ R^{p×p} are unitary matrices. The widely-used trace norm, spectral norm and Frobenius norm all belong to this class. A well-known result is that any unitarily invariant matrix norm is equivalent to some vector norm applied to the set of singular values [23] (see Lemma 1 for details), and this equivalence allows us to build on the techniques developed in [13] for vector norms to derive bounds on the geometric measures for unitarily invariant norms. 
Previously these general bounds were not available in the literature for the matrix setting, and results were stated only in terms of the geometric measures, which can be hard to interpret or bound in terms of understandable quantities. We illustrate concrete versions of the general bounds using the trace norm and the recently proposed spectral k-support norm [24].

The rest of the paper is organized as follows: we first provide the deterministic analysis in Section 2. In Section 3, we introduce some probability tools, which are used in the later analysis. In Section 4, we present the probabilistic analysis for sub-Gaussian measurement matrices and noise, along with the general bounds on the geometric measures for unitarily invariant norms. Section 5 is dedicated to examples of applying the general bounds, and we conclude in Section 6.

2 Deterministic Recovery Guarantees

To evaluate the performance of the GDS (3), we focus on the Frobenius-norm error, i.e., ‖Θ̂ − Θ*‖_F. Throughout the paper, w.l.o.g. we assume that d ≤ p. For convenience, we denote the collection of X_i's by X = {X_i}_{i=1}^n, and let ω = [ω_1, ω_2, ..., ω_n]^T be the noise vector. In the following theorem, we provide a deterministic bound for ‖Θ̂ − Θ*‖_F under some standard assumptions on λ_n and X.

Theorem 1 Define the set E_R(Θ*) = cone{Δ ∈ R^{d×p} | R(Δ + Θ*) ≤ R(Θ*)}. 
Assume that
$$\lambda_n \ge R^*\Big(\sum_{i=1}^n \omega_i X_i\Big), \quad \text{and} \quad \sum_{i=1}^n \langle\langle X_i, \Delta\rangle\rangle^2 / \|\Delta\|_F^2 \ \ge\ \alpha > 0, \ \ \forall\, \Delta \in E_R(\Theta^*)\,. \quad (4)$$
Then the estimation error ‖Θ̂ − Θ*‖_F satisfies
$$\|\hat{\Theta} - \Theta^*\|_F \ \le\ \frac{2\,\Psi_R(\Theta^*)\,\lambda_n}{\alpha}\,, \quad (5)$$
where Ψ_R(·) is the restricted compatibility constant defined as $\Psi_R(\Theta^*) = \sup_{\Delta \in E_R(\Theta^*)} \frac{R(\Delta)}{\|\Delta\|_F}$.

The proof is deferred to the supplement. The convex cone E_R(Θ*) plays an important role in characterizing the error bound, and its geometry is determined by R(·) and Θ*. The recovery bound assumes no knowledge of the norm R(·) or the true matrix Θ*, thus allowing general structures. The second condition in (4) is often referred to as restricted strong convexity [26]. In this work, we are particularly interested in R(·) from the class of unitarily invariant matrix norms, which satisfy R(Θ) = R(UΘV) for any Θ ∈ R^{d×p} and unitary matrices U ∈ R^{d×d}, V ∈ R^{p×p}. A useful result for such norms is given in Lemma 1 (see [23, 4] for details).

Lemma 1 Suppose that the singular values of a matrix Θ ∈ R^{d×p} are given by σ = [σ_1, σ_2, ..., σ_d]^T. 
A unitarily invariant norm R: R^{d×p} → R can be characterized by some symmetric gauge function¹ f: R^d → R as R(Θ) = f(σ), and its dual norm is given by R*(Θ) = f*(σ).

As the sparsity of σ equals the rank of Θ, the class of unitarily invariant matrix norms is useful in structured low-rank matrix recovery and includes many widely used norms, e.g., the trace norm with f(·) = ‖·‖_1, the Frobenius norm with f(·) = ‖·‖_2, the Schatten p-norm with f(·) = ‖·‖_p, and the Ky Fan k-norm when f(·) is the ℓ1 norm of the k largest elements in magnitude.

Before proceeding with the analysis, we introduce some notation. For the rest of the paper, we denote by σ(Θ) ∈ R^d the vector of singular values (sorted in descending order) of a matrix Θ ∈ R^{d×p}, and may use the shorthand σ* for σ(Θ*). For any θ ∈ R^d, we define the corresponding |θ|↓ by arranging the absolute values of the elements of θ in descending order. Given any matrix Θ ∈ R^{d×p} and subspace M ⊆ R^{d×p}, we denote by Θ_M the orthogonal projection of Θ onto M. Besides, we let colsp(Θ) (rowsp(Θ)) be the subspace spanned by the columns (rows) of Θ. The notation S^{dp−1} represents the unit sphere of R^{d×p}, i.e., the set {Θ | ‖Θ‖_F = 1}. The unit ball of the norm R(·) is denoted by Ω_R = {Θ | R(Θ) ≤ 1}. 
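Lemma 1 is easy to verify numerically: each unitarily invariant norm is obtained by applying its symmetric gauge f to the singular values, and its value is unchanged under unitary (here, orthogonal) transforms. A small sketch with illustrative sizes:

```python
import numpy as np

def sv(Theta):
    """Singular values of Theta, in descending order."""
    return np.linalg.svd(Theta, compute_uv=False)

# symmetric gauges f for a few unitarily invariant norms
trace_norm = lambda T: np.sum(sv(T))                        # f = l1
frob_norm  = lambda T: np.sqrt(np.sum(sv(T) ** 2))          # f = l2
ky_fan     = lambda T, k: np.sum(sv(T)[:k])                 # f = l1 of top-k

rng = np.random.default_rng(2)
Theta = rng.standard_normal((4, 6))
# random orthogonal factors U, V via QR decompositions
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
rotated = U @ Theta @ V   # R(U Theta V) should equal R(Theta)
```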
Throughout the paper, the symbols c, C, c_0, C_0, etc., are reserved for universal constants, which may differ at each occurrence. In the rest of our analysis, we will frequently use the so-called ordered weighted ℓ1 (OWL) norm on R^d [17], defined as ‖θ‖_w ≜ ⟨|θ|↓, |w|↓⟩, where w ∈ R^d is a predefined weight vector. Noting that the OWL norm is a symmetric gauge, we define the spectral OWL norm for Θ as ‖Θ‖_w ≜ ‖σ(Θ)‖_w, i.e., by applying the OWL norm to σ(Θ).

3 Background and Preliminaries

The tools for our probabilistic analysis include the notion of Gaussian width [18], sub-Gaussian random matrices, and generic chaining [30]. Here we briefly introduce the basic ideas and results for each of them as needed for our analysis.

¹A symmetric gauge function is a norm that is invariant under sign changes and permutations of the elements.

3.1 Gaussian width and sub-Gaussian random matrices

The Gaussian width can be defined for any subset A ⊆ R^{d×p} as follows [18, 19]:
$$w(\mathcal{A}) \ \triangleq\ E_G\Big[\sup_{Z\in\mathcal{A}} \langle\langle G, Z\rangle\rangle\Big]\,, \quad (6)$$
where G is a random matrix with i.i.d. standard Gaussian entries, i.e., G_{ij} ~ N(0, 1). The Gaussian width essentially measures the size of the set A, and some of its properties can be found in [11, 1]. A random matrix X is sub-Gaussian with |||X|||_{ψ2} ≤ κ if |||⟨⟨X, Z⟩⟩|||_{ψ2} ≤ κ for any Z ∈ S^{dp−1}, where the ψ2 norm of a sub-Gaussian random variable x is defined as $|||x|||_{\psi_2} = \sup_{q\ge 1} q^{-1/2}\,(E|x|^q)^{1/q}$ (see [31] for more details on the ψ2 norm). 
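Definition (6) can be explored by Monte Carlo whenever the supremum has a closed form; e.g., for A = S^{dp−1} the supremum is attained at Z = G/‖G‖_F, so w(A) = E‖G‖_F ≈ √(dp). A sketch with illustrative sizes and sample count:

```python
import numpy as np

rng = np.random.default_rng(3)
d, p, trials = 20, 30, 2000   # illustrative sizes (assumptions)

# w(A) = E sup_{Z in A} <<G, Z>>; for A = unit Frobenius sphere,
# the supremum over Z equals ||G||_F, attained at Z = G / ||G||_F.
widths = [np.linalg.norm(rng.standard_normal((d, p))) for _ in range(trials)]
w_est = np.mean(widths)   # should be close to sqrt(d * p)
```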
One nice property of a sub-Gaussian random variable is its thin tail, i.e., $P(|x| > \epsilon) \le e \cdot \exp(-c\epsilon^2/|||x|||_{\psi_2}^2)$, in which c is a constant.

3.2 Generic chaining

Generic chaining is a powerful tool for bounding the supremum of stochastic processes [30]. Suppose {Z_t}_{t∈T} is a centered stochastic process, where each Z_t is a centered random variable. We assume the index set T is endowed with some metric s(·,·). In order to use the generic chaining bound, the critical condition for {Z_t}_{t∈T} to satisfy is that, for any u, v ∈ T, $P(|Z_u - Z_v| \ge \epsilon) \le c_1 \cdot \exp(-c_2\epsilon^2/s^2(u,v))$, where c_1 and c_2 are constants. Under this condition, we have
$$E\Big[\sup_{t\in\mathcal{T}} Z_t\Big] \ \le\ c_0\,\gamma_2(\mathcal{T}, s)\,, \quad (7)$$
$$P\Big(\sup_{u,v\in\mathcal{T}} |Z_u - Z_v| \ \ge\ C_1\big(\gamma_2(\mathcal{T}, s) + \epsilon \cdot \mathrm{diam}(\mathcal{T}, s)\big)\Big) \ \le\ C_2\exp(-\epsilon^2)\,, \quad (8)$$
where diam(T, s) is the diameter of the set T w.r.t. the metric s(·,·). (7) is often referred to as the generic chaining bound (see Theorems 2.2.18 and 2.2.19 in [30]), and (8) is Theorem 2.2.27 in [30]. The functional γ_2(T, s) essentially measures the geometric size of the set T under the metric s(·,·). 
To avoid unnecessary complications, we omit the definition of γ_2(T, s) here (see Chapter 2 of [30] for an introduction), but provide two of its properties below:
$$\gamma_2(\mathcal{T}, s_1) \le \gamma_2(\mathcal{T}, s_2) \ \ \text{if } s_1(u,v) \le s_2(u,v) \ \forall\, u, v \in \mathcal{T}\,, \quad (9)$$
$$\gamma_2(\mathcal{T}, \eta s) = \eta \cdot \gamma_2(\mathcal{T}, s) \ \ \text{for any } \eta > 0\,. \quad (10)$$
An important aspect of the γ_2-functional is the following majorizing measure theorem [29, 30].

Theorem 2 Given any Gaussian process {Y_t}_{t∈T}, define $s(u,v) = \sqrt{E|Y_u - Y_v|^2}$ for u, v ∈ T. Then γ_2(T, s) can be upper bounded as $\gamma_2(\mathcal{T}, s) \le C_0\, E[\sup_{t\in\mathcal{T}} Y_t]$.

This theorem is essentially Theorem 2.4.1 in [30]. For our purpose, we simply focus on the Gaussian process {Y_Δ = ⟨⟨G, Δ⟩⟩}_{Δ∈A}, in which A ⊆ R^{d×p} and G is a standard Gaussian random matrix. Given Theorem 2, the metric $s(U,V) = \sqrt{E|\langle\langle G, U-V\rangle\rangle|^2} = \|U - V\|_F$. Therefore we have
$$\gamma_2(\mathcal{A}, \|\cdot\|_F) \ \le\ C_0\, E\Big[\sup_{\Delta\in\mathcal{A}} \langle\langle G, \Delta\rangle\rangle\Big] = C_0\, w(\mathcal{A})\,. \quad (11)$$

4 Error Bounds with Sub-Gaussian Measurement and Noise

Though the deterministic recovery bound (5) in Section 2 applies to any measurement X and noise ω as long as the assumptions in (4) are satisfied, it is of practical interest to express the bound in terms of the problem parameters, e.g., d, p and n, for random X and ω sampled from some general and widely used family of distributions. For this work, we assume that the X_i's in X are i.i.d. copies of a zero-mean random matrix X, which is sub-Gaussian with |||X|||_{ψ2} ≤ κ for a constant κ, and that the noise ω contains i.i.d. centered random variables with ‖ω_i‖_{ψ2} ≤ τ for a constant τ. 
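The standard Gaussian ensemble from the introduction is one concrete member of this measurement family: ⟨⟨X, Δ⟩⟩ ~ N(0, ‖Δ‖_F²), so it is isotropic with E[⟨⟨X, Δ⟩⟩²] = 1 on the unit Frobenius sphere (the isotropy used in the next subsections). A quick Monte Carlo sanity check with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
d, p, trials = 5, 7, 20000   # illustrative sizes (assumptions)

# a fixed direction Delta on the unit Frobenius sphere S^{dp-1}
Delta = rng.standard_normal((d, p))
Delta /= np.linalg.norm(Delta)

# <<X, Delta>> for i.i.d. standard Gaussian X is N(0, ||Delta||_F^2) = N(0, 1)
samples = np.einsum('idp,dp->i', rng.standard_normal((trials, d, p)), Delta)
second_moment = np.mean(samples ** 2)   # should concentrate around 1
```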
In this section, we show that each quantity in (5) can be bounded using certain geometric measures associated with the true matrix Θ*. Further, we show that for unitarily invariant norms, the geometric measures can themselves be bounded in terms of d, p, n, and structures associated with Θ*.

4.1 Bounding the restricted compatibility constant

Given the definition of the restricted compatibility constant in Theorem 1, it involves no randomness and depends purely on R(·) and the geometry of E_R(Θ*). Hence we directly work on its upper bound for unitarily invariant norms. In general, characterizing the error cone E_R(Θ*) is difficult, especially for non-decomposable R(·). To address the issue, we first define the seminorm below.

Definition 1 Given two orthogonal subspaces M_1, M_2 ⊆ R^{d×p} and two vectors w, z ∈ R^d, the subspace spectral OWL seminorm on R^{d×p} is defined as ‖Θ‖_{w,z} ≜ ‖Θ_{M_1}‖_w + ‖Θ_{M_2}‖_z, where Θ_{M_1} and Θ_{M_2} are the orthogonal projections of Θ onto M_1 and M_2, respectively.

Next we will construct such a seminorm based on a subgradient θ* of the symmetric gauge f associated with R(·) at σ*, which can be obtained by solving the so-called polar operator [32]:
$$\theta^* \in \operatorname*{argmax}_{x\,:\,f^*(x) \le 1} \ \langle x, \sigma^*\rangle\,. \quad (12)$$
Given that σ* is sorted, w.l.o.g. we may assume that θ* is nonnegative and sorted, because ⟨σ*, θ*⟩ ≤ ⟨σ*, |θ*|↓⟩ and f*(θ*) = f*(|θ*|↓). Also, we denote by θ*_max (θ*_min) the largest (smallest) element of θ*, and define ρ = θ*_max/θ*_min (if θ*_min = 0, we define ρ = +∞). Throughout the paper, we will frequently use these notations. As shown in the lemma below, a seminorm constructed from θ* induces a set E' that contains E_R(Θ*) and is considerably easier to work with.

Lemma 2 Assume that rank(Θ*) = r and its compact SVD is given by Θ* = UΣV^T, where U ∈ R^{d×r}, Σ ∈ R^{r×r} and V ∈ R^{p×r}. Let θ* be any subgradient of f(σ*), w = [θ*_1, θ*_2, ..., θ*_r, 0, ..., 0]^T ∈ R^d, z = [θ*_{r+1}, θ*_{r+2}, ..., θ*_d, 0, ..., 0]^T ∈ R^d, U = colsp(U) and V = rowsp(V^T), and define M_1, M_2 as M_1 = {Θ | colsp(Θ) ⊆ U, rowsp(Θ) ⊆ V}, M_2 = {Θ | colsp(Θ) ⊆ U^⊥, rowsp(Θ) ⊆ V^⊥}, where U^⊥, V^⊥ are the orthogonal complements of U and V, respectively. Then the specified subspace spectral OWL seminorm ‖·‖_{w,z} satisfies E_R(Θ*) ⊆ E' ≜ cone{Δ | ‖Δ + Θ*‖_{w,z} ≤ ‖Θ*‖_{w,z}}.

The proof is given in the supplementary material. Based on the superset E', we are able to bound the restricted compatibility constant for unitarily invariant norms via the following theorem.

Theorem 3 Assume there exist η_1 and η_2 such that the symmetric gauge f for R(·) satisfies f(δ) ≤ max{η_1‖δ‖_1, η_2‖δ‖_2} for any δ ∈ R^d. 
Then given a rank-r Θ*, the restricted compatibility constant Ψ_R(Θ*) is upper bounded by
$$\Psi_R(\Theta^*) \ \le\ 2\Phi_f(r) + \max\big\{\eta_2,\ \eta_1(1 + \rho)\sqrt{r}\big\}\,, \quad (13)$$
where ρ = θ*_max/θ*_min, and $\Phi_f(r) = \sup_{\|\delta\|_0 \le r} f(\delta)/\|\delta\|_2$ is called the sparse compatibility constant.

Remark: The assumption for Theorem 3 might seem cumbersome at first glance, but the different combinations of η_1 and η_2 give us more flexibility. In fact, it trivially covers two cases: η_2 = 0 along with f(δ) ≤ η_1‖δ‖_1 for any δ, and the other way around, η_1 = 0 along with f(δ) ≤ η_2‖δ‖_2.

4.2 Bounding the restricted convexity α

The second condition in (4) is equivalent to $\sum_{i=1}^n \langle\langle X_i, \Delta\rangle\rangle^2 \ge \alpha > 0$ for all Δ ∈ E_R(Θ*) ∩ S^{dp−1}. In the following theorem, we express the restricted convexity α in terms of the Gaussian width.

Theorem 4 Assume that the X_i's are i.i.d. copies of a centered isotropic sub-Gaussian random matrix X with |||X|||_{ψ2} ≤ κ, and let A_R(Θ*) = E_R(Θ*) ∩ S^{dp−1}. With probability at least 1 − exp(−ζw²(A_R(Θ*))), the following inequality holds with absolute constants ζ and ξ:
$$\inf_{\Delta\in\mathcal{A}} \ \frac{1}{n}\sum_{i=1}^n \langle\langle X_i, \Delta\rangle\rangle^2 \ \ge\ 1 - \xi\kappa^2 \cdot \frac{w(\mathcal{A}_R(\Theta^*))}{\sqrt{n}}\,. \quad (14)$$

The proof is essentially an application of generic chaining [30] and the following theorem from [25]. A related line of work can be found in [15, 16, 5].

Theorem 5 (Theorem D in [25]) There exist absolute constants c_1, c_2, c_3 for which the following holds. 
Let (Ω, µ) be a probability space, let H be a subset of the unit sphere of L_2(µ), i.e., H ⊆ S_{L_2} = {h : |||h|||_{L_2} = 1}, and assume sup_{h∈H} |||h|||_{ψ2} ≤ κ. Then, for any β > 0 and n ≥ 1 satisfying $c_1\kappa\,\gamma_2(\mathcal{H}, |||\cdot|||_{\psi_2}) \le \beta\sqrt{n}$, with probability at least 1 − exp(−c_2β²n/κ⁴), we have
$$\sup_{h\in\mathcal{H}} \ \Big|\frac{1}{n}\sum_{i=1}^n h^2(X_i) - E[h^2]\Big| \ \le\ \beta\,. \quad (15)$$

Proof of Theorem 4: For simplicity, we use A as shorthand for A_R(Θ*). Let (Ω, µ) be the probability space that X is defined on, and construct
$$\mathcal{H} = \{h(\cdot) = \langle\langle \cdot, \Delta\rangle\rangle \mid \Delta \in \mathcal{A}\}\,.$$
|||X|||_{ψ2} ≤ κ immediately implies that sup_{h∈H} |||h|||_{ψ2} ≤ κ. As X is isotropic, i.e., E[⟨⟨X, Δ⟩⟩²] = 1 for any Δ ∈ A ⊆ S^{dp−1}, we have H ⊆ S_{L_2} and E[h²] = 1 for any h ∈ H. Given h_1 = ⟨⟨·, Δ_1⟩⟩, h_2 = ⟨⟨·, Δ_2⟩⟩ ∈ H, where Δ_1, Δ_2 ∈ A, the metric induced by the ψ2 norm satisfies |||h_1 − h_2|||_{ψ2} = |||⟨⟨X, Δ_1 − Δ_2⟩⟩|||_{ψ2} ≤ κ‖Δ_1 − Δ_2‖_F. Using the properties of the γ_2-functional and the majorizing measure theorem in Section 3, we have
$$\gamma_2(\mathcal{H}, |||\cdot|||_{\psi_2}) \ \le\ \kappa\,\gamma_2(\mathcal{A}, \|\cdot\|_F) \ \le\ \kappa c_4 w(\mathcal{A})\,,$$
where c_4 is an absolute constant. 
Hence, by choosing $\beta = c_1c_4\kappa^2 w(\mathcal{A})/\sqrt{n}$, we can guarantee that the condition $c_1\kappa\,\gamma_2(\mathcal{H}, |||\cdot|||_{\psi_2}) \le \beta\sqrt{n}$ holds for H. Applying Theorem 5 to this H, with probability at least $1 - \exp(-c_2c_1^2c_4^2 w^2(\mathcal{A}))$, we have $\sup_{h\in\mathcal{H}} |\frac{1}{n}\sum_{i=1}^n h^2(X_i) - 1| \le \beta$, which implies
$$\inf_{\Delta\in\mathcal{A}} \ \frac{1}{n}\sum_{i=1}^n \langle\langle X_i, \Delta\rangle\rangle^2 \ \ge\ 1 - \beta\,.$$
Letting ζ = c_2c_1²c_4² and ξ = c_1c_4, we complete the proof.

The bound (14) involves the Gaussian width of the set A_R(Θ*), i.e., the error cone intersected with the unit sphere. For unitarily invariant R, the theorem below provides a general way to bound w(A_R(Θ*)).

Theorem 6 Under the setting of Lemma 2, let ρ = θ*_max/θ*_min and rank(Θ*) = r. The Gaussian width w(A_R(Θ*)) satisfies
$$w(\mathcal{A}_R(\Theta^*)) \ \le\ \min\Big\{\sqrt{dp},\ \sqrt{(2\rho^2 + 1)(d + p - r)\,r}\Big\}\,. \quad (16)$$

The proof of Theorem 6 is included in the supplementary material, and relies on a few specific properties of Gaussian random matrices [1, 11].

4.3 Bounding the regularization parameter λn

In view of Theorem 1, we should choose λ_n large enough to satisfy the condition in (4). Hence we need an upper bound on the random quantity $R^*(\sum_{i=1}^n \omega_i X_i)$ that holds with high probability.

Theorem 7 Assume that X = {X_i}_{i=1}^n are i.i.d. copies of a centered isotropic sub-Gaussian random matrix X with |||X|||_{ψ2} ≤ κ, and that the noise ω consists of i.i.d. centered entries with |||ω_i|||_{ψ2} ≤ τ. Let Ω_R be the unit ball of R(·) and η = sup_{Δ∈Ω_R} ‖Δ‖_F. With probability at least $1 - \exp(-c_1 n) - c_2\exp(-w^2(\Omega_R)/c_3^2\eta^2)$, the following inequality holds:
$$R^*\Big(\sum_{i=1}^n \omega_i X_i\Big) \ \le\ c_0\kappa\tau \cdot \sqrt{n}\, w(\Omega_R)\,. \quad (17)$$

Proof: For each entry of ω, we have $\sqrt{E[\omega_i^2]} \le \sqrt{2}\,|||\omega_i|||_{\psi_2} \le \sqrt{2}\tau$, and $|||\omega_i^2 - E[\omega_i^2]|||_{\psi_1} \le 2|||\omega_i^2|||_{\psi_1} \le 4|||\omega_i|||_{\psi_2}^2 \le 4\tau^2$, where we use the definition of the ψ2 norm and its relation to the ψ1 norm [31]. By Bernstein's inequality, we get
$$P(\|\omega\|_2^2 - 2\tau^2 n \ge \epsilon) \ \le\ P\big(\|\omega\|_2^2 - E[\|\omega\|_2^2] \ge \epsilon\big) \ \le\ \exp\big(-c_1\min\big(\epsilon^2/16\tau^4 n,\ \epsilon/4\tau^2\big)\big)\,.$$
Taking ε = 4τ²n, we have $P(\|\omega\|_2 \ge \tau\sqrt{6n}) \le \exp(-c_1 n)$. Denote $Y_u = \sum_{i=1}^n u_i X_i$ for u ∈ R^n. For any u ∈ S^{n−1}, we get |||Y_u|||_{ψ2} ≤ cκ due to
$$|||\langle\langle Y_u, \Delta\rangle\rangle|||_{\psi_2} = \Big|\Big|\Big|\sum_{i=1}^n u_i\langle\langle X_i, \Delta\rangle\rangle\Big|\Big|\Big|_{\psi_2} \ \le\ c\sqrt{\sum_{i=1}^n u_i^2\,|||\langle\langle X_i, \Delta\rangle\rangle|||_{\psi_2}^2} \ \le\ c\kappa$$
for any Δ ∈ S^{dp−1}. For the rest of the proof, we may drop the subscript of Y_u for convenience. We construct the stochastic process {Z_Δ = ⟨⟨Y, Δ⟩⟩}_{Δ∈Ω_R}, and note that any Z_U and Z_V from this process satisfy
$$P(|Z_U - Z_V| \ge \epsilon) = P(|\langle\langle Y, U - V\rangle\rangle| \ge \epsilon) \ \le\ e \cdot \exp\big(-C\epsilon^2/\kappa^2\|U - V\|_F^2\big)$$
for some universal constant C, due to the sub-Gaussianity of Y. As Ω_R is symmetric, it follows that
$$\sup_{U,V\in\Omega_R} |Z_U - Z_V| = 2\sup_{\Delta\in\Omega_R} Z_\Delta\,, \qquad \sup_{U,V\in\Omega_R} \|U - V\|_F = 2\sup_{\Delta\in\Omega_R} \|\Delta\|_F = 2\eta\,.$$
Let s(·,·) be the metric induced by the norm κ‖·‖_F and T = Ω_R. Using the deviation bound (8), we have
$$P\Big(2\sup_{\Delta\in\Omega_R} Z_\Delta \ \ge\ c_4\kappa\big(\gamma_2(\Omega_R, \|\cdot\|_F) + \epsilon \cdot 2\eta\big)\Big) \ \le\ c_2\exp(-\epsilon^2)\,,$$
where c_2 and c_4 are absolute constants. By (11), there exist constants c_3 and c_5 such that
$$P\big(2R^*(Y) \ge c_5\kappa(w(\Omega_R) + \epsilon)\big) = P\Big(2\sup_{\Delta\in\Omega_R} Z_\Delta \ge c_5\kappa(w(\Omega_R) + \epsilon)\Big) \ \le\ c_2\exp(-\epsilon^2/c_3^2\eta^2)\,.$$
Letting ε = w(Ω_R), we have $P(R^*(Y_u) \ge c_5\kappa w(\Omega_R)) \le c_2\exp(-(w(\Omega_R)/c_3\eta)^2)$ for any u ∈ S^{n−1}. Combining this with the bound for ‖ω‖_2 and letting $c_0 = \sqrt{6}c_5$, by the union bound, we have
$$P\Big(R^*\Big(\sum_{i=1}^n \omega_i X_i\Big) \ge c_0\kappa\tau\sqrt{n}\,w(\Omega_R)\Big) \ \le\ P\Big(\frac{R^*(Y_\omega)}{\|\omega\|_2} \ge c_5\kappa w(\Omega_R)\Big) + P\big(\|\omega\|_2 \ge \tau\sqrt{6n}\big)$$
$$\le\ \sup_{u\in S^{n-1}} P\big(R^*(Y_u) \ge c_5\kappa w(\Omega_R)\big) + P\big(\|\omega\|_2 \ge \tau\sqrt{6n}\big) \ \le\ c_2\exp\big(-w^2(\Omega_R)/c_3^2\eta^2\big) + \exp(-c_1 n)\,,$$
which completes the proof.

The theorem above shows that the lower bound on λ_n depends on the Gaussian width of the unit ball of R(·). Next we give a general bound on it for unitarily invariant matrix norms.

Theorem 8 Suppose that the symmetric gauge f associated with R(·) satisfies f(·) ≥ ν‖·‖_1. Then the Gaussian width w(Ω_R) is upper bounded by
$$w(\Omega_R) \ \le\ \frac{\sqrt{d} + \sqrt{p}}{\nu}\,. \quad (18)$$

5 Examples

Combining the results in Section 4, we have that if the number of measurements n > O(w²(A_R(Θ*))), then the recovery error, with high probability, satisfies $\|\hat{\Theta} - \Theta^*\|_F \le O\big(\Psi_R(\Theta^*)\,w(\Omega_R)/\sqrt{n}\big)$. 
Here we give two examples based on the trace norm [10] and the recently proposed spectral k-support norm [24] to illustrate how to bound the geometric measures and obtain the error bound.

5.1 Trace norm

The trace norm has been widely used in low-rank matrix recovery. The trace norm of $\Theta^*$ is simply the $\ell_1$ norm of $\sigma^*$, i.e., $f = \|\cdot\|_1$. Now we turn to the three geometric measures. Assuming that $\mathrm{rank}(\Theta^*) = r \ll d$, one subgradient of $\|\sigma^*\|_1$ is $\theta^* = [1, 1, \ldots, 1]^T$.

Restricted compatibility constant $\Psi_{\mathrm{tr}}(\Theta^*)$: The assumption in Theorem 3 holds for $f$ by choosing $\eta_1 = 1$ and $\eta_2 = 0$, and we have $\rho = 1$. The sparse compatibility constant $\Phi_{\ell_1}(r)$ is $\sqrt{r}$ because $\|\delta\|_1 \le \sqrt{r}\,\|\delta\|_2$ for any $r$-sparse $\delta$. Using Theorem 3, we have $\Psi_{\mathrm{tr}}(\Theta^*) \le 4\sqrt{r}$.

Gaussian width $w(A_{\mathrm{tr}}(\Theta^*))$: As $\rho = 1$, Theorem 6 implies that $w(A_{\mathrm{tr}}(\Theta^*)) \le \sqrt{3r(d + p - r)}$.

Gaussian width $w(\Omega_{\mathrm{tr}})$: Using Theorem 8 with $\nu = 1$, it is easy to see that $w(\Omega_{\mathrm{tr}}) \le \sqrt{d} + \sqrt{p}$.

Putting all the results together, we have that $\|\hat{\Theta} - \Theta^*\|_F \le O\big(\sqrt{rd/n} + \sqrt{rp/n}\big)$ holds with high probability when $n > O(r(d + p - r))$, which matches the bound in [8].

5.2 Spectral k-support norm

The k-support norm proposed in [2] is defined as

\[ \|\theta\|^{sp}_k \triangleq \inf \Big\{ \sum_i \|u_i\|_2 \;\Big|\; \|u_i\|_0 \le k, \; \sum_i u_i = \theta \Big\} \,, \tag{19} \]

and its dual norm is simply given by $\|\theta\|^{sp*}_k = \big\| |\theta|^{\downarrow}_{1:k} \big\|_2$, the $\ell_2$ norm of the $k$ largest entries of $|\theta|$.
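This dual norm is straightforward to compute: sort the entries of $|\theta|$ in decreasing order and take the $\ell_2$ norm of the first $k$. A minimal pure-Python sketch (the function name is ours, chosen for illustration):

```python
import math

def ksupport_dual_norm(theta, k):
    """Dual k-support norm: the l2 norm of the k largest entries of |theta|."""
    z = sorted((abs(x) for x in theta), reverse=True)
    return math.sqrt(sum(x * x for x in z[:k]))

print(ksupport_dual_norm([3.0, -4.0, 1.0, 0.5], 2))  # l2 norm of [4, 3] -> 5.0
```

For $k = 1$ this reduces to $\|\theta\|_\infty$ (the dual of $\|\cdot\|_1$) and for $k = d$ to $\|\theta\|_2$, reflecting that the k-support norm interpolates between $\ell_1$ and $\ell_2$.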
It is shown that the k-support norm has similar behavior to the elastic-net regularizer [33]. The spectral k-support norm (denoted by $\|\cdot\|_{sk}$) of $\Theta^*$ is defined by applying the k-support norm to $\sigma^*$, i.e., $f = \|\cdot\|^{sp}_k$; it has demonstrated better performance than the trace norm in matrix completion tasks [24]. For simplicity, we assume that $\mathrm{rank}(\Theta^*) = r = k$ and $\|\sigma^*\|_2 = 1$. One subgradient of $\|\sigma^*\|^{sp}_k$ can be $\theta^* = [\sigma^*_1, \sigma^*_2, \ldots, \sigma^*_r, \sigma^*_r, \ldots, \sigma^*_r]^T$.

Restricted compatibility constant $\Psi_{sk}(\Theta^*)$: The following relation has been shown for the k-support norm in [2]:

\[ \max\big\{\|\cdot\|_2,\, \|\cdot\|_1/\sqrt{k}\big\} \;\le\; \|\cdot\|^{sp}_k \;\le\; \sqrt{2}\, \max\big\{\|\cdot\|_2,\, \|\cdot\|_1/\sqrt{k}\big\} \,. \tag{20} \]

Hence the assumption in Theorem 3 holds for $\eta_1 = \sqrt{2/k}$ and $\eta_2 = \sqrt{2}$, and we have $\rho = \sigma^*_1/\sigma^*_r$. The sparse compatibility constant $\Phi^{sp}_k(r) = \Phi^{sp}_k(k) = 1$ because $\|\delta\|^{sp}_k = \|\delta\|_2$ for any $k$-sparse $\delta$. Using Theorem 3, we have $\Psi_{sk}(\Theta^*) \le 2\sqrt{2}\,(3 + \sigma^*_1/\sigma^*_r)\sqrt{r}$.
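Relation (20) is easy to verify numerically. The k-support norm itself admits a closed form in terms of the sorted magnitudes $|\theta|^{\downarrow}$, given in [2]; the sketch below implements that closed form (our naming; the formula is quoted from [2], not from this paper) and checks the sandwich relation (20) on random vectors:

```python
import math
import random

def ksupport_norm(theta, k):
    """k-support norm via the closed form of Argyriou et al. [2]. With z = |theta|
    sorted in decreasing order (1-indexed, convention z_0 = +inf), find the unique
    r in {0, ..., k-1} such that z_{k-r-1} > (z_{k-r} + ... + z_d)/(r+1) >= z_{k-r};
    the norm is sqrt(z_1^2 + ... + z_{k-r-1}^2 + (z_{k-r} + ... + z_d)^2/(r+1))."""
    z = sorted((abs(x) for x in theta), reverse=True)
    for r in range(k):
        head = z[k - r - 2] if k - r - 2 >= 0 else float("inf")  # z_{k-r-1}
        tail = sum(z[k - r - 1:])                                # z_{k-r} + ... + z_d
        if head > tail / (r + 1) >= z[k - r - 1]:
            return math.sqrt(sum(x * x for x in z[:k - r - 1]) + tail ** 2 / (r + 1))
    raise ValueError("no valid r found; requires 1 <= k <= len(theta)")

# Check relation (20): max{||.||_2, ||.||_1/sqrt(k)} <= ||.||_k^sp <= sqrt(2) * max{...}
random.seed(1)
for _ in range(200):
    d = random.randint(1, 12)
    k = random.randint(1, d)
    theta = [random.gauss(0.0, 1.0) for _ in range(d)]
    lb = max(math.sqrt(sum(x * x for x in theta)),
             sum(abs(x) for x in theta) / math.sqrt(k))
    nrm = ksupport_norm(theta, k)
    assert lb * (1 - 1e-9) <= nrm <= math.sqrt(2) * lb * (1 + 1e-9)
print("relation (20) holds on all samples")
```

As sanity checks, a $k$-sparse $\theta$ yields the plain $\ell_2$ norm (e.g., $\|[3, 4, 0, 0]\|^{sp}_2 = 5$), and $k = 1$ recovers $\|\theta\|_1$.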
Gaussian width $w(A_{sk}(\Theta^*))$: Theorem 6 implies $w(A_{sk}(\Theta^*)) \le \sqrt{r(d + p - r)}\, \big[ 2\sigma^{*2}_1/\sigma^{*2}_r + 1 \big]$.

Gaussian width $w(\Omega_{sk})$: Relation (20) also implies that $f = \|\cdot\|^{sp}_k \ge \|\cdot\|_1/\sqrt{k}$, so $\nu = 1/\sqrt{k} = 1/\sqrt{r}$. By Theorem 8, we get $w(\Omega_{sk}) \le \sqrt{r}\,(\sqrt{d} + \sqrt{p})$.

Given the upper bounds for the geometric measures, with high probability, we have $\|\hat{\Theta} - \Theta^*\|_F \le O\big(\sqrt{rd/n} + \sqrt{rp/n}\big)$ when $n > O(r(d + p - r))$. The spectral k-support norm was first introduced in [24], where no statistical results were provided. Although [20] investigated statistical aspects of the spectral k-support norm in the matrix completion setting, the analysis was quite different from ours. Hence this error bound is new in the literature.

6 Conclusions

In this work, we present recovery analysis for matrices with general structures, under sub-Gaussian measurements and noise. Based on generic chaining and Gaussian widths, the recovery guarantees can be succinctly summarized in terms of a few geometric measures. For the class of unitarily invariant norms, we also provide novel general bounds on these measures, which can significantly facilitate such analyses in the future.

Acknowledgements

The research was supported by NSF grants IIS-1563950, IIS-1447566, IIS-1447574, IIS-1422557, CCF-1451986, CNS-1314560, IIS-0953274, IIS-1029711, NASA grant NNX12AQ39A, and gifts from Adobe, IBM, and Yahoo.

References

[1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp. Living on the edge: Phase transitions in convex programs with random data. Information and Inference, 3(3):224–294, 2014.

[2] A. Argyriou, R. Foygel, and N. Srebro. Sparse prediction with the k-support norm. In NIPS, 2012.

[3] A. Banerjee, S. Chen, F. Fazayeli, and V. Sivakumar. Estimation with norm regularization. In NIPS, 2014.

[4] R. Bhatia. Matrix Analysis. Springer, 1997.

[5] J. Bourgain, S. Dirksen, and J. Nelson. Toward a unified theory of sparse dimensionality reduction in Euclidean space. Geometric and Functional Analysis, 25(4):1009–1088, 2015.

[6] T. T. Cai, T. Liang, and A. Rakhlin.
Geometrizing local rates of convergence for high-dimensional linear inverse problems. arXiv:1404.4408, 2014.

[7] T. T. Cai and A. Zhang. ROP: Matrix recovery via rank-one projections. The Annals of Statistics, 43(1):102–138, 2015.

[8] E. Candès and B. Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012.

[9] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Journal of the ACM, 58(3):11:1–11:37, 2011.

[10] E. J. Candès and Y. Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Transactions on Information Theory, 57(4):2342–2359, 2011.

[11] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849, 2012.

[12] S. Chatterjee, S. Chen, and A. Banerjee. Generalized Dantzig selector: Application to the k-support norm. In NIPS, 2014.

[13] S. Chen and A. Banerjee. Structured estimation with atomic norms: General bounds and applications. In NIPS, pages 2908–2916, 2015.

[14] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43(1):129–159, 2001.

[15] S. Dirksen. Dimensionality reduction with subgaussian matrices: a unified theory. arXiv:1402.3973, 2014.

[16] S. Dirksen. Tail bounds via generic chaining. Electronic Journal of Probability, 20, 2015.

[17] M. A. T. Figueiredo and R. D. Nowak. Ordered weighted l1 regularized regression with strongly correlated covariates: Theoretical aspects. In AISTATS, 2016.

[18] Y. Gordon. Some inequalities for Gaussian processes and applications. Israel Journal of Mathematics, 50(4):265–289, 1985.

[19] Y. Gordon.
On Milman's inequality and random subspaces which escape through a mesh in R^n. In Geometric Aspects of Functional Analysis, volume 1317 of Lecture Notes in Mathematics, pages 84–106. Springer, 1988.

[20] S. Gunasekar, A. Banerjee, and J. Ghosh. Unified view of matrix completion under general structural constraints. In NIPS, pages 1180–1188, 2015.

[21] S. Gunasekar, P. Ravikumar, and J. Ghosh. Exponential family matrix completion under structural constraints. In ICML, 2014.

[22] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.

[23] A. S. Lewis. The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2(1-2):173–183, 1995.

[24] A. M. McDonald, M. Pontil, and D. Stamos. Spectral k-support norm regularization. In NIPS, 2014.

[25] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Reconstruction and subgaussian operators in asymptotic geometric analysis. Geometric and Functional Analysis, 17:1248–1282, 2007.

[26] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for the analysis of regularized M-estimators. Statistical Science, 27(4):538–557, 2012.

[27] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.

[28] V. Sivakumar, A. Banerjee, and P. Ravikumar. Beyond sub-Gaussian measurements: High-dimensional structured estimation with sub-exponential designs. In NIPS, pages 2206–2214, 2015.

[29] M. Talagrand. A simple proof of the majorizing measure theorem. Geometric & Functional Analysis GAFA, 2(1):118–125, 1992.

[30] M. Talagrand. Upper and Lower Bounds for Stochastic Processes. Springer, 2014.

[31] R. Vershynin.
Introduction to the non-asymptotic analysis of random matrices. In Y. Eldar and G. Kutyniok, editors, Compressed Sensing, chapter 5, pages 210–268. Cambridge University Press, 2012.

[32] X. Zhang, Y. Yu, and D. Schuurmans. Polar operators for structured sparse estimation. In NIPS, 2013.

[33] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67:301–320, 2005.

[34] O. Zuk and A. Wagner. Low-rank matrix recovery from row-and-column affine measurements. In ICML, 2015.