{"title": "Robust Subspace Approximation in a Stream", "book": "Advances in Neural Information Processing Systems", "page_first": 10683, "page_last": 10693, "abstract": "We study robust subspace estimation in the streaming and distributed settings. Given a set of n data points {a_i}_{i=1}^n in R^d and an integer k, we wish to find a linear subspace S of dimension k for which sum_i M(dist(S, a_i)) is minimized, where dist(S,x) := min_{y in S} |x-y|_2, and M() is some loss function. When M is the identity function, S gives a subspace that is more robust to outliers than that provided by the truncated SVD. Though the problem is NP-hard, it is approximable within a (1+epsilon) factor in polynomial time when k and epsilon are constant.\n\tWe give the first sublinear approximation algorithm for this problem in the turnstile streaming and arbitrary partition distributed models, achieving the same time guarantees as in the offline case. Our algorithm is the first based entirely on oblivious dimensionality reduction, and significantly simplifies prior methods for this problem, which held in neither the streaming nor distributed models.", "full_text": "Robust Subspace Approximation in a Stream\n\nRoie Levin1\n\nroiel@cs.cmu.edu\n\nAnish Sevekari2\n\nasevekar@andrew.cmu.edu\n\nDavid P. Woodruff1\n\ndwoodruf@cs.cmu.edu\n\n1 Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213\n\n2 Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA 15213\n\nAbstract\n\n\u2211\n\nWe study robust subspace estimation in the streaming and distributed settings.\nGiven a set of n data points faign\ni=1 in Rd and an integer k, we wish to \ufb01nd a lin-\near subspace S of dimension k for which\ni M (dist(S; ai)) is minimized, where\n2, and M ((cid:1)) is some loss function. When M is the\ndist(S; x) := miny2S \u2225x (cid:0) y\u2225\nidentity function, S gives a subspace that is more robust to outliers than that pro-\nvided by the truncated SVD. 
Though the problem is NP-hard, it is approximable within a (1 + ε) factor in polynomial time when k and ε are constant. We give the first sublinear approximation algorithm for this problem in the turnstile streaming and arbitrary partition distributed models, achieving the same time guarantees as in the offline case. Our algorithm is the first based entirely on oblivious dimensionality reduction, and significantly simplifies prior methods for this problem, which held in neither the streaming nor distributed models.

1 Introduction

A fundamental problem in large-scale machine learning is that of subspace approximation. Given a set of n data points {a_i}_{i=1}^n in R^d and an integer k, we wish to find a linear subspace S of dimension k for which Σ_i M(dist(S, a_i)) is minimized, where dist(S, x) := min_{y∈S} ‖x − y‖_2, and M(·) is some loss function. When M(·) = (·)², this is the well-studied least squares subspace approximation problem. The minimizer in this case can be computed exactly by computing the truncated SVD of the data matrix.

Otherwise M is often chosen from (·)^p for some p ≥ 0, or from a class of functions called M-estimators, with the goal of providing a more robust estimate than least squares in the face of outliers. Indeed, for p < 2, since one is not squaring the distances to the subspace, one is placing less emphasis on outliers and therefore capturing more of the remaining data points. For example, when M is the identity function, we are finding a subspace so as to minimize the sum of distances to it, which could arguably be more natural than finding a subspace so as to minimize the sum of squared distances. We can write this problem in the following form:

min_{S dim k} Σ_i dist(S, a_i) = min_{X rank k} Σ_i ‖(A − AX)_{i*}‖_2

where A is the matrix in which the i-th row is the vector a_i.
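As a small illustration (not from the paper), the following sketch evaluates the sum-of-distances cost ‖A − AX‖_{2,1} above and shows, on synthetic data with one moderately large outlier row, that the subspace minimizing it can differ from the truncated-SVD subspace; the data, dimensions, and use of NumPy are illustrative assumptions.

```python
import numpy as np

def cost_21(A, X):
    """Sum over rows i of ||(A - A X)_{i*}||_2, i.e. the (2,1) cost."""
    R = A - A @ X
    return np.linalg.norm(R, axis=1).sum()

def svd_projection(A, k):
    """Projection onto the top-k right singular subspace (the Frobenius minimizer)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt[:k].T
    return V @ V.T

rng = np.random.default_rng(0)
# 100 points near the 1-dimensional subspace spanned by e1, plus one outlier row.
A = np.outer(rng.standard_normal(100), np.array([1.0, 0.0])) \
    + 0.01 * rng.standard_normal((100, 2))
A[0] = [0.0, 50.0]                    # outlier along e2

X_svd = svd_projection(A, k=1)        # Frobenius minimizer chases the outlier
X_true = np.diag([1.0, 0.0])          # projection onto the bulk's subspace
# Under the robust (2,1) cost, the bulk subspace beats the SVD subspace.
assert cost_21(A, X_true) < cost_21(A, X_svd)
```

Here the squared cost rewards capturing the single heavy row, while the unsquared cost rewards capturing the remaining 99 rows, matching the discussion above.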
This is the form of robust subspace approximation that we study in this work. We will be interested in the approximate version of the problem, for which the goal is to output a k-dimensional subspace S′ for which with high probability,

Σ_i dist(S′, a_i) ≤ (1 + ε) min_{S dim k} Σ_i dist(S, a_i)    (1)

The particular form with M equal to the identity was introduced to the machine learning community by Ding et al. [10], though these authors employed heuristic solutions. The series of works [7], [15] and [8, 12, 20, 5] shows that if M(·) = |·|^p for p ≠ 2, there is no algorithm that outputs a (1 + 1/poly(d)) approximation to this problem unless P = NP. However, [5] also show that for any p there is an algorithm that runs in O(nnz(A) + (n + d) poly(k/ε) + exp(poly(k/ε))) time and outputs a k-dimensional subspace whose cost is within a (1 + ε) factor of the optimal solution cost. This provides a considerable computational savings since in most applications k ≪ d ≪ n. Their work builds upon techniques developed in [13] and [11], which give O(nd · poly(k/ε) + exp((k/ε)^{O(p)})) time algorithms for the p ≥ 1 case. These in turn build on the weak coreset construction of [9]. In other related work, [6] give algorithms for performing regression with a variety of M-estimator loss functions.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Our Contributions. We give the first sketching-based solution to this problem. Namely, we show it suffices to compute Z · A, where Z is a poly(log(nd)kε⁻¹) × n random matrix with entries chosen obliviously to the entries of A.
The matrix Z is a block matrix with some blocks consisting of independent Gaussian entries, while other blocks consist of independent Cauchy random variables, and yet other blocks are sparse matrices with non-zero entries in {−1, 1}. Previously such sketching-based solutions were known only for M(·) = (·)². Prior algorithms [8, 12, 20, 5] also could not be implemented as single-shot sketching algorithms since they require first making a pass over the data to obtain a crude approximation, and then using (often adaptive) sampling methods in future passes to refine to a (1 + ε)-approximation. Our sketching-based algorithm, achieving O(nnz(A) + (n + d) poly(log(nd)k/ε) + exp(poly(kε⁻¹))) time, matches the running time of previous algorithms and has considerable benefits as described below.

Streaming Model. Since Z is linear and oblivious, one can maintain Z · A in the presence of insertions and deletions to the entries of A. Indeed, given the update A_{i,j} ← A_{i,j} + Δ for some Δ ∈ R, we simply update the j-th column ZA_{*j} in our sketch to ZA_{*j} + Δ · Z · e_i, where e_i is the i-th standard unit vector. Also, the entries of Z can be represented with limited independence, and so Z can be stored with a short random seed. Consequently, we obtain the first algorithm with d poly(log(nd)kε⁻¹) memory for this problem in the standard turnstile data stream model [19]. In this model, A ∈ R^{n×d} is initially the zero matrix, and we receive a stream of updates to A where the i-th update is of the form (x_i, y_i, c_i), which means that A_{x_i,y_i} should be incremented by c_i.
We are allowed one pass over the stream, and should output a rank-k matrix X′ which is a (1 + ε) approximation to the robust subspace estimation problem, namely Σ_i ‖(A − AX′)_{i*}‖_2 ≤ (1 + ε) min_{X rank k} Σ_i ‖(A − AX)_{i*}‖_2. The space complexity of the algorithm is the total number of words required to store this information during the stream. Here, each word is O(log(nd)) bits. Our algorithm achieves d poly(log(nd)kε⁻¹) memory, and so only logarithmically depends on n. This is comparable to the memory of streaming algorithms when M(·) = (·)² [3, 14], which is the only prior case for which streaming algorithms were known.

Distributed Model. Since our algorithm maintains Z · A for an oblivious linear sketch Z, it is parallelizable, and can be used to solve the problem in the distributed setting in which there are s machines holding A¹, A², …, A^s, respectively, and A = Σ_{i=1}^s A^i. This is called the arbitrary partition model [17]. In this model, we can solve the problem in one round with s · d poly(log(nd)kε⁻¹) communication by having each machine agree upon (a short seed describing) Z, and sending ZA^i to a central coordinator who computes and runs our algorithm on Z · A = Σ_i ZA^i. The arbitrary partition model is stronger than the so-called row partition model, in which the points (rows of A) are partitioned across machines. For example, if each machine corresponds to a shop, the rows of A correspond to customers, the columns of A correspond to items, and A^i_{c,d} indicates how many times customer c purchased item d at shop i, then the row partition model requires customers to make purchases at a single shop.
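Both the turnstile update rule and the distributed merge rely only on the linearity of Z. A minimal sketch of that linearity (a dense Gaussian Z stands in for the paper's block construction, and all dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 30, 8, 5
Z = rng.standard_normal((m, n))      # oblivious linear sketch (illustrative stand-in)

# Turnstile stream: updates (x, y, c) meaning A[x, y] += c.
A = np.zeros((n, d))
ZA = np.zeros((m, d))
for (x, y, c) in [(0, 1, 2.0), (5, 3, -1.5), (0, 1, 0.5), (17, 7, 4.0)]:
    A[x, y] += c
    ZA[:, y] += c * Z[:, x]          # only column y of the sketch changes

assert np.allclose(ZA, Z @ A)        # sketch matches the offline computation

# Arbitrary partition model: A = A^1 + A^2; each machine sketches its share
# and the coordinator simply adds the sketches.
A1, A2 = rng.standard_normal((n, d)), rng.standard_normal((n, d))
assert np.allclose(Z @ (A1 + A2), Z @ A1 + Z @ A2)
```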
In contrast, in the arbitrary partition model, customers can purchase items at multiple shops.

2 Notation and Terminology

For a matrix A, let A_{i*} denote the i-th row of A, and A_{*j} denote the j-th column of A.

Definition 2.1 (‖·‖_{2,1}, ‖·‖_{1,2}, ‖·‖_{1,1}, ‖·‖_{med,1}, ‖·‖_F). For a matrix A ∈ R^{n×m}, let:

‖A‖_{2,1} ≡ Σ_i ‖A_{i*}‖_2        ‖A‖_{1,2} ≡ ‖A^⊤‖_{2,1} = Σ_j ‖A_{*j}‖_2        ‖A‖_F ≡ √(Σ_i ‖A_{i*}‖_2²)

‖A‖_{1,1} ≡ Σ_i ‖A_{i*}‖_1        ‖A‖_{med,1} ≡ Σ_j ‖A_{*j}‖_{med}

where ‖·‖_{med} denotes the function that takes the median of absolute values.

Definition 2.2 (X*, Δ*). Let:

Δ* ≡ min_{X rank k} ‖A − AX‖_{2,1}        X* ≡ argmin_{X rank k} ‖A − AX‖_{2,1}

Definition 2.3 ((α, β)-coreset). For a matrix A ∈ R^{n×d} and a target rank k, W is an (α, β)-coreset if its row space is an α-dimensional subspace of R^d that contains a β-approximation to X*. Formally:

min_{X rank k} ‖A − AXW‖_{2,1} ≤ βΔ*

Definition 2.4 (Count-Sketch Matrix). A random matrix S ∈ R^{r×t} is a Count-Sketch matrix if it is constructed via the following procedure. For each of the t columns S_{*i}, we first independently choose a uniformly random row h(i) ∈ {1, 2, …, r}.
Then, we choose a uniformly random element of {−1, 1}, denoted σ(i). We set S_{h(i),i} = σ(i) and set S_{j,i} = 0 for all j ≠ h(i).

For the applications of Count-Sketch matrices in this paper, it suffices to use O(1)-wise instead of full independence for the hash and sign functions. Thus these can be stored in O(1) space, and the multiplication SA can be computed in nnz(A) time. For more background on such sketching matrices, we refer the reader to the monograph [22].

We also use the following notation: [n] denotes the set {1, 2, 3, …, n}. [[E]] denotes the indicator function for event E. nnz(A) denotes the number of non-zero entries of A. A⁻ denotes the pseudoinverse of A. I denotes the identity matrix.

3 Algorithm Overview

At a high level we follow the framework put forth in [5], which gives the first input sparsity time algorithm for the robust subspace approximation problem. In their work, Clarkson and Woodruff first find a crude (poly(k), K)-coreset for the problem. They then use a non-adaptive implementation of a residual sampling technique from [9] to improve the approximation quality but increase the dimension, yielding a (K poly(k), 1 + ε)-coreset. From here they further use dimension reducing sketches to reduce to an instance with parameters that depend only polynomially on k/ε. Finally they pay a cost exponential only in poly(k/ε) to solve the small problem via a black box algorithm of [2].

There are several major obstacles to directly porting this technique to the streaming setting. For one, the construction of the crude approximation subspace uses leverage score sampling matrices, which are non-oblivious and thus not usable in 1-pass turnstile model algorithms.
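Looking back at Definition 2.4, the Count-Sketch construction can be sketched in a few lines; for simplicity this illustrative version uses fully independent hashes and signs rather than the O(1)-wise independence that suffices in the paper.

```python
import numpy as np

def count_sketch(r, t, rng):
    """r x t Count-Sketch matrix: one random +/-1 entry per column, in a random row."""
    S = np.zeros((r, t))
    h = rng.integers(0, r, size=t)            # random row index h(i) per column
    sigma = rng.choice([-1.0, 1.0], size=t)   # random sign sigma(i) per column
    S[h, np.arange(t)] = sigma                # S[h(i), i] = sigma(i); all else zero
    return S

rng = np.random.default_rng(2)
S = count_sketch(r=4, t=20, rng=rng)
# Exactly one nonzero per column, each in {-1, +1}.
assert all(np.count_nonzero(S[:, i]) == 1 for i in range(20))
assert set(np.abs(S[S != 0])) == {1.0}
```

Because each column holds a single nonzero, computing S @ A touches each nonzero of A exactly once, which is the structural reason the product costs O(nnz(A)) time.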
We circumvent this difficulty in Section 4.1 by showing that if T is a sparse poly(k) × n matrix of Cauchy random variables, the row span of TA contains a rank-k matrix which is a log(d) poly(k) approximation to the best rank-k matrix under the ‖·‖_{2,1} norm.

Second, the residual sampling step requires sampling rows of A with respect to probabilities proportional to their distance to the crude approximation (in our case TA). This is challenging because one does not know TA until the end of the stream, much less the distances of rows of A to TA. We handle this in Section 4.2 using a row-sampling data structure of [18] developed for regression, which for a matrix B maintains a sketch HB in a stream from which one can extract samples of rows of B according to probabilities given by their norms. By linearity, it suffices to maintain HA and TA in parallel in the stream, and apply the sample extraction procedure to HA · (I − P_{TA}), where P_{TA} = (TA)^⊤(TA(TA)^⊤)⁻¹TA is the projection onto the rowspace of TA. Unfortunately, the extraction procedure only returns noisy perturbations of the original rows, which majorly invalidates the analysis in [5] of the residual sampling. In Section 4.2 we give an analysis of non-adaptive noisy residual sampling which we name BOOTSTRAPCORESET. This gives a procedure for transforming our poly(k)-dimensional space containing a poly(k) log(d) approximation into a poly(k) log(d)-dimensional space containing a 3/2 factor approximation.

Third, requiring the initial crude approximation to be oblivious yields a coarser log(d) poly(k) initial approximation than the constant factor approximation of [5]. Thus the dimension of the subspace after residual sampling is poly(k) log(d).
Applying dimension reduction techniques reduces the problem to an instance with poly(k) rows and log(d) poly(k) columns. Here the black box algorithm of [2] would take time d^{poly(k)}, which is no longer fixed parameter tractable as desired. Our key insight is that finding the best rank-k matrix under the Frobenius norm, which can be done efficiently, is a √log d (log log d) poly(k) approximation to the ‖·‖_{2,1} norm minimizer. From here we can repeat the residual sampling argument, which this time yields a small instance with poly(k) rows by √log d (log log d) poly(k/ε) columns. Sublogarithmic in d makes all the difference, and now enumerating can be done in time (n + d) poly(k/ε) + exp(poly(k/ε)). All this is done in parallel in a single pass over the stream.

Lastly, the sketching techniques applied after the residual sampling are not oblivious in [5]. We instead use an oblivious median based embedding in Section 5.1, and show that we can still use the black box algorithm of [2] to find the minimizer under the ‖·‖_{med,1} norm in Section 5.2.

We present our results as two algorithms for the robust subspace approximation problem. The first runs in fully polynomial time but gives a coarse approximation guarantee, which corresponds to stopping before repeating the residual sampling a second time. The second algorithm captures the entire procedure, and uses the first as a subroutine.

Algorithm 1 COARSEAPPROX

Input: A ∈ R^{n×d} as a stream
Output: X ∈ R^{d×d} such that ‖A − AX‖_{2,1} ≤ √log d (log log d) poly(k) Δ*

1: T ∈ R^{poly(k)×n} Sparse Cauchy matrix // as in Thm. 4.1
2: C1 ∈ R^{poly(k)×n} Sparse Cauchy matrix // as in Thm. 4.4
3: S1 ∈ R^{log d·poly(k)×d} Count Sketch composed with Gaussian // as in Thm. 4.3
4: R1 ∈ R^{poly(k)×d} Count Sketch matrix // as in Thm. 4.3
5: G1 ∈ R^{log d·poly(k)×log d·poly(k)} Gaussian matrix // as in Thm.
4.4
6: Compute TA online
7: Compute C1A online
8: U1^⊤ ∈ R^{log d poly(k)×d} ← BOOTSTRAPCORESET(A, TA, 1/2) // as in Alg. 3
9: X̂ ∈ R^{poly(k)×log d poly(k)} ← argmin_{X rank k} ‖C1(A − AR1^⊤XU1^⊤)S1^⊤G1‖_F // as in Fact 4.2
10: return R1^⊤X̂U1^⊤

Theorem 3.1 (Coarse Approximation in Polynomial Time). Given a matrix A ∈ R^{n×d}, Algorithm 1 with constant probability computes a rank k matrix X ∈ R^{d×d} such that:

‖A − AX‖_{2,1} ≤ √log d (log log d) · poly(k) · ‖A − AX*‖_{2,1}

and runs in time O(nnz(A)) + d poly(k log(nd)). Furthermore, it can be implemented as a one-pass streaming algorithm with space O(d poly(k log(nd))) and time per update O(poly(log(nd)k)).

Proof Sketch. We show the following are true in subsequent sections:

1. The row span of TA is a (poly(k), log d · poly(k))-coreset for A (Section 4.1) with probability 24/25.

2. BOOTSTRAPCORESET(A, TA, 1/2) is a (log d · poly(k), 3/2)-coreset with probability 49/50 (Section 4.2).

3. If:

X̂ = argmin_{X rank k} ‖C1AS1^⊤G1 − C1AR1^⊤XU1^⊤S1^⊤G1‖_F

then with probability 47/50:

‖A − AR1^⊤X̂U1^⊤‖_{2,1} ≤ poly(k)√log d (log log d) · Δ*

(Sections 4.3 and 4.4, with ε = 1/2).

By a union bound, with probability 88/100 all the statements above hold, and the theorem is proved. BOOTSTRAPCORESET requires d poly(k log(nd)) space and time. Left matrix multiplications by Sparse Cauchy matrices TA and C1A can be done in O(nnz(A)) time (see Section J of [21] for a full description of Sparse Cauchy matrices).
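Line 9 of Algorithm 1 is a rank-constrained Frobenius problem, which has a closed form (Fact 4.2 below). In the simplest instance of that fact, with the outer sketching matrices replaced by identities, the minimizer is just the truncated SVD; a minimal illustrative sketch, with all dimensions assumed for the example:

```python
import numpy as np

def best_rank_k(Y, k):
    """argmin over rank-<=k matrices B of ||Y - B||_F, via the truncated SVD
    (Eckart-Young); the identity-sketch special case of the paper's Fact 4.2."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(3)
Y = rng.standard_normal((12, 7))
B = best_rank_k(Y, k=2)

assert np.linalg.matrix_rank(B) <= 2
# No rank-2 projection onto a random 2-dimensional subspace should do better.
for _ in range(20):
    V = np.linalg.qr(rng.standard_normal((7, 2)))[0]
    cand = Y @ V @ V.T
    assert np.linalg.norm(Y - B) <= np.linalg.norm(Y - cand) + 1e-9
```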
Computing the remaining matrix products and X̂ requires time d poly(k log d).

Algorithm 2 (1 + ε)-APPROX

Input: A ∈ R^{n×d} as a stream
Output: X ∈ R^{d×d} such that ‖A − AX‖_{2,1} ≤ (1 + ε)Δ*

1: X̂ ∈ R^{poly(k)×log d poly(k)} ← COARSEAPPROX(A) // as in Thm. 3.1
2: C2 ∈ R^{√log d (log log d) poly(k/ε)×n} Cauchy matrix // as in Thm. 5.1
3: S2 ∈ R^{√log d (log log d)·poly(k/ε)×d} Count Sketch composed with Gaussian // as in Thm. 4.3
4: R2 ∈ R^{poly(k/ε)×d} Count Sketch matrix // as in Thm. 4.3
5: G2 ∈ R^{√log d (log log d)·poly(k/ε)×√log d (log log d)·poly(k/ε)} Gaussian matrix // as in Thm. 5.1
6: Compute AR2^⊤ online
7: Compute AS2^⊤ online
8: Let V ∈ R^{log d poly(k)×k} be such that X̂ = WV^⊤ is the rank-k decomposition of X̂
9: U2^⊤ ∈ R^{poly(k/ε′)√log d log log d×d} ← BOOTSTRAPCORESET(A, V^⊤U1^⊤, ε′) // as in Alg. 3, U1 as computed during COARSEAPPROX in line 1.
10: X̂′ ∈ R^{poly(k/ε′)×poly(k/ε′)√log d log log d} ← argmin_{X rank k} ‖C2(A − AR2^⊤XU2^⊤)S2^⊤G2‖_{med,1} // as in Thm. 5.2
11: return R2^⊤X̂′U2^⊤

Theorem 3.2 ((1 + ε)-Approximation).
Given a matrix A ∈ R^{n×d}, Algorithm 2 with constant probability computes a rank k matrix X ∈ R^{d×d} such that:

‖A − AX‖_{2,1} ≤ (1 + ε)‖A − AX*‖_{2,1}

and runs in time

O(nnz(A)) + (n + d) poly(k log(nd)/ε) + exp(poly(k/ε)).

Furthermore, it can be implemented as a one-pass streaming algorithm with space O(d poly(k log(nd)/ε)) and time per update O(poly(log(nd)k/ε)).

Proof Sketch. We show the following are true in subsequent sections:

1. If V is such that X̂ = WV^⊤, then V^⊤U1^⊤ is a (poly(k), poly(k)√log d log log d)-coreset with probability 88/100 (Theorem 3.1).

2. BOOTSTRAPCORESET(A, V^⊤U1^⊤, ε′) is a (poly(k/ε′)√log d log log d, (1 + ε′))-coreset with probability 49/50 (Reusing Section 4.2).

3. If:

X̂′ = argmin_{X rank k} ‖C2(A − AR2^⊤XU2^⊤)S2^⊤G2‖_{med,1}

then with probability 19/20:

‖A − AR2^⊤X̂′U2^⊤‖_{2,1} ≤ (1 + O(ε′))Δ*

(Reusing Section 4.3 and Section 5.1).

4. A black box algorithm of [2] computes X̂′ to within (1 + O(ε′)) (Section 5.2).

By a union bound, with probability 81/100 all the statements above hold. Setting ε′ appropriately small as a function of ε, the theorem is proved.

COARSEAPPROX and BOOTSTRAPCORESET together require d poly(k log(nd)/ε) space and O(nnz(A)) + d poly(k log(nd)/ε) time.
Right multiplication by the sketching matrices, yielding AS2^⊤ and AR2^⊤, can be done in time nnz(A). Computing the remaining matrix products and X̂′ requires time (n + d) poly(log(d)k/ε) + exp(poly(k/ε)) (see the end of Section 5.2 for details on this last bound).

We give further proofs and details of these theorems in subsequent sections. Refer to the full version of the paper for complete proofs.

4 Coarse Approximation

4.1 Initial Coreset Construction

We construct a (poly(k), log d · poly(k))-coreset which will serve as our starting point.

Theorem 4.1. If T ∈ R^{poly(k)×n} is a Sparse Cauchy matrix, then the row space of TA contains a k dimensional subspace with corresponding projection matrix X′ such that with probability 24/25:

‖A − AX′‖_{2,1} ≤ log d · poly(k) min_{X rank k} ‖A − AX‖_{2,1} = log d · poly(k) · Δ*

In order to deal with the awkward ‖·‖_{2,1} norm, here and several times elsewhere we make use of a well known theorem due to Dvoretzky to convert it into an entrywise 1-norm.

Fact 4.1 (Dvoretzky's Theorem (Special Case), Section 3.3 of [16]).
There exists an appropriately scaled Gaussian matrix G ∈ R^{d×d log(1/ε)/ε²} such that w.h.p. the following holds for all y ∈ R^d simultaneously:

‖y^⊤G‖_1 ∈ (1 ± ε)‖y^⊤‖_2

Thus the rowspace of TA, with T as in Theorem 4.1 above, is a (poly(k), log d · poly(k))-coreset for A.

4.2 Bootstrapping a Coreset

Given a poor coreset Q for A, we now show how to leverage known results about residual sampling from [9] and [5] to obtain a better coreset of slightly larger dimension.

Algorithm 3 BOOTSTRAPCORESET

Input: A ∈ R^{n×d}, Q ∈ R^{α×d} an (α, β)-coreset, ε ∈ (0, 1)
Output: U ∈ R^{(α+β poly(k/ε))×d}, an (α + β poly(k/ε), (1 + ε))-coreset

1: Compute HA online // as in Lem. 4.2.2
2: P ← β poly(k/ε) samples of rows of A(I − Q) according to P(HA(I − Q)) // as in Lem. 4.2.2
3: U^⊤ ← Orthonormal basis for RowSpan([Q; P])
4: return U^⊤

Theorem 4.2. Given Q, an (α, β)-coreset for A, with probability 49/50 BOOTSTRAPCORESET returns an (α + β poly(k/ε), (1 + ε))-coreset for A. Furthermore BOOTSTRAPCORESET runs in space and time O(d poly(β log(nd)k/ε)), with poly(β log(nd)k/ε) time per update in the streaming setting.

Proof. Consider the following idealized noisy sampling process that samples rows of a matrix B. Sample a row B_i of B with probability ‖B_i‖_2 / ‖B‖_{2,1}, and add an arbitrary noise vector E_i such that ‖E_i‖_2 ≤ ν‖B_i‖_2, where we fix the parameter ν = ε/(100kβ). Supposing we had such a process P*(B), we can prove the following lemma.

Lemma 4.2.1. Suppose Q is an (α, β)-coreset for A, and P is a noisy subset of rows of the residual A(I − Q) of size β poly(k/ε), each sampled according to P*(A(I − Q)).
Then with probability 99/100, RowSpan(Q) ∪ RowSpan(P) is an (α + β poly(k/ε)) dimensional subspace containing a k-dimensional subspace with corresponding projection matrix X′ such that:

‖A − AX′‖_{2,1} ≤ (1 + ε)Δ*

It remains to show that we can sample from P* in a stream.

Lemma 4.2.2. Let B ∈ R^{n×d} be a matrix, and let δ, ν ∈ (0, 1) be given. Also let s be a given integer. Then there is an oblivious sketching matrix H ∈ R^{poly(s/(δν))×n} and a sampling process P, such that P(HB) returns a collection of s′ = O(s) distinct row indices i_1, …, i_{s′} ∈ [n] and approximations B̃_{i_j} = B_{i_j} + E_{i_j} with ‖E_{i_j}‖_2 ≤ ν · ‖B_{i_j}‖_2 for j = 1, …, s′. With probability 1 − δ over the choice of H, the probability an index i appears in the sampled set {i_1, …, i_{s′}} is at least the probability that i appears in a set of s samples without replacement from the distribution (‖B_{1,*}‖_2/‖B‖_{2,1}, …, ‖B_{n,*}‖_2/‖B‖_{2,1}). Furthermore the multiplication HB and sampling process P can be done in nnz(B) + d · poly(s/(δν)) time, and can be implemented in the streaming model with d · poly(s/(δν)) bits of space.

Setting b = log(nd), δ = 1/100, γ = ν = ε/(100kβ) and s = β poly(k/ε), it follows that P contains β poly(k/ε) samples from P*(A(I − Q)) with probability 99/100. By Lemma 4.2.1 and a union bound, the projection matrix of RowSpan(Q) ∪ RowSpan(P) is an (α + β poly(k/ε), (1 + ε))-coreset for A with probability 49/50.
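The idealized process P* above admits a compact sketch: rows are drawn with probability proportional to their Euclidean norms, then perturbed by an arbitrary noise vector of relative norm at most ν. This illustrative version (data, parameters, and Gaussian-shaped noise all assumed for the example) is not the paper's streaming implementation, which extracts such samples from the sketch HB.

```python
import numpy as np

def noisy_norm_sampling(B, s, nu, rng):
    """Draw s row indices with prob ||B_i||_2 / ||B||_{2,1}; return noisy copies
    whose perturbation has norm exactly nu * ||B_i||_2 (within the allowed bound)."""
    norms = np.linalg.norm(B, axis=1)
    p = norms / norms.sum()                       # ||B||_{2,1} = sum of row norms
    idx = rng.choice(len(B), size=s, p=p)
    noise = rng.standard_normal((s, B.shape[1]))
    noise *= (nu * norms[idx] / np.linalg.norm(noise, axis=1))[:, None]
    return idx, B[idx] + noise

rng = np.random.default_rng(4)
B = rng.standard_normal((50, 6))
B[7] *= 100.0                                     # one heavy residual row
idx, rows = noisy_norm_sampling(B, s=200, nu=0.1, rng=rng)

# The heavy row dominates the sampling distribution...
assert (idx == 7).mean() > 0.5
# ...and every returned row is within relative error nu of its original.
errs = np.linalg.norm(rows - B[idx], axis=1) / np.linalg.norm(B[idx], axis=1)
assert np.all(errs <= 0.1 + 1e-9)
```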
BOOTSTRAPCORESET takes total time O(nnz(A)) + O(d poly(β log(nd)k/ε)) and space O(d poly(β log(nd)k/ε)).

Note that in our main algorithm we cannot compute the projection A(I − Q) until after the stream is finished. Fortunately, since H is oblivious, we can right multiply HA by (I − Q) once Q is available, and only then perform the sampling procedure P.

4.3 Right Dimension Reduction

We show how to reduce the right dimension of our problem. This result is used in both Algorithm 1 and Algorithm 2.

Theorem 4.3. If U^⊤ is an (α, β)-coreset, S ∈ R^{α·poly(k/ε)×d} is a CountSketch matrix composed with a matrix of i.i.d. Gaussians, and R ∈ R^{d×poly(k/ε)} is a CountSketch matrix, then with probability 49/50, if X′ = argmin_X ‖AS^⊤ − AR^⊤XU^⊤S^⊤‖_{2,1} then:

‖A − AR^⊤X′U^⊤‖_{2,1} ≤ (1 + O(ε)) min_{X rank k} ‖A − AXU^⊤‖_{2,1}

4.4 Left Dimension Reduction

We show how to reduce the left dimension of our problem. Together with the results from Section 4.3, this preserves the solution to X* to within a coarse √log d log log d · poly(k/ε) factor.

Theorem 4.4. Suppose the matrices S1, R1 and U1 are as in Algorithm 1. If C1 ∈ R^{poly(k/ε)×n} is a Sparse Cauchy matrix, and G1 ∈ R^{log d poly(k/ε)×log d poly(k/ε)} is a matrix of appropriately scaled i.i.d.
Gaussians (as in Fact 4.1), and

X̂ = argmin_{X rank k} ‖C1AS1^⊤G1 − C1AR1^⊤XU1^⊤S1^⊤G1‖_F

then with probability 24/25:

‖AS1^⊤ − AR1^⊤X̂U1^⊤S1^⊤‖_{2,1} ≤ √log d log log d · poly(k/ε) · Δ*

The rank constrained Frobenius norm minimization problem above has a closed form solution.

Fact 4.2. For a matrix M, let U_M Σ_M V_M^⊤ be the SVD of M. Then:

argmin_{X rank k} ‖Y − ZXW‖_F = Z⁻[U_Z U_Z^⊤ Y V_W V_W^⊤]_k W⁻

5 (1 + ε)-Approximation

5.1 Left Dimension Reduction

The following median based embedding allows us to reduce the left dimension of our problem. Together with the results from Section 4.3, this preserves the solution to X* to within a (1 + O(ε)) factor.

Theorem 5.1. Suppose S2, R2 and U2 are as in Algorithm 2. If C2 ∈ R^{√log d log log d poly(k/ε)×n} is a Cauchy matrix, and G2 ∈ R^{√log d log log d poly(k/ε)×√log d log log d poly(k/ε)} is a matrix of appropriately scaled i.i.d. Gaussians (as in Fact 4.1), and:

X̂′ = argmin_{X rank k} ‖C2AS2^⊤G2 − C2AR2^⊤XU2^⊤S2^⊤G2‖_{med,1}

then with probability 99/100:

‖AS2^⊤G2 − AR2^⊤X̂′U2^⊤S2^⊤G2‖_{1,1} ≤ (1 + ε) min_{X rank k} ‖AS2^⊤G2 − AR2^⊤XU2^⊤S2^⊤G2‖_{1,1}

Proof. The following fact is known:

Fact 5.1 (Lemma F.1 from [1]). Let L be a t dimensional subspace of R^s. Let C ∈ R^{m×s} be a matrix with m = O(t log t / ε²) and i.i.d.
standard Cauchy entries. With probability 99/100, for all x ∈ L we have

(1 − ε)‖x‖_1 ≤ ‖Cx‖_{med} ≤ (1 + ε)‖x‖_1

The theorem statement is simply the lemma applied to L = ColSpan([AS2^⊤ | AR2^⊤]).

5.2 Solving Small Instances

Given problems of the form X̂ = argmin_{X rank k} ‖Y − ZXW‖_{med,1}, we leverage an algorithm for checking the feasibility of a system of polynomial inequalities as a black box.

Lemma 5.1 ([2]). Given a set K = {β_1, …, β_s} of polynomials of degree d in k variables with coefficients in R, the problem of deciding whether there exist X_1, …, X_k ∈ R for which β_i(X_1, …, X_k) ≥ 0 for all i ∈ [s] can be solved deterministically with (sd)^{O(k)} arithmetic operations over R.

Theorem 5.2. Fix any ε ∈ (0, 1) and k ∈ [0, min(m_1, m_2)]. Let Y ∈ R^{n×m″}, Z ∈ R^{n×m_1}, and W ∈ R^{m_2×m″} be any matrices. Let C ∈ R^{m′×n} be a matrix of i.i.d. Cauchy random variables, and G ∈ R^{m″×m″ poly(1/ε)} be a matrix of scaled i.i.d. Gaussian random variables.
Then conditioned on C satisfying Fact 5.1 for the adjoined matrix [Y, Z] and G satisfying the condition of Fact 4.1, a rank-k projection matrix X can be found that minimizes ‖C(Y − ZXW)G‖_{med,1} up to a (1 + ε)-factor in time (m″/ε)^{O(mk)} + (m′ + m″)m″ poly(1/ε), where m = max(m_1, m_2).

We remark that if, as we do in our algorithm, we set all the parameters m, m′ and m″ to be log log d √log d · poly(k/ε), we can write the runtime of this step (Line 10 of Algorithm 2) as (n + d) poly(k/ε) + exp(poly(k/ε)). If poly(k/ε) ≤ √log d/(log log d)², then this step is captured in the (n + d) poly(k/ε) term. Otherwise this step is captured in the exp(poly(k/ε)) term.

6 Experiments

In this section we empirically demonstrate the effectiveness of COARSEAPPROX compared to the truncated SVD. We experiment on synthetic and real world data sets. Since the algorithm is randomized, we run it 20 times and take the best performing run. For a fair comparison, we use an input sparsity time approximate SVD as in [4].

(a) Random Rank-3 Matrix Plus Large Outliers  (b) Large Outlier Rank-2 Matrix  (c) Glass  (d) E. Coli
Figure 1: Comparison of Algorithm 1 on synthetic and real world examples.

For the synthetic data, we use two example matrices, both of dimension 1000 × 100. In Figure 1a we use a rank-3 matrix with additional large outlier noise. First we sample a random 100 × 3 matrix U and a random 3 × 10 matrix V. Then we create a random sparse matrix W with each entry nonzero with probability 0.9999 and then scaled by a uniform random variable between 0 and 10000 · n. We use 10 · UV + W. In Figure 1b we create a simple rank-2 matrix with a large outlier. The first row is n followed by all zeros.
All subsequent rows are 0 followed by all ones.

While the approximation guarantee of COARSEAPPROX is weak, we find that it performs well against the SVD baseline in practice on our examples, namely when the data has large outlier rows. The second example in particular serves as a good demonstration of the robustness of the (2,1)-norm to outliers, in comparison to the Frobenius norm. When k = 1, the truncated SVD, which is the Frobenius-norm minimizer, recovers the first row of large magnitude, whereas our algorithm recovers the subsequent rows. Note that both our algorithm and the SVD recover the matrix exactly when k is greater than or equal to its rank.

We additionally compare our algorithm against the SVD on two real-world datasets from the UCI Machine Learning Repository: Glass is a 214 × 9 matrix representing attributes of glass samples, and E. Coli is a 336 × 7 matrix representing attributes of various proteins. For this set of experiments, we use a heuristic extension of our algorithm that performs well in practice. After running COARSEAPPROX, we iterate solving Y_t = argmin_Y ||CAS⊺G − Y Z_{t−1}||_{1,1} and Z_t = argmin_Z ||CAS⊺G − Y_t Z||_{1,1} (via Iteratively Reweighted Least Squares for speed). Finally we output the rank-k Frobenius minimizer constrained to RowSpace(Y_t Z_t). In Figure 1c we consistently outperform the SVD by between 5% and 15% for nearly all k, and nearly match the SVD otherwise. In Figure 1d we are worse than the SVD by no more than 5% for k = 1 to 4, and beat the SVD by up to 50% for k = 5 and 6. We have additionally implemented a greedy column selection algorithm, which performs worse than the SVD on all of our datasets.

Acknowledgements: We would like to thank Ainesh Bakshi for many helpful discussions. D. Woodruff acknowledges partial support from the National Science Foundation under Grant No. CCF-1815840. Part of this work was also done while D.
Woodruff was visiting the Simons Institute for the Theory of Computing.

References

[1] Arturs Backurs, Piotr Indyk, Ilya P. Razenshteyn, and David P. Woodruff. Nearly-optimal bounds for sparse recovery in generic norms, with applications to k-median sketching. In SODA, 2016.

[2] Saugata Basu, Richard Pollack, and Marie-Françoise Roy. On the combinatorial and algebraic complexity of quantifier elimination. J. ACM, 1994.

[3] Kenneth L. Clarkson and David P. Woodruff. Numerical linear algebra in the streaming model. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 205–214, 2009.

[4] Kenneth L. Clarkson and David P. Woodruff. Low rank approximation and regression in input sparsity time. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, STOC '13, pages 81–90, New York, NY, USA, 2013. ACM.

[5] Kenneth L. Clarkson and David P. Woodruff. Input sparsity and hardness for robust subspace approximation. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 310–329, 2015.

[6] Kenneth L. Clarkson and David P. Woodruff. Sketching for M-estimators: A unified approach to robust regression. In SODA, 2015.

[7] Amit Deshpande, Madhur Tulsiani, and Nisheeth K. Vishnoi. Algorithms and hardness for subspace approximation. In SODA, 2011.

[8] Amit Deshpande and Kasturi R. Varadarajan. Sampling-based dimension reduction for subspace approximation. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pages 641–650, 2007.

[9] Amit Deshpande and Kasturi R. Varadarajan. Sampling-based dimension reduction for subspace approximation. In STOC, 2007.

[10] Chris H. Q. Ding, Ding Zhou, Xiaofeng He, and Hongyuan Zha.
R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In ICML, 2006.

[11] Dan Feldman and Michael Langberg. A unified framework for approximating and clustering data. In STOC, 2011.

[12] Dan Feldman, Morteza Monemizadeh, Christian Sohler, and David P. Woodruff. Coresets and sketches for high dimensional subspace approximation problems. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 630–649, 2010.

[13] Dan Feldman, Morteza Monemizadeh, Christian Sohler, and David P. Woodruff. Coresets and sketches for high dimensional subspace approximation problems. In SODA, 2010.

[14] Mina Ghashami, Edo Liberty, Jeff M. Phillips, and David P. Woodruff. Frequent directions: Simple and deterministic matrix sketching. SIAM J. Comput., 45(5):1762–1792, 2016.

[15] Venkatesan Guruswami, Prasad Raghavendra, Rishi Saket, and Yi Wu. Bypassing UGC from some optimal geometric inapproximability results. ACM Trans. Algorithms, 12:6:1–6:25, 2010.

[16] P. Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS '01, pages 10–, Washington, DC, USA, 2001. IEEE Computer Society.

[17] Ravi Kannan, Santosh Vempala, and David P. Woodruff. Principal component analysis and higher correlations for distributed data. In Proceedings of The 27th Conference on Learning Theory, COLT 2014, Barcelona, Spain, June 13-15, 2014, pages 1040–1057, 2014.

[18] Morteza Monemizadeh and David P. Woodruff. 1-pass relative-error Lp-sampling with applications. In SODA, 2010.

[19] S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005.

[20] Nariankadu D. Shyamalkumar and Kasturi R. Varadarajan.
Efficient subspace approximation algorithms. Discrete & Computational Geometry, 47(1):44–63, 2012.

[21] Zhao Song, David P. Woodruff, and Peilin Zhong. Low rank approximation with entrywise L1-norm error. CoRR, abs/1611.00898, 2016.

[22] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1–157, 2014.