{"title": "Learning the structure of manifolds using random projections", "book": "Advances in Neural Information Processing Systems", "page_first": 473, "page_last": 480, "abstract": "We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data.", "full_text": "Learning the structure of manifolds using random\n\nprojections\n\nYoav Freund \u2217\nUC San Diego\n\nSanjoy Dasgupta \u2020\n\nUC San Diego\n\nMayank Kabra\nUC San Diego\n\nNakul Verma\nUC San Diego\n\nAbstract\n\nWe present a simple variant of the k-d tree which automatically adapts to intrinsic\nlow dimensional structure in data.\n\n1 Introduction\n\nThe curse of dimensionality has traditionally been the bane of nonparametric statistics, as re\ufb02ected\nfor instance in convergence rates that are exponentially slow in dimension. An exciting way out of\nthis impasse is the recent realization by the machine learning and statistics communities that in many\nreal world problems the high dimensionality of the data is only super\ufb01cial and does not represent\nthe true complexity of the problem. In such cases data of low intrinsic dimension is embedded in a\nspace of high extrinsic dimension.\n\nFor example, consider the representation of human motion generated by a motion capture system.\nSuch systems typically track marks located on a tight-\ufb01tting body suit. The number of markers, say\nN, is set suf\ufb01ciently large in order to get dense coverage of the body. A posture is represented by a\n(3N)-dimensional vector that gives the 3D location of each of the N marks. However, despite this\nseeming high dimensionality, the number of degrees of freedom is relatively small, corresponding\nto the dozen-or-so joint angles in the body. The marker positions are more or less deterministic\nfunctions of these joint angles. 
Thus the data lie in R^{3N}, but on (or very close to) a manifold [4] of small dimension.\n\nIn the last few years, there has been an explosion of research investigating methods for learning in the context of low-dimensional manifolds. Some of this work (for instance, [2]) exploits the low intrinsic dimension to improve the convergence rate of supervised learning algorithms. Other work (for instance, [12, 11, 1]) attempts to find an embedding of the data into a low-dimensional space, thus finding an explicit mapping that reduces the dimensionality.\n\nIn this paper, we describe a new way of modeling data that resides in R^D but has lower intrinsic dimension d < D. Unlike many manifold learning algorithms, we do not attempt to find a single unified mapping from R^D to R^d. Instead, we hierarchically partition R^D into pieces in a manner that is provably sensitive to low-dimensional structure. We call this spatial data structure a random projection tree (RP tree). It can be thought of as a variant of the k-d tree that is provably manifold-adaptive.\n\nk-d trees, RP trees, and vector quantization\n\nRecall that a k-d tree [3] partitions R^D into hyperrectangular cells. It is built in a recursive manner, splitting along one coordinate direction at a time. The succession of splits corresponds to a binary tree whose leaves contain the individual cells in R^D. These trees are among the most widely used methods for spatial partitioning in machine learning and computer vision.\n\n\u2217 Corresponding author: yfreund@cs.ucsd.edu.\n\u2020 Dasgupta and Verma acknowledge the support of NSF, under grants IIS-0347646 and IIS-0713540.\n\nFigure 1: Left: A spatial partitioning of R^2 induced by a k-d tree with three levels. The dots are data vectors; each circle represents the mean of the vectors in one cell. Right: Partitioning induced by an RP tree.\n\nOn the left part of Figure 1 we illustrate a k-d tree for a set of vectors in R^2. 
The leaves of the tree partition R^D into cells; given a query point q, the cell containing q is identified by traversing down the k-d tree. Each cell can be thought of as having a representative vector: its mean, depicted in the figure by a circle. The partitioning together with these mean vectors defines a vector quantization (VQ) of R^2: a mapping from R^2 to a finite set of representative vectors (called a \u201ccodebook\u201d in the context of lossy compression methods). A good property of this tree-structured vector quantization is that a vector can be mapped efficiently to its representative. The design goal of VQ is to minimize the error introduced by replacing vectors with their representatives.\n\nWe quantify the VQ error by the average squared Euclidean distance between a vector in the set and the representative vector to which it is mapped. This error is closely related (in fact, proportional) to the average diameter of cells, that is, the average squared distance between pairs of points in a cell.1 As the depth of the k-d tree increases, the diameter of the cells decreases, and so does the VQ error. However, in high dimension, the rate of decrease of the average diameter can be very slow. In fact, as we show in the supplementary material, there are data sets in R^D for which a k-d tree requires D levels in order to halve the diameter. This slow rate of decrease of cell diameter is fine if D = 2 as in Figure 1, but it is disastrous if D = 1000: constructing 1000 levels of the tree requires 2^1000 data points! This problem is a real one that has been observed empirically: k-d trees are prone to a curse of dimensionality.\n\nWhat if the data have low intrinsic dimension? In general, k-d trees will not be able to benefit from this; in fact the bad example mentioned above has intrinsic dimension d = 1. But we show that a simple variant of the k-d tree does indeed decrease cell diameters much more quickly. 
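The average diameter just defined need not be computed from all pairs: the average squared interpoint distance equals twice the average squared distance to the mean. A quick numpy check of that identity (function names are ours, not from the paper):

```python
import numpy as np

def avg_diameter_sq(S):
    """Average squared interpoint distance: the O(n^2) definition."""
    diffs = S[:, None, :] - S[None, :, :]  # all pairwise differences
    return (diffs ** 2).sum() / len(S) ** 2

def avg_diameter_sq_fast(S):
    """Equivalent O(n) form: twice the average squared distance to the mean."""
    return 2.0 * ((S - S.mean(axis=0)) ** 2).sum() / len(S)

rng = np.random.default_rng(0)
S = rng.normal(size=(200, 5))
print(avg_diameter_sq(S), avg_diameter_sq_fast(S))  # the two values agree
```

The O(n) form is the one worth using inside a tree-building loop.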
Instead of splitting along coordinate directions, we use randomly chosen unit vectors, and instead of splitting data exactly at the median, we use a more carefully chosen split point. We call the resulting data structure a random projection tree (Figure 1, right) and we show that it admits the following theoretical guarantee (the formal statement is in the next section).\n\nPick any cell C in the RP tree, and suppose the data in C have intrinsic dimension d. Pick a descendant cell \u2265 d levels below; then with constant probability, this descendant has average diameter at most half that of C.2\n\nThere is no dependence at all on the extrinsic dimensionality (D) of the data. We thus have a vector quantization construction method for which the diameter of the cells depends on the intrinsic dimension, rather than the extrinsic dimension, of the data.\n\nA large part of the benefit of RP trees comes from the use of random unit directions, which is rather like running k-d trees with a preprocessing step in which the data are projected into a random low-dimensional subspace. In fact, a recent experimental study of nearest neighbor algorithms [8] observes that a similar pre-processing step improves the performance of nearest neighbor schemes based on spatial data structures. Our work provides a theoretical explanation for this improvement and shows both theoretically and experimentally that this improvement is significant.\n\n1 This is in contrast to the max diameter, the maximum distance between two vectors in a cell.\n2 Here the probability is taken over the randomness in constructing the tree. 
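The preprocessing step mentioned above (projecting the data into a random low-dimensional subspace before building the spatial index) can be sketched as follows; the Gaussian projection matrix and the choice k = 20 are our illustration, not prescribed by the paper:

```python
import numpy as np

def random_projection(X, k, rng):
    """Map rows of X from D dimensions into a random k-dimensional subspace.

    A Gaussian matrix scaled by 1/sqrt(k) approximately preserves pairwise
    distances (Johnson-Lindenstrauss); a spatial index can then be built on Y.
    """
    D = X.shape[1]
    P = rng.normal(size=(D, k)) / np.sqrt(k)
    return X @ P

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 1000))    # n = 500 points in D = 1000 dimensions
Y = random_projection(X, 20, rng)   # a k-d tree would now be built on Y
print(Y.shape)
```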
The explanation we provide is based on the assumption that the data have low intrinsic dimension.\n\nAnother spatial data structure based on random projections is the locality-sensitive hashing scheme [6].\n\nManifold learning and near neighbor search\n\nThe fast rate of diameter decrease in random projection trees has many consequences beyond the quality of vector quantization. In particular, the statistical theory of tree-based estimators \u2014 whether used for classification or regression \u2014 is centered around the rate of diameter decrease; for details, see for instance Chapter 20 of [7]. Thus RP trees generically exhibit faster convergence in all these contexts.\n\nAnother case of interest is nearest neighbor classification. If the diameter of cells is small, then it is reasonable to classify a query point according to the majority label in its cell. It is not necessary to find the nearest neighbor; after all, the only thing special about this point is that it happens to be close to the query. The classical work of Cover and Hart [5] on the Bayes risk of nearest neighbor methods applies equally to the majority vote in a small enough cell.\n\nFigure 2: Distributions with low intrinsic dimension. The purple areas in these figures indicate regions in which the density of the data is significant, while the complementary white areas indicate areas where data density is very low. The left figure depicts data concentrated near a one-dimensional manifold. The ellipses represent mean+PCA approximations to subsets of the data. Our goal is to partition data into small-diameter regions so that the data in each region is well-approximated by its mean+PCA. The right figure depicts a situation where the dimension of the data is variable. 
Some of the data lies close to a one-dimensional manifold, some of the data spans two dimensions, and some of the data (represented by the red dot) is concentrated around a single point (a zero-dimensional manifold).\n\nFinally, we return to our original motivation: modeling data which lie close to a low-dimensional manifold. In the literature, the most common way to capture this manifold structure is to create a graph in which nodes represent data points and edges connect pairs of nearby points. While this is a natural representation, it does not scale well to very large datasets, because the time to compute the closest neighbors grows like the square of the size of the data set. Our approach is fundamentally different. Instead of a bottom-up strategy that starts with individual data points and links them together to form a graph, we use a top-down strategy that starts with the whole data set and partitions it, in a hierarchical manner, into regions of smaller and smaller diameter. Once these individual cells are small enough, the data in them can be well-approximated by an affine subspace, for instance that given by principal component analysis. In Figure 2 we show how data in two dimensions can be approximated by such a set of local ellipses.\n\n2 The RP tree algorithm\n\n2.1 Spatial data structures\n\nIn what follows, we assume the data lie in R^D, and we consider spatial data structures built by recursive binary splits. They differ only in the nature of the split, which we define in a subroutine called CHOOSERULE. 
The core tree-building algorithm is called MAKETREE, and takes as input a data set S \u2282 R^D.\n\nprocedure MAKETREE(S)\n  if |S| < MinSize then\n    return (Leaf)\n  else\n    Rule \u2190 CHOOSERULE(S)\n    LeftTree \u2190 MAKETREE({x \u2208 S : Rule(x) = true})\n    RightTree \u2190 MAKETREE({x \u2208 S : Rule(x) = false})\n    return ([Rule, LeftTree, RightTree])\n\nA natural way to try building a manifold-adaptive spatial data structure is to split each cell along its principal component direction (for instance, see [9]).\n\nprocedure CHOOSERULE(S)\n  comment: PCA tree version\n  let u be the principal eigenvector of the covariance of S\n  Rule(x) := x \u00b7 u \u2264 median({z \u00b7 u : z \u2208 S})\n  return (Rule)\n\nThis method will do a good job of adapting to low intrinsic dimension (details omitted). However, it has two significant drawbacks in practice. First, estimating the principal eigenvector requires a significant amount of data; recall that only about a 1/2^k fraction of the data winds up at a cell at level k of the tree. Second, when the extrinsic dimension is high, the amount of memory and computation required to compute the dot products between the data vectors and the eigenvectors becomes the dominant part of the computation. As each node in the tree is likely to have a different eigenvector, this severely limits the feasible tree depth. We now show that using random projections overcomes these problems while maintaining the adaptivity to low intrinsic dimension.\n\n2.2 Random projection trees\n\nWe shall see that the key benefits of PCA-based splits can be realized much more simply, by picking random directions. To see this pictorially, consider data that is concentrated on a subspace, as in the following figure. PCA will of course correctly identify this subspace, and a split along the principal eigenvector u will do a good job of reducing the diameter of the data. 
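The contrast between a PCA split and a random-direction split is easy to probe numerically. The following sketch (ours, with made-up scales, not the paper's experiment) builds data concentrated along one direction in R^100 and compares the reduction in average squared diameter from a median split along the true direction versus a random one:

```python
import numpy as np

# Data spread along the single direction u, with tiny noise elsewhere.
rng = np.random.default_rng(0)
D, n = 100, 2000
u = np.zeros(D); u[0] = 1.0
S = rng.normal(size=(n, 1)) * 10.0 * u + 0.01 * rng.normal(size=(n, D))

def avg_diam_sq(S):
    # average squared diameter: twice the avg squared distance to the mean
    return 2.0 * ((S - S.mean(axis=0)) ** 2).sum() / len(S)

def split_ratio(S, v):
    # median split along direction v; returns (avg sq. diameter after) / (before)
    p = S @ v
    mask = p <= np.median(p)
    L, R = S[mask], S[~mask]
    after = (len(L) * avg_diam_sq(L) + len(R) * avg_diam_sq(R)) / len(S)
    return after / avg_diam_sq(S)

v = rng.normal(size=D); v /= np.linalg.norm(v)  # random unit direction
print(split_ratio(S, u), split_ratio(S, v))     # compare the two reductions
```

On data like this, the random direction typically captures enough of u for its median split to shrink the average diameter almost as much as the principal direction does.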
But a random direction v will also have some component in the direction of u, and splitting along the median of v will not be all that different from splitting along u.\n\nFigure 3: Intuition: a random direction is almost as good as the principal eigenvector.\n\nNow only medians need to be estimated, not principal eigenvectors; this significantly reduces the data requirements. Also, we can use the same random projection in different places in the tree; all we need is to choose a large enough set of projections that, with high probability, there is a good projection direction for each node in the tree. In our experience, setting the number of projections equal to the depth of the tree is sufficient. Thus, for a tree of depth k, we use only k projection vectors v, as opposed to 2^k with a PCA tree. When preparing data to train a tree, we can compute the k projection values before building the tree. This also reduces the memory requirements for the training set, as we can replace each high-dimensional data point with its k projection values (typically we use 10 \u2264 k \u2264 20).\n\nWe now define RP trees formally. For a cell containing points S, let \u2206(S) be the diameter of S (the distance between the two furthest points in the set), and \u2206_A(S) the average diameter, that is, the average distance between points of S:\n\n\u2206_A^2(S) = (1/|S|^2) \u03a3_{x,y \u2208 S} ||x \u2212 y||^2 = (2/|S|) \u03a3_{x \u2208 S} ||x \u2212 mean(S)||^2.\n\nWe use two different types of splits: if \u2206^2(S) is less than c \u00b7 \u2206_A^2(S) (for some constant c) then we use the hyperplane split discussed above. Otherwise, we split S into two groups based on distance from the mean.\n\nprocedure CHOOSERULE(S)\n  comment: RP tree version\n  if \u2206^2(S) \u2264 c \u00b7 \u2206_A^2(S) then\n    choose a random unit direction v\n    sort the projection values a(x) = v \u00b7 x for all x \u2208 S, generating the list a_1 \u2264 a_2 \u2264 ... \u2264 a_n\n    for i = 1, ..., n \u2212 1 compute\n      \u00b5_1 = (1/i) \u03a3_{j=1..i} a_j,   \u00b5_2 = (1/(n \u2212 i)) \u03a3_{j=i+1..n} a_j\n      c_i = \u03a3_{j=1..i} (a_j \u2212 \u00b5_1)^2 + \u03a3_{j=i+1..n} (a_j \u2212 \u00b5_2)^2\n    find the i that minimizes c_i and set \u03b8 = (a_i + a_{i+1})/2\n    Rule(x) := v \u00b7 x \u2264 \u03b8\n  else\n    Rule(x) := ||x \u2212 mean(S)|| \u2264 median({||z \u2212 mean(S)|| : z \u2208 S})\n  return (Rule)\n\nIn the first type of split, the data in a cell are projected onto a random direction and an appropriate split point is chosen. This point is not necessarily the median (as in k-d trees), but rather the position that maximally decreases the average squared interpoint distance. In panel 4 of Figure 4, for instance, splitting the bottom cell at the median would lead to a messy partition, whereas the RP tree split produces two clean, connected clusters.\n\nFigure 4: An illustration of the RP tree algorithm. 1: The full data set and the PCA ellipse that approximates it. 2: The first-level split. 3: The two PCA ellipses corresponding to the two cells after the first split. 4: The two splits in the second level. 5: The four PCA ellipses for the cells at the third level. 6: The four splits at the third level. As the cells get smaller, their individual PCAs reveal 1D manifold structure. 
Note: the ellipses are for comparison only; the RP tree algorithm does not look at them.\n\nThe second type of split, based on distance from the mean of the cell, is needed to deal with cases in which the cell contains data at very different scales. In Figure 2, for instance, suppose that the vast majority of data is concentrated at the singleton \u201c0-dimensional\u201d point. If only splits by projection were allowed, then a large number of splits would be devoted to uselessly subdividing this point mass. The second type of split separates it from the rest of the data in one go. For a more concrete example, suppose that the data are image patches. A large fraction of them might be \u201cempty\u201d background patches, in which case they\u2019d fall near the center of the cell in a very tight cluster. The remaining image patches will be spread out over a much larger space. The effect of the split is then to separate out these two clusters.\n\n2.3 Theoretical foundations\n\nIn analyzing RP trees, we consider a statistical notion of dimension: we say set S has local covariance dimension (d, \u03b5) if a (1 \u2212 \u03b5) fraction of the variance is concentrated in a d-dimensional subspace. To make this precise, start by letting \u03c3_1^2 \u2265 \u03c3_2^2 \u2265 ... \u2265 \u03c3_D^2 denote the eigenvalues of the covariance matrix; these are the variances in each of the eigenvector directions.\n\nDefinition 1. S \u2282 R^D has local covariance dimension (d, \u03b5) if the largest d eigenvalues of its covariance matrix satisfy \u03c3_1^2 + ... + \u03c3_d^2 \u2265 (1 \u2212 \u03b5) \u00b7 (\u03c3_1^2 + ... + \u03c3_D^2). (Note that \u03c3_1^2 + ... + \u03c3_D^2 = (1/2) \u2206_A^2(S).)\n\nNow, suppose an RP tree is built from a data set S \u2282 R^D, not necessarily finite. Recall that there are two different types of splits; let\u2019s call them splits by distance and splits by projection.\n\nTheorem 2. There are constants 0 < c_1, c_2, c_3 < 1 with the following property. Suppose an RP tree is built using data set S \u2282 R^D. Consider any cell C for which S \u2229 C has local covariance dimension (d, \u03b5), where \u03b5 < c_1. Pick a point x \u2208 S \u2229 C at random, and let C\u2032 be the cell that contains it at the next level down.\n\n\u2022 If C is split by distance, then E[\u2206(S \u2229 C\u2032)] \u2264 c_2 \u00b7 \u2206(S \u2229 C).\n\u2022 If C is split by projection, then E[\u2206_A^2(S \u2229 C\u2032)] \u2264 (1 \u2212 c_3/d) \u00b7 \u2206_A^2(S \u2229 C).\n\nIn both cases, the expectation is over the randomization in splitting C and the choice of x \u2208 S \u2229 C.\n\nAs a consequence, the expected average diameter of cells is halved every O(d) levels. The proof of this theorem is in the supplementary material, along with even stronger results for different notions of dimension.\n\n3 Experimental Results\n\n3.1 A streaming version of the algorithm\n\nThe version of the RP algorithm we use in practice differs from the one above in three ways. First of all, both splits operate on the projected data; for the second type of split (split by distance), data that fall in an interval around the median are separated from data outside that interval. Second, the tree is built in a streaming manner: that is, the data arrive one at a time, and are processed (to update the tree) and immediately discarded. This is managed by maintaining simple statistics at each internal node of the tree and updating them appropriately as the data stream by (more details in the supplementary material). The resulting efficiency is crucial to the large-scale applications we have in mind. 
Finally, instead of choosing a new random projection in each cell, a dictionary of a few random projections is chosen at the outset. In each cell, every one of these projections is tried out and the best one (the one that gives the largest decrease in \u2206_A^2(S)) is retained. This last step has the effect of boosting the probability of a good split.\n\n3.2 Synthetic datasets\n\nWe start by considering two synthetic datasets that illustrate the shortcomings of k-d trees. We will see that RP trees adapt well to such cases. For the first dataset, points x_1, ..., x_n \u2208 R^D are generated by the following process: for each point x_i,\n\n\u2022 choose p_i uniformly at random from [0, 1], and\n\u2022 select each coordinate x_ij independently from N(p_i, 1).\n\nFor the second dataset, we choose n points from two D-dimensional Gaussians (with equal probability) with means at (\u22121, \u22121, ..., \u22121) and (1, 1, ..., 1), and identity covariances.\n\nWe compare the performance of different trees according to the average VQ error they incur at various levels. We consider four types of trees: (1) k-d trees in which the coordinate for a split is chosen at random; (2) k-d trees in which at each split, the best coordinate is chosen (the one that most improves VQ error); (3) RP trees; and (4) for reference, PCA trees.\n\nFigure 5 shows the results for the two datasets (D = 1,000 and n = 10,000) averaged over 15 runs. In both cases, RP trees outperform both k-d tree variants and come close to the performance of PCA trees without having to explicitly compute any principal components.\n\nFigure 5: Performance of RP trees and k-d trees on the first synthetic dataset (left) and the second synthetic dataset (right).\n\n3.3 MNIST dataset\n\nWe next demonstrate RP trees on the familiar MNIST dataset of handwritten digits. 
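For reference, the two synthetic datasets described above can be generated in a few lines of numpy (a sketch with our own function names, using smaller n and D than the paper's n = 10,000, D = 1,000 experiment):

```python
import numpy as np

def dataset_one(n, D, rng):
    """Per point: draw p_i ~ Uniform[0,1], then every coordinate ~ N(p_i, 1)."""
    p = rng.uniform(0.0, 1.0, size=n)
    return rng.normal(loc=p[:, None], scale=1.0, size=(n, D))

def dataset_two(n, D, rng):
    """Equal-probability mixture of N((-1,...,-1), I) and N((1,...,1), I)."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return rng.normal(loc=signs[:, None], scale=1.0, size=(n, D))

rng = np.random.default_rng(0)
X1 = dataset_one(1000, 100, rng)
X2 = dataset_two(1000, 100, rng)
print(X1.shape, X2.shape)
```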
This dataset consists of 28 \u00d7 28 grayscale images of the digits zero through nine, and is believed to have low intrinsic dimension (for instance, see [10]). We restrict our attention to digit 1 for this discussion.\n\nFigure 6 (top) shows the first few levels of the RP tree for the images of digit 1. Each node is represented by the mean of the datapoints falling into that cell. Hence, the topmost node shows the mean of the entire dataset; its left and right children show the means of the points belonging to their respective partitions, and so on. The bar underneath each node shows the fraction of points going to the left and to the right, to give a sense of how balanced each split is. Alongside each mean, we also show a histogram of the 20 largest eigenvalues of the covariance matrix, which reveals how closely the data in the cell are concentrated near a low-dimensional subspace. The last bar in the histogram is the variance unaccounted for.\n\nNotice that most of the variance lies in a small number of directions, as might be expected. And this rapidly becomes more pronounced as we go further down the tree. Hence, very quickly, the cell means become good representatives of the dataset: an experimental corroboration that RP trees adapt to the low intrinsic dimension of the data.\n\nThis is also brought out in Figure 6 (bottom), where the images are shown projected onto the plane defined by their top two principal components. (The outer ring of images corresponds to the linear combinations of the two eigenvectors at those locations in the plane.) The left image shows how the data were split at the topmost level (dark versus light). 
Observe that this random cut is actually quite close to what the PCA split would have been, corroborating our earlier intuition (recall Figure 3). The right image shows the same thing, but for the first two levels of the tree: data are shown in four colors corresponding to the four different cells.\n\nFigure 6: Top: Three levels of the RP tree for MNIST digit 1. Bottom: Images projected onto the first two principal components. Colors represent different cells in the RP tree, after just one split (left) or after two levels of the tree (right).\n\nReferences\n\n[1] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373\u20131396, 2003.\n\n[2] M. Belkin, P. Niyogi, and V. Sindhwani. On manifold regularization. Conference on AI and Statistics, 2005.\n\n[3] J. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509\u2013517, 1975.\n\n[4] W. Boothby. An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic Press, 2003.\n\n[5] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21\u201327, 1967.\n\n[6] M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni. Locality sensitive hashing scheme based on p-stable distributions. Symposium on Computational Geometry, 2004.\n\n[7] L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1996.\n\n[8] T. Liu, A. Moore, A. Gray, and K. Yang. An investigation of practical approximate nearest neighbor algorithms. Advances in Neural Information Processing Systems, 2004.\n\n[9] J. McNames. 
A fast nearest neighbor algorithm based on a principal axis search tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9):964\u2013976, 2001.\n\n[10] M. Raginsky and S. Lazebnik. Estimation of intrinsic dimensionality using high-rate vector quantization. Advances in Neural Information Processing Systems, 18, 2006.\n\n[11] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323\u20132326, 2000.\n\n[12] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319\u20132323, 2000.\n", "award": [], "sourceid": 133, "authors": [{"given_name": "Yoav", "family_name": "Freund", "institution": null}, {"given_name": "Sanjoy", "family_name": "Dasgupta", "institution": null}, {"given_name": "Mayank", "family_name": "Kabra", "institution": null}, {"given_name": "Nakul", "family_name": "Verma", "institution": null}]}