{"title": "Random Projections with Asymmetric Quantization", "book": "Advances in Neural Information Processing Systems", "page_first": 10858, "page_last": 10867, "abstract": "The method of random projection has been a popular tool for data compression,\nsimilarity search, and machine learning. In many practical scenarios, applying\nquantization on randomly projected data could be very helpful to further reduce\nstorage cost and facilitate more efficient retrievals, while only suffering from\nlittle loss in accuracy. In real-world applications, however, data collected from\ndifferent sources may be quantized under different schemes, which calls for a need to study the asymmetric quantization problem. In this paper, we investigate the cosine similarity estimators derived in such setting under the Lloyd-Max (LM)\nquantization scheme. We thoroughly analyze the biases and variances of a series of estimators including the basic simple estimators, their normalized versions, and\ntheir debiased versions. Furthermore, by studying the monotonicity, we show that\nthe expectation of proposed estimators increases with the true cosine similarity,\non a broader family of stair-shaped quantizers. Experiments on nearest neighbor\nsearch justify the theory and illustrate the effectiveness of our proposed estimators.", "full_text": "Random Projections with Asymmetric Quantization\n\nXiaoyun Li\n\nxiaoyun.li@rutgers.edu\n\nDepartment of Statistics\n\nRutgers University\n\nPiscataway, NJ 08854\n\nPing Li\n\nCognitive Computing Lab\n\nBaidu Research USA\nBellevue, WA 98004\nliping11@baidu.com\n\nAbstract\n\nThe method of random projection has been a popular tool for data compression,\nsimilarity search, and machine learning. In many practical scenarios, applying\nquantization on randomly projected data could be very helpful to further reduce\nstorage cost and facilitate more ef\ufb01cient retrievals, while only suffering from\nlittle loss in accuracy. 
In real-world applications, however, data collected from\ndifferent sources may be quantized under different schemes, which calls for a need\nto study the asymmetric quantization problem. In this paper, we investigate the\ncosine similarity estimators derived in such setting under the Lloyd-Max (LM)\nquantization scheme. We thoroughly analyze the biases and variances of a series of\nestimators including the basic simple estimators, their normalized versions, and\ntheir debiased versions. Furthermore, by studying the monotonicity, we show that\nthe expectation of proposed estimators increases with the true cosine similarity,\non a broader family of stair-shaped quantizers. Experiments on nearest neighbor\nsearch justify the theory and illustrate the effectiveness of our proposed estimators.\n\nIntroduction\n\n1\nThe method of random projections (RP) [35] has become a popular technique to reduce data dimen-\nsionality while preserving distances between data points, as guaranteed by the celebrated Johnson-\nLindenstrauss (J-L) Lemma and variants [24, 12, 1]. Given a high dimensional dataset, the algorithm\nprojects each data point onto a lower-dimensional random subspace. There is a very rich literature of\nresearch on the theory and applications of random projections, such as clustering, classi\ufb01cation, near\nneighbor search, bio-informatics, compressed sensing, etc. [22, 10, 4, 6, 8, 17, 18, 28, 15, 7, 19, 11, 9].\nIn recent years, \u201crandom projections + quantization\u201d has been an active research topic. That is, the\nprojected data, which are in general real-valued (i.e., in\ufb01nite precision), are quantized into integers in\na small number of bits. 
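As a minimal sketch of this "random projections + quantization" pipeline (all names here are illustrative; the 1-bit sign quantizer with its classical collision-based cosine estimate is used for brevity, not the estimators developed in this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 4096                      # original dimension, number of projections

# a pair of unit-norm vectors with high cosine similarity
u = rng.standard_normal(d)
v = 0.8 * u + 0.2 * rng.standard_normal(d)
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)
rho = float(u @ v)                     # true cosine similarity

R = rng.standard_normal((d, k))        # i.i.d. Gaussian projection matrix
x, y = u @ R, v @ R                    # projected (real-valued) data

# quantize to 1 bit per projected value; estimate rho from sign collisions,
# using Pr(signs agree) = 1 - arccos(rho) / pi
agree = np.mean(np.sign(x) == np.sign(y))
rho_hat = np.cos(np.pi * (1.0 - agree))
print(rho, rho_hat)                    # rho_hat approaches rho for large k
```

Storing the signs costs one bit per projection instead of a full float, which is exactly the storage/accuracy trade-off discussed below.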
Applying quantization on top of random projections has at least two major advantages: (i) the storage cost is further (substantially) reduced; and (ii) some important applications, such as hashing-table-based near neighbor search, require quantized data for indexing purposes. The pioneering example of quantized random projections is the so-called "1-bit" (sign) random projection, initially used for analyzing the MaxCut problem [20] and later adopted for near neighbor search [8] and compressed sensing [5, 23, 25]. As one would expect, storing merely 1 bit per projected value may in many situations suffer a substantial loss of accuracy, compared to using random projections with full (infinite) precision. There have been various studies on (symmetrically) quantized random projections beyond the 1-bit scheme, e.g., [13, 37, 26, 29, 31].

In this paper, we move further to studying "asymmetric quantization" of random projections, a relatively new problem which arises from practical scenarios and is also mathematically very interesting. Data collection takes place everywhere, yet it is often impractical to impose a universal encoding strategy on every data source. Consequently, it is a meaningful task to study estimation problems where data are encoded by different algorithms, namely, the asymmetric case. In this paper, we provide some insights on problems of this type; in particular, we consider recovering inner products from asymmetrically quantized random projections, motivated by the following two practical scenarios.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

• Scenario 1: quantization vs. full-precision. Consider, for example, a retrieval system which uses random projections to process every data vector.
To save storage, the projected data stored in the repository are quantized into a small number of bits. When a query data vector arrives, it is first processed by random projections. We then have the option of quantizing the projected query vector before conducting the similarity search (with vectors in the repository); but we do not have to do the quantization step, since we still have the projected query vector in full precision (why waste it?). This situation hence creates the "quantization vs. full-precision" estimation problem. This setting is natural and practical, and the estimation problem has been studied in the literature, for example [14, 21, 27].

• Scenario 2: quantization with different bits. In applications such as large ad hoc networks [36, 30], data are collected and processed by different nodes (e.g., sensors or mobile devices) at different locations before being sent to the central unit or cloud server. However, distinct nodes may use different quantization methods (or different numbers of bits) for many possible reasons, e.g., memory capacity or purpose of data usage. In this situation, information retrieval among data sources using different quantization schemes may well be needed. As a tightly related topic, asymmetric distributed source coding (with different bits from different sources) has also been considered in [3, 34], among others, for sensor networks.

Scenario 1 is in fact an important special case of Scenario 2, where one source of data is quantized with infinite bits. In this paper, we provide a thorough statistical analysis of the above two scenarios.

Our contributions. The major contributions of this paper include the following:

• In Section 3, we provide the bias and variance of linear and normalized inner product estimators in Scenario 1.
We reveal an interesting connection between the variance of the debiased inner product estimator and similarity search, which is very helpful in practice.

• In Sections 4 and 5, we conduct statistical analysis in Scenario 2, and prove the monotonicity of a large family of asymmetric quantized inner product estimators, which assures their validity for practical use. A new bound on the bias is also derived in the symmetric case.

• In Section 6, an empirical study on various real-world datasets confirms the theoretical findings and illustrates the effectiveness of the proposed quantization schemes.

2 Preliminaries

Random Projections. Let U = [u1, ..., un]^T ∈ R^{n×d} be the original data matrix (with d possibly being large). Random projections are realized by Z = [z1, ..., zn]^T = U × R, where R ∈ R^{d×k}, k ≪ d, is a random matrix with i.i.d. standard Gaussian entries. Let ‖·‖2 denote the l2 (Euclidean) norm. Throughout this paper, we assume that every data point is normalized to unit norm¹, i.e., ‖ui‖2 = 1, 1 ≤ i ≤ n. We will hence use the terms "inner product" and "cosine similarity" interchangeably. For the convenience of presentation, our results (estimators and properties) will be given for a pair of data vectors, ui and uj (and correspondingly zi and zj). Let ρ = ⟨ui, uj⟩ be the inner product between ui and uj. We also denote x = zi and y = zj. It is then easy to verify that each coordinate pair (x, y) is bivariate normal:

    (x, y)^T ∼ N( (0, 0)^T , [[1, ρ], [ρ, 1]] ).    (1)

Lloyd-Max (LM) quantization [33, 32]. Assume a random signal model with signals generated from a probability distribution with density X ∼ f.
An M-level scalar quantizer qM(·) is specified by M + 1 decision borders t0 < t1 < ··· < tM and M reconstruction levels (or codes) µi, i = 1, ..., M, with the quantizing operator defined as

    qM(x) = µ_{i*},  i* = {i : t_{i−1} < x ≤ t_i, 1 ≤ i ≤ M}.    (2)

¹Normalizing each data vector to the unit norm is a standard data preprocessing procedure for many applications such as clustering and classification. In this paper, we adopt this assumption merely for convenience. When data is not normalized, our results still hold, although we will need to store the values of the norms.

The "distortion" is an important quantity that measures how much information is lost from the original signal due to quantization. In this paper, we will also assume M = 2^b, with b = 1, 2, ..., being the number of bits used for the quantizer. Thus, we will write qb(·) instead of qM(·).

Definition 1. The distortion of a b-bit quantizer qb(·) with respect to distribution f is defined as

    E((X − qb(X))²) = ∫ (x − qb(x))² f(x) dx = Σ_{i=1}^{2^b} ∫_{t_{i−1}}^{t_i} (x − µi)² f(x) dx.    (3)

In this paper, f is the standard normal, i.e., f(x) = φ(x) = (1/√(2π)) e^{−x²/2} in the conventional notation for Gaussian. Also, we will use Qb to denote the Lloyd-Max (LM) quantizer which minimizes the distortion, and Db to denote the corresponding value of the distortion:

    Qb = argmin_q E((X − q(X))²),  Db = E((X − Qb(X))²).    (4)

A basic identity of the LM quantizer is that E(Qb(X)²) = E(Qb(X)X). In practice, Lloyd's algorithm [32] is used to find the solution, which alternates between updating the borders and the reconstruction points until convergence (and the convergence is guaranteed).

Estimates using full-precision RP's. Consider observations (xi, yi)^T i.i.d. ∼ N( (0, 0)^T , [[1, ρ], [ρ, 1]] ), 1 ≤ i ≤ k, as in (1). The task is to estimate ρ. One can use the usual simple estimator

    ρ̂f = (1/k) Σ_{i=1}^k xi yi,  with E(ρ̂f) = ρ,  Var(ρ̂f) = (1 + ρ²)/k,    (5)

where E(ρ̂) is the expectation and Var(ρ̂) is the variance. Note that the variance grows as |ρ| increases. One can take advantage of the following so-called "normalized estimator":

    ρ̂f,n = Σ_{i=1}^k xi yi / ( √(Σ_{i=1}^k xi²) √(Σ_{i=1}^k yi²) ),  E(ρ̂f,n) = ρ + O(1/k),  Var(ρ̂f,n) = (1 − ρ²)²/k + O(1/k²).    (6)

ρ̂f,n is nearly unbiased and it substantially reduces the variance, especially near the two extreme points ρ = ±1. We refer readers to the classical textbook [2] and recent papers [28, 27] for more details.

Estimates using symmetric LM quantized RP's. The work [29] studies the inner product estimator under the LM quantization scheme, analyzing the biases and variances of estimators in the symmetric case. That is, the observations xi and yi are quantized by the same LM scheme with the same number of bits (b). In this paper, we study the asymmetric setting, using b1 bits for quantizing xi and b2 bits for yi. The work of [29] is thus a special case of our results (i.e., b1 = b2). Interestingly, our analysis also leads to a more refined bound on the estimation bias in the symmetric case compared to the corresponding bound in [29]. See Section 4 for the detailed results.

3 Scenario 1: Quantization vs. Full-precision

Recall that we have i.i.d. observations {xi, yi}, i = 1, 2, ..., k, from a standard bivariate normal with xi ∼ N(0, 1), yi ∼ N(0, 1), and E(xi yi) = ρ.
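Lloyd's algorithm described above can be checked numerically with a short Monte-Carlo sketch (the sample size, quantile-based initialization, and iteration count are arbitrary choices). Since E(X²) = 1 for the standard normal, the sketch also verifies the basic identity E(Qb(X)X) = E(Qb(X)²) = 1 − Db:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)       # samples from f = N(0, 1)

def lloyd_max(x, b, iters=60):
    """Fit a 2^b-level quantizer by Lloyd's algorithm: alternate border
    updates (midpoints of the levels) and level updates (conditional
    means of each cell)."""
    M = 2 ** b
    mu = np.quantile(x, (np.arange(M) + 0.5) / M)   # initial levels
    for _ in range(iters):
        t = (mu[:-1] + mu[1:]) / 2                  # decision borders
        cell = np.searchsorted(t, x)                # cell index per sample
        mu = np.array([x[cell == i].mean() for i in range(M)])
    return mu[cell]                                 # Q_b(x) for each sample

for b in (1, 2, 3):
    Qx = lloyd_max(x, b)
    Db = np.mean((x - Qx) ** 2)                     # empirical distortion
    # basic LM identity: E(Q_b(X) X) = E(Q_b(X)^2) = 1 - D_b
    print(b, np.mean(Qx * x), 1.0 - Db)
```

For b = 1, 2, 3 the printed values of 1 − Db should approximate 0.6366, 0.8825, 0.9655, the constants used repeatedly in the analysis below.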
In this section, we study Scenario 1: quantization vs. full-precision. That is, we quantize xi with b bits and leave yi intact. The task is to estimate ρ from {Qb(xi), yi}, i = 1, 2, ..., k. We can still use a simple estimator similar to (5):

    ρ̂b,f = (1/k) Σ_{i=1}^k Qb(xi) yi.    (7)

As one would expect, this estimator ρ̂b,f is no longer unbiased. We can show that E(ρ̂b,f) = ξ1,1 ρ. Hence, we can attempt to remove the bias by using the following "debiased estimator":

    ρ̂db_{b,f} = ρ̂b,f / ξ1,1 = (1/(k ξ1,1)) Σ_{i=1}^k Qb(xi) yi.    (8)

We will need to define ξ1,1. More generally, and analogous to the notation in [29], we define

    γα,β = E(Qb(x)^α y^β),  ξα,β = E(Qb(x)^α x^β).    (9)

That is, ξ1,1 = E(Qb(x)x). Note that ξα,β can be represented by γα,β, but we use both for convenience. Also note that ξ1,1 = ξ2,0 = 1 − Db from the definitions. For b = 1, 2, 3, 4, ∞, we can compute ξ1,1 = 0.6366, 0.8825, 0.9655, 0.9905, 1, respectively (keeping four decimal digits). In fact, it is also known that Db ≈ (√3 π / 2) 2^{−2b}, i.e., the bias decays at the rate O(2^{−2b}). In the following, Theorem 1 summarizes the expectations and variances of the two estimators ρ̂b,f and ρ̂db_{b,f}.

Theorem 1.

    E(ρ̂b,f) = ξ1,1 ρ,    (10)

    Var(ρ̂b,f) = Vb,f / k,  with Vb,f = (ξ2,2 − ξ2,0 − ξ1,1²) ρ² + ξ2,0,    (11)

    E(ρ̂db_{b,f}) = ρ,  Var(ρ̂db_{b,f}) = Vdb_{b,f} / k,  with Vdb_{b,f} = ((ξ2,2 − ξ2,0 − ξ1,1²) ρ² + ξ2,0) / ξ1,1².    (12)

Normalized Estimator.
We also attempt to take advantage of the (beneficial) effect of normalization by defining two normalized estimators, whose expectations and variances are summarized in Theorem 2.

Theorem 2. As k → ∞, we have

    ρ̂b,f,n = Σ_{i=1}^k Qb(xi) yi / ( √(Σ_{i=1}^k Qb(xi)²) √(Σ_{i=1}^k yi²) ),  E(ρ̂b,f,n) = √ξ1,1 ρ + O(1/k),    (13)

    Var(ρ̂b,f,n) = Vb,f,n / k + O(1/k²),  with Vb,f,n = ( γ4,0/(4γ2,0) + (3/4)γ2,0 + (1/2)γ2,2 ) ρ² − ( γ3,1/γ2,0 + γ1,3 ) ρ + γ2,2/γ2,0,    (14)

    ρ̂db_{b,f,n} = ρ̂b,f,n / √ξ1,1,  E(ρ̂db_{b,f,n}) = ρ + O(1/k),    (15)

    Var(ρ̂db_{b,f,n}) = Vdb_{b,f,n} / k + O(1/k²),  with Vdb_{b,f,n} = Vb,f,n / ξ1,1.    (16)

3.1 Benefits of normalized estimators and debiased estimators

Figure 1 plots (in the left two panels) the variances of the two debiased estimators ρ̂db_{b,f} and ρ̂db_{b,f,n}, to illustrate the benefits of normalization. The right panel of Figure 1 demonstrates that the variance of the normalized estimator is always smaller, and substantially so as ρ moves away from zero.

Figure 1: Scenario 1: Comparisons of theoretical variances between the two (debiased) estimators ρ̂db_{b,f} and ρ̂db_{b,f,n} for b = 1, 2, 3, 4, 5, ∞. Left panel: the variance factor Vdb_{b,f}. Middle panel: the variance factor Vdb_{b,f,n} (for the normalized estimator). Right panel: the variance ratio Vdb_{b,f} / Vdb_{b,f,n}.

To elaborate on the benefit of debiased estimators, we evaluate the mean square errors (MSE): bias² + variance.
Given the benefit of normalization, we consider the two normalized estimators:

    MSE(ρ̂b,f,n) = (1 − √ξ1,1)² ρ² + Vb,f,n / k + O(1/k²),  MSE(ρ̂db_{b,f,n}) = Vb,f,n / (ξ1,1 k) + O(1/k²).

Thus, to compare their mean square errors, we can examine the ratio ξ1,1 + k ρ² ξ1,1 (1 − √ξ1,1)² / Vb,f,n, which quickly becomes larger than 1 as k increases. Note that ξ1,1 ≤ 1, but it is very close to 1 when b ≥ 3. In summary, the MSE of the debiased estimator quickly becomes smaller as k increases.

3.2 Analysis of mis-ordering probabilities in similarity search

In similarity search, the estimates of inner products are subsequently used for ordering data vectors to identify the nearest neighbor of a given query. Intuitively, a more accurate estimator should provide a more accurate ordering, but a precise analysis is needed for the "mis-ordering" probabilities.

Definition 2. Suppose u1, u2, u3 ∈ R^d are three data points (with u1 being a query) with unit norm and pair-wise cosine similarities ρ12, ρ13 and ρ23, respectively. For an estimator ρ̂, the probability of mis-ordering is defined as

    PM(u1; u2, u3) = Pr(ρ̂12 > ρ̂13 | ρ12 < ρ13).

Consider the case where u3 is the nearest point to u1 in the data space (which implies ρ12 < ρ13). If the estimation gives ρ̂12 > ρ̂13, we then make the wrong decision that u3 is not the nearest neighbor of u1.

Theorem 3.
(Asymptotic mis-ordering) Suppose u1, u2, u3 ∈ R^d are three data points (with u1 being the query) on the unit sphere with pair-wise inner products ρ12, ρ13 and ρ23, respectively. Consider two estimators ρ̂ and ρ̂′ based on k random projections such that as k → ∞, the normality ρ̂ ∼ N(αρ, σ̂ρ²) and ρ̂′ ∼ N(α′ρ, σ̂′ρ²) holds, with constants α, α′ > 0. Denote δρ² = σ̂ρ²/α², δ′ρ² = σ̂′ρ²/α′², and the correlations C = corr(ρ̂12, ρ̂13), C′ = corr(ρ̂′12, ρ̂′13), respectively. If

    δ′_{ρ12} = a δ_{ρ12},  δ′_{ρ13} = a′ δ_{ρ13},  C − a a′ C′ < ( (1 − a²) δ²_{ρ12} + (1 − a′²) δ²_{ρ13} ) / ( 2 δ_{ρ12} δ_{ρ13} ),    (17)

for some 0 < a < 1, 0 < a′ < 1, then as k → ∞ we have P̂M(u1; u2, u3) > P̂′M(u1; u2, u3), where P̂M(u1; u2, u3) and P̂′M(u1; u2, u3) are the mis-ordering probabilities of ρ̂ and ρ̂′, respectively.

Remark. There is an interesting connection with the variances of the aforementioned "debiased estimators". Condition (17) basically assumes that the variance of the debiased ρ̂′ is smaller than that of the debiased ρ̂ at ρ12 and ρ13, by factors a and a′ respectively. In the special case where a = a′ and C = C′, the last constraint in (17) reduces to C < (δ²_{ρ12} + δ²_{ρ13}) / (2 δ_{ρ12} δ_{ρ13}), which always holds since the right-hand side is greater than 1.
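A quick Monte-Carlo illustration of Definition 2 (a sketch; the Gram matrix, k, and the number of trials are arbitrary choices, and the 1-bit sign estimator stands in for a generic higher-variance quantized estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
k, trials = 128, 5000
# Gram matrix of (u1, u2, u3) with rho12 = 0.5 < rho13 = 0.7
Sigma = np.array([[1.0, 0.5, 0.7],
                  [0.5, 1.0, 0.6],
                  [0.7, 0.6, 1.0]])
L = np.linalg.cholesky(Sigma)

def mis_order_rate(estimate):
    """Fraction of trials with rho12_hat > rho13_hat (a mis-ordering)."""
    bad = 0
    for _ in range(trials):
        z = rng.standard_normal((k, 3)) @ L.T   # k projected triples
        bad += estimate(z[:, 0], z[:, 1]) > estimate(z[:, 0], z[:, 2])
    return bad / trials

full_prec = lambda a, b: np.mean(a * b)                   # estimator (5)
one_bit = lambda a, b: np.mean(np.sign(a) * np.sign(b))   # monotone in rho
p_full, p_1bit = mis_order_rate(full_prec), mis_order_rate(one_bit)
print(p_1bit, p_full)   # the higher-variance 1-bit estimator mis-orders more
```

This is consistent with the message of Theorem 3: the estimator with the larger (debiased) variance has the larger mis-ordering probability.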
Also, note that, by the Central Limit Theorem, the normality assumption holds for all the estimators discussed in this paper.

Although Theorem 3 is asymptotic, it provides valuable insights in the finite-sample case, since statistically a sufficiently large k yields a good approximation to the normal distribution. The important message of Theorem 3 is that estimators with lower "debiased variance" (δ) tend to have lower mis-ordering probability, which leads to a more accurate identification of nearest neighbors in the original data space. This can be very useful in numerous real-world applications.

4 Scenario 2: Quantization with Different Bits

We now consider the more general case (Scenario 2) where the data vectors are LM quantized with different numbers of bits. That is, given observations {xi, yi}, 1 ≤ i ≤ k, we quantize xi using b1 bits and yi using b2 bits. Without loss of generality, we assume b1 < b2. Furthermore, we denote the two Lloyd-Max quantizers as Qb1 and Qb2, with distortions Db1 and Db2, respectively. Similar to Scenario 1, we define the asymmetric estimator and the corresponding normalized estimator as

    ρ̂b1,b2 = (1/k) Σ_{i=1}^k Qb1(xi) Qb2(yi),  ρ̂b1,b2,n = Σ_{i=1}^k Qb1(xi) Qb2(yi) / ( √(Σ_{i=1}^k Qb1(xi)²) √(Σ_{i=1}^k Qb2(yi)²) ),    (18)

where we also define ζα,β = E(Qb1(x)^α Qb2(y)^β) and γα,β = E(Qb2(x)^α x^β). As one might expect, the analysis becomes somewhat more difficult.
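The two estimators in (18) can be sketched directly (the hard-coded borders and levels are the standard tabulated Lloyd-Max codebooks for N(0,1), rounded to four decimals; k and ρ are arbitrary choices):

```python
import numpy as np

# Lloyd-Max codebooks for N(0,1): (borders, levels); standard tabulated
# values, rounded to 4 decimals (1 - D_1 = 0.6366, 1 - D_2 = 0.8825)
LM = {1: (np.array([0.0]), np.array([-0.7979, 0.7979])),
      2: (np.array([-0.9816, 0.0, 0.9816]),
          np.array([-1.5104, -0.4528, 0.4528, 1.5104]))}

def Q(x, b):
    t, mu = LM[b]
    return mu[np.searchsorted(t, x)]

rng = np.random.default_rng(0)
k, rho = 20_000, 0.5
z1, z2 = rng.standard_normal((2, k))
x = z1
y = rho * z1 + np.sqrt(1 - rho ** 2) * z2      # corr(x, y) = rho

qx, qy = Q(x, 1), Q(y, 2)                      # asymmetric: b1 = 1, b2 = 2
rho_simple = np.mean(qx * qy)                  # basic estimator in (18)
rho_norm = qx @ qy / (np.linalg.norm(qx) * np.linalg.norm(qy))
# debiased surrogate recommended later in this section:
rho_db = rho_simple / (0.6366 * 0.8825)
print(rho_simple, rho_norm, rho_db)
```

The basic estimator is visibly shrunk toward zero, roughly by the factor (1 − Db1)(1 − Db2), while the debiased surrogate is close to the true ρ.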
Similar to the analysis for Scenario 1, in this section we will use the following notation, in addition to ζα,β and γα,β introduced above:

    ξα,β = E(Qb1(x)^α x^β),    (19)

which allows us to express the expectation and variance of ρ̂b1,b2 as follows:

    E(ρ̂b1,b2) = ζ1,1,  Var(ρ̂b1,b2) = Vb1,b2 / k,  with Vb1,b2 = ζ2,2 − ζ1,1².    (20)

ζ1,1 can be expressed as an infinite sum, but it appears difficult to simplify further. Nevertheless, we are able to quantify the expectation of ρ̂b1,b2 in Theorem 4.

Theorem 4. The following two bounds hold for ρ ∈ [−1, 1]:

    |E(ρ̂b1,b2) − (1 − Db1)(1 − Db2) ρ| ≤ Δ1, and    (21)

    Δ2 − Δ1 ≤ |E(ρ̂b1,b2) − ρ| ≤ Δ1 + Δ2, where    (22)

    Δ1 = √(Db1 Db2) √(1 − Db1) √(1 − Db2) |ρ|³,  Δ2 = (Db1 + Db2 − Db1 Db2) |ρ|.

Remark. When b2 → ∞ (i.e., Scenario 1), we have Db2 → 0 and the bound reduces to the equality E(ρ̂b1,∞) = (1 − Db1) ρ, which matches the result in Section 3.

Eq. (22) provides upper and lower bounds for the absolute bias of ρ̂b1,b2. When b1 = b2 (i.e., the symmetric quantization case), Theorem 5 presents more refined bounds on the bias.

Theorem 5. (Symmetric quantization) Suppose b1 = b2 = b. For ρ ∈ [−1, 1], we have

    (2Db − Db²) |ρ| − Db (1 − Db) |ρ|³ ≤ |E(ρ̂b,b) − ρ| ≤ (2Db − Db²) |ρ|.    (23)

Remark. Compared to [29], which derived |E(ρ̂b,b) − ρ| ≤ 2Db |ρ|, our bounds are tighter.

What about the debiased estimator of ρ̂b1,b2?
It is slightly tricky because E(ρ̂b1,b2) = ζ1,1 cannot be explicitly expressed as cρ for some constant c (otherwise the debiased estimator would simply be ρ̂b1,b2 / c). In Theorem 4, Eq. (21) implies that the expectation of ρ̂b1,b2 is close to (1 − Db1)(1 − Db2) ρ. Thus, we recommend ρ̂b1,b2 / ((1 − Db1)(1 − Db2)) as the surrogate for the debiased estimator.

Next, we provide the expectation and variance of the normalized estimator in Theorem 6.

Theorem 6. (Normalized estimator) As k → ∞, we have

    E(ρ̂b1,b2,n) = ζ1,1 / √(ξ2,0 γ2,0) + O(1/k),    (24)

    Var(ρ̂b1,b2,n) = Vb1,b2,n / k + O(1/k²),  with    (25)

    Vb1,b2,n = (ζ2,2 − ζ1,1²)/(ξ2,0 γ2,0) − (ζ1,1 ζ3,1 − ζ1,1² ξ2,0)/(ξ2,0² γ2,0) − (ζ1,1 ζ1,3 − ζ1,1² γ2,0)/(ξ2,0 γ2,0²) + (ζ1,1² ζ2,2 − ζ1,1² ξ2,0 γ2,0)/(2 ξ2,0² γ2,0²) + (ζ1,1² ξ4,0 − ζ1,1² ξ2,0²)/(4 ξ2,0³ γ2,0) + (ζ1,1² γ4,0 − ζ1,1² γ2,0²)/(4 ξ2,0 γ2,0³).

Remark. When b2 = ∞, the expected value of ρ̂b1,b2,n reduces to that of ρ̂b1,f,n in Scenario 1. Additionally, we then have ζ1,1 = ξ2,0 ρ, γ2,0 = 1, and γ4,0 = 3, and it is easy to check that the expression for the variance reduces to the corresponding formula in Theorem 2. Also, note that ξ2,0 = 1 − Db1, γ2,0 = 1 − Db2, and ζ1,1 ≈ (1 − Db1)(1 − Db2) ρ. This means that we can practically use ρ̂b1,b2,n / √((1 − Db1)(1 − Db2)) as the surrogate for the debiased estimator of ρ̂b1,b2,n.
This means that we can practically use\n\u221a\n\nas surrogate for the debiased estimator of \u02c6\u03c1b1,b2,n.\n\n\u02c6\u03c1b1,b2,n\n\n(1\u2212Db1 )(1\u2212Db2 )\n\nWe plot the related results in Figure 2, which veri\ufb01es the theories in Theorems 4, 5 and 6.\n\nFigure 2: Left panel: the absolute bias (solid curves, in log10 scale) of \u02c6\u03c1b1,b2 by simulations, along\nwith the upper bound (red dashed curves) and lower bound (blue dashed curves) in Eq. (22). Middle\npanel: the absolute bias of \u02c6\u03c1b1,b2 with b1 = b2 (the symmetric case) along with the upper and lower\nbounds in Eq. (23). Right panel: The variance Vb1,b2,n of the normalized estimator in Theorem 6.\n\n5 Monotonicity of Inner Product Estimates\nIn applications such as nearest neighbor retrieval, the order of distances tends to matter more than\nthe exact values. Given an estimator \u02c6\u03c1, one would hope that E(\u02c6\u03c1) is monotone in \u03c1. This is indeed\n\n6\n\n00.20.40.60.81-8-6-4-20log10(bias2)b1=1 b2=2b1=4 b2=5b1=2 b2=3b1=3 b2=40 0.20.40.60.81 -8-6-4-20log10(bias2)b1=b2=1b1=b2=5b1=b2=2b1=b2=3b1=b2=4-1 -0.50 0.5 0 0.20.40.60.81 Vb1,b2,nb1=1 b2=2b1=1 b2=3b1=2 b2=3b1=3 b2=4b1=4 b2=5b1= b2=\fthe case in the full-precision situation. Recall that, in Section 2, given i.i.d. observations {xi, yi},\ni = 1, 2, ...k, the full-precision estimator \u02c6\u03c1f = 1\ni=1 xiyi is monotone in \u03c1 in the expectation\nbecause E(\u02c6\u03c1f ) = \u03c1. Naturally, one will ask if the expectations of our quantized estimators, e.g.,\nk\ni=1 Qb1 (xi)Qb2(yi), are also monotone in \u03c1. This turns out to be non-trivial question.\n\u02c6\u03c1b1,b2 = 1\nk\n\n(cid:80)k\n\n(cid:80)k\n\nWe solve this important problem rigorously through several stages. Our analysis will not be restricted\nto LM quantizers. To do so, we will need the following de\ufb01nition about \u201cincreasing quantizer\u201d.\nDe\ufb01nition 3. 
(Increasing quantizer) Let Q be an M-level quantizer with borders t0 < ··· < tM and reconstruction levels µ1, ..., µM. We say that Q is an increasing quantizer if µ1 < ··· < µM.

To proceed, we will prove the following three lemmas for increasing quantizers.

Lemma 1. (1-bit vs. others) Suppose Qb1, Qb2 are increasing quantizers symmetric about 0, with b1 ≥ 1 and b2 = 1. Then E(Qb1(x)Qb2(y)) is strictly increasing in ρ on [−1, 1].

Lemma 2. (2-bit vs. 2-bit) Suppose Qb1, Qb2 are any two increasing quantizers symmetric about 0, with b1 = b2 = 2. Then E(Qb1(x)Qb2(y)) is strictly increasing in ρ on [−1, 1].

Lemma 3. (Universal decomposition) For any increasing discrete quantizer Qb, b ≥ 3, which is symmetric about 0, there exist a 2-bit symmetric increasing quantizer Q2 and a (b−1)-bit symmetric increasing quantizer Qb−1 such that Qb = Qb−1 + Q2.

Once we have the above lemmas, we are ready to prove the monotonicity of E(Qb1(x)Qb2(y)).

Theorem 7. (Monotonicity) For any increasing quantizers Qb1 and Qb2 symmetric about 0, with b1 ≥ 1 and b2 ≥ 1 bits, the function E(Qb1(x)Qb2(y)) is increasing in ρ.

Proof. By Lemma 1, we know that the statement is valid for b1 = 1 and arbitrary b2. Now we consider the case b1 ≥ 2, b2 ≥ 2. By applying Lemma 3 repeatedly, we can always write

    Qb1(x) = Σ_{i=1}^{b1−1} Q̃(i)_2(x),  Qb2(y) = Σ_{j=1}^{b2−1} Q̂(j)_2(y),

where Q̃(1)_2, ..., Q̃(b1−1)_2 and Q̂(1)_2, ..., Q̂(b2−1)_2 are two sets of symmetric increasing 2-bit quantizers.
Thus,

    ∂E(Qb1(x)Qb2(y))/∂ρ = ∂E( Σ_{i=1}^{b1−1} Q̃(i)_2(x) · Σ_{j=1}^{b2−1} Q̂(j)_2(y) )/∂ρ = Σ_{i=1}^{b1−1} Σ_{j=1}^{b2−1} ∂E( Q̃(i)_2(x) Q̂(j)_2(y) )/∂ρ > 0,

where the second equality is due to the linearity of expectation and differentiation, and the inequality holds because of Lemma 2. Therefore, E(Qb1(x)Qb2(y)) is increasing in ρ for any b1 ≥ 1 and b2 ≥ 1.

Recall that, in Section 3.2, we proved the result on the mis-ordering probability, Theorem 3, which assumes estimators whose expectations are monotone in ρ. Theorem 7 thus provides the proof needed to support that assumption.

6 Empirical Study: Similarity Search

In this section, we test the proposed estimators on 3 datasets from the UCI repository (Table 1) [16]. The experiments clearly confirm that the normalization step uniformly improves the search accuracy. The results also, to an extent, illustrate the influence of the mis-ordering probability studied in Theorem 3.

Table 1: Datasets used in the empirical study. Mean ρ is the average pair-wise cosine similarity over sample pairs. Mean 1-NN ρ is the average cosine similarity of each point to its nearest neighbor.

Dataset   | # samples | # features | # classes | Mean ρ | Mean 1-NN ρ
Arcene    | 200       | 10000      | 2         | 0.63   | 0.86
BASEHOCK  | 1993      | 4862       | 2         | 0.33   | 0.59
COIL20    | 1440      | 1024       | 20        | 0.61   | 0.93

Figure 3: Nearest neighbor search recovery results using cosine similarity and quantized estimators, from random projections. Columns 1 and 2 (Scenario 1): the estimator ρ̂b,f and its normalized version ρ̂b,f,n.
Columns 3 and 4 (Scenario 2): the estimator ρ̂b1,b2 and its normalized version ρ̂b1,b2,n.

For each dataset, all the examples are preprocessed to have unit norm. The evaluation metric we adopt is the 1-NN precision, i.e., the proportion of examples whose true nearest neighbor (NN) is recovered by the nearest neighbor estimated from quantized random projections, averaged over all examples. We summarize the results in Figure 3. First of all, we can see that, as the number of bits increases, the performance of the quantized estimators converges to that of the estimator with full precision, as expected. Importantly, the normalization step substantially improves the performance, as seen by comparing Column 2 with Column 1 (for Scenario 1), and Column 4 with Column 3 (for Scenario 2). In addition, we can to an extent validate the assertions in Theorem 3, which states that a smaller variance of the debiased estimators should improve the NN recovery precision.

• In Figure 1 (left panel), we see that the variance of the debiased estimate ρ̂db_{b,f} with b = 1 is much smaller than with b ≥ 2 in the high-similarity region (e.g., |ρ| > 0.8), and roughly the same at ρ = 0.6. Since Arcene and COIL20 have high mean 1-NN ρ (0.86 and 0.93, respectively), Theorem 3 suggests that cosine estimation with ρ̂db_{1,f} should (in general) have a smaller mis-ordering probability than with b ≥ 2, implying a higher 1-NN precision. On the other hand, the average 1-NN ρ of BASEHOCK is 0.59, so ρ̂db_{b,f} with b = 1, 2, ..., ∞ would likely give similar performance. These claims are consistent with Column 1 of Figure 3.

• The variance of the debiased normalized estimator ρ̂db_{b,f,n} (Figure 1, middle panel) decreases as b increases, uniformly for any ρ.
Hence, by Theorem 3, we expect the 1-NN precision to increase with larger b on all 3 datasets, as confirmed by Column 2 of Figure 3.

7 Conclusion

In this paper, we conduct a comprehensive study of estimating inner product similarities from random projections followed by asymmetric quantization. This setting is theoretically interesting and also has many practical applications. For example, in a retrieval system, data vectors (after random projections) in the repository are quantized to reduce storage and communication costs, while a newly arriving query vector does not have to be quantized. Another example of asymmetric quantization arises when data are collected from different sources, each with its own quantization strategy. In this study, we propose a series of estimators for asymmetric quantization, starting with the simple basic estimator, then the normalized estimator, and then the debiased estimators. We provide a thorough analysis of the estimation errors. Furthermore, we analyze the "mis-ordering" probabilities and the monotonicity properties of the estimators.
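To make the asymmetric retrieval setting concrete, the following sketch (not the paper's exact experimental pipeline; the Lloyd-Max quantizer is fit empirically with Lloyd's algorithm on Gaussian samples, and the debiasing constant `alpha`, a Monte-Carlo estimate of E[Z·Q(Z)], is our illustrative choice) quantizes the projected database vector while leaving the query at full precision:

```python
import numpy as np

rng = np.random.default_rng(0)

def lloyd_max_levels(samples, b, iters=30):
    """Fit a 2^b-level scalar quantizer to 1-D samples with Lloyd's algorithm,
    a sample-based stand-in for the Lloyd-Max quantizer of N(0,1)."""
    L = 2 ** b
    levels = np.quantile(samples, (np.arange(L) + 0.5) / L)  # sorted init
    for _ in range(iters):
        borders = (levels[:-1] + levels[1:]) / 2             # decision borders
        idx = np.searchsorted(borders, samples)
        for j in range(L):                                   # cell conditional means
            cell = samples[idx == j]
            if cell.size:
                levels[j] = cell.mean()
    return levels

def quantize(z, levels):
    borders = (levels[:-1] + levels[1:]) / 2
    return levels[np.searchsorted(borders, z)]

# Fit on standard normal samples (projections of unit-norm data have N(0,1)
# marginals) and estimate the debiasing constant E[Z * Q(Z)].
train = rng.standard_normal(200_000)
b = 2
levels = lloyd_max_levels(train, b)
alpha = np.mean(train * quantize(train, levels))

# Build a pair (x, y) with cosine similarity exactly rho = 0.8.
d, k, rho = 1000, 4096, 0.8
x = rng.standard_normal(d); x /= np.linalg.norm(x)
z = rng.standard_normal(d); z -= (z @ x) * x; z /= np.linalg.norm(z)
y = rho * x + np.sqrt(1 - rho ** 2) * z

# Asymmetric setting: the stored vector y is projected and quantized,
# while the query x is projected but kept at full precision.
R = rng.standard_normal((k, d))
u = R @ x                          # full-precision query projections
v_q = quantize(R @ y, levels)      # quantized database projections

rho_hat = (u @ v_q) / (k * alpha)  # debiased simple estimator
```

With k = 4096 projections and a 2-bit quantizer, `rho_hat` should land close to the true cosine 0.8; dividing by `alpha` removes the shrinkage that the raw inner product (u · v_q)/k would otherwise exhibit.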
While our methods and analyses are largely based on the classical Lloyd-Max (LM) method, they can be extended to more general quantization schemes.

[Figure 3 panels: 1-NN precision (%) vs. number of projections (2^6 to 2^12) on Arcene, BASEHOCK, and COIL20, comparing the full-precision estimator b = (∞,∞) against LM quantizers with b = (1,∞), ..., (5,∞) in Scenario 1 and b = (1,2), (1,3), (2,4), (3,4), (4,5) in Scenario 2.]

References

[1] Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.

[2] Theodore W. Anderson. An Introduction to Multivariate Statistical Analysis.
John Wiley & Sons, third edition, 2003.

[3] Jay M. Berger. A note on error detection codes for asymmetric channels. Information and Control, 4(1):68–73, 1961.

[4] Ella Bingham and Heikki Mannila. Random projection in dimensionality reduction: Applications to image and text data. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 245–250, San Francisco, CA, 2001.

[5] Petros Boufounos and Richard G. Baraniuk. 1-bit compressive sensing. In 42nd Annual Conference on Information Sciences and Systems (CISS), pages 16–21, Princeton, NJ, 2008.

[6] Jeremy Buhler and Martin Tompa. Finding motifs using random projections. Journal of Computational Biology, 9(2):225–242, 2002.

[7] Emmanuel Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.

[8] Moses S. Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), pages 380–388, Montreal, Canada, 2002.

[9] George E. Dahl, Jack W. Stokes, Li Deng, and Dong Yu. Large-scale malware classification using random projections and neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3422–3426, Vancouver, Canada, 2013.

[10] Sanjoy Dasgupta. Experiments with random projection. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), pages 143–151, Stanford, CA, 2000.

[11] Sanjoy Dasgupta and Yoav Freund.
Random projection trees and low dimensional manifolds. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), pages 537–546, Victoria, Canada, 2008.

[12] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1):60–65, 2003.

[13] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on Computational Geometry (SCG), pages 253–262, Brooklyn, NY, 2004.

[14] Wei Dong, Moses Charikar, and Kai Li. Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 123–130, 2008.

[15] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.

[16] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.

[17] Ronald Fagin, Ravi Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 301–312, San Diego, CA, 2003.

[18] Xiaoli Zhang Fern and Carla E. Brodley. Random projection for high dimensional data clustering: A cluster ensemble approach. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 186–193, Washington, DC, 2003.

[19] Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, and Nakul Verma. Learning the structure of manifolds using random projections. In Advances in Neural Information Processing Systems (NIPS), pages 473–480, Vancouver, Canada, 2007.

[20] Michel X. Goemans and David P. Williamson.
Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.

[21] Albert Gordo, Florent Perronnin, Yunchao Gong, and Svetlana Lazebnik. Asymmetric distances for binary embeddings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):33–47, 2014.

[22] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing (STOC), pages 604–613, Dallas, TX, 1998.

[23] Laurent Jacques, Jason N. Laska, Petros T. Boufounos, and Richard G. Baraniuk. Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Transactions on Information Theory, 59(4):2082–2102, 2013.

[24] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.

[25] Ping Li. One scan 1-bit compressed sensing. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1515–1523, Cadiz, Spain, 2016.

[26] Ping Li. Binary and multi-bit coding for stable random projections. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1430–1438, Fort Lauderdale, FL, 2017.

[27] Ping Li. Sign-full random projections. In The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), pages 4205–4212, Honolulu, HI, 2019.

[28] Ping Li, Trevor J. Hastie, and Kenneth W. Church. Improving random projections using marginal information. In 19th Annual Conference on Learning Theory (COLT), pages 635–649, Pittsburgh, PA, 2006.

[29] Ping Li and Martin Slawski. Simple strategies for recovering inner products from coarsely quantized random projections.
In Advances in Neural Information Processing Systems (NIPS), pages 4567–4576, Long Beach, CA, 2017.

[30] Xiang-Yang Li. Wireless Ad Hoc and Sensor Networks: Theory and Applications. Cambridge University Press, 2008.

[31] Xiaoyun Li and Ping Li. Generalization error analysis of quantized compressive learning. In Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada, 2019.

[32] Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–136, 1982.

[33] Joel Max. Quantizing for minimum distortion. IRE Transactions on Information Theory, 6(1):7–12, 1960.

[34] S. Sandeep Pradhan and Kannan Ramchandran. Group-theoretic construction and analysis of generalized coset codes for symmetric/asymmetric distributed source coding. In Proceedings of the Conference on Information Sciences and Systems (CISS), 2000.

[35] Santosh S. Vempala. The Random Projection Method. American Mathematical Society, 2004.

[36] Ossama Younis and Sonia Fahmy. Distributed clustering in ad-hoc sensor networks: A hybrid, energy-efficient approach. In Proceedings of the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), Hong Kong, China, 2004.

[37] Argyrios Zymnis, Stephen P. Boyd, and Emmanuel J. Candès. Compressed sensing with quantized measurements. IEEE Signal Processing Letters, 17(2):149–152, 2010.