{"title": "Testing for Families of Distributions via the Fourier Transform", "book": "Advances in Neural Information Processing Systems", "page_first": 10063, "page_last": 10074, "abstract": "We study the general problem of testing whether an unknown discrete distribution belongs to a specified family of distributions. More specifically, given a distribution family P and sample access to an unknown discrete distribution D, we want to distinguish (with high probability) between the case that D ∈ P and the case that D is ε-far, in total variation distance, from every distribution in P. This is the prototypical hypothesis testing problem that has received significant attention in statistics and, more recently, in computer science. The main contribution of this work is a simple and general testing technique that is applicable to all distribution families whose Fourier spectrum satisfies a certain approximate sparsity property. We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity.", "full_text": "Testing for Families of Distributions

via the Fourier Transform∗

Clément L. Canonne
Stanford University

Ilias Diakonikolas

University of Southern California

ccanonne@stanford.edu

diakonik@usc.edu

Alistair Stewart

University of Southern California

stewart.al@gmail.com

Abstract

We study the general problem of testing whether an unknown discrete distribution belongs to a specified family of distributions.
More specifically, given a distribution family P and sample access to an unknown discrete distribution P, we want to distinguish (with high probability) between the case that P ∈ P and the case that P is ε-far, in total variation distance, from every distribution in P. This is the prototypical hypothesis testing problem that has received significant attention in statistics and, more recently, in computer science. The main contribution of this work is a simple and general testing technique that is applicable to all distribution families whose Fourier spectrum satisfies a certain approximate sparsity property. We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity.

1 Introduction

1.1 Background and Motivation

The prototypical inference question in the area of distribution property testing [6] is the following: Given a set of samples from a collection of probability distributions, can we determine whether these distributions satisfy a certain property? During the past two decades, this broad question – whose roots lie in statistical hypothesis testing [43, 40] – has received considerable attention by the computer science community; see [49, 10] for two recent surveys.
After two decades of study, for many properties of interest there exist sample-optimal testers (matched by information-theoretic lower bounds) [44, 14, 54, 30, 2, 29, 12].

In this work, we focus on the problem of testing whether an unknown distribution belongs to a given family of discrete structured distributions. Let P be a family of discrete distributions over a total order (e.g., [n]) or a partial order (e.g., [n]^k). The problem of membership testing for P is the following: Given sample access to an unknown distribution P (effectively supported on the same domain as P), we want to distinguish between the case that P ∈ P versus dTV(P, P) > ε. (Here, dTV denotes the total variation distance between distributions.) The sample complexity of this problem depends on the underlying family P. For example, if P contains a single distribution over a domain of size n, the sample complexity of the testing problem is O(n^{1/2}/ε²) [14, 54, 30, 2].

1 The full version of this paper is available at [13].

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

In this work, we give a general technique to test membership in various distribution families over discrete domains, i.e., to solve the following task:

    T(P, ε): given a family of discrete distributions P over some partially or totally ordered set, parameter ε ∈ (0, 1], and sample access to an unknown distribution P over the same domain, how many samples are required to distinguish, with probability 3/5, between the case that P ∈ P versus dTV(P, P) > ε?

Before we state our results in full generality, we present concrete applications to a number of well-studied distribution families.

1.2 Our Results

Our first result is a nearly sample-optimal testing algorithm for sums of independent integer random variables (SIIRVs).
Formally, an (n, k)-SIIRV is a sum of n independent integer random variables, each supported in {0, 1, . . . , k − 1}. We will denote the set of (n, k)-SIIRVs by SIIRVn,k. SIIRVs comprise a rich class of distributions that arise in many settings. The special case of k = 2 was first considered by Poisson [45] as a non-trivial extension of the Binomial distribution, and is known as the Poisson binomial distribution (PBD). In application domains, SIIRVs have many uses in research areas such as survey sampling, case-control studies, and survival analysis; see, e.g., [16] for a survey of the many practical uses of these distributions. In addition to their practical applications, SIIRVs are of fundamental probabilistic interest and have been extensively studied in the theory of probability and statistics [19, 38, 34, 46, 39, 5, 18, 15]. We prove:

Theorem 1 (Testing SIIRVs). Given parameters k, n ∈ N and sample access to a distribution over N, there exists an algorithm (Algorithm 1) for T(SIIRVn,k, ε) which takes

    O( (k·n^{1/4}/ε²) · log^{1/4}(1/ε) + (k²/ε²) · log²(k/ε) )

samples, and runs in time n · (k/ε)^{O(k log(k/ε))}.

Prior to our work, no non-trivial² tester was known for (n, k)-SIIRVs for any k > 2. [11] showed a sample lower bound of Ω(k^{1/2} n^{1/4}/ε²), but their techniques did not yield any non-trivial sample upper bound.

For the special case of PBDs (k = 2), Acharya and Daskalakis [1] gave a tester with sample complexity O( (n^{1/4}/ε²) · √(log(1/ε)) + log^{5/2}(1/ε)/ε⁶ ), and running time O( (n^{1/4}/ε²) · √(log(1/ε)) + (1/ε)^{O(log²(1/ε))} ), and also showed a sample lower bound of Ω(n^{1/4}/ε²). The special case of our Theorem 1 for k = 2 yields an improvement over [1] in both sample size and runtime:

Theorem 2 (Testing PBDs).
Given parameter n ∈ N and sample access to a distribution over N, there exists an algorithm (Algorithm 1) for T(PBDn, ε) which takes

    O( (n^{1/4}/ε²) · log^{1/4}(1/ε) + (1/ε²) · log²(1/ε) )

samples, and runs in time n^{1/4} · Õ(1/ε²) + (1/ε)^{O(log log(1/ε))}.

Note that the sample complexity of our algorithm is n^{1/4} · Õ(1/ε²), matching the information-theoretic lower bound up to a logarithmic factor in 1/ε. In particular, our algorithm does not incur the extraneous Ω(1/ε⁶) term of [1]. Moreover, our runtime has a (1/ε)^{O(log log(1/ε))} dependence, as opposed to (1/ε)^{O(log²(1/ε))}. The improved running time relies on a more efficient computational “projection step” in our general framework, which leverages the geometric structure of Poisson Binomial distributions.

² By the term “non-trivial” here we refer to a testing algorithm that uses fewer samples than just learning the unknown distribution and then checking whether it is close to a distribution in the family.

We remark that the guarantees provided by the above two theorems are actually stronger than the usual property testing one. Namely, whenever the algorithm returns accept, then it also provides a (proper) hypothesis H such that dTV(P, H) ≤ ε with probability at least 3/5.

A broad generalization of PBDs to the high-dimensional setting is the family of Poisson Multinomial Distributions (PMDs). Formally, an (n, k)-PMD is any random variable of the form X = Σ_{i=1}^{n} Xi, where the Xi’s are independent random vectors supported on the set {e1, e2, . . . , ek} of standard basis vectors in R^k. We will denote by PMDn,k the set of (n, k)-PMDs.
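To make the two definitions concrete, here is a minimal sampler sketch in Python; the particular per-variable distributions are arbitrary, chosen only for illustration, and this is not the paper's implementation.

```python
import random

def sample_siirv(dists):
    """One draw from an (n, k)-SIIRV: a sum of n independent integer random
    variables, the i-th having pmf dists[i] over {0, 1, ..., k-1}."""
    return sum(random.choices(range(len(q)), weights=q)[0] for q in dists)

def sample_pmd(dists, k):
    """One draw from an (n, k)-PMD: a sum of n independent random vectors,
    each supported on the standard basis vectors e_1, ..., e_k of R^k."""
    x = [0] * k
    for q in dists:  # q is the pmf of one summand over the k basis vectors
        x[random.choices(range(k), weights=q)[0]] += 1
    return tuple(x)

# Illustrative parameters: n = 3, k = 2 (so sample_siirv draws from a PBD).
print(sample_siirv([[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]]))        # integer in {0,...,3}
print(sample_pmd([[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]], k=2))     # vector summing to 3
```

Note that a PMD records which basis vector each summand landed on, so its coordinates always sum to n, whereas an SIIRV collapses the summands into a single integer.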
PMDs comprise a broad class of discrete distributions of fundamental importance in computer science, probability, and statistics. A large body of work in the probability and statistics literature has been devoted to the study of the behavior of PMDs under various structural conditions [4, 41, 5, 7, 47, 48]. PMDs generalize the familiar multinomial distribution, and describe many distributions commonly encountered in computer science (see, e.g., [25, 26, 56, 53]). Recent years have witnessed a flurry of research activity on PMDs and related distributions, from several perspectives of theoretical computer science, including learning [22, 21, 31, 23, 32], property testing [56, 52, 53], computational game theory [25, 26, 9, 24, 27, 35, 17], and derandomization [37, 8, 28, 36]. We prove the following:

Theorem 3 (Testing PMDs). Given parameters k, n ∈ N and sample access to a distribution over N^k, there exists an algorithm for T(PMDn,k, ε) which takes

    O( (n^{(k−1)/4} k^{2k}/ε²) · log^{k}(k/ε) )

samples, and runs in time n^{O(k³)} · (1/ε)^{O(k³ log(k/ε)/log log(k/ε))^{k−1}}, or alternatively in time n^{O(k)} · 2^{O(k^{5k} log^{k+2}(1/ε))}.

For the sake of intuition, we note that Theorem 3 is particularly interesting in the regime where n is large and k is small. Indeed, the sample complexity of testing PMDs is inherently exponential in k: we prove a sample lower bound of Ωk(n^{(k−1)/4}/ε²) (Theorem 8),³ nearly matching our upper bound for constant k.

Finally, we demonstrate the versatility of our techniques by obtaining a testing algorithm for discrete log-concavity. Log-concave distributions constitute a broad and flexible non-parametric family that is extensively used in modeling and inference [57].
In the discrete setting, log-concave distributions encompass a range of fundamental types of discrete distributions, including binomial, negative binomial, geometric, hypergeometric, Poisson, Poisson Binomial, hyper-Poisson, Pólya–Eggenberger, and Skellam distributions. Log-concave distributions have been studied in a wide range of different contexts, including economics [3], statistics and probability theory (see [50] for a recent survey), theoretical computer science [42], and algebra, combinatorics and geometry [51]. We will denote by LCVn the class of log-concave distributions over [n]. We prove:

Theorem 4 (Testing Log-Concavity). Given a parameter n ∈ N and sample access to a distribution over N, there exists an algorithm for T(LCVn, ε) which takes

    O(√n/ε²) + Õ(1/ε^{5/2})

samples, and runs in time O(√n · poly(1/ε)).

Our discrete log-concavity tester improves on previous work in terms of both sample and time complexity. Specifically, [2] gave a log-concavity tester with sample complexity O(√n/ε² + 1/ε⁵), while [11] obtained a tester with sample complexity Õ(√n/ε^{7/2}). Our sample complexity dominates both these bounds, and is significantly better when ε is small. The algorithms in [2, 11] run in poly(n/ε) time, as they involve solving a linear program of poly(n/ε) size. In contrast, the running time of our algorithm is sublinear in n.

³ Here, we use the notation Ωk(·), Ok(·) to indicate that the parameter k is treated as a constant.

1.3 Our Techniques and Comparison to Previous Work

All the testing algorithms in this paper follow from a simple and general technique that may be of broader interest.
The common property of the underlying distribution families P that allows for our unified testing approach is the following: Let P be the probability mass function of any distribution in P. Then, the Fourier transform of P is approximately sparse, in a well-defined sense.

For concreteness, we elaborate on our technique for the case of SIIRVs. The starting point of our approach is the observation from [31] that (n, k)-SIIRVs – in addition to having a relatively small effective support – also have an approximately sparse Fourier representation. Roughly speaking, most of their Fourier mass is concentrated on a small subset of Fourier coefficients, which can be computed efficiently.

This suggests the following natural approach to testing (n, k)-SIIRVs: first, identify the effective support I of the distribution P and check that it is appropriately small. If it is not, then reject. Then, compute the corresponding small subset S of the Fourier domain, and check that almost no Fourier mass of P lies outside S. If this check fails, one can safely reject, as this is a certificate that P is not an (n, k)-SIIRV. Combining the two steps, one can show that learning the Fourier transform of P (in L2 norm) on this small subset S only is sufficient to learn P itself in total variation distance. This learning step can be performed with relatively few samples, as S is sufficiently small.

At this point, we have obtained a distribution H – succinctly represented by its Fourier transform on S – such that P and H are close in total variation distance. It only remains to perform a computational “projection step” to verify that H itself is close to some (n, k)-SIIRV.
This will clearly be the case if indeed P ∈ SIIRVn,k.

Although the aforementioned approach forms the core of our SIIRV testing algorithm, the actual tester has to separately address the case where P has small variance, which can be handled by a testing-via-learning approach. Our main contribution is thus to describe how to efficiently perform the second step, i.e., the Fourier sparsity testing. This is done in Theorem 6, which describes a simple algorithm for this step. The algorithm proceeds by essentially considering the Fourier coefficients of the empirical distribution (obtained by taking a small number of samples). Interestingly, the main idea underlying Theorem 6 is to avoid analyzing directly the behavior of these Fourier coefficients – which would naively require too high a time complexity. Instead, we rely on Plancherel’s identity and reduce the problem to the analysis of a different task: that of the sample complexity of L2 identity testing (Proposition 1). By a tight analysis of this L2 tester, we get as a byproduct that several Fourier quantities of interest (of our empirical distribution) simultaneously enjoy good concentration – whereas arguing concentration for each of these terms separately would yield a suboptimal time complexity.

A nearly identical method works for PMDs as well. Moreover, our approach can be abstracted to yield a general testing framework, as we explain in Section 4. It is interesting to remark that the Fourier transform has been used to learn PMDs and SIIRVs [31, 23, 32, 20], and therefore it may not be entirely surprising that it has applications to testing as well. Notably, our Fourier testing technique gives an improved and nearly-optimal algorithm for log-concavity, for which no Fourier learning algorithm was known. More generally, testing membership in a class using the Fourier transform is significantly more challenging than learning.
A fundamental difference is that in the testing setting we need to handle distributions that do not belong to the class (e.g., SIIRVs, PMDs), but are far from the class in an arbitrary way. In contrast, learning algorithms work under the promise that the distribution is in the underlying class, and thus can leverage its specific structure.

Testing via the Fourier Transform: the Advantage One may wonder how the detour via the Fourier transform enables us to obtain better sample complexity than an approach purely based on L2 testing. Indeed, all distributions in the classes we consider, crucially, have small L2 norm. For testing identity to such a distribution P, the standard L2 identity tester (see, e.g., [14] or Proposition 1), which works by checking how large the L2 distance between the empirical and the hypothesis distribution is, will be optimal. We can thus test membership in a class of such distributions by (i) learning P assuming it belongs to the class, and then (ii) testing whether what we learned is indeed close to P using the L2 identity tester. The catch is that getting guarantees in L1 distance from this approach would require us to learn to very small L2 distance (because of the Cauchy–Schwarz inequality). In particular, if the unknown distribution P has support size N, we would have to learn to L2 distance ε/√N in step (i), and then in step (ii) test that we are within L2 distance ε/√N of the learned hypothesis.

However, if a distribution P has a sparse discrete Fourier transform (whose effective support is known), then it suffices to estimate only these few Fourier coefficients [31, 33]. This step enables us to learn P in step (i) not just to within L1 distance ε, but indeed (crucially) to within L2 distance ε/√N, with good sample complexity.
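The √N loss mentioned above is exactly the Cauchy–Schwarz inequality applied coordinate-wise: for any two distributions P, Q over a domain of size N,

```latex
\|P - Q\|_1 \;=\; \sum_{i=1}^{N} |P(i) - Q(i)|
\;\le\; \sqrt{N}\,\Bigl(\sum_{i=1}^{N} (P(i) - Q(i))^2\Bigr)^{1/2}
\;=\; \sqrt{N}\,\|P - Q\|_2 ,
```

so an L2 guarantee of ε/√N is precisely what is needed to conclude an L1 (and hence total variation) guarantee of ε.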
Additionally, the identity testing algorithm can be put into a simpler form for a hypothesis with sparse Fourier transform, as previously mentioned. Now, the tester has higher sample complexity, roughly √N/ε²; but if it accepts, then we have learned the distribution P to within ε in total variation distance, with far fewer samples than the Ω(N/ε²) required for arbitrary distributions over support size N. Lastly, we note that we can replace the support size N in the above description by the size of the effective support, i.e., the smallest set that contains a 1 − O(ε) fraction of the mass. Doing so for the case of (n, k)-SIIRVs leads to a sample complexity proportional to n^{1/4}, instead of n^{1/2}.

1.4 Organization

The rest of the paper is organized as follows: In Section 2, we set up notation and provide definitions as well as standard results relevant to our purposes. Section 3 contains the details of one of the main subroutines our testers rely on, namely Fourier sparsity testing. In Section 4, we describe our general approach to obtain a tester applicable to any class of distributions that enjoys good Fourier sparsity. In Section 5, we state and sketch the proof of our sample complexity lower bound for testing PMDs. Due to space constraints, most proofs have been deferred to the full version [13].

2 Notation and Definitions

We begin with some standard notation and definitions, as well as basics of Fourier analysis and results from probability theory that we shall use throughout the paper. For m ∈ N, we write [m] for the set {0, 1, . . . , m − 1}, and log (resp. ln) for the binary logarithm (resp.
the natural logarithm).

Distributions and Metrics A probability distribution over a (discrete) domain Ω is a function P : Ω → [0, 1] such that ‖P‖₁ = Σ_{ω∈Ω} P(ω) = 1; we denote by Δ(Ω) the set of all probability distributions over domain Ω. Recall that for two probability distributions P, Q ∈ Δ(Ω), their total variation distance (or statistical distance) is defined as dTV(P, Q) := sup_{S⊆Ω} (P(S) − Q(S)) = (1/2) Σ_{ω∈Ω} |P(ω) − Q(ω)|, i.e., dTV(P, Q) = (1/2)‖P − Q‖₁. Given a subset P ⊆ Δ(Ω) of distributions, the distance from P to P is then defined as dTV(P, P) := inf_{Q∈P} dTV(P, Q). If dTV(P, P) > ε, we say that P is ε-far from P; otherwise, it is ε-close.

Discrete Fourier Transform Our algorithms will rely heavily on the (discrete) Fourier transform, whose definition we recall next.

Definition 1 (Discrete Fourier Transform). For x ∈ R, we let e(x) := exp(−2iπx). The Discrete Fourier Transform (DFT) modulo M of a function F : [n] → C is the function F̂ : [M] → C defined as

    F̂(ξ) = Σ_{j=0}^{n−1} e(ξj/M) F(j)

for ξ ∈ [M]. The DFT modulo M of a distribution P, P̂, is then the DFT modulo M of its probability mass function (note that one can then equivalently see P̂(ξ) as the expectation P̂(ξ) = E_{X∼P}[e(ξX/M)], for ξ ∈ [M]).

The inverse DFT modulo M onto the range [m, m + M − 1] of F̂ : [M] → C is the function F : [m, m + M − 1] ∩ Z → C defined by

    F(j) = (1/M) Σ_{ξ=0}^{M−1} e(−ξj/M) F̂(ξ)

for j ∈ [m, m + M − 1] ∩ Z.

Note that the DFT (modulo M) is a linear operator; moreover, we recall the standard fact relating the norms of a function and of its Fourier transform, which we will use extensively:

Theorem 5 (Plancherel’s Theorem). For M ≥ n and F, G : [n] → C, we have (i) Σ_{j=0}^{n−1} F(j)G(j) = (1/M) Σ_{ξ=0}^{M−1} F̂(ξ)Ĝ(ξ); and (ii) ‖F‖₂ = (1/√M)‖F̂‖₂, where F̂, Ĝ are the DFT modulo M of F, G, respectively.

(The latter equality is sometimes referred to as Parseval’s theorem.) We also note that, for our PMD tester, we shall need the appropriate generalization of the Fourier transform to the multivariate setting. We leave this generalization to the full version.

3 Testing Effective Fourier Support

In this section, we prove the following theorem, which will be invoked as a crucial ingredient of our testing algorithms. Broadly speaking, the theorem ensures that one can efficiently test whether an unknown distribution Q has its Fourier transform concentrated on some (small) effective support S (and, if this is the case, learn the vector Q̂1_S, the restriction of this Fourier transform to S, in L2 distance).

Theorem 6. Given parameters M ≥ 1, ε, b ∈ (0, 1], as well as a subset S ⊆ [M] and sample access to a distribution Q over [M], Algorithm 1 outputs either reject or a collection of Fourier coefficients Ĥ′ = (Ĥ′(ξ))_{ξ∈S} such that, with probability at least 7/10, all the following statements hold simultaneously:

1. if ‖Q‖₂² > 2b, then it outputs reject;

2. if ‖Q‖₂² ≤ 2b and every function Q∗ : [M] → R with Q̂∗ supported entirely on S is such that ‖Q − Q∗‖₂ > ε, then it outputs reject;

3. if ‖Q‖₂² ≤ b and there exists a function Q∗ : [M] → R with Q̂∗ supported entirely on S such that ‖Q − Q∗‖₂ ≤ ε/2, then it does not output reject;

4. if it does not output reject, then ‖Q̂1_S − Ĥ′‖₂ ≤ ε√M/10, and the inverse Fourier transform (modulo M) H′ of the Fourier coefficients Ĥ′ it outputs satisfies ‖Q − H′‖₂ ≤ 6ε/5.

Moreover, the algorithm takes m = O(√b/ε² + |S|/(Mε²) + √M) samples from Q, and runs in time O(m|S|).

Note that the rejection condition in Item 2 is equivalent to ‖Q̂1_S̄‖₂ > ε√M, that is, to Q having Fourier mass more than ε²M outside of S; this is because, for any Q∗ with Q̂∗ supported on S,

    M‖Q − Q∗‖₂² = ‖Q̂ − Q̂∗‖₂² = ‖Q̂1_S − Q̂∗1_S‖₂² + ‖Q̂1_S̄ − Q̂∗1_S̄‖₂² ≥ ‖Q̂1_S̄‖₂²,

and the inequality is tight for Q∗ being the inverse Fourier transform (modulo M) of Q̂1_S.

High-level idea. Let Q be an unknown distribution supported on M consecutive integers (we will later apply this to Q := P mod M), and let S ⊆ [M] be a set of Fourier coefficients (symmetric with regard to M: ξ ∈ S implies −ξ mod M ∈ S) such that 0 ∈ S. We can further assume that we know b ≥ 0 such that ‖Q‖₂² ≤ b.

Given Q, we can consider its “truncated Fourier expansion” (with respect to S) Ĥ = Q̂1_S, defined as

    Ĥ(ξ) := Q̂(ξ) if ξ ∈ S, and Ĥ(ξ) := 0 otherwise,

for ξ ∈ [M]; and let H be the inverse Fourier transform (modulo M) of Ĥ. Note that H is, in general, no longer a probability distribution.

To obtain the guarantees of Theorem 6, a natural idea is to take some number m of samples from Q, and consider the empirical distribution Q′ they induce over [M]. By computing the Fourier coefficients (restricted to S) of this Q′, as well as the Fourier mass “missed” when doing so (i.e., the Fourier mass ‖Q̂′1_S̄‖₂² that Q′ puts outside of S), to sufficient accuracy, one may hope to prove Theorem 6 with a reasonable bound on m.

The issue is that analyzing separately the behavior of ‖Q̂′1_S̄‖₂² and ‖Q̂1_S − Q̂′1_S‖₂², to show that they are both estimated sufficiently accurately and both small enough, is not immediate. Instead, we will get a bound on both at the same time, by arguing concentration in a different manner – namely, by analyzing a different tester, for tolerant identity testing in L2 norm.

In more detail, letting H be as above, we have by Plancherel’s theorem that

    Σ_{i∈[M]} (Q′(i) − H(i))² = ‖Q′ − H‖₂² = (1/M)‖Q̂′ − Ĥ‖₂² = (1/M) Σ_{ξ=0}^{M−1} |Q̂′(ξ) − Ĥ(ξ)|²,

and, expanding the definition of Ĥ and using Plancherel again, this can be rewritten as

    Σ_{i∈[M]} (Q′(i) − H(i))² = (1/M)( ‖Q̂1_S − Q̂′1_S‖₂² + M‖Q′‖₂² − ‖Q̂′1_S‖₂² ).

(The full derivation will be given in the proof.) The right-hand side has two non-negative compound terms: the first, ‖Q̂1_S − Q̂′1_S‖₂², corresponds to the L2 error incurred when learning the Fourier coefficients of Q on S. The second, M‖Q′‖₂² − ‖Q̂′1_S‖₂² = ‖Q̂′1_S̄‖₂², is the Fourier mass that our empirical Q′ puts “outside of S.”

So if the LHS is small (say, of order ε²), then in particular both terms of the RHS will be small as well, effectively giving us bounds on our two quantities in one shot. But this very same LHS is reminiscent of a known statistic [14] for testing identity of distributions in L2. So, one can bound the number of samples required by analyzing such an L2 tester instead. This is what we do in Proposition 1.

Algorithm 1 Testing the Fourier Transform Effective Support
Require: parameters M ≥ 1, b, ε ∈ (0, 1]; set S ⊆ [M]; sample access to distribution Q over [M]
1: Set m ← ⌈C(√b/ε² + |S|/(Mε²) + √M)⌉    ▷ C > 0 is an absolute constant
2: Draw m′ ← Poi(m); if m′ > 2m, return reject
3: Draw m′ samples from Q, and let Q′ be the corresponding empirical distribution over [M]
4: Compute ‖Q′‖₂², Q̂′(ξ) for every ξ ∈ S, and ‖Q̂′1_S‖₂²    ▷ Takes time O(m|S|)
5: if m′²‖Q′‖₂² − m′ > (3/2)bm² then return reject
6: else if ‖Q′‖₂² − (1/M)‖Q̂′1_S‖₂² ≥ 3ε²(m′/m)² + 1/m′ then return reject
7: else
8:     return Ĥ′ = (Q̂′(ξ))_{ξ∈S}
9: end if

The detailed proof of Theorem 6 is given in the full version.

3.1 A Tolerant L2 Tester for Identity to a Pseudodistribution

As previously mentioned, one building block in the proof of Theorem 6 (and a result that may be of independent interest) is an optimal L2 identity testing algorithm.
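To make this concrete, here is a hedged Python sketch of such a tester, built around the Z statistic described next; the function names are ours, and the Poissonization of the sample size is left to the caller, so this is an illustration rather than the paper's implementation.

```python
def z_statistic(counts, m, p_star):
    """Z = sum_i (X_i - m * P*(i))^2 - X_i, where X_i = counts[i] is the number
    of occurrences of domain element i among the samples and P* is a fixed,
    explicitly known pseudo-distribution over [r] (it may take negative values)."""
    return sum((x - m * q) ** 2 - x for x, q in zip(counts, p_star))

def l2_identity_test(samples, m, p_star, eps):
    """Accept iff sqrt(Z)/m <= sqrt(3) * eps.  Under Poissonization of the
    sample size, E[Z] = m^2 * ||P - P*||_2^2, so sqrt(Z)/m is an estimate of
    the L2 distance between the unknown P and the hypothesis P*."""
    counts = [0] * len(p_star)
    for s in samples:
        counts[s] += 1
    z = max(z_statistic(counts, m, p_star), 0)  # Z can be slightly negative
    return z ** 0.5 / m <= (3 ** 0.5) * eps  # True = accept, False = reject
```

For instance, four samples split evenly against P∗ = (1/2, 1/2) are accepted, while four samples all landing on the zero-probability element of P∗ = (1, 0) are rejected.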
Our tester and its analysis are very similar to the tolerant L2 closeness testing algorithm of Chan et al. [14], with the obvious simplifications pertaining to identity (instead of closeness). The main difference is that we emphasize here the fact that P∗ need not be an actual distribution: any P∗ : [r] → R would do, even one taking negative values. This will turn out to be crucial for our applications.

Algorithm 2 Tolerant L2 identity tester
Require: ε ∈ (0, 1), Poi(m) samples from a distribution P over [r], with Xᵢ denoting the number of occurrences of the i-th domain element in the samples from P, and P∗ a fixed, known pseudo-distribution over [r].
Ensure: Returns accept if ‖P − P∗‖₂ ≤ ε and reject if ‖P − P∗‖₂ ≥ 2ε.
    Define Z = Σ_{i=1}^{r} (Xᵢ − mP∗(i))² − Xᵢ.    ▷ Can actually be computed in O(m) time
    Return reject if √Z/m > √3 · ε, accept otherwise.

Proposition 1. There exists an absolute constant c > 0 such that the above algorithm (Algorithm 2), when given Poi(m) samples drawn from a distribution P and an explicit function P∗ : [r] → R, will, with probability at least 3/4, distinguish between (a) ‖P − P∗‖₂ ≤ ε and (b) ‖P − P∗‖₂ ≥ 2ε, provided that m ≥ c√b/ε², where b is an upper bound on ‖P‖₂², ‖P∗‖₂².

Moreover, we have the following stronger statement: in case (a), the statistic Z computed in the algorithm satisfies √Z/m ≤ √2.9 · ε with probability at least 3/4, while in case (b) we have √Z/m ≥ √3.1 · ε with probability at least 3/4.

4 The General Tester

In this section, we provide our general testing framework.
In more detail, our theorem (Theorem 7) has the following flavor: if P is a property of distributions such that every P ∈ P has both (i) small effective support and (ii) sparse effective Fourier support, then one can test membership in P with O(√(sM)/ε² + s/ε²) samples (where M and s are the bounds on the effective support and effective Fourier support, respectively). As a caveat, we do require that the sparse effective Fourier support S be independent of P ∈ P, i.e., that it is a characteristic of the class P itself.
The high-level idea is then quite simple: the algorithm proceeds in three stages, namely the effective support test, the Fourier effective support test, and the projection step. In the first, it takes some samples from P to identify what should be the effective support I of P, if P did have the property; it then checks that indeed |I| ≤ M (as it should) and that P puts probability mass 1 − O(ε) on I. In the second stage, it invokes the Fourier testing algorithm of Section 3 to verify that P̂ indeed puts very little Fourier mass outside of S; and, having verified this, learns very accurately the set of Fourier coefficients of P on this set S, in L2 distance. At this point, either the algorithm has detected that P violates some required characteristic of the distributions in P, in which case it has already rejected; or it is guaranteed to have learned a good approximation H of P, by the Fourier learning performed in the second stage. It only remains to perform the third stage, which "projects" this good approximation H of P onto P to verify that H is close to some distribution P∗ ∈ P (as it should be if indeed P ∈ P).
Theorem 7 (General Testing Statement). Assume P ⊆ Δ(ℕ) is a property of distributions satisfying the following.
There exist S : (0, 1] → 2^ℕ, M : (0, 1] → ℕ, and qI : (0, 1] → ℕ such that, for every ε ∈ (0, 1],

1. Fourier sparsity: for all P ∈ P, the Fourier transform (modulo M(ε)) of P is concentrated on S(ε): namely, ‖P̂1_{S̄(ε)}‖₂² ≤ ε²/100.

2. Support sparsity: for all P ∈ P, there exists an interval I(P) ⊆ ℕ with |I(P)| ≤ M(ε) such that (i) P is concentrated on I(P): namely, P(I(P)) ≥ 1 − ε/5; and (ii) I(P) can be identified with probability at least 19/20 from qI(ε) samples from P.

3. Projection: there exists a procedure PROJECT_P which, on input ε ∈ (0, 1] and the explicit description of a distribution H ∈ Δ(ℕ), runs in time T(ε) and outputs accept if dTV(H, P) ≤ 2ε/5, and reject if dTV(H, P) > ε/2 (and can answer either otherwise).

4. (Optional) L2-norm bound: there exists b ∈ (0, 1] such that, for all P ∈ P, ‖P‖₂² ≤ b.

Then, there exists a testing algorithm for P, in the usual standard sense: it outputs either accept or reject, and satisfies the following.

1. if P ∈ P, then it outputs accept with probability at least 3/5;

Algorithm 3 Algorithm Test-Fourier-Sparse-Class
Require: sample access to a distribution P ∈ Δ(ℕ), parameters ε ∈ (0, 1], b ∈ (0, 1], functions S : (0, 1] → 2^ℕ, M : (0, 1] → ℕ, qI : (0, 1] → ℕ, and procedure PROJECT_P as in Theorem 7
1: Effective Support
2:     Take qI(ε) samples from P to identify a "candidate set" I. ▷ Guaranteed to work w.p. 19/20 if P ∈ P.
3:     Draw O(1/ε) samples from P, to distinguish between P(I) ≥ 1 − ε/5 and P(I) < 1 − ε/4. ▷ Correct w.p. 19/20.
4:     if |I| > M(ε) or we detected that P(I) < 1 − ε/4 then
5:         return reject
6:     end if
7:
8: Fourier Effective Support
9:     Simulating sample access to P′ := P mod M(ε), call Algorithm 1 on P′ with parameters M(ε), ε/√(5M(ε)), b, and S(ε).
10:     if Algorithm 1 returned reject then
11:         return reject
12:     end if
13:     Let Ĥ = (Ĥ(ξ))_{ξ∈S(ε)} denote the collection of Fourier coefficients it outputs, and H their inverse Fourier transform (modulo M(ε)). ▷ Do not actually compute H here.
14:
15: Projection Step
16:     Call PROJECT_P on parameters ε and H, and return accept if it does, reject otherwise.

2. if dTV(P, P) > ε, then it outputs reject with probability at least 3/5.

The algorithm takes

O(√(|S(ε)|M(ε))/ε² + |S(ε)|/ε² + qI(ε))

samples from P (if Item 4 holds, one can replace the above bound by O(√(bM(ε))/ε² + |S(ε)|/ε² + qI(ε))), and runs in time O(m|S| + T(ε)), where m is the sample complexity.
Moreover, whenever the algorithm outputs accept, it also learns P; that is, it provides a hypothesis H such that dTV(P, H) ≤ ε with probability at least 3/5.
We remark that the statement of Theorem 7 can be made slightly more general; specifically, one can allow the procedure PROJECT_P to have sample access to P and to err with small probability, and further provide it with the Fourier coefficients Ĥ learnt in the previous step.

5 Lower Bound for PMD Testing

In this section, we obtain a lower bound to complement our upper bound for testing Poisson Multinomial Distributions. Namely, we prove the following:
Theorem 8.
There exist absolute constants c, c′ ∈ (0, 1) such that the following holds. For any k ≤ n^c and ε ≥ 1/2^{c′n}, any testing algorithm for the class PMD_{n,k} must have sample complexity Ω((4π/k)^{k/4} n^{(k−1)/4}/ε²).
The proof will rely on the lower bound framework of [11], reducing testing PMD_{n,k} to testing identity to some suitable hard distribution P∗ ∈ PMD_{n,k}. To do so, we need to (a) choose a convenient P∗ ∈ PMD_{n,k}; (b) prove that testing identity to P∗ requires that many samples (we shall do so by invoking the instance-by-instance lower bound method of [54]); and (c) provide an agnostic learning algorithm for PMD_{n,k} with small enough sample complexity, for the reduction to go through. Invoking [11, Theorem 18] with these ingredients will then conclude the argument.

References

[1] J. Acharya and C. Daskalakis. Testing Poisson Binomial Distributions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1829–1840, 2015.

[2] J. Acharya, C. Daskalakis, and G. Kamath. Optimal testing for properties of distributions. In Proceedings of NIPS'15, 2015.

[3] M. Y. An. Log-concave probability distributions: Theory and statistical testing. Technical Report, Economics Working Paper Archive at WUSTL, Washington University at St. Louis, 1995.

[4] A. D. Barbour. Stein's Method and Poisson Process Convergence. Journal of Applied Probability, 25:175–184, 1988.

[5] A. D. Barbour, L. Holst, and S. Janson. Poisson Approximation. Oxford University Press, New York, NY, 1992.

[6] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White. Testing that distributions are close. In IEEE Symposium on Foundations of Computer Science, pages 259–269, 2000.

[7] V. Bentkus. On the dependence of the Berry-Esseen bound on dimension. Journal of Statistical Planning and Inference, 113:385–402, 2003.

[8] A. Bhaskara, D.
Desai, and S. Srinivasan. Optimal hitting sets for combinatorial shapes. In 15th International Workshop, APPROX 2012, and 16th International Workshop, RANDOM 2012, pages 423–434, 2012.

[9] C. Borgs, J. T. Chayes, N. Immorlica, A. T. Kalai, V. S. Mirrokni, and C. H. Papadimitriou. The myth of the folk theorem. In STOC, pages 365–372, 2008.

[10] C. L. Canonne. A survey on distribution testing: Your data is big. But is it blue? Electronic Colloquium on Computational Complexity (ECCC), 22:63, 2015.

[11] C. L. Canonne, I. Diakonikolas, T. Gouleakis, and R. Rubinfeld. Testing shape restrictions of discrete distributions. Theory of Computing Systems, 2017.

[12] C. L. Canonne, I. Diakonikolas, D. M. Kane, and A. Stewart. Testing conditional independence of discrete distributions. In STOC, pages 735–748. ACM, 2018.

[13] C. L. Canonne, I. Diakonikolas, and A. Stewart. Fourier-based testing for families of distributions. CoRR, abs/1706.05738, 2017. This is the full version of the current paper.

[14] S. Chan, I. Diakonikolas, P. Valiant, and G. Valiant. Optimal algorithms for testing closeness of discrete distributions. In SODA, pages 1193–1203, 2014.

[15] L. Chen, L. Goldstein, and Q.-M. Shao. Normal Approximation by Stein's Method. Springer, 2011.

[16] S. X. Chen and J. S. Liu. Statistical applications of the Poisson-Binomial and Conditional Bernoulli Distributions. Statistica Sinica, 7:875–892, 1997.

[17] Y. Cheng, I. Diakonikolas, and A. Stewart. Playing anonymous games using simple strategies. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '17, pages 616–631, Philadelphia, PA, USA, 2017. Society for Industrial and Applied Mathematics.

[18] L. H. Y. Chen and Y. K. Leong. From zero-bias to discretized normal approximation. 2010.

[19] H. Chernoff.
A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist., 23:493–507, 1952.

[20] C. Daskalakis, A. De, G. Kamath, and C. Tzamos. A size-free CLT for Poisson multinomials and its applications. In Proceedings of STOC'16, 2016.

[21] C. Daskalakis, I. Diakonikolas, R. O'Donnell, R. A. Servedio, and L. Tan. Learning Sums of Independent Integer Random Variables. In FOCS, pages 217–226, 2013.

[22] C. Daskalakis, I. Diakonikolas, and R. A. Servedio. Learning Poisson Binomial Distributions. In STOC, pages 709–728, 2012.

[23] C. Daskalakis, G. Kamath, and C. Tzamos. On the structure, covering, and learning of Poisson multinomial distributions. In FOCS, 2015.

[24] C. Daskalakis and C. Papadimitriou. On Oblivious PTAS's for Nash Equilibrium. In STOC, pages 75–84, 2009.

[25] C. Daskalakis and C. H. Papadimitriou. Computing equilibria in anonymous games. In FOCS, pages 83–93, 2007.

[26] C. Daskalakis and C. H. Papadimitriou. Discretized multinomial distributions and Nash equilibria in anonymous games. In FOCS, pages 25–34, 2008.

[27] C. Daskalakis and C. H. Papadimitriou. Approximate Nash equilibria in anonymous games. Journal of Economic Theory, 2014.

[28] A. De. Beyond the central limit theorem: asymptotic expansions and pseudorandomness for combinatorial sums. In FOCS, 2015.

[29] I. Diakonikolas and D. M. Kane. A new approach for testing properties of discrete distributions. In FOCS, pages 685–694, 2016. Full version available at abs/1601.05557.

[30] I. Diakonikolas, D. M. Kane, and V. Nikishkin. Testing Identity of Structured Distributions. In Proceedings of SODA'15, 2015.

[31] I. Diakonikolas, D. M. Kane, and A. Stewart. Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables.
In COLT, volume 49 of JMLR Workshop and Conference Proceedings, pages 831–849. JMLR.org, 2016. Full version available at arXiv:1505.00662.

[32] I. Diakonikolas, D. M. Kane, and A. Stewart. Properly learning Poisson binomial distributions in almost polynomial time. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, pages 850–878, 2016. Full version available at arXiv:1511.04066.

[33] I. Diakonikolas, D. M. Kane, and A. Stewart. The Fourier Transform of Poisson Multinomial Distributions and its Algorithmic Applications. In Proceedings of STOC'16, 2016. Full version available at arXiv:1511.03592.

[34] D. Dubhashi and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, Cambridge, 2009.

[35] P. W. Goldberg and S. Turchetta. Query complexity of approximate equilibria in anonymous games. J. Comput. Syst. Sci., 90:80–98, 2017.

[36] P. Gopalan, D. M. Kane, and R. Meka. Pseudorandomness via the discrete Fourier transform. In FOCS, 2015.

[37] P. Gopalan, R. Meka, O. Reingold, and D. Zuckerman. Pseudorandom generators for combinatorial shapes. In STOC, pages 253–262, 2011.

[38] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.

[39] J. Kruopis. Precision of approximation of the generalized binomial distribution by convolutions of Poisson measures. Lithuanian Mathematical Journal, 26(1):37–49, 1986.

[40] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, 2005.

[41] W. Loh. Stein's method and multinomial approximation. Ann. Appl. Probab., 2(3):536–554, 1992.

[42] L. Lovász and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures and Algorithms, 30(3):307–358, 2007.

[43] J. Neyman and E. S. Pearson.
On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231(694-706):289–337, 1933.

[44] L. Paninski. A coincidence-based test for uniformity given very sparsely-sampled discrete data. IEEE Transactions on Information Theory, 54:4750–4755, 2008.

[45] S. D. Poisson. Recherches sur la probabilité des jugements en matière criminelle et en matière civile. Bachelier, Paris, 1837.

[46] E. L. Presman. Approximation of binomial distributions by infinitely divisible ones. Theory Probab. Appl., 28:393–403, 1983.

[47] B. Roos. On the Rate of Multivariate Poisson Convergence. Journal of Multivariate Analysis, 69(1):120–134, 1999.

[48] B. Roos. Closeness of convolutions of probability measures. Bernoulli, 16(1):23–50, 2010.

[49] R. Rubinfeld. Taming big probability distributions. XRDS, 19(1):24–28, 2012.

[50] A. Saumard and J. A. Wellner. Log-concavity and strong log-concavity: A review. Statist. Surv., 8:45–114, 2014.

[51] R. P. Stanley. Log-concave and unimodal sequences in algebra, combinatorics, and geometry. Annals of the New York Academy of Sciences, 576(1):500–535, 1989.

[52] G. Valiant and P. Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), 17(179), 2010.

[53] G. Valiant and P. Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In STOC, pages 685–694, 2011.

[54] G. Valiant and P. Valiant. An automatic inequality prover and instance optimal identity testing. In FOCS, 2014. Conference version of [55].

[55] G. Valiant and P. Valiant. An automatic inequality prover and instance optimal identity testing. SICOMP, 46(1):429–455, 2017.

[56] P.
Valiant. Testing symmetric properties of distributions. In STOC, pages 383–392, 2008.

[57] G. Walther. Inference and modeling with log-concave distributions. Statistical Science, 24(3):319–327, 2009.