{"title": "Multivariate tests of association based on univariate tests", "book": "Advances in Neural Information Processing Systems", "page_first": 208, "page_last": 216, "abstract": "For testing two vector random variables for independence, we propose testing whether the distance of one vector from an arbitrary center point is independent from the distance of the other vector from another arbitrary center point by a univariate test. We prove that under minimal assumptions, it is enough to have a consistent univariate independence test on the distances, to guarantee that the power to detect dependence between the random vectors increases to one with sample size. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved, and the resulting multivariate test may be distribution-free for specific aggregation methods (if the univariate test is distribution-free). We show that certain multivariate tests recently proposed in the literature can be viewed as instances of this general approach. Moreover, we show in experiments that novel tests constructed using our approach can have better power and computational time than competing approaches.", "full_text": "Multivariate tests of association based on univariate\n\ntests\n\nDepartment of Statistics and Operations Research\n\nRuth Heller\n\nTel-Aviv University\n\nTel-Aviv, Israel 6997801\nruheller@gmail.com\n\nYair Heller\n\nheller.yair@gmail.com\n\nAbstract\n\nFor testing two vector random variables for independence, we propose testing\nwhether the distance of one vector from an arbitrary center point is independent\nfrom the distance of the other vector from another arbitrary center point by a\nunivariate test. 
We prove that under minimal assumptions, it is enough to have a consistent univariate independence test on the distances to guarantee that the power to detect dependence between the random vectors increases to one with sample size. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved, and the resulting multivariate test may have a distribution-free critical value for specific aggregation methods (if the univariate test is distribution-free). We show that certain multivariate tests recently proposed in the literature can be viewed as instances of this general approach. Moreover, we show in experiments that novel tests constructed using our approach can have better power and computational time than competing approaches.

1 Introduction

Let X ∈ ℝ^p and Y ∈ ℝ^q be random vectors, where p and q are positive integers. The null hypothesis of independence is H0 : FXY = FX FY, where the joint distribution of (X, Y) is denoted by FXY, and the distributions of X and Y, respectively, by FX and FY. If X is a categorical variable with K categories, then the null hypothesis of independence is the null hypothesis in the K-sample problem, H0 : F1 = … = FK, where Fk, k ∈ {1, …, K}, is the distribution of Y in category k.

The problem of testing for independence of random vectors, as well as the K-sample problem on a multivariate Y, against the general alternative H1 : FXY ≠ FX FY, has received increased attention in recent years. The most common approach is based on pairwise distances or similarity measures. See (26), (6), (24), and (12) for consistent tests of independence, and (10), (25), (1), (22), (5), and (8) for recent K-sample tests. Earlier tests based on nearest neighbours include (23) and (13).
For the K-sample problem, the practice of comparing multivariate distributions based on pairwise distances is justified by the fact that, under mild conditions, the distributions differ if and only if the distributions of the within- and between-group pairwise distances differ (19). Other innovative approaches have also been considered in recent years. In (4) and (28), the authors suggest reducing the multivariate data to a lower dimensional sub-space by (random) projections. Recently, in (3) another approach was introduced for the two sample problem, which is based on distances between analytic functions representing each of the distributions. Their novel tests are almost surely consistent when randomly selecting locations or frequencies and are fast to compute.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

We suggest the following approach for testing for independence: first compute the distances from a fixed center point, then apply any univariate independence test on the distances. We show that this approach can result in novel powerful multivariate tests that are attractive due to their theoretical guarantees and computational complexity. Specifically, in Section 2 we show that if H0 is false, then applying a univariate consistent test on distances from a single center point will result in a multivariate consistent test (except for a measure zero set of center points), where a consistent test is a test whose power (i.e., probability of rejecting H0 when H0 is false) increases to one as the sample size increases. Moreover, the computational time is that of the univariate test, which means that it can be very fast. In particular, a desirable requirement is that the null distribution of the test statistic does not depend on the marginal distributions of X and Y, i.e., that the test is distribution-free.
Powerful univariate consistent distribution-free tests exist (see (11) for novel tests and a review), so if one of these distribution-free univariate tests is applied on the distances, the resulting multivariate test is distribution-free.

In Section 3 we show that considering the distances from M > 1 points and aggregating the resulting statistics can also result in consistent tests, which may be more powerful than tests that consider a single center point. Both distribution-free and permutation-based tests can be generated, depending on the choice of aggregation method and univariate test.

In Section 4 we draw the connection between these results and some known tests mentioned above. The tests of (10) and of (12) can be viewed as instances of this approach, where the fixed center point is a sample point, and all sample points are considered each in turn as a fixed center point, for a particular univariate test. In Section 5 we demonstrate in simulations that novel tests based on our approach can have both a power advantage and a great computational advantage over existing multivariate tests. In Section 6 we discuss further extensions.

2 From multivariate to univariate

We use the following result by (21). Let Bd(x, r) = {y ∈ ℝ^d : ‖x − y‖ ≤ r} be a ball centered at x with radius r. A complex Radon measure µ, defined formally in Supplementary Material (SM) § D, on ℝ^d is said to be of at most exponential-quadratic growth if there exist positive constants A and α such that |µ|(Bd(0, r)) ≤ A e^{αr²}.

Proposition 2.1 (Rawat and Sitaram (21)). Let Γ ⊂ ℝ^d be such that the only real analytic function (defined on an open set containing Γ) that vanishes on Γ is the zero function. Let C = {Bd(x, r) : x ∈ Γ, r > 0}.
Then for any complex Radon measure µ on ℝ^d of at most exponential-quadratic growth, if µ(C) = 0 for all C ∈ C, then it necessarily follows that µ = 0.

For the two-sample problem, let Y ∈ ℝ^q be a random variable with cumulative distribution F1 in category X = 1, and F2 in category X = 2. For z ∈ ℝ^q, let F′iz be the cumulative distribution function of ‖Y − z‖ when Y has cumulative distribution Fi, i ∈ {1, 2}. We show that if the distribution of Y differs across categories, then so does the distribution of the distance of Y from almost every point z. Therefore, any univariate consistent two-sample test on the distances from z results in a consistent test of the equality of the multivariate distributions F1 and F2, for almost every z. It is straightforward to generalize these results to K > 2 categories.

Proofs of all Theorems are in SM § A.

Theorem 2.1. If H0 : F1 = F2 is false, then for every z ∈ ℝ^q, apart from at most a set of Lebesgue measure 0, there exists an r > 0 such that F′1z(r) ≠ F′2z(r).

Corollary 2.1. For every z ∈ ℝ^q, apart from at most a set of Lebesgue measure 0, a consistent two-sample univariate test of the null hypothesis H′0 : F′1z = F′2z will result in a multivariate consistent test of the null hypothesis H0 : F1 = F2.

For the multivariate independence test, let X ∈ ℝ^p and Y ∈ ℝ^q be two random vectors with marginal distributions FX and FY, respectively, and with joint distribution FXY. For z = (zx, zy), zx ∈ ℝ^p, zy ∈ ℝ^q, let F′XYz be the joint distribution of (‖X − zx‖, ‖Y − zy‖). Let F′Xz and F′Yz be the marginal distributions of ‖X − zx‖ and ‖Y − zy‖, respectively.

Theorem 2.2. If H0 : FXY = FX FY is false, then for every zx ∈ ℝ^p, zy ∈ ℝ^q, apart from at most a set of Lebesgue measure 0, there exist rx > 0, ry > 0, such that F′XYz(rx, ry) ≠ F′Xz(rx) F′Yz(ry).

Corollary 2.2. For every z ∈ ℝ^{p+q}, apart from at most a set of Lebesgue measure 0, a consistent univariate test of independence of the null hypothesis H′0 : F′XYz = F′Xz F′Yz will result in a multivariate consistent test of the null hypothesis H0 : FXY = FX FY.

We have N independent copies (xi, yi) (i = 1, …, N) from the joint distribution FXY. The above results motivate the following two-step procedure for the multivariate tests. For the K-sample test, xi ∈ {1, …, K} determines the category and yi ∈ ℝ^q is the observation in category xi, so the two-step procedure is to first choose z ∈ ℝ^q and then to apply a univariate K-sample consistent test on (x1, ‖y1 − z‖), …, (xN, ‖yN − z‖). Examples of such univariate tests include the classic Kolmogorov-Smirnov and Cramer-von Mises tests. For the independence test, the two-step procedure is to first choose zx ∈ ℝ^p and zy ∈ ℝ^q, and then to apply a univariate consistent independence test on (‖x1 − zx‖, ‖y1 − zy‖), …, (‖xN − zx‖, ‖yN − zy‖). An example of such a univariate test is the classic test of Hoeffding (14). Note that the consistency of a univariate test may be satisfied only under some assumptions on the distribution of the distances of the multivariate vectors.
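The two-step procedure can be sketched in a few lines (our illustration, not the authors' code; we use SciPy's Kolmogorov-Smirnov test as the consistent univariate two-sample test):

```python
import numpy as np
from scipy.stats import ks_2samp

def two_step_two_sample_test(y1, y2, z):
    """Two-step multivariate two-sample test: (i) reduce each q-dimensional
    observation to its distance from a fixed center point z, (ii) apply a
    consistent univariate two-sample test (here Kolmogorov-Smirnov) to the
    two samples of distances."""
    d1 = np.linalg.norm(np.asarray(y1) - z, axis=1)  # ||y - z|| for sample 1
    d2 = np.linalg.norm(np.asarray(y2) - z, axis=1)  # ||y - z|| for sample 2
    return ks_2samp(d1, d2)  # result with .statistic and .pvalue
```

Since the univariate test is distribution-free, so is the resulting multivariate test; replacing the univariate test by a consistent independence test (e.g. Hoeffding's) on the pairs (‖xi − zx‖, ‖yi − zy‖) gives the independence version in the same way.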
For example, the consistency of (14) follows if the densities of ‖X − zx‖ and ‖Y − zy‖ are continuous. See (11) for additional distribution-free univariate K-sample and independence tests.

A great advantage of this two-step procedure is the fact that it has the same computational complexity as the univariate test. For example, if one chooses to use Hoeffding's univariate independence test (14), then the total complexity is only O(N log N), which is the cost of computing the test statistic. The p-value can be extracted from a look-up table since Hoeffding's test is distribution-free. In comparison, the computational complexity of the multivariate permutation tests of (26) and (12) is O(BN²) and O(BN² log N), respectively, where B is the number of permutations. For many univariate tests the asymptotic null distribution is known, so it can be used to compute the significance efficiently without resorting to permutations, which are typically required for assessing the multivariate significance.

Another advantage of the two-step procedure is that the test statistic may estimate an easily interpretable population value. The univariate test statistics often converge to easily interpretable population values, which are often between 0 and 1. These values carry over to provide meaning to the new multivariate statistics; see examples in equations (1) and (2).

In practice, the choice of the center point from which the distances are measured can have a significant impact on power, as demonstrated in the following example. Let Σ_{i=1}^k pi N2(µi, diag(σ²i1, σ²i2)) denote the mixture distribution of k bivariate normals, where the ith component has mean µi and a diagonal covariance matrix diag(σ²i1, σ²i2) with diagonal entries σ²i1 and σ²i2, i = 1, …, k. Consider the following bivariate two sample problem, depicted in Figure 1, where F1 = (1/2) N2(0, diag(1, 9)) + (1/2) N2(0, diag(100, 100)) and F2 = (1/2) N2(0, diag(9, 1)) + (1/2) N2(0, diag(100, 100)). Clearly F′1z has the same distribution as F′2z if z ∈ {(y1, y2) : y1 = y2 or y1 = −y2}; see Figure 1 (c). In agreement with Theorem 2.1, the measure of these non-informative center points is zero. On the other hand, if we use as a center point a point on one of the axes, the distribution of the distances will be very different. See in particular the distribution of distances from the point (0, 100) in Figure 1 (b) and the power analysis in Table 2.

3 Pooling univariate tests together

We need not rely on a single z ∈ ℝ^{p+q} (or a single z ∈ ℝ^q for the K-sample problem). If we apply a consistent univariate test using many points zi, i = 1, …, M, as our center points, where the test is applied on the distances of the N sample points from the center point, we obtain M test statistics and corresponding p-values, p1, …, pM.

We can use the p-values or the test statistics of the univariate tests to design consistent multivariate tests. We suggest three useful approaches. The first approach is to combine the p-values, using a combining function f : [0, 1]^M → [0, 1]. Common combining functions include f(p1, …, pM) = min_{i=1,…,M} pi, and f(p1, …, pM) = −2 Σ_{i=1}^M log pi.

The second approach is to combine the univariate test statistics, by a combining function such as the average, maximum, or minimum statistic. These aggregation methods can result in test statistics which converge to meaningful population values; see equations (1) and (2) below for multivariate tests based on the univariate Kolmogorov-Smirnov two sample test (18).
Figure 1: (a) Realizations from two bivariate normal mixtures, with a sample size of 1000 from each group: F1 = (1/2) N2(0, diag(1, 9)) + (1/2) N2(0, diag(100, 100)) (black points), and F2 = (1/2) N2(0, diag(9, 1)) + (1/2) N2(0, diag(100, 100)) (red points); (b) the empirical density of the distance from the point (0, 100) in each group; (c) the empirical density of the distance from the point (100, 100) in each group.

We note that if the univariate tests are distribution-free, then taking the maximum (minimum) p-value is equivalent to taking the minimum (maximum) test statistic (when the test rejects for large values of the test statistic). The significance of the combined p-value or the combined test statistic can be computed by a permutation test.

A drawback of the two approaches above is that the distribution-free property of the univariate test does not carry over to the multivariate test. In our third approach, we consider the set of M p-values as coming from the family of M null hypotheses, and then apply a valid test of the global null hypothesis that all M null hypotheses are true. Let p(1) ≤ … ≤ p(M) be the sorted p-values. The simplest valid test for any type of dependence is the Bonferroni test, which will reject the global null if M p(1) ≤ α. Another valid test is the test of Hommel (16), which rejects if min_{j=1,…,M} {M (Σ_{l=1}^M 1/l) p(j)/j} ≤ α. (This test statistic was suggested independently in a multiple testing procedure for false discovery rate control under general dependence in (2).) The third approach is computationally much more efficient than the first two approaches, since no permutation test is required after the computation of the univariate p-values, but it may be less powerful.
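The two global-null tests of the third approach follow directly from their definitions (a minimal sketch; `pvals` holds the M center-specific p-values):

```python
def bonferroni_global(pvals, alpha=0.1):
    """Bonferroni global test: reject H0 if M * p_(1) <= alpha."""
    M = len(pvals)
    return M * min(pvals) <= alpha

def hommel_global(pvals, alpha=0.1):
    """Hommel's global test: reject H0 if
    min_j { M * (sum_{l=1}^M 1/l) * p_(j) / j } <= alpha,
    where p_(1) <= ... <= p_(M) are the sorted p-values."""
    M = len(pvals)
    c = sum(1.0 / l for l in range(1, M + 1))  # harmonic sum
    p_sorted = sorted(pvals)
    stat = min(M * c * p / j for j, p in enumerate(p_sorted, start=1))
    return stat <= alpha
```

Both tests are valid under arbitrary dependence between the M p-values, which is what makes this pooling method distribution-free whenever the univariate test is.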
Clearly, if the univariate test is distribution-free, the resulting multivariate test has a distribution-free critical value.

As an example, we prove that when using the Kolmogorov-Smirnov two sample test as the univariate test, all the pooling methods above result in consistent multivariate two-sample tests. Let KS(z) = sup_{d∈ℝ} |F′1z(d) − F′2z(d)| be the population value of the univariate Kolmogorov-Smirnov two sample test statistic comparing the distributions of the distances. Let N be the total number of independent observations. We assume for simplicity an equal number of observations from F1 and F2.

Theorem 3.1. Let z1, …, zM be a sample of center points from an absolutely continuous distribution with probability measure ν, whose support S has a positive Lebesgue measure in ℝ^q. Let KSN(zi) be the empirical value of KS(zi) with corresponding p-value pi, i = 1, …, M. Let p(1) ≤ … ≤ p(M) be the sorted p-values. Assume that the distribution functions F1 and F2 are continuous. For M = o(e^N), if H0 : F1 = F2 is false, then ν-almost surely, the multivariate test will be consistent for the following level α tests:

1. the permutation test using the test statistics S1 = max_{i=1,…,M} {KSN(zi)} or S2 = p(1).
2. the test based on Bonferroni, which rejects H0 if M p(1) ≤ α.
3. for M log M = o(e^N), the test based on Hommel's global null p-value, which rejects H0 if min_{j=1,…,M} {M (Σ_{l=1}^M 1/l) p(j)/j} ≤ α.
4. the permutation tests using the statistics T1 = Σ_{i=1}^M KSN(zi) or T2 = −2 Σ_{i=1}^M log pi.

Arguably, the most natural choice of center points is the sample points themselves.
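The permutation tests of items 1 and 4 can be sketched as follows (our illustration, not the authors' implementation; `agg=max` gives S1 and `agg=sum` gives T1):

```python
import numpy as np
from scipy.stats import ks_2samp

def pooled_stat(y1, y2, centers, agg=max):
    """Aggregate the univariate KS statistics over the M center points:
    agg=max gives S1, agg=sum gives T1 (and T1/M estimates E{KS(Z)})."""
    stats = []
    for z in centers:
        d1 = np.linalg.norm(y1 - z, axis=1)
        d2 = np.linalg.norm(y2 - z, axis=1)
        stats.append(ks_2samp(d1, d2).statistic)
    return agg(stats)

def permutation_pvalue(y1, y2, centers, agg=max, B=199, seed=0):
    """Permutation null: shuffle the group labels, recompute the pooled
    statistic, and report the (add-one) fraction of permuted statistics
    at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([y1, y2])
    n1 = len(y1)
    obs = pooled_stat(y1, y2, centers, agg)
    exceed = 0
    for _ in range(B):
        idx = rng.permutation(len(pooled))
        exceed += pooled_stat(pooled[idx[:n1]], pooled[idx[n1:]], centers, agg) >= obs
    return (1 + exceed) / (1 + B)
```

The permutation step is what makes these two pooling methods more expensive than the third one, which needs no resampling after the M univariate p-values are computed.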
Interestingly, if the univariate test statistic is a U-statistic (15) of order m (defined formally in SM § sup-sec-technical), then the resulting multivariate test statistic is a U-statistic of order m + 1, if each sample point acts as a center point and the univariate test statistics are averaged, as stated in the following Lemma (see SM § A for the proof).

Lemma 3.1. For univariate random variables (U, V), let T_{N−1}((uk, vk), k = 1, …, N − 1) be a univariate test statistic based on a random sample of size N − 1 from the joint distribution of (U, V). If T_{N−1} is a U-statistic of order m, then SN = (1/N) [T{(‖xk − x1‖, ‖yk − y1‖), k = 2, …, N} + … + T{(‖xk − xN‖, ‖yk − yN‖), k = 1, …, N − 1}] is a U-statistic of order m + 1.

The test statistics S1 and T1/M converge to meaningful population quantities,

lim_{N,M→∞} S1 = lim_{M→∞} max_{z1,…,zM} KS(z) = sup_{z∈S} KS(z),   (1)

lim_{N,M→∞} T1/M = lim_{M→∞} Σ_{i=1}^M KS(zi)/M = E{KS(Z)},   (2)

where the expectation is over the distribution of the center point Z.

4 Connection to existing methods

We are aware of two multivariate test statistics of the above-mentioned form: aggregation of the univariate test statistics on the distances from center points. The tests are the two sample test of (10) and the independence test of (12). Both these tests use the second pooling method mentioned above by summing up the univariate test statistics. Furthermore, both these tests use the N sample points as the center points (or z's) and perform a univariate test on the remaining N − 1 points.
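The construction in Lemma 3.1 can be illustrated as follows (a sketch of ours, with an assumed univariate statistic: Kendall's tau, which is a U-statistic of order m = 2, rather than a test used in the paper); each sample point serves in turn as the center, and the univariate statistics on the remaining N − 1 distance pairs are averaged:

```python
import numpy as np

def kendall_tau(u, v):
    """Kendall's tau: a U-statistic of order m = 2 on univariate pairs."""
    n = len(u)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(u[i] - u[j]) * np.sign(v[i] - v[j])
    return s / (n * (n - 1) / 2)

def averaged_center_stat(x, y):
    """S_N of Lemma 3.1: average the univariate statistic over all N sample
    points acting as centers; S_N is then a U-statistic of order m + 1 = 3."""
    n = len(x)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        du = np.linalg.norm(x[mask] - x[i], axis=1)  # distances ||x_k - x_i||
        dv = np.linalg.norm(y[mask] - y[i], axis=1)  # distances ||y_k - y_i||
        total += kendall_tau(du, dv)
    return total / n
```

When Y = X the distance pairs are perfectly concordant from every center, so the averaged statistic equals 1.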
Indeed, (10) recognized that their test can be viewed as summing up univariate Cramer-von Mises tests on the distances from each sample point. We shall show that the test statistic of (12) can be viewed as aggregation by summation of the univariate weighted Hoeffding independence test suggested in (27).

In (12) a permutation test was introduced, using the test statistic Σ_{i=1}^N Σ_{j=1, j≠i}^N S(i, j), where S(i, j) is the Pearson test score for the 2×2 contingency table for the random variables I(‖X − xi‖ ≤ ‖xj − xi‖) and I(‖Y − yi‖ ≤ ‖yj − yi‖), where I(·) is the indicator function. Since ‖X − xi‖ and ‖Y − yi‖ are univariate random variables, S(i, j) can also be viewed as the test statistic for the independence test between ‖X − xi‖ and ‖Y − yi‖, based on the 2×2 contingency table induced by the 2×2 partition of ℝ² about the point (‖xj − xi‖, ‖yj − yi‖), using the N − 2 sample points (‖xk − xi‖, ‖yk − yi‖), k = 1, …, N, k ≠ i, k ≠ j. The statistic that sums the Pearson test statistics over all 2×2 partitions of ℝ² based on the observations results in a consistent independence test for univariate random variables (27). The test statistic of (27) on the sample points (‖xk − xi‖, ‖yk − yi‖), k = 1, …, N, k ≠ i, is therefore Σ_{j=1, j≠i}^N S(i, j). The multivariate test statistic of (12) aggregates by summation the univariate test statistics of (27), where the ith univariate test statistic is based on the N − 1 distances of xk from xi and the N − 1 distances of yk from yi, for k = 1, …, N, k ≠ i.

Of course, not all known consistent multivariate tests belong to the framework defined above. As an interesting example we discuss the energy test of (25) and (1) for the two-sample problem. Without loss of generality, let y1, …
, yN1 be the observations from F1, and y_{N1+1}, …, yN be the observations from F2, N2 = N − N1. The test statistic E is equal to

E = (N1 N2 / N) { (2 / (N1 N2)) Σ_{l=1}^{N1} Σ_{m=N1+1}^{N} ‖yl − ym‖ − (1/N1²) Σ_{l=1}^{N1} Σ_{m=1}^{N1} ‖yl − ym‖ − (1/N2²) Σ_{l=N1+1}^{N} Σ_{m=N1+1}^{N} ‖yl − ym‖ },

where ‖·‖ is the Euclidean norm. It is easy to see that E = Σ_{i=1}^N Si, where the univariate score is

Si = w(i) { (1/N1) Σ_{m=1}^{N1} ‖yi − ym‖ − (1/N2) Σ_{m=N1+1}^{N} ‖yi − ym‖ },   (3)

with w(i) = −N2/N if i ≤ N1 and w(i) = N1/N if i > N1, for i ∈ {1, …, N}. The statistic Si is not an omnibus consistent test statistic, since a test based on Si will have no power to detect differences in distributions with the same expected distance from yi across groups. However, the energy test is omnibus consistent.

5 Experiments

In order to assess the effect of using our novel approach, we carry out experiments. We have three specific aims: (1) to compare the power of using a single center point versus multiple center points; (2) to assess the effect of different univariate tests on the power; and (3) to see how the resulting tests fare against other multivariate tests. For simplicity, we address the two-sample problem, and we do not consider the more computationally intensive pooling approaches one and two, but rather consider only the third approach, which results in a distribution-free critical value for the multivariate test.

Simulation 1: distributions of dimension ≥ 2.
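Before turning to the experiments, the decomposition E = Σ_{i=1}^N Si of equation (3) above can be checked numerically (a sketch under our own vectorized formulation, not the authors' code):

```python
import numpy as np

def energy_stat(y, n1):
    """Energy two-sample statistic E = (n1*n2/N) * (2*A - B - C), where A is
    the mean between-group distance and B, C are the mean within-group
    distances (diagonal zero terms included, as in the double sums)."""
    n = len(y)
    n2 = n - n1
    dist = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    a = dist[:n1, n1:].mean()
    b = dist[:n1, :n1].sum() / n1**2
    c = dist[n1:, n1:].sum() / n2**2
    return n1 * n2 / n * (2 * a - b - c)

def center_scores(y, n1):
    """Univariate scores S_i of equation (3); they sum to E."""
    n = len(y)
    n2 = n - n1
    dist = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    w = np.where(np.arange(n) < n1, -n2 / n, n1 / n)
    return w * (dist[:, :n1].sum(axis=1) / n1 - dist[:, n1:].sum(axis=1) / n2)
```

As noted above, an individual score S_i vanishes whenever the two groups have the same mean distance from y_i, which is why only the aggregated statistic is omnibus consistent.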
We examined the distributions depicted in Figure 2. Scenario (a) was chosen to examine the classical setting of discovering differences in multivariate normal distributions. The other scenarios were chosen to discover differences in the distributions when one or both distributions have clusters. These are similar to the settings considered in (9). In addition, we examined the following scenario from (25) in five dimensions: F1 is the multivariate standard normal distribution, and F2 = t(5)(5) is the multivariate t distribution, where each of the 5 independent coordinates has the univariate t distribution with five degrees of freedom.

Regarding the choice of center points, we examine as single center point a sample point selected at random or the center of mass (CM), and as multiple center points all sample points, pooled by the third approach (using either Bonferroni's test or Hommel's test). Regarding the univariate tests, we examine: the test of Kolmogorov-Smirnov (18), referred to as KS; the test of the Anderson and Darling family, constructed by (20) for the univariate two-sample problem, referred to as AD; and the generalized test of (11), which aggregates over all partition sizes using the minimum p-value statistic, referred to as minP (see SM § C for a detailed description).
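The three center-point schemes compared in Simulation 1 can be sketched as follows (a hypothetical helper of ours; it uses KS as the univariate test, whereas the paper's minP and AD variants are described in SM § C):

```python
import numpy as np
from scipy.stats import ks_2samp

def center_scheme_pvalues(y1, y2, scheme="CM", rng=None):
    """Return the univariate p-values produced by a center-point scheme:
    'CM' (center of mass of the pooled sample), 'random' (one sample point
    chosen at random), or 'all' (every sample point in turn, to be pooled
    by Bonferroni's or Hommel's global test)."""
    pooled = np.vstack([y1, y2])
    if scheme == "CM":
        centers = [pooled.mean(axis=0)]
    elif scheme == "random":
        rng = rng or np.random.default_rng()
        centers = [pooled[rng.integers(len(pooled))]]
    else:  # 'all'
        centers = list(pooled)
    pvals = []
    for z in centers:
        d1 = np.linalg.norm(y1 - z, axis=1)
        d2 = np.linalg.norm(y2 - z, axis=1)
        pvals.append(ks_2samp(d1, d2).pvalue)
    return pvals

def bonferroni_reject(pvals, alpha=0.1):
    """Distribution-free multivariate decision from the M univariate p-values."""
    return len(pvals) * min(pvals) <= alpha
```

Note that with `scheme="all"` the Bonferroni correction pays for M = N p-values, which is the conservativeness visible in the null column of Table 1.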
We compare our tests to Hotelling's T², the classical generalization of the Student's t statistic for multivariate normal data (17), referred to as Hotelling; to the energy test of (25) and (1), referred to as Edist; and to the maximum mean discrepancy test of (8), referred to as MMD.

Figure 2: Realizations from the three non-null bivariate settings considered, with a sample size of 100 from each group: (a) F1 = N2{(0, 0), diag(1, 1)} and F2 = N2{(0, 0.05), diag(0.9, 0.9)}; (b) F1 = N2{(0, 0), diag(1, 1)} and F2 = Σ_{i=1}^4 (1/4) N2{µi, diag(0.25, 0.25)}, where µ1 = (1, 1), µ2 = (−1, 1), µ3 = (1, −1), µ4 = (−1, −1); (c) F1 = Σ_{i=1}^9 (1/9) N2{µi, diag(1, 1)} and F2 = Σ_{i=1}^9 (1/9) N2{µi + (1, 1), diag(0.25, 0.25)} are both mixtures of nine bivariate normals with equal probability of being sampled, but the centers of the bivariate normals of F1 are on the grid points (10, 20, 30) × (10, 20, 30) and have covariance diag(1, 1), and the centers of the bivariate normals of F2 are on the grid points (11, 21, 31) × (11, 21, 31) and have covariance diag(0.25, 0.25).

Table 1 shows the actual significance level (column 3) and power (columns 4–7) for the different multivariate tests considered, at the α = 0.1 significance level. We see that the choice of center point matters: comparing rows 4–6 to rows 7–9 shows that, depending on the data generation, the test that selects a random sample point as the center can have more or less power than the test that uses the center of mass, depending on whether the distances from the center of mass are more informative than the distances from a random point.
Comparing these rows with rows 10–15 shows that in most settings there was benefit in considering all sample points as center points versus only a single center point, even at the price of paying for the multiplicity of the different center points.

Table 1: The fraction of rejections at the 0.1 significance level for the null case (column 3), the three scenarios depicted in Figure 2 (columns 4–6), and the additional scenario of higher dimension (column 7). The sample size in each group was 100. Rows 4–6 use the center of mass (CM) as a single center point; rows 7–9 use a random sample point as the single center point; rows 10–15 use all sample points as center points. The adjustment for the multiple center points is by Bonferroni in rows 10–12, and by Hommel's test in rows 13–15. Based on 500 repetitions for columns 4–7, and on 1000 repetitions for the true null setting in column 3.

Row  Test                        Null: F1 = F2 =              (a)     (b)     (c)     F1 = N5{0, diag(1,1,1,1,1)},
                                 N2{(0,0), diag(1,1)}                                 F2 = t(5)(5)
1    Hotelling                   0.097                        0.952   0.064   0.246   0.080
2    Edist                       0.090                        0.958   0.826   0.298   0.438
3    MMD                         0.114                        0.908   0.926   0.190   0.682
4    single Z - CM - minP        0.095                        0.308   0.990   0.634   0.974
5    single Z - CM - KS          0.087                        0.262   0.982   0.214   0.924
6    single Z - CM - AD          0.112                        0.350   0.994   0.266   0.978
7    single Z - random - minP    0.097                        0.504   0.736   0.922   0.754
8    single Z - random - KS      0.099                        0.502   0.702   0.394   0.656
9    single Z - random - AD      0.102                        0.556   0.708   0.436   0.750
10   vector Z - minP - Bonf      0.028                        0.592   0.962   1.000   0.906
11   vector Z - KS - Bonf        0.013                        0.692   0.858   0.196   0.722
12   vector Z - AD - Bonf        0.011                        0.772   0.820   0.132   0.778
13   vector Z - minP - Hommel    0.008                        0.606   0.936   1.000   0.774
14   vector Z - KS - Hommel      0.009                        0.588   0.776   0.174   0.550
15   vector Z - AD - Hommel      0.007                        0.720   0.760   0.150   0.668

This was true despite the fact that the cut-off for significance when considering all sample points was conservative, as manifest by the lower significance levels when the null is true (in column 3, rows 10–15 the actual significance level is at most 0.028). Applying Hommel's versus Bonferroni's test matters as well, and the latter has better power in most scenarios. The greatest difference in power is due to the choice of univariate test. A comparison of using KS (rows 5, 8, 11, and 14) versus AD (rows 6, 9, 12, and 15) and minP (rows 4, 7, 10, and 13) shows that AD and minP are more powerful than KS, with a large power gain for minP when there are many clusters in the data (column 6). As expected, Hotelling, Edist and MMD perform best for differences in the Gaussian distribution (column 4). However, in all other settings Hotelling's test has poor power, and our approach with minP as the univariate test has more power than Edist and MMD in columns 5–7. A possible explanation for the power advantage of an omnibus consistent univariate test over Edist is that Edist aggregates the univariate scores in (3), and the absolute value of these scores is close to zero for sample points that are on average the same distance away from both groups (even if the spread of the distances from these sample points differs across groups); for certain center points the score can even be negative.

Simulation 2: a closer inspection of a specific alternative. For the data generation of Figure 1, we can actually predict which of the partition-based univariate tests should be most powerful.
This of course requires knowing the data generation mechanism, which is unknown in practice, but it is interesting to examine the magnitude of the gaps in power from using optimal versus other choices of center points and univariate tests. As one intuitively expects, choosing a point on one of the axes gives the best power. Specifically, looking at the densities of the distributions of distances from (0, 100) in Figure 1 (b), one can expect that a good way to differentiate between the two densities is to partition the sample space into at least five sections, defined by the four intersections of the two densities closest to the center. In the power analysis in Table 2, M5, a test which looks for the best 5-way partition, has the highest power among all Mk scores, k = 2, 3, . . .. Similarly, an Sk score sums up all the scores of partitions into exactly k parts, and we would like a partition to be a refinement of the best five-way partition in order for it to get a good score. Here, S8 has the best power among all Sk scores, k = 2, 3, . . .. For more details about these univariate tests see SM § C. In summary, in this specific situation it is possible to predict both a good center point and a good, very specific, univariate score. However, this is not the typical situation: usually we do not know enough about the alternative, and therefore it is best to pool information from multiple center points together, as suggested in Section 3, and to use a more general univariate score, such as minP, which is the minimum of the p-values of the scores Sk, k ∈ {2, 3, . . .}.
We expect pooling methods one and two to be more powerful than the third pooling method used in the current study, since the Bonferroni and Hommel tests are conservative compared to using the exact permutation null distribution of their corresponding test statistics.

Table 2: The fraction of rejections at the 0.1 significance level for testing H0 : F1 = F2 when F1 = (1/2) N2(0, diag(1, 9)) + (1/2) N2(0, diag(100, 100)) and F2 = (1/2) N2(0, diag(9, 1)) + (1/2) N2(0, diag(100, 100)), based on a sample of 100 points from each group, using different univariate tests and different center-point schemes. Based on 500 repetitions. The competitors had the following power: Hotelling, 0.090; Edist, 0.274; MMD, 0.250.

                                    Single center point          Sample points are the center points
Test   Partitions   Aggregation     z = (0, 100)   z = (0, 4)    Hommel     Bonferroni
       considered   type
minP   all                          0.896          0.864         0.758      0.870
KS     2 × 2        maximum         0.574          0.508         0.110      0.208
AD     2 × 2        sum             0.504          0.702         0.030      0.064
M5     5 × 5        maximum         0.850          0.834         0.644      0.904
S5     5 × 5        sum             0.890          0.902         0.706      0.550
M8     8 × 8        maximum         0.820          0.794         0.586      0.856
S8     8 × 8        sum             0.924          0.912         0.876      0.736

We learn from the experiments above and in SM § B that our approach can be useful in designing well-powered tests, but that important choices need to be made, especially the choice of univariate test, for the resulting multivariate test to have good power.

6 Discussion

We showed that multivariate K-sample and independence tests can be performed by comparing the univariate distributions of the distances from center points, and that favourable properties of the univariate tests can carry over to the multivariate test.
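As a concrete illustration of this construction, the two-sample case with a single center point and the Kolmogorov-Smirnov statistic as the univariate test can be sketched as follows (a minimal sketch, not the code used in our experiments; the function name and the toy distributions are ours):

```python
import numpy as np
from scipy.stats import ks_2samp

def center_distance_test(x, y, z):
    """Two-sample test: reduce each multivariate sample to the univariate
    distances from the center point z, then apply a consistent univariate
    two-sample test on the distances (here, Kolmogorov-Smirnov)."""
    dx = np.linalg.norm(x - z, axis=1)  # distances of group 1 from z
    dy = np.linalg.norm(y - z, axis=1)  # distances of group 2 from z
    return ks_2samp(dx, dy).pvalue

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))                # F1: standard bivariate normal
y = rng.normal(size=(100, 2)) * [1.0, 3.0]   # F2: stretched along one axis
p = center_distance_test(x, y, np.array([0.0, 4.0]))
```

Since under the null hypothesis the distances from z in both groups are i.i.d. from a common univariate distribution, a distribution-free univariate test such as KS yields a distribution-free multivariate test.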
Specifically, (1) if the univariate test is consistent, then the multivariate test will be consistent (except for a measure-zero set of center points); (2) if the univariate test is distribution-free, the multivariate test has a distribution-free critical value if the third pooling method is used; and (3) if the univariate test-statistic is a U-statistic of order m, then aggregating by summation with the sample points as center points produces a multivariate test-statistic which is a U-statistic of order m + 1. The last property may be useful in working out the asymptotic null distribution of the multivariate test-statistic, thus avoiding the need for permutations when using the second pooling method. It may also be useful for working out the non-null distribution of the test-statistic, which may converge to a meaningful population quantity.
The experiments show great promise for designing multivariate tests using our approach. Even though only the most conservative distribution-free tests were considered, they had excellent power. The approach is general, and several important decisions have to be made when tailoring a test to a specific application: (1) the number and location of the center points; (2) the univariate test; and (3) the pooling method. We plan to carry out a comprehensive empirical investigation to assess the impact of the different choices. We believe that our approach will generate useful multivariate tests for various modern applications, especially applications where the data are naturally represented by distances, such as the study of microbiome diversity (see SM § B for an example).
The main results were stated for given center points, yet in simulations we select the center points using the sample.
The theoretical results hold for a center point selected at random from the sample. This can be seen by considering a two-step process: first select the sample point that will be a center point, and then test the distances from this center point to the remaining N - 1 sample points. Since the N sample points are independent, the consistency result holds. However, if the center point is the center of mass, and it converges to a bad point, then such a test will not be consistent. Therefore, we always recommend at least one center point randomly sampled from a distribution with a support of positive measure.
Our theoretical results were shown to hold for the Euclidean norm. However, if we impose the restriction that the multivariate distribution function is smooth, the theoretical results hold more generally for any norm or quasi-norm. From a practical point of view, adding a small Gaussian error to the measured signal guarantees that these results will hold for any normed distance.

Acknowledgments

We thank Boaz Klartag and Elchanan Mossel for useful discussions of the main results.

References
[1] BARINGHAUS, L. & FRANZ, C. (2004). On a new multivariate two-sample test. Journal of Multivariate Analysis, 88:190–206.
[2] BENJAMINI, Y. & YEKUTIELI, D. (2001). The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4):1165–1188.
[3] CHWIALKOWSKI, K., RAMDAS, A., SEJDINOVIC, D. & GRETTON, A. (2015). Fast two-sample testing with analytic representations of probability measures. Advances in Neural Information Processing Systems (NIPS), 28.
[4] CUESTA-ALBERTOS, J.A., FREIMAN, R. & RANSFORD, T. (2006). Random projections and goodness-of-fit tests in infinite-dimensional spaces. Bull. Braz. Math. Soc., 37(4):1–25.
[5] GRETTON, A., BORGWARDT, K.M., RASCH, M.J., SCHOLKOPF, B. & SMOLA, A. (2007). A kernel method for the two-sample problem.
Advances in Neural Information Processing Systems (NIPS), 19.
[6] GRETTON, A., FUKUMIZU, K., TEO, C.H., SONG, L., SCHOLKOPF, B. & SMOLA, A. (2008). A kernel statistical test of independence. Advances in Neural Information Processing Systems, 20:585–592.
[7] GRETTON, A. & GYORFI, L. (2010). Consistent nonparametric tests of independence. Journal of Machine Learning Research, 11:1391–1423.
[8] GRETTON, A., BORGWARDT, K.M., RASCH, M.J., SCHOLKOPF, B. & SMOLA, A. (2012). A kernel two-sample test. The Journal of Machine Learning Research, 13:723–773.
[9] GRETTON, A., SEJDINOVIC, D., STRATHMANN, H., BALAKRISHNAN, S., PONTIL, M., FUKUMIZU, K. & SRIPERUMBUDUR, B.K. (2012). Optimal kernel choice for large-scale two-sample tests. Advances in Neural Information Processing Systems, 25:1205–1213.
[10] HALL, P. & TAJVIDI, N. (2002). Permutation tests for equality of distributions in high-dimensional settings. Biometrika, 89(2):359–374.
[11] HELLER, R., HELLER, Y., KAUFMAN, S., BRILL, B. & GORFINE, M. (2016). Consistent distribution-free K-sample and independence tests for univariate random variables. Journal of Machine Learning Research, 17(29):1–54.
[12] HELLER, R., HELLER, Y. & GORFINE, M. (2013). A consistent multivariate test of association based on ranks of distances. Biometrika, 100(2):503–510.
[13] HENZE, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences. The Annals of Statistics, 16(2):772–783.
[14] HOEFFDING, W. (1948a). A non-parametric test of independence. Ann. Math. Stat., 19(4):546–557.
[15] HOEFFDING, W. (1948b). A class of statistics with asymptotically normal distributions. Ann. Math. Stat., 19:293–325.
[16] HOMMEL, G. (1983). Tests of the overall hypothesis for arbitrary dependence structures. Biom. J., 25:423–430.
[17] HOTELLING, H. (1931).
The Generalization of Student's Ratio. Ann. Math. Statist., 2(3):360–378.
[18] KOLMOGOROV, A.N. (1941). Confidence limits for an unknown distribution function. Ann. Math. Stat., 12:461–463.
[19] MAA, J.F., PEARL, D.K. & BARTOSZYNSKI, R. (1996). Reducing multidimensional two-sample data to one-dimensional interpoint comparisons. Annals of Statistics, 24(3):1069–1074.
[20] PETTITT, A.N. (1976). A two-sample Anderson-Darling rank statistic. Biometrika, 63(1):161–168.
[21] RAWAT, R. & SITARAM, A. (2000). Injectivity sets for spherical means on Rn and on symmetric spaces. Journal of Fourier Analysis and Applications, 6(3):343–348.
[22] ROSENBAUM, P.R. (2005). An exact distribution-free test comparing two multivariate distributions based on adjacency. Journal of the Royal Statistical Society B, 67:515–530.
[23] SCHILLING, M.F. (1986). Multivariate two-sample tests based on nearest neighbors. J. Am. Statist. Assoc., 81:799–806.
[24] SEJDINOVIC, D., SRIPERUMBUDUR, B., GRETTON, A. & FUKUMIZU, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Annals of Statistics, 41(5):2263–2291.
[25] SZÉKELY, G. & RIZZO, M. (2004). Testing for equal distributions in high dimensions. InterStat.
[26] SZÉKELY, G., RIZZO, M. & BAKIROV, N. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35:2769–2794.
[27] THAS, O. & OTTOY, J.P. (2004). A nonparametric test for independence based on sample space partitions. Communications in Statistics - Simulation and Computation, 33(3):711–728.
[28] WEI, S., LEE, C., WICHERS, L. & MARRON, J.S. (2015).
Direction-Projection-Permutation for High Dimensional Hypothesis Tests. Journal of Computational and Graphical Statistics, doi:10.1080/10618600.2015.1027773.