{"title": "ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls", "book": "Advances in Neural Information Processing Systems", "page_first": 9388, "page_last": 9396, "abstract": "Major internet companies routinely perform tens of thousands of A/B tests each year. Such large-scale sequential experimentation has resulted in a recent spurt of new algorithms that can provably control the false discovery rate (FDR) in a fully online fashion. However, current state-of-the-art adaptive algorithms can suffer from a significant loss in power if null p-values are conservative (stochastically larger than the uniform distribution), a situation that occurs frequently in practice. In this work, we introduce a new adaptive discarding method called ADDIS that provably controls the FDR and achieves the best of both worlds: it enjoys appreciable power increase over all existing methods if nulls are conservative (the practical case), and rarely loses power if nulls are exactly uniformly distributed (the ideal case). We provide several practical insights on robust choices of tuning parameters, and extend the idea to asynchronous and offline settings as well.", "full_text": "ADDIS: an adaptive discarding algorithm\n\nfor online FDR control with conservative nulls\n\nDepartment of Statistics and Data Science\n\nDepartment of Statistics and Data Science\n\nAaditya Ramdas\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\naramdas@cmu.edu\n\nJinjin Tian\n\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\njinjint@andrew.cmu.edu\n\nAbstract\n\nMajor internet companies routinely perform tens of thousands of A/B tests each\nyear. Such large-scale sequential experimentation has resulted in a recent spurt of\nnew algorithms that can provably control the false discovery rate (FDR) in a fully\nonline fashion. 
However, current state-of-the-art adaptive algorithms can suffer from a significant loss in power if null p-values are conservative (stochastically larger than the uniform distribution), a situation that occurs frequently in practice. In this work, we introduce a new adaptive discarding method called ADDIS that provably controls the FDR and achieves the best of both worlds: it enjoys appreciable power increase over all existing methods if nulls are conservative (the practical case), and rarely loses power if nulls are exactly uniformly distributed (the ideal case). We provide several practical insights on robust choices of tuning parameters, and extend the idea to asynchronous and offline settings as well.

1 Introduction

Rapid data collection is making the online testing of hypotheses increasingly essential, where a stream of hypotheses H1, H2, . . . is tested sequentially, one by one. On observing the data for the t-th test, which is usually summarized as a p-value Pt, and without knowing the outcomes of the future tests, we must decide whether to reject the corresponding null hypothesis Ht (thus proclaiming a "discovery"). Typically, a decision takes the form I(Pt ≤ αt) for some αt ∈ (0, 1), meaning that we reject the null hypothesis when the p-value is smaller than some threshold αt. An incorrectly rejected null hypothesis is called a false discovery. Let R(T) denote the set of null hypotheses rejected up to time T, and H0 the unknown set of true null hypotheses; then R(T) ∩ H0 is the set of false discoveries. Some natural error and performance metrics are the false discovery rate (FDR), modified FDR (mFDR) and power, defined as

FDR(T) ≡ E[ |H0 ∩ R(T)| / (|R(T)| ∨ 1) ],  mFDR(T) ≡ E[|H0 ∩ R(T)|] / E[|R(T)| ∨ 1],  power ≡ E[ |H0^c ∩ R(T)| / |H0^c| ].  (1)

The typical aim is to maximize power while keeping FDR(T) ≤ α at every time T ∈ N, for some prespecified constant α ∈ (0, 1). It is well known that setting every αt ≡ α does not provide any control of the FDR in general; indeed, the FDR can be as large as one in this case (see [1, Section 1] for an example). This motivates the need for special methods for online FDR control (that is, for determining αt in an online manner).

Past work. Foster and Stine [2] proposed the first "alpha-investing" (AI) algorithm for online FDR control, which was later extended to the generalized alpha-investing (GAI) methods by Aharoni and Rosset [3]. A particularly powerful GAI algorithm called LORD was proposed by Javanmard and Montanari [4]. Soon after, Ramdas et al. [1] proposed a modification called LORD++ that uniformly improved the power of LORD. Most recently, Ramdas et al. [5] developed the "adaptive" SAFFRON algorithm, and alpha-investing was shown to be a special case of the more general SAFFRON framework. SAFFRON arguably represents the state of the art, achieving significant power gains over all other algorithms, including LORD++, in a range of experiments.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

However, an important point is that SAFFRON is more powerful only when the p-values are exactly uniformly distributed under the null hypothesis. In practice, one frequently encounters conservative nulls (see below), and in this case SAFFRON can have lower power than LORD++ (see Figure 1).

Uniformly conservative nulls. When performing hypothesis testing, we always assume that the p-value P is valid, which means that if the null hypothesis is true, we have Pr{P ≤ x} ≤ x for all x ∈ [0, 1].
Ideally, a p-value is exactly uniformly distributed, which means that the inequality holds with equality. We say a null p-value is conservative if the inequality is strict, and often the nulls are uniformly conservative, which means that under the null hypothesis, we have

Pr{P/τ ≤ x | P ≤ τ} ≤ x  for all x, τ ∈ (0, 1).  (2)

As an obvious first example, a p-value that is exactly uniform (the ideal setting) is a special case. Indeed, for a uniform U ∼ U[0, 1], if you know that U is less than (say) τ = 0.4, then the conditional distribution of U is just U[0, 0.4], which means that U/0.4 has a uniform distribution on [0, 1], and hence Pr{U/0.4 ≤ x | U ≤ 0.4} ≤ x for any x ∈ (0, 1). A mathematically equivalent definition of uniformly conservative nulls is that the CDF F of a null p-value P satisfies the following property:

F(τx) ≤ xF(τ)  for all 0 ≤ x, τ ≤ 1.  (3)

Hence, any null p-value with a convex CDF is uniformly conservative. In particular, when F is differentiable, the convexity of F is equivalent to its density f being monotonically increasing. Here are two tangible examples of tests with uniformly conservative nulls:

• A test of a Gaussian mean: we test the null hypothesis H0 : μ ≤ 0 against the alternative H1 : μ > 0; the observation is Z ∼ N(μ, 1) and the p-value is computed as P = Φ(−Z), where Φ is the standard Gaussian CDF.

• A test of a Gaussian variance: we observe Z ∼ N(0, σ) and wish to test the null hypothesis H0 : σ ≤ 1 against the alternative H1 : σ > 1; the p-value is P = 2Φ(−|Z|).

It is easy to verify that if the true μ in the first test is strictly smaller than zero, or the true σ in the second test is strictly smaller than one, then the corresponding null p-values have monotonically increasing density, and are thus uniformly conservative. More generally, Zhao et al. [6] presented the following sufficient condition for a one-dimensional exponential family with true parameter θ: when the true θ is strictly smaller than θ0, the uniformly most powerful (UMP) test of H0 : θ ≤ θ0 versus H1 : θ > θ0 is uniformly conservative. Since the true underlying state of nature is rarely exactly at the boundary of the null set (like μ = 0 or σ = 1 or θ = θ0 in the above examples), it is common in practice to encounter uniformly conservative nulls. In the context of A/B testing, this corresponds to testing H0 : μB ≤ μA against H1 : μB > μA when, in reality, B (the new idea) is strictly worse than A (the existing system), a very likely scenario.

Our contribution. The main contribution of this paper is a new method called ADDIS (an ADaptive algorithm that DIScards conservative nulls), which compensates for the power loss of SAFFRON with conservative nulls.
ADDIS is based on a new serial estimate of the false discovery proportion,\nhaving adaptivity to both fraction of nulls (like SAFFRON) and the conservativeness of nulls (unlike\nSAFFRON). As shown in Figure 1, ADDIS enjoys appreciable power increase over SAFFRON as\nwell as LORD++ under settings with many conservative nulls, and rarely loses power when the nulls\nare exactly uniformly distributed (not conservative). Our work is motivated by recent work by Zhao\net al. [6] and Ellis et al. [7] who study nonadaptive of\ufb02ine multiple testing problems with conservative\nnulls, and ADDIS can be regarded as extending their work to both online and adaptive settings. The\nconnection to the of\ufb02ine setting is that ADDIS effectively employs a \u201cdiscarding\u201d rule, which states\nwe should discard (that is, not test) a hypothesis with p-value exceeding certain threshold. Beyond\nthe online setting, we also incorporate this rule into several other existing FDR methods, and formally\nprove that the resulting new methods still control the FDR, while demonstrating numerically they\nhave a consistent power advantage over the original methods. Figure 2 presents the relational chart of\n\n2\n\n\fhistorical FDR control methods together with some of the new methods we proposed. As far as we\nknow, we provide the \ufb01rst method that adapts to the conservativeness of nulls in the online setting.\n\nFigure 1: Statistical power and FDR versus fraction of non-null hypotheses \u03c0A for ADDIS, SAFFRON\nand LORD++ at target FDR level \u03b1 = 0.05 (solid black line). The curves above 0.05 line display\nthe power of each methods versus \u03c0A, while the lines below 0.05 display the FDR of each methods\nversus \u03c0A. The experimental setting is described in Section 3: we set \u00b5A = 3 for both \ufb01gures, but\n\u00b5N = \u22121 for the left \ufb01gure and \u00b5N = 0 for the right \ufb01gure (hence the left nulls are conservative,\nthe right nulls are not). 
These figures show that (a) all considered methods do control the FDR at level 0.05; (b) SAFFRON sometimes loses its superiority over its nonadaptive variant LORD++ with conservative nulls (i.e. μN < 0); and (c) ADDIS is more powerful than SAFFRON and LORD++ with conservative nulls, while losing almost nothing under settings with uniform nulls (i.e. μN = 0).

[Figure 2 is a relational chart. Offline methods: BH [8] -(adaptivity)-> Storey-BH [9] -(discarding)-> D-StBH (Section S-2). Online methods: LORD [4] / LORD++ [1] -(adaptivity)-> SAFFRON [5] -(discarding)-> ADDIS (Section 2), with Alpha-Investing [2] a special case of SAFFRON. Each offline method connects to its online analog: BH to LORD/LORD++, Storey-BH to SAFFRON, and D-StBH to ADDIS.]

Figure 2: Historical context: ADDIS generalizes SAFFRON, which generalizes Alpha-Investing and LORD++. Analogously, D-StBH (supplement) generalizes Storey-BH, which generalizes BH.

Paper outline. In Section 2, we derive the ADDIS algorithm and state its guarantees (FDR and mFDR control), deferring proofs to the supplement. Specifically, in Section 2.4, we discuss how to choose the hyperparameters in ADDIS to balance adaptivity and discarding for optimal power. Section 3 presents simulations which demonstrate the advantage of ADDIS over non-discarding or non-adaptive methods. We then generalize the "discarding" rule of ADDIS in Section 4 and use it to obtain "discarding" versions of many other methods under various settings. We also establish error control, with formal proofs for those variants, in the supplement. Finally, we present a short summary in Section 5. The code to reproduce all figures in the paper is included in the supplement.

2 The ADDIS algorithm

Before deriving the ADDIS algorithm, it is useful to set up some notation. Recall that Pj is the p-value for testing hypothesis Hj.
For three sequences {αt}∞t=1, {λt}∞t=1 and {τt}∞t=1, where each term is in the range [0, 1], define the indicator random variables

Sj = 1{Pj ≤ τj},  Cj = 1{Pj ≤ λj},  Rj = 1{Pj ≤ αj}.

They respectively answer the questions: "was Hj selected for testing? (or was it discarded?)", "was Hj a candidate for rejection?" and "was Hj rejected, yielding a discovery?". We call the sets

S(t) = {j ∈ [t] : Sj = 1},  C(t) = {j ∈ [t] : Cj = 1},  R(t) = {j ∈ [t] : Rj = 1}

the "selected (not discarded) set", "candidate set" and "rejection set" after t steps, respectively. Similarly, we define R1:t = {R1, . . . , Rt}, C1:t = {C1, . . . , Ct} and S1:t = {S1, . . . , St}. In this section and the next, we repeatedly encounter the filtration

F^t := σ(R1:t, C1:t, S1:t).

We insist that αt, λt and τt are predictable, that is, measurable with respect to F^{t−1}. This means that αt, λt, τt are really mappings from {R1:t−1, C1:t−1, S1:t−1} to [0, 1]. The presentation is cleanest if we assume that the p-values from the different hypotheses are independent (which would be the case if each A/B test was based on fresh data, for example). However, we can also prove mFDR control under a mild form of dependence: we call the null p-values conditionally uniformly conservative if for any t ∈ H0, we have

∀x, τ ∈ (0, 1),  Pr{Pt/τ ≤ x | Pt ≤ τ, F^{t−1}} ≤ x.  (4)

Note that the above condition is equivalent to the (marginally) uniformly conservative property (2) if the p-values are independent, and hence Pt is independent of F^{t−1}.
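As a quick numerical sanity check of the uniformly conservative property (2), the sketch below (our illustration, not code from the paper) simulates the Gaussian-mean test from the introduction with a hypothetical true mean μ = −1, strictly inside the null, and estimates the conditional probability by Monte Carlo:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo check of Pr{P/tau <= x | P <= tau} <= x for the
# Gaussian-mean test. Hypothetical choice: true mu = -1 < 0, so the
# null p-value P = Phi(-Z) is stochastically larger than uniform.
rng = np.random.default_rng(0)
z = rng.normal(-1.0, 1.0, size=1_000_000)
p = norm.cdf(-z)

tau, x = 0.4, 0.5
kept = p[p <= tau]                               # condition on {P <= tau}
cond_prob = kept[kept <= x * tau].size / kept.size
print(cond_prob)                                 # clearly below x = 0.5
```

For this conservative null the estimate lands around 0.31, well below x = 0.5; replacing μ = −1 by μ = 0 makes the conditional probability hug x, matching the uniform special case discussed earlier.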
For simplicity, we will refer to this "conditionally uniformly conservative" property simply as "uniformly conservative".

2.1 Deriving the ADDIS algorithm

Denote the (unknown) false discovery proportion by FDP(T) ≡ |H0 ∩ R(T)| / (|R(T)| ∨ 1). As mentioned in [5], one can control the FDR at any time t by instead controlling an oracle estimate of the FDP, given by

FDP∗(t) := ( Σ_{j≤t, j∈H0} αj ) / (|R(t)| ∨ 1).  (5)

This means that if we can keep FDP∗(t) ≤ α at all times t, then we can prove that FDR(t) ≤ α at all times t. Since the set of nulls H0 is unknown, LORD++ [1] is based on a simple upper bound of FDP∗(t), denoted FDP̂_LORD++(t), while SAFFRON [5] is based on a more nuanced adaptive bound on FDP∗(t), denoted FDP̂_SAFFRON(t), obtained by choosing a predictable sequence {λj}∞j=1; here

FDP̂_LORD++(t) := ( Σ_{j≤t} αj ) / (|R(t)| ∨ 1),  FDP̂_SAFFRON(t) := ( Σ_{j≤t} αj · 1{Pj > λj}/(1 − λj) ) / (|R(t)| ∨ 1).  (6)

It is easy to fix α1 < α, and then update α2, α3, . . . in an online fashion to maintain the invariant FDP̂_LORD++(t) ≤ α at all times, which the authors prove suffices for FDR control; similarly, it is proved that keeping FDP̂_SAFFRON(t) ≤ α at all times suffices for FDR control at any time. However, we expect FDP̂_SAFFRON(t) to be closer1 to FDP∗(t) than FDP̂_LORD++(t), and since SAFFRON better uses its FDR budget, it is usually more powerful than LORD++.
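To make the estimators in (6) concrete, here is a small sketch (our illustration; the function names and inputs are hypothetical) that computes both quantities from the first t p-values, their levels αj, and a constant λ:

```python
import numpy as np

def fdp_hat_lord(alphas, pvals):
    """FDP-hat_LORD++(t): sum of all alpha_j over (# rejections v 1)."""
    n_rej = np.sum(pvals <= alphas)          # R_j = 1{P_j <= alpha_j}
    return np.sum(alphas) / max(n_rej, 1)

def fdp_hat_saffron(alphas, pvals, lam=0.5):
    """FDP-hat_SAFFRON(t) with a constant candidate threshold lambda."""
    n_rej = np.sum(pvals <= alphas)
    numer = np.sum(alphas * (pvals > lam)) / (1 - lam)
    return numer / max(n_rej, 1)

alphas = np.full(5, 0.01)                    # hypothetical test levels
pvals = np.array([0.001, 0.8, 0.3, 0.02, 0.9])
print(fdp_hat_lord(alphas, pvals))           # 0.05 / 1 rejection
print(fdp_hat_saffron(alphas, pvals))        # (2 * 0.01 / 0.5) / 1 rejection
```

Note how only the p-values exceeding λ contribute αj/(1 − λ) each to the SAFFRON numerator; with conservative nulls these contributions over-count, which is exactly the issue discussed next.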
SAFFRON is called an \u201cadaptive\u201d\nalgorithm, because it is the online analog of the Storey-BH procedure [9], which adapts to the\nproportion of nulls in the of\ufb02ine setting.\nHowever, in the case when there are many conservative null p-values (whose distribution is stochasti-\ncally larger than uniform), many terms in { 1{\u03bbj \u03b8j}\n|R(t)| \u2228 1\n\n\u03c4j (1\u2212\u03b8j )\n\nthe idea that the numerator of (cid:100)FDPADDIS(t) is a much tighter estimator of(cid:80)\nwith that of (cid:100)FDPSAFFRON(t). In order to see why this is true, we provide the following lemma.\n\nWith many conservative nulls, the claim that ADDIS is more powerful than SAFFRON, is based on\n\u03b1j, compared\n\nLemma 1. If a null p-value P has a differentiable convex CDF, then for any constants a, b \u2208 (0, 1),\nwe have\n\nj\u2264t,j\u2208H0\n\nb(1 \u2212 a)\n\n(8)\n1To see this intuitively, consider the case when (a) \u03bbj \u2261 1/2 for all j, (b) there is a signi\ufb01cant fraction\nof non-nulls, and the non-null p-values are all smaller than 1/2 (strong signal), and (c) the null p-values are\nexactly uniformly distributed. Then, 1{1/2 a}\n(1 \u2212 a)\n= E(cid:104)(cid:80)\n\n(cid:105)\n\nPr{ab < P \u2264 b}\n\n(cid:80)\n\n(cid:100)FDPADDIS(t) : =\n\n(cid:80)\n\n4\n\n\ffor t = 1, 2, . . . do\n\nReject the t-th null hypothesis if Pt \u2264 \u03b1t, where \u03b1t : = min{\u03bb,(cid:98)\u03b1t}, and\n(cid:16)\n(cid:98)\u03b1t : = (\u03c4 \u2212 \u03bb)\nHere, St =(cid:80)\ni \u03bbt \u2265 \u03b1t for all t, which is needed for correctness of the proof\nof FDR control. This is not a major restriction since we often choose \u03b1 = 0.05, and the algorithms set\n\u03b1t \u2264 \u03b1, in which case \u03c4t > \u03bbt \u2265 0.05 easily satis\ufb01es the needed constraint. Now, the main nontrivial\nquestion is how to ensure the invariant in a fully online fashion. 
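For intuition, the invariant can always be maintained by a deliberately naive greedy rule (a sketch of ours with constant λ and τ; this is not the ADDIS∗ algorithm): before seeing Pt, pick the largest level αt ≤ λ that keeps the discarding estimator below α even in the worst case where Pt falls in (λ, τ] without producing a rejection.

```python
import numpy as np

def greedy_addis(pvals, alpha=0.05, lam=0.25, tau=0.5):
    """Naive online rule keeping the ADDIS-style estimator <= alpha.

    Illustrative only: it spends all available budget immediately,
    whereas ADDIS* spreads wealth over time via a sequence {gamma_j}.
    """
    numer = 0.0    # running sum of alpha_j * 1{lam < P_j <= tau}/(tau - lam)
    n_rej = 0
    alphas, rejects = [], []
    for p in pvals:
        # Largest alpha_t with numer + alpha_t/(tau - lam) <= alpha*(n_rej v 1),
        # capped at lam (the framework requires tau > lam >= alpha_t).
        budget = alpha * max(n_rej, 1) - numer
        a_t = min(lam, max(budget * (tau - lam), 0.0))
        alphas.append(a_t)
        rejects.append(p <= a_t)
        n_rej += p <= a_t
        if lam < p <= tau:        # neither discarded nor a candidate
            numer += a_t / (tau - lam)
    return np.array(alphas), np.array(rejects)

a, r = greedy_addis(np.array([0.001, 0.3, 0.6, 0.2]))
print(a, r)
```

Discarded p-values (those above τ) leave the estimator untouched, illustrating the "discarding" behavior; the cost of greediness is that the level can collapse to zero until the next rejection replenishes the budget.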
We address this by providing an\nexplicit instance of ADDIS algorithm, called ADDIS\u2217 (Algorithm 1), in the following section. From\n\nthe form of the invariant (cid:100)FDPADDIS(t) \u2264 \u03b1, we observe that any p-value Pj that is bigger than \u03c4j has\n\nno in\ufb02uence on the invariant, as if it never existed in the sequence at all. This reveals that ADDIS\neffectively implements a \u201cdiscarding\" rule: it discards p-values exceeded a certain threshold. If the\np-value is not discarded, then Pj/\u03c4j is a valid p-value and we resort to using adaptivity like (6).\n2.2 ADDIS\u2217: an instance of ADDIS algorithm using constant \u03bb and \u03c4\nHere we present an instance of ADDIS algorithm, with choice of \u03bbj \u2261 \u03bb and \u03c4j \u2261 \u03c4 for all j. (We\nconsider constant \u03bb and \u03c4 for simplicity, but these can be replaced by \u03bbj and \u03c4j at time j.)\nAlgorithm 1: The ADDIS\u2217 algorithm\nInput: FDR level \u03b1, discarding threshold \u03c4 \u2208 (0, 1], candidate threshold \u03bb \u2208 [0, \u03c4 ), sequence\nj=0 which is nonnegative, nonincreasing and sums to one, initial wealth W0 \u2264 \u03b1.\n\n{\u03b3j}\u221e\n\nend\n\nIn Section S-5.2, we verify that \u03b1t is a monotonic function of the past2. In Section S-10, we present\nAlgorithm S-3, which is an equivalent version of the above ADDIS\u2217 algorithm, but it explicitly\ndiscards p-values larger than \u03c4, thus justifying our use of the term \u201cdiscarding\u201d throughout this paper.\n\nNote that if we choose \u03bb \u2265 \u03b1, then the constraint \u03b1t : = min{\u03bb,(cid:98)\u03b1t} is vacuous and reduces to\n\u03b1t : =(cid:98)\u03b1t, because(cid:98)\u03b1t \u2264 \u03b1 by construction. The power of ADDIS varies with \u03bb and \u03c4, as discussed\n\nfurther in Section 2.4.\n\n2.3 Error control of ADDIS algorithm\n\nHere we present error control guarantees for ADDIS, and defer proofs to Section S-5 and Section S-6.\nTheorem 1. 
If the null p-values are uniformly conservative (4), and suppose we choose αj, λj and τj such that τj > λj ≥ αj for each j ∈ N, then we have:

(a) any algorithm with FDP̂_ADDIS(t) ≤ α for all t ∈ N also enjoys mFDR(t) ≤ α for all t ∈ N.

If we additionally assume that the null p-values are independent of each other and of the non-nulls, and always choose αt, λt and 1 − τt to be monotonic functions of the past for all t, then we additionally have:

(b) any algorithm with FDP̂_ADDIS(t) ≤ α for all t ∈ N also enjoys FDR(t) ≤ α for all t ∈ N.

As an immediate corollary, any ADDIS algorithm enjoys mFDR control, and ADDIS∗ (Algorithm 1) additionally enjoys FDR control since it is a monotonic rule.

2We say that a function ft(R1:t−1, C1:t−1, S1:t−1) : {0, 1}^{3(t−1)} → [0, 1] is a monotonic function of the past if ft is coordinatewise nondecreasing in Ri and Ci, and coordinatewise nonincreasing in Si. This is a generalization of the monotonicity of SAFFRON [5], which is recovered by setting Si = 1 for all i, that is, we never discard any p-value.

The above result holds only for nonrandom times. Below, we also show that any ADDIS algorithm controls mFDR at any stopping time with finite expectation.

Theorem 2. Assume that the null p-values are uniformly conservative, and that minj{τj − λj} > ε for some ε > 0.
Then, for any stopping time Tstop with \ufb01nite expectation, any algorithm that maintains\n\nthe invariant (cid:100)FDPADDIS(t) \u2264 \u03b1 for all t enjoys mFDR(Tstop) \u2264 \u03b1.\n\nOnce more, the conditions for the theorem are not restrictive because the sequences {\u03bbj}\u221e\n{\u03c4j}\u221e\n\nj=1 and\nj=1 are user-chosen, and \u03bbj = 1/4, \u03c4j = 1/2 is a reasonable default choice, as we justify next.\n\n2.4 Choosing \u03c4 and \u03bb to balance adaptivity and discarding\nAs we mentioned before, the power of our ADDIS\u2217 algorithm is closely related to the hyper-\nparameters \u03bb and \u03c4. In fact, there is also an interaction between the hyper-parameters \u03bb and \u03c4, which\nmeans that one cannot decouple the effect of each on power. One can see this interaction clearly\nin Figure 3 which displays a trade off between adaptivity (\u03bb) and discarding (\u03c4). Indeed, the right\nsub-\ufb01gure displays a \u201csweet spot\u201d for choosing \u03bb, \u03c4, which should neither be too large nor too small.\nIdeally, one would hope that there exists some universally optimal choice of \u03bb, \u03c4 that yields maximum\npower. Unfortunately, the relationship between power and these parameters changes with the\nunderlying distribution of the null and alternate p-values, as well as their relative frequency. Therefore,\nbelow, we only provide a heuristic argument about how to tune these parameters for ADDIS\u2217.\n\nRecall that the ADDIS\u2217 algorithm is derived by tracking the empirical estimator (cid:100)FDPADDIS (7) with\n\ufb01xed \u03bb and \u03c4, and keeping it bounded by \u03b1 over time. Since (cid:100)FDPADDIS serves as an estimate of\n(cid:100)FDPADDIS. One simple way to choose \u03bb and \u03c4 is to minimize the expectation of the indicator term in\n\nthe oracle FDP\u2217 (5), it is natural to expect higher power with a more re\ufb01ned (i.e. tighter) estimator\n\nthe estimator. 
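Since the expectation of that indicator term is just (F(τ) − F(λ))/(τ − λ) when every p-value has CDF F, the tuning heuristic reduces to a small grid search. The sketch below (our illustration, with hypothetical Gaussian-mixture parameters μN = −1, μA = 3, πA = 0.2 mimicking Section 3) evaluates it numerically:

```python
import numpy as np
from scipy.stats import norm

def pvalue_cdf(x, mu_null=-1.0, mu_alt=3.0, pi_alt=0.2):
    """CDF of P = Phi(-Z) when Z is a two-component Gaussian mixture.

    {P <= x} is the event {Z >= -Phi^{-1}(x)}.
    """
    z = -norm.ppf(x)
    return (1 - pi_alt) * norm.sf(z - mu_null) + pi_alt * norm.sf(z - mu_alt)

def criterion(lam, tau):
    # Expected indicator term: the average density of P on (lam, tau].
    return (pvalue_cdf(tau) - pvalue_cdf(lam)) / (tau - lam)

grid = np.round(np.arange(0.05, 1.0, 0.05), 2)
lam_star, tau_star = min(
    ((l, t) for t in grid for l in grid if l < t),
    key=lambda lt: criterion(*lt),
)
print(lam_star, tau_star, criterion(lam_star, tau_star))
```

Because the mixture pushes p-value mass toward both endpoints, the minimizing interval sits over the density trough in the middle, consistent with the "sweet spot" seen in Figure 3.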
Speci\ufb01cally, if the CDF of all p-values is F , then an oracle would choose \u03bb, \u03c4 as\n\n(\u03bb\u2217, \u03c4\u2217) \u2208 arg min\n\n\u03bb<\u03c4\u2208(0,1)\n\nF (\u03c4 ) \u2212 F (\u03bb)\n\n\u03c4 \u2212 \u03bb\n\n.\n\n(9)\n\nIn order to remove the constraints between the two variables, we again de\ufb01ne \u03b8 = \u03bb/\u03c4, then the\noptimization problem (9) is equivalent to\n(\u03b8\u2217, \u03c4\u2217) \u2208 arg min\n\u03b8,\u03c4\u2208(0,1)\n\n\u2261 (g \u25e6 F )(\u03b8, \u03c4 ).\n\nF (\u03c4 ) \u2212 F (\u03b8\u03c4 )\n\n\u03c4 (1 \u2212 \u03b8)\n\n(10)\n\nWe provide some empirical evidence to show the quality of the above proposal. The left sub\ufb01gure\nin Figure 3 shows the heatmap of (g \u25e6 F ) and the right one shows the empirical power of ADDIS\u2217\nwith p-values generate from F versus different \u03b8 and \u03c4 (the left is simply evaluating a function, the\nright requires repeated simulation). The same pattern is consistent across other reasonable choices\nof F , as shown in Section S-11. We can see that the two sub\ufb01gures in Figure 3 show basically the\nsame pattern, with similar optimal choices of parameters \u03b8 and \u03c4. Therefore, we suggest choosing \u03bb\nand \u03c4 as de\ufb01ned in (9), if prior knowledge of F is available; otherwise it seems like \u03b8 \u2208 [0.25, 0.75]\nand \u03c4 \u2208 [0.15, 0.55] are safe choices, and for simplicity we use \u03c4 = \u03b8 = 0.5 as defaults, that is\n\u03c4 = 0.5, \u03bb = 0.25, in similar experimental settings. We leave the study of time-varying \u03bbj and \u03c4j as\nfuture work.\n\nFigure 3: The left \ufb01gure shows the heatmap of function g \u25e6 F , where F is the CDF of p-values drawn\nas described in Section 3 with \u00b5N = \u22121, \u00b5A = 3, \u03c0A = 0.2. The right \ufb01gure is the empirical power\nof ADDIS\u2217 versus different choice of \u03b8 and \u03c4, with p-values drawn from F . 
The \ufb01gures are basically\nof the same pattern, with similar optimal values of \u03b8 and \u03c4.\n\n6\n\n0.050.150.250.350.450.550.650.750.850.95\u03b80.950.850.750.650.550.450.350.250.150.05\u03c40.40.81.21.62.0g\u25e6F0.050.150.250.350.450.550.650.750.850.95\u03b80.950.850.750.650.550.450.350.250.150.05\u03c40.540.600.660.720.78power\f3 Numerical experiments\n\n\u221a\n\n(j+1)e\n\n1\n\nIn this section, we numerically compare the performance of ADDIS against the previous state-of-\nthe-art algorithm SAFFRON [5], and other well-studied algorithms like LORD++ [4], LOND [10]\nand Alpha-investing [2]. Speci\ufb01cally, we use ADDIS\u2217 de\ufb01ned in Algorithm 1 as the representative\nof our ADDIS algorithm. Though as discussed in Section 2.4, there is no universally optimal\nconstants, given the minimal nature of our assumptions, we will use some reasonable default choices\nin the numerical studies to have a glance at the advantage of ADDIS algorithm. The constants\n\u03bb = 0.25, \u03c4 = 0.5 and sequence {\u03b3j}\u221e\nj=0 with \u03b3j \u221d 1/(j + 1)\u22121.6 were found to be particularly\nsuccessful, thus are our default choices for hyperparameters in ADDIS\u2217. We choose the in\ufb01nite\nconstant sequence \u03b3j \u221d\n(j+1)1.6 , and \u03bb = 0.5 for SAFFRON, which yielded its best performance.\nWe use \u03b3j \u221d log ((j+1)\u22272)\nlog (j+1) for LORD++ and LOND, which is shown to maximize its power in the\nGaussian setting [4]. The proportionality constant of {\u03b3j}\u221e\ni=0 is determined so that the sequence\n{\u03b3j}\u221e\nWe consider the standard experimental setup of testing Gaussian means, with M = 1000 hypotheses.\nMore precisely, for each index i \u2208 {1, 2, . . . , M}, the null hypotheses take the form Hi : \u00b5i \u2264 0,\nwhich are being tested against the alternative HiA : \u00b5i > 0. 
The observations are independent\nGaussians Zi \u223c N (\u00b5i, 1), where \u00b5i \u2261 \u00b5N \u2264 0 with probability 1 \u2212 \u03c0A and \u00b5i \u2261 \u00b5A > 0\nwith probability \u03c0A. The one-sided p-values are computed as Pi = \u03a6(\u2212Zi), which are uniformly\nconservative if \u00b5N < 0 as discussed in the introduction (and the lower \u00b5N is, the more conservative\nthe p-value). In the rest of this section, for each algorithm, we use target FDR \u03b1 = 0.05 and estimate\nthe empirical FDR and power by averaging over 200 independent trials. Figure 4 shows that ADDIS\nhas higher power than all other algorithms when the nulls are conservative (i.e. \u00b5N < 0), and ADDIS\nmatches the power of SAFFRON without conservative nulls (i.e. \u00b5N = 0).\n\ni=0 sums to one.\n\n(c)\n\n(a)\n\n(d)\n\n(b)\n\n(e)\n\nFigure 4: Statistical power and FDR versus fraction of non-null hypotheses \u03c0A for ADDIS, SAF-\nFRON, LORD++, LOND, and Alpha-investing at target FDR level \u03b1 = 0.05 (solid black line). The\nlines above the solid black line are the power of each methods versus \u03c0A, and the lines below are the\nFDR of each methods versus \u03c0A. The p-values are drawn using the Gaussian model as described\nin the text, while we set \u00b5N = \u22120.5 in plot (a), \u00b5N = \u22121 in plot (b), \u00b5N = \u22121.5 in plot (c), and\n\u00b5N = 0 in plots (d) and (e). And we set \u00b5A = 3 in plots (a, b, c, d), \u00b5A = 4 in plot (e). 
These plots\nshow that (1) FDR is under control for all methods in all settings; (2) ADDIS enjoys appreciable\npower increase as compared to all the other four methods; (3) the more conservative the nulls are\n(the more negative \u00b5N is), the more signi\ufb01cant the power increase of ADDIS is; (4) ADDIS matches\nSAFFRON and remains the best in the setting with uniform (not conservative) nulls.\n\n7\n\n0.20.40.60.8\u03c0A0.050.250.50.751.0FDR / Power0.20.40.60.8\u03c0A0.050.250.50.751.0FDR / Power0.20.40.60.8\u03c0A0.050.250.50.751.0FDR / Power0.20.40.60.8\u03c0A0.050.250.50.751.0FDR / Power0.20.40.60.8\u03c0A0.050.250.50.751.0FDR / Power0.040.020.000.020.040.040.020.000.020.04ADDISSAFFRONLORD++LONDAlpha-investing\f4 Generalization of the discarding rule\n\nAs we discussed before in Section 2, one way to interpret what ADDIS is doing is that it is \u201cdis-\ncarding\u201d the large p-values. We say ADDIS may be regarded as applying the \u201cdiscarding\" rule to\nSAFFRON. Naturally, we would like to see whether the general advantage of this simple rule can be\napplied to other FDR control methods, and under more complex settings. We present the following\ngeneralizations and leave the details (formal setup, proofs) to supplement for interested readers.\n\u2022 Extension 1: non-adaptive methods with discarding\n\nWe derive the discarding version of LORD++ , which we would refer as D-LORD, in Section S-1,\nwith proved FDR control.\n\n\u2022 Extension 2: discarding with asynchronous p-values\n\nIn a recent preprint, Zrnic et al. [11] show how to generalize existing online FDR control methods\nto what they call the asynchronous multiple testing setting. They consider a doubly-sequential\nsetup, where one is running a sequence of sequential experiments, many of which could be running\nin parallel, starting and ending at different times arbitrarily. 
In Section S-3, we show how to unite\nthe discarding rule from this paper with the \u201cprinciple of pessimism\u201d of Zrnic et al. [11] to derive\neven more powerful asynchronous online FDR algorithms, which we would refer as ADDISasync.\n\n\u2022 Extension 3: Of\ufb02ine FDR control with discarding\n\nIn Section S-2, we provide a new of\ufb02ine FDR control method called D-StBH, to show how to\nincorporate the discarding rule with the Storey-BH method, which is a common of\ufb02ine adaptive\ntesting procedure [12, 13]. Note that in the of\ufb02ine setting, the discarding rule is fundamentally the\nsame as the idea of [6, 7], which were only applied to non-adaptive multiple testing.\n\nThe following simulation results in Figure 5, which are plotted in the same format as in Section 3,\nshow that those discarding variants (marked with green color) enjoys the same type of advantage\nover their non-discarding counterparts: they are consistently more powerful under settings with many\nconservative nulls and do not lose much power under settings without conservative nulls.\n\n(a) Extension 1\n\n(b) Extension 2\n\n(c) Extension 3\n\n(d) Extension 1\n\n(e) Extension 2\n\n(f) Extension 3\n\nFigure 5: Statistical power and FDR versus fraction of non-null hypotheses \u03c0A for extended methods\nmentioned above at target FDR level \u03b1 = 0.05 (solid black line). The p-values are drawn using the\nGaussian model as described in the text, while we set \u00b5A = 3 for all the \ufb01gures, but \u00b5N = \u22121\nin plots (a, b, c), \u00b5N = 0 in plots (d, e, f). 
We additionally set the finish time of the j-th test to Ej ∼ j − 1 + Geom(0.5) in plots (b, e), meaning that the duration of each test independently follows a Geometric distribution with success probability 0.5.

5 Conclusion

In this work, we propose a new online FDR control method, ADDIS, to compensate for the unnecessary power loss that current online FDR control methods suffer due to conservative nulls. Numerical studies show that ADDIS is significantly more powerful than the current state of the art in settings with many conservative nulls, and rarely loses power in settings without conservative nulls. We also discuss the trade-off between adaptivity and discarding in ADDIS, together with heuristics for balancing them to obtain higher power. Finally, we generalize the main idea of ADDIS into a simple but powerful "discarding" rule, and combine this rule with many current online FDR control methods in various settings to generate correspondingly more powerful variants. For now, we mainly examine the power advantage of the ADDIS algorithm with constant λ and τ; for future work, how to choose time-varying {λj}∞j=1 and {τj}∞j=1 in a data-adaptive manner with a provable power increase deserves more attention.

References

[1] Aaditya Ramdas, Fanny Yang, Martin Wainwright, and Michael Jordan. Online control of the false discovery rate with decaying memory.
In Advances in Neural Information Processing Systems, pages 5655–5664, 2017.

[2] Dean Foster and Robert Stine. α-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 70(2):429–444, 2008.

[3] Ehud Aharoni and Saharon Rosset. Generalized α-investing: definitions, optimality results and application to public databases. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 76(4):771–794, 2014.

[4] Adel Javanmard and Andrea Montanari. Online rules for control of false discovery rate and false discovery exceedance. The Annals of Statistics, 46(2):526–554, 2018.

[5] Aaditya Ramdas, Tijana Zrnic, Martin Wainwright, and Michael Jordan. SAFFRON: an adaptive algorithm for online control of the false discovery rate. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 4286–4294, 2018.

[6] Qingyuan Zhao, Dylan S. Small, and Weijie Su. Multiple testing when many p-values are uniformly conservative, with application to testing qualitative interaction in educational interventions. Journal of the American Statistical Association, 2018.

[7] Jules L Ellis, Jakub Pecanka, and Jelle Goeman. Gaining power in multiple testing of interval hypotheses via conditionalization. arXiv preprint arXiv:1801.00141, 2017.

[8] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 57(1):289–300, 1995.

[9] John Storey. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 64:479–498, 2002.

[10] Adel Javanmard and Andrea Montanari. On online control of false discovery rate.
arXiv preprint arXiv:1502.06197, 2015.

[11] Tijana Zrnic, Aaditya Ramdas, and Michael Jordan. Asynchronous online testing of multiple hypotheses. arXiv preprint arXiv:1812.05068, 2018.

[12] John Storey, Jonathan Taylor, and David Siegmund. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 66(1):187–205, 2004.

[13] Aaditya K Ramdas, Rina F Barber, Martin J Wainwright, and Michael I Jordan. A unified treatment of multiple testing with prior knowledge using the p-filter. The Annals of Statistics, 47(5):2790–2821, 2019.