NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:8941
Title:Power analysis of knockoff filters for correlated designs

Reviewer 1

II think there might be good stuff in here, but it was pretty abstract for a neurips paper. A more technical journal might be more appropriate. I believe this is the setup. You have a bunch of iid samples of the form - X_i ~ N(0,Sigma) - Y_i ~ N(theta^T X_i , sigma^2) where there are some j for which theta_j=0 and for all other j we have |theta_j|>1. The task is to estimate the set of indices j such that theta_j=0. To solve this problem: 1) you design a way of making knockoffs (in this case I think that amounts to designing a conditional distribution Xtilde |X so that (X,Xtilde) has all the nice knockoff properties) 2) you solve min [(Y - theta1^T X) + lambda |theta1|_1] 3) you solve min [(Y - theta2^T Xtilde) + lambda |theta2|_1] 4) You sort the indices by |theta1_j| - |theta2_j|, and take the ones which are most impressive (with a carefully chosen threshold) Some things that made me nervous: 1) in Definition 1 it says the amount of noise for each Y_i is actually proportional to the number of samples, n. It is written N_i ~ N(0, nsigma^2), which I don't understand at all. Why would that be? 2) it is never explicitly stated in definition 1 that X_i ~ N(0,Sigma), but I assume this is the case? 3) Any nontrivial example of how to design gaussian knockoffs would be nice. For example, say X is a tree graphical model. How might I make my knockoffs? Apparently "conditional independence knockoff" gives me a way, but I didn't understand how you actually design the distribution. 4) you call your regression estimates "lasso coefficients." That tells me that you must have fit the regression using lasso. But you didn't actually say that, so I was nervous that I might be confused. 5) I can't actually tell whether maybe you fit both lasso's simulatenously, i.e. you fit min [(Y - theta1^T X + theta2^T Xtilde) + lambda |theta1|_1 + lambda |theta2|_1]. 6) I conclude that the nonzero theta_j satisfy |theta_j|>1 because that assumption is made in proposition 4, but I'm not actually sure that it is made throughout the paper. But assuming I got everything right, you then go on to show conditions under which your discovery rate was high. Unfortunately, I couldn't make heads or tails of those conditions. I can read them and understand their literal meaning, but I have no intuition at all. The marginal variances Sigma_{jj} seems to have some very important role, I think I want them small. It almost seems like I need sum_j Sigma_jj to be bounded in the limit as p->infinity with n/p>1 -- but that seems like an absurd assumption, since it would imply that most of the variables are essentially deterministic?

Reviewer 2

Significance: Recently knockoffs have been proposed to control FDR in multiple testing problems. Alot of the literature in this area deals with designing procedures with guaranteed FDR control. However, the power analysis of these procedures is not well understood. This paper tackles the important problem of analyzing the power of knockoff procedures under an idealized setup with correlated gaussian features. Originality: The paper leverages previous AMP-based analysis of the LASSO estimator to derive their theoretical results. Existing work leveraged this analysis to study the power of knockoff procedures with i.i.d. Gaussian design, this work considers correlated gaussian design. Clarity: The work is reasonably well written, but there are still several typos. Detailed comments: Proposition 2,4: Proposition 2,4 provide a tight characterization of the Information Theoretic limit of achieving FDR=0 and POWER = 1 asymptotically. Can the author clarify how is the task of achieving asymptotically FDR = 0 and POWER = 1 different from support recovery? Are any of these results implied by existing results regarding support recovery? Lemma 5: From my understanding, this is only a sufficient condition to guarantee FDR = 0 and POWER = 1 asymptotically. This is a limitation of this work since the comparison between different knockoff procedures is done based on this sufficient condition, and it is not clear that this sufficient condition is tight. Simulations: The authors use simulations to demonstrate that comparision between theoretically derived sufficient condition (ESD) reflects the performance of the knockoff procedures in simulations. In particular the ESD condition predicts the failure of equi-knockoff procedure correctly. However the performance difference between conditional knockoff and sdp-knockoff don't seem to be significant and so these simulations don't really tell us anything about the power of the ESD condition to compare these two techniques. It would be great if the authors could find setups where these two knockoff methods have significantly different performance and compare the ESD in these setups.

Reviewer 3

Originality: Although the result was derived under Gaussian assumption, it is interesting and includes the original result; the authors give the notion of ESD that tends to zero implies type II error (1-power) goes to zero, and show that the power is closely related to the empirical distribution of the precision matrix of regressors. Quality: The theoretical result is constructed using Javanmard and Montanari (2014, IEEE-T-IT) and looks fine. Clarity: The paper focuses mainly on the theory, but it seems better to add some comments on the connection to practical viewpoints. For example, can many real data satisfy the conditions of Proposition 4? Significance: It is significant as there has been no such formal theoretical analysis on power in knockoffs inference. It is also important for empirical researchers since it is used for selecting a knockoff that yields higher power.