NeurIPS 2020

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Review 1

Summary and Contributions: In this paper, the authors propose theoretical and empirical results of backdoor attacks on federated learning. Furthermore, a new family of backdoor attacks called edge-case dackdoors is proposed.

Strengths: The theoretical analysis shows the existence of backdoor attacks on federated learning, and the difficulty of detection, supported by empirical results. A new family of backdoor attacks called edge-case dackdoors is proposed. Empirical results show the effectiveness of the new attacks.

Weaknesses: The baselines are limited to Krum and RFA. Most of the figures, especially Figure 2 are too small to read. I suggest the authors to put enlarged figures in the supplementary.

Correctness: The paper is technically sound.

Clarity: The paper is well written and easy to follow. This is a minor issue which won't affect the score. Some references are out-of-date or have some format issue. It will be better to double-check the formats and cite the conference versions instead of the arxiv versions. For example, [19] shoudl be "Byzantine generals" instead of "byzantine generals". [22] has a conference version "SLSGD: Secure and Efficient Distributed On-device Machine Learning" published in ECML-PKDD. [24, 29] are published in ICML. I've just had a glance at them. There might be more.

Relation to Prior Work: The difference and improvement compared to the previous work is clearly discussed.

Reproducibility: Yes

Additional Feedback: I wonder why the authors chooses Krum and RFA as baselines. What about the trimmed mean/median proposed in [59]? I believe coordinate-wise trimmed mean/median has similar error bounds as Krum and RFA, but lower computation complexity. ------------------- after authors' feedback I'm in general satisfied with the paper and authors' feedback. I will simply keep the positive review.

Review 2

Summary and Contributions: - The paper studies backdoor attacks in a federated learning setting. - The paper first proposes three theoretical results: (i) that if a model is vulnerable to adversarial examples, then the vulnerability also extends to backdoor attacks. (ii) that detecting backdoors in a model is an NP-hard problem, when reduced to a satisifiability problem (iii) extending (ii) to show that with high probability, gradient-based techniques cannot find or detect backdoors. - The paper also introduces "edge-case" backdoors by repurposing data in low-density data regions to a particular targeted class. - Evaluation is quite extensive and covers multiple datasets, black- and white-box attacks, and presence of multiple defenses.

Strengths: 1. Implications of edge-case backdoors / Timely problem - I think the findings of the paper present important implications and I appreciate this direction of study. - Specifically, that data in low-probability data regions lend themselves to effective backdoor attacks. This seems realistically possible -- an adversary could contribute malicious updates from (mislabeled) data sampled from an underrepresented group. 2. Experimental Setup - The experimental setup is elaborate and convincing. The authors evaluate across multiple challenging datasets (e.g., ImageNet), on a large-scale (thousands of clients) and with a simulated non-IID distribution (sometimes also realistic like in the case of Reddit).

Weaknesses: === Major Concerns === 1. D_edge - I have some concerns w.r.t the backdoors injected via D_edge (i.e., data from tail end of some distribution). - (i) Novelty: How is this approach different from related backdoor attacks - specifically semantic backdoor attacks e.g., [13] which repurposes cars with racing stripe as the backdoor? - (ii) Constructing a p-edge-case example set (L190-205): If I understand correctly, the authors demonstrate that ARDIS makes for a good edge dataset for MNIST because it lies in the low density data regions (Fig. 2). But, wouldn't essentially any set of data outside from MNIST display similar statistics (e.g., CIFAR, EMNIST) -- possible even adversarially crafted data? But more generally, I find constructing the p-edge-case dataset in the paper loosely defined. - (iii) Setup of D_edge: By default, does only one client in FL have access to the backdoored edge data? Because almost always in the paper, D' is defined as D \cap D_edge. I was expecting it to be mixed only with a particular data partition D_i. Additionally, what is the absolute number of edge examples used in the experiments? - (iv) Evaluation: Related to my concern (i), since there exists a long line of work on generating various types triggers in backdoor attacks (where almost lie outside the high density data regions), I would have appreciated discussion and evaluation to compare the proposed choice of "poisoned" edge-data vs. other methods e.g., BadNets (Gu et al). 2. Proposition 1 and 2 - I appreciate that the authors theoretically demonstrate the challenge in detecting backdoor attacks. - However, I found that this section (L234-241) is simply not self-contained and somewhat difficult to follow without the appendix. I would appreciate if the authors at least mention implications of the propositions in the main text. - Additionally, proposition 2 makes an assumption that the underlying data distribution is uniformly distributed. I wonder how strong is this assumption? 3. Some results are unexpected / unclear - I found some results in the paper somewhat unexpected and would appreciate if the authors clarified these. - (i) Fig. 4: I am surprised with the high-performance on Target task with 50% mislabeled + 50% honestly labeled data. I was expecting it to be much lower. After all, the training data consists of equal number of correct and incorrectly labeled instances. Is this perhaps because the two types of data are distributed among clients in different manners? - (ii) DP defense: I am also surprised with the ineffectiveness of DP-based defense against the attack. Does NDC provide the same user-level DP guarantees as [35]? Because, in this case, I would expect the influence of a particular user's data towards training the global model minimized and demonstrate lower target task accuracy. In addition, given that the main task accuracy remains somewhat consistent in both Fig. 4 and 5d, I wonder whether the DP defense is correctly calibrated (because I was expecting a drop here as well). === Minor Concerns + Nitpicks === 4. Writing - Since KRUM plays a significant role in the evaluation, I would appreciate if the authors added a 1-2 line intuition in the paper. - (Nitpick) A couple of typos: Fig. 1 (a) and (b) are flipped, "specfified" (L285), "verify the hypothesize" (L333). - (Nitpick) Please rename "Task N" to the name of the dataset. It would be much easier to follow the experiments without having to constantly look-up the mapping of N to dataset.

Correctness: Yes, they mostly appear correct.

Clarity: Yes, the paper is well-written. I admire that although the paper studies the problem extensively, it is easy to follow for the most part.

Relation to Prior Work: Somewhat. I do have a concern of how the edge-case dataset varies from related backdoor methods -- elaborated in point #1 under "3. Weaknesses"

Reproducibility: Yes

Additional Feedback: === Post-rebuttal === I read the rebuttal and other reviews. I will keep my positive rating.

Review 3

Summary and Contributions: This paper discusses the problem of backdoor attacks (i.e., adversarial model updating to ensure poor/alternative performance on a specific subset of the data space) in federated learning. The authors first establish a connection between adversarial examples and backdoor attacks, essentially implying that (under suitable assumptions), models that are prone to adversarial corruptions, then the models will be vulnerable to backdoor attacks as well. Next, the authors demonstrate that the backdoor detection problem is NP-Hard via a reduction of the problem from 3SAT, and that gradient-based techniques are unlikely to capture backdoors from edge-case samples.

Strengths: - I find the paper to be very well-written, with clear statements and exposition. - The authors do an admirable job of positioning the paper with relevant related work. - The theoretical contributions (Thm 1 and Props 1&2) are interesting: Thm 1 provides a connection between the backdoor problem and the adversarial perturbation problem and Props 1 & 2 provide an alternative insight into the hardness of the backdoor problem. - The experimental benchmarks seem exhaustive, the authors evaluate on 5 datasets that span computer vision and NLProc, and additionally demonstrate that existing methods do not easily provide defense.

Weaknesses: Regarding Thm 1: XX^T being invertible -- I am not sure if this assumption holds in practice (eigendecomposition of activations of neural networks suggests that they are generally much lower rank, and are highly correlated). Can the authors comment on this, since that makes their upper bound vacuous? Regarding Prop 2: How reasonable are the experimental D_edge sets considered to be actually having "exponentially small measure"? Is there a heuristic that can guide practitioners while creating defenses?

Correctness: The proofs are valid (although I have a question regarding an assumption in Thm 1), and the experimental pipeline follows the design choices and datasets typical in the problem domain.

Clarity: Yes.

Relation to Prior Work: The paper discusses an area of machine learning that is nascent, and the authors do a good job of positioning their work with respect to existing research.

Reproducibility: Yes

Additional Feedback: ===POST REBUTTAL=== I have read the rebuttal, and maintain my rating.