Review for NeurIPS paper: An Efficient Adversarial Attack for Tree Ensembles

NeurIPS 2020

An Efficient Adversarial Attack for Tree Ensembles

Review 1

Summary and Contributions: The authors study the problem of efficient adversarial attack on tree based ensembles such as gradient boosting decision trees. They propose a greedy algorithm to iteratively optimize the adversarial example.

Strengths: The authors provide theoretical guarantees for their work. They compare to baseline methods in their empirical validation.

Weaknesses: The overall writing of the paper can greatly be improved as some parts of the paper is not clear or easy to understand. The experiments are uses a small test sample size so its difficult to judge how efficient the method will be on large data.

Correctness: The authors claim their method is practical and efficient in evaluating robustness on tree ensembles but report results on small data sample size. They are yet to establish how their method will work for larger datasets.

Clarity: No, the paper is not well written and is difficult to follow.

Relation to Prior Work: The authors do compare to prior works but it is not clear how their work differ from some of the prior works. The authors didn't do a good job of explaining the differences.

Reproducibility: Yes

Additional Feedback: Some suggestions to improve the overall organization of the paper and aid understanding: Re-organize the background and related work section to clearly state how your method differs from the different related works. The limitation/similarity of each method to yours. The motivation for the work is not well established or should be better explained. Table 1 and 2 should highlight best performing methods to make it easier to read. Additionally, Figure 2 is too compact and adequate explanation of the performance of the methods in the figure should be provided. ========================================= I acknowledge that I read the rebuttal provided by the authors.

Review 2

Summary and Contributions: This paper transforms the attack problem into a discrete search problem specially designed for tree ensembles, where the goal is to find a valid leaf tuple that leads to mis-classification while having the shortest distance to the original input. Experiments also prove the effectiveness of the method.

Strengths: 1. This paper transforms the attack problem into a discrete search problem specially designed for tree ensembles. 2. The method is faster than many current adversarial attack methods. 3. The method can provide smaller (better) adversarial examples than decision-based black-box attacks on general Lp (p = 1, 2, 1) norm perturbations.

Weaknesses: 1. It seems that this method is only suitable for white box attacks. When the attacked GBDT tree based model is unknown (including the depth and number of trees), this method is obviously not applicable. 2. The paper does not give the experiments on more challenging data sets to verify the method performance on the tree with higher complexity. 3. The comparison methods are based on Python time statistics, while your method is based on C + +, which does not seem to have fair comparison. 4. Lack of ablation experiments and visual samples, which makes the manuscript difficult to understand.

Correctness: The expression of **bounding boxes B(C)** in method is not very clear.

Clarity: There are no obvious grammatical and spelling problems.

Relation to Prior Work: It is clearly discussed the difference from the previous contributions.

Reproducibility: No

Additional Feedback: Can you explain why the points x’ and a in Figure 1 are local minimums? The point b is not in the neighbor of a? Why the searching of adversarial sample will stop in x’ and a?

Review 3

Summary and Contributions: The paper presents an efficient attack algorithm against tree ensembles using a greedy search algorithm. The search cost is cut down significantly by transforming the continuous input space into discrete leaf tuples. Empirical results demonstrate impressive speedup with a tighter perturbation bound compared to the existing attack algorithms.

Strengths: The proposed attack technique works both efficiently and effectively in searching for adversarial samples with minimum perturbation. Extensive empirical studies demonstrate the superiority of the proposed attack technique in comparison to the existing ones.

Weaknesses: The practical significance of minimizing perturbation needs to be better motivated.

Correctness: The claims are strongly supported by the experimental results.

Clarity: Yes, although there's room for improvement. For example, the abuse of symbol \mathcal{C} in the mix with an input point x in "Tuple to Example Distance" is quite disturbing. On page 1, "require large amount of queries" ---> a large number of

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: The conventional wisdom of crafting an adversarial sample x' for x is to ensure x' and x are as close as possible according to a predefined distance measure. In practice, unless x' is distorted so much that it becomes an outlier and may be easily detected, how much does an adversary really care about the amount of distortion unless it is tied directly to the cost? It seems more reasonable to constraint adversarial attacks on their ability to transfer to other learning models or their preservation of malicious utilities.

Review 4

Summary and Contributions: The paper proposes a more efficient method of generating adversarial examples (in the white-box setting) for tree ensembles (random forest, boosted trees, etc.) by exploiting the structure of the ensemble. After selecting a random adversarial example from a pool of initial candidates, their algorithm greedily updates the initially chosen adversarial example until it is "close" to the target input sample (in terms of l_p norm). They do this by performing an iterative search over the transformed input space of valid leaf tuples, and stopping when no more adversarial examples with a smaller l_p norm exist. The paper contributes a new adversarial attack, specifically designed for tree ensembles, that is more efficient than alternative methods while generating adversarial examples similar to exact methods.

Strengths: Tree ensembles are widely used and important models in many domains; this is evident by the consistent publication of new implementations of gradient boosted frameworks such as XGBoost, LightGBM, and CatBoost. The problem of generating quality adversarial examples is important and can lead to more robust models, and thus would be of particular relevance to this community. This work introduces a method which generates adversarial examples close to the exact method while being significantly faster than previous methods, sometimes by several orders of magnitude. The idea of discretizing the input space into a set of valid leaf tuples from which an iterative greedy search is performed appears novel in contrast to previous works. The approach has a solid theoretical grounding and performs a thorough empirical evaluation that includes comparisons to relevant alternative methods.

Weaknesses: The biggest variable to this approach seems to be the size of the neighborhood of possible adversarial examples when updating, given the current adversarial example. The choice of this variable introduces a tradeoff between efficiency and optimality. However, the authors do provide a greedy algorithm for estimating the neighborhood size to obtain optimal results, and they empirically show that a hamming distance of 1 between neighbors generally works well. Since this approach utilizes the structure of the ensemble to create adversarial examples, is it robust to changes in the tree structure? For example, a tree may have multiple possible structures that are functionally equivalent; thus, is this method robust to structural changes and provide adversarial examples that ultimately improve robustness of a tree ensemble trained on a given dataset? In the same vein, have the authors performed any experiments applying their approach to building robust tree ensembles? If so, how do the resulting models compare with other "robustly" trained tree ensembles trained on datasets augmented with adversarial examples generated by alternative methods?

Correctness: The claims appear to be correct and the empirical methodology correct. However, I would like to know if the authors ran experiments on more of the datasets for the random forest experiments, as less than half of the overall number of datasets were used for those experiments.

Clarity: The paper is well written and organized with a clear motivation and problem definition. The arguments of their methodology also follows a logical and intuitive progression.

Relation to Prior Work: The authors do a good job of contrasting their work with alternative methods; one of the biggest differences of this approach is the significant decrease in number of full model queries in comparison to other methods.

Reproducibility: Yes

Additional Feedback: The text in all of the tables and figures are a bit small and hard to read. It can also be a bit difficult to compare each method's metric to one another in each table; perhaps one table for utility and another for efficiency may increase legibility and comprehension. I have read the author responses and they have adequately addressed my concerns, and I will maintain my recommendation of "accept".