Review for NeurIPS paper: Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond

NeurIPS 2020

Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond

Review 1

Summary and Contributions: This paper proposes perturbation analysis on general computational graph structures. The analysis is based on LiPRA algorithms, which analyze the tight lower bound and upper bound. The evaluation on both image feed-forward NN case, and NLP LSTM and Transformer models demonstrate the potentiality of proposed methods.

Strengths: + This paper works on an important problem. + The proposed technique should be a feasible solution. + Important contribution for general graph structure analysis with loss fusion for scalability + The evaluation covers diverse model architecture and tasks.

Weaknesses: - For the image and feed-forward NN case, existing methods such as abstract interpretation and other methods are also applicable, which should be compared for bound tightness analysis and efficiency. - The evaluations are performed on small datasets. But I do not see this to be a severe issue for formal bound analysis.

Correctness: The proposed techniques are sound and properly evaluated, although more comparison with existing methods are highly desiable.

Clarity: The paper is well written, dense, but easy to follow

Relation to Prior Work: Mostly

Reproducibility: Yes

Additional Feedback: Please refer to the comments. Post-rebutal: The authors properly address my concerns in the rebutal phase. This is a solid work and important contribution, which I would raise my score.

Review 2

Summary and Contributions: In this work, the authors create a method for doing linear bounding of general computational graphs. Their primary contribution is the algorithm for establishing and propagating linear bounds through computation graphs that may have irregular structure. This has the advantage of this framework is that it allows for perturbation analysis without hand-coding linear bounds. The authors then use their tool on a variety of tasks and evaluate its performance both with the verifications the tool is able to provide as well as the memory and computational requirements of their methods.

Strengths: I think this work definitely makes a solid contribution to the NeurIPS community as perturbation analysis is an important tool for furthering our theoretical and empirical understanding of the working of deep learning systems. The paper is well presented and has minimal type-os The concept is pretty easy to follow and the algorithm is clean enough to be read and understood. They provide a good covering in terms of tool evaluation. The authors evaluate their tool on "benchmarks" by which I mean present their tools results on datasets and models which have been done before to show that their method works. Further, they add to discourse, albeit in a minor way, with their analysis of loss landscapes to show how the tool can be used to test current hypotheses in the deep learning literature.

Weaknesses: There are two handicaps that I can see to this work: firstly, the methodology while impressive in its ability to generalize to LSTMs, CNNs, and Transformers actually seems like it may be limited in terms of its scope of verification procedures it admits. In the abstract the authors mention that they can implement things such as CROWN into their LiRPA algorithm, however CROWN also explicitly derives quadratic over and under approximations of pre-activation bounds and I am curious as to if the authors have a scheme for directly accommodating things of this nature along with higher-order convex relaxations (e.g. even degree 3 polynomials for small networks) which are harder to propagate in general as there are more complex dependencies. [I recognize that this may be difficult to do.] Further on this point, the method can really only handle linear constraints in the input. Indeed, the authors show how they can bound discrete perturbations in the NLP scenario, but at the end, it looks to me like the authors recursively build a linear region in the input space by taking maximally perturbed embeddings from the synonym set. Is there a plan for how to incorporate non-linear specifications? I suppose if the specification is non-linear but monotonic all of the same propositions hold, but I think it would be nice if the authors clearly laid out a plan for future work and extending their tool. Even sketches of how these computations would be compatible with the current framework would contribute, in my mind, to the potential significance of the method. The second potential weakness I see exist in the experimental validation of the framework. Unfortunately, it doesn't tell us anything particularly new. The authors compare their tool with IBP, but it isn't a really fair comparison as IBP is inherently more approximate than any linear bound propagation. Finally, I am hoping the authors may be able to find some space to address the kind of odd performance on the NLP benchmark. I have read other papers which do LBP for NLP models ([15] in the paper), and while I recognize the difference in setting, in that work there was a clear degradation of model performance (in terms of verifiability) as the perturbation magnitude increased. Which is what I would expect to be true, largely, of all neural network classifiers. Yet, this method reports that verified error remains constant across all delta's? At least in the rebuttal can the authors give me some intuition for why this is the case? I just hope it is not indicative of a bug in the code is all. **************************** After author response: I would like to thank the authors for their clear and concise answers to my questions and confusions. I think this is a solid and mature work that makes an interesting and important contribution to those interested in checking the provable robustness of their models.

Correctness: To the best of my knowledge all of the proofs seem sound and I have read all of the formulations and they seem correct to me. There is a type-o (I think) just before equation (3) where the authors define two lower bounds on A_i when I think they meant to define a lower and upper bound.

Clarity: Yes the paper was very clearly written.

Relation to Prior Work: I think it is clear that the authors rely greatly on the previously proposed methods of linear relaxation to get bounds but make their contribution in the framework which can be applied to general computational graphs.

Reproducibility: Yes

Additional Feedback: One thing I would suggest to the authors as a clear area that they could contribute and gain novel insights is in Bayesian Neural Networks. The formulation of probabilistic safety for BNNs was only recently published in UAI (https://arxiv.org/abs/2004.10281) and it seems the authors do not do LBP for their BNNs on MNIST and do little analysis of BNNs at the scale that this paper considers. I think this is a great place for an application of this method where the authors can show clear improvement over state of the art, especially given that their tool (in the flatness section) shows that they can handle constraints in the parameter space. Perhaps even including this in future aims would be nice.

Review 3

Summary and Contributions: This paper proposes a framework to enable LiRPA for a wide range of neural networks, including DenseNet, ResNeXt and Transformers. Existing work either do not scale to these complex model architectures, or derives architecture-specific bounds for verification. To my best knowledge, this is the first work to unify and scale LiRPA-based neural network verification to these more advanced model architectures. Specifically, their framework computes the lower and upper bounds with both the forward and backward modes, and they develop a library to automatically propagate the bounds, similar in spirit to backpropagation for neural network training. Meanwhile, they propose the loss fusion technique, which reduces the time complexity for bound computation by a factor of the label set size. Therefore, they can scale the robust training to datasets with much larger label sets (i.e., Tiny ImageNet with 200 labels) than studied in prior work on certified defense. Lastly, their approach can be extended for verification beyond L_p perturbations. For example, they also consider synonym-based word substitution for NLP models. They evaluate several model architectures on CIFAR-10 and Tiny ImageNet for image classification, and SST-2 for sentiment classification. They show that their framework achieves better nature error rate and verified error rate than existing verification approaches. Meanwhile, the loss fusion technique significantly reduces the training time and memory usage.

Strengths: 1. To my best knowledge, this is the first work to unify and scale LiRPA-based neural network verification to multiple complex model architectures, ranging from vision to NLP models. 2. The proposed loss fusion technique significantly reduces the training time and memory usage, and scales training to support dataset with a larger label set than those considered in previous certifiable defense work.

Weaknesses: 1. From the Appendix, both Transformer and LSTM in this paper only include one layer. Does the approach still work for multi-layer NLP models? 2. Does "nature error rate" mean the error rate on the clean test set? If so, I feel that the error rates are too low compared to models trained in a standard way. Some explanation could be helpful. UPDATE: I thank the authors for addressing my questions, and I keep my original review score.

Correctness: The approach is technically sound.

Clarity: The paper is generally clear. However, the authors should do a careful proofreading to fix some typos. For example: Line 27 in the Introduction, the terms should be f(x0+δ) instead of f(δ). Line 54~56: "LiRPA" is misspelled as "LiPRA". Line 255: "Transfomrer" -> "Transformer".

Relation to Prior Work: The paper provides a good comparison with existing work.

Reproducibility: Yes

Additional Feedback: