Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper is well-written and offers a wealth of detailed information to support its claims, including:
-- The interpretation of Gaussian data augmentation as inducing low-pass filtering by the network. This observation, while not entirely surprising, is made within a principled framework for evaluating augmentation techniques.
-- An investigation of the effectiveness of the AutoAugment approach for data-driven augmentation strategies, pointing to the effectiveness of more diversified augmentation strategies.
-- The observation that adversarial attacks are still possible, even after low-pass filtering, and a suggested connection with the DeepViz approach for visualizing the inner workings of CNNs.
Overall, the results of the paper feel preliminary, in that they consist mostly of a set of insightful observations and do not, at this point, present a coherent and principled framework for addressing data augmentation strategies for ensuring robustness in vision algorithms. However, within its limited scope, the paper is quite thorough and it is very likely to lead others to further investigate these fundamental and important issues. This reviewer has read the authors' rebuttal. After the ensuing discussion, the final score assigned remains unchanged.
This paper explores a useful direction for studying adversarial robustness, and some of the experimental results add value to the understanding of this topic. However, some ad hoc choices and some concerns about the experimental evaluation in this empirical study detrimentally impact the quality of the paper. More details below.
Originality: The work seems original to me.
Clarity: The presentation is mostly clear.
- (line 18) What are i.i.d. benchmarks? Conversely, what are non-i.i.d. ones? Do you mean training and test distributions are the same?
Quality/ Technical Correctness and Experimental Evaluation (clubbed together as this is an empirical study): The framing of the problem and technical (design) choices need to be clarified.
- The main premise of the paper is a bit hazy. Since adversarial training is being used for robustness, but testing is not against adversarial attacks/perturbations but against predefined ‘parametric’ ones (brightness, contrast etc. as mentioned in Figure 2), is the goal to be robust against ‘natural’ adversaries, i.e. those caused by natural distribution shifts which occur when, say, an ML system is deployed? Are malicious adversaries considered? If only the former, this point should be carefully elaborated and studied. Perhaps this is what ‘common corruptions’ alludes to (lines 27-30, 126-127).
- Another aspect that needs clarification is the reason for using the Fourier domain (lines 35-36, 112-116) for the study. The domain assumes (spatial) stationarity and is used for modeling and studying the properties of linear, time-invariant systems.
- The observation that adversarial robustness is being achieved in a part of the Fourier space and not everywhere doesn’t just suggest that a more diverse set of augmentations is needed (lines 11-12, 40-43), but also that a strategy for choosing this diverse set may be defined in the Fourier space. This is never investigated.
- What is the impact of changing the strength of pure harmonic perturbations (v in line 78)?
- The impact of varying the bandwidth of the bandpass signal is not at all clear and is not discussed properly (Figure 5, lines 147-173). Should the impact of increasing bandwidth be similar to that from smoothing the responses one observes in Figure 3, since AWGN has a flat spectrum and BPF is akin to adding all frequencies in the band in equal amounts? It’s not at all clear what clarity using a variety of bandwidth choices brings to understanding the phenomenon the paper is trying to study. Similarly, the different choices of the magnitude of added noise (l_2 in line 147; l_2 = 4 and 15.7 in Fig 3 and 4 respectively) are not well motivated.
- Section 4.2 (lines 174-189) needs more clarity. It’s not clear if the Fog noise model is a good model for fog corruption. Is fog corruption expected to be an LTI filter? If so, then it raises doubts about the results in Table 1. If not, then the assumptions for the experiment in Sec. 4.2 are not met. But this can be studied by itself instead of the speculative musings in lines 185-189.
- The motivation for Section 4.3, to use a more varied set of augmentations, is reasonable but not informed (non-trivially) by the study. In other words, it’s a trivial statement to make. The two known augmentation strategies used in the study, AutoAugment and SIN+IN, are not really suggested by the previous sections. One would expect the authors to propose an augmentation strategy in the Fourier domain to achieve robustness across the entire Fourier spectrum.
- In the experiment to study adversarial training (Section 4.4, Figure 7), Madry’s PGD attack is used to generate adversarial examples. The set of adversarial samples generated is a function of the attack. It’s not clear if generic statements can be drawn about adversarial training based on a single adversarial attack model. Secondly, the comparison between the statistics of natural images and the adversarial perturbations seems misplaced.
It’s not clear at all what the spectrum of the entire gamut of natural images has to do with the spectrum of adversarial perturbations. The latter is more a statement about the classification boundaries of the trained model.
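To make the questions above about perturbation strength and bandwidth concrete, here is a minimal NumPy sketch of the two kinds of perturbation being discussed: a pure harmonic perturbation scaled to an l_2 norm v, and band-limited Gaussian noise rescaled to a fixed l_2 norm. It is not the authors' code. The function names, the radial definition of the frequency band, and the 32x32 image size are assumptions made for illustration only; the norms 4 and 15.7 are simply the values quoted above.

```python
import numpy as np


def fourier_basis_perturbation(h, w, i, j, v):
    """Real image whose spectrum lives only at frequency (i, j) and its
    conjugate partner, rescaled so that its l_2 norm equals v.
    (Hypothetical helper, not taken from the paper.)"""
    spectrum = np.zeros((h, w), dtype=complex)
    spectrum[i % h, j % w] = 1.0
    # The conjugate-symmetric partner keeps the inverse transform real.
    spectrum[-i % h, -j % w] = 1.0
    image = np.fft.ifft2(spectrum).real
    return v * image / np.linalg.norm(image)


def radial_frequency(h, w):
    """Distance of every FFT bin from the zero frequency, in integer bins."""
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    return np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)


def bandpass_noise(h, w, low, high, l2_norm, rng):
    """White Gaussian noise restricted to the band low <= radius <= high,
    then rescaled to a fixed l_2 norm. Assumes the band is non-empty.
    (Hypothetical helper; the band definition is an assumption.)"""
    noise = rng.standard_normal((h, w))
    mask = (radial_frequency(h, w) >= low) & (radial_frequency(h, w) <= high)
    filtered = np.fft.ifft2(np.fft.fft2(noise) * mask).real
    return l2_norm * filtered / np.linalg.norm(filtered)


rng = np.random.default_rng(0)
harmonic = fourier_basis_perturbation(32, 32, 3, 5, v=4.0)
narrow = bandpass_noise(32, 32, low=4, high=6, l2_norm=15.7, rng=rng)
wide = bandpass_noise(32, 32, low=4, high=12, l2_norm=15.7, rng=rng)
```

Because the norm is held fixed, widening the band spreads the same energy over more frequencies, which is why one might expect the wide-band result to resemble a smoothed version of the single-frequency sensitivity map. Sweeping v or the band edges in such a sketch is one way the requested ablations could be set up.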
Originality: The frequency-domain interpretation of the effects of noise in training data, as well as of adversarial noise, is original.
Quality: By analyzing common corruptions and model performance in the frequency domain, the paper establishes connections between the frequency content of a corruption and model performance under data augmentation.
Clarity: The paper is easy to follow.
Significance: The analysis is made firm with results on benchmark data.