Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Xiaoyi Dong, Dongdong Chen, Jianmin Bao, Chuan Qin, Lu Yuan, Weiming Zhang, Nenghai Yu, Dong Chen
Modern deep neural networks(DNNs) are vulnerable to adversarial samples. Sparse adversarial samples are a special branch of adversarial samples that can fool the target model by only perturbing a few pixels. The existence of the sparse adversarial attack points out that DNNs are much more vulnerable than people believed, which is also a new aspect for analyzing DNNs. However, current sparse adversarial attack methods still have some shortcomings on both sparsity and invisibility. In this paper, we propose a novel two-stage distortion-aware greedy-based method dubbed as ''GreedyFool". Specifically, it first selects the most effective candidate positions to modify by considering both the gradient(for adversary) and the distortion map(for invisibility), then drops some less important points in the reduce stage. Experiments demonstrate that compared with the start-of-the-art method, we only need to modify 3 times fewer pixels under the same sparse perturbation setting. For target attack, the success rate of our method is 9.96% higher than the start-of-the-art method under the same pixel budget.