Paper ID: | 7897 |
---|---|

Title: | Computing Linear Restrictions of Neural Networks |

Overall, the paper is well-written and presents an important novel contribution to the field in the ExactLine algorithm, although the description of the algorithm is a little unclear. The ACAS Xu example showcases the usefulness of such an algorithm in investigating and diagnosing problems with the decision boundary of a NN (e.g. the non-symmetric portions of the ACAS Xu boundary). The use of ExactLine to investigate inaccuracies in Integrated Gradient approximations is also an important contribution. Finally, the investigation of adversarial examples is also very interesting; further similar experiments using ExactLine could be a key component of developing a solid understanding of adversarial examples in the future.

Detailed comments: I have several comments to this work: 1. In general, the idea of this paper is novel and interesting, which provides a new way for human to understand the neural networks. Three applications for ExactLine are introduced and some interesting results are shown to investigate questions about the neural networks. 2. For the Integrated Gradients, experimental results are expected to prove whether the exact IG computed by equation (8) can generated better saliency maps than the approximate IG computed by equation (6). 3. Although the proposed method provided some new insights about the behavior of the neural networks, the application of the proposed method seems limited. The authors also admit that the results for the adversarial examples do not show whether reduced density of linear partitions contributes to robustness. Is there a way to utilize the proposed method to better distinguish the adversarial examples or better design more robust neural networks? More thinking about how to utilize the proposed method are expected.

The paper proposes an algorithm for computing the partition of a line into segments corresponding to different regions in a ReLU network. After explaining the algorithm, the authors present an application by studying the decision boundary of a network that determines what action an aircraft should take in order to avoid a collision with an intruder. In particular, they apply their algorithm to compute the output of the network on all points on a grid of lines (in polar coordinates) and show benefits over sampling the output at a discrete set of points. The authors then discuss another possible application of the algorithm, namely the exact computation of integrated gradients (IG). They analyze the efficacy of approximations of IG and discuss possible improvements to commonly used heuristics. Finally, the algorithm is used to analyze adversarial examples in neural networks. The authors observe that the original image and the perturbed adversarial image are often separated by many regions, a fact that they take as evidence against the "linear explanation" of adversarial examples. They also note that networks trained to be robust to adversarial perturbations seem to have significantly fewer linear partitions. Overall, the paper is well written (even though mathematical notation is sometimes confusing, see below). My main concern is that I find that the algorithm for ExactLine is simply an iterated application of the most straightforward way of computing segments along a line induced by ReLU activations. For this reason, it's hard to believe that no one has used a similar strategy before, and the algorithm does not seem like a substantial theoretical contribution. On the other hand, the different applications and empirical observations that the authors present may be useful for practitioners (I'm not sure). More comments: - The notation used in Theorem 1 should be explained: Q_i and R_i are i-th coordinates of Q and R, but this is not stated (and is also confusing because elsewhere P_i is the i-th point of the partition). - What is QR with \leftrightarrow in Theorem 1? - A better explanation of the ACAS Xu example could be helpful. Why does "one usually wishes to probe and visualize the recommendations of the network" (l.120)? - Integrated gradients: it seems surprising that "number of samples needed is relatively consistent within each network" (l.194). How were the averages in Table 2 computed? What was the variance? - Isn't the trapezoidal approximation of an integral always better than left and right sums? If so, ExactLine did not "provide novel insights into the behavior of neural networks" (l.209). - Some of the figures and tables are not very clear. For example, the different lines in Figure 4 should be explained.