Reviews: Approximate Feature Collisions in Neural Nets

This work is outside by immediate research area, but I found it interesting. I saw examples along the lines of those shown in Figure 3 at least fifteen years ago illustrating the finite capacity of ANNs and the importance of architecture. In the particular context considered here, this is shown very explicitly in the context of the input layer when using ReLU activations. The thresholding nature of these activation functions indeed allow a precise characterization of the behavior. I find it difficult to believe that experts in the field are not aware of this potential behavior but the paper develops the idea considerably and seems interesting. It seemed to me that one possible use of the developed algorithm would be to provide a post hoc assessment of the adequacy of a trained neural network: given the trained network is there a collision polytope of sufficiently diverse inputs that one would be unhappy to depend upon the network? I was surprised not to see this in the discussion. The example in Section 3.1 seems to be somewhat simpler than the state of the art in this area: does one find similar behavior with more complex conv nets with appropriate input features? l216 founding -> found? It might be interesting to comment on the extent to which the findings might transfer approximately to other activation functions -- sigmoidal and tanh activations, for example, map large parts of the state space to values very close to +/-1. Details: l83: the intersection of halfspaces is in general a convex polytope, but can easily be empty. I'd have liked to see some discussion of if/when that situation might arise. l87 infinitely colliding examples -> infinitely many colliding examples? I thank the reviewers for their response, particularly developing some possible applications on the work, which seem to me to strengthen the work somewhat, and for providing numbers for a more sophisticated network.

Reviewer 2

I believe the phenomenon identified here seems interesting and worth considering. The authors point out the work of Jacobsen et al. which addresses a similar topic, but as concurrent work this does not diminish the originality of the paper here. (And moreover Jacobsen et al. address network invariance from the perspective of the classification boundary and not necessarily with respect to the internal features.) The paper is easy to follow in most parts. The introduction in particular very clearly defines and unpacks the phenomenon of interest. The Method section would benefit from slightly more care in its technical details. For instance, possible edge-cases occur to me in the claim that "the intersection of all the sets listed above is a convex polytope in at least a d-n_p dimensional subspace". I believe this would not hold for instance in the case that several (w_i, b_i) define parallel but distinct hyperplanes? Likewise, the implications of the objective function (3) are not clear to me. Is the true minimizer of this always necessarily a corner of the polytope, or do we simply hope it is close to one? From the comment on line 159 that the "intermediate activations of the polytope and the target are similar", i.e. not *exact*, I suspect the latter, but it isn't obvious. In any case, the clarity of this section could benefit from some small Propositions in the appendix that address these and similar points. I was also not completely clear on the methodology used for the experiment in Section 4.1. Is the same database of patches that is extracted from the original dataset used to define the possible patches at each location in the new image? E.g. would the softmax described in lines 203-204 have 37011074 outputs? Or do you only consider patches drawn from the same locations in the original images? In summary, the paper explores an interesting phenomenon that has potential practical implications for neural network robustness. However, it could benefit from some points of greater clarity, particularly in sections 2 and 4. === UPDATE ============ Having read the authors' response and the other reviews, my opinion is largely unchanged. I believe the topic is potentially of some interest to the community, although I agree with the other reviewers that it is a bit incremental. I also still think it would benefit from greater clarity - the authors have at least promised to address this, although without seeing a revised version I can't confirm that this is not still an issue. Overall, my score is still at a 6 (weak accept). I would give the paper a 5.5 if I could.

Reviewer 3

This paper explores the problem of feature collision in ReLU networks --- where examples contained within large polytopes in the input space all map to the same (or approximately the same) feature space. The paper shows that this is driven by the characteristics of the ReLU function, namely that they are many-to-one over negative activations. The existence of such large polytopes is an interesting and complimentary result to the recent work on adversarial examples (i.e. excessive invariance versus excessive sensitivity). In terms of methodology, the paper gives a well reasoned and executed solution for finding the polytopes. The proposed method enables one to quickly find polytopes that all map to approximately the same feature space for (what empirically appears to be practically any) arbitrary input examples. Overall, the paper is very well-written and the approach is interesting and novel. Though the analysis of this behavior and the findings regarding the ubiquity of these invariant polytopes is stimulating in its own right, my only concern is that I am still slightly unsure of the significance of these results. This doesn’t seem to be a goal of the paper, but I didn't find the discovered polytopes to be that convincingly distinctive. They still seemed visually similar (albeit at low human frequencies) or have clear artifacts (even in the image-patch-constrained experiments), so adversarial generation doesn’t quite seem to be a clear next step. Additionally, though a qualitative observation, it’s not completely obvious that the polytopes given in the figures should even be considered distinctly that bad for mapping to the same class/~features. Clearly there are artifacts of the target image in the polytope corners (even in Table 5). However, I do agree to some extent that in general it can be quite undesirable behavior for them to have the same confidence scores. Finally, I wonder if some additional analysis into the polytopes could be useful for interpretability or deriving rationales (particularly in the case of natural language, where using words instead of image patches might lead to very distinguishable results). Minor comments: line 32: in fact, there could be a line 91: I would move the comment in parentheses to a footnote line 174: "the polytope that even larger relative" needs fixing line 216: the colliding polytopes found === After Response === Thank you for your response and clarification. The suggested possible ramifications of these findings (data representation, regularization) indeed seem reasonable (if a bit optimistic).

Paper ID:	9264
Title:	Approximate Feature Collisions in Neural Nets

Reviewer 1

Reviewer 2

Reviewer 3