__ Summary and Contributions__: The authors consider the problem of detecting interactions between feature
elements in the feature space learned by a neural network. With the observation
that such interactions can be modeled after the actual network connectivity,
the authors propose a method based on topological analysis to address this issue.
This is achieved by means of a new quantitative metric called PID that measures
the interaction between input and output units in a neural network. The authors
provide both theoretical as well as experimental analyses to corroborate their
main ideas. Please see some notes below.

__ Strengths__: Strengths
- The paper is very well written and it is quite easy to follow the key ideas,
motivations, and considerations. One suggestion for the authors here would be
to consider including an introductory teaser figure in the front that encompasses
the problem and the key ideas- this will help with material exposition.
- The idea of using topological analysis in determining connectivity and
interaction strength by formulating the problem as one of measuring
graph persistence is interesting and well-motivated.

__ Weaknesses__: Weaknesses
Evaluating feature interactions
- I think the synthetic dataset experiment in Section 4.1 is a good step towards
evaluating the efficacy of PID. Looking at the AUC numbers, the performance of
PID, AG, and NID seems quite close. Given these are synthetic data results,
and the testbed is quite controlled, I would encourage the authors to provide
more insights on why the performance is similar/close. For instance, the authors
note AG is "tree-based", but it is not immediately clear how or why this may the
main reason PID to perform better in F5, F6, and F8. Furthermore, with NID being
a similar (in spirit) baseline, I would expect more in-depth analysis and discussion
on the benefits that PID brings comparatively.
Interaction detection in images
- I think we would need more information here to understand the problem setup. Are
the CNNs trained to perform categorical prediction, i.e., classify images? This
does seem to be the case but I'd request the authors to clarify.
If this is the case, when Q4 talks about understanding the behavior of CNNs,
my understanding is this refers to identifying which regions in the input image
are critical for the trained CNN to classify that particular image as belonging
to a certain category. In the explainable CV literature, there is much recent work,
with GradCAM being a notable example, that starts from the output prediction and
uses various ways (e.g., gradients) to determine pixel regions that are important,
producing saliency maps similar to Fig 4. Since the authors mention Q4 is fairly
broad terms, I would expect discussion and comparison w.r.t. existing methods
in this domain (at least CAM or GradCAM)
- Additionally, I do not think results on MNIST and FashionMNIST datasets are enough
to convincingly note PID can help explain behavior of CNNs. I encourage the authors to
evaluate on more challenging classification scenariors, including at least ImageNet. Finally,
to better understand the broader impact of PID in the context of CNNs, one would have
to look at objectives other than classification, e.g., regression, metric learning, etc.
While I do not expect results with these tasks already, I would encourage the authors
to at least discuss how PID can be used in these cases.
Minor, typos etc
L38, L145: homolgy-> homology
L94: measureing-> measuring

__ Correctness__: Seems to be so.

__ Clarity__: Yes

__ Relation to Prior Work__: Please see my notes above.

__ Reproducibility__: Yes

__ Additional Feedback__: Post-rebuttal comments
I have read the authors response and other reviews and I appreciate the effort put in by the authors in trying to address most of the concerns. As noted, the idea is interesting and well-motivated; while the experimental analysis could have been more comprehensive (e.g., my notes on the classification experiments in Section 4.3), the paper nevertheless would be a good contribution. I keep my rating unchanged.

__ Summary and Contributions__: Post-rebuttal update:
I recommend this work put some more emphasis on the experiments in Appendix E.3: "Sensitivity to the Architecture and Regularization Strength". Some of the results could be shown in the main paper
I also think the image experiment in Section 4.3 should be de-emphasized or put in the appendix. If it shall be in the main paper, clarify that the experiment is exploratory and clarify any more limitations with the application to images (in the main paper) besides the large search space
I keep my rating
----------------------------------------
This paper proposes to detect interactions from neural networks via topological analysis with a method called Persistent Interaction Detection (PID). PID leverages and extends Persistent homology in topological analysis to describe the connectivity and strength of interacting features in a feed-forward neural network (FNN) on the output unit. Key to the proposed measure for interaction strength are the notions of “birth” and “death” times of connected components of interactions (interaction subgraphs of an FNN which is viewed as a graph). Here the birth and death times describe what are the start and end points of an interaction-subgraph in an FNN given decreasing lower thresholds on weight magnitudes (the lifetime of interaction subgraph is known as persistence). Interaction strength is defined as the minimal “amount of change” to eliminate an interaction subgraph, i.e. the difference between the smallest weight magnitude of an interaction subgraph and the weight that causes the “death” of the subgraph. To efficiently detect interactions, masks are used based on interaction subgraphs to focus on an interaction created at a specific neuron. Experiments are shown to outperform the state-of-the-art at 1) interaction detection where synthetic ground truth is known and 2) prediction performance via interaction encoding.

__ Strengths__: I think this is fascinating work.
Soundness: This work takes an in-depth look at how to detect feature interactions from neural networks from the perspective of topological analysis. The definition of interaction strength is principled, and the efficiency treatment is appropriate. There is also a nice discussion about the stability of PID with accompanying theory.
Significance and Novelty: This work is significant and novel. Leveraging topological analysis to develop a principled approach for interaction detection is novel. Then, showing that it works compared to the state-of-the-art is a significant advancement for the interpretability of neural networks.
Relevance: Interaction detection and its resulting prediction performance are becoming increasingly relevant. This method addresses this through a novel angle.

__ Weaknesses__: I cant think of a noteworthy weakness. Good job.

__ Correctness__: The claims and empirical methodology are correct - to my knowledge. My apologies, I didn't have enough time to carefully check the theory in the appendix.

__ Clarity__: The paper is well written. Minor: “born” is an adjective.

__ Relation to Prior Work__: Very clear and relevant discussions w.r.t. prior work. Good comparisons.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper proposes to detect interactions and quantify interaction strength from the topological perspective. The interaction in this paper is defined as a set of input variables that has an effect on the output of the network. Authors quantify the strength of interactions by analyzing the topological structure of the neural network. In experiments, authors conducted various experiments to show the effectiveness of the proposed method.

__ Strengths__: It is interesting to use the algebraic topology to study the connectivity and detect the interaction in neural networks.

__ Weaknesses__: 1. The proposed measure assumes that different dimensions of the input are well-aligned, i.e., the global interaction is independent with the specific input. However, in real datasets and applications, the mapping relationship between semantics and pixels varies among different input images. For example, give each specific input image, the pixelwise interaction depends on the image itself, instead of being pre-determined. In this case, the application of the proposed method is restricted.
2. This paper defines the connectivity of MLPs based on all paths. However, there are nonlinear operations, e.g. ReLU layers, in neural networks. These nonlinear operations will block some paths to the output, which is not considered by authors. Thus, the formulation of the proposed method seems not realistic.
3. In this paper, the interaction mainly considers edges with large weights. However, there is no experiment to show that edges with large weights can reflect all interaction effects (mot interaction effects) within the network. Authors should quantitatively prove that edges with large weights play a major role, w.r.t. effects of all other edges with not-so-large weights. In comparison, the interaction proposed by Suin Lee et al. [cite 1] can reflect all interactions modeled by the network, which is more objective than this work.
4. In Section 4.1, authors use MLPs to fit ten different functions, and detect the interaction modeled by MLPs. However, it is not always true to assume that the MLP can successfully learn the ground truth interactions encoded in the function. The MLP (if with ReLU layers) is a piece-wise linear model and cannot fit some complex functions. For example, theoretically, it is difficult for the MLP to regress y=exp(x) and y=sin(x). Thus, this experiment setting is confusing.
5. In Section 4.2, authors constructed synthetic features for each interaction, and combined these synthetic features with original inputs for training. However, at the application level, the improvement with synthetic features seems to be limited. At the interpretation level, experimental results cannot show whether the explanation is correct. The comparison is not straight forward.
6. There is no detailed introduction about the extension to the image data mentioned in section 4.3. Besides, it lacks a quantitative evaluation for the interaction detected on image data.
7. Many notations are difficult to follow.
[cite 1] Lundberg, Scott, Gabriel G. Erion, and Suin Lee. "Consistent Individualized Feature Attribution for Tree Ensembles.." arXiv: 1802.03888.

__ Correctness__: Maybe correct, but many notations are difficult to follow.

__ Clarity__: No, many notations are difficult to follow.

__ Relation to Prior Work__: Yes

__ Reproducibility__: No

__ Additional Feedback__:

__ Summary and Contributions__: The paper proposes a new method to identify strongly-interacting input features of a neural network. They apply filtration on the network weights to derive the persistence of subgroups of input features. The persistence of a subgroup is measured by the difference between the birth time, i.e., the weight at which all features of the subgroup is connected to the output, and the death time, i.e., the weight at which any other input feature is also connected to the subgroup. A stability theorem is provided saying that when a subgroup is detected by two networks, then the difference between its persistence is bounded by the supreme-norm difference between the two network weights. The paper also provides an algorithm to compute the strongly-interacting feature subgroups efficiently using binary matrix multiplication among the layers of the network. Experiments on synthetic data and real images are presented.

__ Strengths__: Finding input feature interactions is an interesting and relevant problem.
The algorithm and theorem are correct.

__ Weaknesses__: The theoretical foundation of the idea is not very convincing. First of all, although the paper repeatedly stressed the connection with persistent homology, I am not convinced. Persistent homology relies on a sequence of homomorphism of homology groups through the filtration induced by inclusion. The birth and death times are well-founded as the birth and death times of topological structures (connected components, loops). However, in this paper, the structure of interest (sub-network connecting a group of input features and the output neuron) is not a well-formulated algebraic topology concept. Therefore, a lot of important theoretical property of the subject across the filtration is lost.
Second, the stability theorem, although correct, is not as useful as the original stability theorem of persistent homology. In particular, the theorem only states the stability for feature sets that are detected by both networks. It is unclear whether there is high persistent feature sets in one network, but not the other. This reminds me of the fact that persistent homology’s stability is regarding critical values, but not the generators (e.g., connected components). This paper indeed focus on ``connected components’’, and thus are not well guaranteed to be stable as persistent homology.
As for experiments, the synthetic experiments are interesting. But the real data experiments are not really convincing as to what the strongly-interacting input features provides that other existing interpretability approaches cannot.
** After the rebuttal period, I am willing to increase my score to 6.
I trust the idea is good and useful in practice and I do believe the application of PH to neuron interaction seem to be a promising direction. However, I still have preservations on the topological analysis part. I think the method and the theorem could be improved from the following two perspectives.
1), Important details are missing as to what exact topology is captured and how is the method related to persistent homology. In the rebuttal, the authors mentioned the connection to 0D persistence through MST. But important details are still missing. The main issue to me is the constraint that the set of interactive input features need to be *connected* to the output neuron. This adds complications and it gets hard to interpret the results. For example, if you have a group of input features (a candidate set) that are connected for a long while during the filtration, but they are not connected to the final output until very late. Would you consider them persistent or not? I think both the details of connection to 0D persistence, and how to handle the challenge I just mentioned, are important and should be thoroughly explained/investigated.
2), the power of the stability results could have been improved. I suspect it is possible to create some pathological cases, in which two networks differing by a small perturbation of weights may have almost no common interaction candidate; it is possible that network 1 detects candidate set [1, 2, 3] while network 2 detects candidate set [1,2,3,4]. They are indeed very relevant but will not be considered at all in the stability theorem as it only guarantees the stability of birth/death times for a same candidate set detected by both networks. I think this is something worth diving into and better defined: for example, having a better definition of similarity between two interacting sets and have a better stability results extended accordingly.

__ Correctness__: Yes.

__ Clarity__: Not particularly. I think the connection with persistent homology is not really deep and convincing. To some extent, calling the underlying method a "topological analysis" is an over-statement.

__ Relation to Prior Work__: Yes.

__ Reproducibility__: Yes

__ Additional Feedback__: