NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 7490 Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering

### Reviewer 1

In many scenarios, the causal relationships over a set of variables vary across groups while still sharing some common causal relationships, so it is better to find a different causal graph for each individual. This paper addresses the problem by first dividing the set of agents into a number of groups and then finding a causal graph for each group. The authors propose a model over m variables that includes both instantaneous and time-lagged effects. Ideally, the model would be estimated separately for each user, but that may be impossible with a small number of samples, so the model assumes a mixture-of-Gaussians prior on the effects. The number of mixture components is the number of clusters, and the goal is to estimate the prior probabilities over the clusters and the individual components of the mixture. The authors then use the EM algorithm to estimate the model parameters. Because computing the posterior exactly is intractable, they use Monte Carlo integration and stochastic approximation for the E step. Both the simulations and the experiment on a real-world dataset show that the proposed method performs better than various existing methods in terms of F1 score, clustering, and approximating the true model.

I have some suggestions and questions for the authors:

1. Theorem 1 proves an identification result for degenerate distributions. What breaks down in the general case, even if we consider only the instantaneous effects? A discussion of this point would have been welcome.
2. How did the authors choose the parameter l_p, the number of time steps considered for time-lagged effects?
3. Why did the authors choose a threshold of 0.1 for converting the weights into the presence or absence of edges in the graph? Is there a systematic procedure to guide this choice?

Originality: This paper makes a significant contribution by proposing a new model for sharing causal relations. The proposed algorithm seems to recover individual-specific causal graphs and will be of immense interest to researchers working in the field if it can be scaled to a large number of clusters and a large number of variables.

Quality: I think the paper makes several significant contributions and will be really helpful to researchers working in causal modeling and causal discovery.

Clarity: I thoroughly enjoyed reading the paper. Both the model and the experiment sections were clear to me. However, the paper might benefit from a brief discussion of the SAEM algorithm before the steps of the algorithm are derived.

Significance: The modeling contributions of this paper are sound. The proposed algorithm is interesting, seems to perform better than the existing methods, and will be significant if it can be scaled to a large number of variables and a large number of clusters.
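
To make the threshold question (item 3) concrete, the following is a minimal sketch of a sensitivity sweep: it converts an estimated weight matrix into edges at several thresholds and reports the edge-level F1 against a reference graph. The function names, the toy weights, and the reference graph are all hypothetical and are not taken from the paper; the authors' own evaluation may differ.

```python
def edges_from_weights(weights, threshold=0.1):
    """Return the set of directed edges (i, j) with |weight| > threshold.

    `weights` is a square list of lists; self-loops are ignored.
    """
    n = len(weights)
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and abs(weights[i][j]) > threshold
    }


def edge_f1(pred, true):
    """F1 score between a predicted and a reference edge set."""
    tp = len(pred & true)   # edges found in both graphs
    fp = len(pred - true)   # spurious edges
    fn = len(true - pred)   # missed edges
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical 3-variable example (illustrative values only).
true_edges = {(0, 1), (1, 2)}
est_weights = [
    [0.00, 0.80, 0.05],
    [0.02, 0.00, 0.30],
    [0.00, 0.12, 0.00],
]

if __name__ == "__main__":
    # Sweep a few thresholds to see how sensitive F1 is to the cutoff.
    for t in (0.01, 0.05, 0.1, 0.2, 0.5):
        f1 = edge_f1(edges_from_weights(est_weights, t), true_edges)
        print(f"threshold={t:.2f}  F1={f1:.3f}")
```

Reporting a curve of this kind, rather than a single cutoff, would show whether the conclusions depend on the particular value 0.1.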

### Reviewer 2

[Originality] The use of a Gaussian mixture as a specific model for causal discovery from data with group-wise causal mechanisms seems novel and interesting.

[Quality]
- Considering that the essence of the paper is the proposal to use a mixture of Gaussians, experimental assessment on real-world data is quite important.
- I am not entirely convinced by the experiments in the current version of the paper. (Table 1) Without a comparison against plain clustering methods (e.g., k-means), it still seems possible that SSCM takes advantage of other clustering signals, such as distinct data regions, rather than the structural difference. To validate the proposed model, I think it is crucial for the paper to collect convincing evidence that the proposed method actually performs mechanism-based clustering. For example, (1) showing the results of a comparison against a plain clustering method, (2) showing the variability of the estimated graphs for each group (to see whether the posterior is well concentrated around the MAP or the posterior mean), or (3) providing an interpretation of the estimated graphs based on domain knowledge (similar to the one in the fMRI experiment) may help. Using biased sampling from each group to create mock "individuals" may also be an option.
- The problem of estimating the number of groups remains to be addressed. It seems to be an inherently difficult problem, but the paper does not specify a concrete method for it and uses only the ground-truth value in the experiments (line 262).

[Clarity]
- I think the manuscript is very well prepared. All paragraphs are easy and smooth to comprehend.
- The only problem I had with the presentation is in the statement of Theorem 1. The notion of identifiability often rests on some form of (hidden) asymptotics. If I understood correctly, in the case of Theorem 1 the identifiability holds in the limit $n \to \infty$. I think it is important to clarify this, especially since the "sample size" in the statement of the theorem is quite confusing (in standard estimation problems, the identifiability of a parameter is considered in the limit where the sample size $\to \infty$).
- (Typo) Supplementary material p.9: "Adjusted Random Index" should be "Adjusted Rand Index."

[Significance] Considering the nature of the paper (proposing a specific model), its significance largely depends on the experimental results using real-world data. The experimental results are interesting and provide some insight into the proposed model, but they are not completely satisfactory for the reasons stated above in the Quality section.
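
For reference, the Adjusted Rand Index mentioned in the typo note is the usual score for the kind of clustering comparison requested under Quality (for example, SSCM assignments versus a plain k-means baseline, each scored against the known groups). The following is a minimal pure-Python sketch of the standard pair-counting formula. The function name and the example labels are illustrative assumptions, not the paper's code, and the sketch assumes at least two samples.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same samples.

    Assumes len(labels_a) == len(labels_b) >= 2.
    """
    n = len(labels_a)
    # Contingency counts: how many samples share each (a, b) label pair.
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2         # agreement of a perfect match
    if max_index == expected:               # e.g. both labelings trivial
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]                    # hypothetical ground-truth groups
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, relabeled
    print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # partition unrelated to truth
```

Because the index is invariant to label permutations and corrected for chance, it allows a mechanism-based clustering and a distance-based baseline to be compared on equal footing.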