Reviews: Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation

Reviewer 1

The authors propose a neuropathic pain simulator that generates data for evaluating causal discovery algorithms. To the best of my knowledge, this is the first work to build a simulator that can generate data with causal relations. The paper is novel, but my concern is that it is not rigorous enough. To make the simulated data close to real-world scenarios, the authors learn the simulator's parameters from a real-world dataset. However, this dataset contains only 141 patient diagnostic records. I am not sure whether this amount of data contains enough instances of each causal relation to estimate the conditional probability distribution of each variable given its parents. If not, the simulator may generate biased data. Section 2.2 does not address this clearly. The authors should first formally introduce the format of the real-world dataset and then present the parameter estimation procedure.

In addition, the heuristic proposed in this section has a potential problem. The heuristic states that if two parents Pa1(X) and Pa2(X) of a variable X occur at the same time, then P(X=1 | Pa1(X)=1, Pa2(X)=1) is greater than or equal to max(P(X=1 | Pa1(X)=1), P(X=1 | Pa2(X)=1)). The authors only consider causal relations in which a parent takes the value 1 (Pa_i(X)=1). What if a parent taking the value 0 (Pa_i(X)=0) has a causal effect on X?

Although the experimental results show that physicians cannot differentiate the simulated data from the real-world data, I think the experiment may suffer from sampling bias, because the dataset used in the physician evaluation is too small: to examine the quality of the simulated data, the authors mix only 30 simulated records with 30 records sampled from the real-world dataset. This sampling bias can strongly affect the evaluation results. The causal structure considered in this paper is also relatively simple.
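To make the heuristic concrete, here is a minimal sketch of one combination rule that satisfies it: noisy-OR. Noisy-OR is an assumption swapped in for illustration; the review does not say which rule the authors use. The function name `noisy_or` and the probabilities are hypothetical, and "P(X=1 | Pa1(X)=1)" is read here as "only Pa1 is active".

```python
from itertools import product


def noisy_or(single_parent_probs, parent_values, leak=0.0):
    """P(X=1 | parents) under a noisy-OR model (assumed combination rule).

    single_parent_probs[i] is the probability that parent i alone, when it
    equals 1, switches X on; `leak` is P(X=1) when no parent is active.
    """
    p_off = 1.0 - leak
    for p_i, value in zip(single_parent_probs, parent_values):
        if value == 1:  # only parents equal to 1 contribute
            p_off *= 1.0 - p_i
    return 1.0 - p_off


# Hypothetical single-parent probabilities, not taken from the paper.
probs = [0.6, 0.3]

for values in product([0, 1], repeat=2):
    print(values, round(noisy_or(probs, values), 3))

joint = noisy_or(probs, (1, 1))
singles = [noisy_or(probs, (1, 0)), noisy_or(probs, (0, 1))]
# The heuristic: the joint probability is at least the larger single-parent one.
print(joint >= max(singles))
```

Because only parents equal to 1 enter the product, a parent whose value 0 is the causally relevant state cannot be expressed without recoding the variable, which is the gap the review points to.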
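The sample-size concern about the 30 + 30 physician evaluation can be quantified with an exact one-sided binomial test. This sketch assumes each of the 60 records is labeled independently as real or simulated, and that a physician who cannot tell the sources apart is right with probability 0.5; both are assumptions, since the review does not describe the protocol at this level. The 60% accuracy used for the power calculation is hypothetical.

```python
from math import comb


def binom_tail(n, k, p=0.5):
    """P(K >= k) for K ~ Binomial(n, p); with p=0.5 this is the exact one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_significant_correct(n, alpha=0.05):
    """Smallest number of correct real/simulated labels out of n that rejects chance."""
    for k in range(n + 1):
        if binom_tail(n, k) <= alpha:
            return k
    return None


n = 60  # 30 simulated + 30 real records
k_star = min_significant_correct(n)
print(k_star, round(k_star / n, 3))

# Power against a hypothetical physician who is truly 60% accurate.
power = binom_tail(n, k_star, p=0.6)
print(round(power, 2))
```

The sketch prints the number of correct labels needed to reject chance and the power against a 60%-accurate physician. If that power is low, failing to reject chance says little about whether the two sources are truly indistinguishable, which is the substance of the reviewer's concern.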
In this paper, the causal graph consists of three layers: pathophysiological diagnosis, pattern diagnosis, and symptom diagnosis. Symptom diagnoses are caused only by pattern diagnoses, and pattern diagnoses are caused only by pathophysiological diagnoses. Moreover, nodes within the same layer have no causal relations with each other. In the real world, causal relations are much more complex than in the setting presented in the paper. If the simulator could generate datasets with more complex causal relations, it would be a better benchmark.
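The layered structure described above amounts to ancestral sampling in a three-layer binary network. In this sketch the node names, edges, and probabilities are hypothetical placeholders rather than the paper's graph, and noisy-OR is assumed as the combination rule. Each layer is sampled only from the layer directly above it, and nodes within a layer are independent given that layer, which is exactly the restriction the review criticizes.

```python
import random

# Hypothetical three-layer graph (placeholder names and numbers, not the paper's).
# Layer 1: pathophysiological diagnoses with prior probabilities.
ROOT_PRIORS = {"patho_A": 0.3, "patho_B": 0.2}
# Layers 2 and 3: child -> {parent: P(child=1 | only that parent is 1)}.
PATTERN_PARENTS = {"pattern_1": {"patho_A": 0.9},
                   "pattern_2": {"patho_A": 0.5, "patho_B": 0.8}}
SYMPTOM_PARENTS = {"symptom_x": {"pattern_1": 0.7},
                   "symptom_y": {"pattern_1": 0.4, "pattern_2": 0.6}}


def sample_layer(parent_edges, upstream, rng):
    """Sample every node of one layer from the layer above with a noisy-OR rule."""
    layer = {}
    for node, edges in parent_edges.items():
        p_off = 1.0  # probability that no active parent switches the node on
        for parent, p in edges.items():
            if upstream[parent] == 1:
                p_off *= 1.0 - p
        layer[node] = int(rng.random() < 1.0 - p_off)
    return layer


def sample_record(rng):
    """Ancestral sampling: layer 1, then layer 2 given layer 1, then layer 3 given layer 2."""
    patho = {node: int(rng.random() < p) for node, p in ROOT_PRIORS.items()}
    pattern = sample_layer(PATTERN_PARENTS, patho, rng)
    symptom = sample_layer(SYMPTOM_PARENTS, pattern, rng)
    return {**patho, **pattern, **symptom}


rng = random.Random(0)
records = [sample_record(rng) for _ in range(5)]
for record in records:
    print(record)
```

Adding within-layer edges or edges that skip a layer would require sampling nodes in a general topological order instead of layer by layer.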

Reviewer 2

1. Thanks to the known surface map of the regions controlled by different nerves, and the empirically known relationships between various diseases and the corresponding pain regions, it is possible to reconstruct a complete graph of relations for this small subdomain. The paper proposes a causal simulator with this graph structure and parameters estimated from a real data example. Samples from the simulator are evaluated by human experts and are concluded to be more likely indicators of known diseases than the real data. To demonstrate the simulator's utility, the paper compares a few well-established causal learning algorithms. The general idea is great, and the authors should definitely put their simulator online and actively advertise it to interested researchers. However, the binary nature of the data and the highly specific structure of the causal graph limit the simulator's range of application. I am not sure NeurIPS is the right venue for presenting the simulator, although I understand that this statement feels paradoxical. Yes, causal learning needs tools like the one presented here to simplify development. However, I feel a paper that used the proposed simulator to develop a novel algorithm, and demonstrated that the resulting approach beats the state of the art, would be more appropriate. Note that I fully support the proposed work; my only concern is the fit.

2. It would be nice to see a layout of the complete graph. The partial representation in Figure 2 is currently a tree and does not convey the possible complexity of the proposed model graph.

3. With the BDeu score, GES is a natural fit for the binary data produced by the simulator, but it is unclear which conditional independence test was used for the PC and FCI algorithms.
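On the third point: PC and FCI need a conditional independence test, and for binary data one standard option is the G-test (likelihood-ratio chi-square) stratified over the conditioning set. The review does not say which test the authors used, so this is only a sketch of that option. The helper names `g_test_ci` and `chi2_sf` are hypothetical; `chi2_sf` uses the closed form for integer degrees of freedom so the block needs only the standard library, and each stratum is counted as one degree of freedom without the degenerate-stratum adjustments some implementations apply.

```python
import math
import random
from collections import defaultdict
from itertools import product


def chi2_sf(x, df):
    """Survival function of a chi-square distribution with integer df (closed form)."""
    y = max(x, 0.0) / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(y ** (i - 0.5) / math.gamma(i + 0.5)
                               for i in range(1, (df - 1) // 2 + 1))
    return tail


def g_test_ci(data, x, y, cond=()):
    """G-test of X independent of Y given `cond`, for binary columns of `data` (list of dicts).

    Sums G = 2 * sum O * ln(O / E) over each stratum of the conditioning set;
    each 2x2 stratum contributes 1 degree of freedom.
    """
    strata = defaultdict(lambda: [[0, 0], [0, 0]])
    for row in data:
        key = tuple(row[c] for c in cond)
        strata[key][row[x]][row[y]] += 1
    g, df = 0.0, 0
    for table in strata.values():
        n = sum(map(sum, table))
        rows = [sum(table[i]) for i in range(2)]
        cols = [table[0][j] + table[1][j] for j in range(2)]
        for i, j in product(range(2), range(2)):
            observed = table[i][j]
            if observed > 0:
                expected = rows[i] * cols[j] / n
                g += 2.0 * observed * math.log(observed / expected)
        df += 1
    return g, df, chi2_sf(g, df)


# Synthetic chain X -> Z -> Y, so X and Y are independent given Z by construction.
rng = random.Random(1)
data = []
for _ in range(2000):
    xv = int(rng.random() < 0.5)
    zv = int(rng.random() < (0.9 if xv else 0.1))
    yv = int(rng.random() < (0.8 if zv else 0.2))
    data.append({"X": xv, "Z": zv, "Y": yv})

print(g_test_ci(data, "X", "Y"))          # marginal test
print(g_test_ci(data, "X", "Y", ("Z",)))  # test conditioned on Z
```

Stratifying over the conditioning set is what makes the test usable inside PC and FCI; with many conditioning variables the strata become sparse, which matters for a dataset the size of the one behind the simulator.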

Reviewer 3

Originality: In terms of causal inference, the paper is highly original, as there are few works devoted to developing new simulation systems that can provide realistic evaluations. However, the related work section is somewhat sparse, and it is not clear to me how novel the system is in terms of the application domain. In general, the related work could be improved.

Quality: The paper clearly articulates a gap in the literature, introduces a seemingly sound simulation system, and presents well-thought-out evaluations. I was especially pleased to see an evaluation with domain experts.

Clarity: The paper is very well written, and the methods are well reasoned.

Some questions:
- It is not clear to me why there are ~800 causal relationships. In the medical records I am familiar with, diagnosis codes such as ICD-9/10 are used. Since ICD codes form a tree, one can choose highly specific or general codes for the same event. If one used the most specific code for every illness, this could lead to many relationships (though many may be redundant if, for example, every child of a node shares the same effect). Is something similar at play here?
- For Section 2.3, there is minimal detail about how these components are implemented. Are they part of the simulation system that will be made available, or something the authors applied when post-processing the generated data?
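The code-granularity question can be made concrete with a toy count. This sketch assumes, hypothetically, that every specific child code shares the effects of its general parent code; the code names are placeholders, not real ICD codes. Expanding each general cause into its children multiplies the number of edges without adding new structure.

```python
# Hypothetical code hierarchy: general code -> more specific child codes.
CHILDREN = {"disease_A": ["disease_A.1", "disease_A.2", "disease_A.3"],
            "disease_B": ["disease_B.1", "disease_B.2"]}
# Hypothetical effects of each general code, inherited by all of its children.
EFFECTS = {"disease_A": ["pain_region_1", "pain_region_2"],
           "disease_B": ["pain_region_2"]}


def edges(use_specific_codes):
    """List cause -> effect edges, optionally expanding each cause into its child codes."""
    result = []
    for cause, effects in EFFECTS.items():
        causes = CHILDREN[cause] if use_specific_codes else [cause]
        result.extend((c, e) for c in causes for e in effects)
    return result


general, specific = edges(False), edges(True)
print(len(general), len(specific))  # 3 8
```

Both edge lists reach the same set of effects, so the extra edges in the specific-code version are the kind of redundancy the question describes.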

Paper ID:	6955