NIPS 2017
Mon Dec 4th through Sat the 9th, 2017 at Long Beach Convention Center
Paper ID: 1060 Reliable Decision Support using Counterfactual Models

### Reviewer 1

Comments:

- Formula (3). There appear to be misprints in the formula. For example, in the second term different arguments for $\lambda^*(t)$ should be used; in the third term $\lambda$ depends on $\theta$, yet $\theta$ are the parameters of the CGP and $\lambda$ in the second term does not depend on $\theta$. Is this a misprint?
- Figure 2. The figure is pale, and the axis labels are too small.
- Page 7, section "Model".
  1. How did the authors combine these two covariance functions?
  2. The covariance functions used depend only on time. How did the authors model the dependence of the outcome distribution on the history, actions, etc.?
  3. The authors introduce response functions for the first time in this section. Why are they needed? How are they related to model (2)?
- Page 8, Discussion section. I would say that the idea of combining MPP and GP is not new, in the sense that there have already been attempts to use similar models in practice; see e.g. https://stat.columbia.edu/~cunningham/pdf/CunninghamAISTATS2012.pdf

Conclusions:

- the topic of the paper is important
- the paper is very well written
- a new applied model is proposed for a particular class of applications
- although the idea is very simple, the experimental results clearly show the efficiency of the developed method
- thus, this paper may be of interest to the corresponding ML sub-community
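To make the question about combining covariance functions concrete: the two most common constructions are the sum and the elementwise product of two kernels, and both yield valid covariance functions. The sketch below is illustrative only. The kernel choices (squared exponential and Matérn 3/2), the hyperparameter values, and all function names are assumptions made for this example, not the construction used in the paper.

```python
import numpy as np

def rbf_kernel(t1, t2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance on time points (assumed kernel)."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def matern32_kernel(t1, t2, lengthscale=1.0, variance=1.0):
    """Matern 3/2 covariance on time points (assumed kernel)."""
    r = np.abs(t1[:, None] - t2[None, :]) / lengthscale
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def min_eig(k):
    """Smallest eigenvalue of a symmetric matrix, used as a PSD check."""
    return np.linalg.eigvalsh(k).min()

# Hypothetical observation times.
t = np.linspace(0.0, 5.0, 20)

k_a = rbf_kernel(t, t, lengthscale=2.0)       # slow trend
k_b = matern32_kernel(t, t, lengthscale=0.5)  # rougher short-term variation

k_sum = k_a + k_b    # additive combination: independent components added
k_prod = k_a * k_b   # multiplicative combination: elementwise product

print("min eigenvalue (sum):    ", min_eig(k_sum))
print("min eigenvalue (product):", min_eig(k_prod))
```

Stating which of these constructions (or another one) was used would resolve the ambiguity.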

### Reviewer 2

The paper presents a counterfactual model for continuous-time data building upon Gaussian processes. The literature review is extensive and thorough, and the experimental results are convincing. I have the following questions/comments:

1. To what extent does one have to use the counterfactual arguments in order to get the final Bayesian model, which under Assumptions 1 and 2 is sufficient for inference?
2. In (3), what is the model for $\lambda_\theta^*(s)$? Is it a parametric form? How is the integration done?
3. In all the examples, the observations are constrained to be non-negative. What is the likelihood for the observation model that allows this to be done? Or is this restriction ignored and a Gaussian likelihood used?
4. How is the mixture of three GPs done? Is it via variational inference or MCMC?
5. In lines 329 and 330, it would be clearer if expressions for the two alternatives were given. Is the first of these the so-called RGP?
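As context for question 2: for a temporal point process with intensity $\lambda(s)$ observed on $[0, T]$, the log-likelihood of the event times is $\sum_i \log \lambda(t_i) - \int_0^T \lambda(s)\,ds$, so the integral has to be evaluated either in closed form or numerically. The sketch below shows the numerical route with trapezoidal quadrature. The parametric intensity, its parameters, the event times, and the function names are all hypothetical and chosen only for illustration; this is not the authors' model.

```python
import numpy as np

def example_intensity(s, base_rate=0.5, amplitude=0.3, period=4.0):
    """Hypothetical non-negative parametric intensity (an assumption)."""
    return base_rate + amplitude * np.sin(2.0 * np.pi * s / period) ** 2

def point_process_loglik(event_times, horizon, intensity_fn, n_grid=2001):
    """Log-likelihood of event times on [0, horizon] for a given intensity.

    The integral of the intensity is approximated with the trapezoidal
    rule on a uniform grid of n_grid points.
    """
    grid = np.linspace(0.0, horizon, n_grid)
    vals = intensity_fn(grid)
    integral = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid))
    return np.sum(np.log(intensity_fn(event_times))) - integral

# Hypothetical event (action or measurement) times.
events = np.array([0.8, 2.1, 3.3, 5.9, 7.4])
loglik = point_process_loglik(events, horizon=8.0, intensity_fn=example_intensity)
print("log-likelihood:", loglik)
```

If the intensity in (3) has a form whose integral is available analytically, stating that explicitly would answer the question; otherwise the quadrature scheme and its accuracy should be reported.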