NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
I have read the author response and am keeping my scores as is. I didn't have any real concerns, so the response was mainly aimed toward other reviewers. The paper should probably get an oral presentation.
***
I think this is a good application of the ideas of inference networks (finding a good use case for the functional approximation of a difficult distribution of interest). It is highly impactful in experimental design, of course, and the way the functional approximation can be exploited in the context of nested MC in particular is not necessarily straightforward to spot. The paper is well written and communicated. It is very easy to follow and covers a vast breadth of material succinctly yet still thoroughly. With that said, it may be worth considering placing some concrete examples along the way. For example, early on in Sec 2 you could have followed your psychology motivation and shown what the design d looks like, the data y, the model p(y, \theta | d), and even what sampling y means intuitively in this case.
Reviewer 2
Quality: Is the submission technically sound? Are claims well supported by theoretical analysis or experimental results? Is this a complete piece of work or work in progress? Are the authors careful and honest about evaluating both the strengths and weaknesses of their work? Overall, the paper seems mathematically sound. Two of the four variational estimators introduced are already known in the context of approximating mutual information. Hence, results for these follow easily from the connection between mutual information and expected information gain and from the original paper on variational information maximization. The bound property of the third estimator and the convergence rate analysis are based on established techniques.
Clarity: Is the submission clearly written? Is it well organized? (If not, please make constructive suggestions for improving its clarity.) Does it adequately inform the reader? (Note: a superbly written paper provides enough information for an expert reader to reproduce its results.) The paper is well-written and self-contained. The problem is well motivated. Theoretical results are introduced on an intuitive level with the main ideas explained in the text and the mathematical details moved to the appendix.
Originality: Are the tasks or methods new? Is the work a novel combination of well-known techniques? Is it clear how this work differs from previous contributions? Is related work adequately cited? The work applies variational estimators of mutual information to optimal experimental design. This is a new take on the classical field of BOED and conceptually interesting. In a second step, the authors combine their ideas with standard nested MC estimation. This makes it possible to trade speed for accuracy and can be important for practitioners.
Significance: The key idea of combining variational inference and BOED is helpful in several ways. Most importantly, the approach has the potential to be included in probabilistic programming toolboxes, which would help to make BOED more accessible to practitioners without a statistical background. In addition, the variational approach may inspire future work on more complex problems for which traditional approximations such as Laplace are not well-suited.
Additional comments:
Section 3, variational marginal: The Barber-Agakov paper is stated as a reference. However, this paper seems to deal only with the lower bound, which corresponds to the variational posterior.
Eq. 9: The VNMC-estimator section could benefit from some additional high-level information. It is not clear from the exposition where the equation comes from or why one should consider replacing the usual expectation for computing EIG with the extended form.
Related Work: A recent publication (ref 31) deals with variational bounds on mutual information but is only mentioned in the appendix. Since both topic and time of publication are so close, this work should be discussed in the main text.
Table 1: A reader might wonder here about the baseline methods because they are introduced later in the text. I suggest adding a quick reference such as “Baselines explained in Section 5”.
App. A1 and A2: These proofs are unnecessary because the bounds are known. A reference would be sufficient. At the least, a remark indicating that the proofs are “provided for completeness” or similar would be appropriate.
App. A3: The proof refers to U_marg several times. Is this supposed to be U_VNMC?
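As a point of reference for the standard nested MC estimator that this review contrasts with the variational approach, the following is a minimal sketch on a toy linear-Gaussian model. The model (theta ~ N(0, 1), y | theta, d ~ N(d * theta, sigma^2)), the function names, and the sample sizes are all assumptions chosen for illustration and are not taken from the paper. The model is used only because its EIG has the closed form 0.5 * log(1 + d^2 / sigma^2), which gives something to compare against.

```python
import numpy as np


def log_normal_pdf(y, mean, sigma):
    """Log density of N(mean, sigma^2) evaluated at y."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2


def nmc_eig(d, sigma=1.0, n_outer=1000, n_inner=1000, seed=0):
    """Nested MC estimate of EIG(d) for the assumed toy model.

    Outer loop: sample (theta_n, y_n) from the joint p(theta) p(y | theta, d).
    Inner loop: estimate the marginal p(y_n | d) with fresh prior samples.
    """
    rng = np.random.default_rng(seed)

    # Outer samples from the joint distribution.
    theta = rng.standard_normal(n_outer)
    y = d * theta + sigma * rng.standard_normal(n_outer)
    log_lik = log_normal_pdf(y, d * theta, sigma)

    # Fresh inner prior samples for every outer sample: shape (n_outer, n_inner).
    theta_inner = rng.standard_normal((n_outer, n_inner))
    log_lik_inner = log_normal_pdf(y[:, None], d * theta_inner, sigma)

    # Numerically stable log of the inner average (log-mean-exp).
    row_max = log_lik_inner.max(axis=1, keepdims=True)
    log_marginal = row_max[:, 0] + np.log(np.mean(np.exp(log_lik_inner - row_max), axis=1))

    return float(np.mean(log_lik - log_marginal))


def exact_eig(d, sigma=1.0):
    """Closed-form EIG for the assumed linear-Gaussian toy model."""
    return 0.5 * np.log1p(d**2 / sigma**2)


if __name__ == "__main__":
    for design in (0.5, 1.0, 2.0):
        print(design, nmc_eig(design), exact_eig(design))
```

The cost is n_outer * n_inner likelihood evaluations per design, and the inner sample size controls the bias of the marginal estimate. This is the speed-versus-accuracy trade-off that an amortized approximation of the marginal or posterior is meant to ease.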
Reviewer 3
This paper provides four different estimators for expected information gain in a Bayesian optimal experimental design framework, with the objective of using amortized variational inference to reduce the computational cost. The main idea is to use variational approximations with shared parameters for either the posterior of the parameters of interest or the marginal of the outcome given the design. Furthermore, an importance sampling estimator with asymptotic consistency is proposed. Overall, the idea is interesting and its presentation is neat. A theoretical study of the convergence of the proposed estimators and an evaluation of their performance in practice are also provided. Some minor comments:
1. In the sequential setting, it seems that the implied assumption is independence of designs across different times, as the entropy of the prior for the parameters is assumed constant w.r.t. the design [line 169]. Does this hold in practice?
2. Regarding the performance results in Table 2, the lower bias of \mu_{m+l} compared to \mu_{post} seems counter-intuitive, as the former uses two variational approximations. Are parameters shared between q_m and q_l, so that biases cancel out?
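The posterior and marginal approximations mentioned in the summary above can be written out as the standard variational bounds on EIG. This is a sketch in assumed notation: q_p denotes a variational posterior (a symbol not used in the reviews), q_m a variational marginal, and the expectations are over the joint p(y, \theta | d). It restates well-known bounds and is not copied from the paper's equations.

```latex
\begin{align}
\mathrm{EIG}(d)
  &= \mathbb{E}_{p(y,\theta \mid d)}\!\left[\log \frac{p(\theta \mid y, d)}{p(\theta)}\right]
   = \mathbb{E}_{p(y,\theta \mid d)}\!\left[\log \frac{p(y \mid \theta, d)}{p(y \mid d)}\right], \\
\mathrm{EIG}(d)
  &\ge \mathbb{E}_{p(y,\theta \mid d)}\!\left[\log \frac{q_p(\theta \mid y, d)}{p(\theta)}\right]
  && \text{(variational posterior, lower bound)}, \\
\mathrm{EIG}(d)
  &\le \mathbb{E}_{p(y,\theta \mid d)}\!\left[\log \frac{p(y \mid \theta, d)}{q_m(y \mid d)}\right]
  && \text{(variational marginal, upper bound)}.
\end{align}
```

The gap in the lower bound is \mathbb{E}_{p(y \mid d)}[\mathrm{KL}(p(\theta \mid y, d) \,\|\, q_p(\theta \mid y, d))], and the gap in the upper bound is \mathrm{KL}(p(y \mid d) \,\|\, q_m(y \mid d)), so each bound is tight exactly when the corresponding approximation matches the true distribution.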