NIPS 2018
Sun Dec 2nd through Sat Dec 8th, 2018, at Palais des Congrès de Montréal
Paper ID: 2992 Variational Learning on Aggregate Outputs with Gaussian Processes

### Reviewer 1

In this paper, the authors propose a general framework for aggregated observation models using Gaussian processes, with variational methods for inference. They focus on the Poisson link function in the exponential family for count data and derive lower bounds for optimization using variational inference, a Taylor expansion, and Jensen's inequality. They apply the methods to synthetic data and the Malaria Atlas data to demonstrate performance. In this work, the authors aim to predict labels for individuals while only aggregated labels are available, a problem studied in Refs. [13, 14, 20]. Here the authors extend previous work to the exponential family and bring in Gaussian processes, which provide uncertainty estimates.

Overall, I think this is a good paper. The theorems and experiments are comprehensive and solid. The extension to the Poisson distribution can be useful for count data, and the use of Gaussian processes enables one to quantify uncertainty. The paper is well written. Below are some suggestions which I think can make the paper clearer.

1. For equation (7) in line 145, please define what $z$ is.
2. For equation (6) in line 144, please reformulate the equation. Currently, it contains $\prod_{i=1}^{N_a}$, but $i$ does not appear in the expression being multiplied.
3. In the variational inference, the authors choose the variational posterior $q(f) = \int p(f|u)q(u)du$, which shares the conditional distribution $p(f|u)$ with the prior definition. Though the authors state this in line 156, I would prefer that they discuss their choice of variational posterior $q(f)$ before presenting the lower bound in line 149.
4. In Section 4 (Experiments), the authors compare their method VBAgg with two other approaches, Nystrom and NN. Could the authors provide references for Nystrom and NN and explain more about the formulation of these two approaches?
5. In practice, how can one know the 'population' $p^a_i$ in line 113? Could the authors explain more about how they obtain $p^a_i$ in line 264? Could $p^a_i$ be treated as parameters to learn as well?

Note: I have read the feedback. Thanks.
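For concreteness, the aggregated observation model the review refers to couples individual-level rates to a single bag-level Poisson count, weighted by the populations $p^a_i$: the count for bag $a$ is modelled as $y^a \sim \mathrm{Poisson}\big(\sum_i p^a_i \exp(f(x^a_i))\big)$. The following is a minimal NumPy sketch of the resulting bag-level log-likelihood under my reading of the model; it is illustrative only (the function name, toy populations, and the Gaussian stand-in for a GP draw are all my own assumptions, not the authors' code).

```python
import numpy as np

def bag_log_likelihood(y_a, f_a, p_a):
    """Poisson log-likelihood of one aggregate count y^a, assuming
    y^a ~ Poisson(sum_i p^a_i * exp(f^a_i)), where f_a holds the latent
    function values at the bag's individuals and p_a their populations."""
    rate = np.sum(p_a * np.exp(f_a))
    # log p(y | rate) = y*log(rate) - rate - log(y!)
    return y_a * np.log(rate) - rate - np.sum(np.log(np.arange(1, y_a + 1)))

# Toy example (hypothetical values): one bag of 5 individuals with known
# populations p^a_i; f_a stands in for a draw of the GP at the x^a_i.
rng = np.random.default_rng(0)
p_a = np.array([10.0, 20.0, 5.0, 8.0, 12.0])
f_a = 0.1 * rng.normal(size=5)
y_a = 50  # observed aggregate count for the bag
print(bag_log_likelihood(y_a, f_a, p_a))
```

Note how only the sum enters the likelihood: the individual rates $p^a_i \exp(f^a_i)$ are never observed separately, which is exactly why disaggregation requires the prior over $f$, and why the reviewer's question about whether $p^a_i$ could itself be learned matters (the rate depends on $p^a_i$ and $f^a_i$ only through their product).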