NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 1711
Title: Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs

Reviewer 1

The paper proposes to use integral/summed observations within the Gaussian process framework. For regression, the outcome of this model is neat because all operators are linear. The paper is clear in its presentation, except for a few points listed below.

1. In terms of originality, the paper is a useful application of block kriging and Murray-Smith [35]. However, the paper currently lacks references to, and discussion of, these two works. In addition, eq. (4) is an instance of the linear model of coregionalisation, and references to this are missing.
2. The paper uses numerical approximation of integrals for integral observations (line 198). This is a practical and common approach. However, it would contribute significantly to the paper if the authors could propose an alternative, given that this paper is primarily about integral observations.
3. In the experiment, the observations are rates, but the paper uses a regression model. A warped GP model or a different likelihood would be more appropriate.
4. The experiment compares SAGP with GPR, 2-stage GP and SLFM. The relevance of comparing with SLFM is not clear to me. It would be better if the paper had a more thorough comparison with 2-stage GP instead, for example by examining the variances of the predictions, the significance of the learned hyper-parameters, and perhaps also the computational time complexity of the two methods.

Clarity points

1. Line 125: Is the domain "a set of cities" rather than just "a city"?
2. Line 302: It is not clear how the transfer learning across multiple cities is achieved. Are these just treated as different locations? In this case, the correlation between data from the two cities would be minute, unless there is a significant long-scale component in the (stationary) covariance function.

[35] Roderick Murray-Smith, Barak A. Pearlmutter: Transformations of Gaussian Process Priors. Deterministic and Statistical Methods in Machine Learning 2004: 110-123
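To make the point about numerical approximation of integral observations concrete, the following is a minimal sketch of how a region-to-region covariance can be approximated by a Riemann sum, in the spirit of block kriging. It is not the paper's implementation: the squared-exponential kernel, the rectangular regions, the uniform weighting, and the names `rbf`, `grid_points` and `region_cov` are all assumptions made for illustration.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential kernel between two 2-D points (assumed kernel)."""
    d2 = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return math.exp(-0.5 * d2 / lengthscale ** 2)


def grid_points(xmin, xmax, ymin, ymax, n):
    """Midpoints of an n-by-n grid covering a rectangular region."""
    hx = (xmax - xmin) / n
    hy = (ymax - ymin) / n
    return [(xmin + (i + 0.5) * hx, ymin + (j + 0.5) * hy)
            for i in range(n) for j in range(n)]


def region_cov(region_a, region_b, n=8, lengthscale=1.0):
    """Midpoint Riemann-sum approximation of the covariance between the
    area averages of a GP over two rectangles (xmin, xmax, ymin, ymax).

    Uniform weights are an assumption; a weighted aggregate would replace
    the plain average with a weighted one.
    """
    pa = grid_points(*region_a, n)
    pb = grid_points(*region_b, n)
    total = sum(rbf(p, q, lengthscale) for p in pa for q in pb)
    return total / (len(pa) * len(pb))


# Hypothetical regions: two adjacent unit squares and one further away.
near = region_cov((0, 1, 0, 1), (1, 2, 0, 1))
far = region_cov((0, 1, 0, 1), (3, 4, 0, 1))
print(near > far)
```

The cost grows with the product of the grid sizes of the two regions, which is one reason a closed-form or otherwise cheaper alternative, as requested in point 2, would be of interest.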

Reviewer 2

Originality: In order to deal with multivariate data, the paper proposes to use a particular case of the Linear Model of Coregionalization (LMC [1]). For dealing with spatially aggregated data, the paper proposes to use a Gaussian likelihood around the weighted aggregated value of the prediction across regions, resulting in an intractable marginal likelihood which is approximated by using Riemann sums. These are very straightforward ideas; nevertheless, they are effective.

Quality: The paper is technically correct and the experimental section is well done.

Clarity: The paper is well written and easy to follow for the most part. Only the section concerning the extension to multiple domains needs more details in my opinion. For instance, is normalization needed across domains?

Significance: It is a good paper and the experimental section shows the importance of choosing the right likelihood function for the data, but I find the contribution rather limited in scope.

[1] Álvarez, M. A., Rosasco, L. & Lawrence, N. D. Kernels for Vector-valued Functions: A Review (Now Publishers Incorporated, 2012).
[2] Ho Chung Law, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu, Variational learning on aggregate outputs with Gaussian processes, Advances in Neural Information Processing Systems 31 (NeurIPS 2018).
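For readers unfamiliar with the LMC special case referred to under Originality, the sketch below shows the covariance that results when each output is a linear combination of independent latent GPs. It is only an illustration: the one-dimensional inputs, the squared-exponential latent kernels, the weight matrix `W`, the length-scales `LS`, and the function names are hypothetical and are not taken from the paper.

```python
import math


def rbf(x, y, lengthscale):
    """Squared-exponential kernel on scalar inputs (assumed latent kernel)."""
    return math.exp(-0.5 * (x - y) ** 2 / lengthscale ** 2)


def lmc_cov(x, s, y, t, weights, lengthscales):
    """Cov[f_s(x), f_t(y)] = sum_l W[s][l] * W[t][l] * k_l(x, y),
    where output s is f_s(x) = sum_l W[s][l] * g_l(x) and the g_l are
    independent latent GPs with kernels k_l."""
    return sum(weights[s][l] * weights[t][l] * rbf(x, y, ls)
               for l, ls in enumerate(lengthscales))


# Hypothetical setup: two outputs, two latent GPs.
W = [[1.0, 0.5],
     [-0.3, 2.0]]
LS = [0.5, 2.0]

# Variance of output 0 and cross-covariance between outputs 0 and 1
# at the same input location.
var_00 = lmc_cov(0.0, 0, 0.0, 0, W, LS)
cov_01 = lmc_cov(0.0, 0, 0.0, 1, W, LS)
print(var_00, cov_01)
```

Outputs become correlated only through the shared latent processes, so a row of `W` with small entries for every latent GP yields an output that borrows little from the others.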

Reviewer 3

# [Updated after author feedback]

I thank the authors for their feedback. My suggestions for improvements were only sparingly addressed, but I will keep my score as it is. My request for updating Table 1 was perhaps unclear. I appreciate that you focus on just three features as they are important in socioeconomics, but I would like to see the same results for all datasets for both cities (thus ten columns for New York, three columns for Chicago). Only choosing a subset, even though it can be motivated from an application point of view, seems arbitrary and makes one suspect cherry-picking. This is, hopefully, an unfair suspicion, so why not include a large table with results from all features? Even if a feature is not interesting from an application point of view, it is still important for judging the performance of the model.

# Summary

The paper is concerned with modelling a multivariate function from multiple areal datasets at different granularities. The authors propose a model, based on Gaussian processes (GPs), that handles data defined as regions of the input space. The model initially follows the standard multivariate GP strategy by defining independent latent GPs, which are then linearly combined to form a multivariate dependent GP. To handle data at different granularities, observations are assumed to be area integrals of the multivariate GP. This allows the model to infer function values on a fine scale from coarsely sampled data. The model also naturally handles data from different domains by sharing the latent GPs across the domains. The proposed model is evaluated using a total of 13 datasets from two cities, each with varying granularity. A refinement task, estimating small-scale structure from large-scale, is considered in two different set-ups: refining data within a single city and refining data across cities by utilising the transfer learning capabilities of the model. The model shows performance improvements over both baselines and competing models.

# Quality

The paper appears technically sound. The sections deriving the model and inference are detailed, yet concise, and further information is provided in the supplementary. Related work is adequately cited, and the shortcomings of these methods are nicely outlined. The main difference between the proposed model and the semi-parametric latent factor model (SLFM) it builds upon is clearly described. I think a reference to Alvarez, Rosasco, & Lawrence, "Kernels for vector-valued functions: A review", Foundations and Trends in Machine Learning (2012) would, however, be appropriate to include. The experiments are interesting and convincing. Is there a reason why only some domains have been included in Tables 1 and 2? I would like to see similar results for all domains in the two cities and for both tasks. Right now, one can get the feeling that the results are cherry-picked. I think it would also be interesting to see if the performance gain saturates as more cities or domains are added, though this is probably better left for future work.

# Clarity

The paper reads quite well. The structure is clear and good and I found it surprisingly easy to follow the definition of the model despite the dense notation. Good job! Figure 2 very nicely explains the model. I found it very helpful.

# Originality

To my knowledge, the introduction of spatial aggregation of the input space is indeed novel, though perhaps not overly so. The idea might seem simple, but it is good and the resulting model elegant.

# Significance

The proposed model seamlessly shares information between regions of different granularities as well as between cities, which makes it both interesting and important, particularly for the geostatistical community. While focusing on a somewhat specific problem, it is a solid paper that I think would be welcomed by the NeurIPS community.