Paper ID: | 6770 |
---|---|
Title: | Distribution Learning of a Random Spatial Field with a Location-Unaware Mobile Sensor |

The setting is interesting and the potential contribution could be significant. In terms of clarity and presentation, the authors have done a very good job, in my opinion. However, although this has the potential to be a good paper, at this point I think there are some issues. In particular:

1. In (4), N has to tend to infinity, which may not be practical in many applications of the type considered in the paper. One would be more interested in a finite-sample guarantee for the proposed estimator, which would also be implementable. The limiting operation in (4) seems restrictive, and thus the practical merit of the work is limited.

2. The second main result, Theorem 2, is quite strange. In (5), with respect to what is the maximization involving the pdf on the right-hand side? I guess x. However, if this is true, then what happens later in (16)? Something seems not right here, because in (16) one could take the infimum of the right-hand side and get a better bound, yet you do not seem to do so. Please explain in sufficient detail what is happening. Also, (20) does not seem right to me, because the first term on the right-hand side depends on s, and this dependence seems to have been ignored when taking the supremum over s on both sides of the preceding expression.

3. The experiments do not really match the setting studied in the analysis. For example, in lines 155-157, the application does not really justify the theoretical analysis. If N counts days, then N cannot be very large, right? For how many days are you going to measure the acoustic field X anyway? In particular, I see that N=43, which is rather small to be supported by theoretical results in which N tends to infinity.

Therefore, based on the above, this paper might not be of sufficient quality for acceptance. However, I am willing to change my opinion as long as the responses by the authors are satisfactory, especially regarding Theorem 2, which is an important part of the claimed contributions.

===========After Author Response==========

I have also read the author response. The fact that the authors state that it might be possible to study the problem in the finite-sample regime makes me think that the paper could be strengthened (with a new version) in order to support the particular application stated in the title. However, I still believe that the results are interesting. For now, I will increase my score to 6.
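To make the finite-sample concern in point 1 concrete, here is a rough sketch of what such a guarantee looks like in the simplest textbook case. It uses the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which assumes i.i.d. samples; that assumption is made here only for illustration and may not hold in the paper's setting. The function name `dkw_epsilon` and the confidence level 0.05 are hypothetical choices, not taken from the paper.

```python
import math

def dkw_epsilon(n, delta):
    """Half-width eps such that, for n i.i.d. samples,
    P(sup_x |F_n(x) - F(x)| > eps) <= delta  (DKW inequality).
    Assumes i.i.d. samples; this is an illustrative assumption only."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

if __name__ == "__main__":
    # N = 43 is the number of trials mentioned in the review.
    print(round(dkw_epsilon(43, 0.05), 4))
```

Under these assumptions, N=43 at 95% confidence gives a uniform error band of about 0.21 on the CDF, which indicates how loose a guarantee at this sample size could be even in the i.i.d. case.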

The paper deals with distribution learning for a spatial field using mobile but location-unaware sensors. As far as I understand, the paper resorts to the standard empirical cumulative distribution function as a tool for learning. The paper seems technically sound and clearly written. The empirical cumulative distribution function is a usual estimate of the distribution, and the paper proposes a specific study of its properties in the considered context. To this end, the inter-sample intervals are modeled as a renewal process. It would be interesting to explain why this model is realistic. I must add that I am not able to check the proofs because they are outside the scope of my research. Regarding the experimental study of Section 5, it is correctly described, but the analysis and comments in lines 210-211 must be developed.
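The setup described above can be sketched in a few lines. This is a minimal simulation under stated assumptions, not the authors' estimator: the field is a hypothetical static triangle wave on a unit-length closed path, the inter-sample gaps are i.i.d. uniform (an arbitrary choice of renewal process), and the error is measured as the sup-norm distance to the CDF of the field value at a uniformly random location. All function names are illustrative.

```python
import numpy as np

def field(s):
    """Hypothetical Lipschitz field on a closed path s in [0, 1).
    For a uniform location, field(s) is uniform on [0, 1], so F(x) = x."""
    return 1.0 - np.abs(2.0 * s - 1.0)

def renewal_locations(mean_gap, rng):
    """Sampling locations as cumulative sums of i.i.d. positive gaps,
    stopped when the end of the path is reached.
    Gaps are uniform on (0, 2 * mean_gap): an assumption for illustration."""
    locs = []
    s = rng.uniform(0.0, 2.0 * mean_gap)
    while s < 1.0:
        locs.append(s)
        s += rng.uniform(0.0, 2.0 * mean_gap)
    return np.array(locs)

def ecdf(samples, x):
    """Empirical CDF: fraction of samples <= x, for each entry of x."""
    sorted_samples = np.sort(samples)
    return np.searchsorted(sorted_samples, x, side="right") / len(sorted_samples)

def sup_error(mean_gap, seed=0):
    """Sup-norm gap between the ECDF of the field samples and F(x) = x."""
    rng = np.random.default_rng(seed)
    # The sensor keeps only the field values; the locations are discarded.
    values = field(renewal_locations(mean_gap, rng))
    grid = np.linspace(0.0, 1.0, 1001)
    return float(np.max(np.abs(ecdf(values, grid) - grid)))

if __name__ == "__main__":
    for gap in (0.05, 0.005, 0.0005):
        print(gap, sup_error(gap))
```

Running it for decreasing `mean_gap` gives a feel for how the estimate behaves as the number of samples per pass grows, which is the kind of evidence that would help justify the renewal-process model.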

Update: I have read the author response.

- I acknowledge that the topic is within the general scope of NeurIPS (especially under the signal processing/time series area). I've increased my score from 3 to 4 accordingly, and updated my review to reflect this.
- Regarding the metric for measuring accuracy, I meant to suggest that it would be useful to be more precise earlier in the paper about what metric is being used, since there are many ways to measure the error between two distributions (\ell_1, \ell_2, TV, KL, ...).
- In the experiments, consider reporting the error metric used in the Theorems as well as the upper bounds, in addition to plotting the CDFs.
- I agree that C^2 is encompassed by your assumptions, but I would also expect that stronger results may be achievable if one makes stronger assumptions (such as C^2), and in some applications it may be very reasonable to make a stronger assumption. It's fair to say that this is for future work. My point is that having a concrete motivating application scenario (e.g., monitoring air quality indoors) would make it easier to justify the modeling assumptions.

---

This paper mathematically formulates the problem of estimating a spatial field using a location-unaware mobile sensor, and it proposes an algorithm for distribution learning in this setting. The problem formulation, algorithm, and analysis of the algorithm appear to be novel and original.

The main weakness of this paper, and the reason my overall score is "clear reject", is the lack of strong motivation and justification for the problem formulation and assumptions. When is it useful to have an estimate of the distribution of values along a path (i.e., what is a concrete motivating application)? For that application, what is an appropriate metric for measuring the accuracy of a method? Why is it reasonable to assume only that the field is Lipschitz? Why not something stronger (e.g., C^2)? Why is it reasonable to assume that the observations are not corrupted by any noise (so that the only randomness is due to uncertainty about the position)?

In addition, some aspects of the problem formulation were not clear to me upon reaching the end of Sec 2.

- What is the goal? (Distribution learning, but no specific metric or way of measuring performance was described, nor were baselines or fundamental performance limits discussed.)
- It also wasn't clear that a "path" really means a closed loop, with the starting and ending point known and always the same.

The example signal simulated in Sec 4 isn't time-varying. Is that intentional? There is some mismatch between the title (which would imply that the aim is to estimate a random field) and the actual setting of the paper (the field is deterministic, the sampling locations are random).
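As a small illustration of why the choice of metric should be stated explicitly, the sketch below computes the four candidates named above (\ell_1, \ell_2, TV, KL) for two discrete distributions. The two probability vectors are made-up examples, not data from the paper, and the helper name `distances` is hypothetical. The KL term assumes q is positive wherever p is positive.

```python
import numpy as np

def distances(p, q):
    """Several common discrepancies between two discrete distributions
    given as probability vectors over the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    diff = p - q
    l1 = np.sum(np.abs(diff))
    l2 = np.sqrt(np.sum(diff ** 2))
    tv = 0.5 * l1  # total variation equals half the l1 distance
    mask = p > 0   # terms with p = 0 contribute 0 to KL(p || q)
    kl = np.sum(p[mask] * np.log(p[mask] / q[mask]))
    return {"l1": float(l1), "l2": float(l2), "tv": float(tv), "kl": float(kl)}

if __name__ == "__main__":
    # Made-up example distributions, for illustration only.
    print(distances([0.5, 0.5], [0.25, 0.75]))
```

The same pair of distributions yields four different numbers, so a bound or an experimental error is only interpretable once the metric is fixed.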