NeurIPS 2020

Optimal Learning from Verified Training Data

Review 1

Summary and Contributions: This paper considers a problem of linear regression where some of your inputs are strategically corrupted. In particular, the algorithm is given some vector data x, which it uses to approximate y by w.x for some learned value w. However, an adversary can corrupt the value of x to x^ in an attempt to make the predicted value closer to z. Given a training set of triples (x,y,z), the objective is to learn a w that does as well as possible despite these corruptions. Formally, the adversary sends the algorithm the x^ that minimizes |w.x^ - z|^2 + gamma*|x - x^|^2. In particular, the adversary tries to pull the prediction w.x^ toward z without moving x^ too far from x. The algorithm tries to pick a w so that the average value of |w.x^ - y|^2 is minimized.
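For a fixed w, the adversary's problem described above is a strictly convex quadratic in x^, so it has a closed-form minimizer: setting the gradient to zero and applying the Sherman-Morrison identity gives x^ = x + (z - w.x)/(gamma + |w|^2) * w. A minimal sketch of this best response (variable names are illustrative, not taken from the paper):

```python
import numpy as np

def best_response(w, x, z, gamma):
    """Adversary's optimal corruption x^ of the point x, for a fixed w.

    Minimizes (w @ xh - z)**2 + gamma * ||x - xh||**2 over xh.
    Closed form via Sherman-Morrison: xh = x + (z - w.x)/(gamma + |w|^2) * w.
    """
    return x + (z - w @ x) / (gamma + w @ w) * w

# Sanity check on a random instance: the closed form should make the
# gradient of the adversary's objective vanish.
rng = np.random.default_rng(0)
w, x = rng.normal(size=3), rng.normal(size=3)
z, gamma = 1.5, 0.7
obj = lambda xh: (w @ xh - z) ** 2 + gamma * np.sum((x - xh) ** 2)
xh = best_response(w, x, z, gamma)
```

Note that the corruption moves x only along the direction w, and vanishes as gamma grows, matching the intuition that a high manipulation cost leaves the data nearly untouched.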

Strengths: This is a natural problem to consider and the solution involves some clever manipulations and optimization algorithms.

Weaknesses: The results are very specific to the particular model, which although natural, seems narrow in potential application.

Correctness: The results appear to be correct.

Clarity: The paper is reasonably easy to understand.

Relation to Prior Work: It is not clear from the paper how their model differs from previously considered ones.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper describes a Stackelberg prediction model for offline least squares regression in the case where data comes from multiple data providers, each of which has a goal for the learner's outcome on its provided data. The authors reformulate the considered problem as a fractional programming problem, and then provide a fast algorithm for solving it based on the results of Dinkelbach. The paper then gives an empirical evaluation on toy data.
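The Dinkelbach-style scheme the summary refers to replaces a fractional objective f(x)/g(x) (with g > 0) by a sequence of parametric subproblems min_x f(x) - lambda*g(x), updating lambda to the current ratio until the subproblem's optimal value reaches zero. A minimal sketch on a toy scalar instance (the objective and all names below are illustrative, not the paper's actual reformulation):

```python
# Dinkelbach's method on a toy fractional program:
#   minimize f(x)/g(x) = (x^2 + 1) / (x + 2)  over x >= 0.
# Each subproblem  min_x  f(x) - lam*g(x) = x^2 + 1 - lam*(x + 2)
# is solved in closed form: stationary at x = lam/2, clipped to x >= 0.

def dinkelbach(x0=0.0, tol=1e-12, max_iter=100):
    x = x0
    lam = (x * x + 1.0) / (x + 2.0)          # initial ratio
    for _ in range(max_iter):
        x = max(lam / 2.0, 0.0)              # solve the parametric subproblem
        gap = (x * x + 1.0) - lam * (x + 2.0)  # subproblem optimal value
        lam = (x * x + 1.0) / (x + 2.0)      # update the ratio
        if abs(gap) < tol:                   # gap -> 0 certifies optimality
            break
    return x, lam

x_star, val_star = dinkelbach()
```

For this toy instance the iteration converges in a handful of steps to x* = sqrt(5) - 2 with optimal ratio 2*sqrt(5) - 4; the same fixed-point logic underlies fractional-programming solvers generally, though the paper's subproblems are of course different.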

Strengths: I am not an expert in non-convex optimization/SPGs. Therefore I cannot comment on the strength of this paper with respect to its significance/novelty or technical contributions.

Weaknesses: Motivation of setting: In the proposed model, the learner tries to learn a linear predictor that does well on the manipulated data, while the adversary manipulates points with the goal of having the learner minimize an objective on the *unmanipulated data*. What is the motivation for this mismatch? (E.g. why would the learner ever evaluate on both the manipulated and unmanipulated data in practice?)

Correctness: Which line in Figure 1 corresponds to the previous work of Bruckner et al? I am guessing it is the "interior point" line --- this seems suspiciously bad. Is this a strong baseline / was the baseline correctly executed with proper hyper-parameter choice etc? Is there any way to contextualize the MSE in Figure 1? Currently it is unclear how to relate the MSE to the solved problem in a meaningful way.

Clarity: The paper is well written and was relatively easy to follow, especially as an outsider in the field. Grammar: lines 89 and 93-94, and the end of the Figure 1 caption.

Relation to Prior Work: The authors give a detailed related work section that made me feel like I had a good overview of the considered problem and related approaches.

Reproducibility: Yes

Additional Feedback: Should the \hat{X} in the first term of the constraint in (1) be X? (Otherwise (1) is not the same optimization problem as the non-vectorized version in lines 115-116). ----- Post response feedback: The response addressed my concerns over the setup and the experimental results, but after reviewing the other reviewer responses in combination with the response I still am not sure of the significance of the results.

Review 3

Summary and Contributions: In this paper, the authors study whether global optimality can be achieved for a special non-convex problem. The problem concerns a self-interested agent who can manipulate the data sampling process, but only at a cost, so that it cannot modify the data arbitrarily. The authors show that in the case of a linear, MSE-evaluated learner, this non-convex problem can be solved efficiently to global optimality.

Strengths: 1. A significant step towards solving a class of practical and impactful non-convex problems. 2. Remarkable consistency between the theory and the experiments. 3. Motivation towards a provably globally optimal solution for a class of problems in non-uniform data sampling scenarios.

Weaknesses: The experiments are promising, but the paper would be stronger with more of them. Also, it seems to me that the analysis cannot be directly extended to more complicated cases beyond linear regression. I wonder whether the authors could revisit the analysis and arrive at a relaxed, more general conclusion with fewer assumptions on the value function y = f(x).

Correctness: As far as I can tell, the results are correct.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: Based on the discussion with the other reviewers and the rebuttal, I don't feel I have understood every part of the paper. I tend to give a score of 7 with low confidence (2).