Review for NeurIPS paper: From Predictions to Decisions: Using Lookahead Regularization

NeurIPS 2020

Review 1

Summary and Contributions: The paper proposes an interesting method named lookahead regularization, aimed at optimizing both the accuracy of predictions and the quality of the actions/decisions derived from them. The main contribution is therefore the possibility of combining predictions with the effects of the human decisions derived from these predictions. The approach is expressed as a learning objective with a lookahead regularization term, which makes use of an uncertainty model that is also learned. This term penalizes models for which decisions do not improve the outcome at a sufficiently large rate \tau, where improvement means moving the outcome in the positive direction. Learning the uncertainty model is framed as a problem of learning under covariate shift, using importance weighting through an approximate model based on logistic regression. The training method alternates between optimizing the desired function f, the uncertainty model g and the logistic regression model h. The proposed method relies on three assumptions: 1) users make decisions in the direction of the function f learned by the model; 2) those decisions have an impact on the distribution of the observed features, which can also be observed; and 3) the conditional distribution of outcomes, p(y | x’), is invariant for any marginal distribution p(x’), i.e., no unobserved confounders are considered, similarly to some of the previous literature. The authors experiment with both synthetic and real-world datasets. The reported results suggest that, on the data considered in these experiments, the proposed method has the potential to get closer to a ground-truth model as decisions are made using the proposed approach.
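The lookahead term summarized above can be made concrete with a small sketch. Everything beyond the review's description is an assumption made for illustration, not the paper's formulation: a linear predictor f(x) = w·x + b, a single masked gradient-ascent step as the user decision model, a caller-supplied uncertainty half-width standing in for the learned model g, and a hinge form for the penalty that asks the post-decision lower bound to exceed the observed outcome by at least \tau. All function names are hypothetical.

```python
# Hypothetical sketch of a lookahead-regularized objective (not the paper's code).


def predict(x, w, b):
    """Linear predictor f(x) = w . x + b (an assumed model class)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def lookahead_points(X, w, eta, mask):
    """Apply one masked gradient-ascent step x' = x + eta * (mask * grad f(x)).

    For the linear f above, grad f(x) = w at every x. mask[j] is 1 for a
    mutable feature and 0 for an immutable one (assumed masking).
    """
    step = [eta * m * wj for m, wj in zip(mask, w)]
    return [[xj + sj for xj, sj in zip(x, step)] for x in X]


def lookahead_penalty(X, y, w, b, eta, mask, tau, half_width):
    """Assumed hinge penalty: after the step, the lower bound
    f(x') - half_width(x') should exceed the observed y by at least tau."""
    X_new = lookahead_points(X, w, eta, mask)
    total = 0.0
    for x_new, y_i in zip(X_new, y):
        lower = predict(x_new, w, b) - half_width(x_new)
        total += max(0.0, y_i + tau - lower)
    return total / len(X)


def regularized_objective(X, y, w, b, eta, mask, tau, lam, half_width):
    """Mean squared prediction error plus lam times the lookahead penalty."""
    mse = sum((predict(x, w, b) - y_i) ** 2 for x, y_i in zip(X, y)) / len(X)
    return mse + lam * lookahead_penalty(X, y, w, b, eta, mask, tau, half_width)
```

For example, with X = [[0.0, 1.0], [1.0, 0.0]], y = [1.0, 2.0], w = [1.0, 1.0], b = 0.0, eta = 0.5, mask = [1, 0], tau = 0.1 and a constant half-width of 0.2, only the second point is penalized, giving a penalty of about 0.4 and an objective of about 0.9 (MSE 0.5 plus 0.4). The hinge is chosen here only as a simple surrogate; the paper's actual regularizer may take a different form.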

Strengths: I would like to highlight the following strengths of the paper:
- The paper tackles an important problem for the use of machine learning models in decision making, in which the input features of the model are not static but may change according to actions induced by the model. This is highly relevant for the part of the machine learning community interested in applying models to decision making. The novelty lies in combining prediction and actions under the same learning objective.
- The empirical evaluation is sound in the sense that the method seems to follow the ground-truth direction as actions are executed based on the model and the results of these actions are taken into account through the lookahead regularization term.

Weaknesses: The main weaknesses I see in this paper are:
- Although technically sound and interesting, the assumption that p(y | x’) does not change is a very strong one, and it seems to contradict the purpose of the framework, which is to encourage better decisions. As decisions improve, the target y’ will change in a positive direction, and p(y | x’) will therefore likely change as well. Ignoring this may make the problem of feedback loops even worse, since the shift from p(x) to p’(x) is modeled but any change in p(y | x’) is not. In predictive policing, for example, the actions may lead to more arrests in particular neighborhoods (Ensign et al., 2018), i.e., a shift to p’(x), while p(y | x’) is assumed fixed even though it may actually be changing because of these actions. I think the concrete effect of this assumption should be discussed more thoroughly in the paper, by quantifying it theoretically or experimentally.
- Although the experiments on the real-world datasets are run with different values of the step-size parameter, the paper does not explain how this parameter should be chosen in practice. This is important for applying the method to other datasets/problems.
- There is no mention of the computational complexity of the proposed method. What is the cost of the extra regularization term and of the alternating procedure proposed to solve this problem?
Reference: Ensign, D., Friedler, S. A., Neville, S., Scheidegger, C., & Venkatasubramanian, S. (2018, January). Runaway feedback loops in predictive policing. In Conference on Fairness, Accountability and Transparency (pp. 160-171).

Correctness: The method appears to be correctly formulated under the stated assumptions. The empirical methodology considers both synthetic and real-world datasets. It would be interesting to see a concrete example, e.g., a case study, that shows how effectively the method encourages better decisions, as claimed at the beginning of the paper.

Clarity: The paper is very well-written, easy to follow and has no major errors or typos.

Relation to Prior Work: I am not fully familiar with the literature on strategic classification and causal modelling, so I cannot say with certainty whether the review of it is complete. However, the description of the literature seems sound and connects naturally with the aim of the paper.

Reproducibility: Yes

Additional Feedback: I would suggest the following:
- Quantify and discuss more thoroughly the effect of the assumption that p(y | x’) is invariant, which is very relevant in a range of application areas.
- Clarify how the step-size parameter could be chosen in practice.
- Provide information on the computational complexity of the proposed objective and of the alternating solution procedure.
- Consider adding theoretical analyses to accompany the paper, for example upper bounds on actions or similar.
----- Response to rebuttal -----
I am happy with the authors’ explanations, and would suggest that the authors add their ideas on the assumptions around covariate shift as future work, along with a discussion of the computational complexity, somewhere in the paper.

Review 2

Summary and Contributions: The authors propose a new paradigm for learning models that are not only good at the prediction task, but also useful in telling users which features to change to obtain favorable outcomes.

Strengths: This is a great problem setup, relevant to many real-world contexts where it is important to have recourse, with a clean formulation, and a well-written paper with nice figures.

Weaknesses: The weaknesses are few and far between. I think the choice of evaluation metrics could have been discussed a bit more, and the choice among uncertainty quantification methods assessed a bit more. I'm not so impressed by the choice of datasets, but you have to start somewhere.

Correctness: Everything appears to be correct.

Clarity: Yes

Relation to Prior Work: Yes, the paper explains how it relates to past work and what is new and different. I think all the important references are captured and explained. One newer reference that the authors may want to discuss: https://arxiv.org/abs/2002.06673

Reproducibility: Yes

Additional Feedback: Generally good work.

Review 3

Summary and Contributions: This paper presents lookahead regularization, which essentially boils down to the inclusion of a regularization term in the objective, in the context of a framework that casts the problem of balancing prediction and decision quality as one of model selection. The regularization term includes a separate uncertainty model that estimates the uncertainty in the predictive model for out-of-distribution data points.

Strengths: The paper is well-presented. The masking framework is an interesting approach that I haven't seen before. The methods appear solid, and the problem the authors try to solve is an important one. The paper seems relevant to the broader NeurIPS community.

Weaknesses: Assumption 2 is quite strong. I know you can't verify this assumption in practice, but it would at least be interesting to see how sensitive the presented gains are to violations of it; otherwise, the method seems likely to be quite fragile. If not in this version, this should be addressed in a follow-up. The example in the second paragraph of the Introduction is not as compelling as it could be; perhaps use an example such as the unintuitive effect of asthma in pneumonia risk prediction, as in Caruana '15. Moreover, the proposed methodology bears many similarities to Hardt's recent work on performative prediction and strategic classification. In light of those works, I'm not sure this paper represents a novel leap forward, particularly insofar as the proposed methodology relies on strong assumptions.

Correctness: The claims and method appear to be correct. The experiments are well-motivated, although I am not sure about their connection to real-world settings. There may also be some concern about the distribution shift induced by actions.

Clarity: The labels for the panels in Figure 1 need to be corrected. While attractively presented, the figures have text that I think is a bit too small. There should be an 'a' inserted in front of 'masking operator' on line 112. Otherwise, there seem to be no major typos or grammatical errors.

Relation to Prior Work: The literature review seems sufficient, but it omits Hardt's very recent work on performative prediction (https://arxiv.org/pdf/2002.06673.pdf). I think there is a lot of overlap between the methods of these papers. I also wonder whether there is some connection to online learning that may have been missed, although I am not an expert in that particular field.

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: This paper introduces a framework for learning an accurate predictive model that promotes good actions. The idea is that making decisions based on a predictive model can lead to a shift in the data distribution. The authors propose to learn an accurate predictive model whose induced distribution shift leads to better outcomes in the future, by adding to the objective function a lookahead regularization term based on the uncertainty of the prediction after the user makes a decision. The new regularization encourages actions that improve outcomes. This setup is quite new to me, but it seems to be an important problem in certain applications such as medicine. The proposed approach requires an explicit model of how users use the model to take actions, and the authors assume that users' decisions follow a gradient-ascent type of algorithm based on the predictive function (which we aim to learn), changing the mutable part of the features in order to improve the outcome. They further make the strong assumption that the relationship between the features and the outcome variable is invariant as decisions are made and the feature distribution changes. As the authors acknowledge, this is too simplistic. To optimize the objective function, the authors need to learn three elements: the predictive model that maps feature vectors to outputs, the uncertainty model that measures the uncertainty of the model's prediction under the resulting data distribution shift, and a propensity function that allows the predictive model to be used on the shifted data at test time. The authors assume specific parametric forms for each of these functions and perform an alternating optimization process to learn them. In their empirical study, the authors assume they have access to the true predictive function f* and build the evaluation on it. This is quite unrealistic, and I am not sure how much it biases the results.
After rebuttal: Thanks to the authors' responses, I have increased my score.
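Review 1 describes the covariate-shift step as importance weighting through a logistic-regression model, which corresponds to the propensity function mentioned above. A standard way to obtain such weights is the classifier-based density-ratio trick, sketched here under assumptions: a logistic regression is trained to tell original points (label 0) from shifted points (label 1), and its odds, rescaled by the sample sizes, estimate p’(x)/p(x). Only this one piece is shown; the alternating updates of f, g and h are omitted, and all function names and hyperparameters are hypothetical rather than taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_domain_classifier(X_src, X_shift, lr=0.1, epochs=500):
    """Hypothetical propensity model h: logistic regression that separates
    original points (label 0) from shifted points (label 1), trained with
    full-batch gradient descent on the mean log-loss."""
    data = [(x, 0.0) for x in X_src] + [(x, 1.0) for x in X_shift]
    d, n = len(data[0][0]), len(data)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, label in data:
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - label
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def importance_weights(X, w, b, n_src, n_shift):
    """Estimate p_shift(x) / p_src(x) at each x from the classifier odds,
    rescaled by n_src / n_shift to undo the class-prior ratio."""
    weights = []
    for x in X:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        weights.append((n_src / n_shift) * p / (1.0 - p))
    return weights
```

In such a scheme the weights would multiply the per-example loss of the uncertainty model on the original data, so that points resembling the post-decision distribution count more. When the two samples are identical, the gradient is zero at initialization and every weight comes out as 1 (up to rounding).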

Strengths:
- Addresses a very important and difficult problem
- Novel problem and solution (to the best of my knowledge)
- Difficult empirical evaluation

Weaknesses:
- Strong simplifying assumptions
- Unconvincing empirical evaluation

Correctness: The claims and method seem correct. I am not sure about the setup of the empirical study, where the authors needed access to the ground-truth model f* and had to define it. There are also other aspects of the framework that change from one dataset to another. I recommend defining the experimental setup first and then using a consistent setup for all datasets. As I said, the empirical study for this problem is not easy.

Clarity: It is well written, but the details are difficult to comprehend.

Relation to Prior Work: I am not familiar with the related work so cannot comment on whether the related work section is comprehensive.

Reproducibility: No

Additional Feedback: