NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:1749
Title:Robust Multi-agent Counterfactual Prediction

Reviewer 1

# Summary The paper presents a framework for calculating bounds on the value of counterfactual multi-agent games using only logged data of actions that occurred in a factual game. This problem arises in a number of mechanism design contexts, where intervening on a system constitutes changing the rules of the game. Calculating counterfactual value requires reasoning about how rule changes affect equilibrium behavior of the agents. Under strong assumptions this counterfactual value is point-identified, but these assumptions are often implausible. The authors present a scheme for relaxing these assumptions, and characterizing the set of values that are compatible with the observed data under this relaxation. The relaxation of point-identification assumptions is presented in terms of a second game, which the authors call the Revelation Game. When the standard identifying assumptions hold, the equilibrium of this game recovers agent types as well as counterfactual actions. The authors relax the standard identifying assumptions by relaxing the equilibrium conditions of this game by some constant \epsilon, which corresponds to relaxing the equilibrium conditions of the observed game and/or the counterfactual game by \epsilon. The authors then discuss methods for seeing out the extreme values of the value function attained among the (type, counterfaction action) pairs within this \epsilon-equilibrium set. In general, this problem is NP-hard, but the authors show that the Fictitious Play heuristic can be extended to the revelation game to obtain locally extreme values in cases where the fictitious play iteration converges. Finally, the authors demonstrate computation of the RMAC bounds in some simple examples. # Feedback Overall, this paper was a pleasure to read. The paper is very clearly written, and as somebody from an adjacent focus (causal inference), the paper did a great job of explaining the necessary game-theoretic concepts to get me up to speed on its own. The problem is well-motivated, and the approach is compelling, although I think the scope should be narrowed a bit. The experiments/demos could have been a bit more compelling, but I chalk some of this up to space constraints. Comments on broader points that could improve the paper follow. ## The Revelation Game The one part of the paper where I thought the exposition fell a little short was the description of the Revelation Game. After my second read through the paper, I think I was able to get what was going on, but the first time through it was a little bit difficult to parse. I think the presentation of the game could be better motivated. For example, later on line 164, you seem to indicate that relaxations of the game correspond to relaxations of the factual and counterfactual games, respectively. Could you use this to motivate the revelation game loss function directly? I think it would also make the interpretation of \epsilon clearer from the get-go. As the paper is now written, I left this section with two unanswered questions: What do each of the regret expressions represent? Why are they aggregated using a max, rather than, say, a weighted average? I think these questions are related to a larger question that I have, which is, why is relaxing the revelation game the right relaxation? I personally feel like it’s reasonable, but it seems like there are many ways to parameterize sets of (types, counterfactual actions) that shrink to a point when the standard assumptions hold. ## Which assumptions are being relaxed? The authors indicate (e.g., line 166) that relaxing the revelation game corresponds to relaxing assumptions 1-4, but this statement is vague. While technically true that the conjunction of these assumptions is being relaxed, it seems that \epsilon is only relevant for the equilibrium Assumption 1 and (potentially) the specification Assumption 4. For example, if Assumption 2 were violated, one could still characterize the set of (types, counterfactual actions) that are equally supported by the observed data without having any notion of an \epsilon-BNE for the revelation game (this is essentially the argument that is made in the last paragraph of the school choice example). Likewise, if Assumption 3 were violated, one could enumerate the set of equilibria in the counterfactual game without reference to the revelation game. I personally think that it would be wise to claim that the epsilon-RMAC method is really only relaxing Assumptions 1 and 4. Actually, I think it would probably make sense to remove Assumption 4 because I’m not sure that the \epsilon-BNE would even contain the equilibria implied by the correct reward functions (misspecification can be really bad). On the other hand, it’s clear how the \epsilon relaxation relates to the equilibrium assumption. As far as the identification assumptions go, I think that it’s fine to say that the RMAC bounds are _compatible_ with violations of Assumptions 2 and 3. This is closer to the statement that’s made in the contributions section of the introduction. ## Interpretation of \epsilon It would be useful to have more explicit discussion of the interpretation of \epsilon. Can you give concrete examples of how \epsilon related directly to relaxations of the standard assumptions? There are some hints along this vein (paragraph starting on line 164, the mention of “misoptimization/misspecification” on line 252), but no formal discussion. In particular, how do we know what reasonable values of \epsilon are? Are there empirical quantities that we can anchor on? Is it possible to separately reason about each source of fragility (misoptimization vs misspecification)? Again, the auctions example touches on this, but to me interpreting epsilon requires a much deeper discussion. I ask these questions drawing parallels to the literature on sensitivity analysis in causal inference that explores deviations from the unconfoundedness/ignorability/backdoor criterion assumption in observational studies. Here, one of the standard tasks is to tie the sensitivity parameters (which play the same role as epsilon here) to observable quantities such as variance in treatment assignment or outcome explained by observed variables. No need to cite these, but some examples of this type of argument appear in: Cinelli and Hazlett: Franks et al: Imbens: To this end, it might be useful to see an example where you actually instantiate a population of faulty agents, to show how their “faults” in optimization or specification translate to an \epsilon, and to show that the RMAC bounds include the value associated with the game played with that population. This would help clarify the mapping from \epsilon to concrete relaxations of the standard assumptions, and would be closer to an “end to end” test of the method in something resembling a real application. ## Computing Equilibria Could you be more explicit about what the gap is between the NP-hard exact solution to the problem and the RFP solution? I am gathering that there are cases where the RFP may not converge, and where the locally V-optimal solution may not be globally V-optimal. It might be useful to pull these together so the reader can understand quickly what is lost by moving to the first-order RFP solution. ## Experiments I like the auctions experiment a lot. I think I would be more interested in seeing this example treated in more depth, rather than seeing the school choice experiment (which would be fine in the appendix). Per my discussion above about which assumptions are being relaxed, I think the auctions experiment actually explores the implications of having nonzero epsilon, whereas the school choice experiment is largely concerned with assumptions that don’t require the revelation game / epsilon-BNE formalism at all (you’re only relaxing the identification assumption, so you could get all of your results with epsilon = 0). To this end, I think it would make more sense to give a more thorough treatment of the auctions example (e.g., by adding the concrete “faulty agent” simulation I suggested above), and to show that RMAC is compatible with violations of Assumption 2 by moving the example from the appendix into the main text. ---------------------------- # Post-Rebuttal and Discussion Edit: Following discussion with the other reviewers, I'd like to reiterate that I think the exposition of the revelation game really does need to be clarified. The other reviewers convinced me that the writing and notation in that section was a bit too convoluted and vague, especially for the section that presents the key contribution of the paper. They convinced me to temper my recommendation a bit, but I still think this paper is an Accept. I also stand by my assessment that the school choice example probably doesn't belong in the main text because the lack of identification there doesn't really fit into the epsilon-BNE framework. So I do think that it would still make sense to have an example where you show that a population of sub-optimal agents would in fact induce values contained in the RMAC bounds.

Reviewer 2

Update: Based on the authors' response, I am not convinced that my main concerns will be addressed adequately. I will, therefore, maintain my current recommendation. Originality: As far as I know the approach proposed is innovative –– it uses ideas from the partial identification literature, but adapts them to a new setting in a novel way. Quality / Clarity: the paper has a number of issues concerning rigor and clarity. The authors try to explain concepts in words instead of drowning the reader in mathematical formalism –– this is generally a good thing. Unfortunately, this paper falls into the other excess: some important objects are either undefined or defined ambiguously. - What is the mathematical structure of $\mathcal{D}$? What does an element of $\mathcal{D}$ look like? Does it contain information on a single game, or multiple games? Are the types observed and part of $\mathcal{D}$? - On line 132 you have an object $d_i$ that, as far as I can see, is undefined. You call it an action but up to that point actions where denoted $a_i$. You use $d_i$ again when writing down $Regret^{\mathcal{G}}_j$, as if it were an action. - As defined in Definition 1, a utility function takes $N+1$ arguments: N actions and a type (the type coming last). When you use utilities to define the regret, however, the first argument is an action, the second is a type and the third argument, \mathcal{D}_j$ is... well, not defined formally anywhere. Please clarify. - You sort of define it in the text but I that the concept of $\epsilon$-BNE deserved a proper definition environment. - I also found the revelation game somewhat unhelpful for the exposition. I is not clear to me how it connects to problem at hand. Significance: I think that the objective of the paper – relaxing excessively strong assumptions usually made in the literature – is an important one, and any contribution in that direction is welcome.

Reviewer 3

UPDATE: I thank the authors for their comments. I decided to update the score upwards as I expect that all issues pointed out by the reviewers can be addressed in revision. This papers deals with the counterfactual prediction problem in settings with multi agent systems. The problem is that if we change the rules of the game, then the system may try to adapt to the new rules. Estimating the effects of this adaptation requires taking into account the incentive structure of the game. Overall, I find the paper interesting and the applications useful. The paper is well written and the main ideas are clearly presented. At the same time, I have some concerns: 1) There is no adequate comparison with existing ideas and methods in the literature. For example, it is not explained how this method differs practically from [1] and [2]. In [1] the authors discuss the inference problem and propose methods to quantify the uncertainty of the policy change, similar to the confidence bands derived by the authors in this paper. In [2] the setting is also very similar to the setting in the current paper: how to estimate the long-term effects of the policy change on outcomes (e.g., auction revenue). The general approach in [2] also looks very similar to what is proposed in Section 4 (and summarized in Fig 4). Both papers also give consistency results under some assumptions. I understand that the goal in this paper is different, namely, to address robustness to rationality assumptions. However, the end result in all papers is similar, that is, confidence bands on the effects of the policy change. It would help then to understand what are the comparative benefits and limitations of each method. 2) Perhaps with the exception of Theorem 3, there appears not to be much technical contribution in the paper. Not necessarily a crucial problem, but it would be nice to know what new techniques/methods may be more broadly interesting. Minor comment: The sentence is broken in Page 4. [1] "Counterfactual reasoning and prediction in learning systems: The example of computational advertising", Bottou et al. [2] "Long-term causal effects via behavioral game theory", Toulis, Parkes