NeurIPS 2020

### Review 1

Summary and Contributions: The paper introduces a multiagent network formation algorithm based on game-theoretic principles and inverse reinforcement learning to model the appearance, retainment and disappearance of network structures and reconstruct the payoffs. The algorithm shows some robustness which appears to improve beyond more generic algorithms applied in the domain.

Strengths: The paper's model is adapted to the specific domain and therefore takes stronger account of the specificities of the network game model. The model is based on a Markovian treatment of the sequence of networks, on evaluating the "networked" preferences and matching the modeled preference dynamics to the actually observed one.

Weaknesses: While I am not a domain expert, I regularly read game-theoretic and network dynamics papers, yet I found the present paper difficult to read. I suspect that the paper probably tries to convey so much material that it is affected by the space limitation at NeurIPS. It's not necessarily badly written, but it is simply too dense and takes to little time to develop ideas and motivation. I believe that the paper would better be placed in a journal with less space limitation and where the authors can expand properly on context and ideas; I do not think that NeurIPS is the best place to communicate this. I also didn't find the choice of network examples in Figure 2 particularly convincing or transparent. These are two very small networks and it was not clear to me what the rationale of the choice of these networks is and what insight these examples transport. I cannot say that the appendix helped clarifying this. That being said, maybe this paper needs really a deep domain expert for review, but as the conference has a more general audience, I believe the paper should be sufficiently accessible also to a generally informed scientific public. # Edit after rebuttal/discussion I haven't changed my opinion.

Correctness: As far as I could judge, the paper looks correct. On line 643, appendix B, a bracket is missing in the numerator.

Clarity: As mentioned under weaknesses, the paper could profit from a significant expansion, motivation and development to be more accessible. It is very densely written and important decisions are not always motivated. Just as one prominent example: the reward function on line 278 is not motivated, just mentioned and referred to in literature. What is the circle with the center point? Component-wise multiplication? With respect to such things the paper needs to be self-contained. Note that this is not even explained in the supplementary material (but it should be explained in the main text, anyway). This looks more like material for a journal paper condensed into a NeurIPS submission.

Relation to Prior Work: Looks fine.

Reproducibility: No

Additional Feedback: - line 2: non-inanimate - sounds overwrought, why not just "animate"? If there is a reason, explain. - line 53: sentence grammar: "have proposed Markov version" - article missing - line 98: where did the A disappear to in P_T(S)? - line 111: the notation does not seem consistent. G = (V,A,X) is an element of G, but in the previous line, G has been defined as a set symmetric zero-diagonal matrices (or else undirected edge sets). Which of these? If they are equivalent, explain. - line 113: what is the feature set going to model? - line 116: A_0 = 0 means no connections? - line 121: what is d referring to? The neighbours that a vertex can see? - line 164: space should be after not before colon - line 164: the very last seems to have been a \pi, but lost the backslash. - line 165: "to perceive noise injected" -> "to perceive the noise-injected" - line 194: what's k? Number of iteration steps? - line 215: there is too much information packed in this line. What is the message doing, what is it transporting, what is U evaluating, etc. - line 222: "Gaussian policy" - what space does this live in. The actions, I thought were creation, maintenance and deletion of links. - line 230: how are the actions of the two agents combined? I found the whole process of how agents' action matrix operates difficult to parse. - Figure 2: (a) you talk about darker nodes, but all I see are different colours. Different colours cannot be easily compared with respect to their brightness. What does (b) tell us? What are the little circles in (c) indicating and how should we read/interpret the result? - Figure 1: why does it come after Figure 2? - I had to spend time looking for it. - line 299: this statement seems to be important. Are you saying that the advantage of your method is due to the learned policy and that is more effective than learned objectives? If so, and I understood this correctly, this needs to be a prominent tagline of your work and your arguments more built up to make this point. - line 329: what do you keep fixed when you transfer the learned reward function? - Table 2 (a): What numbers is the first column showing? Percentages?

### Review 2

Summary and Contributions: This work seeks to estimate payoffs and policies that jointly form a Markov perfect equilibrium for sequential network formation games, with the aim of predicting the future evolution of real-world networks (of both the same class, and through transfer learning, of different classes). The main contribution is a learning framework using graph neural networks (allowing for abstraction away from the details of number of agents and exact network structure) for both the policy and reward models. This model is evaluated on both synthetic and real-world data.

Strengths: This work considers an important area. There is a lot to like about its approach: Learning payoff functions directly from the data is preferable to specifying an ad hoc model (as the paper convincingly argues). The use of GNNs to learn the policies and payoff functions is an intriguing direction.

Weaknesses: There are a number of errors in the preliminaries section, both minor and substantive; see "correctness" for details. I found the paper difficult to understand in many places; see "clarity" for details. Overall, I think the project is worthwhile and seems to have merit; however, I cannot recommend that this paper be accepted in its current form. This paper packs a lot into eight pages, but it's not always clear that it has demonstrated its claims. "Interpretability" appears to be a central claimed example, but it's not clear what is meant by that in the context of neural models. One aspect of the empirical evaluation that I am concerned by is the dependence on the final form of the network. In training, actual edges are added to the network is essentially random order (since the real order is not available); this makes it hard to validate the learned model.

Correctness: I am not convinced by some of the decisions made for the empirical evaluation (the need to infer connection order described in "weaknesses" primarily). Other parts of the evaluation simply don't make sense to me; they may be sensible, but I don't understand how the claims follow from the presented evaluation (see the third item in "clarity"). Beyond the notation issues described in "clarity", the preliminaries sections contained several a number of definitions that were either errors or needed further clarification: * p.3 l.108: The expression for the expected discounted utility starting from time t is off by a constant factor (namely, \gamma^{t-1}) * p.4 l.162: V^*_i(s) as defined is simply the constant value 1 at all states s; this is surely not what you meant. Also the sum ranges over a', which is never used within the sum; I assume a_i is meant to be a'. * p.4 l.166: "In the quantal response framework, agent i is assumed to perceive noise injected payoff version of this expression a \hat{u}_i(...) = u_i(...) + \epsilon_(s,a)...": This is true but not clearly related by the text to the form of the policy described above. The reason that agents behave stochastically is precisely because they are deterministically best-responding to a stochastically perturbed utility. * p.4 l.186: subtracting RL(r) seems to minimize entropy and/or max attainable reward for a given reward function, which seems curious for a max-entropy approach. Can you give some intuition about what is going on here?

Clarity: The paper is not as clear as it needs to be. Critical aspects of the architecture and datasets are described in a way that feels rushed and incomplete. I did not feel that enough detail was provided for me to thoroughly evaluate the paper's claims. I list four major clarity issues below, followed by a number of notation issues and other minor problems. * p.5 l.236-242: The description of the reward function is extremely brief and confusing. Are we learning a separate reward function for each agent? If so, how does transfer work? The text seems to suggest that reward is a function of state only, but I don't understand how. * Similarly, the policy function conditions on player-specific parameters \phi_i; how could this work for transfer when the test set consists of different agents? * p.7 l.312-325: I found this paragraph on the relationship between learned payoffs and Katz centrality very confusing; I don't understand the checks being performed here or what they are in aid of. * s.4: I'd like to hear a lot more about what these datasets actually measure. E.g., the bank dataset purports to measure "confiding relationships"; what are those and how would you ever measure them? Notation issues / minor issues - p.3 l.97: The notation for distributions is inconsistent and potentially confusing; e.g., the use of P_T as a function of state and action profile, as a set of distributions over states, and as a function of state alone, is confusing. I suggest using explicit notation such as $\Delta(X)$ (the set of all distributions over discrete set $X$) where appropriate. - Why is u_i in bold? - It would be better to choose different letters for adjacency matrix \mathbf{A} and actions sets A_i rather than distinguishing them by typeface alone. - p.3 l.131 "we do not impose any specific functional form on the reward function", and p.3 l.135: "we outline the parameterization of the reward function in the next section": These two statements are not consistent with each other. I think you mean that you do not make strong structural assumptions about the reward function, and instead use a very flexible GNN functional form for the reward function. - p.5 l.221: What is \bar{o}_{i,t}? Why is a quantity with a bar (representing profile of something) being indexed by user? - Table 2 needs much better captions. (a) is labeled only as "transfer performance", but the dataset and prediction tasks are not described. Even though some of this information is given in the text, a table or figure plus its caption should ideally make sense standing alone.

Relation to Prior Work: A good survey of related work is included as part of the introduction.

Reproducibility: No

### Review 3

Summary and Contributions: Propose MINE, a framework to learn network formation games as a way to predict or explain network structure. The approach assumes a structured form of network emergence game, and uses a form of multiagent iRL to induce payoff data fitting a QRE model. The goal is to jointly discover agents’ strategy profiles (network emergence policy) and the latent payoff mechanism in the form of learned reward function, and then use this to account for observed structures or make predictions.

Strengths: Elaborate and conceptually well-founded approach for combining data and strategic modeling for network modeling. Thorough experimental analysis covering a range of examples and issues.

Weaknesses: Complex methods combining a plethora of techniques. Difficult to unpack the underlying assumptions and contributions of different elements of the model.

Clarity: Generally readable, albeit quite dense.

Relation to Prior Work: A lot of good description of related work, especially threads built on. Sweeping statement about lack of prior consideration of game-theoretic equilibrium in structure learning (lines 31-32) is not accurate. See for example prior work by Honorio & Ortiz (JMLR 2015), Irfan & Ortiz (various), and Duong et al. (several papers on graphical multiagent models). Perhaps also work on learning graphical game models.

Reproducibility: No

Additional Feedback: Experimental evaluation seems to be generally well-considered and creative. For instance, the piece with swapping a leader and follower in the karate network and changing the KC value of nodes without changing their feature importance values in the network were good examples of MINE’s learning capabilities (not just learning structure, but deeper interactions). In the experiments they report fraction of links recovered, but could MINE’s learning come up with links that don’t exist in the real network? And if so, at what rate does this happen? Depending on the application, this could be important in addition to (or even more so) than just the fraction of links recovered that are accurate. Misc Notes • line 110, not sure why it is {0,1}^nx(n-1) not {0,1}^nxn • line 118, should the definition of η include “for all j in V” or something? • It seems like the paper defines the η function as the neighborhood of I (ex: line 118), but also uses N(i) to denote neighborhood (ex: line 216) • line 239, what is o with a bar on top? In section two, it defines the bar as meaning “for all agents”, but o_i appears to mean the observation for just agent i • line 256, “trajectories trajectory” • line 260, paypoff --- Post-rebuttal Our review was positive before the rebuttal and the author response did not change that.

### Review 4

Summary and Contributions: Network emergence in games is often considered to as a function of the state of the network without exploiting the the sequential decision making of individual actors in the game. The authors propose a framework, MINE, that leverages the existence of Markov perfect equilibrium in multi-agent games with intuitions from IRL. The authors experiment on different datasets to evaluate on three author defined metrics quality, transferability and interpretability. The authors claim that the proposed architecture performs well.

Strengths: - The approach is novel to understand network emergence in multi-agent games. - The approach is grounded with existing literature on inverse RL. - The proposed architecture helps in learning on the toyish tasks.

Weaknesses: - The writing can be improved to better convey the results of the experiments. - The proposed method is evaluated on certain dimensions (quality, transferability, interpretability) but in the strategic link prediction task the proposed model does not perform better than the baselines. - It is not clear if the baselines fall short in the qualitative metrics (transferability, quality and interpretability), as MINE is not compared against the baselines. - The description of the proposed approach can be improved for reproducibility.

Correctness: - The intuition and theoretical intuitions of the proposed appears valid to the best of my knowledge.

Clarity: - Line 17 sentence structure - Line 246 “the the” - Line 256 -trajectory-

Relation to Prior Work: - The authors cite the relevant work and also compare against them in an experiment.

Reproducibility: No