Review for NeurIPS paper: Learning Retrospective Knowledge with Reverse Reinforcement Learning

NeurIPS 2020


Meta Review

This paper proposes Reverse Generalized Value Functions (RGVFs) to model the influence of past events on the current state. In addition to a theoretical analysis of this novel concept, experiments on anomaly detection and representation learning illustrate its potential benefits. All reviewers appreciated the clear presentation of the core idea, which may lead to further applications beyond those in this submission. That said, there were a few concerns about the significance of the empirical results, which I would like to amplify, as I believe this is a significant weakness of the paper:

1: For anomaly detection, there is no comparison at all. I realize that "anomaly detection in RL" is not a popular research field, but couldn't one apply standard anomaly detection algorithms (or even straightforward heuristics) to quantities collected by the agent? For instance, for the drone "policy anomaly" example, simply collecting the frequency of each action over N timesteps and checking regularly whether it changes should be enough to detect that something is going wrong. Similarly, for the "reward anomaly", it might be enough to keep track of the accumulated reward every N timesteps. Alternatively, one could monitor the error of the critic (which predicts how much reward the agent should receive in the future). Also, as mentioned by several reviewers, although the anomaly change points can easily be seen in the plots of Fig. 3, it remains unclear how to turn the proposed method into a reliable anomaly detector (How is the threshold set? How is the large variance of the anomaly detection score handled?).

2: For representation learning, the only comparisons are against IMPALA (the main baseline), IMPALA+RewardPrediction (useful to check, but it clearly performs very poorly), and IMPALA+PixelControl (a relevant baseline, but based on a four-year-old technique). Moreover, these experiments are performed on only 10 Atari games, while it is well known that results vary considerably across the 57 games of the full benchmark. There is no comparison to more recent representation learning techniques for RL, nor any mention of them in the "Related Work" section. The raw scores are not provided (in the main text or the Appendix), which makes it impossible to compare to other work. Finally, a comparison to IMPALA+GVF should definitely have been included.

At least, the authors acknowledge that the goal of their work is not really to improve on existing techniques: "our empirical study is just a proof of concept to show the utility of Reverse GVFs in representation learning and does not aim for a new state of the art in designing auxiliary tasks" (this comment concerns the representation learning section, but I consider it to apply to the anomaly detection section as well). However, the current experiments are clearly not convincing enough to show that the novel concept of RGVFs is actually useful in practice (either to do something better or to enable new applications).

None of the three reviewers participated in the post-rebuttal discussion. Since they all considered the paper worth accepting despite its lack of convincing empirical results, I will follow their original recommendation. However, I encourage the authors to reflect on the issues mentioned above and to consider adding more experimental results, references, and discussion to address some of them in the camera-ready version.
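As a rough illustration of the simple baseline suggested in point 1 for the drone "policy anomaly" example, the Python sketch below compares the action distribution in each window of N timesteps with that of the first window and flags windows whose total variation distance exceeds a fixed threshold. The function names, the use of total variation distance, the choice of the first window as the reference, and the threshold value are all hypothetical choices made for this sketch; none of them comes from the paper. In practice the threshold would still need tuning, which is the same open question raised about Fig. 3.

```python
from collections import Counter
from typing import List, Sequence


def action_frequencies(actions: Sequence[int], num_actions: int) -> List[float]:
    """Empirical frequency of each discrete action within one window."""
    counts = Counter(actions)
    n = len(actions)
    return [counts.get(a, 0) / n for a in range(num_actions)]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def flag_policy_change(actions: Sequence[int], num_actions: int,
                       window: int, threshold: float) -> List[int]:
    """Hypothetical baseline: return indices of non-overlapping windows whose
    action distribution differs from the first (reference) window by more
    than `threshold` in total variation distance."""
    # Split the action stream into full, non-overlapping windows of N steps;
    # a trailing partial window is ignored.
    windows = [actions[i:i + window]
               for i in range(0, len(actions) - window + 1, window)]
    if not windows:
        return []
    reference = action_frequencies(windows[0], num_actions)
    flagged = []
    for idx, w in enumerate(windows[1:], start=1):
        dist = total_variation(reference, action_frequencies(w, num_actions))
        if dist > threshold:
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    # Policy alternates between actions 0 and 1, then switches to action 2.
    stream = [0, 1] * 50 + [2] * 100
    print(flag_policy_change(stream, num_actions=3, window=20, threshold=0.3))
```

With 100 steps of alternating actions 0 and 1 followed by 100 steps of action 2 and windows of 20 steps, only the last five windows (indices 5 to 9) are flagged. The same windowed comparison could be applied to the accumulated reward per window for the "reward anomaly" case.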