NeurIPS 2020

Information-theoretic Task Selection for Meta-Reinforcement Learning


Meta Review

This paper was quite controversial among the four reviewers, leading to more than 10 pages of discussion (longer than the paper itself!). In the end, two reviewers were advocating for acceptance (R1, R3), one was advocating for rejection (R2), and one was leaning towards rejection (R4). [Note that R4 did not update their score/review, but participated in the discussion.]

The main pros of the paper are:
+ The paper studies a new direction: selecting tasks in meta-RL. This direction has not been studied before and will likely become quite relevant in settings where the task distribution is heterogeneous.
+ The experimental results suggest that the algorithm performs very well on a large number of simple domains when combined with MAML and RL^2.
+ The experiments also include an ablation study.
+ Time complexity is not an issue; the reviewers appreciated the author response here.

These are the main reasons that R1 and R3 were advocating for acceptance. I agree that these are strong points, and they make me want to accept the paper. However, there are also a number of weaknesses. The overall execution of the paper is lacking:
- There is no theoretical motivation, making it difficult to understand the intuition behind why this method should work. In principle, this should not be a deal-breaker on its own. However, see the next point.
- There is limited empirical analysis, which makes it difficult to understand *why* the algorithm works well.
- The combination of these two points means that it may be difficult for the reader to draw useful lessons or take-aways from the method or experiments in the paper.
- The quality of the writing could be significantly improved.
- The experiments section does not sufficiently describe the experimental set-up.

See the updated reviews for details on each of these points. The reviewers came up with very specific recommendations for how to improve the paper along these lines. Because the reviewers were unable to come to a consensus, I also took a look through the paper. I have some concerns about the quality of the writing in the methods section and left some detailed comments below. On balance, I think that the ideas in this paper are worthy of publication, but I urge the authors to improve the writing and analysis in the paper based on the reviewers' detailed feedback and the feedback below.

------------------------------
Feedback on the Writing
------------------------------

Methods section
- \mathcal{F} is referred to as both the validation tasks and the test tasks. Which is it? In most places, it is referred to as the validation tasks. In that case, is there a separate set of test tasks used for evaluation? If these tasks are actually used in the evaluation, then they should be called the test tasks.
- On line 127, it says that n states are sampled uniformly from the tasks in F. What policy collects these states? Or do you assume access to a uniform state distribution that you can sample from? The latter is a strong assumption.
- The algorithm boxes are generally quite disconnected from the equations in the main text. The paper would be easier to follow if the algorithm boxes directly referenced equations in the main text (e.g., if Alg 2 line 9 referenced the equation in line 141).
- The overall description of the algorithm is terse: the description in the main text is less than a page long. It would be helpful to go into more detail to clearly convey the method.
- The algorithm boxes are helpful, but including comments in the pseudocode would make them a lot easier to understand, especially given the large amount of non-standard notation.
- Minor: The notation for policy entropy is confusing. H_\pi(s) is terminology often used to refer to the marginal state entropy rather than the policy entropy (see the note below this list).
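To make that last point concrete, here is how the two quantities are conventionally written (standard definitions, not taken from the paper; the authors may intend something different by H_\pi(s)):

H(\pi(\cdot \mid s)) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s)   (policy entropy at a given state s)

H(d^{\pi}) = -\sum_{s} d^{\pi}(s) \log d^{\pi}(s)   (marginal entropy of the state-visitation distribution d^{\pi} induced by the policy)

Using visibly distinct symbols for these two quantities would avoid the ambiguity.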
Experiments section (more minor comments)
- “We set out to demonstrate its effectiveness experimentally”: if this is actually what the authors did, then this is poor science. The goal of the experiments should be to experimentally test hypotheses and report the results, not to set out to show good results.
- Text in the plots is tiny and hard to read.