NeurIPS 2020

Learning to summarize with human feedback


Meta Review

This paper presents an empirical study on learning summarization models from human feedback. The authors use RL (PPO) to train an abstractive summarization model from human judgments, starting from an MLE-based supervised model. The thorough experiments produce strong results in both the large-scale and cross-domain settings. Cons: as the authors acknowledge, the paper offers little technical novelty over prior work. Pros: the paper obtains strong empirical results at large data and model scale, which is likely to set a new standard for the task of summarization, and the human feedback dataset collected during the experiments is likely to be very useful to other researchers in this area. Although all reviewers acknowledge the merit of this empirical paper, they complain about the long appendix and the frequent references to it in the main paper. We hope the authors will take this into consideration and move the most important results from the appendix into the main paper. Based on these considerations, our recommendation for this paper is acceptance.
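
To make the training pipeline under discussion concrete, below is a minimal, self-contained sketch of the general idea: a policy initialized from a supervised model is optimized against a learned reward with a KL penalty that keeps it close to the supervised model. This is not the authors' implementation; the reward function, vocabulary, and model are toy placeholders, and a simple REINFORCE-style update stands in for PPO.

# Minimal sketch of reward-driven fine-tuning with a KL penalty toward a
# supervised reference policy. NOT the paper's code: toy reward, toy model,
# and a REINFORCE-style update instead of PPO.
import torch
import torch.nn as nn

VOCAB, MAX_LEN, BETA = 100, 8, 0.1  # toy vocabulary size / summary length / KL weight

class ToyPolicy(nn.Module):
    """Tiny stand-in 'summarizer': per-position logits over a toy vocabulary."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(MAX_LEN, VOCAB))
    def forward(self):
        return self.logits  # shape (MAX_LEN, VOCAB)

def toy_reward(tokens):
    """Placeholder for a reward model trained on human comparisons."""
    return (tokens < 10).float().mean()  # pretend low token ids are 'good'

supervised = ToyPolicy()               # frozen MLE-trained reference model
policy = ToyPolicy()                   # policy initialized from the supervised model
policy.load_state_dict(supervised.state_dict())
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    dist = torch.distributions.Categorical(logits=policy())
    sample = dist.sample()             # sampled "summary" tokens
    log_prob = dist.log_prob(sample).sum()

    with torch.no_grad():
        ref_dist = torch.distributions.Categorical(logits=supervised())
        # Per-sample KL estimate penalizes drifting away from the supervised model.
        kl = (dist.log_prob(sample) - ref_dist.log_prob(sample)).sum()
        reward = toy_reward(sample) - BETA * kl

    loss = -reward * log_prob          # REINFORCE surrogate objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()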