Review for NeurIPS paper: Consequences of Misaligned AI

NeurIPS 2020

Meta Review

The paper describes a theoretical setting in which optimizing an incompletely specified proxy for the human's ground-truth utility can perform arbitrarily poorly, and proves theorems that give conditions under which this arbitrarily poor performance occurs. The paper also discusses several ways to mitigate this concern. The reviews were generally positive, but reviewers raised a few concerns. The first is the applicability of the results to current real-world settings. The second is that the conditions and results are not stated transparently. The third is more philosophical. I think the paper presents interesting results and formalizes an important issue in a novel way. In the rebuttal, the authors describe ways to address the second concern, which seem reasonable and feasible. I would like to see more theory papers like this one, which spark interesting discussion by formalizing accepted lore and pave the way for future work. I therefore think the technical contributions outweigh the first and third concerns.