NeurIPS 2020

Dynamic Regret of Policy Optimization in Non-Stationary Environments


Meta Review

Reviewers generally agreed that this paper makes a good theoretical contribution towards learning in non-stationary environments, despite some shortcomings such as the generality of the non-stationary MDP assumption and the lack of experiments.