Dynamic Regret of Policy Optimization in Non-Stationary Environments
Meta Review
Reviewers generally agreed that this paper makes a good theoretical contribution towards learning in non-stationary environments, depite some shortcomings such as generality of non-stationary MDP assumption and lack of experiments.