NeurIPS 2020

Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition


Meta Review

The paper presents a model-free algorithm with an improved regret bound for finite-state, finite-horizon MDPs. The new bound closes the gap with the best model-based result. This is a nice theoretical contribution.