NeurIPS 2020

### Meta Review

The paper presents a model-free algorithm with an improved regret bound for finite-state, finite-horizon MDPs. The new bound closes the gap with the best model-based result. This is a nice theoretical contribution.