Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
All reviewers agree on the contribution of the work: it establishes theoretical conditions for the existence and uniqueness of the Nash equilibrium (NE) in MGFs with unknown rewards and dynamics, proposes a soft Q-learning-based algorithm, and provides a convergence analysis. Reviewer #2 had some concerns regarding the difference between the stationary and non-stationary settings, the significance of that difference, and the missing discussion of prior related work. Most of these concerns were addressed in the rebuttal and the reviewer discussion. We have therefore decided to accept the paper. Please incorporate the reviewers' comments when preparing the camera-ready version.