The paper provides a new tool (the theory of switching systems) for the convergence analysis of RL algorithms, which may be of interest to the wider RL theory community. Compared with existing results, it makes several improvements. The authors should revise the paper to address the reviewers' comments. In particular, prior work in this area needs to be discussed more carefully, as the reviewers pointed out.