Ralf Schoknecht, Artur Merke
Convergence for iterative reinforcement learning algorithms like TD(O) depends on the sampling strategy for the transitions. How(cid:173) ever, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields suffi(cid:173) cient conditions of convergence for combinations of reinforcement learning algorithms and linear function approximation. This allows to analyse if a certain reinforcement learning algorithm and a cer(cid:173) tain function approximator are compatible. For the combination of the residual gradient algorithm with grid-based linear interpolation we show that there exists a universal constant learning rate such that the iteration converges independently of the concrete transi(cid:173) tion data.