This work addresses three problems with reinforcement learning and adaptive neuro-control: 1. Non-Markovian interfaces between learner and environment. 2. On-line learning based on system realization. 3. Vector-valued adaptive critics. An algorithm is described which is based on system realization and on two interacting fully recurrent continually running networks which may learn in parallel. Problems with parallel learning are attacked by 'adaptive randomness'. It is also described how interacting model/controller systems can be combined with vector-valued 'adaptive critics' (previous critics have been scalar).
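The contrast between scalar and vector-valued critics can be illustrated with a minimal tabular TD(0) sketch in which the value estimate and TD error are vectors (one component per reward dimension) rather than scalars. This is an illustrative assumption for exposition only; the state space, reward dimensions, and update rule below are hypothetical and are not the paper's construction.

```python
import numpy as np

# Hypothetical setup: 5 discrete states, 3-component reward vectors.
N_STATES = 5
REWARD_DIM = 3
ALPHA, GAMMA = 0.1, 0.9

# Vector-valued value table: each state maps to a REWARD_DIM vector,
# whereas a classical (scalar) critic would store one number per state.
V = np.zeros((N_STATES, REWARD_DIM))

def td_update(s, r_vec, s_next):
    """One TD(0) step; the usual scalar TD error becomes a vector of errors."""
    delta = r_vec + GAMMA * V[s_next] - V[s]  # component-wise TD error
    V[s] += ALPHA * delta
    return delta

# Toy interaction loop: random transitions, vector reward emitted in state 0.
rng = np.random.default_rng(0)
for _ in range(2000):
    s = int(rng.integers(N_STATES))
    s_next = int(rng.integers(N_STATES))
    r_vec = np.ones(REWARD_DIM) * (s == 0)
    td_update(s, r_vec, s_next)
```

After training, `V[0]` holds a vector of predicted discounted returns, one per reward component, which is the structural difference from a scalar critic.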