Part of Advances in Neural Information Processing Systems 5 (NIPS 1992)
In this paper, we discuss on-line estimation strategies that model the optimal value function of a typical optimal control problem. We present a general strategy that uses local corridor solutions obtained via dynamic programming to provide local optimal control sequences as training data for a neural architecture model of the optimal value function.
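The strategy above can be illustrated with a minimal sketch. Every specific here is an assumption, not the paper's method. We assume a scalar system $x_{k+1} = x_k + u_k\,\Delta t$, a stage cost $(x^2 + u^2)\,\Delta t$, and a terminal cost $x^2$. We also read "corridor" as a bounded band of grid states around a nominal point. The hypothetical function `corridor_dp` runs backward dynamic programming, $V_k(x) = \min_u \left[(x^2 + u^2)\,\Delta t + V_{k+1}(x + u\,\Delta t)\right]$, over that band and returns (time, state, value) training triples. The hypothetical `fit_quadratic` fits $V_k(x) \approx a\,x^2$ by least squares as a simple stand-in for the neural value-function model.

```python
# Hypothetical sketch (assumed system, costs, and "corridor" meaning):
# backward dynamic programming on a local band of grid states, producing
# (time, state, value) triples that a value-function model could be fit to.

def corridor_dp(center, half_width, n_states, controls, horizon, dt=0.1):
    """Backward DP for x_{k+1} = x_k + u*dt with stage cost (x^2 + u^2)*dt
    and terminal cost x^2, restricted to a 1-D grid around `center`."""
    step = 2 * half_width / (n_states - 1)
    grid = [center - half_width + i * step for i in range(n_states)]

    def nearest(x):
        # Snap a successor state to the nearest grid index, clamped to the band.
        i = round((x - grid[0]) / step)
        return min(max(i, 0), n_states - 1)

    value = [x * x for x in grid]  # terminal cost V_N(x) = x^2
    samples = [(horizon, x, v) for x, v in zip(grid, value)]
    policy = []  # policy[k][i] = minimizing control at time k, state grid[i]
    for k in range(horizon - 1, -1, -1):
        new_value, new_policy = [], []
        for x in grid:
            best_cost, best_u = float("inf"), None
            for u in controls:
                # Stage cost plus cost-to-go from the (snapped) next state.
                cost = (x * x + u * u) * dt + value[nearest(x + u * dt)]
                if cost < best_cost:
                    best_cost, best_u = cost, u
            new_value.append(best_cost)
            new_policy.append(best_u)
        value = new_value
        policy.insert(0, new_policy)
        samples.extend((k, x, v) for x, v in zip(grid, value))
    return grid, policy, samples


def fit_quadratic(samples, k):
    """Least-squares fit of V_k(x) ~ a * x^2 to the triples at time k
    (a stand-in for training a neural network on the same data)."""
    pts = [(x, v) for (t, x, v) in samples if t == k]
    num = sum(v * x * x for x, v in pts)
    den = sum(x ** 4 for x, _ in pts)
    return num / den
```

For example, `corridor_dp(0.0, 1.0, 21, [-1.0, -0.5, 0.0, 0.5, 1.0], horizon=10)` yields 11 × 21 triples, and `fit_quadratic(samples, 0)` summarizes the time-zero values with a single coefficient. A neural network trained on the same triples would take the place of the quadratic fit.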
In this paper, the problems of adaptive control using neural architectures are explored in the setting of general on-line estimators. We pay close attention to the underlying mathematical structure that arises in the on-line estimation process.
The complete effect of a control action $u_k$ at a given time step $t_k$ is clouded by the fact that the state history depends on the control actions taken after time step $t_k$. Consequently, the effect of a control action must be tracked over all future time. Hence, the choice of control must inevitably involve knowledge of the future history of the state trajectory. In other words, the optimal control sequence cannot be determined until after the fact. Of course, standard optimal control theory supplies optimal control sequences for this problem under a variety of performance criteria. Roughly, there are two approaches of interest: solving the two-point boundary value