Part of Advances in Neural Information Processing Systems 2 (NIPS 1989)

*Michael Jordan, Robert Jacobs*

The forward modeling approach is a methodology for learning control when data is available in distal coordinate systems. We extend previous work by considering how this methodology can be applied to the optimization of quantities that are distal not only in space but also in time.

In many learning control problems, the output variables of the controller are not the natural coordinates in which to specify tasks and evaluate performance. Tasks are generally more naturally specified in "distal" coordinate systems (e.g., endpoint coordinates for manipulator motion) than in the "proximal" coordinate system of the controller (e.g., joint angles or torques). Furthermore, the relationship between proximal coordinates and distal coordinates is often not known a priori and, if known, not easily inverted.

The forward modeling approach is a methodology for learning control when training data is available in distal coordinate systems. A forward model is a network that learns the transformation from proximal to distal coordinates so that distal specifications can be used in training the controller (Jordan & Rumelhart, 1990). The forward model can often be learned separately from the controller because it depends only on the dynamics of the controlled system and not on the closed-loop dynamics.
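As a rough illustration of the two-phase idea described above, the following sketch first fits a forward model and then trains a controller using only distal errors. The scalar plant `plant`, the linear forms of the model and controller, the step sizes, and the iteration counts are all assumptions made for this example; they are not taken from the paper.

```python
import random

random.seed(0)

def plant(u):
    # Hypothetical proximal-to-distal map, unknown to the learner
    return 3.0 * u + 1.0

# Phase 1: fit the forward model y_hat = a*u + b from observed (u, y) pairs
a, b = 0.0, 0.0
for _ in range(2000):
    u = random.uniform(-1.0, 1.0)
    err = plant(u) - (a * u + b)    # prediction error in distal coordinates
    a += 0.1 * err * u
    b += 0.1 * err

# Phase 2: train the controller u = c*target + d using distal errors only
c, d = 0.0, 0.0
for _ in range(3000):
    target = random.uniform(-1.0, 1.0)
    u = c * target + d
    distal_err = target - plant(u)  # measured at the plant output
    proximal_err = distal_err * a   # passed back through the model slope d(y_hat)/du
    c += 0.02 * proximal_err * target
    d += 0.02 * proximal_err

def controller(target):
    # Learned map from a distal target to a proximal command
    return c * target + d
```

The controller never sees the plant's equations: the forward model's slope `a` is what converts an error measured in distal coordinates into a correction in proximal coordinates.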

Learning to Control an Unstable System with Forward Modeling

In previous work, we studied forward models of kinematic transformations (Jordan, 1988, 1990) and state transitions (Jordan & Rumelhart, 1990). In the current paper, we go beyond the spatial credit assignment problems studied in those papers and broaden the application of forward modeling to include cases of temporal credit assignment (cf. Barto, Sutton, & Anderson, 1983; Werbos, 1987). As discussed below, the function to be modeled in such cases depends on a time integral of the closed-loop dynamics. This fact has two important implications. First, the data needed for learning the forward model can no longer be obtained solely by observing the instantaneous state or output of the plant. Second, the forward model is no longer independent of the controller: If the parameters of the controller are changed by a learning algorithm, then the closed-loop dynamics change and so does the mapping from proximal to distal variables. Thus the learning of the forward model and the learning of the controller can no longer be separated into different phases.

1 FORWARD MODELING

In this section we briefly summarize our previous work on forward modeling (see also Nguyen & Widrow, 1989, and Werbos, 1987).

1.1 LEARNING A FORWARD MODEL

Given a fixed control law, the learning of a forward model is a system identification problem. Let z = g(s, u) be a system to be modeled, where z is the output or the state-derivative, s is the state, and u is the control. We require the forward model to minimize the cost functional

$$J_m = \frac{1}{2} \int (z - \hat{z})^T (z - \hat{z}) \, dt.$$
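To make the system identification step concrete, here is a minimal sketch that approximates the cost functional J_m by a Riemann sum and minimizes it by gradient descent. It reads J_m as one half the time integral of the squared difference between z and the model's prediction of z. The scalar plant `plant`, the linear-in-parameters model, the independently sampled (s, u) pairs standing in for a trajectory, and all numerical settings are assumptions for illustration only.

```python
import random

def plant(s, u):
    # Hypothetical system z = g(s, u); stands in for the unknown plant
    return -0.5 * s + 2.0 * u

def model(w, s, u):
    # Forward model prediction z_hat, assumed linear in its parameters w
    return w[0] * s + w[1] * u

def cost(w, data, dt):
    # Riemann-sum approximation of J_m = 1/2 * integral of (z - z_hat)^2 dt
    return 0.5 * sum((z - model(w, s, u)) ** 2 for s, u, z in data) * dt

def fit(data, dt, lr=0.5, steps=500):
    # Batch gradient descent on the discretized cost
    w = [0.0, 0.0]
    for _ in range(steps):
        g0 = g1 = 0.0
        for s, u, z in data:
            err = z - model(w, s, u)
            g0 -= err * s * dt   # d(cost)/d(w[0])
            g1 -= err * u * dt   # d(cost)/d(w[1])
        w = [w[0] - lr * g0, w[1] - lr * g1]
    return w

random.seed(0)
dt = 0.01
data = []
for _ in range(100):
    # Sampled states and controls, each paired with the observed plant output
    s = random.uniform(-1.0, 1.0)
    u = random.uniform(-1.0, 1.0)
    data.append((s, u, plant(s, u)))

w = fit(data, dt)
```

Because the control law is held fixed while the data are collected, the fit depends only on the plant, which is why this step can be treated as ordinary system identification.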
