Part of Advances in Neural Information Processing Systems 2 (NIPS 1989)
Michael C. Mozer, Jonathan Bachrach
Consider a robot wandering around an unfamiliar environment. performing ac(cid:173) tions and sensing the resulting environmental states. The robot's task is to con(cid:173) struct an internal model of its environment. a model that will allow it to predict the consequences of its actions and to determine what sequences of actions to take to reach particular goal states. Rivest and Schapire (1987&, 1987b; Schapire. 1988) have studied this problem and have designed a symbolic algo(cid:173) rithm to strategically explore and infer the structure of "finite state" environ(cid:173) ments. The heart of this algorithm is a clever representation of the environment called an update graph. We have developed a connectionist implementation of the update graph using a highly-specialized network architecture. With back propagation learning and a trivial exploration strategy - tions - gorithm on simple problems. The network has the additional strength that it can accommodate stochastic environments. Perhaps the greatest virtue of the connectionist approach is that it suggests generalizations of the update graph representation that do not arise from a traditional, symbolic perspective.
choosing random ac(cid:173) the connectionist network can outperfonn the Rivest and Schapire al(cid:173)