Volker Tresp, Michiaki Taniguchi
This paper discusses the linearly weighted combination of estimators in which the weighting functions depend on the input. We show that the weighting functions can be derived either by evaluating the input-dependent variance of each estimator or by estimating how likely it is that a given estimator has seen data in the region of the input space close to the input pattern. The latter solution is closely related to the mixture of experts approach, and we show how learning rules for the mixture of experts can be derived from the theory of learning with missing features. The presented approaches are modular, since the weighting functions can easily be modified (no retraining) if more estimators are added. Furthermore, it is easy to incorporate estimators that were not derived from data, such as expert systems or algorithms.
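
As an illustration of the variance-based weighting described above, the following minimal sketch combines estimators with weights proportional to the inverse of each estimator's input-dependent variance. This particular weighting rule is an assumption made for illustration (it is the standard minimum-variance combination for independent, unbiased estimators) and is not taken from the abstract itself. The function name `combine` and the toy estimators and variance functions are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[float], float]


def combine(x: float,
            estimators: Sequence[Fn],
            variances: Sequence[Fn]) -> Tuple[float, List[float]]:
    """Linearly combine estimators with input-dependent weights.

    Assumption (not stated in the abstract): each weight is proportional
    to the inverse of the estimator's variance at x, normalized so that
    the weights sum to one.
    """
    precisions = [1.0 / var(x) for var in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    prediction = sum(w * f(x) for w, f in zip(weights, estimators))
    return prediction, weights


# Hypothetical toy setup: estimator 1 is most reliable near x = 0,
# estimator 2 is most reliable near x = 2.
estimators = [lambda x: x, lambda x: 2.0 * x]
variances = [lambda x: 0.1 + x ** 2, lambda x: 0.1 + (x - 2.0) ** 2]

pred_mid, w_mid = combine(1.0, estimators, variances)    # equal variances at x = 1
pred_left, w_left = combine(0.0, estimators, variances)  # estimator 1 dominates
```

Because the weights are recomputed from the variance functions at every input, adding another estimator only requires appending its prediction and variance functions to the two lists; the existing estimators are left untouched, which reflects the modularity claimed above.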