Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)

*David Nix, Andreas Weigend*

We present a new method for obtaining local error bars for nonlinear regression, i.e., estimates of the confidence in predicted values that depend on the input. We approach this problem by applying a maximum-likelihood framework to an assumed distribution of errors. We demonstrate our method first on computer-generated data with locally varying, normally distributed target noise. We then apply it to laser data from the Santa Fe Time Series Competition where the underlying system noise is known quantization error and the error bars give local estimates of model misspecification. In both cases, the method also provides a weighted-regression effect that improves generalization performance.

1 Learning Local Error Bars Using a Maximum Likelihood Framework: Motivation, Concept, and Mechanics

Feed-forward artificial neural networks used for nonlinear regression can be interpreted as predicting the mean of the target distribution as a function of (conditioned on) the input pattern (e.g., Buntine & Weigend, 1991; Bishop, 1994), typically using one linear output unit per output variable. If parameterized, this conditional target distribution (CTD) may also be

http://www.cs.colorado.edu/~andreas/Home.html

This paper is available with figures in colors as ftp://ftp.cs.colorado.edu/pub/Time-Series/MyPapers/nix.weigenCLnips7.ps.Z



viewed as an error model (Rumelhart et al., 1995). Here, we present a simple method that provides higher-order information about the CTD than simply the mean. Such additional information could come from attempting to estimate the entire CTD with connectionist methods (e.g., "Mixture Density Networks," Bishop, 1994; "fractional binning," Srivastava & Weigend, 1994) or with non-connectionist methods such as a Monte Carlo on a hidden Markov model (Fraser & Dimitriadis, 1994). While non-parametric estimates of the shape of a CTD require large quantities of data, our less data-hungry method (Weigend & Nix, 1994) assumes a specific parameterized form of the CTD (e.g., Gaussian) and gives us the value of the error bar (e.g., the width of the Gaussian) by finding those parameters which maximize the likelihood that the target data were generated by a particular network model. In this paper we derive the specific update rules for the Gaussian case. We would like to emphasize, however, that any parameterized unimodal distribution can be used for the CTD in the method presented here.

[Figure: network architecture with two output units, the conditional mean ŷ(x) and the conditional variance σ̂²(x).]
