Babak Hassibi, David Stork, Gregory Wolff
We extend Optimal Brain Surgeon (OBS), a method for pruning networks, to allow for general error measures, and explore a reduced computational and storage implementation via a dominant eigenspace decomposition. Simulations on nonlinear, noisy pattern classification problems reveal that OBS does lead to improved generalization, and performs favorably in comparison with Optimal Brain Damage (OBD). We find that the required retraining steps in OBD may lead to inferior generalization, a result that can be interpreted as due to injecting noise back into the system. A common technique is to stop training of a large network at the minimum validation error. We found that the test error could be reduced even further by means of OBS (but not OBD) pruning. Our results justify the t → o approximation used in OBS and indicate why retraining in a highly pruned network may lead to inferior performance.