Jay Alexander, Michael C. Mozer
Casting neural network weights in symbolic terms is crucial for interpreting and explaining the behavior of a network. Additionally, in some domains, a symbolic description may lead to more robust generalization. We present a principled approach to symbolic rule extraction based on the notion of weight templates, parameterized regions of weight space corresponding to specific symbolic expressions. With an appropriate choice of representation, we show how template parameters may be efficiently identified and instantiated to yield the optimal match to a unit's actual weights. Depending on the requirements of the application domain, our method can accommodate arbitrary disjunctions and conjunctions with O(k) complexity, simple n-of-m expressions with O( k!) complexity, or a more general class of recursive n-of-m expressions with O(k!) complexity, where k is the number of inputs to a unit. Our method of rule extraction offers several benefits over alternative approaches in the literature, and simulation results on a variety of problems demonstrate its effectiveness.