Demystifying Black-box Models with Symbolic Metamodels

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)


Ahmed M. Alaa, Mihaela van der Schaar


Understanding the predictions of a machine learning model can be as crucial as the model's accuracy in many application domains. However, the black-box nature of most highly accurate (complex) models is a major hindrance to their interpretability. To address this issue, we introduce the symbolic metamodeling framework: a general methodology for interpreting predictions by converting "black-box" models into "white-box" functions that are understandable to human subjects. A symbolic metamodel is a model of a model, i.e., a surrogate model of a trained (machine learning) model expressed through a succinct symbolic expression that comprises familiar mathematical functions and can be subjected to symbolic manipulation. We parameterize symbolic metamodels using Meijer G-functions, a class of complex-valued contour integrals that depend on scalar parameters and that reduce to familiar elementary, algebraic, analytic and closed-form functions for different parameter settings. This parameterization enables efficient optimization of metamodels via gradient descent, and allows discovering the functional forms learned by a machine learning model with minimal a priori assumptions. We show that symbolic metamodeling provides an all-encompassing framework for model interpretation: all common forms of global and local explanations of a model can be analytically derived from its symbolic metamodel.
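
To make the claim about Meijer G-functions concrete, the block below gives the standard textbook definition of the G-function and three well-known parameter settings under which it reduces to an elementary function. The notation (the symbols m, n, p, q, a_j, b_j, and the contour L) follows the usual convention in the special-functions literature and is an assumption here; it is not necessarily the notation used in the paper itself.

```latex
% Standard definition of the Meijer G-function as a contour integral
% (L is a suitable contour separating the poles of the two Gamma products).
G^{m,n}_{p,q}\!\left(\left.\begin{matrix} a_1,\dots,a_p \\ b_1,\dots,b_q \end{matrix}\,\right|\, x\right)
  = \frac{1}{2\pi i}\int_{L}
    \frac{\prod_{j=1}^{m}\Gamma(b_j - s)\,\prod_{j=1}^{n}\Gamma(1 - a_j + s)}
         {\prod_{j=m+1}^{q}\Gamma(1 - b_j + s)\,\prod_{j=n+1}^{p}\Gamma(a_j - s)}
    \, x^{s}\, ds

% Three familiar special cases obtained by fixing the scalar parameters:
e^{-x} = G^{1,0}_{0,1}\!\left(\left.\begin{matrix} - \\ 0 \end{matrix}\,\right|\, x\right),
\qquad
\log(1+x) = G^{1,2}_{2,2}\!\left(\left.\begin{matrix} 1,\,1 \\ 1,\,0 \end{matrix}\,\right|\, x\right),
\qquad
\sin x = \sqrt{\pi}\; G^{1,0}_{0,2}\!\left(\left.\begin{matrix} - \\ \tfrac{1}{2},\,0 \end{matrix}\,\right|\, \frac{x^{2}}{4}\right)
```

Because the functional form is selected by the continuous parameters a_j and b_j rather than by a discrete choice among candidate expressions, a fit over these parameters can move between forms such as the exponential, logarithm, and sine above, which is what makes a gradient-based search over symbolic expressions plausible.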