Review for NeurIPS paper: Learning outside the Black-Box: The pursuit of interpretable models

NeurIPS 2020

Review 1

Summary and Contributions: The paper proposes a method to globally approximate any continuous black-box function using Meijer G-functions instead of polynomial splines. The method is tested on an SVM and MLP trained on 5 UCI datasets which contain attribute data.
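For readers unfamiliar with the function class named in this summary, the standard textbook definition of the Meijer G-function is reproduced below, together with two well-known special cases. These identities come from standard references on special functions, not from the paper under review, and the particular parameter choices the authors use may differ.

```latex
% Standard definition (contour integral over a suitable path L)
G^{m,n}_{p,q}\!\left(\begin{matrix} a_1,\dots,a_p \\ b_1,\dots,b_q \end{matrix}\,\middle|\, x\right)
  = \frac{1}{2\pi i}\int_{L}
    \frac{\prod_{j=1}^{m}\Gamma(b_j - s)\,\prod_{j=1}^{n}\Gamma(1 - a_j + s)}
         {\prod_{j=m+1}^{q}\Gamma(1 - b_j + s)\,\prod_{j=n+1}^{p}\Gamma(a_j - s)}
    \, x^{s}\, ds

% Two familiar functions recovered as special cases
e^{-x} = G^{1,0}_{0,1}\!\left(\begin{matrix} - \\ 0 \end{matrix}\,\middle|\, x\right),
\qquad
\ln(1+x) = G^{1,2}_{2,2}\!\left(\begin{matrix} 1,\,1 \\ 1,\,0 \end{matrix}\,\middle|\, x\right)
```

The integers (m, n, p, q) select the structure of the integrand, which is why a reviewer below asks for the selected (m, n, p, q) values to be reported.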

Strengths: - The paper demonstrates that the method achieves good results on the tested domain and black-box models; - The method is compared against relevant alternatives; - The method succeeds in achieving a parsimonious representation, satisfying important XAI model requirements.

Weaknesses: Model-agnostic explanation methods draw their strength from being applicable to various types of models and, implicitly, datasets, often including text and images (see [1, 2]). Yet this paper uses only datasets from the UCI repository, and this type of data is not representative of the wide range of data types that could be used. In itself this is not problematic; however, the paper would be significantly improved if there were some indication of how well the method would translate outside the tested domain. For example, what can I expect if I want to approximate a modern CNN trained on an image classification dataset? [1] Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., & Müller, K.-R. (2010). How to explain individual classification decisions. Journal of Machine Learning Research, 11 (Jun), 1803–1831. [2] Ribeiro, M. T., Singh, S., & Guestrin, C. (2018). Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence.

Correctness: I think so

Clarity: The paper is well written from a grammatical, stylistic, and organizational point of view. The proposed method and novel concepts are well articulated. Section names and titles are well chosen and contain pertinent information. Overall, the paper is well presented in both content and style.

Relation to Prior Work: Yes

Reproducibility: No

Additional Feedback: Post-rebuttal comment: I thank the authors for their feedback. The method achieves good results on the tested domain and black-box models and the authors indicate why their method has advantages over symbolic metamodels.

Review 2

Summary and Contributions: I have read the authors' feedback and am satisfied with their answers to my questions. I maintain my position in favor of accepting this paper. I consider the key advantage of this work to be that it shows a good direction for functional-level discovery, which is critical in explainable AI. The paper presents a study of projection pursuit using Meijer G-functions, which is able to produce analytical interpretations of the black-box model. The authors call their algorithm Faithful Pursuit. They conducted numerical studies on two popular black-box models, a Multilayer Perceptron (MLP) and a Support Vector Machine (SVM), trained on five UCI datasets. They compared the performance of Faithful Pursuit with that of the two black-box models. For the interpretability property of feature importance, they showed one example in comparison with a local linear surrogate model (LIME). The proposed Faithful Pursuit was able to capture nonlinear properties for interpretation, which is beyond the capacity of a linear model.

Strengths: I like this study for its principle of constructing interpretable models based on Meijer G-functions. The study follows the same principle as [1] but develops it further. First, the authors investigated the problem from the viewpoint of projection pursuit. Second, they derived the tree relations between the hyperparameters of Meijer G-functions and the most familiar functions. Third, their model is parsimonious, with a small number of free parameters, and efficient, because it uses differentiable forms directly in learning. Fourth, they used a mixup strategy in training the interpreter, which is meaningful. I consider the study solid, and it shows the significance and novelty of the contribution. The study will be of interest to most researchers in the NeurIPS community. I agree with the authors' point about "a new and very promising direction" for this study. It shows a direction for learning functional forms, which is the most important information for understanding the system under investigation.
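The mixup strategy mentioned above can be illustrated with a minimal sketch. It assumes the generic form of mixup (convex combinations of pairs of training inputs with a Beta-distributed weight), with the black box queried on each mixed point to obtain a target for the interpreter. The paper's exact procedure is not described in this review, so the function names, the `alpha` parameter, and the toy black box below are all hypothetical.

```python
import math
import random


def mixup_queries(X, black_box, n_samples, alpha=1.0, seed=0):
    """Create (input, black-box output) pairs at convex combinations of inputs.

    Assumption: generic mixup, not necessarily the paper's exact scheme.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        xi, xj = rng.choice(X), rng.choice(X)
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(xi, xj)]
        # The interpreter is trained to match the black box, so the target
        # is the black-box output at the mixed point, not a mixed label.
        samples.append((x_mix, black_box(x_mix)))
    return samples


def toy_black_box(x):
    # Hypothetical stand-in for a trained MLP or SVM.
    return math.tanh(x[0]) + x[1] ** 2


X_train = [[0.0, 0.0], [1.0, 2.0], [-1.0, 0.5]]
augmented = mixup_queries(X_train, toy_black_box, n_samples=5)
```

Because every mixed point lies between two training points, the interpreter sees the black box's behavior inside the region covered by the data rather than only at the training points themselves.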

Weaknesses: I consider this study similar to [1] in the principle of constructing models based on Meijer G-functions. In [1], the model is called "Symbolic Metamodels". In this study, the authors call their model "Faithful Pursuit", or "faithful model". For me, it is fine to give a new name, but the main difference between the two models seems to lie in parsimony rather than faithfulness. I do not think "faithful" is a suitable term to describe the new model, as it may be misleading or confusing to users.

Correctness: I consider the study generally correct in its claims and methods.

Clarity: Yes, in most parts. However, I had difficulties with the parts listed below. 1. I have difficulty understanding #Terms in Table 2 and its explanation. In my view, the solution for the Faithful model should output the (m,n,p,q) selected from H, which is important information about the functional forms learned. 2. In the Supplementary Material, I cannot see why H = (0,1,3,1) is selected. What types of functions is it meant to approximate? 3. I have difficulty understanding Figure 3 in the Supplementary Material and its explanation. Do the authors mean that other Meijer G-functions can reach a similar solution around the optimum, so that it is difficult to tell which function should be selected?

Relation to Prior Work: Yes, in a general sense.

Reproducibility: Yes

Additional Feedback: 4. If the authors would like to keep the term "faithful", it would be better to define it. I know the work by Andrews, R., Diederich, J., & Tickle, A.B. (1995). Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-based Syst., 8, 6, 373-389, where the term "fidelity" is used for a similar kind of study. This is why the term should be defined. 5. What do the authors mean by "a global interpretation"? If a model is faithful to the physical system, it should be able to cover the locality property (higher orders of Taylor expansions?). I was confused on this point because the definitions are missing. 6. It would be better to report the selected (m,n,p,q), for instance for one example in Table 2, so that readers can see the functional forms selected for different k.

Review 3

Summary and Contributions: The authors propose a method for approximating a given continuous black-box function with a linear combination of (a subclass of) Meijer G-functions in an attempt to generate its "global interpretation". In particular, they identify a small subset of the Meijer G-functions that already covers a large class of familiar functions (e.g., polynomial, rational, exponential, trigonometric and hypergeometric functions) and show empirically that this subset suffices to accurately approximate supervised models (SVM and MLP) trained on five UCI datasets. The generated approximation is global in the sense that it approximates the original black-box function on the whole input space, and is more interpretable in the sense that it is a sum of functions of well-known forms. While the evaluation remains rather limited, the authors demonstrate a gain from their method in the interpretability aspect by performing a 2nd-order Taylor expansion of the approximation and showing on the Wine dataset that interaction between some features may play an important role in prediction. This is a form of information that cannot be provided by prior popular interpretability methods such as DeepLift, SHAP, LIME, etc., which can only quantify the importance of individual features independently.
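To make the Taylor-expansion argument in this summary concrete, here is a minimal sketch of how pairwise feature interactions can be read off the second-order term of an expansion. It assumes a hypothetical closed-form surrogate (`toy_surrogate`, not taken from the paper) and estimates the Hessian numerically with central differences. With an analytic surrogate such as the one the authors obtain, the derivatives could presumably be computed symbolically instead.

```python
def hessian(f, x, h=1e-3):
    """Central finite-difference estimate of the Hessian of f at point x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with coordinates i and j perturbed by +/- h.
                y = list(x)
                y[i] += di * h
                y[j] += dj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return H


def toy_surrogate(x):
    # Hypothetical interpretable approximation of a black box:
    # one linear effect, one quadratic effect, one x0-x2 interaction.
    return 2.0 * x[0] + 0.5 * x[1] ** 2 + 3.0 * x[0] * x[2]


x0 = [0.5, -1.0, 2.0]
H = hessian(toy_surrogate, x0)

# Off-diagonal entries are the coefficients of the cross terms in the
# 2nd-order Taylor expansion, i.e. local pairwise interaction strengths.
for i in range(3):
    for j in range(i + 1, 3):
        print(f"interaction(x{i}, x{j}) ~ {H[i][j]:.3f}")
```

A purely additive attribution over individual features has no counterpart to the off-diagonal entries, which is the distinction the summary draws.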

Strengths: - The idea of combining projection pursuit and back-fitting to iteratively approximate a black-box function with a linear combination of G-functions seems intuitive and works well on some relatively low-dimensional public datasets. The identification of an expressive subset circumvents the need for hyper-parameter optimization.
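The combination of projection pursuit and back-fitting described above can be sketched as follows. This is an illustration under strong simplifying assumptions and not the authors' algorithm: each ridge function is a quadratic polynomial standing in for a Meijer G-function, directions are picked from a fixed grid of angles in 2D rather than optimized by gradient descent, and the black box is a hypothetical toy function.

```python
import math


def fit_quadratic(z, r):
    """Least-squares fit r ~ a + b*z + c*z^2 via the 3x3 normal equations."""
    S = [sum(zi ** k for zi in z) for k in range(5)]
    T = [sum(ri * zi ** k for zi, ri in zip(z, r)) for k in range(3)]
    A = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    for c in range(3):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, 3), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(3):
            if i != c:
                m = A[i][c] / A[c][c]
                A[i] = [a - m * b for a, b in zip(A[i], A[c])]
    return [A[i][3] / A[i][i] for i in range(3)]


def project(X, theta):
    return [x[0] * math.cos(theta) + x[1] * math.sin(theta) for x in X]


def ridge(coef, z):
    return [coef[0] + coef[1] * zi + coef[2] * zi * zi for zi in z]


def predict(terms, X):
    out = [0.0] * len(X)
    for theta, coef in terms:
        for i, g in enumerate(ridge(coef, project(X, theta))):
            out[i] += g
    return out


def projection_pursuit(X, y, n_terms=2, n_angles=36, sweeps=5):
    angles = [math.pi * k / n_angles for k in range(n_angles)]
    terms, residual = [], list(y)
    for _ in range(n_terms):
        # Pursuit step: pick the direction whose ridge fit best explains
        # the current residual.
        best = None
        for theta in angles:
            z = project(X, theta)
            coef = fit_quadratic(z, residual)
            err = sum((r - g) ** 2 for r, g in zip(residual, ridge(coef, z)))
            if best is None or err < best[0]:
                best = (err, theta, coef)
        terms.append((best[1], best[2]))
        # Back-fitting step: refit each term to the residual left by the others.
        for _ in range(sweeps):
            for k in range(len(terms)):
                theta = terms[k][0]
                others = predict(terms[:k] + terms[k + 1:], X)
                partial = [yi - oi for yi, oi in zip(y, others)]
                terms[k] = (theta, fit_quadratic(project(X, theta), partial))
        residual = [yi - pi for yi, pi in zip(y, predict(terms, X))]
    return terms


def black_box(x):
    # Hypothetical black box: a sum of two ridge functions.
    return (x[0] + x[1]) ** 2 + (x[0] - x[1])


grid = [-1.0 + 2.0 * i / 14 for i in range(15)]
X = [[a, b] for a in grid for b in grid]
y = [black_box(x) for x in X]
terms = projection_pursuit(X, y)
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, predict(terms, X))) / len(y)
```

The back-fitting sweeps matter because the first term is fitted before the second exists. Once both are present, each is refitted against what the other leaves unexplained.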

Weaknesses: - The validation of the interpretability aspect of the proposed method seems rather preliminary. Quantitative validation and comparison with other relevant methods are absent. For example, an evaluation of the feature importance values in synthetic experiments where the true underlying feature ranking is known could be added; e.g., Fig. 4 in [1] might be a good option. Since the proposed method can also capture feature interaction, designing a similar synthetic scenario where the importance of pairwise interactions is known might also be worthwhile. - The advantages of the proposed method over the most relevant prior work [1] remain unclear. It would be nice if the authors could include an example or an experiment in which some aspects of the two methods are compared in concrete terms. For example, a comparison of the two methods on the prognosis prediction task for breast cancer (as done in [1]) may be informative. [1] Demystifying Black-box Models with Symbolic Metamodels, NeurIPS 2019

Correctness: I believe that the notion of interpretability should be defined more precisely in the specific context of this work. If a "non-interpretable" black-box function is approximated by a linear combination of (potentially many) functions of well-known forms (e.g., polynomial, rational, exponential, trigonometric and hypergeometric functions), what kinds of interpretations could we elicit that we otherwise could not? I believe the Taylor approximation to capture higher-order interactions is one form of such interpretation, but there could be more to this, and a more extended discussion would be a valuable addition.

Clarity: The paper is well-written overall, but I believe that the paper would benefit a lot from concretising the specific notion of interpretability that they aim to address in their work (see my comments to the above section).

Relation to Prior Work: While the conceptual differences w.r.t. the closest prior work [1] have been clearly discussed in the related work section, no experiments have been performed to demonstrate convincingly how these differences would matter in practice. Alternatively, perhaps a toy-ish example where such differences would affect the interpretability of the resultant approximations could be included. [1] Demystifying Black-box Models with Symbolic Metamodels, NeurIPS 2019

Reproducibility: Yes

Additional Feedback: Minor comments: - line 106: the authors state that projection pursuit with polynomial splines is less interpretable because the individual terms (polynomials) may not have natural interpretations ... what do you mean by this? - line 212: "in" => "is" - line 249: R^2 is not defined. After rebuttal: Having read the author response and the other reviews, I have revised my score from 5 to 6. This was my attempt to calibrate my score against those of the other reviewers, and the authors have provided extra evidence on the benefits of the proposed method. However, my view remains unchanged that the quality of the empirical comparison of the gained interpretability is insufficient to claim practical improvements.

Review 4

Summary and Contributions: The paper proposes an algorithm, based on Meijer G-functions, that produces a global interpretation of any given continuous black-box function. It thus revisits the topic of interpreting Machine Learning models with a method that, like other methods, provides knowledge about feature importance and feature interaction and is independent of the model used. However, it differs from most existing methods precisely in that it provides global rather than only local information. A further remarkable aspect of the algorithm is that it produces parsimonious expressions.

Strengths: The claims in this article are sound. They rest on a strong mathematical basis and are supported by experimentation. The topic discussed (interpretability in Machine Learning models) is of high interest. The proposal is adequately compared with previous techniques and shows better properties.

Weaknesses: This work involves considerable mathematical complexity. The jumps in the exposition that the reader must follow make some parts of the article difficult to understand. As the work represents a first step in a promising new line of research, the experiments did not use complex datasets or models.

Correctness: - Claims, methods, and empirical methodology are correct.

Clarity: The paper is well written. To help with corrections, here are some minor faults: * Sometimes multiplication is expressed with a normal point instead of the appropriate one (the multiplication point is centered, not at the bottom). See formula 12 (bad) vs formula 13 (good) as an example. * Sometimes the term backfitting is used and sometimes back-fitting. Unify this. * In section 6, line 279, figure 2 b refers to v, not to v1. I stress again that the paper is well written and that the errors noted are only meant to help future corrections.

Relation to Prior Work: Works in the same direction are shown and are compared by a comparative table of properties of these previous works and the proposed work, showing the advantages that the latter has over such previous works.

Reproducibility: Yes

Additional Feedback: Having read the author response and the other reviews, I decided to keep my previous score. I think it is a good paper which deserves acceptance.