Polina Golland

#### Abstract

In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to the classical problem of training a classifier to label new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpreting the statistical model in the original feature space that allows us to reason about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples, and briefly discuss its use in statistical shape analysis, an application that originally motivated this work.

#### 1 Introduction

Once a classifier is estimated from the training data, it can be used to label new examples, and in many application domains, such as character recognition and text classification, this constitutes the final goal of the learning stage. Statistical learning algorithms are also used in scientific studies to detect and analyze differences between two classes when the "correct answer" is unknown and the information we have on the differences is represented only implicitly by the training set. Example applications include morphological analysis of anatomical organs (comparing organ shape in patients vs. normal controls) and molecular design (identifying complex molecules that satisfy certain requirements). In such applications, interpreting the resulting classifier in terms of the original feature vectors can provide insight into the nature of the differences detected by the learning algorithm and is therefore a crucial step in the analysis. Furthermore, we would argue that studying the spatial structure of the data captured by the classification function is important in any application, as it leads to a better understanding of the data and can potentially help improve the technique.

This paper addresses the problem of translating a classifier into a different representation that allows us to visualize and study the differences between the classes. We introduce and derive a so-called discriminative direction at every point in the original feature space with respect to a given classifier. Informally speaking, the discriminative direction tells us how to change any input example to make it look more like an example from the other class without introducing irrelevant changes that might make it more similar to other examples from its own class. It allows us to characterize the differences captured by the classifier and to express them as changes in the original input examples.

This paper is organized as follows. We start with a brief background section on kernel-based classification, stating without proof the main facts on kernel-based SVMs necessary for the derivation of the discriminative direction. We follow the notation used in [3, 8, 9]. In Section 3, we provide a formal definition of the discriminative direction and explain how it can be estimated from the classification function. We then present some special cases in which the computation can be simplified significantly due to a particular structure of the kernel. Section 4 demonstrates the discriminative direction for different kernels, followed by an example from the problem of statistical analysis of shape differences that originally motivated this work.
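To make the informal idea concrete, the following is a minimal sketch in Python (NumPy) under simplifying assumptions that are ours, not the paper's. It assumes a Gaussian RBF kernel classifier of the standard SVM form f(x) = Σ_i α_i y_i K(x_i, x) + b, and it uses the normalized gradient ∇f(x) as a stand-in for "the direction that moves the point towards the other class". The gradient is the direction of steepest change in f per unit step in input space. It only illustrates the intuition and does not reproduce the general derivation of Section 3. The helper names (`rbf_kernel`, `decision_function`, `gradient_direction`) and the toy support vectors are hypothetical.

```python
import numpy as np


def rbf_kernel(x, X, gamma):
    """Gaussian kernel K(x, x_i) = exp(-gamma * ||x - x_i||^2) for every row x_i of X."""
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))


def decision_function(x, support_vectors, coef, b, gamma):
    """Kernel classifier f(x) = sum_i coef_i * K(x, x_i) + b, with coef_i = alpha_i * y_i."""
    return float(coef @ rbf_kernel(x, support_vectors, gamma) + b)


def gradient_direction(x, support_vectors, coef, gamma):
    """Unit-length gradient of f at x (illustrative stand-in, not the paper's general solution).

    For the Gaussian kernel, dK(x, x_i)/dx = -2 * gamma * (x - x_i) * K(x, x_i),
    so grad f(x) = -2 * gamma * sum_i coef_i * K(x, x_i) * (x - x_i).
    Moving along +grad increases f (towards the positive class); for a point
    with f(x) > 0, one would move along -grad to approach the negative class.
    """
    k = rbf_kernel(x, support_vectors, gamma)       # shape (n_sv,)
    diffs = x - support_vectors                      # shape (n_sv, d)
    grad = (-2.0 * gamma) * (coef * k) @ diffs       # shape (d,)
    return grad / np.linalg.norm(grad)


# Toy usage (hypothetical data): one support vector per class on the x-axis.
support_vectors = np.array([[-1.0, 0.0], [1.0, 0.0]])
coef = np.array([-1.0, 1.0])  # alpha_i * y_i: negative class at (-1, 0), positive at (1, 0)
b, gamma = 0.0, 0.5
x = np.array([-0.5, 0.3])     # a point on the negative side (f(x) < 0)
d = gradient_direction(x, support_vectors, coef, gamma)
print(d)  # points mostly along +x, towards the positive class
```

Taking a small step from `x` along `d` increases f, so the example moves towards the positive class. With a linear kernel, the same computation reduces to the (constant) normal vector of the separating hyperplane.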