{"title": "A Game Theoretic Approach to Class-wise Selective Rationalization", "book": "Advances in Neural Information Processing Systems", "page_first": 10055, "page_last": 10065, "abstract": "Selection of input features such as relevant pieces of text has become a common technique of highlighting how complex neural predictors operate. The selection can be optimized post-hoc for trained models or incorporated directly into the method itself (self-explaining). However, an overall selection does not properly capture the multi-faceted nature of useful rationales such as pros and cons for decisions. To this end, we propose a new game theoretic approach to class-dependent rationalization, where the method is specifically trained to highlight evidence supporting alternative conclusions. Each class involves three players set up competitively to find evidence for factual and counterfactual scenarios. We show theoretically in a simplified scenario how the game drives the solution towards meaningful class-dependent rationales. We evaluate the method in single- and multi-aspect sentiment classification tasks and demonstrate that the proposed method is able to identify both factual (justifying the ground truth label) and counterfactual (countering the ground truth label) rationales consistent with human rationalization. The code for our method is publicly available.", "full_text": "A Game Theoretic Approach to Class-wise Selective\n\nRationalization\n\nShiyu Chang1,2\u2217\n\nYang Zhang1,2\u2217\n\n1MIT-IBM Watson AI Lab\n\n{shiyu.chang,yang.zhang2}@ibm.com yum@us.ibm.com\n\nMo Yu2\u2217\n2IBM Research\n\nTommi S. Jaakkola3\n3CSAIL MIT\ntommi@csail.mit.edu\n\nAbstract\n\nSelection of input features such as relevant pieces of text has become a common\ntechnique of highlighting how complex neural predictors operate. The selection can\nbe optimized post-hoc for trained models or incorporated directly into the method\nitself (self-explaining). 
However, an overall selection does not properly capture the multi-faceted nature of useful rationales such as pros and cons for decisions. To this end, we propose a new game theoretic approach to class-dependent rationalization, where the method is specifically trained to highlight evidence supporting alternative conclusions. Each class involves three players set up competitively to find evidence for factual and counterfactual scenarios. We show theoretically in a simplified scenario how the game drives the solution towards meaningful class-dependent rationales. We evaluate the method in single- and multi-aspect sentiment classification tasks and demonstrate that the proposed method is able to identify both factual (justifying the ground truth label) and counterfactual (countering the ground truth label) rationales consistent with human rationalization. The code for our method is publicly available2.

1 Introduction

Interpretability is rapidly rising alongside performance as a key operational characteristic across NLP and other applications. Perhaps the most straightforward means of highlighting how a complex method works is by selecting input features relevant for the prediction (e.g., [19]). If the selected subset is short and concise (for text), it can potentially be understood and verified against domain knowledge. The selection of features can be optimized to explain already trained models [24], incorporated directly into the method itself as in self-explaining models [19, 12], or optimized to mimic available human rationales [8].

One of the key questions motivating our work is extending how rationales are defined and estimated. The common paradigm to date is to make an overall selection of a feature subset that maximally explains the target output/decision. 
For example, the maximum mutual information criterion [12, 19] chooses an overall subset of features such that the mutual information between the feature subset and the target output decision is maximized, or, equivalently, the entropy of the target output decision conditional on this subset is minimized. Rationales can be multi-faceted, however, involving support for different outcomes, just with different degrees. For example, we could understand the overall sentiment associated with a product in terms of weighing associated pros and cons contained in the review. Existing rationalization techniques strive for a single overall selection, therefore lumping together the facets supporting different outcomes.

*Authors contributed equally to this paper.
2https://github.com/code-terminator/classwise_rationale

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

We propose the notion of class-wise rationales, defined as multiple sets of rationales that respectively explain support for different output classes (or decisions). Unlike conventional rationalization schemes, class-wise rationalization takes a candidate outcome as input, which can be different from the ground-truth class label, and uncovers rationales specifically for the given class. To find such rationales, we introduce a game theoretic algorithm, called Class-wise Adversarial Rationalization (CAR). CAR consists of three types of players: factual rationale generators, which generate rationales that are consistent with the actual label; counterfactual rationale generators, which generate rationales that counter the actual label; and discriminators, which discriminate between factual and counterfactual rationales. 
Both factual and counterfactual rationale generators try to competitively "convince" the discriminator that they are factual, resulting in an adversarial game between the counterfactual generators and the other two types of players.

We will show in a simplified scenario how the CAR game drives the solution towards meaningful class-wise rationalization, under an information-theoretic metric that is a class-wise generalization of the maximum mutual information criterion. Moreover, empirical evaluations on both single- and multi-aspect sentiment classification show that CAR can successfully find class-wise rationales that align well with human understanding. The data and code will become publicly available.

2 Related Work

There are two lines of research on generating interpretable features of neural networks. The first is to directly incorporate the interpretations into the models, a.k.a. self-explaining models [3, 4, 5, 15]. The other line is to generate interpretations in a post-hoc manner. There are several ways to perform post-hoc interpretation. The first class of methods explicitly introduces a generator that learns to select important subsets of inputs as explanations [12, 19, 21, 30, 31], which often comes with some information-theoretic properties. The second class evaluates the importance of each input feature via backpropagation of the prediction. Many of these methods utilize gradient information [6, 20, 25, 26, 27, 28], while techniques like local perturbations [11, 13, 16, 22] and Parzen windows [7] have also been used to relax the requirement of differentiability. Finally, the third class locally fits an interpretable model, such as a linear model, to a deep network [2, 24]. 
There are also some recent works trying to improve the fidelity and/or stability of post-hoc explanations by including the explanation mechanism in the training procedure [17, 18].

Although none of the aforementioned approaches can perform class-wise rationalization, gradient-based methods can be intuitively adapted for this purpose: they can produce explanations toward a certain class by probing the importance with respect to the corresponding class logit. However, as noted in [24], when the input feature is far away from the corresponding class, the local gradient or perturbation probe can be very inaccurate. An evaluation of such methods is provided in section 5.

3 Class-wise Rationalization

In this section, we introduce our adversarial approach to class-wise rationalization. For notation, upper-case letters, e.g. X, denote random variables or random vectors; lower-case letters, e.g. x, denote deterministic scalars or vectors; script letters, e.g. 𝒴, denote sets. p_{X|Y}(x|y) denotes the probability of X = x conditional on Y = y. E[X] denotes expectation.

3.1 Problem Formulation

Consider a text classification problem, where X is a random vector representing a string of text, and Y ∈ 𝒴 represents the class that X is in. The class-wise rationalization problem can be formulated as follows. For any input X, our goal is to derive a class-wise rationale Z(t) for any t ∈ 𝒴 such that Z(t) provides evidence supporting class t. Each rationale can be understood as a masked version of X, i.e. X with a subset of its words masked away by a special value (e.g. 0). Note that class-wise rationales are defined for every class t ∈ 𝒴. For t = Y (the correct class) the corresponding rationale is called factual; for t ≠ Y we call them counterfactual rationales. For simplicity, we will focus on two-class classification problems (𝒴 = {0, 1}) for the remainder of this section. 
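As a minimal illustration of this masking view, consider the following sketch (the token list and the placeholder symbol are invented for illustration only):

```python
def extract_rationale(tokens, mask, pad="_"):
    # A rationale is the input text with the unselected words masked away
    # by a special value; mask[i] = 1 keeps the i-th word.
    return [tok if m == 1 else pad for tok, m in zip(tokens, mask)]
```

For a four-word review, a mask selecting the first two words keeps them and replaces the rest with the placeholder.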
Generalization to multiple classes is discussed in appendix A.4.

As a clarification, notice that during inference, the class t that is provided to the system does not need to be the ground truth. No matter what t is provided, factual or counterfactual, the algorithm is supposed to try its best to find evidence in support of t. Therefore, the inference does not need to access the ground truth label. However, the training of the algorithm requires the ground truth label Y, because it needs to learn the phrases and sentences that are informative of each class.

Figure 1: CAR training and inference procedures for the class-0 case. (a) The training procedure. (b) During inference, there is no ground truth label. In this case, we will always trigger the factual generators.

3.2 The CAR Framework

CAR uncovers class-wise rationales using adversarial learning, inspired by outlining pros and cons for decisions. Specifically, there are two factual rationale generators, g^f_t(X), t ∈ {0, 1}, which generate rationales that justify class t when the actual label agrees with t, and two counterfactual rationale generators, g^c_t(X), t ∈ {0, 1}, which generate rationales for the label other than the ground truth. Finally, we introduce two discriminators, d_t(Z), t ∈ {0, 1}, which aim to discriminate between factual and counterfactual rationales, i.e., between g^f_t(X) and g^c_t(X). We thus have six players, divided into two groups. The first group pertains to t = 0 and involves g^f_0(X), g^c_0(X) and d_0(Z) as players. 
Both groups play a similar adversarial game, so we focus the discussion on the first group.

Discriminator: In our adversarial game, d_0(·) takes a rationale Z generated by either g^f_0(·) or g^c_0(·) as input, and outputs the probability that Z is generated by the factual generator g^f_0(·). The training target for d_0(·) is similar to that of the generative adversarial network (GAN) [14]:

d_0(·) = argmin_{d(·)} −p_Y(0) E[log d(g^f_0(X)) | Y = 0] − p_Y(1) E[log(1 − d(g^c_0(X))) | Y = 1].   (1)

Generators: The factual generator g^f_0(·) is trained to generate rationales from text labeled Y = 0. The counterfactual generator g^c_0(·), in contrast, learns from text labeled Y = 1. Both generators try to convince the discriminator that they are factual generators for Y = 0:

g^f_0(·) = argmax_{g(·)} E[h_0(d_0(g(X))) | Y = 0],  and  g^c_0(·) = argmax_{g(·)} E[h_1(d_0(g(X))) | Y = 1],   (2)

s.t. g^f_0(X) and g^c_0(X) satisfy some sparsity and continuity constraints.

The constraints stipulate that the words selected as rationales should be a relatively small subset of the entire text (sparse) and should constitute consecutive segments (continuous). We will keep the constraints abstract for generality for now; their actual form will be specified in section 4. h_0(·) and h_1(·) are both monotonically-increasing functions that satisfy the following properties:

x·h_0(x/(x + a)) is convex in x, and x·h_1(a/(x + a)) is concave in x, ∀x, a ∈ [0, 1].   (3)

One valid choice is h_0(x) = log(x) and h_1(x) = −log(1 − x), which reduces the problem to the more canonical GAN-style problem. In practice, we find that other functional forms have more stable training behavior. 
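The objectives in equations (1) and (2) can be sketched numerically as follows. This is only an illustration with the identity choice h_0(x) = h_1(x) = x reported in section 4, not the authors' implementation; the discriminator outputs are invented numbers:

```python
import numpy as np

def discriminator_loss(d_factual, d_counterfactual, p_y0, p_y1):
    # Equation (1): d_0 should assign high probability to factual rationales
    # (generated from Y = 0 text) and low probability to counterfactual
    # rationales (from Y = 1 text); a GAN-style cross-entropy loss.
    return -(p_y0 * np.mean(np.log(d_factual))
             + p_y1 * np.mean(np.log(1.0 - d_counterfactual)))

def generator_objectives(d_factual, d_counterfactual):
    # Equation (2) with h_0(x) = h_1(x) = x: both generators are rewarded
    # when the discriminator believes their rationales are factual.
    return float(np.mean(d_factual)), float(np.mean(d_counterfactual))
```

A discriminator that separates the two rationale streams well incurs a lower loss than one that outputs 0.5 everywhere, while both generator objectives push the discriminator outputs upward.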
As shown later, this generalization is closely related to the f-divergence.

Figure 1(a) summarizes the training procedure of these three players. As can be seen, g^c_0(·) plays an adversarial game with both d_0(·) and g^f_0(·), because it tries to trick d_0(·) into misclassifying its output as factual, whereas g^f_0(·) helps d_0(·) make the correct decision. The other group of players, g^f_1(·), g^c_1(·) and d_1(·), play a similar game. The only difference is that now the factual generator operates on text with label Y = 1, and the counterfactual generator on text with label Y = 0.

3.3 How Does It Work?

Consider a simple bag-of-word scenario, where the input text is regarded as a collection of words drawn from a vocabulary of size N. In this case, X can be formulated as an N-dimensional binary vector: X_i = 1 if the i-th word is present, and X_i = 0 otherwise. p_{X|Y}(x|y) represents the probability distribution of X in natural text conditional on different classes Y = y.

The rationales Z^f_0 and Z^c_0 are also multivariate binary vectors: Z^f_{0,i} = 1 if the i-th word is selected as part of the factual rationale, and Z^f_{0,i} = 0 otherwise. p_{Z^f_0|Y}(z|0) denotes the induced distribution of the factual rationales, which is only well-defined in the factual case (Y = 0). This distribution is determined by how g^f_0(·) generates the rationales across examples. 
In the optimization problem, we will primarily make use of the induced distribution, and similarly for the counterfactual rationales.

To simplify our discussion, we assume that the dimensions of X are independent conditional on Y. Furthermore, we assume that the rationale selection scheme selects each word independently, so the induced distributions over Z^f_0 and Z^c_0 are also independent across dimensions, conditional on Y. Formally, ∀x, z ∈ {0, 1}^N, ∀y ∈ {0, 1},

p_{X|Y}(x|y) = ∏_{i=1}^{N} p_{X_i|Y}(x_i|y),  p_{Z^f_0|Y}(z|y) = ∏_{i=1}^{N} p_{Z^f_{0,i}|Y}(z_i|y),  p_{Z^c_0|Y}(z|y) = ∏_{i=1}^{N} p_{Z^c_{0,i}|Y}(z_i|y).   (4)

Figure 2(left) plots p_{X_i|Y}(1|0) and p_{X_i|Y}(1|1) as functions of i (the horizontal axis corresponds to sorted word identities). These two curves represent the occurrence of each word in the two classes. In the figure, the words to the left satisfy p_{X_i|Y}(1|0) > p_{X_i|Y}(1|1), i.e. they occur more often in class 0 than in class 1. These words are most indicative of class 0, and we will call them class-0 words. Similarly, the words to the right are called class-1 words.

Figure 2(left) also plots an example of the p_{Z^f_{0,i}|Y}(1|0) and p_{Z^c_{0,i}|Y}(1|1) curves (solid, shaded curves), which represent the occurrence of each word in the factual and counterfactual rationales respectively. Note that these two curves must satisfy the following constraints:

p_{Z^f_{0,i}|Y}(1|0) ≤ p_{X_i|Y}(1|0),  and  p_{Z^c_{0,i}|Y}(1|1) ≤ p_{X_i|Y}(1|1).   (5)

This is because a word can be chosen as a rationale only if it appears in a text, and this strict relation translates into an inequality constraint in terms of the induced distributions. As shown in figure 2(left), the p_{Z^f_{0,i}|Y}(1|0) and p_{Z^c_{0,i}|Y}(1|1) curves are always below the p_{X_i|Y}(1|0) and p_{X_i|Y}(1|1) curves respectively. 
For the remainder of this section, we will refer to p_{X_i|Y}(1|0) as the factual upper-bound, and p_{X_i|Y}(1|1) as the counterfactual upper-bound. What we intend to show is that the optimal strategy for both rationale generators in this adversarial game is to choose the class-0 words.

The optimal strategy for the counterfactual generator: We will first find the optimal strategy for the counterfactual generator, or, equivalently, the optimal p_{Z^c_{0,i}|Y}(1|1) curve, given an arbitrary p_{Z^f_{0,i}|Y}(1|0) curve. The goal of the counterfactual generator is to fool the discriminator. Therefore, its optimal strategy is to match the counterfactual rationale distribution with the factual rationale distribution. As shown in figure 2(middle), the p_{Z^c_{0,i}|Y}(1|1) (blue) curve tries to overlay the p_{Z^f_{0,i}|Y}(1|0) (green) curve, within the limits of the counterfactual upper-bound constraint.

The optimal strategy for the factual generator: The goal of the factual generator is to help the discriminator. Therefore, its optimal strategy, given the optimized counterfactual generator, is to "steer" the factual rationale distribution away from the counterfactual rationale distribution. Recall that the counterfactual rationale distribution always tries to match the factual rationale distribution, unless its upper-bound is binding. The factual generator will therefore choose the words whose factual upper-bound is much higher than the counterfactual upper-bound. These words are, by definition, most indicative of class 0. The counterfactual generator will also favor the same set of words, due to its incentive to match the distributions. 
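This argument can be made concrete with a small numeric sketch (all occurrence probabilities are invented): when the factual generator selects class-0 words, the counterfactual generator cannot match it because its upper-bound binds, whereas neutral words are matched exactly.

```python
import numpy as np

# Hypothetical word-occurrence probabilities, sorted as in figure 2:
# class-0 words on the left, class-1 words on the right.
p_x_y0 = np.array([0.8, 0.6, 0.3, 0.3, 0.1])  # factual upper-bound
p_x_y1 = np.array([0.2, 0.1, 0.3, 0.6, 0.8])  # counterfactual upper-bound

def best_counterfactual(p_zf_y0):
    # Equation-(7)-style response: match the factual rationale distribution
    # wherever the counterfactual upper-bound allows.
    return np.minimum(p_zf_y0, p_x_y1)

def mismatch(p_zf_y0):
    # Total gap between the two rationale distributions, i.e. the signal
    # the discriminator can exploit.
    return float(np.abs(p_zf_y0 - best_counterfactual(p_zf_y0)).sum())
```

Selecting the two class-0 words leaves a large unmatchable gap, while selecting the middle (neutral) word leaves none.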
Figure 2(right) illustrates the optimal strategy for the factual rationale under the sparsity constraint

Σ_{i=1}^{N} E[Z^f_{0,i}] = Σ_{i=1}^{N} p_{Z^f_{0,i}|Y}(1|0) ≤ α.   (6)

The left-hand side of equation (6) represents the expected factual rationale length (in number of words). It also represents the area under the p_{Z^f_{0,i}|Y}(1|0) curve (the green shaded areas in figure 2).

Figure 2: An illustration of how CAR works in the bag-of-word scenario with the independence assumption (equation (4)). Left: example probability of occurrence of each word in the rationales from each class (solid lines), upper bounded by the probability of occurrence of each word in the natural text from each class (dashed lines). Middle: the optimal strategy for the counterfactual rationale is to match the factual rationale distribution, unless prohibited by the upper-bound. Right: the optimal strategy for the factual rationale is to steer away from the counterfactual rationale distribution, leveraging the upper-bound difference.

3.4 Information-theoretic Analysis

Now we are ready to embark on a more formal analysis of the effectiveness of the CAR framework, as stated in the following theorem.

Theorem 1. 
In the bag-of-word scenario with the independence assumption as in equation (4):

(1) Given the optimal d_0(·) and an arbitrary g^f_0(·), the optimal g^c_0(·) for the counterfactual objective in equation (2) generates counterfactual rationales that follow the distribution

p_{Z^c_{0,i}|Y}(1|1) = min{ p_{Z^f_{0,i}|Y}(1|0), p_{X_i|Y}(1|1) }.   (7)

(2) Under some additional assumptions (see appendix A.1), given the optimal d_0(·) and the optimal g^c_0(·), the optimal g^f_0(·) for the factual objective in equation (2), subject to the sparsity constraint in equation (6), selects the words in I*, i.e. Z^f_{0,i} = X_i for i ∈ I* (and 0 otherwise), where

I* = argmax_I E_{X ∼ p_{X|Y}(·|0)}[ h( p_{X_I|Y}(X_I|0) / p_{X_I}(X_I) ) ],  s.t. p_{X_i|Y}(1|0) > p_{X_i|Y}(1|1), ∀i ∈ I,   (8)

where X_I denotes the subvector of X containing X_i, ∀i ∈ I.

The proof is given in the appendix. To better understand equation (8), it is useful to first write down the mutual information between X_I and Y, a similar quantity to which has been applied in the maximum mutual information criterion [12, 19]:

I(Y; X_I) = E_{X,Y ∼ p_{X,Y}(·,·)}[ log( p_{X_I|Y}(X_I|Y) / p_{X_I}(X_I) ) ] = Σ_{y=0}^{1} p_Y(y) E_{X ∼ p_{X|Y}(·|y)}[ log( p_{X_I|Y}(X_I|y) / p_{X_I}(X_I) ) ].   (9)

As can be seen, there is a correspondence between equations (8) and (9). First, the log(·) function in equation (9) is generalized in equation (8) to a wider selection of functional forms h(·). As shown in appendix A.2, equation (8) applies the f-divergence [1], a generalization of the KL-divergence applied in equation (9). Second, notice that equation (9) decomposes into two class-dependent terms, while equation (8) concerns the class-0 generators only. It can be easily shown that the class-1 generators come with a similar theoretical guarantee that corresponds to the term with y = 1. 
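In this simplified setting, the decomposition in equation (9) can be checked by direct enumeration. The sketch below uses invented Bernoulli parameters and h = log:

```python
import itertools, math

def classwise_mi(p_word, p_y, y_target):
    # p_word[y][i]: probability that word i occurs given Y = y, under the
    # independent bag-of-word model of equation (4); p_y[y]: class prior.
    # Returns the y_target term of the decomposition in equation (9):
    # p_Y(y) * E_{X ~ p(.|y)}[ log p(X|y) / p(X) ], a scaled KL divergence.
    n = len(p_word[0])
    total = 0.0
    for x in itertools.product([0, 1], repeat=n):
        p_xy = [math.prod(p if xi else 1 - p for p, xi in zip(p_word[y], x))
                for y in (0, 1)]
        p_x = p_y[0] * p_xy[0] + p_y[1] * p_xy[1]
        total += p_y[y_target] * p_xy[y_target] * math.log(p_xy[y_target] / p_x)
    return total
```

Summing the y = 0 and y = 1 terms recovers I(Y; X_I); each term is nonnegative, and both vanish when the two classes share the same word distribution.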
Therefore, the target function in equation (8) can be considered as the component of the mutual information that is specifically related to class 0. Hence we call it the class-wise mutual information.

3.5 Coping with Degeneration

It has been pointed out in [32] that the existing generator-predictor framework in [12] and [19] can suffer from the problem of degeneration. Since the generator-predictor framework aims to maximize the predictive accuracy of the predictor, the generator and predictor can collude by selecting uninformative symbols to encode the class information, instead of selecting words and phrases that truly explain the class. For example, consider the following punctuation communication scheme: when Y = 0, the rationale selects only one comma ","; when Y = 1, the rationale selects only one period ".". This rationalization scheme guarantees a high predictive accuracy. However, it is apparently not what we expect. Such cases are called degeneration.

From section 3.3, we can conclude that CAR will not suffer from degeneration. This is because if the factual rationale generators attempt to select uninformative words or symbols like punctuation (i.e. words in the middle of the x-axis in figure 2), then the factual rationale distribution can be easily matched by the counterfactual rationale distribution. 
Therefore, this strategy is not optimal for the factual generators, whose goal is to avoid being matched by the counterfactual generators.

4 Architecture Design and Implementation

Architecture with parameter sharing: In our actual implementation, we impose parameter sharing among the players. This is motivated by our observation in sections 3.3 and 3.4 that both the factual and counterfactual generators adopt the same rationalization strategy upon reaching the equilibrium. Therefore, instead of having two separate networks for the two generators, we introduce one unified generator network for each class, a class-0 generator and a class-1 generator, with the ground truth label Y as an additional input to distinguish between the factual and counterfactual modes. Specifically, g^f_0(·) and g^c_0(·) now share the same parameters in a single generator network g_0(·, Y), where g^f_0(·) = g_0(·, 0) and g^c_0(·) = g_0(·, 1). Please note that after the parameter sharing, g_0(·, 0) and g_0(·, 1) are still considered two distinct players, in the sense that they are still trained to optimize different target functions (equation (2)), and they still play the same adversarial game with each other. Similarly, g^f_1(·) and g^c_1(·) share the same parameters in a single generator network g_1(·, Y). We also impose parameter sharing between the two discriminators, d_0(·) and d_1(·), by introducing a unified discriminator, d(·, t), with an additional input t to distinguish between the class-0 and class-1 cases. Parameter sharing significantly reduces the number of trainable parameters.

Both the generators and the discriminators consist of a word embedding layer and a bidirectional LSTM layer followed by a linear projection layer. The generators produce the rationales by the independent selection process proposed in [19]. 
At each word position k, the projection layer outputs a quantized binary mask S_k, which equals 1 if the k-th word is selected and 0 otherwise. The binary masks are multiplied with the corresponding words to produce the rationales. For the discriminators, the outputs at all time steps are max-pooled to produce the factual/counterfactual decision.

For parameter sharing, we append the input class as a one-hot vector to each word embedding vector in both the generators and the discriminator. For the generators, the ground-truth class label Y of each instance is appended, while for the discriminator, the class t of the generator used for generating the input rationale is appended.

Training: The training objectives are essentially equations (1) and (2). The only difference is that we instantiate the constraints in equation (2) and transform them into a multiplier form. Specifically, the multiplier terms (or the regularization terms) are

λ_1 | (1/K) E[||S||_1] − α | + λ_2 E[ Σ_{k=2}^{K} |S_k − S_{k−1}| ],   (10)

where K denotes the number of words in the input text. The first term constrains the sparsity of the rationale: it encourages the percentage of words selected as rationales to be close to a preset level α. The second term constrains the continuity of the rationale. λ_1, λ_2 and α are hyperparameters. The constraint is slightly different from the one in [19] in order to have more precise control of the sparsity level. The h_0(·) and h_1(·) functions in equation (2) are set to h_0(x) = h_1(x) = x, which empirically shows good convergence performance and which can be shown to satisfy equation (3). To resolve the non-differentiable quantization operation that produces S_k, we apply the straight-through gradient computation technique [9]. 
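The regularizer in equation (10) is straightforward to compute from a batch of binary masks. A minimal sketch (mask values and hyperparameters invented; not the authors' code):

```python
import numpy as np

def rationale_regularizer(S, alpha, lam1, lam2):
    # S: batch of 0/1 selection masks, shape (batch, K).
    # Sparsity term: the average fraction of selected words should stay
    # near the preset level alpha. Continuity term: penalize 0/1
    # transitions so selections form consecutive segments (equation (10)).
    sparsity = lam1 * abs(S.mean() - alpha)
    continuity = lam2 * np.abs(np.diff(S, axis=1)).sum(axis=1).mean()
    return sparsity + continuity
```

A contiguous span and a scattered selection of the same length differ only through the continuity term, so the scattered mask is penalized more.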
The training scheme involves the following alternating stochastic gradient descent. First, the class-0 generator and the discriminator are updated jointly by passing one batch of data into the class-0 generator, and the resulting rationales, which contain both factual and counterfactual rationales depending on the actual class, are fed into the discriminator with t = 0. Then, the class-1 generator and the discriminator are updated jointly in a similar fashion with t = 1.

Inference: During inference, the ground truth label is unavailable for fair comparisons with the baselines; therefore, we have no oracle knowledge of which class is factual and which is counterfactual. In this case, we always trigger the factual generators, no matter what the ground truth is, as shown in figure 1(b). This is again justified by our observation in sections 3.3 and 3.4 that both the factual and counterfactual modes adopt the same rationalization strategy upon reaching the equilibrium. The only reason why we favor the factual mode over the counterfactual mode is that the former has more exposure during training to the words it is supposed to select.

5 Experiments

5.1 Datasets

To evaluate both factual and counterfactual rationale generation, we consider the following three binary classification datasets. The first one is the single-aspect Amazon reviews [10] (book and electronic domains), where the input texts often contain evidence for both positive and negative sentiments. We use predefined rules to parse reviews containing comments on both the pros and cons of a product, which are further used for automatic evaluations. We also evaluate algorithms on the multi-aspect beer [23] and hotel reviews [29] that are commonly used in the field of rationalization [8, 19]. The labels of the beer review dataset are binarized, resulting in a harder rationalization task than in [19]. 
The multi-aspect reviews are considered a more challenging task, where each review contains comments on different aspects. However, unlike the Amazon dataset, the beer and hotel datasets only contain factual annotations. The construction of the evaluation tasks is detailed in appendix B.1.

5.2 Baselines

RNP: A generator-predictor framework proposed by Lei et al. [19] for rationalizing neural prediction (RNP). The generator selects text spans as rationales, which are then fed to the predictor for label classification. The selection maximizes the predictive accuracy of the target output and is constrained to be sparse and continuous. RNP is only able to generate factual rationales.

POST-EXP: The post-explanation method generates rationales of both positive and negative classes based on a pre-trained predictor. Given the predictor trained on full-text inputs, we train two separate generators g_0(X) and g_1(X) on the data to be explained. g_0(X) always generates rationales for the negative class and g_1(X) always generates rationales for the positive class. The two generators are trained to maximize the respective logits of the fixed predictor subject to sparsity and continuity regularizations, which is closely related to gradient-based explanations [20].

To seek fair comparisons, the predictors of both RNP and POST-EXP and the discriminator of CAR are of the same architecture; the rationale generators in all three methods are of the same architecture. The hidden state size of all LSTMs is set to 100. In addition, the sparsity and continuity constraints are of the same form as in our method. It is important to point out that CAR does not use any ground truth label for generating rationales, following the procedure discussed in section 4.

5.3 Experiment Settings

Objective evaluation: We compare the generated rationales with the human annotations and report the precision, recall and F1 score. 
To be consistent with previous studies [19], we evaluate different algorithms conditioned on a similar actual sparsity level in factual rationales. Specifically, the target factual sparsity level is set to around (±2%) 20% for the Amazon dataset and 10% for both beer and hotel reviews. The reported performances are based on the best performance over a set of hyperparameter values. For details of the setting, please refer to appendix B.2.

Subjective evaluation: We also conduct subjective evaluations via Amazon Mechanical Turk. Specifically, we reserve 100 randomly balanced examples from each dev set for the subjective evaluations. For the single-aspect dataset, the subject is presented with either the factual rationale or the counterfactual rationale of a text generated by one of the three methods (unselected words blocked). For the factual rationales, a success is credited when the subject correctly guesses the ground-truth sentiment; for the counterfactual rationales, a success is credited when the subject is convinced to choose the opposite sentiment to the ground truth. For the multi-aspect datasets, we introduce a much harder test. In addition to guessing the sentiment, the subject is also asked to guess what aspect the rationale is about. A success is credited only when both the intended sentiment and the correct aspect are chosen. Under this criterion, a generator that picks only sentiment words will score poorly. We then compute the success rate as the performance metric. The test cases are randomly shuffled. The subjects have to meet certain English proficiency requirements and are reminded that some of the generated

Table 1: Objective performances of selected rationales on the Amazon review dataset. The numbers in each column represent the sparsity level, precision, recall, and F1 score, respectively. Each domain is trained independently. 
All results are calculated in a \u201cmicro\u201d perspective.\n\nAmazon\n\nRNP [19]\nPOST-EXP\nCAR\n\nBook\n\nFactual\n\nCounterfactual\n\nFactual\n\nElectronic\n\nCounterfactual\n\n18.6/55.1/20.1/29.5\n20.2/64.5/28.8/39.8 27.9/70.2/35.8/47.4\n20.9/68.7/31.9/43.6 15.2/72.2/20.2/31.5\n\n-\n\n20.7/49.7/22.8/31.3\n18.6/64.1/27.8/38.8 15.3/72.6/19.5/30.7\n21.2/70.0/34.7/46.4\n10.2/76.4/13.6/23.1\n\n-\n\nTable 2: Objective performances of selected factual rationales for both (a) beer and (b) hotel review datasets.\nEach aspect is trained independently. S, P, R, and F1 indicate the sparsity level, precision, recall, and F1 score.\n\n(a)\n\n(b)\n\nBeer\n\nRNP [19]\nPOST-EXP\nCAR\n\nHotel\n\nRNP [19]\nPOST-EXP\nCAR\n\nS\n11.9\n11.9\n11.9\n\nS\n10.9\n8.9\n10.6\n\nAppearance\nR\nP\n46.1\n72.0\n64.2\n41.4\n49.3\n76.2\n\nLocation\nR\nP\n43.3\n55.5\n31.8\n30.4\n46.6\n58.1\n\nF1\n56.2\n50.4\n59.9\n\nF1\n48.6\n31.1\n51.7\n\nS\n10.7\n10.3\n10.3\n\nS\n11.0\n10.0\n11.7\n\nAroma\nP\nR\n48.3\n70.5\n50.0\n33.1\n33.3\n50.3\n\nService\nP\nR\n38.2\n40.0\n28.3\n32.5\n40.7\n41.4\n\nF1\n57.3\n39.8\n40.1\n\nF1\n39.1\n30.3\n41.1\n\nS\n10.0\n10.0\n10.2\n\nS\n10.6\n9.2\n9.9\n\nPalate\n\nP\n53.1\n33.0\n56.6\n\nR\n42.8\n26.5\n46.2\n\nCleanliness\nR\nP\n36.0\n30.5\n23.7\n23.0\n32.3\n35.7\n\nF1\n47.5\n29.4\n50.9\n\nF1\n33.0\n23.3\n33.9\n\nrationales are intended to trick them via word selections and masking (e.g. masking the negation\nwords). Appendix B.2 contains a screenshot and the details of the online evaluation setups.\n\n5.4 Results\n\nTable 1 shows the objection evaluation results for both factual and counterfactual rationales on\nAmazon reviews. Constrained to highlighting 20% of the inputs, CAR consistently surpasses the other\ntwo baselines in the factual case for both domains. Compared to the POST-EXP, our method generates\nthe counterfactual rationales with higher precision. 
However, since the sparsity constraint regularizes both factual and counterfactual generations, and model selection is conducted on the factual sparsity only, we cannot control the counterfactual sparsity across the different algorithms. POST-EXP tends to highlight much more text, resulting in higher recall and F1 scores. However, as will be seen later, the human evaluators still favor the counterfactual rationales generated by our algorithm.
Since the beer and hotel datasets contain factual annotations only, we report objective evaluation results for the factual rationales in table 2. CAR achieves the best performance in five out of the six cases in the multi-aspect setting. Specifically, for the hotel reviews, CAR achieves the best performance in all three aspects. Similarly, CAR delivers the best performance for the appearance and palate aspects of the beer review dataset, but fails on the aroma aspect. One possible reason for this failure is that, compared to the other aspects, the aroma reviews often have annotated ground truths containing mixed sentiments. Therefore, CAR has low recall on these annotations even when it successfully selects all the correct class-wise rationales. Also, to fulfill the sparsity constraint, CAR sometimes has to select irrelevant aspect words carrying the desired sentiment, which decreases the precision. Illustrative examples of this behavior can be found in appendix B.3. Note that the RNP results are not directly comparable to those in [19], because the labels are binarized under our experiment setting.
We visualize the generated rationales on the appearance aspect of the beer reviews in figure 3. More examples from the other datasets can be found in appendix B.3. We observe that the CAR model is able to produce meaningful justifications for both factual and counterfactual labels.
The factual generator picks "two inches of frothy light brown head with excellent retention" while the counterfactual one picks "really light body like water". By reading these selected texts alone, humans will easily predict a positive sentiment in the first case and be tricked in the counterfactual case.

Beer - Appearance
Label - Positive
poured into pint glass . a : used motor oil color . two inches of frothy light brown head with excellent retention and quite a bit of lacing . nice cascade going for a while . s : oatmeal is the biggest component of the aroma . not any hops content . a bit fusely and a bit of alcohol . t : tastes like slightly sour nothing . i do n't know what the hell made this dark because their is no crystal malt or roasted barley component in the taste . this sucks . m : light body , really light body like water . carbonation is fine , but that 's about it . d : this is slightly sour water . how does anybody like this ?

Figure 3: Examples of CAR-generated rationales on the appearance aspect of the beer reviews. All selected words are bold and underlined. Factual selections use blue highlights while counterfactual ones use red.

Figure 4: Subjective performances of generated rationales for both (a) factual and (b) counterfactual cases. For the Amazon reviews, subjects are asked to guess the sentiment based on the generated rationales, so a random guess achieves 50% accuracy. For the multi-aspect beer and hotel reviews, subjects need to guess both the sentiment and the aspect the rationale is about, which lowers the random-guess accuracy to 16.67%.

Finally, we present the subjective evaluations in figure 4. Similar to the observations in the objective studies, CAR achieves the best performance in almost all cases, with two exceptions. The first is the aroma aspect of the beer reviews, whose potential causes we have already discussed. The second is the counterfactual performance on the cleanliness aspect of the hotel reviews, where both POST-EXP and CAR fail to trick humans. One potential reason is that the reviews on cleanliness are often very short and their valence is very clear, without a mix of sentiments. Thus, it is very challenging to generate counterfactual rationales that trick a human. This can be verified by the analysis in appendix B.3. Specifically, according to figure 4, 69% of the time CAR is able to trick people into guessing the counterfactual sentiment, but often with rationales extracted from the other aspects.

6 Conclusion

In this paper, we propose a game theoretic approach to class-wise rationalization, where the method is trained to generate supporting evidence for any given label. The framework consists of three types of players, which competitively select text spans for both factual and counterfactual scenarios. We theoretically demonstrate that the proposed game theoretic framework drives the solution towards meaningful rationalizations in a simplified case. Extensive objective and subjective evaluations on both single- and multi-aspect sentiment classification datasets demonstrate that CAR performs favorably against existing algorithms in terms of both factual and counterfactual rationale generation.

Acknowledgment

We would like to thank Yujia Bao, Yujie Qian, and Jiang Guo from the MIT NLP group for their insightful discussions. We also want to thank Prof.
Regina Barzilay for her support and help.

[Figure 4: bar charts of factual and counterfactual success rates for RNP, POST-EXP, and CAR (ours) across the Amazon domains, beer aspects, and hotel aspects.]

References

[1] Syed Mumtaz Ali and Samuel D Silvey. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society: Series B (Methodological), 28(1):131-142, 1966.

[2] David Alvarez-Melis and Tommi Jaakkola. A causal framework for explaining the predictions of black-box sequence-to-sequence models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 412-421, 2017.

[3] David Alvarez-Melis and Tommi S Jaakkola. Towards robust interpretability with self-explaining neural networks. arXiv preprint arXiv:1806.07538, 2018.

[4] Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Learning to compose neural networks for question answering. arXiv preprint arXiv:1601.01705, 2016.

[5] Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 39-48, 2016.

[6] Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.
PloS one, 10(7):e0130140, 2015.

[7] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. Journal of Machine Learning Research, 11(Jun):1803-1831, 2010.

[8] Yujia Bao, Shiyu Chang, Mo Yu, and Regina Barzilay. Deriving machine attention from human rationales. arXiv preprint arXiv:1808.09367, 2018.

[9] Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.

[10] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440-447, 2007.

[11] Jianbo Chen, Le Song, Martin J Wainwright, and Michael I Jordan. L-shapley and c-shapley: Efficient model interpretation for structured data. arXiv preprint arXiv:1808.02610, 2018.

[12] Jianbo Chen, Le Song, Martin J Wainwright, and Michael I Jordan. Learning to explain: An information-theoretic perspective on model interpretation. arXiv preprint arXiv:1802.07814, 2018.

[13] Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In 2016 IEEE Symposium on Security and Privacy (SP), pages 598-617. IEEE, 2016.

[14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.

[15] Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. Inferring and executing programs for visual reasoning.
In Proceedings of the IEEE International Conference on Computer Vision, pages 2989-2998, 2017.

[16] Igor Kononenko et al. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research, 11(Jan):1-18, 2010.

[17] Guang-He Lee, David Alvarez-Melis, and Tommi S Jaakkola. Towards robust, locally linear deep networks. arXiv preprint arXiv:1907.03207, 2019.

[18] Guang-He Lee, Wengong Jin, David Alvarez-Melis, and Tommi S Jaakkola. Functional transparency for structured data: a game-theoretic approach. arXiv preprint arXiv:1902.09737, 2019.

[19] Tao Lei, Regina Barzilay, and Tommi Jaakkola. Rationalizing neural predictions. arXiv preprint arXiv:1606.04155, 2016.

[20] Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Jurafsky. Visualizing and understanding neural models in NLP. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 681-691, 2016.

[21] Jiwei Li, Will Monroe, and Dan Jurafsky. Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220, 2016.

[22] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765-4774, 2017.

[23] Julian McAuley, Jure Leskovec, and Dan Jurafsky. Learning attitudes and attributes from multi-aspect reviews. In 2012 IEEE 12th International Conference on Data Mining, pages 1020-1025. IEEE, 2012.

[24] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135-1144. ACM, 2016.

[25] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje.
Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 3145-3153. JMLR.org, 2017.

[26] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[27] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.

[28] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 3319-3328. JMLR.org, 2017.

[29] Hongning Wang, Yue Lu, and Chengxiang Zhai. Latent aspect rating analysis on review text data: a rating regression approach. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 783-792. ACM, 2010.

[30] Adam Yala, Constance Lehman, Tal Schuster, Tally Portnoi, and Regina Barzilay. A deep learning mammography-based model for improved breast cancer risk prediction. Radiology, page 182716, 2019.

[31] Mo Yu, Shiyu Chang, and Tommi S Jaakkola. Learning corresponded rationales for text matching. 2018.

[32] Mo Yu, Shiyu Chang, Yang Zhang, and Tommi S Jaakkola. Rethinking cooperative rationalization: Introspective extraction and complement control. In Empirical Methods in Natural Language Processing, 2019.