Combining Neural and Symbolic Learning to Revise Probabilistic Rule Bases

Advances in Neural Information Processing Systems, pp. 107-114

J. Jeffrey Mahoney and Raymond J. Mooney
mahoney@cs.utexas.edu, mooney@cs.utexas.edu
Dept. of Computer Sciences
University of Texas
Austin, TX 78712

Abstract

This paper describes RAPTURE - a system for revising probabilistic knowledge bases that combines neural and symbolic learning methods. RAPTURE uses a modified version of backpropagation to refine the certainty factors of a MYCIN-style rule base and uses ID3's information gain heuristic to add new rules. Results on refining two actual expert knowledge bases demonstrate that this combined approach performs better than previous methods.

1 Introduction

In complex domains, learning needs to be biased with prior knowledge in order to produce satisfactory results from limited training data. Recently, both connectionist and symbolic methods have been developed for biasing learning with prior knowledge [Fu, 1989; Towell et al., 1990; Ourston and Mooney, 1990]. Most of these methods revise an imperfect knowledge base (usually obtained from a domain expert) to fit a set of empirical data. Some of these methods have been successfully applied to real-world tasks, such as recognizing promoter sequences in DNA [Towell et al., 1990; Ourston and Mooney, 1990]. The results demonstrate that revising an expert-given knowledge base produces more accurate results than learning from training data alone.
In this paper, we describe the RAPTURE system (Revising Approximate Probabilistic Theories Using Repositories of Examples), which combines connectionist and symbolic methods to revise both the parameters and structure of a certainty-factor rule base.

2 The RAPTURE Algorithm

The RAPTURE algorithm breaks down into three main phases. First, an initial rule base (created by a human expert) is converted into a RAPTURE network. The result is then trained using certainty-factor backpropagation (CFBP). The theory is further revised through network architecture modification. Once the network is fully trained, the solution is at hand; there is no need for retranslation. Each of these steps is outlined in full below.

2.1 The Initial Rule Base

RAPTURE uses propositional certainty-factor rules to represent its theories. These rules have the form A →(0.8) D, which expresses the idea that belief in proposition A gives a 0.8 measure of belief in proposition D [Shafer and Pearl, 1990]. Certainty factors can range in value from -1 to +1, and indicate a degree of confidence in a particular proposition. Certainty-factor rules allow updating of these beliefs based upon new observed evidence.

Rules combine evidence via probabilistic sum, which is defined as a ⊕ b = a + b - ab. In general, all positive evidence is combined to determine the measure of belief (MB) for a given proposition, and all negative evidence is combined to obtain a measure of disbelief (MD). The certainty factor is then calculated as CF = MB + MD.

RAPTURE uses this formalism to represent its rule base for a variety of reasons. First, it is perhaps the simplest method that retains the desired evidence-summing aspect of uncertain reasoning. As each rule fires, additional evidence is contributed towards belief in the rule's consequent.
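The evidence-combination scheme just described can be made concrete with a short sketch. The following is illustrative only (not RAPTURE's actual code); it combines positive contributions into MB and negative ones into MD via probabilistic sum, then forms the certainty factor as in the text.

```python
def prob_sum(a, b):
    """Probabilistic sum: a ⊕ b = a + b - ab."""
    return a + b - a * b

def combine(certainties):
    """Combine the certainty-factor contributions of several rules
    for one proposition: positive evidence accumulates into a measure
    of belief (MB), negative evidence into a measure of disbelief (MD),
    and the certainty factor is their sum."""
    mb = 0.0
    md = 0.0
    for c in certainties:
        if c > 0:
            mb = prob_sum(mb, c)
        elif c < 0:
            md = -prob_sum(-md, -c)  # combine magnitudes, keep the sign
    return mb + md

print(combine([0.4, 0.3]))   # 0.4 ⊕ 0.3 = 0.58
```

Note how two moderate pieces of evidence (0.4 and 0.3) yield a combined belief of 0.58: many small pieces of evidence add up, which is exactly the property the text highlights.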
The use of probabilistic sum enables many small pieces of evidence to add up to significant evidence. This is lacking in formalisms that use only MIN or MAX for combining evidence [Valtorta, 1988]. Second, probabilistic sum is a simple, differentiable, non-linear function. This is crucial for implementing gradient descent using backpropagation. Finally, and perhaps most significantly, is the widespread use of certainty factors. Numerous knowledge bases have been implemented using this formalism, which immediately gives our approach a large base of applicability.

2.2 Converting the Rule Base into a Network

Once the initial theory is obtained, it is converted into a RAPTURE network. Building the network begins by mapping all identical propositions in the rule base to the same node in the network. Input features (those only appearing as rule antecedents) become input nodes, and output symbols (those only appearing as rule consequents) become output nodes. The certainty factors of the rules become the weights on the links that connect nodes. Networks for classification problems contain one output for each category. When an example is presented, the certainty factor for each of the categories is computed and the example is assigned to the category with the highest value.

Figure 1: A RAPTURE network

Figure 1 illustrates the following set of rules:

A ∧ B ∧ C → D,  E → D,  C → G,  E ∧ F → G,  H ∧ I → C

As shown in the network, conjuncts must first pass through a MIN node before any activation reaches the consequent node. Note that each of the conjuncts is connected to the corresponding MIN node with a solid line. This represents the fact that the link is non-adjustable, and simply passes its full activation value onto the MIN node.
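A forward pass through such a network can be sketched as follows. This is a minimal illustration, not RAPTURE's implementation: the certainty factors attached to the rules are made up for the example, negative evidence is omitted for brevity, and the consequents are simply processed in dependency order (C feeds D and G).

```python
# Rules from the example network: (antecedents, consequent, certainty factor).
# The certainty-factor values here are invented for illustration.
RULES = [
    (["A", "B", "C"], "D", 0.8),
    (["E"], "D", 0.5),
    (["C"], "G", 0.6),
    (["E", "F"], "G", 0.7),
    (["H", "I"], "C", 0.9),
]

def prob_sum(a, b):
    """Probabilistic sum: a ⊕ b = a + b - ab."""
    return a + b - a * b

def evaluate(inputs):
    """Forward pass: conjunctive antecedents go through a MIN node,
    the rule's adjustable certainty-factor link scales the result, and
    contributions to a consequent combine by probabilistic sum."""
    act = dict(inputs)                       # input-node activations
    for target in ["C", "D", "G"]:           # dependency order: C before D, G
        total = act.get(target, 0.0)
        for ants, cons, cf in RULES:
            if cons == target and all(a in act for a in ants):
                total = prob_sum(total, cf * min(act[a] for a in ants))
        act[target] = total
    return act

print(evaluate({"A": 0.0, "B": 0.0, "E": 0.5, "F": 1.0, "H": 1.0, "I": 1.0}))
```

With these inputs, the H ∧ I rule establishes C = 0.9, which in turn contributes 0.54 toward G; combined with the E ∧ F rule's 0.35 by probabilistic sum, G reaches 0.701, illustrating how intermediate nodes propagate evidence upward.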
The standard (certainty-factor) links are drawn as dotted lines, indicating that their values are adjustable.

This construction shows how easily a RAPTURE network can model a MYCIN rule base. Each representation can be converted into the other, without loss or corruption of information. They are two equivalent representations of the same set of rules.

2.3 Certainty Factor Backpropagation

Using the constructed RAPTURE network, we desire to maximize its predictive accuracy over a set of training examples. Cycling through the examples one at a time, and slightly adjusting all relevant network weights in a direction that will minimize the output error, results in hill-climbing to a local minimum. This is the idea behind gradient descent [Rumelhart et al., 1986], which RAPTURE accomplishes with Certainty Factor Backpropagation (CFBP), using the following equations:

\Delta_p w_{ji} = \eta \, \delta_{pj} \, o_{pi} \Big( 1 \pm \bigoplus_{k \neq i} w_{jk} o_{pk} \Big)   (1)

\delta_{pj} = t_{pj} - o_{pj}   if u_j is an output unit   (2)

\delta_{pj} = \bigoplus_{k_{min}} \delta_{pk} w_{kj} \Big( 1 \pm \bigoplus_{i \neq j} w_{ki} o_{pi} \Big)   if u_j is not an output unit   (3)

The \bigoplus ("sigma with circle") notation represents probabilistic sum over the index, and the \pm notation is shorthand for two separate cases: if w_{ji} o_{pi} \geq 0, then - is used; otherwise + is used. The k_{min} subscript refers to the fact that we do not perform this summation for every unit k (as in standard backpropagation), but only over those units that received some contribution from unit j. Since a unit j may be required to pass through a MIN or MAX node before reaching the next layer (k), it is possible that its value may not reach k.

RAPTURE deems a classification correct when the output value for the correct category is greater than that of any other category. No error propagation takes place in this case (\delta_{pj} = 0). CFBP terminates when overall error reaches a minimal value.
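For intuition, equation (1) can be exercised on the simplest possible case: a single output unit whose positively weighted inputs combine by probabilistic sum. The sketch below is an assumption-laden illustration, not RAPTURE's code: it handles only the positive-evidence (minus-sign) branch of the ± and uses the output-unit error signal t - o. Since the probabilistic sum of the inputs is 1 - ∏(1 - w_k o_k), its partial derivative with respect to w_i is o_i times (1 minus the probabilistic sum of the remaining terms), which is exactly the factor in equation (1).

```python
def prob_sum_all(vals):
    """Probabilistic sum folded over a list: ⊕ vals."""
    s = 0.0
    for v in vals:
        s = s + v - s * v
    return s

def cfbp_update(weights, inputs, target, eta=0.1):
    """One CFBP step (equation 1, positive-evidence case) for a single
    output unit whose weighted inputs combine by probabilistic sum."""
    out = prob_sum_all([w * o for w, o in zip(weights, inputs)])
    delta = target - out                    # output-unit error signal
    new_weights = []
    for i, (w, o) in enumerate(zip(weights, inputs)):
        # Probabilistic sum of all the *other* weighted inputs (k ≠ i).
        rest = prob_sum_all([wk * ok for k, (wk, ok)
                             in enumerate(zip(weights, inputs)) if k != i])
        new_weights.append(w + eta * delta * o * (1 - rest))
    return new_weights, out

weights, out = cfbp_update([0.4, 0.3], [1.0, 1.0], target=1.0)
print(out, weights)   # output 0.58; both weights nudged toward the target
```

Note the (1 - rest) factor: the more the remaining evidence already saturates the output, the smaller the gradient on this link, which is the characteristic behavior of probabilistic sum.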
2.4 Changing the Network Architecture

Whenever training accuracy fails to reach 100% through CFBP, it may be an indication that the network architecture is inappropriate for the current classification task. To date, RAPTURE has been given two ways of changing network architecture. First, whenever the weight of a link in the network approaches zero, it is removed from the network along with all of the nodes and links that become detached due to this removal. Further, whenever an intermediate node loses all of its input links due to link deletion, it too is removed from the network, along with its output link. This link/node deletion is performed immediately after CFBP, and before anything new is introduced into the network.

RAPTURE also has a method for adding new nodes into the network. Specific nodes are added in an attempt to maximize the number of training examples that are classified correctly. The simple solution employed by RAPTURE is to create new input nodes that connect directly, either positively or negatively, to one or more output nodes. These new nodes are created in a way that will best help the network distinguish among training examples that are being misclassified. Specifically, RAPTURE attempts to distinguish, for each output category, those examples of that category that are being misclassified (i.e., being classified into a different output category) from those examples that do belong in these different output categories. Quinlan's ID3 information gain metric [Quinlan, 1986] has been adopted by RAPTURE to select this new node, which becomes positive evidence for the correct category and negative evidence for mistaken categories.

With these new nodes in place, we can now return to CFBP, where hopefully more training examples will be successfully classified.
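The node-selection criterion can be sketched with the standard ID3 information-gain computation. This is a generic illustration of the metric for a boolean feature, in the spirit of RAPTURE's selection step; the function names and data layout are our own, not the system's.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def info_gain(examples, labels, feature):
    """ID3 information gain of a boolean feature: entropy of the label
    set minus the weighted entropy of the subsets the feature induces."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [l for e, l in zip(examples, labels) if e[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

examples = [{"f": True}, {"f": True}, {"f": False}, {"f": False}]
labels = ["pos", "pos", "neg", "neg"]
print(info_gain(examples, labels, "f"))   # 1.0: a perfect split
```

RAPTURE would apply this metric to the misclassified examples of each category, picking the feature with the highest gain as the new input node.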
This entire process (CFBP followed by deleting links and adding new nodes) repeats until all training examples are correctly classified. Once this has occurred, the network is considered trained, and testing may begin.

[Figure: Soybean test accuracy]
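The overall revision loop can be summarized schematically. The sketch below is purely structural: the four callables stand in for the phases described above (CFBP, link/node deletion, misclassification check, and information-gain node addition) and are assumptions, not RAPTURE's interfaces; the max_rounds cap is added here as a safeguard.

```python
def revise(cfbp_step, prune, add_nodes, classify, examples, max_rounds=10):
    """Outer loop of the revision process: run CFBP, prune near-zero
    links, and add ID3-selected input nodes until every training
    example is classified correctly."""
    for _ in range(max_rounds):
        cfbp_step()                                 # certainty-factor backprop
        prune()                                     # delete near-zero links/nodes
        missed = [e for e in examples if not classify(e)]
        if not missed:
            return True                             # 100% training accuracy
        add_nodes(missed)                           # new nodes via info gain
    return False
```

Once this loop exits successfully, the trained network can be evaluated on held-out test data, as in the soybean experiment whose accuracy curve appeared in the original figure.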