{"title": "Learning Fuzzy Rule-Based Neural Networks for Control", "book": "Advances in Neural Information Processing Systems", "page_first": 350, "page_last": 357, "abstract": null, "full_text": "Learning Fuzzy Rule-Based Neural \n\nNetworks for  Control \n\nCharles M.  Higgins and Rodney M.  Goodman \n\nDepartment of Electrical  Engineering,  116-81 \n\nCalifornia Institute of Technology \n\nPasadena, CA  91125 \n\nAbstract \n\nA  three-step  method for  function  approximation with  a fuzzy  sys(cid:173)\ntem  is  proposed.  First,  the  membership  functions  and  an  initial \nrule  representation  are  learned;  second,  the  rules  are  compressed \nas  much as  possible  using  information theory;  and finally,  a  com(cid:173)\nputational network  is  constructed  to  compute  the  function  value. \nThis system  is  applied  to  two control examples:  learning the truck \nand trailer backer-upper control system, and learning a  cruise  con(cid:173)\ntrol system for  a  radio-controlled model car. \n\n1 \n\nIntroduction \n\nFunction  approximation is  the  problem of estimating a  function  from  a  set  of ex(cid:173)\namples of its  independent  variables  and function  value.  If there  is  prior knowledge \nof the type of function  being learned,  a mathematical model of the function  can be \nconstructed  and the  parameters  perturbed  until  the  best match is  achieved.  How(cid:173)\never,  if there  is  no  prior  knowledge  of the function,  a  model-free system such  as  a \nneural  network  or  a  fuzzy  system  may  be  employed  to  approximate  an  arbitrary \nnonlinear  function.  A  neural  network's  inherent  parallel  computation  is  efficient \nfor  speed;  however,  the  information learned  is  expressed  only in  the  weights  of the \nnetwork.  The advantage of fuzzy  systems over neural networks  is  that the informa(cid:173)\ntion  learned  is  expressed  in  terms  of linguistic  rules.  
In this paper, we propose a method for learning a complete fuzzy system to approximate example data. The membership functions and a minimal set of rules are constructed automatically from the example data, and in addition the final system is expressed as a computational (neural) network for efficient parallel computation of the function value, combining the advantages of neural networks and fuzzy systems. The proposed learning algorithm can be used to construct a fuzzy control system from examples of an existing control system's actions. \n\nHereafter, we will refer to the function value as the output variable, and the independent variables of the function as the input variables. \n\n2 Fuzzy Systems \n\nIn a fuzzy system, a function is expressed in terms of membership functions and rules. Each variable has membership functions which partition its range into overlapping classes (see figure 1). Given these membership functions for each variable, a function may be expressed by making rules from the input space to the output space and smoothly varying between them. \n\nFigure 1: Membership function example \n\nIn order to simplify the learning of membership functions, we will specify a number of their properties beforehand. First, we will use piecewise linear membership functions. We will also specify that membership functions are fully overlapping; that is, at any given value of the variable the total membership sums to one. Given these two properties of the membership functions, we need only specify the positions of the peaks of the membership functions to completely describe them. 
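The membership-function scheme above can be sketched in code. This is our own illustrative sketch, not the authors' implementation: fully overlapping piecewise-linear (triangular) membership functions evaluated from their peak positions alone, so that the degrees always sum to one.

```python
# Sketch (ours, not the authors' code): fully overlapping triangular
# membership functions, each defined only by its peak position.
def memberships(x, peaks):
    # peaks: sorted peak positions; returns one membership degree
    # per function, summing to one for any x inside the range.
    degrees = [0.0] * len(peaks)
    if x <= peaks[0]:
        degrees[0] = 1.0
        return degrees
    if x >= peaks[-1]:
        degrees[-1] = 1.0
        return degrees
    for i in range(len(peaks) - 1):
        lo, hi = peaks[i], peaks[i + 1]
        if lo <= x <= hi:
            # Linear interpolation between adjacent peaks.
            t = (x - lo) / (hi - lo)
            degrees[i] = 1.0 - t
            degrees[i + 1] = t
            return degrees
    return degrees
```

With peaks at -1, 0, and 1 (as in figure 1), an input of 0.5 yields degrees 0.5 and 0.5 on the two rightmost functions.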
\n\nWe define a fuzzy rule as if y then X, where y (the condition side) is a conjunction in which each clause specifies an input variable and one of the membership functions associated with it, and X (the conclusion side) specifies an output variable membership function. \n\n3 Learning a Fuzzy System from Example Data \n\nThere are three steps in our method for constructing a fuzzy system: first, learn the membership functions and an initial rule representation; second, simplify (compress) the rules as much as possible using information theory; and finally, construct a computational network with the rules and membership functions to calculate the function value given the independent variables. \n\n3.1 Learning the Membership Functions \n\nBefore learning, two parameters must be specified: first, the maximum allowable RMS error of the approximation from the example data; second, the maximum number of membership functions for each variable. The system will not exceed this number of membership functions, but may use fewer if the error is reduced sufficiently before the maximum number is reached. \n\n3.1.1 Learning by Successive Approximation to the Target Function \n\nThe following procedure is performed to construct membership functions and a set of rules to approximate the given data set. All of the rules in this step are cell-based; that is, they have a condition for every input variable, and there is a rule for every combination of input variables (cells). \n\nWe begin with input membership functions at input extrema. The closest example point to each \"corner\" of the input space is found and a membership function for the output is added at its value at the corner point. 
The initial rule set contains a rule for each corner, specifying the closest output membership function to the actual value at that corner. \n\nWe now find the example point with the greatest RMS error from the current model and add membership functions in each variable at that point. Next, we construct a new set of rules to approximate the function. Constructing rules simply means determining the output membership function to associate with each cell. While constructing this rule set, we also add any output membership functions which are needed. The best rule for a given cell is found by finding the closest example point to the rule (recall each rule specifies a point in the input space). If the output value at this point is \"too far\" from the closest output membership function value, this output value is added as a new output membership. After this addition has been made, if necessary, the closest output membership function to the value at the closest point is used as the conclusion of the rule. At this point, if the error threshold has been reached or all membership functions are full, we exit. Otherwise, we go back to find the point with the greatest error from the model and iterate again. \n\n3.2 Simplifying the Rules \n\nIn order to have as simple a fuzzy system as possible, we would like to use the minimum possible number of rules. The initial cell-based rule set can be \"compressed\" into a minimal set of rules; we propose the use of an information-theoretic algorithm for induction of rules from a discrete data set [1] for this purpose. The key to the use of this method is the interpretation of each of the original rules as a discrete example. The rule set becomes a discrete data set which is input to a rule-learning algorithm. 
This algorithm learns the best rules to describe the data set. \n\nThere are two components of the rule-learning scheme. First, we need a way to tell which of two candidate rules is the best. Second, we need a way to search the space of all possible rules in order to find the best rules without simply checking every rule in the search space. \n\n3.2.1 Ranking Rules \n\nSmyth and Goodman [2] have developed an information-theoretic measure of rule value with respect to a given discrete data set. This measure is known as the j-measure; defining a rule as if y then X, the j-measure can be expressed as follows: \n\nj(X|y) = p(X|y) log2( p(X|y) / p(X) ) + p(~X|y) log2( p(~X|y) / p(~X) ) \n\n[2] also suggests a modified rule measure, the J-measure: \n\nJ(X|y) = p(y) j(X|y) \n\nThis measure discounts rules which are not as useful in the data set in order to remove the effects of \"noise\" or randomness. The probabilities in both measures are computed from relative frequencies counted in the given discrete data set. \n\nUsing the j-measure, examples will be combined only when no error is caused in the prediction of the data set. The J-measure, on the other hand, will combine examples even if some prediction ability of the data is lost. If we simply use the j-measure to compress our original rule set, we don't get significant compression. However, we can tolerate a certain margin of error in prediction of our original rule set and still maintain the same control performance. In order to obtain compression, we wish to allow some error, but not so much as the J-measure will create. 
We thus propose the following measure, which allows a gradual variation of the amount of noise tolerance: \n\nL(X|y) = f(p(y), a) j(X|y)  where  f(x, a) = (1 - e^(-ax)) / (1 - e^(-a)) \n\nThe parameter a may be set at 0+ to obtain the J-measure, since f(x, 0+) = x, or at infinity to obtain the j-measure, since f(x, infinity) = 1 (x > 0). Any value of a between 0 and infinity will result in an amount of compression between that of the J-measure and the j-measure; thus if we are able to tolerate some error in the prediction of the original rule set, we can obtain more compression than the j-measure could give us, but not as much as the J-measure would require. We show an example of the variation of a for the truck backer-upper control system in section 4.1. \n\n3.2.2 Searching for the Best Rules \n\nIn [1], we presented an efficient method for searching the space of all possible rules to find the most representative ones for discrete data sets. The basic idea is that each example is a very specific (and quite perfect) rule. However, this rule is applicable to only one example. We wish to generalize this very specific rule to cover as many examples as possible, while at the same time keeping it as correct as possible. The goodness measures shown above are just the tool for doing this. If we calculate the \"goodness\" of all the rules generated by removing a single input variable from the very specific rule, then we will be able to tell if any of the slightly more general rules generated from this rule are better. If so, we take the best and continue in this manner until no more general rule with a higher \"goodness\" exists. When we have performed this procedure on the very specific rule generated from each example (and removed duplicates), we will have a set of rules which represents the data set. 
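The three rule-ranking measures above can be computed directly from the rule probabilities. The sketch below is our own coding of the formulas (function names are ours); the comments note the limiting behavior of f that makes the L-measure interpolate between the J-measure and the j-measure.

```python
import math

# Sketch (ours) of the rule-ranking measures; p_x = p(X), p_xy = p(X|y).
def j_measure(p_x, p_xy):
    # j(X|y): relative entropy between the posterior and prior of X.
    total = 0.0
    if p_xy > 0.0:
        total += p_xy * math.log2(p_xy / p_x)
    if p_xy < 1.0:
        total += (1.0 - p_xy) * math.log2((1.0 - p_xy) / (1.0 - p_x))
    return total

def f(x, a):
    # f(x, a) = (1 - exp(-a*x)) / (1 - exp(-a));
    # f(x, a) -> x as a -> 0+, and f(x, a) -> 1 as a -> infinity (x > 0).
    return (1.0 - math.exp(-a * x)) / (1.0 - math.exp(-a))

def L_measure(p_y, p_x, p_xy, a):
    # Interpolates between the J-measure (a -> 0+) and j-measure (a -> inf).
    return f(p_y, a) * j_measure(p_x, p_xy)
```

For small a, L_measure(p_y, ...) approaches p_y * j_measure(...), i.e. the J-measure; for large a it approaches the bare j-measure.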
\n\nFigure 2: Computational network constructed from fuzzy system (layers: input membership functions, rules, output membership functions, defuzzification; with lateral inhibitory connections between rule outputs) \n\n3.3 Constructing a Network \n\nConstructing a computational network to represent a given fuzzy system can be accomplished as shown in figure 2. From input to output, layers represent input membership functions, rules, output membership functions, and finally defuzzification. A novel feature of our network is the lateral links shown in figure 2 between the outputs of various rules. These links allow inference with dependent rules. \n\n3.3.1 The Layers of the Network \n\nThe first layer contains a node for every input membership function used in the rule set. Each of these nodes responds with a value between zero and one to a certain region of the input variable range, implementing a single membership function. The second layer contains a node for each rule; each of these nodes represents a fuzzy AND, implemented as a product. The third layer contains a node for every output membership function. Each of these nodes sums the outputs from each rule that concludes that output fuzzy set. The final node simply takes the output memberships collected in the previous layer and performs a defuzzification to produce the final crisp output by normalizing the weights from each output node and performing a convex combination with the peaks of the output membership functions. \n\n3.3.2 The Problem with Dependent Rules and a Solution \n\nThere is a problem with the standard fuzzy inference techniques when used with dependent rules. Consider a rule whose conditions are all contained in a more specific rule (i.e., one with more conditions) which contradicts its conclusion. Using standard fuzzy techniques, the more general rule will drive the output to an intermediate value between the two conclusions. What we really want is that a more general rule dependent on a more specific rule should only be allowed to fire to the degree that the more specific rule is not firing. Thus the degree of firing of the more specific rule should gate the maximum firing allowed for the more general rule. This is expressed in network form in the links between the rule layer and the output membership functions layer. The lateral arrows are inhibitory connections which take the value at their input, invert it (subtract it from one), and multiply it by the value at their output. \n\nFigure 3: The truck and trailer backer-upper problem (variables: cab angle, truck angle, and y position of the truck rear, relative to the loading dock) \n\n4 Experimental Results \n\nIn this section, we show the results of two experiments: first, a truck backer-upper in simulation; and second, a simple cruise controller for a radio-controlled model car constructed in our laboratory. \n\n4.1 Truck and Trailer Backer-Upper \n\nJenkins and Yuhas [3] have developed by hand a very efficient neural network for solving the problem of backing up a truck and trailer to a loading dock. The truck and trailer backer-upper problem is parameterized in figure 3. \n\nThe function approximator system was trained on 225 example runs of the Yuhas controller, with initial positions distributed symmetrically about the field in which the truck operates. 
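The inference network of section 3.3, including the lateral gating of general rules by more specific ones, can be sketched as follows. This is our own minimal coding, not the authors' implementation: product AND in the rule layer, inhibitory gating (multiply by one minus the specific rule's firing), and convex-combination defuzzification over the output membership peaks.

```python
# Simplified sketch (ours) of the inference network of section 3.3.
def infer(rules, memberships, out_peaks):
    # rules: list of (conditions, out_index); conditions maps an input
    # variable name to a membership-function index for that variable.
    # memberships: variable name -> list of membership degrees.
    fire = []
    for conds, _ in rules:
        degree = 1.0
        for var, mf in conds.items():
            degree *= memberships[var][mf]  # fuzzy AND as a product
        fire.append(degree)
    # Lateral inhibition: a general rule fires only to the degree that
    # the more specific rules containing its conditions are not firing.
    gated = list(fire)
    for i, (ci, _) in enumerate(rules):
        for j, (cj, _) in enumerate(rules):
            if i != j and set(ci.items()) < set(cj.items()):
                gated[i] *= (1.0 - fire[j])
    # Sum gated firings per output membership, then defuzzify by a
    # normalized convex combination of the output peaks.
    sums = [0.0] * len(out_peaks)
    for (_, out), g in zip(rules, gated):
        sums[out] += g
    total = sum(sums) or 1.0
    return sum(w * p for w, p in zip(sums, out_peaks)) / total
```

For example, with one input firing two rules at degrees 0.25 and 0.75 toward output peaks -1 and 1, the crisp output is 0.5; a fully general (empty-condition) rule is silenced entirely whenever a contradicting specific rule fires at degree one.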
In order to show the effect of varying the number of membership functions, we have fixed the maximum number of membership functions for the y position and cab angle at 5 and set the maximum allowable error to zero, thus guaranteeing that the system will fill out all of the allowed membership functions. We varied the maximum number of truck angle membership functions from 3 to 9. The effects of this are shown in figure 4. Note that the error decreases sharply and then holds constant, reaching its minimum at 5 membership functions. The Yuhas network performance is shown as a horizontal line. At its best, the fuzzy system performs slightly better than the system it is approximating. \n\nFor this experiment, we set a goal of 33% rule compression. We varied the parameter a in the L-measure for each rule set to get the desired compression. Note in figure 4 the performance of the system with compressed rules. The performance is in every case almost identical to that of the original rule sets. The number of rules and the amount of rule compression obtained can be seen in table 1. \n\nFigure 4: Results of experiments with the truck backer-upper. a) Control error: final y position. b) Control error: final truck angle. \n\nNumber of truck angle membership functions:   3    4    5    6    7    8    9 \nNumber of rules, cell-based:                 75  100  125  150  175  200  225 \nNumber of rules, compressed:                 48   67   86  100  114  138  154 \nCompression:                                36%  33%  31%  33%  35%  31%  32% \n\nTable 1: Number of rules and compression figures for learned TBU systems \n\n4.2 Cruise Controller \n\nIn this section, we describe the learning of a cruise controller to keep a radio-controlled model car driving at a constant speed in a circle. We designed a simple PD controller to perform this task, and then learned a fuzzy system to perform the same task. This example is not intended to suggest that a fuzzy system should replace a simple PD controller, since the fuzzy system may represent far more complex functions, but rather to show that the fuzzy system can learn from real control data and operate in real-time. \n\nThe fuzzy system was trained on 6 runs of the PD controller which included runs going forward and backward, and conditions in which the car's speed was perturbed momentarily by blocking the car or pushing it. Figure 5 shows the error trajectory of both the hand-crafted PD and learned fuzzy control systems from rest. The car builds speed until it reaches the desired set point with a well-damped response, then holds speed for a while. At a later time, an obstacle was placed in the path of the car to stop it and then removed; figure 5 shows the similar recovery responses of both systems. It can be seen from the numerical results in table 2 that the fuzzy system performs as well as the original PD controller. \n\nNo compression was attempted because the rule sets are already very small. 
\n\nTime from 90% error to 10% error (s): PD Controller 0.9, Learned Fuzzy System 0.7 \nRMS error at steady state (uncal): PD Controller 59, Learned Fuzzy System 45 \nTime to correct after obstacle (s): PD Controller 6.2, Learned Fuzzy System 6.2 \n\nTable 2: Analysis of cruise control performance \n\nFigure 5: Performance of PD controller vs. learned fuzzy system (error vs. time; a) PD control system, b) fuzzy control system) \n\n5 Summary and Conclusions \n\nWe have presented a method which, given examples of a function and its independent variables, can construct a computational network based on fuzzy logic to predict the function given the independent variables. The user must only specify the maximum number of membership functions for each variable and the maximum RMS error from the example data. \n\nThe final fuzzy system's actions can be explicitly explained in terms of rule firings. If a system designer does not like some aspect of the learned system's performance, he can simply change the rule set and the membership functions to his liking. This is in direct contrast to a neural network system, in which he would have no recourse but another round of training. \n\nAcknowledgements \n\nThis work was supported in part by Pacific Bell, and in part by DARPA and ONR under grant no. N00014-92-J-1860. \n\nReferences \n\n[1] C. Higgins and R. Goodman, \"Incremental Learning using Rule-Based Neural Networks,\" Proceedings of the International Joint Conference on Neural Networks, vol. 1, 875-880, July 1991. 
\n\n[2] R. Goodman, C. Higgins, J. Miller, and P. Smyth, \"Rule-Based Networks for Classification and Probability Estimation,\" Neural Computation 4(6), 781-804, November 1992. \n\n[3] R. Jenkins and B. Yuhas, \"A Simplified Neural-Network Solution through Problem Decomposition: The Case of the Truck Backer-Upper,\" Neural Computation 4(5), 647-649, September 1992. ", "award": [], "sourceid": 649, "authors": [{"given_name": "Charles", "family_name": "Higgins", "institution": null}, {"given_name": "Rodney", "family_name": "Goodman", "institution": null}]}