{"title": "Reasoning about Time and Knowledge in Neural Symbolic Learning Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 921, "page_last": 928, "abstract": "", "full_text": "Reasoning about Time and Knowledge in Neural-Symbolic Learning Systems\n\nArtur S. d'Avila Garcez¹ and Luis C. Lamb²\n\n¹Dept. of Computing, City University London\nLondon, EC1V 0HB, UK (aag@soi.city.ac.uk)\n\n²Dept. of Computing Theory, PPGC-II-UFRGS\nPorto Alegre, RS 91501-970, Brazil (lamb@inf.ufrgs.br)\n\nAbstract\n\nWe show that temporal logic and combinations of temporal logics and modal logics of knowledge can be effectively represented in artificial neural networks. We present a Translation Algorithm from temporal rules to neural networks, and show that the networks compute a fixed-point semantics of the rules. We also apply the translation to the muddy children puzzle, which has been used as a testbed for distributed multi-agent systems. We provide a complete solution to the puzzle with the use of simple neural networks, capable of reasoning about time and of knowledge acquisition through inductive learning.\n\n1 Introduction\n\nHybrid neural-symbolic systems concern the use of problem-specific symbolic knowledge within the neurocomputing paradigm (d'Avila Garcez et al., 2002a). Typically, translation algorithms from a symbolic to a connectionist representation and vice versa are employed to provide either (i) a neural implementation of a logic, (ii) a logical characterisation of a neural system, or (iii) a hybrid learning system that brings together features from connectionism and symbolic artificial intelligence (Hölldobler, 1993). 
\n\nUntil recently, neural-symbolic systems were not able to fully represent, reason about and learn expressive languages other than propositional logic and fragments of first-order logic (Cloete & Zurada, 2000). However, in (d'Avila Garcez et al., 2002b; d'Avila Garcez et al., 2002c; d'Avila Garcez et al., 2003), a new approach to knowledge representation and reasoning in neural-symbolic systems based on neural network ensembles was introduced. This approach shows that modal logics can be effectively represented in artificial neural networks.\n\nIn this paper, following the approach introduced in (d'Avila Garcez et al., 2002b; d'Avila Garcez et al., 2002c; d'Avila Garcez et al., 2003), we move one step further and show that temporal logics can be effectively represented in artificial neural networks. This is done by providing a translation algorithm from temporal logic theories to the initial architecture of a neural network. A theorem then shows that the translation is correct by proving that the network computes a fixed-point semantics of its corresponding temporal theory (van Emden & Kowalski, 1976). The result is a new learning system capable of reasoning about knowledge and time. We have validated the Connectionist Temporal Logic (CTL) proposed here by applying it to a distributed time and knowledge representation problem known as the muddy children puzzle (Fagin et al., 1995).\n\nArtur Garcez is partly supported by the Nuffield Foundation. Luis Lamb is partly supported by CNPq. The authors would like to thank the referees for their comments.\n\nCTL provides a combined (multi-modal) connectionist system of knowledge and time, which allows the modelling of evolving situations such as changing environments or possible worlds. 
Although a number of multi-modal systems (e.g., combining knowledge and time (Halpern & Vardi, 1986; Halpern et al., 2003) and combining beliefs, desires and intentions (Rao & Georgeff, 1998)) have been proposed for distributed knowledge representation, little attention has been paid to the integration of a learning component for knowledge acquisition. This work contributes to bridging this gap by allowing the knowledge representation to be integrated in a neural learning system. Purely from the point of view of knowledge representation in neural-symbolic systems, this work contributes to the long-term aim of representing expressive and computationally well-behaved symbolic formalisms in neural networks.\n\nThe remainder of this paper is organised as follows. We start, in Section 2, by describing the muddy children puzzle, and use it to exemplify the main features of CTL. In Section 3, we formally introduce CTL's Translation Algorithm, which maps knowledge and time theories into artificial neural networks, and prove that the translation is correct. In Section 4, we conclude and discuss directions for future work.\n\n2 Connectionist Reasoning about Time and Knowledge\n\nTemporal logic and its combination with other modalities such as knowledge and belief operators have been the subject of intense investigation (Fagin et al., 1995). In this section, we use the muddy children puzzle, a testbed for distributed knowledge representation formalisms, to exemplify how knowledge and time can be expressed in a connectionist setting. We start by stating the puzzle (Fagin et al., 1995; Huth & Ryan, 2000).\n\nThere are n (truthful and intelligent) children playing in a garden. 
A certain number k of the children ($k \\leq n$) have mud on their faces. Each child can see whether the others are muddy, but not whether she herself is. Now, consider the following situation: a caretaker announces that at least one child is muddy ($k \\geq 1$) and asks: does any of you know whether you have mud on your own face? To help understand the puzzle, let us consider the cases in which k = 1, k = 2 and k = 3. If k = 1 (only one child is muddy), the muddy child answers yes at the first instance, since she cannot see any other muddy child. All the other children answer no at the first instance. If k = 2, suppose children 1 and 2 are muddy. At the first instance, all children can only answer no. This allows 1 to reason as follows: if 2 had said yes the first time, she would have been the only muddy child. Since 2 said no, she must be seeing someone else muddy; and since I cannot see anyone else muddy apart from 2, I myself must be muddy! Child 2 can reason analogously, and also answers yes the second time round. If k = 3, suppose children 1, 2 and 3 are muddy. Every child can only answer no the first two times round. Again, this allows 1 to reason as follows: if 2 or 3 had said yes the second time, they would have been the only two muddy children. Thus, there must be a third person with mud. Since I can only see 2 and 3 with mud, this third person must be me! Children 2 and 3 can reason analogously to conclude as well that yes, they are muddy.\n\nThe above cases clearly illustrate the need to distinguish between an agent's individual knowledge and common knowledge about the world in a particular situation. For example, when k = 2, after everybody says no at the first round, it becomes common knowledge that at least two children are muddy. 
Similarly, when k = 3, after everybody says no twice, it becomes common knowledge that at least three children are muddy, and so on. In other words, when it is common knowledge that there are at least k - 1 muddy children, after the announcement that nobody knows whether they are muddy or not it becomes common knowledge that there are at least k muddy children, for if there were only k - 1 muddy children all of them would know that they had mud on their faces.¹\n\nIn what follows, a modality $K_j$ is used to represent the knowledge of an agent j. In addition, the term $p_i$ is used to denote that proposition p is true for agent i. For example, $K_j p_i$ means that agent j knows that p is true for agent i. We use $p_i$ to say that child i is muddy, and $q_k$ to say that at least k children are muddy ($k \\leq n$). Let us consider the case in which three children are playing in the garden (n = 3). Rule $r_1^1$ below states that when child 1 knows that at least one child is muddy and that neither child 2 nor child 3 is muddy, then child 1 knows that she herself is muddy. Similarly, rule $r_1^2$ states that if child 1 knows that there are at least two muddy children and she knows that child 2 is not muddy, then she must also be able to know that she herself is muddy, and so on. The rules for children 2 and 3 are interpreted analogously.\n\n$r_1^1: K_1 q_1 \\wedge K_1 \\neg p_2 \\wedge K_1 \\neg p_3 \\rightarrow K_1 p_1$\n$r_1^2: K_1 q_2 \\wedge K_1 \\neg p_2 \\rightarrow K_1 p_1$\n$r_1^3: K_1 q_2 \\wedge K_1 \\neg p_3 \\rightarrow K_1 p_1$\n$r_1^4: K_1 q_3 \\rightarrow K_1 p_1$\nTable 1: Snapshot rules for agent (child) 1\n\nEach set of snapshot rules $r_l^m$ ($1 \\leq l \\leq n$; $m \\in \\mathbb{N}^+$) can be implemented in a single hidden layer neural network $N_l$ as follows. For each rule, a hidden neuron is created. Each rule antecedent (e.g., $K_1 q_1$ in $r_1^1$) is associated with an input neuron. 
The rule consequent ($K_1 p_1$) is associated with an output neuron. Finally, the input neurons are connected to the output neuron through the hidden neuron associated with the rule ($r_1^1$). In addition, weights and biases need to be set up to implement the meaning of the rule. When a neuron is activated (i.e., has activation above a given threshold), we say that its associated concept (e.g., $K_1 q_1$) is true. Conversely, when a neuron is not activated, we say that its associated concept is false. As a result, each input vector of $N_l$ can be associated with an interpretation (an assignment of truth-values) to the set of rules. Weights and biases must be such that the output neuron is activated if and only if the interpretation associated with the input vector satisfies the rule antecedent. In the case of rule $r_1^1$, the output neuron associated with $K_1 p_1$ must be activated (true) if the input neurons associated with $K_1 q_1$, $K_1 \\neg p_2$ and $K_1 \\neg p_3$ are all activated (true).\n\n¹Notice that this reasoning process can only start once it is common knowledge that at least one child is muddy, as announced by the caretaker.\n\nThe Connectionist Inductive Learning and Logic Programming (C-ILP) System (d'Avila Garcez et al., 2002a; d'Avila Garcez & Zaverucha, 1999) makes use of the above kind of translation. C-ILP is a massively parallel computational model based on an artificial neural network that integrates inductive learning from examples and background knowledge with deductive learning through logic programming. Following (Hölldobler & Kalinke, 1994) (see also (Hölldobler et al., 1999)), a Translation Algorithm maps any logic program P into a single hidden layer neural network N such that N computes the least fixed point of P. 
This provides a massively parallel model for computing the stable model semantics of P (Lloyd, 1987). In addition, N can be trained with examples using, e.g., Backpropagation, and using P as background knowledge (Pazzani & Kibler, 1992). The knowledge acquired by training can then be extracted (d'Avila Garcez et al., 2001), closing the learning cycle (as in (Towell & Shavlik, 1994)).\n\nFor each agent (child), a C-ILP network can be created. Each network can be seen as representing a (learnable) possible world containing information about the knowledge held by an agent in a distributed system. Figure 1 shows the implementation of rules $r_1^1$ to $r_1^4$. In addition, it contains output neuron $p_1$ and output neurons $Kq_1$, $Kq_2$ and $Kq_3$,² all represented as facts.³ These are highlighted in grey in Figure 1. Neurons that appear on both the input and output layers of a C-ILP network (e.g., $Kq_1$) are recurrently connected using weight one, as depicted in Figure 1. This allows the network to iterate the computation of truth-values when chains occur in the set of rules. For example, if $a \\rightarrow b$ and $b \\rightarrow c$ are rules of the theory, neuron b will appear on both the input and output layers of the network, and if a is activated then c will be activated through the activation of b.\n\nFigure 1: The implementation of rules $\\{r_1^1, \\ldots, r_1^4\\}$.\n\nIf child 1 is muddy, output neuron $p_1$ must be activated. Since children 2 and 3 can see child 1, they will know that child 1 is muddy. This can be represented as $p_1 \\rightarrow K_2 p_1$ and $p_1 \\rightarrow K_3 p_1$, and analogously for $p_2$ and $p_3$. This means that the activation of output neurons $K_1 \\neg p_2$ and $K_1 \\neg p_3$ in Figure 1 depends on the activation of neurons that are not in this network ($N_1$), but in $N_2$ and $N_3$. 
We need, therefore, to model how the networks in the ensemble interact with each other.\n\nFigure 2 illustrates the interaction between three C-ILP networks in the muddy children puzzle. The arrows connecting the networks implement the fact that when a child is muddy, the other children can see her. So if, e.g., neuron $p_1$ is activated in $N_1$, neuron $Kp_1$ must be activated in $N_2$ and $N_3$. For the sake of clarity, the snapshot rules $r_l^m$ shown in Figure 1 are omitted here; this is indicated in Figure 2 by neurons highlighted in black. In addition, only positive information about the problem is shown in Figure 2. Negative information such as $\\neg p_1$, $K \\neg p_1$, $K \\neg p_2$ and $K \\neg p_3$ would be implemented analogously.\n\n²Note that $p_1$ means 'child 1 is muddy' while $Kp_1$ means 'child 1 knows she is muddy'.\n³A fact is normally represented as a rule with no antecedents. C-ILP represents facts by not connecting the rule's hidden neuron to any input neuron (in the case of fully-connected networks, weights with initial value zero are used).\n\nFigure 2: Interaction between agents in the muddy children puzzle.\n\nFigure 2 illustrates well the idea behind this paper. By combining a number of simple C-ILP networks, we are able to model individual and common knowledge. Each network represents a possible world or an agent's current set of beliefs (d'Avila Garcez et al., 2002b). If we allow a number of ensembles like the one of Figure 2 to be combined, we can represent the evolution in time of an agent's set of beliefs. This is exactly what is required for a complete solution of the muddy children puzzle, as discussed below. 
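The snapshot network of Figure 1 can be made concrete with a short sketch. The Python below is a minimal, hypothetical illustration of agent 1's network N_1: one hidden neuron per rule r_1^1 to r_1^4 of Table 1, one output neuron K1p1, the bipolar activation h(x), and the weight W and thresholds that Section 3 takes from C-ILP's Translation Algorithm. The links of Figure 2 are reduced to input values: the neurons K1~p2 and K1~p3 (standing for K_1 not p_2 and K_1 not p_3) are set from which children are actually muddy, as the other agents' networks would supply them. The values BETA = 1.0 and A_MIN = 0.7 are assumptions (the paper fixes neither); the denominator of the bound on W must be positive, which forces A_MIN > (MAX_P - 1)/(MAX_P + 1) = 0.6 for these four rules. The helper names are illustrative only and are not part of C-ILP.

```python
import math

BETA = 1.0   # slope of h(x); assumed value
A_MIN = 0.7  # minimum activation read as true; assumed value

def h(x):
    # Bipolar semi-linear activation: h(x) = 2 / (1 + exp(-beta * x)) - 1
    return 2.0 / (1.0 + math.exp(-BETA * x)) - 1.0

# Snapshot rules r_1^1 .. r_1^4 of Table 1 as (body, head).
# K1~p2 names the single annotated-literal neuron for K_1 not p_2.
RULES = [
    (['K1q1', 'K1~p2', 'K1~p3'], 'K1p1'),
    (['K1q2', 'K1~p2'], 'K1p1'),
    (['K1q2', 'K1~p3'], 'K1p1'),
    (['K1q3'], 'K1p1'),
]

K = [len(body) for body, _ in RULES]  # k_l for each rule
MU = len(RULES)                       # all four rules share the head K1p1
MAX_P = max(K + [MU])

# Step 2 of the translation: lower bound on W (used here with equality)
DENOM = MAX_P * (A_MIN - 1) + A_MIN + 1
assert DENOM > 0, 'A_MIN too small for this MAX_P'
W = (2.0 / BETA) * (math.log(1 + A_MIN) - math.log(1 - A_MIN)) / DENOM

def agent1_output(inputs):
    # inputs maps each input neuron name to +1 (true) or -1 (false)
    hidden = []
    for (body, _), k in zip(RULES, K):
        theta = (1 + A_MIN) * (k - 1) / 2 * W  # hidden neuron acts as AND
        hidden.append(h(sum(W * inputs[lit] for lit in body) - theta))
    theta_out = (1 + A_MIN) * (1 - MU) / 2 * W  # output neuron acts as OR
    return h(sum(W * a for a in hidden) - theta_out)

def snapshot_inputs(muddy, known_q):
    # muddy: set of muddy children; known_q: the m for which K1q_m holds.
    # K1~pj is what the links from N_2 and N_3 would supply to N_1.
    x = {f'K1q{m}': (1 if m in known_q else -1) for m in (1, 2, 3)}
    for j in (2, 3):
        x[f'K1~p{j}'] = -1 if j in muddy else 1
    return x

out = agent1_output(snapshot_inputs({1}, {1}))
print(f'W = {W:.3f}, K1p1 active: {out > A_MIN}')
```

With child 1 as the only muddy child and only q_1 known, the output exceeds A_MIN, so K1p1 is read as true. With children 1 and 2 muddy, the output stays negative until K1q2 is supplied as a fact, which is the gap that the temporal rules are meant to fill.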
\n\nAs we have seen, the solution to the muddy children puzzle illustrated in Figures 1 and 2 considers only snapshots of knowledge evolution along time rounds, without the addition of a time variable (Huth & Ryan, 2000). A complete solution, however, requires the addition of a temporal variable to allow reasoning about the knowledge acquired after each time round. The snapshot solution of Figures 1 and 2 should then be seen as representing the knowledge held by the agents at an arbitrary time t. The knowledge held by the agents at time t + 1 would then be represented by another set of C-ILP networks, appropriately connected to the original set of networks. Let us consider again the case where k = 3. There are alternative ways of representing this, but one possible representation for child 1 is as follows:\n\n$t_1: \\neg K_1 p_1 \\wedge \\neg K_2 p_2 \\wedge \\neg K_3 p_3 \\rightarrow \\bigcirc K_1 q_2$\n$t_2: \\neg K_1 p_1 \\wedge \\neg K_2 p_2 \\wedge \\neg K_3 p_3 \\rightarrow \\bigcirc K_1 q_3$\nTable 2: Temporal rules for agent (child) 1\n\nEach temporal rule is labelled by a time point $t_i$ at which the rule holds. In addition, if a rule labelled $t_i$ makes use of the next-time temporal operator $\\bigcirc$, then whatever $\\bigcirc$ qualifies refers to the next time $t_{i+1}$ in a linear time flow. As a result, the first temporal rule above states that if, at $t_1$, no child knows whether she is muddy or not, then, at $t_2$, child 1 will know that at least two children are muddy. Similarly, the second rule states that, at $t_2$, if still no child knows whether she is muddy or not, then, at $t_3$, child 1 will know that at least three children are muddy. As before, analogous temporal rules exist for agents (children) 2 and 3. The temporal rules, together with the snapshot rules, provide a complete solution to the puzzle. This is depicted in Figure 3 and discussed below. 
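At the symbolic level, what the unrolled ensemble of Figure 3 should compute can be checked with a forward-chaining sketch. The Python below is a hypothetical illustration, not the CTL networks themselves: `snapshot_rules(i)` generalises Table 1 to any agent i (K_i q_m together with knowledge that n - m of the other children are clean yields K_i p_i), and `first_round` applies the snapshot rules at each time point t and then the temporal rule labelled t, adding K_i q_{t+1} for every agent when no child knows she is muddy. Two assumptions are made that the rules above leave implicit: facts known at t (the announcement q_1 and what each child sees) persist to t + 1, and the temporal rules for agents 2 and 3 mirror those for agent 1.

```python
from itertools import combinations

N = 3  # number of children, as in the paper's example

def snapshot_rules(i):
    # Hypothetical generalisation of Table 1 to agent i:
    # K_i q_m plus K_i ~p_j for n - m other children j implies K_i p_i.
    others = [j for j in range(1, N + 1) if j != i]
    rules = []
    for m in range(1, N + 1):
        for clean in combinations(others, N - m):
            body = [f'K{i}q{m}'] + [f'K{i}~p{j}' for j in clean]
            rules.append((body, f'K{i}p{i}'))
    return rules

def first_round(muddy):
    # Facts at t_1: the announcement (q_1) and what each child can see.
    facts = {f'K{i}q1' for i in range(1, N + 1)}
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            if j != i and j not in muddy:
                facts.add(f'K{i}~p{j}')
    for t in range(1, N + 1):
        # Snapshot rules at time t (no chains among them, so one pass suffices).
        knowers = {i for i in range(1, N + 1)
                   if any(all(b in facts for b in body)
                          for body, _ in snapshot_rules(i))}
        if knowers:
            return t, knowers
        # Temporal rule labelled t: nobody knows at t, so at t + 1 every
        # agent knows q_{t+1}; earlier facts are assumed to persist.
        facts |= {f'K{i}q{t + 1}' for i in range(1, N + 1)}
    return None, set()

for muddy in ({1}, {1, 2}, {1, 2, 3}):
    print(sorted(muddy), first_round(muddy))
```

For one, two and three muddy children, the muddy children first know they are muddy at t_1, t_2 and t_3 respectively, which matches the informal analysis of the puzzle given earlier in this section; `snapshot_rules(1)` reproduces the four rules of Table 1 in order.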
\n\nIn Figure 3, networks are replicated to represent an agent's knowledge evolution in time.⁴ A network represents an agent's knowledge today (or at $t_1$), a network represents the same agent's knowledge tomorrow ($t_2$), and the appropriate connections between networks model the relations between today and tomorrow according to $\\bigcirc$.\n\n⁴It is worth noting that each network remains a simple, single hidden layer neural network that can be trained with the use of standard Backpropagation or another off-the-shelf learning algorithm.\n\nFigure 3: Knowledge evolution of agent (child) 1 from time $t_1$ to time $t_2$\n\nIn the case of $t_1: \\neg K_1 p_1 \\wedge \\neg K_2 p_2 \\wedge \\neg K_3 p_3 \\rightarrow \\bigcirc K_1 q_2$, for example, output neuron $K_1 p_1$ of the network that represents agent 1 at $t_1$, output neuron $K_2 p_2$ of the network that represents agent 2 at $t_1$, and output neuron $K_3 p_3$ of the network that represents agent 3 at $t_1$ need to be connected to output neuron $K_1 q_2$ of the network that represents agent 1 at $t_2$ (the next time) such that $K_1 q_2$ is activated if $K_1 p_1$, $K_2 p_2$ and $K_3 p_3$ are not activated. In conclusion, in order to represent time, in addition to knowledge, we need to use a two-dimensional C-ILP ensemble. 
\nIn one dimension we encode the knowledge interaction between agents at a given time point, and in the other dimension we encode the agents' knowledge evolution through time.\n\n3 Temporal Translation Algorithm\n\nIn this section, we present an algorithm to translate temporal rules of the form $t: \\bigcirc K_a L_1, \\ldots, \\bigcirc K_b L_k \\rightarrow \\bigcirc K_c L_{k+1}$, where $a, b, c, \\ldots$ are agents and $1 \\leq t \\leq n$,⁵ into (two-dimensional) C-ILP network ensembles. Let P represent a number q of ground⁶ temporal rules. In such rules, we call $L_i$ ($1 \\leq i \\leq k+1$) a literal, and call $K_j L_i$ ($1 \\leq j \\leq m$) an annotated literal. Each $L_i$ can be either a positive literal (p) or a negative literal ($\\neg p$). Similarly, $K_j L_i$ can be preceded by $\\neg$. We use $A_{min}$ to denote the minimum activation for a neuron to be considered active (true), $A_{min} \\in (0, 1)$. We number the (annotated) literals⁷ of P from 1 to v such that, when a C-ILP network N is created, the input and output layers of N are vectors of length v, where the i-th neuron represents the i-th (annotated) literal. For convenience, we use a bipolar semi-linear activation function $h(x) = \\frac{2}{1 + e^{-\\beta x}} - 1$, and inputs in $\\{-1, 1\\}$.\n\nLet $k_l$ denote the number of (annotated) literals in the body of rule $r_l$; $\\mu_l$ the number of rules in P with the same (annotated) literal as consequent, for each rule $r_l$; $MAX_{r_l}(k_l, \\mu_l)$ the greater element between $k_l$ and $\\mu_l$ for rule $r_l$; and $MAX_P(k_1, \\ldots, k_q, \\mu_1, \\ldots, \\mu_q)$ the greatest element among all $k_l$'s and $\\mu_l$'s of P. We also use $\\vec{k}$ as a shorthand for $(k_1, \\ldots, k_q)$, and $\\vec{\\mu}$ as a shorthand for $(\\mu_1, \\ldots, \\mu_q)$. 
For example, for $P = \\{r_1: b \\wedge c \\wedge \\neg d \\rightarrow a, r_2: e \\wedge f \\rightarrow a, r_3: \\rightarrow b\\}$, we have $k_1 = 3$, $k_2 = 2$, $k_3 = 0$, $\\mu_1 = 2$, $\\mu_2 = 2$, $\\mu_3 = 1$, $MAX_{r_1}(k_1, \\mu_1) = 3$, $MAX_{r_2}(k_2, \\mu_2) = 2$, $MAX_{r_3}(k_3, \\mu_3) = 1$ and $MAX_P(\\vec{k}, \\vec{\\mu}) = 3$.\n\n⁵There may be n + 1 time points since, e.g., $t_1: K_j \\alpha, K_k \\beta \\rightarrow \\bigcirc K_j \\gamma$ means that if agent j knows $\\alpha$ and agent k knows $\\beta$ at time $t_1$ then agent j knows $\\gamma$ at time $t_2$.\n⁶Variables such as $t_i$ are instantiated into the language's ground terms ($t_1, t_2, t_3, \\ldots$).\n⁷We use '(annotated) literals' to refer to any literal, annotated or not.\n\nCTL Translation Algorithm:\n\n1. For each time point t in P do: for each agent j in P do: create a C-ILP neural network $N_{j,t}$.\n2. Calculate W such that $W \\geq \\frac{2}{\\beta} \\cdot \\frac{\\ln(1 + A_{min}) - \\ln(1 - A_{min})}{MAX_P(\\vec{k}, \\vec{\\mu})(A_{min} - 1) + A_{min} + 1}$.\n3. For each rule in P of the form $t: \\bigcirc K_1 L_1, \\ldots, \\bigcirc K_{m-1} L_k \\rightarrow \\bigcirc K_m L_{k+1}$,⁸ do: (a) Add a hidden neuron $L^{\\circ}$ to $N_{m,t+1}$ and set h(x) as the activation function of $L^{\\circ}$; (b) Connect each neuron $\\bigcirc K_j L_i$ ($1 \\leq i \\leq k$) in $N_{j,t}$ to $L^{\\circ}$. If $L_i$ is a positive (annotated) literal then set the connection weight to W; otherwise, set the connection weight to -W. Set the threshold $\\theta^{\\circ}$ of $L^{\\circ}$ to $\\theta^{\\circ} = \\frac{(1 + A_{min})(k_l - 1)}{2} W$; (c) Connect $L^{\\circ}$ to $K_m L_{k+1}$ in $N_{m,t+1}$ and set the connection weight to W. 
Set the threshold $\\theta_{k+1}$ of $K_m L_{k+1}$ to $\\theta_{k+1} = \\frac{(1 + A_{min})(1 - \\mu_l)}{2} W$; (d) Add a hidden neuron $L^{\\bullet}$ to $N_{m,t}$ and set h(x) as the activation function of $L^{\\bullet}$; (e) Connect neuron $K_m L_{k+1}$ in $N_{m,t+1}$ to $L^{\\bullet}$ and set the connection weight to W. Set the threshold $\\theta^{\\bullet}$ of $L^{\\bullet}$ to zero; (f) Connect $L^{\\bullet}$ to $\\bigcirc K_m L_{k+1}$ in $N_{m,t}$ and set the connection weight to W. Set the threshold of $\\bigcirc K_m L_{k+1}$ to $\\frac{(1 + A_{min})(1 - \\mu_l)}{2} W$.\n4. For each rule in P of the form $t: \\bigcirc K_1 L_1, \\ldots, \\bigcirc K_{m-1} L_k \\rightarrow K_m L_{k+1}$, do: (a) Add a hidden neuron $L^{\\circ}$ to $N_{m,t}$ and set h(x) as the activation function of $L^{\\circ}$; (b) Connect each neuron $\\bigcirc K_j L_i$ ($1 \\leq i \\leq k$) in $N_{j,t}$ to $L^{\\circ}$. If $L_i$ is a positive (annotated) literal then set the connection weight to W; otherwise, set the connection weight to -W. Set the threshold $\\theta^{\\circ}$ of $L^{\\circ}$ to $\\theta^{\\circ} = \\frac{(1 + A_{min})(k_l - 1)}{2} W$; (c) Connect $L^{\\circ}$ to $K_m L_{k+1}$ in $N_{m,t}$ and set the connection weight to W. Set the threshold $\\theta_{k+1}$ of $K_m L_{k+1}$ to $\\theta_{k+1} = \\frac{(1 + A_{min})(1 - \\mu_l)}{2} W$.\n5. If N ought to be fully-connected, set all other connections to zero.\n\nIn the above algorithm it is worth noting that, whenever a rule consequent is preceded by $\\bigcirc$, a forward connection from t to t + 1 and a feedback connection from t + 1 to t need to be added to the ensemble. For example, if $t: a \\rightarrow \\bigcirc b$ is a rule of P, then not only must the activation of neuron a at t activate neuron b at t + 1, but the activation of neuron b at t + 1 must also activate neuron $\\bigcirc b$ at t. This is implemented in steps 3(d) to 3(f) of the algorithm. 
The remainder of the algorithm is concerned with the implementation of snapshot rules (as in Figure 1). The values of W and $\\theta$ come from C-ILP's Translation Algorithm (d'Avila Garcez & Zaverucha, 1999), and are chosen so that the behaviour of the network matches that of the temporal rules, as the following theorem shows.\n\nTheorem 1 (Correctness of Translation Algorithm). For each set of ground temporal rules P, there exists a neural network ensemble N such that N computes the fixed-point operator $T_P$ of P.\n\nProof. 
(sketch) This proof follows directly from the proof of the analogous theorem for single C-ILP networks presented in (d'Avila Garcez & Zaverucha, 1999). This is so because C-ILP's definition of the W and $\\theta$ values makes hidden neurons $L^{\\circ}$ and $L^{\\bullet}$ behave like AND gates, while output neurons behave like OR gates. □\n\n⁸Note that $\\bigcirc$ is not required to precede every rule antecedent. In the network, neurons are labelled as $\\bigcirc K_1 L_1$ or $K_1 L_1$ to differentiate the two concepts.\n\n4 Conclusions\n\nIn his seminal paper (Valiant, 1984), Valiant argues for the need for rich logic-based knowledge representation mechanisms within learning systems. In this paper, we have addressed such a need, while complying with important principles of connectionism such as massive parallelism. In particular, a very important feature of the system presented here (CTL) is the temporal dimension that can be combined with an epistemic dimension. This paper provides the first account of how to integrate such dimensions in a neural-symbolic learning system. The CTL framework opens up several interesting research avenues in the domain of neural-symbolic integration, allowing for the representation and learning of expressive formalisms. In this paper, we have illustrated this by providing a full solution to the muddy children puzzle, where agents reason about their knowledge at different time points. In the near future, we plan to also apply the system to a large, real-world case study.\n\nReferences\n\nCloete, I., & Zurada, J. M. (Eds.). (2000). Knowledge-based neurocomputing. The MIT Press.\n\nd'Avila Garcez, A. S., Broda, K., & Gabbay, D. M. (2001). Symbolic knowledge extraction from trained neural networks: A sound approach. Artificial Intelligence, 125, 155-207.\n\nd'Avila Garcez, A. 
S., Broda, K., & Gabbay, D. M. (2002a). Neural-symbolic learning systems: Foundations and applications. Perspectives in Neural Computing. Springer-Verlag.\n\nd'Avila Garcez, A. S., Lamb, L. C., Broda, K., & Gabbay, D. M. (2003). Distributed knowledge representation in neural-symbolic learning systems: a case study. Accepted for Proceedings of 16th International FLAIRS Conference. St. Augustine, Florida.\n\nd'Avila Garcez, A. S., Lamb, L. C., & Gabbay, D. M. (2002b). A connectionist inductive learning system for modal logic programming (Technical Report 2002/6). Department of Computing, Imperial College, London.\n\nd'Avila Garcez, A. S., Lamb, L. C., & Gabbay, D. M. (2002c). A connectionist inductive learning system for modal logic programming. Proceedings of IEEE International Conference on Neural Information Processing ICONIP'02 (pp. 1992-1997). Singapore.\n\nd'Avila Garcez, A. S., & Zaverucha, G. (1999). The connectionist inductive learning and logic programming system. Applied Intelligence Journal, Special Issue on Neural Networks and Structured Knowledge, 11, 59-77.\n\nFagin, R., Halpern, J., Moses, Y., & Vardi, M. (1995). Reasoning about knowledge. MIT Press.\n\nHalpern, J. Y., van der Meyden, R., & Vardi, M. Y. (2003). Complete axiomatizations for reasoning about knowledge and time. SIAM Journal on Computing. To appear.\n\nHalpern, J. Y., & Vardi, M. (1986). The complexity of reasoning about knowledge and time I: lower bounds. Journal of Computer and System Sciences, 38, 195-237.\n\nHölldobler, S. (1993). Automated inferencing and connectionist models. Postdoctoral Thesis, Intellektik, Informatik, TH Darmstadt.\n\nHölldobler, S., & Kalinke, Y. (1994). 
Toward a new massively parallel computational model for logic programming. Proceedings of the Workshop on Combining Symbolic and Connectionist Processing, ECAI94 (pp. 68-77).\n\nHölldobler, S., Kalinke, Y., & Störr, H. P. (1999). Approximating the semantics of logic programs by recurrent neural networks. Applied Intelligence Journal, Special Issue on Neural Networks and Structured Knowledge, 11, 45-58.\n\nHuth, M. R. A., & Ryan, M. D. (2000). Logic in computer science: Modelling and reasoning about systems. Cambridge University Press.\n\nLloyd, J. W. (1987). Foundations of logic programming. Springer-Verlag.\n\nPazzani, M., & Kibler, D. (1992). The utility of knowledge in inductive learning. Machine Learning, 9, 57-94.\n\nRao, A. S., & Georgeff, M. P. (1998). Decision procedures for BDI logics. Journal of Logic and Computation, 8, 293-343.\n\nTowell, G. G., & Shavlik, J. W. (1994). Knowledge-based artificial neural networks. Artificial Intelligence, 70, 119-165.\n\nValiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134-1142.\n\nvan Emden, M. H., & Kowalski, R. A. (1976). The semantics of predicate logic as a programming language. Journal of the ACM, 23, 733-742.\n", "award": [], "sourceid": 2490, "authors": [{"given_name": "Artur", "family_name": "Garcez", "institution": null}, {"given_name": "Luis", "family_name": "Lamb", "institution": null}]}