{"title": "Connecting to the Past", "book": "Neural Information Processing Systems", "page_first": 505, "page_last": 514, "abstract": null, "full_text": "505 \n\nCONNECTING TO THE PAST \n\nBruce  A.  MacDonald,  Assistant Professor \n\nKnowledge Sciences  Laboratory,  Computer Science  Department \n\nThe University of Calgary,  2500  University  Drive  NW \n\nCalgary,  Alberta T2N  IN4 \n\nABSTRACT \n\nRecently  there  has  been  renewed  interest  in  neural-like  processing  systems,  evidenced  for  ex(cid:173)\nample in  the two volumes  Parallel Distributed Processing edited by Rumelhart and McClelland, \nand  discussed  as  parallel  distributed  systems,  connectionist  models,  neural  nets,  value  passing \nsystems  and  multiple  context  systems.  Dissatisfaction  with  symbolic  manipulation  paradigms \nfor  artificial  intelligence seems  partly  responsible  for  this  attention, encouraged by  the promise \nof massively  parallel  systems  implemented  in  hardware.  This  paper  relates  simple  neural-like \nsystems  based  on  multiple  context  to  some  other  well-known  formalisms-namely  production \nsystems, k-Iength sequence prediction, finite-state  machines and Turing machines-and presents \nearlier sequence  prediction  results  in  a  new  light. \n\n1 \n\nINTRODUCTION \n\nThe  revival  of neural  net  research  has  been  very  strong,  exemplified  recently  by  Rumelhart \nand  McClelland!,  new  journals  and  a  number  of meetingsG \u2022  The  nets  are  also  described  as \nparallel distributed systems!, connectionist models2 ,  value passing systems3  and multiple context \nlearning systems4,5,6,7,8,9.  The  symbolic  manipulation  paradigm  for  artificial  intelligence  does \nnot seem to have  been  as  successful  as some hoped!,  and there seems at last to be  real promise \nof massively  parallel  systems  implemented  in  hardware.  However,  in  the  flurry  of new  work  it \nis  important to  consolidate new  ideas  and  place them solidly  alongside  established  ones.  This \npaper relates simple  neural-like systems to some other well-known  notions-namely production \nsystems, k-Iength sequence prediction, finite-state machines and Turing machines-and presents \nearlier results on the abilities  of such networks in  a  new  light. \n\nThe  general  form of a  connectionist systemlO  is  simplified  to  a  three  layer  net  with  binary \n\nfixed  weights  in  the  hidden  layer,  thereby  avoiding  many  of the  difficulties-and  challenges(cid:173)\nof the  recent  work  on  neural  nets,  The  hidden  unit  weights  are  regularly  patterned  using  a \ntemplate.  Sophisticated,  expensive  learning  algorithms  are  avoided,  and  a  simple  method  is \nused  for  determining output unit weights.  In  this  way we  gain some of the advantages of multi(cid:173)\nlayered nets, while retaining some of the simplicity of two layer  net training methods.  Certainly \nnothing  is  lost  in  computational  power-as  I  will  explain-and  the  limitations  of  two  layer \nnets  are  not  carried  over  to  the  simplified  three  layer  one.  Biological  systems  may  similarly \navoid  the  need  for  learning  algorithms  such  as  the  \"simulated  annealing\"  method  commonly \nused  in  connectionist models ll .  For  one  thing,  biological systems  do  not have  the same  clearly \ndistinguished  training phase. 
\n\nBriefly, the simplified net [b] is a production system implemented as three layers of neuron-like units: an output layer, an input layer, and a hidden layer for the productions themselves. Each hidden production unit potentially connects a predetermined set of inputs to any output. A k-length sequence predictor is formed once k levels of delay unit are introduced into the input layer. k-length predictors are unable to distinguish simple sequences such as ba...a and aa...a, since after k or more characters the system has forgotten whether an a or b appeared first. If the k-length predictor is augmented with \"auxiliary\" actions, it is able to learn this and other regular languages, since the auxiliary actions can be equivalent to states, and can be inputs to the production units, enabling predictions to depend on previous states [7]. By combining several augmented sequence predictors a Turing machine tape can be simulated along with a finite-state controller [9], giving the net the computational power of a Universal Turing machine. Relatively simple neural-like systems do not lack computational ability. Previous implementations [7,9] of this ability are production system equivalents to the simplified nets. \n\n[a] Among them the 1st International Conference on Neural Nets, San Diego, CA, June 21-24, 1987, and this conference. \n[b] Roughly equivalent to a single context system in Andreae's multiple context system [4,5,6,7,8,9]. See also MacDonald [12]. \n\n© American Institute of Physics 1988 \n\nFigure 1: The general form of a connectionist system [10]. (a) Form of a unit. (b) Operations within a unit: weighted inputs are summed to give the excitation, an activation function F gives the activation, and an output function f gives the output (typical F and f are sketched). [Diagram not reproduced.] \n\n1.1 Organization of the paper \n\nThe next section briefly reviews the general form of connectionist systems. Section 2 simplifies this, then section 3 explains that the result is equivalent to a production system dealing only with inputs and outputs of the net. Section 4 extends the simplified version, enabling it to learn to predict sequences. Section 5 explains how the computational power of the sequence predictor can be increased to that of a Turing machine if some input units receive auxiliary actions; in fact the system can learn to be a Turing machine. Section 6 discusses the possibility of a number of nets combining their outputs, forming an overall net with \"association areas\". \n\n1.2 General form of a connectionist system \n\nFigure 1 shows the general form of a connectionist system unit, neuron or cell [10]. In the figure unit i has inputs, which are the outputs oj of possibly all units in the network, and an output of its own, oi. The net input excitation, neti, is the weighted sum of inputs, where wij is the weight connecting the output from unit j as an input to unit i. The activation, ai, of the unit is some function Fi of the net input excitation. Typically Fi is semilinear, that is non-decreasing and differentiable [13], and is the same function for all, or at least large groups of units. The output is a function fi of the activation; typically some kind of threshold function. I will assume that the quantities vary over discrete time steps, so for example the activation at time t+1 is ai(t+1) and is given by Fi(neti(t)). 
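\n\nAs an illustration only (this code is not from the paper), the unit computation just described can be sketched in a few lines of Python; the logistic activation and the fixed threshold below are assumed example choices for F and f. \n\n
import math

def unit_output(weights, inputs, threshold=0.5):
    # One discrete-time step for a single unit i.
    # weights[j] is w_ij, inputs[j] is o_j(t); the result is o_i(t+1).
    net = sum(w * o for w, o in zip(weights, inputs))   # net_i = sum_j w_ij * o_j
    activation = 1.0 / (1.0 + math.exp(-net))           # a_i = F_i(net_i), an assumed logistic F
    return 1 if activation > threshold else 0           # o_i = f_i(a_i), a binary threshold f

# Example: a unit with three inputs.
print(unit_output([0.8, -0.2, 0.5], [1, 0, 1]))   # prints 1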
\n\nIn general there is no restriction on the connections that may be made between units. Units not connected directly to inputs or outputs are hidden units. In more complex nets than those described in this paper, there may be more than one type of connection. Figure 2 shows a common connection topology, where there are three layers of units-input, hidden and output-with no cycles of connection. \n\nFigure 2: The basic structure of a three layer connectionist system (input units, hidden units, output units). [Diagram not reproduced.] \n\nThe net is trained by presenting it with input combinations, each along with the desired output combination. Once trained the system should produce the desired outputs given just inputs. During training the weights are adjusted in some fashion that reduces the discrepancy between desired and actual output. The general method is [10]: \n\nΔwij = g(ai(t), ti(t)) h(oj(t), wij)    (1) \n\nwhere ti is the desired, \"training\" activation. Equation 1 is a general form of Hebb's classic rule for adjusting the weight between two units with high activations [10]. The weight adjustment is the product of two functions, one that depends on the desired and actual activations-often just the difference-and another that depends on the input to that weight and the weight itself. As a simple example suppose g is the difference and h is just the output oj. Then the weight change is the product of the output error and the input excitation to that weight: \n\nΔwij = η (ti - ai) oj \n\nwhere the constant η determines the learning rate. This is the Widrow-Hoff or Delta rule, which may be used in nets without hidden units [10]. \n\nThe important contribution of recent work on connectionist systems is how to implement equation 1 in hidden units, for which there are no training signals ti directly available. The Boltzmann learning method iteratively varies both weights and hidden unit training activations using the controlled, gradually decreasing randomizing method \"simulated annealing\" [14]. Back-propagation [13] is also iterative, performing gradient descent by propagating training signal errors back through the net to hidden units. I will avoid the need to determine training signals for hidden units, by fixing the weights of hidden units in section 2 below. \n\n2 SIMPLIFIED SYSTEM \n\nAssume these simplifications are made to the general connectionist system of section 1.2: \n\n1. The system has three layers, with the topology shown in Figure 2 (i.e. no cycles). \n\n2. All hidden layer unit weights are fixed, say at unity or zero. \n\n3. Each unit is a linear threshold unit [10], which means the activation function for all units is the identity function, giving just neti, a weighted sum of the inputs, and the output function is a simple binary threshold (output 1 when the activation exceeds the threshold, 0 otherwise), so that the output is binary: on or off. 
Hidden units will have thresholds requiring all inputs to be active for the output to be active (like an AND gate), while output units will have thresholds requiring only one or two active, highly weighted inputs for an output to be generated (like an OR gate). This is in keeping with the production system view of the net, explained in section 3. \n\n4. Learning-which now occurs only at the output unit weights-gives weight adjustments according to: \n\nwij = 1 if ai = oj = 1; wij = 0 otherwise, \n\nso that weights are turned on if their input and the unit output are on, and off otherwise. That is, wij = ai ∧ oj. A simple example is given in Figure 3 in section 3 below. \n\nThis simple form of net can be made probabilistic by replacing 4 with 4' below: \n\n4'. Adjust weights so that wij estimates the conditional probability of the unit i output being on when output j is on. That is, \n\nwij = estimate of P(oi | oj). \n\nThen, assuming independence of the inputs to a unit, an output unit is turned on when the conditional probability of occurrence of that output exceeds the threshold of the output function. \n\nOnce these simplifications are made, there is no need for learning in the hidden units. Also no iterative learning is required; weights are either assigned binary values, or estimate conditional probabilities. This paper presents some of the characteristics of the simplified net. Section 6 discusses the motivation for simplifying neural nets in this way. \n\n3 PRODUCTION SYSTEMS \n\nThe simplified net is a kind of simple production system. A production system comprises a global database, a set of production rules and a control system [15]. The database for the net is the system it interacts with, providing inputs as reactions to outputs from the net. The hidden units of the network are the production rules, which have the form \n\nIF precondition THEN action \n\nThe precondition is satisfied when the input excitation exceeds the threshold of a hidden unit. The actions are represented by the output units which the hidden production units activate. The control system of a production system chooses the rule whose action to perform, from the set of rules whose preconditions have been met. In a neural net the control system is distributed throughout the net in the output units. For example, the output units might form a winner-take-all net. In production systems more complex control involves forward and backward chaining to choose actions that seek goals. This is discussed elsewhere [4,12,16]. Figure 3 illustrates a simple production implemented as a neural net. As the figure shows, the inputs to hidden units are just the elements of the precondition. When the appropriate input combination is present the associated hidden (production) unit is fired. Once weights have been learned connecting hidden units to output units, firing a production results in output. The simplified neural net is directly equivalent to a production system whose elements are inputs and outputs [c]. 
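\n\nAs an illustration only (again not code from the paper), the following Python sketch puts the simplifications of section 2 and the production system view together: hidden units are AND gates over template-determined input sets, output units are OR gates, and learning sets the output weights by the binary rule of item 4. The class name and the small weather example are assumptions for the sketch. \n\n
class SimplifiedNet:
    # Hidden units have fixed unit weights given by a template: each hidden unit is an
    # AND gate over a predetermined set of input names (its production precondition).
    def __init__(self, preconditions, outputs):
        self.preconditions = preconditions          # list of frozensets of input names
        self.outputs = list(outputs)                # output names (possible actions)
        # Learned output weights: w[h][o] = 1 once hidden unit h has been paired with output o.
        self.w = [{o: 0 for o in self.outputs} for _ in preconditions]

    def hidden(self, active_inputs):
        # A hidden unit fires only when every input in its precondition is active (AND gate).
        return [pre <= active_inputs for pre in self.preconditions]

    def train(self, active_inputs, active_outputs):
        # Item 4: w_ij := 1 when the hidden unit and the output are both on.
        for h, fired in enumerate(self.hidden(active_inputs)):
            if fired:
                for o in active_outputs:
                    self.w[h][o] = 1

    def run(self, active_inputs):
        # Output units behave like OR gates: any active weighted connection turns them on.
        fired = self.hidden(active_inputs)
        return {o for h, on in enumerate(fired) if on
                  for o in self.outputs if self.w[h][o] == 1}

# The production of Figure 3: IF cloudy AND pressure falling THEN it will rain.
net = SimplifiedNet([frozenset({'cloudy', 'pressure falling'})], ['it will rain'])
net.train({'cloudy', 'pressure falling'}, {'it will rain'})
print(net.run({'cloudy', 'pressure falling'}))   # {'it will rain'}
print(net.run({'cloudy'}))                       # set(): precondition not satisfied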
\n\nSome production systems have symbolic elements, such as variables, which can be given values by production actions. The neural net cannot directly implement this, since it can have outputs only from a predetermined set. However, we will see later that extensions to the framework enable this and other abilities. \n\n[c] This might be referred to as a \"sensory-motor\" production system, since when implemented in a real system such as a robot, it deals only with sensed inputs and executable motor actions, which may include the auxiliary actions of section 4.3. \n\nFigure 3: A production implemented in a simplified neural net. (a) A production rule: IF cloudy AND pressure falling THEN it will rain. (b) The rule implemented as a hidden unit. The threshold of the hidden unit is 2 so it is an AND gate. The threshold of the output unit is 1 so it is an OR gate. The learned weight will be 0 or 1 if the net is not probabilistic, otherwise it will be an estimate of P(it will rain | cloudy AND pressure falling). [Diagram not reproduced.] \n\nFigure 4: A net that predicts the next character in a sequence, based on only the last character. (a) The net, with inputs a, b, c and outputs a, b, c. Production units (hidden units) have been combined with input units. For example this net could predict the sequence abcabcabc... Productions have the form: IF last character is ... THEN next character will be .... The learning rule is wij = 1 if (inputj AND outputi). Output is ai = Σj wij oj. (b) Learning procedure: 1. Clamp inputs and outputs to desired values. 2. The system calculates weight values. 3. Repeat steps 1 and 2 for all required input/output combinations. \n\n4 SEQUENCE PREDICTION \n\nA production system or neural net can predict sequences. Given examples of a repeating sequence, productions are learned which predict future events on the basis of recent ones. Figure 4 shows a trivially simple sequence predictor. It predicts the next character of a sequence based on the previous one. The figure also gives the details of the learning procedure for the simplified net. The net need be trained only once on each input combination, then it will \"predict\" as an output every character seen after the current one. The probabilistic form of the net would estimate conditional probabilities for the next character, conditional on the current one. Many presentations of each possible character pair would be needed to properly estimate the probabilities. The net would be learning the probability distribution of character pairs. A predictor like the one in Figure 4 can be extended to a general k-length [17] predictor so long as inputs delayed by 1, 2, ..., k steps are available. Then, as illustrated in Figure 5 for 3-length prediction, hidden production units represent all possible combinations of k symbols. Again output weights are trained to respond to previously seen input combinations, here of three characters. \n\nFigure 5: Using delayed inputs, a neural net can implement a k-length sequence predictor. (a) A net with the last three characters as input (inputs for the last, 2nd last and 3rd last characters; hidden units; output units a to z). (b) An example production: IF the last three characters were ... THEN output .... [Diagram not reproduced.] 
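\n\nThe k-length predictor can also be sketched in tabular form. In the following illustrative Python fragment (not from the paper), each window of the last k symbols plays the role of a hidden production unit, and training records which symbols followed that window; the second example shows the limitation returned to below, that after k or more characters ba...a and aa...a look the same. \n\n
from collections import defaultdict

class KLengthPredictor:
    # A production-style k-length predictor: each distinct window of the last k symbols
    # acts like a hidden unit, and training turns on its connection to the symbol that
    # followed it (stored as a set, matching the non-probabilistic learning rule).
    def __init__(self, k):
        self.k = k
        self.successors = defaultdict(set)   # window (tuple of k symbols) -> predicted symbols

    def train(self, sequence):
        for i in range(self.k, len(sequence)):
            window = tuple(sequence[i - self.k:i])   # delayed inputs: the last k symbols
            self.successors[window].add(sequence[i])

    def predict(self, recent):
        # recent must contain at least the last k symbols seen so far.
        return self.successors.get(tuple(recent[-self.k:]), set())

p = KLengthPredictor(k=1)
p.train('abcabcabc')
print(p.predict('c'))       # {'a'}: after c the net has only ever seen a

q = KLengthPredictor(k=3)
q.train('baaaaaa')
print(q.predict('aaa'))     # {'a'}: with only a 3-length context, ba...a looks like aa...a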
\n\nThese delays can be provided by dedicated neural nets [d], such as that shown in Figure 6. Note that the net is assumed to be synchronously updated, so that the input from feedback around units is not changed until one step after the output changes. There are various ways of implementing delay in neurons, and Andreae [4] investigates some of them for the same purpose-delaying inputs-in a more detailed simulation of a similar net. \n\n[d] Feldman and Ballard [2] give some dedicated neural net connections for a variety of functions. \n\nFigure 6: Inputs can be delayed by dedicated neural subnets. A two stage delay is shown. (a) Delay network. (b) Timing diagram for (a), showing the original signal, a delay of one step and a delay of two steps. [Diagram not reproduced.] \n\n4.1 Other work on sequence prediction in neural nets \n\nFeldman and Ballard [2] find connectionist systems initially not suited to representing changes with time. One form of change is sequence, and they suggest two methods for representing sequence in nets. The first is by units connected to each other in sequence so that sequential tasks are represented by firing these units in succession. The second method is to buffer the inputs in time so that inputs from the recent past are available as well as current inputs; that is, delayed inputs are available as suggested above. An important difference is the necessary length of the buffer; Feldman and Ballard suggest the buffer be long enough to hold a phrase of natural language, but I expect to use buffers no longer than about 7, after Andreae [4]. Symbolic inputs can represent more complex information, effectively giving the length seven buffers more information than the most recent seven simple inputs, as discussed in section 5. \n\nThe method of back-propagation [13] enables recurrent networks to learn sequential tasks in a manner similar to the first suggestion in the last paragraph, where sequences of connected units represent sequenced events. In one example a net learns to complete a sequence of characters; when given the first two characters of a six character sequence the next four are output. Errors must be propagated around cycles in a recurrent net a number of times. \n\nSeriality may also be achieved by a sequence of states of distributed activation [18]. An example is a net playing both sides of a tic-tac-toe game [18]. The sequential nature of the net's behavior is derived from the sequential nature of the responses to the net's actions; tic-tac-toe moves. A net can model sequence internally by modeling a sequential part of its environment. For example, a tic-tac-toe playing net can have a model of its opponent. \n\nk-length sequence predictors are unable to learn sequences which do not repeat more frequently than every k characters. Their k-length context includes only information about the last k events. 
However, there are two ways in which information from before the kth last input can be retained in the net. The first method latches some inputs, while the second involves auxiliary actions. \n\n4.2 Latch units \n\nInputs can be latched and held indefinitely using the combination shown in Figure 7. Not all inputs would normally be latched. Andreae [4] discusses this technique of \"threading\" latched events among non-latched events, giving the net both information arbitrarily far back in its input-output history and information from the immediate past. Briefly, the sequence ba...a can be distinguished from aa...a if the first character is latched. However, this is an ad hoc solution to this problem [e]. \n\n[e] The interested reader should refer to Andreae [4], where more extensive analysis is given. \n\nFigure 7: Threading. A latch circuit remembers an event until another comes along. This is a two input latch, e.g. for two letters a and b, but any number of units may be similarly connected. It is formed from a mutual inhibition layer, or winner-take-all connection, along with positive feedback to keep the selected output activated when the input disappears. [Diagram not reproduced.] \n\n4.3 Auxiliary actions \n\nWhen an output is fed back into the net as an input signal, this enables the system to choose the next output at least partly based on the previous one, as indicated in Figure 8. If a particular fed back output is also one without external manifestation, or whose external manifestation is independent of the task being performed, then that output is an auxiliary action. It has no direct effect on the task the system is performing since it evokes no relevant inputs, and so can be used by the net as a symbolic action. If an auxiliary action is latched at the input then the symbolic information can be remembered indefinitely, being lost only when another auxiliary action of that kind is input and takes over the latch. Thus auxiliary actions can act like remembered states; the system performs an action to \"remind\" itself to be in a particular state. The figure illustrates this for a system that predicts characters and state changes given the previous character and state. An obvious candidate for auxiliary actions is speech. So the blank oval in the figure would represent the net's environment, through which its own speech actions are heard. Although it is externally manifested, speech has no direct effect on our physical interactions with the world. Its symbolic ability not only provides the power of auxiliary actions, but also includes other speakers in the interaction. \n\nFigure 8: Auxiliary actions-the S outputs-are fed back to the inputs of a net, enabling the net to remember a state. Here both part of a net and an example of a production are shown. There are two types of action, characters and S actions (S inputs, S outputs, character inputs, character outputs). An example production: IF the S input is ... and the character input is ... THEN output character ... and S .... [Diagram not reproduced.] 
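\n\nAs an illustrative sketch (not code from the paper), the following Python fragment captures the essence of Figure 8: productions map a latched state S and the current character to an output character and a next S action, so the fed back auxiliary action acts as a remembered state. The state names and the tiny rule set are assumptions for the example. \n\n
class AuxiliaryActionPredictor:
    # Productions of the Figure 8 form: IF (state S, character) THEN (next character, next S).
    # The S action is fed back and latched as an input, so it behaves like a remembered state.
    def __init__(self):
        self.rules = {}          # (state, character) -> (output character, next state)
        self.state = 'S0'        # latched auxiliary action; 'S0' is an illustrative start state

    def train(self, state, character, out_character, next_state):
        self.rules[(state, character)] = (out_character, next_state)

    def step(self, character):
        out_character, next_state = self.rules[(self.state, character)]
        self.state = next_state              # the new auxiliary action takes over the latch
        return out_character

# A two-state example that keeps apart ba...a and aa...a, which no fixed k-length
# predictor can do: the first character is remembered as the state.
p = AuxiliaryActionPredictor()
p.train('S0', 'a', 'a', 'SawA')     # first character a: keep predicting a, remember A
p.train('S0', 'b', 'a', 'SawB')     # first character b: keep predicting a, remember B
p.train('SawA', 'a', 'a', 'SawA')
p.train('SawB', 'a', 'a', 'SawB')
print([p.step(c) for c in 'baaa'], p.state)   # ['a', 'a', 'a', 'a'] SawB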
\n\n5 SIMULATING ABSTRACT AUTOMATA \n\nThe example in Figure 8 gives the essence of simulating a finite state automaton with a production system or its neural net equivalent. It illustrates the transition function of an automaton; the new state and output are a function of the previous state and input. Thus a neural net can simulate a finite state automaton, so long as it has additional, auxiliary actions. \n\nA Turing machine is a finite state automaton controller plus an unbounded memory. A neural net could simulate a Turing machine in two ways, and both ways have been demonstrated with production system implementations-equivalent to neural nets-called \"multiple context learning systems\" [f], briefly explained in section 6. The first Turing machine simulation [7] has the system simulate only the finite state controller, but is able to use an unbounded external memory from the real world, much like the paper of Turing's original work [19]. The second simulation [9,12] embeds the memory in the multiple context learning system, along with a counter for accessing this simulated memory. Both learn all the productions-equivalent to learning output unit weights-required for the simulations. The second is able to add internal memory as required, up to a limit dependent on the size of the network (which can easily be large enough to allow 70 years of computation!). The second could also employ external memory as the first did. Briefly, the second simulation comprised multiple sequence predictors which predicted auxiliary actions for remembering the state of the controller, and the current memory position. The memory element is updated by relearning the production representing that element; the precondition is the address and the production action the stored item. \n\n[f] See John Andreae's and his colleagues' work [4,5,6,7,8,9,12,16]. \n\n6 MULTIPLE SYSTEMS FORM ASSOCIATION AREAS \n\nFigure 9: Multiple context learning system implementation as multiple neural nets. Each 3 layer net has the simplified form presented above, with a number of elaborations such as extra connections for goal-seeking by forward and backward chaining. [Diagram, showing the nets' output channels, not reproduced.] \n\nA multiple context learning system is a production system version of a multiple neural net, although a simple version has been implemented as a simulated net [4,20]. It effectively comprises several nets-or \"association\" areas-which may have outputs and inputs in common, as indicated in Figure 9. Hidden unit weights are specified by templates, one for each net. A template gives the inputs to have a zero weight for the hidden units of a net and the inputs to have a weight of unity. Delayed and latched inputs are also available. The actual outputs are selected from the combined predictions of the nets in a winner-take-all fashion. \n\nI see the design for real neural nets, say as controllers for real robots, requiring a large degree of predetermined connectivity. A robot controller could not be one three layer net with every input connected to every hidden unit in turn connected to every output. 
There will need to be some connectivity constraints so the net reflects the functional specialization in the control requirements [g]. The multiple context learning system has all the hidden layer connections predetermined, but allows output connections to be learned. This avoids the \"credit assignment\" problem and therefore also the need for learning algorithms such as Boltzmann learning and back-propagation. However, as the multiple context learning system has auxiliary actions, and delayed and latched inputs, it does not lack computational power. Future work in this area should investigate, for example, the ability of different kinds of nets to learn auxiliary actions. This may be difficult as symbolic actions may not be provided in training inputs and outputs. \n\n[g] For example a controller for a robot body would have to deal with vision, manipulation, motion, etc. \n\n7 CONCLUSION \n\nThis paper has presented a simplified three layer connectionist model, with fixed weights for hidden units, delays and latches for inputs, sequence prediction ability, auxiliary \"state\" actions, and the ability to use internal and external memory. The result is able to learn to simulate a Turing machine. Simple neural-like systems do not lack computational power. \n\nACKNOWLEDGEMENTS \n\nThis work is supported by the Natural Sciences and Engineering Council of Canada. \n\nREFERENCES \n\n1. Rumelhart, D.E. and McClelland, J.L. Parallel Distributed Processing. Volumes 1 and 2. MIT Press. (1986) \n\n2. Feldman, J.A. and Ballard, D.H. Connectionist models and their properties. Cognitive Science 6, pp.205-254. (1982) \n\n3. Fahlman, S.E. Three Flavors of Parallelism. Proc. 4th Nat. Conf. CSCSI/SCEIO, Saskatoon. (1982) \n\n4. Andreae, J.H. Thinking with the Teachable Machine. Academic Press. (1977) \n\n5. Andreae, J.H. (editor) Man-Machine Studies Progress Reports UC-DSE/1-28. Dept of Electrical and Electronic Engineering, Univ. of Canterbury, Christchurch, New Zealand. (1972-87) (Also available from NTIS, 5285 Port Royal Rd, Springfield, VA 22161) \n\n6. Andreae, J.H. and Andreae, P.M. Machine learning with a multiple context. Proc. 9th Int. Conf. on Cybernetics and Society. Denver. October. pp.734-9. (1979) \n\n7. Andreae, J.H. and Cleary, J.G. A new mechanism for a brain. Int. J. Man-Machine Studies 8(1): pp.89-119. (1976) \n\n8. Andreae, P.M. and Andreae, J.H. A teachable machine in the real world. Int. J. Man-Machine Studies 10: pp.301-12. (1978) \n\n9. MacDonald, B.A. and Andreae, J.H. The competence of a multiple context learning system. Int. J. Gen. Systems 7: pp.123-37. (1981) \n\n10. Rumelhart, D.E., Hinton, G.E. and McClelland, J.L. A general framework for parallel distributed processing. Chapter 2 in Rumelhart and McClelland [1], pp.45-76. (1986) \n\n11. Hinton, G.E. and Sejnowski, T.J. Learning and relearning in Boltzmann machines. Chapter 7 in Rumelhart and McClelland [1], pp.282-317. (1986) \n\n12. MacDonald, B.A. Designing teachable robots. PhD thesis, University of Canterbury, Christchurch, New Zealand. (1984) \n\n13. Rumelhart, D.E., Hinton, G.E. and Williams, R.J. Learning Internal Representations by Error Propagation. Chapter 8 in Rumelhart and McClelland [1], pp.318-362.
 (1986) \n\n14. Ackley, D.H., Hinton, G.E. and Sejnowski, T.J. A Learning Algorithm for Boltzmann Machines. Cognitive Science 9, pp.147-169. (1985) \n\n15. Nilsson, N.J. Principles of Artificial Intelligence. Tioga. (1980) \n\n16. Andreae, J.H. and MacDonald, B.A. Expert control for a robot body. Research Report 87/286/34, Dept. of Computer Science, University of Calgary, Alberta, Canada, T2N 1N4. (1987) \n\n17. Witten, I.H. Approximate, non-deterministic modelling of behaviour sequences. Int. J. General Systems, vol. 5, pp.1-12. (1979) \n\n18. Rumelhart, D.E., Smolensky, P., McClelland, J.L. and Hinton, G.E. Schemata and Sequential Thought Processes in PDP Models. Chapter 14, vol 2 in Rumelhart and McClelland [1], pp.7-57. (1986) \n\n19. Turing, A.M. On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. vol 42(3), pp.230-65. (1936) \n\n20. Dowd, R.B. A digital simulation of mew-brain. Report no. UC-DSE/105, pp.25-46. (1977) \n", "award": [], "sourceid": 49, "authors": [{"given_name": "Bruce", "family_name": "MacDonald", "institution": null}]}