{"title": "Compositionality, MDL Priors, and Object Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 838, "page_last": 844, "abstract": null, "full_text": "Compositionality, MDL Priors, and Object Recognition \n\nElie Bienenstock (elie@dam.brown.edu) \nStuart Geman (geman@dam.brown.edu) \nDaniel Potter (dfp@dam.brown.edu) \n\nDivision of Applied Mathematics, \nBrown University, Providence, RI 02912 USA \n\nAbstract \n\nImages are ambiguous at each of many levels of a contextual hierarchy. \nNevertheless, the high-level interpretation of most scenes \nis unambiguous, as evidenced by the superior performance of humans. \nThis observation argues for global vision models, such as deformable \ntemplates. Unfortunately, such models are computationally \nintractable for unconstrained problems. We propose a compositional \nmodel in which primitives are recursively composed, subject \nto syntactic restrictions, to form tree-structured objects and object \ngroupings. Ambiguity is propagated up the hierarchy in the form \nof multiple interpretations, which are later resolved by a Bayesian, \nequivalently minimum-description-length, cost functional. \n\n1  Bayesian decision theory and compositionality \n\nIn his Essay on Probability, Laplace (1812) devotes a short chapter, his \"Sixth \nPrinciple,\" to what we call today the Bayesian decision rule. Laplace observes \nthat we interpret a \"regular combination,\" e.g., an arrangement of objects that \ndisplays some particular symmetry, as having resulted from a \"regular cause\" rather \nthan having arisen by chance. It is not, he argues, that a symmetric configuration is less \nlikely to happen by chance than another arrangement. 
Rather, it is that among all \npossible combinations, which are equally favored by chance, there are very few of \nthe regular type: \"On a table we see letters arranged in this order, Constantinople, \nand we judge that this arrangement is not the result of chance, not because it is \nless possible than the others, for if this word were not employed in any language \nwe should not suspect it came from any particular cause, but this word being in use \namongst us, it is incomparably more probable that some person has thus arranged \nthe aforesaid letters than that this arrangement is due to chance.\" In this example, \nregularity is not a mathematical symmetry. Rather, it is a convention shared among \nlanguage users, whereby Constantinople is a word, whereas Ipctneolnosant, a string \ncontaining the same letters but arranged in a random order, is not. \n\nCentral in Laplace's argument is the observation that the number of words in the \nlanguage is smaller, indeed \"incomparably\" smaller, than the number of possible \narrangements of letters. Indeed, if the collection of 14-letter words in a language \nmade up, say, half of all 14-letter strings (a rich language indeed), we would, upon \nseeing the string Constantinople on the table, be far less inclined to deem it a word, \nand far more inclined to accept it as a possible coincidence. The sparseness of allowed \ncombinations can be observed at all linguistic articulations (phonetic-syllabic, \nsyllabic-lexical, lexical-syntactic, syntactic-pragmatic, to use broadly defined levels), \nand may be viewed as a form of redundancy, by analogy to error-correcting codes. \nThis redundancy was likely devised by evolution to ensure efficient communication \nin spite of the ambiguity of elementary speech signals. 
The hierarchical compositional \nstructure of natural visual scenes can also be thought of as redundant: the \nrules that govern the composition of edge elements into object boundaries, of intensities \ninto surfaces, etc., all the way to the assembly of 2-D projections of named \nobjects, amount to a collection of drastic combinatorial restrictions. Arguably, this \nis why in all but a few (generally hand-crafted) cases, natural images have a unique \nhigh-level interpretation in spite of pervasive low-level ambiguity, as amply \ndemonstrated by the performance of our brains. \n\nIn sum, compositionality appears to be a fundamental aspect of cognition (see also \nvon der Malsburg 1981, 1987; Fodor and Pylyshyn 1988; Bienenstock 1991, 1994, \n1996; Bienenstock and Geman 1995). We propose here to account for mental computation \nin general and scene interpretation in particular in terms of elementary \ncomposition operations, and describe a mathematical framework that we have developed \nto this effect. The present description is a cursory one, and some notions \nare illustrated on two simple examples rather than formally defined; for a detailed \naccount, see Geman et al. (1996) and Potter (1997). The binary-image example refers \nto an $N \\times N$ array of binary-valued pixels, while the Laplace-Table example refers \nto a one-dimensional array of length $N$, where each position can be filled with one \nof the 26 letters of the alphabet or remain blank. \n\n2  Labels and composition rules \n\nThe objects operated upon are denoted $\\omega_i$, $i = 1, 2, \\ldots, k$. Each composite object \n$\\omega$ carries a label, $l = L(\\omega)$, and the list of its constituents, $(\\omega_1, \\omega_2, \\ldots)$. These \nuniquely determine $\\omega$, so we write $\\omega = l(\\omega_1, \\omega_2, \\ldots)$. A scene $S$ is a collection of \nprimitive objects. 
In the binary-image case, a scene $S$ consists of a collection of \nblack pixels in the $N \\times N$ array. All these primitives carry the same label, $L(\\omega) = p$ \n(for \"Point\"), and a parameter $\\pi(\\omega)$ which is the position in the image. In Laplace's \nTable, a scene $S$ consists of an arrangement of characters on the table. There are 26 \nprimitive labels, \"A\", \"B\", ..., \"Z\", and the parameter of a primitive $\\omega$ is its position \n$1 \\le \\pi(\\omega) \\le N$ (all primitives in such a scene must have different positions). \n\nAn example of a composite $\\omega$ in the binary-image case is an arrangement composed \nof a black pixel at any position except on the rightmost column and another black \npixel to the immediate right of the first one. The label is \"Horizontal Linelet,\" \ndenoted $L(\\omega) = hl$, and there are $N(N - 1)$ possible horizontal linelets. Another \nnon-primitive label, \"Vertical Linelet,\" or $vl$, is defined analogously. An example \nof a composite $\\omega$ for Laplace's Table is an arrangement of 14 neighboring primitives \ncarrying the labels \"C\", \"O\", \"N\", \"S\", ..., \"E\" in that order, wherever that \narrangement will fit. We then have $L(\\omega) = \\mathit{Constantinople}$, and there are $N - 13$ \npossible Constantinople objects. \n\nThe composition rule for label type $l$ consists of a binding function, $B_l$, and a set \nof allowed binding-function values, or binding support, $S_l$: denoting by $\\Omega$ the set \nof all objects in the model, we have, for any $\\omega_1, \\ldots, \\omega_k \\in \\Omega$, \n$B_l(\\omega_1, \\ldots, \\omega_k) \\in S_l \\Leftrightarrow l(\\omega_1, \\ldots, \\omega_k) \\in \\Omega$. In the binary-image example, \n$B_{hl}(\\omega_1, \\omega_2) = B_{vl}(\\omega_1, \\omega_2) = (L(\\omega_1), L(\\omega_2), \\pi(\\omega_2) - \\pi(\\omega_1))$, \n$S_{hl} = \\{(p, p, (1, 0))\\}$ and $S_{vl} = \\{(p, p, (0, 1))\\}$ define \nthe $hl$- and $vl$-composition rules, $p + p \\to hl$ and $p + p \\to vl$. 
In Laplace's Table, $C + O + \\cdots + E \\to \\mathit{Constantinople}$ is an example of a 14-ary composition rule, where we \nmust check the label and position of each constituent. One way to define the binding \nfunction and support for this rule is: \n$B(\\omega_1, \\ldots, \\omega_{14}) = (L(\\omega_1), \\ldots, L(\\omega_{14}), \\pi(\\omega_2) - \\pi(\\omega_1), \\pi(\\omega_3) - \\pi(\\omega_1), \\ldots, \\pi(\\omega_{14}) - \\pi(\\omega_1))$ \nand $S = \\{(C, \\ldots, E, 1, 2, \\ldots, 13)\\}$. \n\nWe now introduce recursive labels and composition rules: the label of the composite \nobject is identical to the label of one or more of its constituents, and the rule may \nbe applied an arbitrary number of times, to yield objects of arbitrary complexity. \nIn the binary-image case, we use a recursive label $c$, for Curve, and an associated \nbinding function which creates objects of the form $hl + p \\to c$, $vl + p \\to c$, $c + p \\to c$, \n$p + hl \\to c$, $p + vl \\to c$, $p + c \\to c$, and $c + c \\to c$. The reader may easily \nfill in the details, i.e., define a binding function and binding support which result \nin \"c\"-objects being precisely curves in the image, where a curve is of length at \nleast 3 and may be self-intersecting. In the previous examples, primitives were \ncomposed into compositions; here compositions are further composed into more \ncomplex compositions. In general, an object $\\omega$ is a labeled tree, where each vertex \ncarries the name of an object, and each leaf is associated with a primitive (the \nassociation is not necessarily one-to-one, as in the case of a self-intersecting curve). \n\nLet $M$ be a model, i.e., a collection of labels with their binding functions and \nbinding supports, and $\\Omega$ the set of all objects in $M$. We say that object $\\omega \\in \\Omega$ \ncovers $S$ if $S$ is precisely the set of primitives that make up $\\omega$'s leaves. An \ninterpretation $I$ of $S$ is any finite collection of objects in $\\Omega$ such that the union \nof the sets of primitives they cover is $S$. 
We use the convention that, for all $M$ \nand $S$, $I_0$ denotes the trivial interpretation, defined as the collection of (unbound) \nprimitives in $S$. In most cases of interest, a model $M$ will allow many interpretations \nfor a scene $S$. For instance, given a long curve in the binary-image model, there \nwill be many ways to recursively construct a \"c\"-labeled tree that covers exactly \nthat curve. \n\n3  The MDL formulation \n\nIn Laplace's Table, a scene consisting of the string Constantinople admits, in \naddition to $I_0$, the interpretation $I_1 = \\{\\omega_1\\}$, where $\\omega_1$ is a \"Constantinople\"-object. \nWe wish to define a probability distribution $D$ on interpretations such that \n$D(I_1) \\gg D(I_0)$, in order to realize Laplace's \"incomparably more probable\". Our \ndefinition of $D$ will be motivated by the following use of the Minimum Description \nLength (MDL) principle (Rissanen 1989). Consider a binary-image scene $S$ and pretend we want \nto transmit $S$ as quickly as possible through a noiseless channel, hence we seek to \nencode it as efficiently as possible, i.e., with the shortest possible binary code $c$. We \ncan always use the trivial interpretation $I_0$: the codeword $c(I_0)$ is a mere list of the $n$ \nlocations of the black pixels in $S$. We need not specify labels, since there is only one primitive label in \nthis example. The length, or cost, of this code for $S$ is $|c(I_0)| = n \\log_2(N^2)$. \n\nNow, however, we want to take advantage of regularities, in the sense of Laplace, \nthat we expect to be present in $S$. We are specifically interested in compositional \nregularities, where some arrangements that occur more frequently than by chance \ncan be interpreted advantageously using an appropriate compositional model $M$. \nInterpretation $I$ is advantageous if $|c(I)| < |c(I_0)|$. 
An example in the binary-image \ncase is a linelet scene $S$. The trivial encoding of this scene costs us $|c(I_0)| = 2[\\log_2 3 + \\log_2(N^2)]$ \nbits, whereas the cost of the compositional interpretation $I_1 = \\{\\omega_1\\}$ is \n$|c(I_1)| = \\log_2 3 + \\log_2(N(N - 1))$, where $\\omega_1$ is an $hl$ or $vl$ object, as the case may be. \nThe first $\\log_2 3$ bits encode the label $L(\\omega_1) \\in \\{p, hl, vl\\}$, and the rest encodes the \nposition in the image. The compositional $\\{p, hl, vl\\}$ model is therefore advantageous \nfor a linelet scene, since it affords us a gain in encoding cost of about $2 \\log_2 N$ bits. \nIn general, the gain realized by encoding $\\{\\omega\\} = \\{l(\\omega_1, \\omega_2)\\}$ instead of $\\{\\omega_1, \\omega_2\\}$ may \nbe viewed as a binding energy, measuring the affinity that $\\omega_1$ and $\\omega_2$ exhibit for \neach other as they assemble into $\\omega$. This binding energy is \n$\\mathcal{E}_l = |c(\\omega_1)| + |c(\\omega_2)| - |c(l(\\omega_1, \\omega_2))|$, \nand an efficient $M$ is one that contains judiciously chosen cost-saving \ncomposition rules. In effect, if, say, linelets were very rare, we would be better \noff with the trivial model. The inclusion of non-primitive labels would force us \nto add at least one bit to the code of every object (to specify its label), and this \nwould increase the average encoding cost, since the infrequent use of non-primitive \nlabels would not balance the small extra cost incurred on primitives. In practical \napplications, the construction of a sound $M$ is no trivial issue. Note, however, \nthe simple rationale for including a rule such as $p + p \\to hl$. Giving ourselves the \nlabel $hl$ renders redundant the independent encoding of the positions of horizontally \nadjacent pixels. In general, a good model should allow one to hierarchically compose \nwith each other frequently occurring arrangements of objects. \n\nThis use of MDL leads in a straightforward way to an equivalent Bayesian formulation. 
Setting $P'(\\omega) = 2^{-|c(\\omega)|} / \\sum_{\\omega' \\in \\Omega} 2^{-|c(\\omega')|}$ yields a probability distribution $P'$ \non $\\Omega$ for which $c$ is approximately a Shannon code (Cover and Thomas 1991). With \nthis definition, the decision to include the label $hl$ (or the label $\\mathit{Constantinople}$) \nwould be viewed, in principle, as a statement about the prior probability of finding \nhorizontal linelets (or Constantinople strings) in the scene to be interpreted. \n\n4  The observable-measure formulation \n\nThe MDL formulation, however, has a number of shortcomings; in particular, computing \nthe binding energy for composite objects can be problematic. We outline \nnow an alternative approach (Geman et al. 1996, Potter 1997), where a probability \ndistribution $P(\\omega)$ on $\\Omega$ is defined through a family of observable measures $Q_l$. \nThese measures assign probabilities to each possible binding-function value, $s \\in S_l$, \nand also to the primitives. We require $\\sum_{l \\in M} \\sum_{s \\in S_l} Q_l(s) = 1$, where the notion of \nbinding function has been extended to primitives via $B_{prim}(\\omega) = \\pi(\\omega)$ for primitive \n$\\omega$. The probabilities induced on $\\Omega$ by $Q_l$ are given by $P(\\omega) = Q_{prim}(B_{prim}(\\omega))$ \nfor a primitive $\\omega$, and $P(\\omega) = Q_l(B_l(\\omega_1, \\omega_2)) P^2(\\omega_1, \\omega_2 \\mid B_l(\\omega_1, \\omega_2))$ for a composite \nobject $\\omega = l(\\omega_1, \\omega_2)$ (see footnote 1). Here $P^2 = P \\times P$ is the product probability, i.e., the free, or \nnot-bound, distribution for the pair $(\\omega_1, \\omega_2) \\in \\Omega^2$. For instance, with $C + \\cdots + E \\to \\mathit{Constantinople}$, \n$P^{14}(\\omega_1, \\omega_2, \\ldots, \\omega_{14} \\mid B_{Cons\\ldots}(\\omega_1, \\ldots, \\omega_{14}) = (C, O, \\ldots, 13))$ is the \nconditional probability of observing a particular string Constantinople, under the \nfree distribution, given that $(\\omega_1, \\ldots, \\omega_{14})$ constitutes such a string. 
With the reasonable \nassumption that, under $Q$, primitives are uniformly distributed over the \ntable, this conditional probability is simply the inverse of the number of possible \nConstantinople strings, i.e., $1/(N - 13)$. \n\nThe binding energy, defined, by analogy to the MDL approach, as \n$\\mathcal{E}_l = \\log_2(P(\\omega) / (P(\\omega_1) P(\\omega_2)))$, now becomes \n$\\mathcal{E}_l = \\log_2(Q_l(B_l(\\omega_1, \\omega_2))) - \\log_2((P \\times P)(B_l(\\omega_1, \\omega_2)))$. \nFinally, if $\\mathcal{I}$ is the collection of all finite interpretations $I \\subset \\Omega$, we \ndefine the probability of $I \\in \\mathcal{I}$ as $D(I) = \\prod_{\\omega \\in I} P(\\omega) / Z$, with $Z = \\sum_{I' \\in \\mathcal{I}} \\prod_{\\omega \\in I'} P(\\omega)$. \nThus, the probability of an interpretation containing several free objects is obtained \nby assuming that these objects occurred in the scene independently of each other. \nGiven a scene $S$, recognition is formulated as the task of maximizing $D$ over all the \n$I$'s in $\\mathcal{I}$ that are interpretations of $S$. \n\nWe now illustrate the use of $D$ on our two examples. In the binary-image example \nwith model $M = \\{p, hl, vl\\}$, we use a parameter $q$, $0 \\le q \\le 1$, to adjust the prior \nprobability of linelets. Thus, $Q_{prim}(B_{prim}(\\omega)) = (1 - q)/N^2$ for primitives, and \n$Q_{hl}((p, p, (1, 0))) = Q_{vl}((p, p, (0, 1))) = q/2$ for linelets. It is easily seen that regardless \nof the normalizing constant $Z$, the binding energy of two adjacent pixels into a \nlinelet is $\\mathcal{E}_{hl} = \\mathcal{E}_{vl} = \\log_2(q/2) - \\log_2[((1 - q)/N^2)^2 N(N - 1)]$. Interestingly, as long as \n$q \\ne 0$ and $q \\ne 1$, the binding energy, for large $N$, is approximately $2 \\log_2 N$, which \nis independent of $q$. Thus, the linelet interpretation is \"incomparably\" more likely \nthan the independent occurrence of two primitives at neighboring positions. We \nleave it to the reader to construct a prior $P$ for the model $\\{p, hl, vl, c\\}$, e.g., by \ndistributing the $Q$-mass evenly between all composition rules. 
Finally, in Laplace's \nTable, if there are $M$ equally likely non-primitive labels (say, city names) and $q$ is \ntheir total mass, the binding energy for Constantinople is \n$\\mathcal{E}_{Cons\\ldots} = \\log_2[q / (M(N - 13))] - \\log_2[(1 - q)/(26N)]^{14}$, \nand the \"regular\" cause is again \"incomparably\" more likely. \n\nThere are several advantages to this reformulation from codewords into probabilities \nusing the $Q$-parameters. First, the $Q$-parameters can in principle be adjusted to \nbetter account for a particular world of images. Second, we get an explicit formula \nfor the binding energy (namely $\\log_2(Q / (P \\times P))$). Of course, we need to evaluate \nthe product probability $P \\times P$, and this can be highly non-trivial; one approach \nis through sampling, as demonstrated in Potter (1997). Finally, this formulation \nis well-suited for parameter estimation: the $Q$'s, which are the parameters of the \ndistribution $P$, are indeed observables, i.e., directly available empirically. \n\nFootnote 1: This is actually an implicit definition. Under reasonable conditions, it is well defined; \nsee Geman et al. (1996). \n\n5  Concluding remarks \n\nThe approach described here was applied by X. Xing to the recognition of \"on-line\" \nhandwritten characters, using a binary-image-type model as above, enriched \nwith higher-level labels including curved lines, straight lines, angles, crossings, T-junctions, \nL-junctions (right angles), and the 26 letters of the alphabet. In such \na model, the search for an optimal solution cannot be done exhaustively. 
We experimented \nwith a number of strategies, including a two-step algorithm which first \ngenerates all possible objects in the scene, and then selects the \"best\" objects, i.e., \nthe objects with highest total binding energy, using a greedy method, to yield a final \nscene interpretation. (The total binding energy of $\\omega$ is the sum of the binding energies \n$\\mathcal{E}_l$ over all the composition rules $l$ used in the composition of $\\omega$. Equivalently, \nthe total binding energy is the log-likelihood ratio $\\log_2(P(\\omega) / \\prod_i P(\\omega_i))$, where the \nproduct is taken over all the primitives $\\omega_i$ covered by $\\omega$.) \n\nThe first step of the algorithm typically results in high-level objects partly overlapping \non the set of primitives they cover, i.e., competing for the interpretation of \nshared primitives. Ambiguity is thus propagated in a \"bottom-up\" fashion. The \nambiguity is resolved in the second \"top-down\" pass, when high-level composition \nrules are used to select the best compositions, at all levels including the lower ones. \nA detailed account of our experiments will be given elsewhere. We found the results \nquite encouraging, particularly in view of the potential scope of the approach. \nIn effect, we believe that this approach is in principle capable of addressing unrestricted \nvision problems, where images are typically very ambiguous at lower levels \nfor a variety of reasons (including occlusion and mutual overlap of objects), hence \npurely bottom-up segmentation is impractical. \n\nTurning now to biological implications, note that dynamic binding in the nervous \nsystem has been a subject of intensive research and debate in the last decade. 
Most \ninteresting in the present context is the suggestion, first clearly articulated by von \nder Malsburg (1981), that composition may be performed thanks to a dual mechanism \nof accurate synchronization of spiking activity (not necessarily relying on \nperiodic firing) and fast reversible synaptic plasticity. If there is some neurobiological \ntruth to the model described in the present paper, the binding mechanism \nproposed by von der Malsburg would appear to be an attractive implementation. \nIn effect, the use of fine temporal structure of neural activity opens up a large realm \nof possible high-order codes in networks of neurons. \n\nIn the present model, constituents always bind in the service of a new object, an \noperation one may refer to as triangular binding. Composite objects can engage in \nfurther composition, thus giving rise to arbitrarily deep tree-structured constructs. \nPhysiological evidence of triangular binding in the visual system can be found in Sillito \net al. (1994); Damasio (1989) describes an approach derived from neuroanatomical \ndata and lesion studies that is largely consistent with the formalism described \nhere. \n\nAn important requirement for the neural representation of the tree-structured objects \nused in our model is that the doing and undoing of links operating on some \nconstituents, say $\\omega_1$ and $\\omega_2$, while affecting in some useful way the high-order patterns \nthat represent these objects, leaves these patterns, as representations of $\\omega_1$ and \n$\\omega_2$, intact. A family of biologically plausible patterns that would appear to satisfy \nthis requirement is provided by synfire patterns (Abeles 1991). 
We hypothesized \nelsewhere (Bienenstock 1991, 1994, 1996) that synfire chains could be dynamically \nbound via weak synaptic couplings; such couplings would synchronize the wave-like \nactivities of two synfire chains, in much the same way as coupled oscillators lock \ntheir phases. Recursiveness of compositionality could, in principle, arise from the \nfurther binding of these composite structures. \n\nAcknowledgements \n\nSupported by the Army Research Office (DAAL03-92-G-0115), the National Science \nFoundation (DMS-9217655), and the Office of Naval Research (N00014-96-1-0647). \n\nReferences \n\nAbeles, M. (1991) Corticonics: Neuronal circuits of the cerebral cortex, Cambridge \nUniversity Press. \n\nBienenstock, E. (1991) Notes on the growth of a composition machine, in Proceedings \nof the Royaumont Interdisciplinary Workshop on Compositionality in \nCognition and Neural Networks-I, D. Andler, E. Bienenstock, and B. Laks, \nEds., pp. 25-43. (1994) A Model of Neocortex. Network: Computation in \nNeural Systems, 6:179-224. (1996) Composition, in Brain Theory: Biological \nBasis and Computational Principles, A. Aertsen and V. Braitenberg, \nEds., Elsevier, pp. 269-300. \n\nBienenstock, E., and Geman, S. (1995) Compositionality in Neural Systems, in \nThe Handbook of Brain Theory and Neural Networks, M.A. Arbib, Ed., \nM.I.T./Bradford Press, pp. 223-226. \n\nCover, T.M., and Thomas, J.A. (1991) Elements of Information Theory, Wiley \nand Sons, New York. \n\nDamasio, A.R. (1989) Time-locked multiregional retroactivation: a systems-level \nproposal for the neural substrates of recall and recognition, Cognition, \n33:25-62. \n\nFodor, J.A., and Pylyshyn, Z.W. (1988) Connectionism and cognitive architecture: \na critical analysis, Cognition, 28:3-71. 
\n\nGeman, S., Potter, D., and Chi, Z. (1996) Compositional Systems, Technical Report, \nDivision of Applied Mathematics, Brown University. \n\nLaplace, P.S. (1812) Essai philosophique sur les probabilit\u00e9s. Translation of Truscott \nand Emory, New York, 1902. \n\nPotter, D. (1997) Compositional Pattern Recognition, PhD Thesis, Division of \nApplied Mathematics, Brown University, in preparation. \n\nRissanen, J. (1989) Stochastic Complexity in Statistical Inquiry, World Scientific \nCo, Singapore. \n\nSillito, A.M., Jones, H.E., Gerstein, G.L., and West, D.C. (1994) Feature-linked \nsynchronization of thalamic relay cell firing induced by feedback from the \nvisual cortex, Nature, 369:479-482. \n\nvon der Malsburg, C. (1981) The correlation theory of brain function. Internal report \n81-2, Max-Planck Institute for Biophysical Chemistry, Dept. of Neurobiology, \nG\u00f6ttingen, Germany. (1987) Synaptic plasticity as a basis of \nbrain organization, in The Neural and Molecular Bases of Learning (J.P. \nChangeux and M. Konishi, Eds.), John Wiley and Sons, pp. 411-432. \n", "award": [], "sourceid": 1327, "authors": [{"given_name": "Elie", "family_name": "Bienenstock", "institution": null}, {"given_name": "Stuart", "family_name": "Geman", "institution": null}, {"given_name": "Daniel", "family_name": "Potter", "institution": null}]}