{"title": "The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving", "book": "Advances in Neural Information Processing Systems", "page_first": 17, "page_last": 23, "abstract": null, "full_text": "The Interplay of Symbolic and Subsymbolic \n\nProcesses \n\nin Anagram Problem Solving \n\nDavid B. Grimes and Michael C. Mozer \n\nDepartment of Computer Science and Institute of Cognitive Science \n\nUniversity of Colorado, Boulder, CO 80309-0430 USA \n\n{gr imes ,mo z er}@c s .co l ora do .edu \n\nAbstract \n\nAlthough connectionist models have provided insights into the nature of \nperception and motor control, connectionist accounts of higher cognition \nseldom go  beyond an  implementation of traditional  symbol-processing \ntheories.  We  describe  a connectionist constraint satisfaction  model  of \nhow  people  solve  anagram problems.  The  model exploits  statistics  of \nEnglish  orthography,  but  also  addresses  the  interplay  of  sub symbolic \nand  symbolic  computation  by  a  mechanism  that  extracts  approximate \nsymbolic representations (partial orderings of letters) from sub symbolic \nstructures  and  injects  the  extracted  representation back into  the  model \nto  assist  in  the  solution  of the  anagram.  We  show  the  computational \nbenefit of this extraction-injection process and discuss its relationship to \nconscious mental processes and  working memory.  We  also  account for \nexperimental data concerning the difficulty of anagram solution based on \nthe orthographic structure of the anagram string and the target word. \n\nHistorically,  the  mind  has  been  viewed  from  two  opposing  computational perspectives. \nThe  symbolic  perspective  views  the  mind  as  a  symbolic  information processing engine. \nAccording  to  this  perspective,  cognition  operates  on  representations  that  encode  logical \nrelationships among discrete  symbolic elements,  such  as  stacks  and  structured trees,  and \ncognition involves basic operations such as  means-ends analysis  and best-first search.  In \ncontrast,  the  subsymbolic perspective views  the  mind  as  performing statistical inference, \nand involves basic operations such as constraint-satisfaction search.  The data structures on \nwhich these operations take place are numerical vectors. \n\nIn some domains of cognition, significant progress has been made through analysis from \none computational perspective or the other. The thesis of our work is that many of these do(cid:173)\nmains might be understood more completely by focusing on the  interplay of subsymbolic \nand  symbolic information processing.  Consider the higher-cognitive domain  of problem \nsolving.  At an  abstract level of description, problem solving tasks can readily be formal(cid:173)\nized in  terms  of symbolic representations  and  operations.  However,  the  neurobiological \nhardware that underlies human cognition appears to be  subsymbolic-representations are \nnoisy and  graded,  and  the brain  operates and  adapts in  a continuous fashion  that is  diffi(cid:173)\ncult to characterize in discrete symbolic terms.  At some level-between the computational \nlevel  of the  task  description  and  the  implementation level  of human  neurobiology-the \nsymbolic and  subsymbolic accounts must come into contact with  one another.  
We focus on this point of contact by proposing mechanisms by which symbolic representations can modulate subsymbolic processing, and mechanisms by which subsymbolic representations are made symbolic. We conjecture that these mechanisms can not only provide an account for the interplay of symbolic and subsymbolic processes in cognition, but that they form a sensible computational strategy that outperforms purely subsymbolic computation, and hence, symbolic reasoning makes sense from an evolutionary perspective.

In this paper, we apply our approach to a high-level cognitive task, anagram problem solving. An anagram is a nonsense string of letters whose letters can be rearranged to form a word. For example, the solution to the anagram puzzle RYTEHO is THEORY. Anagram solving is an interesting task because it taps higher cognitive abilities and issues of awareness, it has a tractable state space, and interesting psychological data are available to model.

1 A Subsymbolic Computational Model

We start by presenting a purely subsymbolic model of anagram processing. By subsymbolic, we mean that the model utilizes only English orthographic statistics and does not have access to an English lexicon. We will argue that this model proves insufficient to explain human performance on anagram problem solving. However, it is a key component of a hybrid symbolic-subsymbolic model we propose, and is thus described in detail.

1.1 Problem Representation

A computational model of anagram processing must represent letter orderings. For example, the model must be capable of representing a solution such as <THEORY>, or any permutation of the letters such as <RYTEHO>. (The symbols "<" and ">" will be used to delimit the beginning and end of a string, respectively.) We adopted a representation of letter strings in which a string is encoded by the set of letter pairs (hereafter, bigrams) contained in the string; for example, the bigrams in <THEORY> are: <T, TH, HE, EO, OR, RY, and Y>. The delimiters < and > are treated as ordinary symbols of the alphabet. We capture letter pairings in a symbolic letter-ordering matrix, or symbolic ordering for short. Figure 1(a) shows the matrix, in which the rows indicate the first letter of the bigram, and the columns indicate the second. A cell of the matrix contains a value of 1 if the corresponding bigram is present in the string. (This matrix formalism and all procedures in the paper can be extended to handle strings with repeated letters, which we do not have space to discuss.) The matrix columns and rows can be thought of as consisting of all letters from A to Z, along with the delimiters < and >. However, in the Figure we have omitted rows and columns corresponding to letters not present in the anagram. Similarly, we have omitted the < from the column space and the > from the row space, as they could not by definition be part of any bigram. The seven bigrams indicated by the seven ones in the Figure uniquely specify the string THEORY.

As we have described the matrix, cells contain the truth value of the proposition that a particular bigram appears in the string being represented. However, the cell values have an interesting alternative interpretation: as the probability that a particular bigram is present.
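As a concrete illustration of this representation, the following minimal Python sketch (our own illustration, not code from the original model; the name symbolic_ordering is ours) builds the binary letter-ordering matrix for a delimited string:

    import numpy as np

    ALPHABET = ['<'] + [chr(c) for c in range(ord('A'), ord('Z') + 1)] + ['>']
    INDEX = {ch: i for i, ch in enumerate(ALPHABET)}  # 28 symbols total

    def symbolic_ordering(word):
        """Binary letter-ordering matrix for a string without repeated letters.

        Cell (i, j) is 1 iff the bigram ALPHABET[i] + ALPHABET[j] occurs in
        the delimited string <word>.
        """
        s = '<' + word.upper() + '>'
        P = np.zeros((len(ALPHABET), len(ALPHABET)))
        for first, second in zip(s, s[1:]):
            P[INDEX[first], INDEX[second]] = 1.0
        return P

    P = symbolic_ordering('THEORY')
    assert P.sum() == 7  # the seven bigrams <T, TH, HE, EO, OR, RY, Y>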
Figure 1(b) illustrates a matrix of this sort, which we call a subsymbolic letter-ordering matrix, or subsymbolic ordering for short. In the Figure, the bigram TH occurs with probability 0.8. Although the symbolic orderings are obviously a subset of the subsymbolic orderings, the two representations play critically disparate roles in our model, and thus are treated as separate entities.

To formally characterize symbolic and subsymbolic ordering matrices, we define a mask vector, μ, having N = 28 elements, corresponding to the 26 letters of the alphabet plus the two delimiters. Element i of the mask, μ_i, is set to one if the corresponding letter appears in the anagram string and zero if it does not. In both the symbolic and subsymbolic orderings, the matrices are constrained such that elements in row i and column i must sum to μ_i.

[Figure 1: (a) A symbolic letter-ordering matrix for the string THEORY. (b) A subsymbolic letter-ordering matrix whose cells indicate the probabilities that particular bigrams are present in a letter string. (c) A symbolic partial letter-ordering matrix, formed from the symbolic ordering matrix by setting a subset of the elements (highlighted in grey) to zero. The resulting matrix represents the partial ordering {<TH, RY}.]

If one extracts all rows and columns for which μ_i = 1 from a symbolic ordering, as we have done in Figure 1(a), a permutation matrix is obtained. If one extracts all rows and columns for which μ_i = 1 from a subsymbolic ordering, as we have done in Figure 1(b), the resulting matrix is known as doubly stochastic, because each row and column vector can be interpreted as a probability distribution.

1.2 Constraint Satisfaction Network

A simple computational model can be conceptualized by considering each cell in the subsymbolic ordering matrix to correspond to a standard connectionist unit, and each cell value to be the activity level of the unit. In this conceptualization, the goal of the connectionist network is to obtain a pattern of activity corresponding to the solution word, given the anagram. We wish for the model to rely solely on orthographic statistics of English, avoiding lexical knowledge at this stage. Our premise is that an interactive model, one that allows top-down lexical knowledge to come in contact with the bottom-up information about the anagram, would be too powerful; i.e., the model would be superhuman in its ability to identify lexical entries containing a target set of letters. Instead, we conjecture that a suitable model of human performance should be primarily bottom-up, attempting to order letters without the benefit of the lexicon. Of course, the task cannot be performed without a lexicon, but we defer discussion of the role of the lexicon until we first present the core connectionist component of the model.
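Before turning to the network dynamics, the mask and doubly stochastic constraints from Section 1.1 can be made concrete in a short sketch (again illustrative, reusing the helpers from the previous sketch; mask_vector, masked_submatrix, and is_doubly_stochastic are our own names):

    def mask_vector(anagram):
        """mu_i = 1 for each symbol in the delimited anagram, else 0."""
        mu = np.zeros(len(ALPHABET))
        for ch in '<' + anagram.upper() + '>':
            mu[INDEX[ch]] = 1.0
        return mu

    def masked_submatrix(P, mu):
        """Rows/columns with mu_i = 1, dropping the impossible '>' row and '<' column."""
        rows = [i for i, m in enumerate(mu) if m == 1 and ALPHABET[i] != '>']
        cols = [j for j, m in enumerate(mu) if m == 1 and ALPHABET[j] != '<']
        return P[np.ix_(rows, cols)]

    def is_doubly_stochastic(M, tol=1e-6):
        """Each row and column sums to 1 and all entries lie in [0, 1]."""
        return (np.all(M >= -tol) and np.all(M <= 1 + tol)
                and np.allclose(M.sum(axis=0), 1, atol=tol)
                and np.allclose(M.sum(axis=1), 1, atol=tol))

    M = masked_submatrix(symbolic_ordering('THEORY'), mask_vector('RYTEHO'))
    assert is_doubly_stochastic(M)  # a permutation matrix is a special case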
The connectionist model is driven by three constraints: (1) solutions should contain bigrams with high frequency in English, (2) solutions should contain trigrams with high frequency in English, and (3) solutions should contain bigrams that are consistent with the bigrams in the original anagram. The first two constraints attempt to obtain English-like strings. The third constraint is motivated by the observation that anagram solution time depends on the arrangement of letters in the original anagram (e.g., Mayzner & Tresselt, 1959). The three constraints are embodied by a constraint-satisfaction network with the following harmony function:

H = \sum_{ij} \beta_{ij} p_{ij} + \omega \sum_{ijk} \tau_{ijk} p_{ij} p_{jk} + \lambda \sum_{ij} p_{ij} s_{ij}    (1)

where p_{ij} denotes the value of the cell corresponding to bigram ij, \beta_{ij} is monotonically related to the frequency of bigram ij in English, \tau_{ijk} is monotonically related to the frequency of trigram ijk in English, s_{ij} is 1 if the original anagram contained bigram ij and 0 otherwise, and \omega and \lambda are model parameters that specify the relative weighting of the trigram and unchanged-ordering constraints, respectively.

[Figure 2: The iterative extraction-injection model. The anagram drives the constraint-satisfaction network, which produces a subsymbolic matrix; the extraction process yields a symbolic matrix, which is passed to lexical verification (solved? yes/no) and, if the problem remains unsolved, injected back into the network.]

The harmony function specifies a measure of goodness of a given matrix in terms of the degree to which the three sets of constraints are satisfied. Running the connectionist network corresponds to searching for a local optimum in the harmony function. The local optimum can be found by gradient ascent, i.e., defining a unit-update rule that moves uphill in harmony. Such a rule can be obtained via the derivative of the harmony function:

\Delta p_{ij} = \epsilon \, \frac{\partial H}{\partial p_{ij}}

Although the update rule ensures that harmony will increase over time, the network state may violate the conditions of the doubly stochastic matrix by allowing the p_{ij} to take on values outside of [0, 1], or by failing to satisfy the row and column constraints. The procedure applied to enforce the row and column constraints involves renormalizing the activities after each harmony update to bring the activity pattern arbitrarily close to a doubly stochastic matrix. The procedure, suggested by Sinkhorn (1964), involves alternating row and column normalizations (in our case, to the values of the mask vector). Sinkhorn proved that this procedure will asymptotically converge on a doubly stochastic matrix. Note that the Sinkhorn normalization procedure must operate at a much finer time grain than the harmony updates, in order to ensure that the updates do not cause the state to wander from the space of doubly stochastic matrices.
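A minimal numpy sketch of one harmony update followed by Sinkhorn renormalization might look like the following (our own illustration under stated assumptions: the parameter values, the clipping step, and the fixed number of Sinkhorn iterations are not taken from the paper):

    def harmony_gradient(P, beta, tau, S, omega, lam):
        """dH/dP for the harmony function in Eq. (1)."""
        # Bigram-frequency and anagram-consistency terms are linear in P.
        grad = beta + lam * S
        # Trigram term omega * tau[i,j,k] * P[i,j] * P[j,k] contributes
        # through both factors of the product.
        grad = grad + omega * np.einsum('ijk,jk->ij', tau, P)
        grad = grad + omega * np.einsum('kij,ki->ij', tau, P)
        return grad

    def sinkhorn(P, mu, n_iters=50):
        """Alternately rescale rows and columns toward the mask-vector targets."""
        for _ in range(n_iters):
            row = P.sum(axis=1, keepdims=True)
            P = np.where(row > 0, P * (mu[:, None] / np.maximum(row, 1e-12)), P)
            col = P.sum(axis=0, keepdims=True)
            P = np.where(col > 0, P * (mu[None, :] / np.maximum(col, 1e-12)), P)
        return P

    def harmony_step(P, mu, beta, tau, S, omega=1.0, lam=1.0, eps=0.01):
        """One gradient-ascent update on harmony, then projection by Sinkhorn."""
        P = P + eps * harmony_gradient(P, beta, tau, S, omega, lam)
        P = np.clip(P, 0.0, 1.0)   # keep activities in [0, 1] (an assumption)
        return sinkhorn(P, mu)     # restore row/column sums to mu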
2 The Iterative Extraction-Injection Model

The constraint-satisfaction network we just described is inadequate as a model of human anagram problem solving for two principal reasons. First, the network output generally does not correspond to a symbolic ordering, and hence has no immediate interpretation as a letter string. Second, the network has no access to a lexicon, so it cannot possibly determine whether a candidate solution is a word. These two concerns are handled by introducing additional processing components to the model. The components, called extraction, verification, and injection, bring subsymbolic representations of the constraint-satisfaction network into contact with the symbolic realm.

The extraction component converts a subsymbolic ordering, the output of the constraint-satisfaction network, into a symbolic ordering. This symbolic ordering serves as a candidate solution to the anagram. The verification component queries the lexicon to retrieve words that match or are very close to the candidate solution. If no lexical item is retrieved that can serve as a solution, the injection component feeds the candidate solution back into the constraint-satisfaction network in the form of a bias on subsequent processing, in exactly the same way that the original anagram did on the first iteration of constraint satisfaction.

Figure 2 shows a high-level sketch of the complete model. The intuition behind this architecture is as follows. The symbolic ordering extracted on one iteration will serve to constrain the model's interpretation of the anagram on the next iteration. Consequently, the feedback forces the model down one path in a solution tree. When viewed from a high level, the model steps through a sequence of symbolic states. The transitions among symbolic states, however, are driven by the subsymbolic constraint-satisfaction network. To reflect the importance of the interplay between symbolic and subsymbolic processing, we call the architecture the iterative extraction-injection model.

Before describing the extraction, verification, and injection components in detail, we emphasize one point about the role of the lexicon. The model makes a strong claim about the sort of knowledge used to guide the solution of anagrams. Lexical knowledge is used only for verification, not for generation of candidate solutions. The limited use of the lexicon restricts the computational capabilities of the model, but in a way that we conjecture corresponds to human limitations.
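The overall control flow can be summarized in a short sketch (ours, not the authors' code; run_network, extract_partial_ordering, and verify are stand-ins for the components described in Sections 1.2, 2.1, and 2.2, and mask_vector and symbolic_ordering come from the earlier sketches):

    def solve_anagram(anagram, max_iters=150):
        """One run of the iterative extraction-injection loop.

        Returns the solution word and the number of iterations (the model's
        solution time), or None if no solution is found within max_iters.
        """
        mu = mask_vector(anagram)
        S = symbolic_ordering(anagram)   # initial bias: the anagram's own bigrams
        for iteration in range(1, max_iters + 1):
            P = run_network(S, mu)                       # constraint-satisfaction settling
            candidate = extract_partial_ordering(P, mu)  # subsymbolic -> symbolic
            word = verify(candidate, anagram)            # lexical verification
            if word is not None:
                return word, iteration
            S = candidate   # injection: the partial ordering replaces the anagram bias
        return None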
\n\nIf sub symbolic matrix  element Pij  has  a  value close  to  1,  then  it is  clear that bigram ij \nshould be included in the symbolic ordering. However, if a row or column of a sub symbolic \nordering matrix is close to  uniform,  the  selection of a bigram in that row  or column  will \nbe somewhat arbitrary.  Consequently, we endow the model with the ability to  select only \nsome bigrams  and  leave  other letter pairings  unspecified.  Thus,  we  allow  the  extraction \ncomponent to consider symbolic partial orderings-i.e., a subset of the letter pairings in a \ncomplete ordering.  For example, {  <TH, RY  }  is a partial ordering that specifies that the T \nand  H  belong together in sequence at the beginning of the word, and the R  should precede \nthe  Y,  but does  not specify  the  relation  of these letter clusters to one  another or to  other \nletters of the  anagram.  Formally,  a symbolic partial ordering matrix is  a binary matrix in \nwhich the  row  and columns  sum to  values  less  than or equal to  the  corresponding mask \nvalue.  A  symbolic partial ordering can be formed  by  setting to  zero  some elements of a \nsymbolic ordering (Figure  l(c\u00bb. \n\nIn  the  context of this  task,  a sub symbolic  ordering is  best  viewed  as  a  set of parameters \nspecifying a distribution over a space P  of all possible symbolic partial ordering matrices. \nRather than explicitly generating and assigning probabilities to each element in P, our ap(cid:173)\nproach samples from the distribution specified by the subsymbolic ordering using Markov \nChain Monte Carlo (Neal, 1993). Our MCMC method obtains samples consistent with the \nbigram probabilities Pij and the row and column constraints, J-Lj. \n\n2.2  Lexical Verification \n\nLexical verification involves consulting the lexicon to identify and validate candidate solu(cid:173)\ntions.  The extracted symbolic partial ordering is fed into the lexical verification component \nto identify a set of words, each of which is consistent with the partial ordering.  By consis(cid:173)\ntent, we mean the word contains all of the bigrams in the partial ordering. This set of words \n\n\f'0 \n~ 0.4 \n\nj .....  0.2 \n\n-\n\nLength 3 \nLength 4 \n- - Length 5 \n- - Length 6 \n-\nLength 7 \n\ngO.8 \n'\u00a7 \n51 \n'\" ~O.6 \n'\" '0 \n.~0.4 \n:g \ne \n\n0..0.2 \n\n\" \n\nI \n\nI \n\n,  , \n\n-\n\nExtraction-Injection \nNo  Feedback \n\n- - Random  Feedback \n- - - Continuous Feedback \n\n~L----2~0----~40-----6~O ----~80----~100 \n\nNumber of iteration s \n\n50 \nNumber of iterations \n\n100 \n\n150 \n\nFigure 3:  (a)  Probability of finding  solution for different word lengths  as  a function  of number of \niterations.  (b) Convergence of the extraction-injection model and variants of the feedback mechanism. \n\nis then checked to see if any word contains the same letters as the anagram. If so, the lexical \nverifier returns that the problem is  solved,  Otherwise, the lexical verifier indicates failure. \nBecause the list of consistent words  can be extremely large, and recalling and processing \na large number of candidates seems implausible, we limit the size of the consistent set by \nintroducing a recall parameter 'fJ  that controls the maximum size of the consistent set.  If the \nactual number of consistent words is larger, a random sample of size 'fJ  is retrieved. 
\n\n2.3 \n\nInjection \n\nWhen the lexical verification component fails, the symbolic partial ordering is injected into \nthe  constraint-satisfaction network, replacing the  letter ordering of the original anagram, \nand a new processing iteration begins. Were it not for new bias injected into the constraint \nsatisfaction  network,  the  constraint-satisfaction network would  produce the  same  output \nas  on  the previous iteration,  and  the  model  would  likely  become stuck without finding  a \nsolution.  In our experiments, we  show that injecting the symbolic partial ordering allows \nthe model to arrive at a solution more rapidly than other sorts of feedback. \n\n3  Results and Discussion \n\nThrough simulation of our architecture we modeled several basic findings  concerning hu(cid:173)\nman anagram problem solving. In our simulations, we define the model solution time to be \nthe number of extraction-injection iterations before the solution word is identified. \n\nFigure 3(a) shows the probability of the  model finding the  a solution as  a function of the \nnumber of iterations the model is  allowed to run and the number of letters in the word set. \nThe  data  set  consists  of 40  examples for  each  of five  different  word lengths.  The  most \nstriking  result  is  that  the  probability  of finding  a  solution  increases  monotonically  over \ntime.  It is also interesting to note that the model's asymptotic accuracy is  100%, indicating \nthat the model is  computationally sufficient to perform the  task.  Of more  significance is \nthe fact that the model exhibits the word length effect as reported in Sargent (1940), that is, \nlonger words take more time to  solve. \n\nOur model can explain other experimental results on anagram problem solving.  Mayzner \nand Tresselt (1958) found that subjects were faster to find solutions composed of high fre(cid:173)\nquency bigrams  than  solutions  composed of low  frequency  bigrams.  For example,  SHIN \ncontains  higher frequency bigrams  than  HYMN.  The  iterative extraction-injection model \nreproduced this  effect in  the  solution  time  to  two  classes  of five  five-letter  words.  Each \n\n\fword  was  presented 30 times  to  obtain  a  distribution  of solution  times.  A  mean  of 5.3 \niterations was required for solutions composed of high frequency bigrams, compared to a \nmean of 21.2 iterations for solutions composed of low frequency bigrams.  The difference \nis  statistically reliable (F(l, 8)  =  30.3,p < .001).  It is  not surprising that the model  pro(cid:173)\nduces this result, as the constraint-satisfaction network attempts to generate high frequency \npairings of letters. \n\nMayzner  and  Tresselt  (1959)  found  that  subjects  also  are  faster  to  solve  an  anagram  if \nthe anagram is composed of low frequency bigrams.  For example,  RCDA  might be recog(cid:173)\nnized as  CARD more readily than would DACR.  Our model reproduces this result as  well. \nWe  tested  the  model  with  25  four-letter  target  words  whose  letters  could  be rearranged \nto  form  anagrams  with  either low  or high bigram frequency ; each  target word  was  pre(cid:173)\nsented  30 times.  The  mean  solution time  for  low  bigram-frequency anagrams was  21.4, \nversus  27.6 for  high  bigram-frequency anagrams.  This  difference is  statistically  reliable \n(F(1,24)  = 41.4, p  <  .001).  
Simulation results to date have focused on the computational properties of the model, with the goal of showing that the iterative extraction-injection process leads to efficient solution times. The experiments involve testing the performance of models with some aspect of the iterative extraction-injection model modified. Three such variants were tested: (1) the feedback connection was removed, (2) random symbolic partial orderings were fed back, and (3) subsymbolic partial orderings were fed back. The experiment consisted of 125 words taken from the Kucera and Francis (1967) corpus, which was also used for bigram and trigram frequencies. The median of 25 solution times for each word and model was used to compute the mean solution time for the original, no-feedback, random-feedback, and continuous-feedback models: 13.43, 41.88, 74.91, and 43.17 iterations, respectively. The key result is that the iterative extraction-injection model was reliably three to five times faster than the variants; the respective F(1, 124) values were 87.8, 154.3, and 99.1, all p < .001. Figure 3(b) shows the probability that each of these four models found the solution by a given time.

Although our investigation of this architecture is just beginning, we have shown that the model can explain some fundamental behavioral data, and that surprising computational power arises from the interplay of symbolic and subsymbolic information processing.

Acknowledgments

This work benefited from the initial explorations and ideas of Tor Mohling. This research was supported by Grant 97-18 from the McDonnell-Pew Program in Cognitive Neuroscience, and by NSF award IBN-9873492.

References

Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

Mayzner, M. S., & Tresselt, M. E. (1958). Anagram solution times: A function of letter and word frequency. Journal of Experimental Psychology, 56, 376-379.

Mayzner, M. S., & Tresselt, M. E. (1959). Anagram solution times: A function of transitional probabilities. Journal of Experimental Psychology, 63, 510-513.

Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto.

Sargent, S. S. (1940). Thinking processes at various levels of difficulty. Archives of Psychology, 249. New York.

Sinkhorn, R. (1964). A relationship between arbitrary positive matrices and doubly stochastic matrices. Annals of Mathematical Statistics, 35(2), 876-879.
", "award": [], "sourceid": 1819, "authors": [{"given_name": "David", "family_name": "Grimes", "institution": null}, {"given_name": "Michael", "family_name": "Mozer", "institution": null}]}