{"title": "Learning Temporally Persistent Hierarchical Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 824, "page_last": 830, "abstract": null, "full_text": "Learning temporally persistent hierarchical representations \n\nSuzanna Becker \n\nDepartment of Psychology \n\nMcMaster University \n\nHamilton, Ont. L8S 4K1 \n\nbecker@mcmaster.ca \n\nAbstract \n\nA biologically motivated model of cortical self-organization is proposed. Context is combined with bottom-up information via a maximum likelihood cost function. Clusters of one or more units are modulated by a common contextual gating signal; they thereby organize themselves into mutually supportive predictors of abstract contextual features. The model was tested in its ability to discover viewpoint-invariant classes on a set of real image sequences of centered, gradually rotating faces. It performed considerably better than supervised back-propagation at generalizing to novel views from a small number of training examples. \n\n1  THE ROLE OF CONTEXT \n\nThe importance of context effects (see footnote 1) in perception has been demonstrated in many domains. For example, letters are recognized more quickly and accurately in the context of words (see e.g. McClelland & Rumelhart, 1981), words are recognized more efficiently when preceded by related words (see e.g. Neely, 1991), individual speech utterances are more intelligible in the context of continuous speech, etc. Further, there is mounting evidence that neuronal responses are modulated by context. For example, even at the level of the LGN in the thalamus, the primary source of visual input to the cortex, Murphy & Sillito (1987) have reported cells with \"end-stopped\" or length-tuned receptive fields which depend on top-down inputs from the cortex. 
The end-stopped behavior disappears when the top-down connections are removed, suggesting that the cortico-thalamic connections are providing contextual modulation to the LGN. Moving a bit higher up the visual hierarchy, von der Heydt et al. (1984) found cells which respond to \"illusory contours\", in the absence of a contoured stimulus within the cells' classical receptive fields. These examples demonstrate that neuronal responses can be modulated by secondary sources of information in complex ways, provided the information is consistent with their expected or preferred input. \n\nFootnote 1: We use the term context rather loosely here to mean any secondary source of input. It could be from a different sensory modality, a different input channel within the same modality, a temporal history of the input, or top-down information. \n\nFigure 1: Two sequences of 48 by 48 pixel images digitized with an IndyCam and preprocessed with a Sobel edge filter. Eleven views of each of four to ten faces were used in the simulations reported here. The alternate (odd) views of two of the faces are shown above. \n\nWhy would contextual modulation be such a pervasive phenomenon? One obvious reason is that if context can influence processing, it can help in disambiguating or cleaning up a noisy stimulus. A less obvious reason may be that if context can influence learning, it may lead to more compact representations, and hence a more powerful processing system. To illustrate, consider the benefits of incorporating temporal history into an unsupervised classifier. Given a continuous sensory signal as input, the classifier must try to discover important partitions in its training data. 
If it can discover features that are temporally persistent, and thus insensitive to transformations in the input, it should be able to represent the signal compactly with a small set of features. Further, these features are more likely to be associated with the identity of objects rather than lower-level attributes. \n\nHowever, most classifiers group patterns together on the basis of spatial overlap. This may be reasonable if there is very little shift or other form of distortion between one time step and the next, but is not a reasonable assumption about the sensory input to the cortex. Pre-cortical stages of sensory processing, certainly in the visual system (and probably in other modalities), tend to remove low-order correlations in space and time, e.g. with centre-surround filters. Consider the image sequences of gradually rotating faces in Figure 1. They have been preprocessed by a simple edge filter, so that successive views of the same face have relatively little pixel overlap. In contrast, identical views of different faces may have considerable overlap. Thus, a classifier such as k-means, which groups patterns based on their Euclidean distance, would not be expected to do well at classifying these patterns. So how are people (and in fact very young children) able to learn to classify a virtually infinite number of objects based on relatively brief exposures? It is argued here that the assumption of temporal persistence is a powerful constraining factor for achieving this, and is one which may be used to advantage in artificial neural networks as well. Not only does it lead to the development of higher-order feature analyzers, but it can result in more compact codes which are important for applications like image compression. 
Further, as the simulations reported here show, improved generalization may be achieved by allowing high-level expectations (e.g. of class labels) to influence the development of lower-level feature detectors. \n\n2  THE MODEL \n\nCompetitive learning (for a review, see Becker & Plumbley, 1996) is considered by many to be a reasonably strong candidate model of cortical learning. It can be implemented, in its simplest form, by a Hebbian learning rule in a network with lateral inhibition. However, a major limitation of competitive learning, and the majority of unsupervised learning procedures (but see the Discussion section), is that they treat the input as a set of independent identically distributed (iid) samples. They fail to take context into account, so they are unable to take advantage of the temporal continuity in signals. In contrast, real sensory signals may be better viewed as discretely sampled, continuously varying time series rather than iid samples. \n\nThe model described here extends maximum likelihood competitive learning (MLCL) (Nowlan, 1990) in two important ways: (i) modulation by context, and (ii) the incorporation of several \"canonical features\" of neocortical circuitry. The result is a powerful framework for modelling cortical self-organization. \n\nMLCL retains the benefits of competitive learning mentioned above. 
Additionally, it is more easily extensible because it maximizes a global cost function: \n\n$$L = \\sum_{\\alpha=1}^{n} \\log \\left[ \\sum_{i=1}^{m} \\pi_i \\, y_i^{(\\alpha)} \\right] \\qquad (1)$$ \n\nwhere the $\\pi_i$'s are positive weighting coefficients which sum to one, and the $y_i$'s are the clustering unit activations: \n\n$$y_i^{(\\alpha)} = N(\\vec{I}^{(\\alpha)}, \\vec{w}_i, \\Sigma_i) \\qquad (2)$$ \n\nwhere $\\vec{I}^{(\\alpha)}$ is the input vector for pattern $\\alpha$, and $N()$ is the probability of $\\vec{I}^{(\\alpha)}$ under a Gaussian centred on the $i$th unit's weight vector, $\\vec{w}_i$, with covariance matrix $\\Sigma_i$. For simplicity, Nowlan used a single global variance parameter for all input dimensions, and allowed it to shrink during learning. MLCL actually maximizes the log likelihood ($L$) of the data under a mixture of Gaussians model, with mixing proportions equal to the $\\pi_i$'s. $L$ can be maximized by online gradient ascent (see footnote 2) with learning rate $\\epsilon$: \n\n$$\\Delta w_{ij} = \\epsilon \\frac{\\partial L}{\\partial w_{ij}} = \\epsilon \\sum_{\\alpha} \\frac{\\pi_i \\, y_i^{(\\alpha)}}{\\sum_k \\pi_k \\, y_k^{(\\alpha)}} \\left( I_j^{(\\alpha)} - w_{ij} \\right) \\qquad (3)$$ \n\nThus, we have a Hebbian update rule with normalization of post-synaptic unit activations (which could be accomplished by shunting inhibition) and weight decay. \n\n2.1  Contextual modulation \n\nTo integrate a contextual information source into MLCL, our first extension is to replace the mixing proportions ($\\pi_i$'s) by the outputs of contextual gating units (see Figure 2). Now the $\\pi_i$'s are computed by separate processing units receiving their own separate stream of input, the \"context\". The role of the gating signals here is analogous to that of the gating network in the (supervised) \"competing experts\" model (Jacobs et al., 1991) (see footnote 3). For the network shown in Figure 2, the context is simply a time-delayed version of the outputs of a module (explained in the next subsection). However, more general forms of context are possible (see Discussion). 
In the simulations reported here, the context units computed their outputs according to a softmax function of their weighted summed inputs $x_i$: \n\n$$\\pi_i^{(\\alpha)} = \\frac{e^{x_i^{(\\alpha)}}}{\\sum_j e^{x_j^{(\\alpha)}}} \\qquad (4)$$ \n\nWe refer to the action of the gating units (the $\\pi_i$'s) as modulatory because of the multiplicative effect they have on the activities of the clustering units (the $y_i$'s). This multiplicative interaction is built into the cost function (Equation 1), and consequently, arises in the learning rule (Equation 3). Thus, clustering units are encouraged to discover features that agree with the current context signal they receive. If their context signal is weak or if they fail to capture enough of the activation relative to the other clustering units, they will do very little learning. Only if a unit's weight vector is sufficiently close to the current input vector and its corresponding gating unit is strongly active will it do substantial learning. \n\nFootnote 2: Nowlan (1990) used a slightly different online weight update rule that more closely approximates the batch update rule of the EM algorithm (Dempster et al., 1977). \n\nFootnote 3: However, in the competing experts architecture, both the experts and gating network receive a common source of input. The competing experts model could be thought of as fitting a mixture model of the training signal. \n\nFigure 2: The architecture used in the simulations reported here. Except where indicated, the gating units received all their inputs across unit delay lines with fixed weights of 1.0. 
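A hedged sketch may help make the gating mechanism of Section 2.1 concrete. The Python below is an illustration only, not the simulation code used for the results reported here: the function names (softmax, gaussian, gated_update), the single spherical variance, and the toy inputs are all assumptions. It computes the gating outputs with the softmax of Equation 4, the Gaussian activations of Equation 2, and one online step of the update in Equation 3 with the mixing proportions replaced by the gating outputs.

```python
import math


def softmax(xs):
    # Equation 4: gating outputs from the weighted summed context inputs.
    # Subtracting the maximum is a standard numerical-stability step.
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian(inp, w, var):
    # Equation 2, assuming one spherical variance shared by all dimensions.
    dim = len(inp)
    sq_dist = sum((a - b) ** 2 for a, b in zip(inp, w))
    norm = (2.0 * math.pi * var) ** (dim / 2.0)
    return math.exp(-0.5 * sq_dist / var) / norm


def gated_update(weights, inp, context, var=1.0, lr=0.5):
    # One online step of Equation 3 for a single input pattern.
    # weights: one weight vector per clustering unit (hypothetical layout).
    # context: weighted summed input to each gating unit.
    pi = softmax(context)
    y = [gaussian(inp, w, var) for w in weights]
    total = sum(p * a for p, a in zip(pi, y))
    # Normalized, context-weighted activation of each clustering unit.
    resp = [p * a / total for p, a in zip(pi, y)]
    new_weights = [
        [wj + lr * r * (ij - wj) for wj, ij in zip(w, inp)]
        for w, r in zip(weights, resp)
    ]
    return new_weights, resp
```

For example, with two units at [0, 0] and [1, 1], the input [0.5, 0.5] (equidistant from both), and context inputs of 2.0 and 0.0, the first unit receives roughly 0.88 of the normalized activation and therefore moves much further toward the input than the second. This is the sense in which a unit learns substantially only when both its match to the input and its gating signal are strong.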
\n\n2.2  Modular, hierarchical architecture \n\nOur second modification to MLCL is required to apply it to the architecture shown in Figure 2, which is motivated by several ubiquitous features of the neocortex: a laminar structure, and a functional organization into \"cortical clusters\" of spatially nearby columns with similar receptive field properties (see e.g. Calvin, 1995). The cortex, when flattened out, is like a large six-layered sheet. As Calvin (1995, p. 269) succinctly puts it, \"... the bottom layers are like a subcortical 'out' box, the middle layer like an 'in' box, and the superficial layers somewhat like an 'interoffice' box connecting the columns and different cortical areas\". The middle and superficial layer cells are analogous to the first-layer clustering units and gating units respectively. Thus, we propose that the superficial cells may be providing the contextual modulation. (The bottom layers are mainly involved in motor output and are not included in the present model.) To induce a functional modularity in our model analogous to cortical clusters, clustering units within the same module receive a shared gating signal. The cost function and learning rule are now: \n\n$$L = \\sum_{\\alpha=1}^{n} \\log \\left[ \\sum_{i=1}^{m} \\pi_i^{(\\alpha)} \\, \\frac{1}{l} \\sum_{j=1}^{l} y_{ij}^{(\\alpha)} \\right] \\qquad (5)$$ \n\n$$\\Delta w_{ijk} = \\epsilon \\sum_{\\alpha} \\frac{\\pi_i^{(\\alpha)} \\, \\frac{1}{l} \\, y_{ij}^{(\\alpha)}}{\\sum_q \\pi_q^{(\\alpha)} \\, \\frac{1}{l} \\sum_r y_{qr}^{(\\alpha)}} \\left( I_k^{(\\alpha)} - w_{ijk} \\right) \\qquad (6)$$ \n\nThus, units in the same module form predictions $y_{ij}^{(\\alpha)}$ of the same contextual feature $\\pi_i^{(\\alpha)}$. Fortunately, there is a disincentive to all of them discovering identical weights: they would then do poorly at modelling the input. 
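The modular cost function and learning rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the reported simulations: the names modular_log_likelihood and unit_responsibilities, the nested-list data layout, and the toy numbers are hypothetical. The sketch evaluates the log likelihood in which each gating output multiplies the mean activation of its module, and the normalized per-unit term that scales the weight change in the learning rule.

```python
import math


def modular_log_likelihood(gates, activations):
    # gates[a][i]: gating output for module i on pattern a.
    # activations[a][i][j]: activation of unit j in module i on pattern a.
    total = 0.0
    for pi, y in zip(gates, activations):
        # Each module contributes its gate times its mean unit activation.
        mix = sum(p * sum(units) / len(units) for p, units in zip(pi, y))
        total += math.log(mix)
    return total


def unit_responsibilities(pi, y):
    # Normalized factor multiplying (input - weight) in the learning rule,
    # computed for one pattern; the values sum to one over all units.
    mix = sum(p * sum(units) / len(units) for p, units in zip(pi, y))
    return [
        [p * u / len(units) / mix for u in units]
        for p, units in zip(pi, y)
    ]
```

With gating outputs of 0.9 and 0.1 and module activations [0.2, 0.4] and [0.3, 0.3], both modules have the same mean activation, yet the units of the strongly gated module receive responsibilities of 0.3 and 0.6 while those of the weakly gated module receive 0.05 each. Units sharing a gate are thus pushed to learn together, while the better-matching unit within a module still learns more.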
\n\n3  EXPERIMENTS \n\nAs a simple test of this model, it was first applied to a set of image sequences of four centered, gradually rotating faces (see Figure 1), divided into training and test sets by taking alternating views. It was predicted that the clustering units should discover \"features\" such as individual views of specific faces. Further, different views of the same face should be clustered together within a module because they will be observed in the same temporal context, while the gating units should discover the identity of faces, independent of viewpoint. \n\nCondition             Layer     Training Set   Test Set \nno context, 4 faces   Layer 1   59.2 (2.4)     65 (3.5) \ncontext, 4 faces      Layer 1   88.4 (3.9)     74.5 (4.2) \ncontext, 4 faces      Layer 2   88.8 (4.0)     72.7 (4.8) \ncontext, 10 faces     Layer 1   96.3 (1.2)     71.0 (3.0) \ncontext, 10 faces     Layer 2   91.8 (2.4)     70.2 (4.3) \n\nTable 1: Mean percent (and standard error) correctly classified faces, across 10 runs, for unsupervised clustering networks trained for 2000 iterations with a learning rate of 0.5, with and without temporal context. Layer 1: clustering units. Layer 2: gating units. Performance was assessed as follows: Each unit was assigned to predict the face class for which it most frequently won (was the most active). Then for each pattern, the layer's activity vector was counted as correct if the winner correctly predicted the face identity. \n\nFirst, the baseline effect of the temporal context on clustering performance was assessed by comparing the network shown in Figure 2 to the same network with the input connections to the gating layer removed. The latter is equivalent to MLCL with fixed, equal $\\pi_i$'s. The results are summarized in Table 1. 
As predicted, the temporal context provides incentive for the clustering units to group successive instances of the same face together, and the gating layer can therefore do very well at classifying the faces with a much smaller number of units, i.e., independently of viewpoint. In contrast, the clustering units without the contextual signal are more likely to group together similar views of different people's faces. \n\nNext, to explore the scaling properties of the model, a network like the one shown in Figure 2 but with 10 modules was presented with a set of 10 faces, 11 views each. As before, the odd-numbered views were trained on and the even-numbered views were tested on. To achieve comparable performance to the smaller network, the weights on the self-pointing connections on the gating units were increased from 1.0 to 3.0, which increased the time constant of temporal averaging. The model then had no difficulty scaling up to the larger training set size, as shown in Table 1. \n\nBased on the unexpected success of this model, its classification performance was then compared against supervised back-propagation networks on the four face sequences. The first supervised network we tried was a simple recurrent network with essentially the same architecture: one layer of Gaussian units followed by one layer of recurrent softmax units with fixed delay lines. Over ten runs of each model, although the unsupervised classifier did worse on the training set (it averaged 88% while the supervised model always scored 100% correct), it outperformed the supervised model in its generalization ability by a considerable margin (it averaged 73% while the supervised model averaged 45% correct). \n\nFinally, a feedforward back-propagation network with sigmoid units was trained. 
The following constraint on the hidden layer activations, $h_j(t)$ (see footnote 4): \n\n$$\\text{hidden state cost} = \\lambda \\sum_j \\left( h_j(t) - h_j(t-1) \\right)^2$$ \n\nwas added to the cost function to encourage temporal smoothness. As the results in Figure 3 show, a feedforward network with no contextual input was thereby able to perform as well as our unsupervised model when it was constrained to develop hidden layer representations that clustered temporally adjacent patterns together. \n\nFootnote 4: As Geoff Hinton pointed out, the above constraint, if normalized by the variance, maximizes the mutual information between hidden unit states at adjacent time steps. \n\n[Figure 3 plot: two panels, \"Training Set Performance\" and \"Test Set Performance\", each showing percent correct against learning epoch.] \n\nFigure 3: Learning curves, averaged over five runs, for feedforward supervised net with a temporal smoothness constraint, for each of four levels of the parameter $\\lambda$. \n\n4  DISCUSSION \n\nThe unsupervised model's markedly better ability to generalize stems from its cost function; it favors hidden layer features which contribute to temporally coherent predictions at the output (gating) layer. Multiple views of a given object are therefore more likely to be detected by a given clustering unit in the unsupervised model, leading to considerably improved interpolation of novel views. 
The poor generalization performance of back-propagation is not just due to overtraining, as the learning curves in Figure 3 show. Even with early stopping, the network with the lowest value of $\\lambda$ would not have done as well as the unsupervised network. There is simply no reason why supervised back-propagation should cluster temporally adjacent views together unless it is explicitly forced to do so. \n\nA \"contextual input\" stream was implemented in the simplest possible way in the simulations reported here, using fixed delay lines. However, the model we have proposed provides for a completely general way of incorporating arbitrary contextual information, and could equally well integrate other sources of input. The incoming weights to the gating units could also be learned. In fact, the gating unit activities actually represent the probabilities of each clustering unit's Gaussian model fitting the data, conditioned on the temporal history; hence, the entire model could be viewed as a Hidden Markov Model (Geoff Hinton, personal communication). However, current techniques for fitting HMMs are intractable if state dependencies span arbitrarily long time intervals. \n\nThe model in its present implementation is not meant to be a realistic account of the way humans learn to recognize faces. Viewpoint-invariant recognition is achieved, if at all, in a hierarchical, multi-stage system. One could easily extend our model to achieve this, by connecting together a sequence of networks like the one shown in Figure 2, each having progressively larger receptive fields. \n\nA number of other unsupervised learning rules have been proposed based on the assumption of temporally coherent inputs (Foldiak, 1991; Becker, 1993; Stone, 1996). Phillips et al. 
(1995) have proposed an alternative model of cortical self-organization they call coherent Infomax which incorporates contextual modulation. In their model, the outputs from one processing stream modulate the activity in another stream, while the mutual information between the two streams is maximized. \n\nA wide range of perceptual and cognitive abilities could be modelled by a network that can learn features of its primary input in particular contexts. These include multi-sensor fusion, feature segregation in object recognition using top-down cues, and semantic disambiguation in natural language understanding. Finally, it is widely believed that memories are stored rapidly in the hippocampus and related brain structures, and gradually incorporated into the slower-learning cortex for long-term storage. The model proposed here may be able to explain how such interactions between disparate information sources are learned. \n\nAcknowledgements \n\nThis work evolved out of discussions with Ron Racine and Larry Roberts. Thanks to Geoff Hinton for contributing several valuable insights, as mentioned in the paper, and to Ken Seergobin for the face images. Software was developed using the Xerion neural network simulation package from Hinton's lab, with programming assistance from Lianxiang Wang. This work was supported by a McDonnell-Pew Cognitive Neuroscience research grant and a research grant from the Natural Sciences and Engineering Research Council of Canada. \n\nReferences \n\nBecker, S. (1993). Learning to categorize objects using temporal coherence. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in Neural Information Processing Systems 5 (pp. 361-368). San Mateo, CA: Morgan Kaufmann. \n\nBecker, S. & Plumbley, M. (1996). 
Unsupervised neural network learning procedures for feature extraction and classification. International Journal of Applied Intelligence, 6(3). \n\nCalvin, W. H. (1995). Cortical columns, modules, and Hebbian cell assemblies. In M. Arbib (Ed.), The handbook of brain theory and neural networks. Cambridge, MA: MIT Press. \n\nDempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Proceedings of the Royal Statistical Society, B-39:1-38. \n\nFoldiak, P. (1991). Learning invariance from transformation sequences. Neural Computation, 3(2):194-200. \n\nJacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1):79-87. \n\nMcClelland, J. L. & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception, part I: An account of basic findings. Psychological Review, 88:375-407. \n\nMurphy, C. & Sillito, A. M. (1987). Corticofugal feedback influences the generation of length tuning in the visual pathway. Nature, 329:727-729. \n\nNeely, J. (1991). Semantic priming effects in visual word recognition: A selective review of current findings and theories. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading: Visual Word Recognition (pp. 264-336). Hillsdale, NJ: Lawrence Erlbaum Associates. \n\nNowlan, S. J. (1990). Maximum likelihood competitive learning. In D. S. Touretzky (Ed.), Neural Information Processing Systems, Vol. 2 (pp. 574-582). San Mateo, CA: Morgan Kaufmann. \n\nPhillips, W. A., Kay, J., & Smyth, D. (1995). The discovery of structure by multi-stream networks of local processors with contextual guidance. Network, 6:225-246. \n\nStone, J. (1996). 
Learning perceptually salient visual parameters using spatiotemporal smoothness constraints. Neural Computation, 8:1463-1492. \n\nvon der Heydt, R., Peterhans, E., & Baumgartner, G. (1984). Illusory contours and cortical neural responses. Science, 224:1260-1262. \n", "award": [], "sourceid": 1266, "authors": [{"given_name": "Suzanna", "family_name": "Becker", "institution": null}]}