{"title": "Semi-Supervised Support Vector Machines", "book": "Advances in Neural Information Processing Systems", "page_first": 368, "page_last": 374, "abstract": null, "full_text": "Semi-Supervised Support Vector Machines\n\nKristin P. Bennett\nDepartment of Mathematical Sciences\nRensselaer Polytechnic Institute\nTroy, NY 12180\nbennek@rpi.edu\n\nAyhan Demiriz\nDepartment of Decision Sciences and Engineering Systems\nRensselaer Polytechnic Institute\nTroy, NY 12180\ndemira@rpi.edu\n\nAbstract\n\nWe introduce a semi-supervised support vector machine (S3VM) method. Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transduction problem using overall risk minimization (ORM) posed by Vapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data. We propose a general S3VM model that minimizes both the misclassification error and the function capacity based on all the available data. We show how the S3VM model for 1-norm linear support vector machines can be converted to a mixed-integer program and then solved exactly using integer programming. Results of S3VM and the standard 1-norm support vector machine approach are compared on ten data sets. 
Our computational results support the statistical learning theory results showing that incorporating working data improves generalization when insufficient training information is available. In every case, S3VM either improved or showed no significant difference in generalization compared to the traditional approach.\n\n1 INTRODUCTION\n\nIn this work we propose a method for semi-supervised support vector machines (S3VM). S3VM are constructed using a mixture of labeled data (the training set) and unlabeled data (the working set). The objective is to assign class labels to the working set such that the \"best\" support vector machine (SVM) is constructed. If the working set is empty the method becomes the standard SVM approach to classification [20, 9, 8]. If the training set is empty, then the method becomes a form of unsupervised learning. Semi-supervised learning occurs when both training and working sets are nonempty. Semi-supervised learning for problems with small training sets and large working sets is a form of semi-supervised clustering. There are successful semi-supervised algorithms for k-means and fuzzy c-means clustering [4, 18]. Clustering is a potential application for S3VM as well. When the training set is large relative to the working set, S3VM can be viewed as a method for solving the transduction problem according to the principle of overall risk minimization (ORM) posed by Vapnik at the NIPS 1998 SVM Workshop and in [19, Chapter 10]. S3VM for ORM is the focus of this paper.\n\nIn classification, the transduction problem is to estimate the class of each given point in the unlabeled working set. 
The usual support vector machine (SVM) approach estimates the entire classification function using the principle of structural risk minimization (SRM). In transduction, one estimates the classification function at points within the working set using information from both the training and working set data. Theoretically, if there is adequate training data to estimate the function satisfactorily, then SRM will be sufficient, and we would expect transduction to yield no significant improvement over SRM alone. If, however, there is inadequate training data, then ORM may improve generalization on the working set. Intuitively, we would expect ORM to yield improvements when the training sets are small or when there is a significant deviation between the training and working set subsamples of the total population. Indeed, the theoretical results in [19] support these hypotheses.\n\nIn Section 2, we briefly review the standard SVM model for structural risk minimization. According to the principles of structural risk minimization, SVM minimize both the empirical misclassification rate and the capacity of the classification function [19, 20] using the training data. The capacity of the function is determined by the margin of separation between the two classes based on the training set. ORM also minimizes both the empirical misclassification rate and the function capacity, but the capacity of the function is determined using both the training and working sets. In Section 3, we show how SVM can be extended to the semi-supervised case and how mixed integer programming can be used practically to solve the resulting problem. 
We compare support vector machines constructed by structural risk minimization and overall risk minimization computationally on ten problems in Section 4. Our computational results support past theoretical results that improved generalization can be obtained by incorporating working set information during training when there is a deviation between the working set and training set sample distributions. In three of ten real-world problems the semi-supervised approach, S3VM, achieved a significant increase in generalization. In no case did S3VM ever obtain a significant decrease in generalization. We conclude with a discussion of more general S3VM algorithms.\n\nFigure 1: Optimal plane maximizes margin. (Points of Class 1 and Class -1 with the planes $w \\cdot x = b + 1$, $w \\cdot x = b$, and $w \\cdot x = b - 1$.)\n\n2 SVM using Structural Risk Minimization\n\nThe basic SRM task is to estimate a classification function $f : R^N \\to \\{\\pm 1\\}$ using input-output training data from two classes\n\n$$(x_1, y_1), \\ldots, (x_\\ell, y_\\ell) \\in R^N \\times \\{\\pm 1\\}. \\qquad (1)$$\n\nThe function $f$ should correctly classify unseen examples $(x, y)$, i.e. $f(x) = y$ if $(x, y)$ is generated from the same underlying probability distribution as the training data. In this work we limit discussion to linear classification functions. We will discuss extensions to the nonlinear case in Section 5. If the points are linearly separable, then there exist an n-vector $w$ and scalar $b$ such that\n\n$$w \\cdot x_i - b \\geq 1 \\text{ if } y_i = 1, \\text{ and } w \\cdot x_i - b \\leq -1 \\text{ if } y_i = -1, \\quad i = 1, \\ldots, \\ell \\qquad (2)$$\n\nor equivalently\n\n$$y_i [w \\cdot x_i - b] \\geq 1, \\quad i = 1, \\ldots, \\ell. \\qquad (3)$$\n\nThe \"optimal\" separating plane, $w \\cdot x = b$, is the one which is furthest from the closest points in the two classes. Geometrically this is equivalent to maximizing the separation margin or distance between the two parallel planes $w \\cdot x = b + 1$ and $w \\cdot x = b - 1$ (see Figure 1).\n\nThe \"margin of separation\" in Euclidean distance is $2/\\|w\\|_2$ where $\\|w\\|_2 = \\sqrt{\\sum_{i=1}^{n} w_i^2}$ is the 2-norm. To maximize the margin, we minimize $\\|w\\|_2/2$ subject to the constraints (3). According to structural risk minimization, for a fixed empirical misclassification rate, larger margins should lead to better generalization and prevent overfitting in high-dimensional attribute spaces. The classifier is called a support vector machine because the solution depends only on the points (called support vectors) located on the two supporting planes $w \\cdot x = b - 1$ and $w \\cdot x = b + 1$. In general the classes will not be separable, so the generalized optimal plane (GOP) problem (4) [9, 20] is used. A slack term $\\eta_i$ is added for each point such that if the point is misclassified, $\\eta_i \\geq 1$. The final GOP formulation is:\n\n$$\\begin{array}{ll} \\min_{w, b, \\eta} & C \\sum_{i=1}^{\\ell} \\eta_i + \\frac{1}{2} \\|w\\|_2 \\\\ \\text{s.t.} & y_i [w \\cdot x_i - b] + \\eta_i \\geq 1 \\\\ & \\eta_i \\geq 0, \\quad i = 1, \\ldots, \\ell \\end{array} \\qquad (4)$$\n\nwhere $C > 0$ is a fixed penalty parameter. The capacity control provided by the margin maximization is imperative to achieve good generalization [21, 19].\n\nThe Robust Linear Programming (RLP) approach to SVM is identical to GOP except the margin term is changed from the 2-norm $\\|w\\|_2$ to the 1-norm, $\\|w\\|_1 = \\sum_{j=1}^{n} |w_j|$. The problem becomes the following robust linear program (RLP) [2, 7, 1]:\n\n$$\\begin{array}{ll} \\min_{w, b, s, \\eta} & C \\sum_{i=1}^{\\ell} \\eta_i + \\sum_{j=1}^{n} s_j \\\\ \\text{s.t.} & y_i [w \\cdot x_i - b] + \\eta_i \\geq 1 \\\\ & \\eta_i \\geq 0, \\quad i = 1, \\ldots, \\ell \\\\ & -s_j \\leq w_j \\leq s_j, \\quad j = 1, \\ldots, n. \\end{array} \\qquad (5)$$\n\nThe RLP formulation is a useful variation of SVM with some nice characteristics. The 1-norm weight reduction still provides capacity control. The results in [13] can be used to show that minimizing $\\|w\\|_1$ corresponds to maximizing the separation margin using the infinity norm. Statistical learning theory could potentially be extended to incorporate alternative norms. One major benefit of RLP over GOP is dimensionality reduction. Both RLP and GOP minimize the magnitude of the weights $w$. But RLP forces more of the weights to be 0 due to the properties of the 1-norm. Another benefit of RLP over GOP is that it can be solved using linear programming instead of quadratic programming. Both approaches can be extended to handle nonlinear discrimination using kernel functions [8, 12]. Empirical comparisons of the approaches have not found any significant difference in generalization between the formulations [5, 7, 3, 12].\n\n3 Semi-supervised support vector machines\n\nTo formulate the S3VM, we start with either SVM formulation, (4) or (5), and then add two constraints for each point in the working set. One constraint calculates the misclassification error as if the point were in class 1 and the other constraint calculates the misclassification error as if the point were in class -1. The objective function calculates the minimum of the two possible misclassification errors. The final class of the points corresponds to the one that results in the smallest error. Specifically we define the semi-supervised support vector machine problem (S3VM) as:\n\n$$\\begin{array}{ll} \\min_{w, b, \\eta, \\xi, z} & C \\left[ \\sum_{i=1}^{\\ell} \\eta_i + \\sum_{j=\\ell+1}^{\\ell+k} \\min(\\xi_j, z_j) \\right] + \\|w\\| \\\\ \\text{subject to} & y_i (w \\cdot x_i - b) + \\eta_i \\geq 1, \\quad \\eta_i \\geq 0, \\quad i = 1, \\ldots, \\ell \\\\ & w \\cdot x_j - b + \\xi_j \\geq 1, \\quad \\xi_j \\geq 0, \\quad j = \\ell + 1, \\ldots, \\ell + k \\\\ & -(w \\cdot x_j - b) + z_j \\geq 1, \\quad z_j \\geq 0 \\end{array} \\qquad (6)$$\n\nwhere $C > 0$ is a fixed misclassification penalty.\n\nFigure 2: Left = solution found by RLP; Right = solution found by S3VM.\n\nInteger programming can be used to solve this problem. The basic idea is to add a 0 or 1 decision variable, $d_j$, for each point $x_j$ in the working set. This variable indicates the class of the point. If $d_j = 1$ then the point is in class 1 and if $d_j = 0$ then the point is in class -1. This results in the following mixed integer program:\n\n$$\\begin{array}{ll} \\min_{w, b, \\eta, \\xi, z, d} & C \\left[ \\sum_{i=1}^{\\ell} \\eta_i + \\sum_{j=\\ell+1}^{\\ell+k} (\\xi_j + z_j) \\right] + \\|w\\| \\\\ \\text{subject to} & y_i (w \\cdot x_i - b) + \\eta_i \\geq 1, \\quad \\eta_i \\geq 0, \\quad i = 1, \\ldots, \\ell \\\\ & w \\cdot x_j - b + \\xi_j + M (1 - d_j) \\geq 1, \\quad \\xi_j \\geq 0, \\quad j = \\ell + 1, \\ldots, \\ell + k \\\\ & -(w \\cdot x_j - b) + z_j + M d_j \\geq 1, \\quad z_j \\geq 0, \\quad d_j \\in \\{0, 1\\} \\end{array} \\qquad (7)$$\n\nThe constant $M > 0$ is chosen sufficiently large such that if $d_j = 0$ then $\\xi_j = 0$ is feasible for any optimal $w$ and $b$. Likewise if $d_j = 1$ then $z_j = 0$. A globally optimal solution to this problem can be found using CPLEX or other commercial mixed integer programming codes [10] provided computer resources are sufficient for the problem size. 
Using the mathematical programming modeling language AMPL [11], we were able to express the problem in thirty lines of code plus a data file and solve it using CPLEX.\n\n4 S3VM and Overall Risk Minimization\n\nAn integer S3VM can be used to solve the Overall Risk Minimization problem. Consider the simple problem given in Figure 20 of [19]. Using RLP alone on the training data results in the separation shown in Figure 1. Figure 2 illustrates what happens when working set data is added. The training set points are shown as transparent triangles and hexagons. The working set points are shown as filled circles. The left picture in Figure 2 shows the solution found by RLP. Note that when the working set points are added, the resulting separation has a very small margin. The right picture shows the S3VM solution constructed using the unlabeled working set. Note that a much larger and clearer separation margin is found. These computational solutions are identical to those presented in [19].\n\nWe also tested S3VM on ten real-world data sets (eight from [14] and the bright and dim galaxy sets from [15]). There have been many algorithms applied successfully to these problems without incorporating working set information. Thus it was not clear a priori that S3VM would improve generalization on these data sets. For the data sets where no improvement is possible, we would like transduction using ORM to not degrade the performance of the induction via SRM approach. For each data set, we performed 10-fold cross-validation. For the three starred data sets, our integer programming solver failed due to excessive branching required within the CPLEX algorithm. On those data sets we randomly extracted 50 point working sets for each trial. 
The same C parameter was used for each data set in both the RLP and S3VM problems\u00b9. In all ten problems, S3VM never performed significantly worse than RLP. In three of the problems, S3VM performed significantly better. So ORM did not hurt generalization and in some cases it helped significantly. We would expect this based on ORM theory. The generalization bounds for ORM depend on the difference between the training and working sets. If there is little difference, we would not expect any improvement using ORM.\n\n\u00b9 The formula for C was $C = \\frac{1 - \\lambda}{\\lambda (\\ell + k)}$ with $\\lambda = .001$, where $\\ell$ is the size of the training set and $k$ is the size of the working set. This formula was chosen because it worked well empirically for both methods.\n\nData Set | Dim | Points | CV-size | RLP | S3VM | p-value\nBright | 14 | 2462 | 50* | 0.02 | 0.018 | 0.343\nCancer | 9 | 699 | 70 | 0.036 | 0.034 | 0.591\nCancer (Prognostic) | 30 | 569 | 57 | 0.035 | 0.033 | 0.678\nDim | 14 | 4192 | 50* | 0.064 | 0.054 | 0.096\nHeart | 13 | 297 | 30 | 0.173 | 0.160 | 0.104\nHousing | 13 | 506 | 51 | 0.155 | 0.151 | 0.590\nIonosphere | 34 | 351 | 35 | 0.109 | 0.106 | 0.59\nMusk | 166 | 476 | 48 | 0.173 | 0.173 | 0.999\nPima | 8 | 769 | 50* | 0.220 | 0.222 | 0.678\nSonar | 60 | 208 | 21 | 0.281 | 0.219 | 0.045\n\n5 Conclusion\n\nWe introduced a semi-supervised SVM model. S3VM constructs a support vector machine using all the available data from both the training and working sets. We show how the S3VM model for 1-norm linear support vector machines can be converted to a mixed-integer program. One great advantage of solving S3VM using integer programming is that the globally optimal solution can be found using packages such as CPLEX. 
Using the integer S3VM we performed an empirical investigation of transduction using overall risk minimization, a problem posed by Vapnik. Our results support the statistical learning theory results that incorporating working data improves generalization when insufficient training information is available. In every case, S3VM either improved or showed no significant difference in generalization compared to the usual structural risk minimization approach. Our empirical results, combined with the theoretical results in [19], indicate that transduction via ORM constitutes a very promising research direction.\n\nMany research questions remain. Since transduction via overall risk minimization is not always better than the basic induction via structural risk minimization, can we identify a priori problems likely to benefit from transduction? The best methods of constructing S3VM for the 2-norm case and for nonlinear functions are still open questions. Kernel based methods can be incorporated into S3VM. The practical scalability of the approach needs to be explored. We were able to solve moderately-sized problems with on the order of 50 working set points using a general purpose integer programming code. The recent success of special purpose algorithms for support vector machines [16, 17, 6] indicates that such approaches may produce improvement for S3VM as well.\n\nReferences\n\n[1] K. P. Bennett and E. J. Bredensteiner. Geometry in learning. In C. Gorini, E. Hart, W. Meyer, and T. Phillips, editors, Geometry at Work, Washington, D.C., 1997. Mathematical Association of America. To appear.\n\n[2] K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. 
Optimization Methods and Software, 1:23-34, 1992.\n\n[3] K. P. Bennett, D. H. Wu, and L. Auslender. On support vector decision trees for database marketing. R.P.I. Math Report No. 98-100, Rensselaer Polytechnic Institute, Troy, NY, 1998.\n\n[4] A. M. Bensaid, L. O. Hall, J. C. Bezdek, and L. P. Clarke. Partially supervised clustering for image segmentation. Pattern Recognition, 29(5):859-871, 199.\n\n[5] P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. Mathematical Programming Technical Report 98-03, University of Wisconsin-Madison, 1998. To appear in ICML-98.\n\n[6] P. S. Bradley and O. L. Mangasarian. Massive data discrimination via linear support vector machines. Mathematical Programming Technical Report 98-05, University of Wisconsin-Madison, 1998. Submitted for publication.\n\n[7] E. J. Bredensteiner and K. P. Bennett. Feature minimization within decision trees. Computational Optimization and Applications, 10:110-126, 1997.\n\n[8] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998. To appear.\n\n[9] C. Cortes and V. N. Vapnik. Support vector networks. Machine Learning, 20:273-297, 1995.\n\n[10] CPLEX Optimization Incorporated, Incline Village, Nevada. Using the CPLEX Callable Library, 1994.\n\n[11] R. Fourer, D. Gay, and B. Kernighan. AMPL: A Modeling Language for Mathematical Programming. Boyd and Fraser, Danvers, Massachusetts, 1993.\n\n[12] T. T. Fries and R. Harrison Fries. Linear programming support vector machines for pattern classification and regression estimation: and the SR algorithm. 
Research report 706, University of Sheffield, 1998.\n\n[13] O. L. Mangasarian. Parsimonious least norm approximation. Mathematical Programming Technical Report 97-03, University of Wisconsin-Madison, 1997. To appear in Computational Optimization and Applications.\n\n[14] P. M. Murphy and D. W. Aha. UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, California, 1992.\n\n[15] S. Odewahn, E. Stockwell, R. Pennington, R. Humphreys, and W. Zumach. Automated star/galaxy discrimination with neural networks. Astronomical Journal, 103(1):318-331, 1992.\n\n[16] E. Osuna, R. Freund, and F. Girosi. Support vector machines: Training and applications. AI Memo 1602, Massachusetts Institute of Technology, 1997.\n\n[17] J. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report 98-14, Microsoft Research, 1998.\n\n[18] M. Vaidyanathan, R. P. Velthuizen, P. Venugopal, L. P. Clarke, and L. O. Hall. Tumor volume measurements using supervised and semi-supervised MRI segmentation. In Artificial Neural Networks in Engineering Conference, ANNIE (1994), 1994.\n\n[19] V. N. Vapnik. Estimation of Dependencies Based on Empirical Data. Springer, New York, 1982. English translation, Russian version 1979.\n\n[20] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.\n\n[21] V. N. Vapnik and A. Ja. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. In Russian.", "award": [], "sourceid": 1582, "authors": [{"given_name": "Kristin", "family_name": "Bennett", "institution": null}, {"given_name": "Ayhan", "family_name": "Demiriz", "institution": null}]}