{"title": "Rapid Quality Estimation of Neural Network Input Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 45, "page_last": 51, "abstract": null, "full_text": "Rapid Quality Estimation of Neural Network Input Representations\n\nKevin J. Cherkauer\n\nJude W. Shavlik\n\nComputer Sciences Department, University of Wisconsin-Madison\n\n1210 W. Dayton St., Madison, WI 53706\n\n{cherkauer,shavlik}@cs.wisc.edu\n\nAbstract\n\nThe choice of an input representation for a neural network can have a profound impact on its accuracy in classifying novel instances. However, neural networks are typically computationally expensive to train, making it difficult to test large numbers of alternative representations. This paper introduces fast quality measures for neural network representations, allowing one to quickly and accurately estimate which of a collection of possible representations for a problem is the best. We show that our measures for ranking representations are more accurate than a previously published measure, based on experiments with three difficult, real-world pattern recognition problems.\n\n1  Introduction\n\nA key component of successful artificial neural network (ANN) applications is an input representation that suits the problem. However, ANNs are usually costly to train, preventing one from trying many different representations. In this paper, we address this problem by introducing and evaluating three new measures for quickly estimating ANN input representation quality. Two of these, called ID3leaves and min(leaves), consistently outperform Rendell and Ragavan's (1993) blurring measure in accurately ranking different input representations for ANN learning on three difficult, real-world datasets. 
\n\n2  Representation Quality\n\nChoosing good input representations for supervised learning systems has been the subject of diverse research in both connectionist (Cherkauer & Shavlik, 1994; Kambhatla & Leen, 1994) and symbolic paradigms (Almuallim & Dietterich, 1994; Caruana & Freitag, 1994; John et al., 1994; Kira & Rendell, 1992). Two factors of representation quality are well-recognized in this work: the ability to separate examples of different classes (sufficiency of the representation) and the number of features present (representational economy). We believe there is also a third important component that is often overlooked, namely the ease of learning an accurate concept under a given representation, which we call transparency. We define transparency as the density of concepts that are both accurate (generalize well) and simple (of low complexity) in the space of possible concepts under a given input representation and learning algorithm. Learning an accurate concept will be more likely if the concept space is rich in accurate concepts that are also simple, because simple concepts require less search to find and less data to validate.\n\nIn this paper, we introduce fast transparency measures for ANN input representations. These are orders of magnitude faster than the wrapper method (John et al., 1994), which would evaluate ANN representations by training and testing the ANNs themselves. Our measures are based on the strong assumption that, for a fixed input representation, information about the density of accurate, simple concepts under a (fast) decision-tree learning algorithm will transfer to the concept space of an ANN learning algorithm.  
Our experiments on three real-world datasets demonstrate that our transparency measures are highly predictive of representation quality for ANNs, implying that the transfer assumption holds surprisingly well for some pattern recognition tasks even though ANNs and decision trees are believed to work best on quite different types of problems (Quinlan, 1994).\u00b9 In addition, our Exper. 1 shows that transparency does not depend on representational sufficiency. Exper. 2 verifies this conclusion and also demonstrates that transparency does not depend on representational economy. Finally, Exper. 3 examines the effects of redundant features on the transparency measures, demonstrating that the ID3leaves measure is robust in the face of such features.\n\n2.1  Model-Based Transparency Measures\n\nWe introduce three new \"model-based\" measures that estimate representational transparency by sampling instances of roughly accurate concept models from a decision-tree space and measuring their complexities. If simple, accurate models are abundant, the average complexity of the sampled models will be low. If they are sparse, we can expect a higher complexity value.\n\nOur first measure, avg(leaves), estimates the expected complexity of accurate concepts as the average number of leaves in n randomly constructed decision trees that correctly classify the training set:\n\n$\\mathrm{avg(leaves)} \\equiv \\frac{1}{n} \\sum_{t=1}^{n} \\mathrm{leaves}(t)$\n\nwhere leaves(t) is the number of leaves in tree t. Random trees are built top-down; features are chosen with uniform probability from those which further partition the training examples (ignoring example class). Tree building terminates when each leaf achieves class purity (i.e., the tree correctly classifies all the training examples). High values of avg(leaves) indicate high concept complexity (i.e., low transparency). 
\n\nThe second measure, min(leaves), finds the minimum number of leaves over the n randomly constructed trees instead of the average to reflect the fact that learning systems try to make intelligent, not random, model choices:\n\n$\\mathrm{min(leaves)} \\equiv \\min_{t=1,\\ldots,n} \\{\\mathrm{leaves}(t)\\}$\n\n1 We did not preselect datasets based on whether our experiments upheld the transfer assumption. We report the results for all datasets that we have tested our transparency measures on.\n\nTable 1: Summary of datasets used.\n\nDataset    Examples  Classes  Cross Validation Folds\nDNA        20,000    6        4\nNIST       3,471     10       10\nMagellan   625       2        4\n\nThe third measure, ID3leaves, simply counts the number of leaves in the tree grown by Quinlan's (1986) ID3 algorithm:\n\n$\\mathrm{ID3leaves} \\equiv \\mathrm{leaves}(\\text{ID3 tree})$\n\nWe always use the full ID3 tree (100% correct on the training set). This measure assumes the complexity of the concept ID3 finds depends on the density of simple, accurate models in its space and thus reflects the true transparency.\n\nAll these measures fix tree training-set accuracy at 100%, so simpler trees imply more accurate generalization (Fayyad, 1994) as well as easier learning. This lets us estimate transparency without the multiplicative additional computational expense of cross validating each tree. It also lets us use all the training data for tree building.\n\n2.2  \"Blurring\" as a Transparency Measure\n\nRendell and Ragavan (1993) address ease of learning explicitly and present a metric for quantifying it called blurring. In their framework, the less a representation requires the use of feature interactions to produce accurate concepts, the more transparent it is.  
Blurring heuristically estimates this by measuring the average information content of a representation's individual features. Blurring is equivalent to the (negation of the) average information gain (Quinlan, 1986) of a representation's features with respect to a training set, as we show in Cherkauer and Shavlik (1995).\n\n3  Evaluating the Transparency Measures\n\nWe evaluate the transparency measures on three problems: DNA (predicting gene reading frames; Craven & Shavlik, 1993), NIST (recognizing handwritten digits; \"FI3\" distribution), and Magellan (detecting volcanos in radar images of the planet Venus; Burl et al., 1994).\u00b2 The datasets are summarized in Table 1.\n\nTo assess the different transparency measures, we follow these steps for each dataset in Exper. 1 and 2:\n\n1. Construct several different input representations for the problem.\n2. Train ANNs using each representation and test the resulting generalization accuracy via cross validation (CV). This gives us a (costly) ground-truth ranking of the relative qualities of the different representations.\n3. For each transparency measure, compute the transparency score of each representation. This gives us a (cheap) predicted ranking of the representations from each measure.\n4. For each transparency measure, compute Spearman's rank correlation coefficient between the ground-truth and predicted rankings. The higher this correlation, the better the transparency measure predicts the true ranking.\n\n2 On these problems, we have found that ANNs generalize 1-6 percentage points better than decision trees using identical input representations, motivating our desire to develop fast measures of ANN input representation quality.
\n\nTable 2: User CPU seconds on a Sun SPARCstation 10/30 for the largest representation of each dataset. Parenthesized numbers are standard deviations over 10 runs.\n\nDataset    Blurring      ID3leaves      Min/Avg(leaves)  Backprop\nDNA        1.68 (2.38)   1,245 (3.96)   13,444 (56.25)   212,900\nNIST       2.69 (2.31)   221 (2.75)     1,558 (5.00)     501,400\nMagellan   0.21 (0.15)   1 (0.07)       12 (0.13)        6,300\n\nIn Exper. 3 we rank only two representations at a time, so instead of computing a rank correlation in step 4, we just count the number of pairs ranked correctly.\n\nWe created input representations (step 1) with an algorithm we call RS (\"Representation Selector\"). RS first constructs a large pool of plausible, domain-specific Boolean features (5,460 features for DNA, 251,679 for NIST, 33,876 for Magellan). For each CV fold, RS sorts the features by information gain on the entire training set. Then it scans the list, selecting each feature that is not strongly pairwise dependent on any feature already selected, according to a standard \u03c7\u00b2 independence test.\n\nThis produces a single reasonable input representation, R1.\u00b3 To obtain the additional representations needed for the ranking experiments, we ran RS several times with successively smaller subsets of the initial feature pool, created by deleting features whose training-set information gains were above different thresholds. For each dataset, we made nine additional representations of varying qualities, labeled R2-R10, numbered from least to most \"damaged\" initial feature pool.\n\nTo get the ground-truth ranking (step 2), we trained feed-forward ANNs with backpropagation using each representation and one output unit per class.  
We tried several different numbers of hidden units in one layer and used the best CV accuracy among these (Fig. 1, left) to rank each input representation for ground truth. Each transparency measure also predicted a ranking of the representations (step 3). A CPU time comparison is in Table 2. This table and the experiments below report min(leaves) and avg(leaves) results from sampling 100 random trees, but sampling only 10 trees (giving a factor 10 speedup) yields similar ranking accuracy.\n\nFinally, in Exper. 1 and 2 we evaluate each transparency measure (step 4) using Spearman's rank correlation coefficient,\n\n$r_s = 1 - \\frac{6 \\sum_{i=1}^{m} d_i^2}{m(m^2 - 1)}$,\n\nbetween the ground-truth and predicted rankings (m is the number of representations (10); d_i is the ground-truth rank (an integer between 1 and 10) minus the transparency rank). We evaluate the transparency measures in Exper. 3 by counting the number (out of ten) of representation pairs each measure orders the same as ground truth.\n\n4  Experiment 1-Transparency vs. Sufficiency\n\nThis experiment demonstrates that our transparency measures are good predictors of representation quality and shows that transparency does not depend on representational sufficiency (ability to separate examples). In this experiment we used transparency to rank ten representations for each dataset and compared the rankings to the ANN ground truth using the rank correlation coefficient. RS created the representations by adding features until each representation could completely separate the training data into its classes. Thus, representational sufficiency was held constant. (The number of features could vary across representations.)
\n\n3 Though feature selection is not the focus of this paper, note that similar feature selection algorithms have been used by others for machine learning applications (Baim, 1988; Battiti, 1994).\n\n[Figure 1, left: three plots titled \"DNA Backprop Ground-Truth Cross-Validation\", \"NIST Backprop Ground-Truth Cross-Validation\", and \"Magellan Backprop Ground-Truth Cross-Validation\"; y axis: test-set accuracy (ticks from 40 to 100); x axis: Representation Number (R1 to R10); one curve each for Experiment 1 and Experiment 2.]\n\nDNA Dataset\nMeasure       Exp1 r_s  Exp2 r_s\nID3leaves     0.99      0.95\nMin(leaves)   0.94      0.99\nAvg(leaves)   0.78      0.96\nBlurring      0.78      0.81\n\nNIST Dataset\nMeasure       Exp1 r_s  Exp2 r_s\nID3leaves     1.00      1.00\nMin(leaves)   1.00      1.00\nAvg(leaves)   1.00      1.00\nBlurring      1.00      1.00\n\nMagellan Dataset\nMeasure       Exp1 r_s  Exp2 r_s\nID3leaves     0.81      0.78\nMin(leaves)   0.83      0.76\nAvg(leaves)   0.71      0.71\nBlurring      0.48      0.73\n\nFigure 1: Left: Exper. 1 and 2 ANN CV test-set accuracies (y axis; error bars are 1 SD) used to rank the representations (x axis). Right: Exper. 1 and 2, transparency rankings compared to ground truth. r_s: rank correlation coefficient (see text).
\n\nThe rank correlation results are shown in Fig. 1 (right). ID3leaves and min(leaves) outperform the less sophisticated avg(leaves) and blurring measures on datasets where there is a difference. On the NIST data, all measures produce perfect rankings. The confidence that a true correlation exists is greater than 0.95 for all measures and datasets except blurring on the Magellan data, where it is 0.85.\n\nThe high rank correlations we observe imply that our transparency measures capture a predictive factor of representation quality. This factor does not depend on representational sufficiency, because sufficiency was equal for all representations.\n\nTable 3: Exper. 3 results: correct rankings (out of 10) by the transparency measures of the corresponding representation pairs, Ri vs. R'i, from Exper. 1 and Exper. 2.\n\nDataset    ID3leaves  Min(leaves)  Avg(leaves)  Blurring\n[Table entries are not legible in the source.]\n\n5  Experiment 2-Transparency vs. Economy\n\nThis experiment shows that transparency does not depend on representational economy (number of features), and it verifies Exper. 1's conclusion that it does not depend on sufficiency. It also reaffirms the predictive power of the measures.\n\nIn Exper. 1, sufficiency was held constant, but economy could vary. Exper. 2 demonstrates that transparency does not depend on economy by equalizing the number of features and redoing the comparison. In Exper. 2, RS added extra features to each representation used in Exper. 1 until they all contained a fixed number of features (200 for DNA, 250 for NIST, 100 for Magellan). Each Exper.  
2 representation, R'i (i = 1, ..., 10), is thus a proper superset of the corresponding Exper. 1 representation, Ri. All representations for a given dataset in Exper. 2 have an identical number of features and allow perfect classification of the training data, so neither economy nor sufficiency can affect the transparency scores now.\n\nThe results (Fig. 1, right) are similar to Exper. 1's. The notable changes are that blurring is not as far behind ID3leaves and min(leaves) on the Magellan data as before, and avg(leaves) has joined the accuracy of the other two model-based measures on the DNA data. The confidence that correlations exist is above 0.95 in all cases.\n\nAgain, the high rank correlations indicate that transparency is a good predictor of representation quality. Exper. 2 shows that transparency does not depend on representational economy or sufficiency, as both were held constant here.\n\n6  Experiment 3-Redundant Features\n\nExper. 3 tests the transparency measures' predictions when the number of redundant features varies, as ANNs can often use redundant features to advantage (Sutton & Whitehead, 1993), an ability generally not attributed to decision trees.\n\nExper. 3 reuses the representations Ri and R'i (i = 1, ..., 10) from Exper. 1 and 2. Recall that R'i \u2283 Ri. The extra features in each R'i are redundant as they are not needed to separate the training data. We show the number of Ri vs. R'i representation pairs each transparency measure ranks correctly for each dataset (Table 3). For DNA and NIST, the redundant representations always improved ANN generalization (Fig. 1, left; 0.05 significance). Only ID3leaves predicted this correctly, finding smaller trees with the increased flexibility afforded by the extra features.  
The other measures were always incorrect because the lower quality redundant features degraded the random trees (avg(leaves), min(leaves)) and the average information gain (blurring). For Magellan, ANN generalization was only significantly different for one representation pair, and all measures performed near chance.\n\n7  Conclusions\n\nWe introduced the notion of transparency (the prevalence of simple and accurate concepts) as an important factor of input representation quality and developed inexpensive, effective ways to measure it. Empirical tests on three real-world datasets demonstrated these measures' accuracy at ranking representations for ANN learning at much lower computational cost than training the ANNs themselves. Our next step will be to use transparency measures as scoring functions in algorithms that apply extensive search to find better input representations.\n\nAcknowledgments\n\nThis work was supported by ONR grant N00014-93-1-0998, NSF grant CDA-9024618 (for CM-5 use), and a NASA GSRP fellowship held by KJC.\n\nReferences\n\nAlmuallim, H. & Dietterich, T. (1994). Learning Boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1-2):279-305.\n\nBaim, P. (1988). A method for attribute selection in inductive learning systems. IEEE Transactions on Pattern Analysis & Machine Intelligence, 10(6):888-896.\n\nBattiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4):537-550.\n\nBurl, M., Fayyad, U., Perona, P., Smyth, P., & Burl, M. (1994). Automating the hunt for volcanoes on Venus. In IEEE Computer Society Conf 
on Computer Vision & Pattern Recognition: Proc, Seattle, WA. IEEE Computer Society Press.\n\nCaruana, R. & Freitag, D. (1994). Greedy attribute selection. In Machine Learning: Proc 11th Intl Conf, (pp. 28-36), New Brunswick, NJ. Morgan Kaufmann.\n\nCherkauer, K. & Shavlik, J. (1994). Selecting salient features for machine learning from large candidate pools through parallel decision-tree construction. In Kitano, H. & Hendler, J., eds., Massively Parallel Artificial Intel. MIT Press, Cambridge, MA.\n\nCherkauer, K. & Shavlik, J. (1995). Rapidly estimating the quality of input representations for neural networks. In Working Notes, IJCAI Workshop on Data Engineering for Inductive Learning, (pp. 99-108), Montreal, Canada.\n\nCraven, M. & Shavlik, J. (1993). Learning to predict reading frames in E. coli DNA sequences. In Proc 26th Hawaii Intl Conf on System Science, (pp. 773-782), Wailea, HI. IEEE Computer Society Press.\n\nFayyad, U. (1994). Branching on attribute values in decision tree generation. In Proc 12th Natl Conf on Artificial Intel, (pp. 601-606), Seattle, WA. AAAI/MIT Press.\n\nJohn, G., Kohavi, R., & Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Machine Learning: Proc 11th Intl Conf, (pp. 121-129), New Brunswick, NJ. Morgan Kaufmann.\n\nKambhatla, N. & Leen, T. (1994). Fast non-linear dimension reduction. In Advances in Neural Info Processing Sys (vol 6), (pp. 152-159), San Francisco, CA. Morgan Kaufmann.\n\nKira, K. & Rendell, L. (1992). The feature selection problem: Traditional methods and a new algorithm. In Proc 10th Natl Conf on Artificial Intel, (pp. 129-134), San Jose, CA. AAAI/MIT Press.\n\nQuinlan, J. (1986). Induction of decision trees. Machine Learning, 1:81-106.\n\nQuinlan, J. (1994).  
Comparing connectionist and symbolic learning methods. In Hanson, S., Drastal, G., & Rivest, R., eds., Computational Learning Theory & Natural Learning Systems (vol I: Constraints & Prospects). MIT Press, Cambridge, MA.\n\nRendell, L. & Ragavan, H. (1993). Improving the design of induction methods by analyzing algorithm functionality and data-based concept complexity. In Proc 13th Intl Joint Conf on Artificial Intel, (pp. 952-958), Chambery, France. Morgan Kaufmann.\n\nSutton, R. & Whitehead, S. (1993). Online learning with random representations. In Machine Learning: Proc 10th Intl Conf, (pp. 314-321), Amherst, MA. Morgan Kaufmann.\n", "award": [], "sourceid": 1139, "authors": [{"given_name": "Kevin", "family_name": "Cherkauer", "institution": null}, {"given_name": "Jude", "family_name": "Shavlik", "institution": null}]}