{"title": "Predicting Lifetimes in Dynamically Allocated Memory", "book": "Advances in Neural Information Processing Systems", "page_first": 939, "page_last": 945, "abstract": null, "full_text": "Predicting Lifetimes in Dynamically Allocated Memory \n\nDavid A. Cohn \nAdaptive Systems Group \nHarlequin, Inc. \nMenlo Park, CA 94025 \ncohn@harlequin.com \n\nSatinder Singh \nDepartment of Computer Science \nUniversity of Colorado \nBoulder, CO 80309 \nbaveja@cs.colorado.edu \n\nAbstract \n\nPredictions of lifetimes of dynamically allocated objects can be used to improve the time and space efficiency of dynamic memory management in computer programs. Barrett and Zorn [1993] used a simple lifetime predictor and demonstrated this improvement on a variety of computer programs. In this paper, we use decision trees to do lifetime prediction on the same programs and show significantly better prediction. Our method also has the advantage that during training we can use a large number of features and let the decision tree automatically choose the relevant subset. \n\n1 INTELLIGENT MEMORY ALLOCATION \n\nDynamic memory allocation is used in many computer applications. The application requests blocks of memory from the operating system or from a memory manager when needed and explicitly frees them up after use. Typically, all of these requests are handled in the same way, without any regard for how or for how long the requested block will be used. Sometimes programmers use runtime profiles to analyze the typical behavior of their program and write special-purpose memory management routines specifically tuned to dominant classes of allocation events. Machine learning methods offer the opportunity to automate the process of tuning memory management systems. 
\n\nIn a recent study, Barrett and Zorn [1993] used two allocators: a special allocator for objects that are short-lived, and a default allocator for everything else. They tried a simple prediction method on a number of public-domain, allocation-intensive programs and got mixed results on the lifetime prediction problem. Nevertheless, they showed that for all the cases where they were able to predict well, their strategy of assigning objects predicted to be short-lived to the special allocator led to savings in program running times. Their results imply that if we could predict well in all cases we could get similar savings for all programs. We concentrate on the lifetime prediction task in this paper and show that using axis-parallel decision trees does indeed lead to significantly better prediction on all the programs studied by Zorn and Grunwald and some others that we included. Another advantage of our approach is that we can use a large number of features about the allocation requests and let the decision tree decide on their relevance. \n\nThere are a number of advantages to using lifetime predictions for intelligent memory management. Prediction can improve CPU usage by enabling special-purpose allocators: e.g., short-lived objects can be allocated in small spaces by incrementing a pointer and deallocated together when they are all dead. It can decrease memory fragmentation, because the short-lived objects do not pollute the address space of long-lived objects. Finally, it can improve program locality, and thus program speed, because the short-lived objects are all allocated in a small part of the heap. \n\nThe advantages of prediction must be weighed against the time required to examine each request and make that prediction about its intended use. 
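The bump-pointer scheme for short-lived objects mentioned above can be sketched as follows (a minimal Python illustration; a production allocator would do this on raw heap pages in C, and the class and method names here are invented for the sketch):

```python
class Arena:
    # Bump-pointer allocator for objects predicted to be short-lived.
    # Allocation is a single pointer increment with no per-object
    # bookkeeping; every object is deallocated together when all are dead.

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.offset = 0

    def alloc(self, size):
        if self.offset + size > len(self.buf):
            raise MemoryError('arena exhausted')
        start = self.offset
        self.offset += size          # the 'bump': one addition, nothing else
        return memoryview(self.buf)[start:start + size]

    def release_all(self):
        # Free every short-lived object at once by resetting the pointer.
        self.offset = 0

arena = Arena(1024)
a = arena.alloc(100)
b = arena.alloc(200)
assert arena.offset == 300
arena.release_all()
assert arena.offset == 0
```

Because freeing is a single reset, no block headers or free lists are maintained for these objects, which is where the time and fragmentation savings come from.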
It is frequently argued that, as computers and memory become faster and cheaper, we need to be less concerned about the speed and efficiency of machine learning algorithms. When the purpose of the algorithm is to save space and computation, however, these concerns are paramount. \n\n1.1 RELATED WORK \n\nTraditionally, memory management has been relegated to a single, general-purpose allocator. When performance is critical, software developers will frequently build a custom memory manager which they believe is tuned to optimize the performance of the program. Not only is this hand construction inefficient in terms of the programming time required, this \"optimization\" may seriously degrade the program's performance if it does not accurately reflect the program's use [Wilson et al., 1995]. Customalloc [Grunwald and Zorn, 1992] monitors program runs on benchmark inputs to determine the most commonly requested block sizes. It then produces a set of memory allocation routines which are customized to efficiently allocate those block sizes. Other memory requests are still handled by a general-purpose allocator. \n\nBarrett and Zorn [1993] studied lifetime prediction based on benchmark inputs. At each allocation request, the call graph (the list of nested procedure/function calls in effect at the time) and the object size were used to identify an allocation site. If all allocations from a particular site were short-lived on the benchmark inputs, their algorithm predicted that future allocations would also be short-lived. Their method produced mixed results at lifetime prediction, but demonstrated the savings that such predictions could bring. \n\nIn this paper, we discuss an approach to lifetime prediction which uses learned decision trees. 
In the next section, we first discuss the identification of relevant state features by a decision tree. Section 3 discusses in greater detail the problem of lifetime prediction. Section 4 describes the empirical results of applying this approach to several benchmark programs, and Section 5 discusses the implications of these results and directions for future work. \n\n2 FEATURE SELECTION WITH A DECISION TREE \n\nBarrett and Zorn's approach captures state information in the form of the program's call graph at the time of an allocation request, which is recorded to a fixed predetermined depth. This graph, plus the request size, specifies an allocation \"site\"; statistics are gathered separately for each site. A drawback of this approach is that it forces a division for each distinct call graph, preventing generalization across irrelevant features. Computationally, it requires maintaining an explicit call graph (information that the program would not normally provide), as well as storing a potentially large table of call sites from which to make predictions. It also ignores other potentially useful information, such as the parameters of the functions on the call stack, and the contents of heap memory and the program registers at the time of the request. \n\nIdeally, we would like to examine as much of the program state as possible at the time of each allocation request, and automatically extract those pieces of information that best allow predicting how the requested block will be used. Decision tree algorithms are useful for this sort of task. A decision tree divides inputs on the basis of how each input feature improves the \"purity\" of the tree's leaves. 
Inputs that are statistically irrelevant for prediction are not used in any splits; the tree's final set of decisions examines only input features that improve its predictive performance. \n\nRegardless of the parsimony of the final tree, however, training a tree with the entire program state as a feature vector is computationally infeasible. In our experiments, detailed below, we arbitrarily used the top 20 words on the stack, along with the request size, as an approximate indicator of program state. On the target machine (a Sparcstation), we found that including program registers in the feature set made no significant difference, and so dropped them from consideration for efficiency. \n\n3 LIFETIME PREDICTION \n\nThe characteristic of memory requests that we would like to predict is the lifetime of the block - how long it will be before the requested memory is returned to the central pool. Accurate lifetime prediction lets one segregate memory into short-term, long-term and permanent storage. To this end, we have used a decision tree learning algorithm to derive rules that distinguish \"short-lived\" and \"permanent\" allocations from the general pool of allocation requests. \n\nFor short-lived blocks, one can create a very simple and efficient allocation scheme [Barrett and Zorn, 1993]. For \"permanent\" blocks, allocation is also simple and cheap, because the allocator does not need to compute and store any of the information that would normally be required to keep track of the block and return it to the pool when freed. \n\nOne complication is that of unequal loss for different types of incorrect predictions. An appropriately routed memory request may save dozens of instruction cycles, but an inappropriately routed one may cost hundreds. 
The cost in terms of memory may also be unequal: a short-lived block that is incorrectly predicted to be \"permanent\" will permanently tie up the space occupied by the block (if it is allocated via a method that cannot be freed). A \"permanent\" block, however, that is incorrectly predicted to be short-lived may pollute the allocator's short-term space by preventing a large segment of otherwise free memory from being reclaimed (see Barrett and Zorn for examples). \n\nThese risks translate into a time-space tradeoff that depends on the properties of the specific allocators used and the space limitations of the target machine. For our experiments, we arbitrarily defined false positives and false negatives to have equal loss, except where noted otherwise. Other cases may be handled by reweighting the splitting criterion, or by rebalancing the training inputs (as described in the following section). \n\n4 EXPERIMENTS \n\nWe conducted two types of experiments. The first measured the ability of learned decision trees to predict allocation lifetimes. The second incorporated these learned trees into the target applications and measured the change in runtime performance. \n\n4.1 PREDICTIVE ACCURACY \n\nWe used the OC1 decision tree software (designed by Murthy et al. [1994]) and considered only axis-parallel splits, in effect conditioning each decision on a single stack feature. We chose the sum minority criterion for splits, which minimizes the number of training examples misclassified after the split. 
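As a concrete illustration, the sum minority criterion can be sketched as follows (a Python sketch assuming binary labels; OC1's actual implementation differs in detail). A candidate split's cost is the number of examples misclassified if each side predicts its majority class, i.e. the sum of the minority counts on the two sides.

```python
def sum_minority(labels_left, labels_right):
    # Sum-minority impurity of a split: examples misclassified if each
    # side of the split predicts its own majority class.
    def minority(labels):
        if not labels:
            return 0
        pos = sum(labels)
        return min(pos, len(labels) - pos)
    return minority(labels_left) + minority(labels_right)

def best_axis_parallel_split(feature, labels):
    # Try every threshold on a single feature and return the
    # (threshold, cost) pair minimizing the sum-minority criterion.
    best = (None, len(labels))
    for t in sorted(set(feature)):
        left = [y for x, y in zip(feature, labels) if x <= t]
        right = [y for x, y in zip(feature, labels) if x > t]
        cost = sum_minority(left, right)
        if cost < best[1]:
            best = (t, cost)
    return best

# Toy data: request sizes vs. short-lived labels, separable by a
# size threshold (invented numbers, purely for illustration).
sizes = [8, 16, 16, 32, 256, 512, 1024]
short = [1, 1, 1, 1, 0, 0, 0]
t, cost = best_axis_parallel_split(sizes, short)
assert (t, cost) == (32, 0)   # splitting at size <= 32 misclassifies nothing
```

An axis-parallel tree repeats this search over every feature at every node, which is what lets it ignore stack words that never reduce the misclassification count.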
For tree pruning, we used the cost complexity heuristic, which holds back a fraction (in our case 10%) of the data set for testing, and selects the smallest pruning of the original tree that is within one standard error squared of the best tree [Breiman et al., 1984]. The details of these and other criteria may be found in Murthy et al. [1994] and Breiman et al. [1984]. In addition to the automatically-pruned trees, we also examined trees that had been truncated to four leaves, in effect examining no more than two features before making a decision. \n\nOC1 includes no provisions for explicitly specifying a loss function for false positive and false negative classifications. It would be straightforward to incorporate this into the sum minority splitting criterion; we chose instead to incorporate the loss function into the training set itself, by duplicating training examples to match the target ratios (in our case, forcing an equal number of positive and negative examples). \n\nIn our experiments, we used the set of benchmark applications reported on by Barrett and Zorn: Ghostscript, a PostScript interpreter; Espresso, a PLA logic optimizer; Cfrac, a program for factoring large numbers; Gawk, an AWK programming language interpreter; and Perl, a report extraction language. We also examined Gcc, a public-domain C compiler, based on our company's specific interest in compiler technology. \n\nThe experimental procedure was as follows: we linked the application program with a modified malloc routine which, in addition to allocating the requested memory, wrote to a trace file the size of the requested block and the top 20 machine words on the program stack. 
Calls to free allowed tagging the existing allocations, which, following Barrett and Zorn, were labeled according to how many bytes had been allocated during their lifetime.1 \n\nIt is worth noting that these experiments were run on a Sparcstation, which frequently optimizes away the traditional stack frame. While it would have been possible to force the system to maintain a traditional stack, we wished to work from whatever information was available from the program \"in the wild\", without overriding system optimizations. \n\n1We have also examined, with comparable success, predicting lifetimes in terms of the number of intervening calls to malloc, which may be argued to be an equally useful measure. We focus on number of bytes for the purposes of comparison with the existing literature. \n\nInput files were taken from the public ftp archive made available by Zorn and Grunwald [1993]. Our procedure was to take traces of three of the files (typically the largest three for which we could store an entire program trace). Two of the traces were combined to form a training set for the decision tree, and the third was used to test the learned tree. \n
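The labeling and rebalancing steps above can be sketched as follows (pure Python, with a hypothetical in-memory trace standing in for the trace files; the feature tuple here holds only the request size, whereas the real feature vector also carried the top 20 stack words):

```python
SHORT_LIVED_LIMIT = 32 * 1024   # short-lived = freed before 32k further bytes

def label_trace(events):
    # events: ('malloc', id, size, features) or ('free', id) tuples.
    # Returns (features, label) pairs with label 1 = short-lived.
    # Lifetime is measured in bytes allocated while the block was live.
    total = 0                    # running total of bytes allocated
    live = {}                    # block id -> (features, total at allocation)
    examples = []
    for ev in events:
        if ev[0] == 'malloc':
            _, bid, size, feats = ev
            total += size
            live[bid] = (feats, total)
        else:
            feats, born = live.pop(ev[1])
            examples.append((feats, 1 if total - born < SHORT_LIVED_LIMIT else 0))
    # Blocks never freed are certainly not short-lived.
    examples.extend((f, 0) for f, _ in live.values())
    return examples

def rebalance(examples):
    # Duplicate minority-class examples until classes are equal in size,
    # emulating the equal-loss training set described in Section 4.1.
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    small, big = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    while small and len(small) < len(big):
        small = small + small[:len(big) - len(small)]
    return small + big

trace = [
    ('malloc', 1, 64, (64,)), ('free', 1),        # dies immediately: short-lived
    ('malloc', 2, 128, (128,)),
    ('malloc', 3, 40000, (40000,)), ('free', 2),  # outlives 32k of allocation
]
labels = {f[0]: y for f, y in label_trace(trace)}
assert labels == {64: 1, 128: 0, 40000: 0}
assert len(rebalance(label_trace(trace))) == 4    # 2 positive + 2 negative
```

The (features, label) pairs produced this way are what the decision tree learner consumes.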
\nGhostscript training files: manual.ps and large.ps; test file: ud-doc.ps \nEspresso training files: cps and mlp4; test file: Z5xp1 \nCfrac training inputs: 41757646344123832613190542166099121 and 327905606740421458831903; test input: 417576463441248601459380302877 \nGawk training file: adj.awk/words-small.awk; test file: adj.awk/words-large.awk2 \nPerl training files: endsort.perl (endsort.perl as input), hosts.perl (hosts-data.perl as input); test file: adj.perl (words-small.awk as input) \nGcc training files: cse.c and combine.c; test file: expr.c \n\n4.1.1 SHORT-LIVED ALLOCATIONS \n\nFirst, we attempted to distinguish short-lived allocations from the general pool. For comparison with Barrett and Zorn [1993], we defined \"short-lived\" allocations as those that were freed before 32k subsequent bytes had been allocated. The experimental results of this section are summarized in Table 1. \n\napplication    Barrett & Zorn               OC1 \n               false pos %   false neg %   false pos %     false neg % \nghostscript    0.006         25.2          0.13  (0.72)    1.7   (13.5) \nespresso       3.65          72            0.38  (1.39)    6.58  (14.9) \ncfrac          0             52.7          2.5   (0.49)    16.9  (19.4) \ngawk           1.11          -3            0.092 (0.092)   0.34  (0.34) \nperl           -             78.6          5.32  (10.8)    33.8  (34.3) \ngcc            -             -             0.85  (2.54)    31.1  (31.0) \n\nTable 1: Prediction errors for \"short-lived\" allocations, in percentages of misallocated bytes. Values in parentheses are for trees that have been truncated to two levels. Barrett and Zorn's results are included for comparison where available. \n\n4.1.2 \"PERMANENT\" ALLOCATIONS \n\nWe then attempted to distinguish \"permanent\" allocations from the general pool (Barrett and Zorn only consider the short-lived allocations discussed in the previous section). 
\"Permanent\" allocations were those that were not freed until the program terminated. Note that there is some ambiguity in these definitions - a \"permanent\" block that is allocated near the end of the program's lifetime may also be \"short-lived\". Table 2 summarizes the results of these experiments. \n\nWe have not had the opportunity to examine the function of each of the \"relevant features\" in the program stacks; this is a subject for future work. \n\n2For Gawk, we varied the training to match that used by Barrett and Zorn. They used as training input a single gawk program file run with one data set, and tested on the same gawk program run with another. \n\n3We were unable to compute Barrett and Zorn's exact results here, although it appears that their false negative rate was less than 1%. \n\napplication    false pos %   false neg % \nghostscript    0             0.067 \nespresso       0             1.27 \ncfrac          0.019         3.3 \ngcc            0.35          19.5 \n\nTable 2: Prediction errors for \"permanent\" allocations (% misallocated bytes). \n\n4.2 RUNTIME PERFORMANCE \n\nThe raw error rates we have presented above indicate that it is possible to make accurate predictions about the lifetime of allocation requests, but not whether those predictions are good enough to improve program performance. To address that question, we have incorporated predictive trees into three of the above applications and measured the effect on their runtimes. \n\nWe used a hybrid implementation, replacing the single monolithic decision tree with a number of simpler, site-specific trees. A \"site\" in this case was a lexical instance of a call to malloc or its equivalent. 
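One way to picture the resulting hybrid is a per-site dispatch table (a Python sketch with invented names): sites whose benchmark allocations were uniformly short-lived or permanent are bound directly to a specialized allocator, while mixed sites consult a small site-specific predictor.

```python
def make_site_router(site_stats, trees):
    # site_stats: site -> list of observed lifetime labels on benchmarks.
    # trees: site -> predictor callable, for sites with mixed behavior.
    # Returns one routing function per allocation site.
    routers = {}
    for site, observed in site_stats.items():
        if all(l == 'short' for l in observed):
            routers[site] = lambda feats: 'quick_malloc'
        elif all(l == 'permanent' for l in observed):
            routers[site] = lambda feats: 'permanent_malloc'
        else:
            tree = trees[site]                    # site-specific decision tree
            routers[site] = lambda feats, tree=tree: tree(feats)
    return routers

# Toy predictor standing in for a learned tree at a mixed site:
# it routes on request size alone.
toy_tree = lambda feats: 'quick_malloc' if feats['size'] <= 64 else 'default_malloc'

routers = make_site_router(
    {'siteA': ['short', 'short'], 'siteB': ['permanent'], 'siteC': ['short', 'other']},
    {'siteC': toy_tree},
)
assert routers['siteA']({'size': 4096}) == 'quick_malloc'
assert routers['siteB']({}) == 'permanent_malloc'
assert routers['siteC']({'size': 32}) == 'quick_malloc'
assert routers['siteC']({'size': 512}) == 'default_malloc'
```

The benefit of the per-site split is that uniform sites pay no prediction cost at all; only the mixed sites examine features at allocation time.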
When allocations from a site were exclusively short-lived or permanent, we could directly insert a call to one of the specialized allocators (in the manner of Barrett and Zorn). When allocations from a site were mixed, a site-specific tree was put in place to predict the allocation lifetime. \n\nRequests predicted to be short-lived were routed to a \"quick malloc\" routine similar to the one described by Barrett and Zorn; those predicted to be permanent were routed to another routine specialized for the purpose. On tests with random data, these specialized routines were approximately four times faster than \"malloc\". \n\nOur experiments targeted three applications with varying degrees of predictive accuracy: ghostscript, gcc, and cfrac. The results are encouraging (see Table 3). For ghostscript and gcc, which have the best predictive accuracies on the benchmark data (from Section 4.1), we had a clear improvement in performance. For cfrac, with much lower accuracy, we had mixed results: for shorter runs, the runtime performance was improved, but on longer runs there were enough missed predictions to pollute the short-lived memory area and degrade performance. \n\n5 DISCUSSION \n\nThe application of machine learning to computer software and operating systems is a largely untapped field with promises of great benefit. In this paper we have described one such application, producing efficient and accurate predictions of the lifetimes of memory allocations. \n\nOur data suggest that, even with a feature set as large as a runtime program stack, it is possible to characterize and predict the memory usage of a program after only a few benchmark runs. The exceptions appear to be programs like Perl and gawk which take both a script and a data file. Their memory usage depends not only 
Their  memory usage  depends  not  only \nupon characterizing typical scripts, but the typical data sets those scripts act upon. 4 \n\nOur ongoing research  in  memory management is  pursuing  a  number of other con-\n\n4Perl's generalization  performance is significantly  better when tested on the same script \nwith  different  data.  We  have  reported  the  results  using  different  scripts  for  comparison \nwith Barrett  and  Zorn. \n\n\fPredicting Lifetimes in Dynamically Allocated Memory \n\n945 \n\nshort \n\nI long \n\nI permanent \n\n16/256432 \n\n0/3431 \n\nbenchmark test  error \n\napplication \n{training set) \nghostscript, trained on  ud-doc.ps;  7 sites,  1 tree \nmanual.ps \n0/0 \nlarge.ps \nthesis.ps \ngcc, trained on  combine,  cse, c-decl;  17  sites, 4 trees \nexpr.c \nloop.c \nreload1.c \ncfrac,  trained on  100 \u00b7 . \u00b7057; 8 sItes,  4 trees \n327 .. \u00b7903 \n417\u00b7 \u00b7 \u00b7771 \n417 .. \u00b7 121 \n\n24/7970099  13172/22332  106/271 \n\n0/11988 \n\n2786/11998 \n\n301/536875  12.59 \n5.16 \n7.02 \n\nrun tIme \n\nnormal  I predictive \n\n96.29 \n17.22 \n40.27 \n\n95.43 \n16.75 \n37.57 \n\n12.40 \n5.16 \n6.81 \n\n7.75 \n67.93 \n225.31 \n\n7.23 \n74.57 \n245 .64 \n\nTable  3:  Running  times in  seconds  for  applications with  site-specific  trees.  Times \nshown are averages over 24-40 runs , and with the exception of loop.c, are statistically \nsignificant with  probability greater  than 99%. \n\ntinuations of the  results described  here, including lifetime clustering and intelligent \ngarbage  collection. \n\nREFERENCES \n\nD.  Barrett  and  B.  Zorn  (1993)  Using  lifetime  predictors  to  improve  memory \nallocation  performance.  SIGPLAN'93  - Conference  on  Programming  Language \nDesign  and Implementation, June  1993, Albuquerque, New  Mexico,  pp.  187-196. \n\nL. Breiman, J.  Friedman, R.  Olshen and C.  Stone (1984)  Classification  and \nRegression  Trees, Wadsworth International Group , Belmont, CA. \nD.  
Grunwald and B. Zorn (1992) CUSTOMALLOC: Efficient synthesized memory allocators. Technical Report CU-CS-602-92, Dept. of Computer Science, University of Colorado. \n\nS. Murthy, S. Kasif and S. Salzberg (1994) A system for induction of oblique decision trees. Journal of Artificial Intelligence Research 2:1-32. \n\nP. Wilson, M. Johnstone, M. Neely and D. Boles (1995) Dynamic storage allocation: a survey and critical review. Proc. 1995 Int'l Workshop on Memory Management, Kinross, Scotland, Sept. 27-29, Springer Verlag. \n\nB. Zorn and D. Grunwald (1993) A set of benchmark inputs made publicly available, in ftp archive ftp.cs.colorado.edu:/pub/misc/malloc-benchmarks/. \n", "award": [], "sourceid": 1240, "authors": [{"given_name": "David", "family_name": "Cohn", "institution": null}, {"given_name": "Satinder", "family_name": "Singh", "institution": null}]}