{"title": "Learning to Perform Local Rewriting for Combinatorial Optimization", "book": "Advances in Neural Information Processing Systems", "page_first": 6281, "page_last": 6292, "abstract": "Search-based methods for hard combinatorial optimization are often guided by heuristics. Tuning heuristics in various conditions and situations is often time-consuming. In this paper, we propose NeuRewriter that learns a policy to pick heuristics and rewrite the local components of the current solution to iteratively improve it until convergence. The policy factorizes into a region-picking and a rule-picking component, each parameterized by a neural network trained with actor-critic methods in reinforcement learning. NeuRewriter captures the general structure of combinatorial problems and shows strong performance in three versatile tasks: expression simplification, online job scheduling and vehicle routing problems. NeuRewriter outperforms the expression simplification component in Z3; outperforms DeepRM and Google OR-tools in online job scheduling; and outperforms recent neural baselines and Google OR-tools in vehicle routing problems.", "full_text": 
"LearningtoPerformLocalRewritingforCombinatorialOptimizationXinyunChen\u2217UCBerkeleyxinyun.chen@berkeley.eduYuandongTianFacebookAIResearchyuandong@fb.comAbstractSearch-basedmethodsforhardcombinatorialoptimizationareoftenguidedbyheuristics.Tuningheuristicsinvariousconditionsandsituationsisoftentime-consuming.Inthispaper,weproposeNeuRewriterthatlearnsapolicytopickheuristicsandrewritethelocalcomponentsofthecurrentsolutiontoitera-tivelyimproveituntilconvergence.Thepolicyfactorizesintoaregion-pickingandarule-pickingcomponent,eachparameterizedbyaneuralnetworktrainedwithactor-criticmethodsinreinforcementlearning.NeuRewritercapturesthegeneralstructureofcombinatorialproblemsandshowsstrongperformanceinthreeversatiletasks:expressionsimpli\ufb01cation,onlinejobschedulingandvehi-cleroutingproblems.NeuRewriteroutperformstheexpressionsimpli\ufb01cationcomponentinZ3[15];outperformsDeepRM[33]andGoogleOR-tools[19]inonlinejobscheduling;andoutperformsrecentneuralbaselines[35,29]andGoogleOR-tools[19]invehicleroutingproblems.21IntroductionSolvingcombinatorialproblemsisalong-standingchallengeandhasalotofpracticalapplications(e.g.,jobscheduling,theoremproving,planning,decisionmaking).Whileproblemswithspeci\ufb01cstructures(e.g.,shortestpath)canbesolvedef\ufb01cientlywithprovenalgorithms(e.g,dynamicprogram-ming,greedyapproach,search),manycombinatorialproblemsareNP-hardandrelyonmanuallydesignedheuristicstoimprovethequalityofsolutions[1,40,27].Althoughitisusuallyeasytocomeupwithmanyheuristics,determiningwhenandwheresuchheuristicsshouldbeapplied,andhowtheyshouldbeprioritized,istime-consuming.Ittakescommercialsolversdecadestotunetostrongperformanceinpracticalproblems[15,44,19].Toaddressthisissue,previousworksuseneuralnetworkstopredictacompletesolutionfromscratch,givenacompletedescriptionoftheproblem[50,33,29,21].Whilethisavoidssearchandtuning,adirectpredictioncouldbedif\ufb01cultwhenthenumberofvariablesgrows.Improvingiterativelyfromanexistingsolutionisacommonapproachforcontinuoussolutionspace
s,e.g,trajectoryoptimizationinrobotics[34,47,31].However,suchmethodsrelyingongradientinformationtoguidethesearch,isnotapplicablefordiscretesolutionspacesduetoindifferentiablity.Toaddressthisproblem,wedirectlylearnaneural-basedpolicythatimprovesthecurrentsolutionbyiterativelyrewritingalocalpartofituntilconvergence.Inspiredbytheproblemstructures,thepolicyisfactorizedintotwoparts:theregion-pickingandtherule-pickingpolicy,andistrainedend-to-endwithreinforcementlearning,rewardingcumulativeimprovementofthesolution.Weapplyourapproach,NeuRewriter,tothreedifferentdomains:expressionsimpli\ufb01cation,onlinejobscheduling,andvehicleroutingproblems.WeshowthatNeuRewriterisbetterthanstrong\u2217WorkpartiallydonewheninterningatFacebookAIResearch.2Thecodeisavailableathttps://github.com/facebookresearch/neural-rewriter.33rdConferenceonNeuralInformationProcessingSystems(NeurIPS2019),Vancouver,Canada.\fheuristicsusingmultiplemetrics.Forexpressionsimpli\ufb01cation,NeuRewriteroutperformstheexpressionsimpli\ufb01cationcomponentinZ3[15].Foronlinejobscheduling,underacontrolledsetting,NeuRewriteroutperformsGoogleOR-tools[19]intermsofbothspeedandqualityofthesolution,andDeepRM[33],aneural-basedapproachthatpredictsaholisticschedulingplan,bylargemarginsespeciallyinmorecomplicatedsetting(e.g.,withmoreheterogeneousresources).Forvehicleroutingproblems,NeuRewriteroutperformstworecentneuralnetworkapproaches[35,29]andGoogleOR-tools[19].Furthermore,extensiveablationstudiesshowthatourapproachworkswellindifferentsituations(e.g.,differentexpressionlengths,non-uniformjob/resourcedistribution),andtransferswellwhendistributionshifts(e.g.,testonlongerexpressionsthanthoseusedfortraining).!\"#\"\u223c%#\u22c5|!\"(\"\u223c%(\u22c5|!\"[#\"]!\"+,=.(!\",#\",(\")CurrentState(i.e.Solution)Region-PickerRule-Picker!\"#\"!\"[#\"](\"!\"+,Figure1:Theframeworkofourneuralrewriter.Giventhecurrentstate(i.e.,solutiontotheoptimizationproblem)st,we\ufb01rstpickaregion\u03c9tbytheregion-pickingpolicy\u03c0\u03c9(\u03c9t|st),andt
henpickarewritingruleutusingtherule-pickingpolicy\u03c0u(ut|st[\u03c9t]),where\u03c0u(ut|st[\u03c9t])givestheprobabilitydistributionofapplyingeachrewritingruleu\u2208Utothepartialsolution.Oncethepartialsolutionisupdated,weobtainanimprovedsolutionst+1andrepeattheprocessuntilconvergence.2RelatedWorkMethods.Usingneuralnetworkmodelsforcombinatorialoptimizationhasbeenexploredinthelastfewyears.Astraightforwardideaistoconstructasolutiondirectly(e.g.,withaSeq2Seqmodel)fromtheproblemspeci\ufb01cation[50,6,33,28].However,suchapproachesmightmeetwithdif\ufb01cultiesiftheproblemhascomplexcon\ufb01gurations,asourevaluationindicates.Incontrast,ourpaperfocusesoniterativeimprovementofacompletesolution.Trajectoryoptimizationwithlocalgradientinformationhasbeenwidelystudiedinroboticswithmanyeffectivetechniques[34,9,51,47,32,31].Fordiscreteproblems,itispossibletoapplycontinuousrelaxationandapplygradientdescent[10].Incontrast,welearnthegradientfrompreviousexperiencetooptimizeacompletesolution,similartodata-drivendescent[49]andsyntheticgradient[26].Atahighlevel,ourframeworkiscloselyconnectedwiththelocalsearchpipeline.Speci\ufb01cally,wecanleverageourlearnedRLpolicytoguidethelocalsearch,i.e.,todecidewhichneighborsolutiontomoveto.Wewilldemonstratethatinourevaluatedtasks,ourapproachoutperformsseverallocalsearchalgorithmsguidedbymanuallydesignedheuristics,andsoftwaressupportingmoreadvancedlocalsearchalgorithms,i.e.,Z3[15]andOR-tools[19].Applications.Forexpressionsimpli\ufb01cation,somerecentworkusedeepneuralnetworkstodiscoverequivalentexpressions[11,2,52].Inparticular,[11]trainsadeepneuralnetworktorewritealgebraicexpressionswithsupervisedlearning,whichrequiresacollectionofgroundtruthrewritingpaths,andmaynot\ufb01ndnovelrewritingroutines.Wemitigatetheselimitationsusingreinforcementlearning.Jobschedulingandresourcemanagementproblemsareubiquitousandfundamentalincomputersystems.Variousworkhavestudiedtheseproblemsfromboththeoreticalandempiricalsides[8,20,3,42,48,33,13].Inparticular,arecentlineofwo
rkstudiesdeepreinforcementlearningforjobscheduling[33,13]andvehicleroutingproblems[29,35].Ourapproachistestedonmultipledomainswithextensiveablationstudies,andcouldalsobeextendedtoothercloselyrelatedtaskssuchascodeoptimization[41,12],theoremproving[25,30,4,24],textsimpli\ufb01cation[14,37,18],andclassicalcombinatorialoptimizationproblemsbeyondroutingproblems[16,28,7,50,27],e.g.,VertexCoverProblem[5].3ProblemSetupLetSbethespaceofallfeasiblesolutionsintheproblemdomain,andc:S\u2192Rbethecostfunction.Thegoalofoptimizationisto\ufb01ndargmins\u2208Sc(s).Inthiswork,insteadof\ufb01ndingasolutionfromscratch,we\ufb01rstconstructafeasibleone,thenmakeincrementalimprovementbyiterativelyapplyinglocalrewritingrulestotheexistingsolutionuntilconvergence.Ourrewritingformulationisespeciallysuitableforproblemswiththefollowingproperties:(1)afeasiblesolution2\fiseasyto\ufb01nd;(2)thesearchspacehaswell-behavedlocalstructures,whichcouldbeutilizedtoincrementallyimprovethesolution.Forsuchproblems,acompletesolutionprovidesafullcontextfortheimprovementusingarewriting-basedapproach,allowingadditionalfeaturestobecomputed,whichishardtoobtainifthesolutionisgeneratedfromscratch;meanwhile,differentsolutionsmightshareacommonroutinetowardstheoptimum,whichcouldberepresentedaslocalrewritingrules.Forexample,itismucheasiertodecidewhethertopostponejobswithlargeresourcerequirementswhenanexistingjobscheduleisprovided.Furthermore,simpleruleslikeswappingtwojobscouldimprovetheperformance.Formally,eachsolutionisastate,andeachlocalregionandtheassociatedrewritingruleisanaction.Optimizationasarewritingproblem.LetUbetherewritingruleset.Supposestisthecurrentsolution(orstate)atiterationt.We\ufb01rstcomputeastate-dependentregionset\u2126(st),thenpickaregion\u03c9t\u2208\u2126(st)usingtheregion-pickingpolicy\u03c0\u03c9(\u03c9t|st).Wethenpickarewritingruleutapplicabletothatregion\u03c9tusingtherule-pickingpolicy\u03c0u(ut|st[\u03c9t]),wherest[\u03c9t]isasubsetofstatest.Wethenapplythisrewritingruleut\u2208Utost[\u03c9t],a
ndobtainthenextstatest+1=f(st,\u03c9t,ut).Givenaninitialsolution(orstate)s0,ourgoalisto\ufb01ndasequenceofrewritingsteps(s0,(\u03c90,u0)),(s1,(\u03c91,u1)),...,(sT\u22121,(\u03c9T\u22121,uT\u22121)),sTsothatthe\ufb01nalcostc(sT)isminimized.Totacklearewritingproblem,rule-basedrewriterswithmanually-designedrewritingroutineshavebeenproposed[23].However,manuallydesigningsuchroutinesisnotatrivialtask.Anincompletesetofroutinesoftenleadstoaninef\ufb01cientexhaustivesearch,whileasetofkaleidoscopicroutinesisoftencumbersometodesign,hardtomaintainandlacks\ufb02exibility.Inthispaper,weproposetotrainaneuralnetworkinstead,usingreinforcementlearning.Recentadvanceindeepreinforcementlearningsuggeststhepotentialofwell-trainedmodelstodiscovernoveleffectivepolicies,suchasdemonstratedinComputerGo[43]andvideogames[36].Moreover,byleveragingreinforcementlearning,ourapproachcouldbeextendedtoabroaderrangeofproblemsthatcouldbehardforrule-basedrewritersandclassicsearchalgorithms.Forexample,wecandesigntherewardtotakethevalidityofthesolutionintoaccount,sothatwecanstartwithaninfeasiblesolutionandthenmovetowardsafeasibleone.Ontheotherhand,wecanalsotraintheneuralnetworktoexploretheconnectionsbetweendifferentsolutionsinthesearchspace.Inourevaluation,wedemonstratethatourapproach(1)mitigateslaborioushumanefforts,(2)discoversnovelrewritingpathsfromitsownexploration,and(3)\ufb01ndsbettersolutiontooptimizationproblemthanthecurrentstate-of-the-artandtraditionalheuristic-basedsoftwarepackagestunedfordecades.4NeuralRewriterModelInthefollowing,wepresentthedesignofourrewritingmodel,i.e.,NeuRewriter.We\ufb01rstprovideanoverviewofourmodelframework,thenpresentthedesigndetailsfordifferentapplications.4.1ModelOverviewFigure1illustratestheoverallframeworkofourneuralrewriter,andwedescribethetwokeycomponentsforrewritingasfollows.MoredetailscanbefoundinAppendixC.Scorepredictor.Giventhestatest,thescorepredictorcomputesascoreQ(st,\u03c9t)forevery\u03c9t\u2208\u2126(st),whichmeasuresthebene\ufb01tofrewritingst[\u03c9t].A
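As a concrete illustration of the loop described above (pick a region via a softmax over region scores, pick a rule, apply it, repeat), here is a minimal toy sketch. The list-of-integers solution, the sum cost, the hand-written score function (standing in for the learned Q network), and the uniform rule picker (standing in for the rule-picking policy) are all hypothetical; this is not the paper's implementation.

```python
import math
import random

# Toy instance of the rewriting loop (Fig. 1): a "solution" is a list of
# nonnegative integers, the cost c(s) is its sum, a "region" is an index,
# and the rule set U holds two local rewrites. The score function is a
# hand-written stand-in for the learned Q(s_t, w_t); the rule picker is a
# uniform stand-in for the learned rule-picking policy.

RULES = [
    lambda x: max(x - 1, 0),  # rule 0: decrement the picked element
    lambda x: x // 2,         # rule 1: halve the picked element
]

def cost(s):
    return sum(s)

def regions(s):
    return range(len(s))

def score(s, w):
    # Hypothetical Q(s, w): larger elements are more promising to rewrite.
    return float(s[w])

def pick_region(s):
    # Region-picking policy: sample from a softmax over the region scores.
    qs = [score(s, w) for w in regions(s)]
    m = max(qs)
    exps = [math.exp(q - m) for q in qs]
    r, acc = random.random() * sum(exps), 0.0
    for w, e in enumerate(exps):
        acc += e
        if acc >= r:
            return w
    return len(s) - 1

def rewrite(s, steps=50):
    # s_{t+1} = f(s_t, w_t, u_t): apply the picked rule to the picked region.
    s = list(s)
    for _ in range(steps):
        w = pick_region(s)
        u = random.randrange(len(RULES))  # stand-in rule-picking policy
        s[w] = RULES[u](s[w])
    return s

random.seed(0)
start = [7, 3, 9, 1]
out = rewrite(start)
# Both toy rules are non-increasing on nonnegative ints, so cost never rises.
assert cost(out) <= cost(start)
```

In NeuRewriter both stand-ins are neural networks, and the per-step reward c(s_t) − c(s_{t+1}) trains them so that the picked rewrites improve the solution on average, rather than by construction as in this toy.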
A high score indicates that rewriting s_t[ω_t] could be desirable. Note that Ω(s_t) is a problem-dependent region set. For expression simplification, Ω(s_t) includes all sub-trees of the expression parse trees; for job scheduling, Ω(s_t) covers all job nodes for scheduling; and for vehicle routing, it includes all nodes in the route.

Rule selector. Given s_t[ω_t] to be rewritten, the rule-picking policy predicts a probability distribution π_u(s_t[ω_t]) over the entire rule set U, and selects a rule u_t ∈ U to apply accordingly.

Figure 2: The instantiation of NeuRewriter for different domains: (a) expression simplification; (b) job scheduling; and (c) vehicle routing. In (a), s_t is the expression parse tree, where each square represents a node in the tree. The set Ω(s_t) includes every sub-tree rooted at a non-terminal node, from which the region-picking policy selects ω_t ∼ π_ω(ω_t | s_t) to rewrite. Afterwards, the rule-picking policy predicts a rewriting rule u_t ∈ U, then rewrites the sub-tree ω_t to get the new tree s_{t+1}. In (b), s_t is the dependency graph representation of the job schedule. Each circle with index greater than 0 represents a job node, and node 0 is an additional one representing the machine. Edges in the graph reflect job dependencies. The region-picking policy selects a job ω_t to re-schedule from all job nodes, then the rule-picking policy chooses a moving action u_t for ω_t, then modifies s_t to get a new dependency graph s_{t+1}. In (c), s_t is the current route, and ω_t is the node selected to change the visit order. Node 0 is the depot, and other nodes are customers with certain resource demands. The region-picking policy and the rule-picking policy work similarly to the job scheduling ones.

4.2 Training Details

Let (s_0, (ω_0, u_0)), ..., (s_{T−1}, (ω_{T−1}, u_{T−1})), s_T be the rewriting sequence in the forward pass.

Reward function. We define r(s_t, (ω_t, u_t)) = c(s_t) − c(s_{t+1}), where c(·) is the task-specific cost function in Section 3.

Q-Actor-Critic training. We train the region-picking policy π_ω and the rule-picking policy π_u simultaneously. For π_ω(ω_t | s_t; θ), we parameterize it as a softmax of the underlying Q(s_t, ω_t; θ) function:

$$\pi_\omega(\omega_t \mid s_t; \theta) = \frac{\exp(Q(s_t, \omega_t; \theta))}{\sum_{\omega_t} \exp(Q(s_t, \omega_t; \theta))} \qquad (1)$$

and instead learn Q(s_t, ω_t; θ) by fitting it to the cumulative reward sampled from the current policies π_ω and π_u:

$$L_\omega(\theta) = \frac{1}{T} \sum_{t=0}^{T-1} \Big( \sum_{t'=t}^{T-1} \gamma^{t'-t}\, r(s_{t'}, (\omega_{t'}, u_{t'})) - Q(s_t, \omega_t; \theta) \Big)^2 \qquad (2)$$

where T is the length of the episode (i.e., the number of rewriting steps), and γ is the decay factor. For the rule-picking policy π_u(u_t | s_t[ω_t]; φ), we employ the Advantage Actor-Critic algorithm [45] with the learned Q(s_t, ω_t; θ) as the critic, and thus avoid bootstrapping, which could cause sample insufficiency and instability in training. This formulation is similar in spirit to soft Q-learning [22]. Denoting the advantage function as $\Delta(s_t, (\omega_t, u_t)) \equiv \sum_{t'=t}^{T-1} \gamma^{t'-t}\, r(s_{t'}, (\omega_{t'}, u_{t'})) - Q(s_t, \omega_t; \theta)$, the loss function of the rule selector is:

$$L_u(\phi) = -\sum_{t=0}^{T-1} \Delta(s_t, (\omega_t, u_t)) \log \pi_u(u_t \mid s_t[\omega_t]; \phi) \qquad (3)$$

The overall loss function is L(θ, φ) = L_u(φ) + α L_ω(θ), where α is a hyper-parameter. More training details can be found in Appendix D.

5 Applications

In the following sections, we discuss the application of our rewriting approach to three different domains: expression simplification, online job scheduling, and vehicle routing. In expression simplification, we minimize the expression length using a well-defined semantics-preserving rewriting rule set. In online job scheduling, we aim to reduce the overall waiting time of jobs. In vehicle routing, we aim to minimize the total tour length.

5.1 Expression Simplification

We first apply our approach to the expression simplification domain. In particular, we consider expressions in Halide, a domain-specific language for high-performance image processing [39], which is widely used at scale in multiple products of Google (e.g., YouTube) and in Adobe Photoshop. Simplifying Halide expressions is an important step towards the optimization of the entire code. To this end, a rule-based rewriter is implemented for the expressions, which is carefully tuned with manually designed heuristics. The grammar of the expressions considered in the rewriter is specified in Appendix A.1. Notice that the grammar includes a more comprehensive operator set than previous works on finding equivalent expressions, which consider only boolean expressions [2, 17] or a subset of algorithmic operations [2]. The rewriter includes hundreds of manually designed rewriting templates. Given an expression, the rewriter checks the templates in a pre-designed order, and applies those rewriting templates that match any sub-expression of the input.

After investigating the rewriting templates in the rule-based rewriter, we find that a large number of them enumerate specific cases of an uphill rule, which lengthens the expression first and shortens it later (e.g., "min/max" expansion). Similar to momentum terms in gradient descent for continuous optimization, such rules are used to escape a local optimum. However, they should only be applied when the initial expression satisfies certain pre-conditions, which are traditionally specified by manual design, a cumbersome process that is hard to generalize. Observing these limitations, we hypothesize that a neural network model has the potential to do a better job than the rule-based rewriter. In particular, we propose to keep only the core rewriting rules in the rule set, remove all unnecessary pre-conditions, and let the neural network decide which rewriting rule to apply and when. In this way, the neural rewriter has better flexibility than the rule-based rewriter, because it can learn such rewriting decisions from data, and has the ability to discover novel rewriting patterns that are not included in the rule-based rewriter.

Rule set. We incorporate two kinds of templates from the Halide rewriting rule set. The first kind is simple rules (e.g., v − v → 0), while the second is the uphill rules after removing those manually designed pre-conditions that do not affect the validity of the rewriting. In this way, a rule set with |U| = 19 categories is built. See Appendix B.1 for more details.

Model specification. We use expression parse trees as the input, and employ the N-ary Tree-LSTM designed in [46] as the input encoder to compute the embedding for each node in the tree. Both the score predictor and the rule selector are fully connected neural networks taking the LSTM embeddings as input. More details can be found in Appendix C.1.

5.2 Job Scheduling Problem

We also study the job scheduling problem, using the problem setup in [33].

Notation. Suppose we have a machine with D types of resources. Each job j is specified as v_j = (ρ_j, A_j, T_j), where the D-dimensional vector ρ_j = [ρ_jd] denotes the required portion 0 ≤ ρ_jd ≤ 1 of resource type d, A_j is the arrival timestep, and T_j is the duration. In addition, we define B_j as the scheduled beginning time, and C_j = B_j + T_j as the completion time. We assume that the resource requirement is fixed during the entire job execution, each job must run continuously until finishing, and no preemption is allowed. We adopt an online setting: there is a pending job queue that can hold at most W jobs. When a new job arrives, it can either be allocated immediately, or be added to the queue. If the queue is already full, then to make space for the new job, at least one job in the queue needs to be scheduled immediately. The goal is to find a time schedule for every job, so that the average waiting time is as short as possible.

Rule set. The set of rewriting rules re-schedules a job v_j to allocate it after another job v_{j′} finishes, or at its own arrival time A_j. See Appendix B.2 for details of a rewriting step. The size of the rewriting rule set is |U| = 2W, since each job can only switch its scheduling order with at most W of its former and latter jobs respectively.

Representation. We represent each schedule as a directed acyclic graph (DAG), which describes the dependency among the schedule times of different jobs. Specifically, we denote each job v_j as a node in the graph, and we add an additional node v_0 to represent the machine. If a job v_j is scheduled at its arrival time A_j (i.e., B_j = A_j), then we add a directed edge ⟨v_0, v_j⟩ to the graph. Otherwise, there must exist at least one job v_{j′} such that C_{j′} = B_j (i.e., job j starts right after job j′ finishes). We add an edge ⟨v_{j′}, v_j⟩ for every such job v_{j′}. Figure 2(b) shows the setting, and we defer the embedding and graph construction details to Appendix C.2.

Model specification. To encode the graphs, we extend the Child-Sum Tree-LSTM architecture in [46], which is similar to the DAG-structured LSTM in [53]. As in the expression simplification model, both the score predictor and the rule selector are fully connected neural networks, and we defer the model details to Appendix C.2.

5.3 Vehicle Routing Problem

In addition, we evaluate our approach on the vehicle routing problems studied in [29, 35]. Specifically, we focus on the Capacitated VRP (CVRP), where a single vehicle with limited capacity needs to satisfy the resource demands of a set of customer nodes. To do so, we construct multiple routes starting and ending at the depot, i.e., node 0 in Figure 2(c), so that the resources delivered in each route do not exceed the vehicle capacity, while the total route length is minimized.

We represent each vehicle routing problem as the sequence of nodes visited in the tour, and use a bi-directional LSTM to embed the routes. The rule set is similar to that of job scheduling, where each node can swap with another node in the route. The architectures of the score predictor and the rule selector are similar to job scheduling. More details can be found in Appendix C.3.

6 Experiments

We present the evaluation results in this section. To calculate the inference time, we run all algorithms on the same server equipped with 2 Quadro GP100 GPUs and 80 CPU cores. Only 1 GPU is used when evaluating neural networks, and 4 CPU cores are used for search algorithms. We set the timeout of search algorithms to 10 seconds per instance. All neural networks in our evaluation are implemented in PyTorch [38].

6.1 Expression Simplification

Setup. To construct the dataset, we first generate random pipelines using the generator in Halide, then extract expressions from them. We filter out irreducible expressions, then split the rest into 8/1/1 for training/validation/test sets respectively. See Appendix A.1 for more details.

Metrics. We evaluate the following two metrics: (1) Average expression length reduction, which is the length reduced from the initial expression to the rewritten one, where the length is defined as the number of characters in the expression; (2) Average tree size reduction, which is the number of nodes decreased from the initial expression parse tree to the rewritten one.

Baselines. We examine the effectiveness of NeuRewriter against two kinds of baselines. The first kind is heuristic-based rewriting approaches, including Halide-rule (the rule-based Halide rewriter in Section 3) and Heuristic Search, which applies beam search to find the shortest rewriting with our rule set at each step. Note that NeuRewriter does not use beam search. In addition, we also compare our approach with Z3, a high-performance theorem prover developed by Microsoft Research [15]. Z3 provides two tactics to simplify expressions: Z3-simplify performs some local transformations using its pre-defined rules, and Z3-ctx-solver-simplify traverses each sub-formula in the input expression and invokes the solver to find a simpler equivalent one to replace it. This search-based tactic is able to perform simplifications not included in the Halide rule set, and is generally better than the rule-based counterpart, but requires more computation. For Z3-ctx-solver-simplify, we set the timeout to 10 seconds for each input expression.

Results. Figure 3a presents the main results. We notice that the performance of Z3-simplify is worse than Halide-rule, because the rule set included in this simplifier is more restricted than the Halide one; in particular, it cannot handle expressions with "max/min/select" operators. On the other hand, NeuRewriter outperforms both the rule-based rewriters and the heuristic search by a large margin. In particular, NeuRewriter reduces the expression length and parse tree size by around 52% and 59% on average; compared to the rule-based rewriters, our model further reduces the average expression length and tree size by around 20% and 15% respectively. We observe that the main performance gain comes from learning to apply uphill rules appropriately in ways that are not included in the manually designed templates. For example, consider the expression 5 ≤ max(max(v0, 3) + 3, max(v1, v2)), which can be reduced to True by expanding max(max(v0, 3) + 3, max(v1, v2)) and max(v0, 3). A rule-based rewriter would require specifying the pre-conditions recursively, which becomes prohibitive as the expressions become more complex. On the other hand, heuristic search may not be able to find the correct order of expanding the right-hand side of the expression when more "min/max" operators are included, which makes the search less efficient. Furthermore, NeuRewriter also outperforms Z3-ctx-solver-simplify in terms of both result quality and time efficiency, as shown in Figure 3a and Table 1a. Note that the implementation of Z3 is in C++ and highly optimized, while NeuRewriter is implemented in Python; meanwhile, Z3-ctx-solver-simplify can perform rewriting steps that are not included in the Halide rule set. More results can be found in Appendix G.

Generalization to longer expressions. To measure the generalizability of our approach, we construct 4 subsets of the training set: Train≤20, Train≤30, Train≤50 and Train≤100, which only include expressions of length at most 20, 30, 50 and 100 in the full training set. We also build Test>100, a subset of the full test set that only includes expressions of length larger than 100. The statistics of these datasets can be found in Appendix A.1.

Figure 3: Experimental results of the expression simplification problem. In (b), we train NeuRewriter on expressions of different lengths (described in the brackets). Bar values recovered from the plots:
(a) Average expression length reduction / average tree size reduction: Z3-simplify 17.74 / 7.39; Halide-rule 36.13 / 9.68; Heuristic Search 47.08 / 13.76; Z3-ctx-solver-simplify 50.81 / 15.82; NeuRewriter 57.28 / 16.71.
(b) Average expression length reduction on Test / Test>100: Halide-rule 36.13 / 45.25; Z3-ctx-solver-simplify 50.81 / 69.79; NeuRewriter (Train) 57.28 / 79.08; NeuRewriter (Train≤100) 54.35 / 72.95; NeuRewriter (Train≤50) 51.49 / 69.93; NeuRewriter (Train≤30) 50.74 / 65.09; NeuRewriter (Train≤20) 50.55 / 64.44.
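Before turning to the remaining experiments, the Q-Actor-Critic objective of Section 4.2 can be made concrete. The sketch below computes the discounted returns and the two losses of Eqs. (2) and (3) from one recorded rewriting episode; the rewards, critic values, rule log-probabilities, and the weight α = 0.5 are hypothetical numbers for illustration, not values from the paper.

```python
import math

def returns(rewards, gamma):
    """Discounted cumulative rewards G_t = sum_{t'>=t} gamma^(t'-t) r_t'."""
    G, acc = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        G[t] = acc
    return G

def critic_loss(rewards, q_values, gamma):
    """Eq. (2): mean squared error between G_t and Q(s_t, w_t; theta)."""
    G = returns(rewards, gamma)
    return sum((g - q) ** 2 for g, q in zip(G, q_values)) / len(rewards)

def actor_loss(rewards, q_values, log_pi_u, gamma):
    """Eq. (3): advantage (G_t - Q_t) weighted negative log-likelihood of
    the chosen rules, with the learned Q serving as the critic baseline."""
    G = returns(rewards, gamma)
    return -sum((g - q) * lp for g, q, lp in zip(G, q_values, log_pi_u))

# One hypothetical 3-step episode: rewards r_t = c(s_t) - c(s_{t+1}),
# critic outputs Q(s_t, w_t), and log-probs of the rules actually applied.
rewards = [2.0, 0.0, 1.0]
q_values = [2.5, 1.0, 0.8]
log_pi_u = [math.log(0.6), math.log(0.3), math.log(0.9)]

L_w = critic_loss(rewards, q_values, gamma=0.9)
L_u = actor_loss(rewards, q_values, log_pi_u, gamma=0.9)
total = L_u + 0.5 * L_w  # overall loss L = L_u + alpha * L_w, alpha hypothetical
```

This mirrors the overall objective L(θ, φ) = L_u(φ) + α L_ω(θ) term by term; in the actual system the Q-values and log-probabilities are network outputs, and the sums are differentiated by autograd.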
Figure 4: Experimental results of the job scheduling problem varying the following aspects: (a) the number of resource types D; (b) job frequency; (c) resource distribution; (d) job length. For NeuRewriter, we describe the training job distributions in the brackets. Workloads in (a) use steady job frequency, non-uniform resource distribution, and non-uniform job length. In (b), (c) and (d), D = 20. In (b) and (c), we omit the comparison with some approaches because their results are significantly worse; for example, the average slowdown of EJF is 14.53 on the dynamic job frequency, and 11.06 on the uniform resource distribution. More results can be found in Appendix E.

Figure 5: Experimental results of the vehicle routing problem with different numbers of customer nodes and vehicle capacities; e.g., VRP100, Cap50 means there are 100 customer nodes and the vehicle capacity is 50. (a) NeuRewriter outperforms multiple baselines and previous works [29, 35]; more results can be found in Appendix F. (b) We evaluate the generalization performance of NeuRewriter on problems from different distributions, and we describe the training problem distributions in the brackets. Bar values (average tour length) recovered from the plots:
(a) VRP20, Cap30 / VRP50, Cap40 / VRP100, Cap50: Random Sweep 7.08 / 12.96 / 20.33; Random CW 6.81 / 12.25 / 18.96; OR-tools 6.43 / 11.31 / 17.16; Nazari et al. (RL beam 10) 6.40 / 11.15 / 16.96; AM (sampling) 6.25 / 10.62 / 16.23; NeuRewriter 6.16 / 10.51 / 16.10.
(b) NeuRewriter (VRP20, Cap30) 6.16 / 11.51 / 18.86; NeuRewriter (VRP50, Cap40) 6.38 / 10.51 / 17.33; NeuRewriter (VRP100, Cap50) 6.65 / 11.63 / 16.10.

We present the results of training our model on the different datasets above in Figure 3b. Even when trained on short expressions, NeuRewriter is still comparable with the Z3 solver. Thanks to local rewriting rules, our approach can generalize well even when operating on very different data distributions.

6.2 Job Scheduling Problem

Setup. We randomly generate 100K job sequences, and use 80K/10K/10K for training, validation and testing. Typically each job sequence includes ~50 jobs. We use an online setting where jobs arrive on the fly with a pending job queue of length W = 10. Unless stated otherwise, we generate initial schedules using Earliest Job First (EJF), which can be constructed with negligible overhead. When the number of resource types D = 2, we follow the same setup as in [33]. The maximal job duration T_max = 15, and the latest job arrival time A_max = 50. With larger D, apart from changing the resource requirement of each job to include more resource types, other configurations stay the same.

Metric. Following DeepRM [33], we use the average job slowdown η_j ≡ (C_j − A_j)/T_j ≥ 1 as our evaluation metric. Note that η_j = 1 means no slowdown.

Job properties. To test the stability and generalizability of NeuRewriter, we change the job properties (and their distributions): (1) Number of resource types D: larger D leads to more complicated scheduling; (2) Average job arrival rate: the probability that a new job will arrive; steady job frequency sets it to 70%, while dynamic job frequency means the job arrival rate changes randomly at each timestep; (3) Resource distribution: jobs might require different resources, where some distributions are uniform (e.g., half-half for resources 1 and 2) while others are non-uniform (see Appendix A.2 for the detailed description); (4) Job lengths: with uniform job length, the length of each job in the workload falls in either [10, 15] (long) or [1, 3] (short), while with non-uniform job length, the workload has both short and long jobs. We show that NeuRewriter is fairly robust under different distributions: when trained on one distribution, it can generalize to others without performance collapse.

We compare NeuRewriter with three kinds of baselines. Manually designed heuristics: Earliest Job First (EJF) schedules each job in the increasing order of arrival time. Shortest Job First (SJF) always allocates the shortest job in the pending job queue at each timestep, which is also used as a baseline in [33]. Shortest First Search (SJFS) searches over the shortest k jobs to schedule at each timestep, and returns the optimal one. We find that other heuristic-based baselines used in [33] generally perform worse than SJF, especially with large D; thus, we omit the comparison. Neural network: we compare with DeepRM [33], a neural network also trained with RL to construct a solution from scratch. Offline planning: to measure the optimality of these algorithms, we also consider an offline setting, where the entire job sequence is available before scheduling. Note that this is equivalent to assuming an unbounded length of the pending job queue. With such additional knowledge, this setting provides a strong baseline. We tried two offline algorithms: (1) SJF-offline, a simple heuristic that schedules each job in the increasing order of its duration; and (2) Google OR-tools [19], a generic toolbox for combinatorial optimization. For OR-tools, we set the timeout to 10 seconds per workload, but we find that it cannot achieve good performance even with a larger timeout, and we defer the discussion to Appendix E.

Results on Scalability. As shown in Figure 4a, NeuRewriter outperforms both the heuristic algorithms and the baseline neural network DeepRM. In particular, while the performance of DeepRM and NeuRewriter is similar when D = 2, with larger D, DeepRM starts to perform worse than heuristic-based algorithms, which is consistent with our hypothesis that it becomes challenging to design a schedule from scratch when the environment becomes more complex. On the other hand, NeuRewriter can capture the bottleneck of an existing schedule that limits its efficiency, then progressively refine it to obtain a better one. In particular, our results are even better than offline algorithms that assume knowledge of the entire job sequence, which further demonstrates the effectiveness of NeuRewriter. Meanwhile, we present the running times of OR-tools, DeepRM and NeuRewriter in Table 1b. We observe that both DeepRM and NeuRewriter are much more time-efficient than OR-tools; on the other hand, the running time of NeuRewriter is comparable to DeepRM, while achieving much better results. More discussion can be found in Appendix E.

Results on Robustness. As shown in Figure 4, NeuRewriter excels in almost all job distributions, except when the job lengths are uniform (short or long, Figure 4d), in which case existing methods/heuristics are sufficient. This shows that NeuRewriter can deal with complicated scenarios and is adaptive to different distributions.

Table 1: Average runtime (per instance) of different solvers (OR-tools [19] and the tactic Z3-ctx-solver-simplify of Z3 [15]) and RL-based approaches (NeuRewriter, DeepRM [33], Nazari et al. [35] and AM [29]) over the test sets of: (a) expression simplification; (b) job scheduling; (c) vehicle routing.
(a) Time (s): Z3-solver 1.375; NeuRewriter 0.159.
(b) Time (s): OR-tools 10.0; DeepRM 0.020; NeuRewriter 0.037.
(c) Time (s), VRP20 / VRP50 / VRP100: OR-tools 0.010 / 0.053 / 0.231; Nazari et al. 0.162 / 0.232 / 0.445; AM 0.036 / 0.168 / 0.720; NeuRewriter 0.133 / 0.211 / 0.398.

Results on Generalization. Furthermore, NeuRewriter can also generalize to distributions different from those used in training, without a substantial performance drop. This shows the power of local rewriting rules: using local context can yield more generalizable solutions.

6.3 Vehicle Routing Problem

Setup and Baselines. We follow the same training setup as [29, 35] by randomly generating vehicle routing problems with different numbers of customer nodes and vehicle capacities. We compare with two neural network approaches, i.e., AM [29] and Nazari et al. [35], both of which train a neural network policy using reinforcement learning to construct the route from scratch. We also compare with OR-tools and several classic heuristics studied in [35].

Results. We first demonstrate our main results in Figure 5a, where we include the variant of each baseline that performs best, and defer more results to Appendix F. Note that the initial routes generated for NeuRewriter are even worse than the classic heuristics; however, starting from such sub-optimal solutions, NeuRewriter is still able to iteratively improve them and outperforms all the baseline approaches on different problem distributions. In addition, for VRP20 problems, we can compute the exact optimal solutions, which give an average tour length of 6.10. We observe that the result of NeuRewriter (i.e., 6.16) is the closest to this lower bound, which also demonstrates that NeuRewriter is able to find solutions of better quality. We also compare the runtime of the most competitive approaches in Table 1c. Note that the OR-tools solver for vehicle routing problems is highly tuned and implemented in C++, while the RL-based approaches in comparison are implemented in Python. Meanwhile, following [35], to report the runtime of RL models, we decode a single instance at a time, so there is potential room for speed improvement by decoding multiple instances per batch. Nevertheless, we can still observe that NeuRewriter achieves a better balance between result quality and time efficiency, especially at larger problem scales.

Results on Generalization. Furthermore, in Figure 5b, we show that NeuRewriter can generalize to problem distributions different from the training ones. In particular, the learned policies still exceed the performance of the classic heuristics, and are sometimes comparable to or even better than OR-tools. More discussion can be found in Appendix F.

7 Conclusion

In this work, we propose to formulate optimization as a rewriting problem, and solve it by iteratively rewriting an existing solution towards the optimum. We utilize deep reinforcement learning to train our neural rewriter. In our evaluation, we demonstrate the effectiveness of our neural rewriter on multiple domains, where our model outperforms both heuristic-based algorithms and baseline deep neural networks that generate an entire solution directly. Meanwhile, we observe that since our approach is based on local rewriting, it can become time-consuming when large changes are needed in each iteration of rewriting. In extreme cases where each rewriting step needs to change the global structure, starting from scratch becomes preferable. We consider improving the efficiency of our rewriting approach and extending it to more complicated scenarios as future work.

References

[1] M. Affenzeller and R. Mayrhofer. Generic heuristics for combinatorial optimization problems. In Proc. of the 9th International Conference on Operational Research, pages 83-92, 2002.
[2] M. Allamanis, P. Chanthirasegaran, P. Kohli, and C. Sutton. Learning continuous semantic representations of symbolic expressions. In International Conference on Machine Learning, pages 80-88, 2017.
[3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50-58, 2010.
[4] L. Bachmair and H. Ganzinger. Rewrite-based equational theorem proving with selection and simplification. Journal of Logic and Computation, 4(3):217-247, 1994.
[5] R. Bar-Yehuda and S. Even. A linear-time approximation algorithm for the weighted vertex cover problem. Journal of Algorithms, 2(2):198-203, 1981.
[6] A. Bay and B. Sengupta. Approximating meta-heuristics with homotopic r
ecurrentneuralnetworks.arXivpreprintarXiv:1709.02194,2017.[7]I.Bello,H.Pham,Q.V.Le,M.Norouzi,andS.Bengio.Neuralcombinatorialoptimizationwithreinforcementlearning.arXivpreprintarXiv:1611.09940,2016.[8]J.B\u0142a\u02d9zewicz,W.Domschke,andE.Pesch.Thejobshopschedulingproblem:Conventionalandnewsolutiontechniques.Europeanjournalofoperationalresearch,93(1):1\u201333,1996.[9]S.J.Bradtke,B.E.Ydstie,andA.G.Barto.Adaptivelinearquadraticcontrolusingpolicyiteration.InProceedingsoftheAmericancontrolconference,volume3,pages3475\u20133475.Citeseer,1994.[10]R.R.Bunel,A.Desmaison,P.K.Mudigonda,P.Kohli,andP.Torr.Adaptiveneuralcompilation.InAdvancesinNeuralInformationProcessingSystems,pages1444\u20131452,2016.[11]C.-H.Cai,Y.Xu,D.Ke,andK.Su.Learningofhuman-likealgebraicreasoningusingdeepfeedforwardneuralnetworks.BiologicallyInspiredCognitiveArchitectures,25:43\u201350,2018.[12]T.Chen,L.Zheng,E.Yan,Z.Jiang,T.Moreau,L.Ceze,C.Guestrin,andA.Krishnamurthy.Learningtooptimizetensorprograms.NIPS,2018.[13]W.Chen,Y.Xu,andX.Wu.Deepreinforcementlearningformulti-resourcemulti-machinejobscheduling.arXivpreprintarXiv:1711.07440,2017.[14]T.A.CohnandM.Lapata.Sentencecompressionastreetransduction.JournalofArti\ufb01cialIntelligenceResearch,34:637\u2013674,2009.[15]L.DeMouraandN.Bj\u00f8rner.Z3:Anef\ufb01cientsmtsolver.InInternationalconferenceonToolsandAlgorithmsfortheConstructionandAnalysisofSystems,pages337\u2013340.Springer,2008.[16]M.Deudon,P.Cournut,A.Lacoste,Y.Adulyasak,andL.-M.Rousseau.Learningheuristicsforthetspbypolicygradient.InInternationalConferenceontheIntegrationofConstraintProgramming,Arti\ufb01cialIntelligence,andOperationsResearch,pages170\u2013181.Springer,2018.[17]R.Evans,D.Saxton,D.Amos,P.Kohli,andE.Grefenstette.Canneuralnetworksunderstandlogicalentailment?ICLR,2018.[18]D.FeblowitzandD.Kauchak.Sentencesimpli\ufb01cationastreetransduction.InProceedingsoftheSecondWorkshoponPredictingandImprovingTextReadabilityforTargetReaderPopulations,pages1\u201310,2013.[19]Google.Googleor-tools.http
s://developers.google.com/optimization/,2019.[20]R.Grandl,G.Ananthanarayanan,S.Kandula,S.Rao,andA.Akella.Multi-resourcepackingforclusterschedulers.ACMSIGCOMMComputerCommunicationReview,44(4):455\u2013466,2015.10\f[21]A.Graves,G.Wayne,andI.Danihelka.Neuralturingmachines.arXivpreprintarXiv:1410.5401,2014.[22]T.Haarnoja,H.Tang,P.Abbeel,andS.Levine.Reinforcementlearningwithdeepenergy-basedpolicies.InICML,pages1352\u20131361.JMLR.org,2017.[23]Halide.Halidesimpli\ufb01er.https://github.com/halide/Halide,2018.[24]J.Hsiang,H.Kirchner,P.Lescanne,andM.Rusinowitch.Thetermrewritingapproachtoautomatedtheoremproving.TheJournalofLogicProgramming,14(1-2):71\u201399,1992.[25]D.Huang,P.Dhariwal,D.Song,andI.Sutskever.Gamepad:Alearningenvironmentfortheoremproving.arXivpreprintarXiv:1806.00608,2018.[26]M.Jaderberg,W.M.Czarnecki,S.Osindero,O.Vinyals,A.Graves,D.Silver,andK.Kavukcuoglu.Decoupledneuralinterfacesusingsyntheticgradients.InProceedingsofthe34thInternationalConferenceonMachineLearning-Volume70,pages1627\u20131635.JMLR.org,2017.[27]R.M.Karp.Reducibilityamongcombinatorialproblems.InComplexityofcomputercomputa-tions,pages85\u2013103.Springer,1972.[28]E.Khalil,H.Dai,Y.Zhang,B.Dilkina,andL.Song.Learningcombinatorialoptimizationalgorithmsovergraphs.InAdvancesinNeuralInformationProcessingSystems,pages6348\u20136358,2017.[29]W.Kool,H.vanHoof,andM.Welling.Attention,learntosolveroutingproblems!InInternationalConferenceonLearningRepresentations,2019.[30]G.Lederman,M.N.Rabe,andS.A.Seshia.Learningheuristicsforautomatedreasoningthroughdeepreinforcementlearning.arXivpreprintarXiv:1807.08058,2018.[31]S.LevineandP.Abbeel.Learningneuralnetworkpolicieswithguidedpolicysearchunderunknowndynamics.InAdvancesinNeuralInformationProcessingSystems,pages1071\u20131079,2014.[32]S.LevineandV.Koltun.Guidedpolicysearch.InInternationalConferenceonMachineLearning,pages1\u20139,2013.[33]H.Mao,M.Alizadeh,I.Menache,andS.Kandula.Resourcemanagementwithdeepreinforce-mentlearning.InProceedingsofthe15thACMWorkshoponHotT
opicsinNetworks,pages50\u201356.ACM,2016.[34]D.Q.MAYNE.Differentialdynamicprogramming\u2013auni\ufb01edapproachtotheoptimizationofdynamicsystems.InControlandDynamicSystems,volume10,pages179\u2013254.Elsevier,1973.[35]M.Nazari,A.Oroojlooy,L.Snyder,andM.Takac.Reinforcementlearningforsolvingthevehicleroutingproblem.InAdvancesinNeuralInformationProcessingSystems,pages9861\u20139871,2018.[36]OpenAI.Openaidota2bot.https://openai.com/the-international/,2018.[37]G.H.PaetzoldandL.Specia.Textsimpli\ufb01cationastreetransduction.InProceedingsofthe9thBrazilianSymposiuminInformationandHumanLanguageTechnology,2013.[38]A.Paszke,S.Gross,S.Chintala,G.Chanan,E.Yang,Z.DeVito,Z.Lin,A.Desmaison,L.Antiga,andA.Lerer.Automaticdifferentiationinpytorch.InNIPS-W,2017.[39]J.Ragan-Kelley,C.Barnes,A.Adams,S.Paris,F.Durand,andS.Amarasinghe.Halide:alanguageandcompilerforoptimizingparallelism,locality,andrecomputationinimageprocessingpipelines.ACMSIGPLANNotices,48(6):519\u2013530,2013.[40]C.R.Reeves.Modernheuristictechniquesforcombinatorialproblems.Advancedtopicsincomputerscience,volume15.McGraw-Hill,1995.11\f[41]E.Schkufza,R.Sharma,andA.Aiken.Stochasticsuperoptimization.InACMSIGARCHComputerArchitectureNews,volume41,pages305\u2013316.ACM,2013.[42]Z.Scully,G.Blelloch,M.Harchol-Balter,andA.Scheller-Wolf.Optimallyschedulingjobswithmultipletasks.ACMSIGMETRICSPerformanceEvaluationReview,45(2):36\u201338,2017.[43]D.Silver,J.Schrittwieser,K.Simonyan,I.Antonoglou,A.Huang,A.Guez,T.Hubert,L.Baker,M.Lai,A.Bolton,etal.Masteringthegameofgowithouthumanknowledge.Nature,550(7676):354,2017.[44]N.SorenssonandN.Een.Minisatv1.13-asatsolverwithcon\ufb02ict-clauseminimization.SAT,2005(53):1\u20132,2005.[45]R.S.Sutton,A.G.Barto,etal.Reinforcementlearning:Anintroduction.1998.[46]K.S.Tai,R.Socher,andC.D.Manning.Improvedsemanticrepresentationsfromtree-structuredlongshort-termmemorynetworks.InProceedingsoftheAnnualMeetingoftheAssociationforComputationalLinguistics,2015.[47]Y.Tassa,T.Erez,andE.Todorov.Synthesisandstabilizatio
nofcomplexbehaviorsthroughonlinetrajectoryoptimization.InIntelligentRobotsandSystems(IROS),2012IEEE/RSJInternationalConferenceon,pages4906\u20134913.IEEE,2012.[48]D.Terekhov,D.G.Down,andJ.C.Beck.Queueing-theoreticapproachesfordynamicscheduling:asurvey.SurveysinOperationsResearchandManagementScience,19(2):105\u2013129,2014.[49]Y.TianandS.G.Narasimhan.Hierarchicaldata-drivendescentforef\ufb01cientoptimaldeformationestimation.InProceedingsoftheIEEEInternationalConferenceonComputerVision,pages2288\u20132295,2013.[50]O.Vinyals,M.Fortunato,andN.Jaitly.Pointernetworks.InAdvancesinNeuralInformationProcessingSystems,pages2692\u20132700,2015.[51]D.Vrabie,O.Pastravanu,M.Abu-Khalaf,andF.L.Lewis.Adaptiveoptimalcontrolforcontinuous-timelinearsystemsbasedonpolicyiteration.Automatica,45(2):477\u2013484,2009.[52]W.Zaremba,K.Kurach,andR.Fergus.Learningtodiscoveref\ufb01cientmathematicalidentities.InAdvancesinNeuralInformationProcessingSystems,pages1278\u20131286,2014.[53]X.Zhu,P.Sobhani,andH.Guo.Dag-structuredlongshort-termmemoryforsemanticcom-positionality.InProceedingsofthe2016ConferenceoftheNorthAmericanChapteroftheAssociationforComputationalLinguistics:HumanLanguageTechnologies,pages917\u2013926,2016.12\f", "award": [], "sourceid": 3385, "authors": [{"given_name": "Xinyun", "family_name": "Chen", "institution": "UC Berkeley"}, {"given_name": "Yuandong", "family_name": "Tian", "institution": "Facebook AI Research"}]}