{"title": "Online Sum-Product Computation Over Trees", "book": "Advances in Neural Information Processing Systems", "page_first": 2870, "page_last": 2878, "abstract": null, "full_text": "Online Sum-Product Computation over Trees\n\nMark Herbster, Stephen Pasteris\nDepartment of Computer Science, University College London, London WC1E 6BT, England, UK\n{m.herbster, s.pasteris}@cs.ucl.ac.uk\n\nFabio Vitale\nDepartment of Computer Science, University of Milan, 20135 Milan, Italy\nfabio.vitale@unimi.it\n\nAbstract\n\nWe consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-product-like algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting.\n\n1 Introduction\n\nThe use of graphical models [1, 2] is ubiquitous in machine learning. The application of the batch sum-product algorithm to tree-structured graphical models, including hidden Markov models, Kalman filtering and turbo decoding, is surveyed in [3]. Our aim is to adapt these techniques to an online setting.\n\nIn our online model we are given a tree and a fixed set of parameters. We then receive a potentially unbounded online sequence of “prediction requests” and “data updates.” A prediction request indicates a vertex for which we then return the posterior marginal at that vertex. Each data update associates a new “factor” to that vertex. Classical belief propagation requires time linear in the size of the tree for this task. Our algorithm requires time linear in the height of an optimal hierarchical cover of this tree. The height of the cover is in the worst case logarithmic in the size of the tree. Thus our per-trial prediction/update time is at least an exponential improvement over classical belief propagation.\n\nThe paper is structured as follows. In Section 2 we introduce our notation leading to our definition of an optimal hierarchical cover. In Section 3 we give our optimal hierarchical covering algorithm. In Section 4 we show how we may use this cover as a structure to cache computations in our sum-product-like algorithm. Finally, in Section 5 we give a regret bound and a sketch of an application of our techniques to an online multi-task allocation [4] problem.\n\nPrevious work. Pearl [5] introduced belief propagation for Bayes nets, which computes marginals in time linear in the size of the tree. In [6] an algorithm for the online setting was given for a Bayes net on a tree T which required O(log |V(T)|) time per marginalization step, where |V(T)| is the number of vertices in the tree. In this work we consider a Markov random field on a tree. We give an algorithm whose performance is bounded by O(χ*(T)). The term χ*(T) is the height of our optimal hierarchical cover, which is upper bounded by O(min(log |V(T)|, diameter(T))), but may in fact be exponentially smaller.\n\n2 Hierarchical cover of a tree\n\nIn this section we introduce our notion of a hierarchical cover of a tree and its dual, the decomposition tree.\n\nGraph-theoretical preliminaries. A graph G is a pair of sets (V, E) such that E is a set of unordered pairs of distinct elements from V. The elements of V are called vertices and those of E are called edges. In order to avoid ambiguities deriving from dealing with different graphs, in some cases we will highlight the membership to graph G by denoting these sets as V(G) and E(G) respectively. With slight abuse of notation, by writing v ∈ G, we mean v ∈ V(G). S is a subgraph of G (we write S ⊆ G) iff V(S) ⊆ V(G) and E(S) = {(i,j) : i,j ∈ V(S), (i,j) ∈ E(G)}. Given any subgraph S ⊆ G, we define its boundary (or inner border) ∂_G(S) and its neighbourhood (or outer border) N_G(S) as: ∂_G(S) := {i : i ∈ S, j ∉ S,
(i,j) ∈ E(G)}, and N_G(S) := {j : i ∈ S, j ∉ S, (i,j) ∈ E(G)}. With slight abuse of notation, N_G(v) := N_G({v}), and thus the degree of a vertex v is |N_G(v)|. Given any graph G, we define the set of its leaves as leaves(G) := {i ∈ G : |N_G(i)| = 1}, and its interior G• := {i ∈ G : |N_G(i)| ≠ 1}. A path P in a graph G is defined by a sequence of distinct vertices ⟨v_1, v_2, ..., v_m⟩ of G, such that for all i < m we have that (v_i, v_{i+1}) ∈ E(G). In this case we say that v_1 and v_m are connected by the subgraph P. A tree T is a graph in which for all v, w ∈ T there exists a unique path connecting v with w. In this paper we will only consider trees with a non-empty edge set and thus the vertex set will always have a cardinality of at least 2. The distance d_T(v,w) between v, w ∈ T is the path length |E(P)|. The pair (T, r) denotes a rooted tree T with root vertex r. Given a rooted tree (T, r) and any vertex i ∈ V(T), the (proper) descendants of i are all vertices that can be connected with r via paths P ⊆ T containing i (excluding i). Analogously, the (proper) ancestors of i are all vertices that lie on the path P ⊆ T connecting i with r (excluding i). We denote the set of all descendants (resp. all ancestors) of i by ⇓_T^r(i) (resp. ⇑_T^r(i)). We shall omit the root r when it is clear from the context. Vertex i is the parent (resp. child) of j, which is denoted by ↑_T^r(j) (resp. i ∈ ↓_T^r(j)), if (i,j) ∈ E(T) and i ∈ ⇑_T^r(j) (resp. i ∈ ⇓_T^r(j)). Given a tree T we use the notation S ⊆ T only if S is a tree and subgraph of T. The height of a rooted tree (T, r) is the maximum length of a path P ⊆ T connecting the root to any vertex: h_r(T) := max_{v∈T} d_T(v,r). The diameter Δ(T) of a tree T is defined as the length of the longest path between any two vertices in T.\n\n2.1 The hierarchical cover of a tree\n\nIn this section we describe a splitting process that recursively decomposes a given tree T. A (decomposition) tree (D, r) identifies this splitting process, generating a tree-structured collection 𝒮 of subtrees that hierarchically cover the given tree T.\n\nThis process recursively splits at each step a subtree of T (that we call a “component”) resulting from some previous splits. More precisely, a subtree S ⊆ T is split into two or more subcomponents and the decomposition of S depends only on the choice of a vertex v ∈ S•, which we call the splitting vertex, in the following way. The splitting vertex v ∈ S• of S induces the split set Ω(S,v) = {S_1, ..., S_{|N_S(v)|}}, which is the unique set of subtrees of S that overlap only at the vertex v and that represent a cover for S, i.e., it satisfies (i) ∪_{S′∈Ω(S,v)} S′ = S and (ii) {v} = S_i ∩ S_j for all 1 ≤ i < j ≤ |N_S(v)|. Thus the split may be visualized by considering the forest F resulting from removing a vertex from S, but afterwards each component S_1, ..., S_{|N_S(v)|} of F has the “removed vertex” v added back to it. A component having only two vertices is called atomic, since it cannot be split further. We indicate with S_v ⊆ T the component subtree whose splitting vertex is v, and we denote atomic components by S_(i,j), where E(S_(i,j)) = {(i,j)}. We finally denote by 𝒮 the set of all component subtrees obtained by this splitting process. Since the method is recursive, we can associate a rooted tree (D, r) with the decomposition of T into a hierarchical cover, whose internal vertices are the splitting vertices of the splitting process. Its leaves correspond to the single edges (of E(T)) of each atomic component, and a vertex “parent-child” relation c ∈ ↓_D^r(p) corresponds to the “splits-into” relation S_c ∈ Ω(S_p, p) (see Figure 1).\n\nWe will now formalize the splitting process by defining the hierarchical cover 𝒮 of a tree T, which is a key concept used by our algorithm.\n\nDefinition 1. A hierarchical cover 𝒮 of a tree T is a tree-structured collection of subtrees that hierarchically cover the tree T satisfying the following three properties:\n1. T ∈ 𝒮,\n2. for all S ∈ 𝒮 with S• ≠ ∅ there exists an x ∈ S• such that Ω(S,x) ⊂ 𝒮,\n3. for all S, R ∈ 𝒮 such that S ⊈ R and R ⊈ S, we have |V(R) ∩ V(S)| ≤ 1.\n\nThe above definition recursively generates a cover. The splitting process that generates a hierarchical cover 𝒮 of T is formalized as a rooted tree (D, r) in the following definition.\n\nDefinition 2. If 𝒮 is a hierarchical cover of T we define the associated decomposition tree (D, r) as a rooted tree, whose vertex set is V(D) := T• ∪ E(T) where D• = T• and leaves(D) = E(T), such that the following three properties hold:\n1. S_r = T,\n2. for all c, p ∈ D•, c ∈ ↓_D^r(p) iff S_c ∈ Ω(S_p, p),\n3. for all (c,p) ∈ E(T) (footnote 1), we have (c,p) ∈ ↓_D^r(p) iff S_(c,p) ∈ Ω(S_p, p).\n\nThe following lemma shows that with any given hierarchical cover 𝒮 it is possible to associate a unique decomposition tree (D, r).\n\nLemma 3. A hierarchical cover 𝒮 of T defines a unique decomposition tree (D, r) such that if S ∈ 𝒮 there exists a v ∈ V(D) such that S = S_v, and if v, w ∈ V(D) and v ≠ w, then S_v ≠ S_w.\n\nFor a given hierarchical cover 𝒮 we define in the following the height and the exposure: two properties which measure different senses of the “size” of a cover. The height of a hierarchical cover 𝒮 is the height of the associated decomposition tree D. Note that the height of a decomposition tree D may be exponentially smaller than the height of T, since, for example, it is not difficult to show that there exists a decomposition tree isomorphic to a binary tree when the input tree T is a path graph. If R ⊆ T and 𝒮_R is a hierarchical cover of R, we define the exposure of 𝒮_R (with respect to tree T) as max_{Q∈𝒮_R} |∂_T(Q)|. Thus the exposure is a measure relative to a “containing” tree (which can be the input tree T itself) and the height is independent of any containing tree. In Section 4 the covering subtrees correspond to cached “joint distributions,” which are defined on the boundary vertices of the subtrees, and require memory exponential in the boundary size. Thus we are interested in covers with small exposure. We now define a measure of the optimal height with respect to a given exposure value.\n\nDefinition 4. A hierarchical cover with exposure at most k is called a k-hierarchical cover. Given any subtree R ⊆ T, the k-decomposition potential χ_k(R) of R is the minimum height of all hierarchical covers 𝒮_R of R with exposure (with respect to T) not larger than k. The *-decomposition potential χ*(R) is the minimum height of all hierarchical covers of R. If |∂_T(R)| > k then χ_k(R) := ∞.\n\nLet’s consider some examples. Given a star graph, i.e., a graph with a single central vertex and any number of adjacent vertices, there is in fact only one possible hierarchical cover, obtained by splitting the central vertex, so that χ*(star) = 1
. For path graphs, χ*(path) = Θ(log |path|), as mentioned above. An interesting example is a star with path graphs rather than single edges. Specifically, a star-path may be formed by a set of |star-path| / log |star-path| path graphs P_1, P_2, ... each with log |star-path| edges. These path graphs are then joined at a central vertex. In this case we have χ*(star-path) = O(log log(|star-path|)); as each path has a hierarchical cover of height O(log log(|star-path|)), each of these path covers may then be joined to create a cover of the star-path. In Theorem 6 we will see the generic bound χ*(T) ≤ O(min(Δ(T), log |V(T)|)). The star-path thus illustrates that the bound may be exponentially loose. In Theorem 6 we will also see that χ_2(T) ≤ 2χ*(T). Thus we may restrict our algorithm to hierarchical covers with an exposure of 2 at very little cost in efficiency. Hence, we will now focus our attention on 2-hierarchical covers.\n\n2-Hierarchical covers. Given any element Q ≠ T in a 2-hierarchical cover of T, we have |∂_T(Q)| ∈ {1, 2}. Consider the case in which ∂_T(Q) = {v, w}, i.e. |∂_T(Q)| = 2. Then Q can be specified by the two vertices v, w and defined as follows: Q := [w, v] := argmax_{S⊆T} (|V(S)| : v, w ∈ leaves(S)), that is, the maximal subtree of T having v and w among its leaves. Consider now the case in which ∂_T(Q) = {w}, i.e. |∂_T(Q)| = 1. Q is now defined as the subtree of T containing vertex w together with all the descendants ⇓_T^w(z) where z ∈ N_T(w). Hence, a subtree such as Q can be uniquely determined by the neighbor z ∈ N_T(w) of w. In order to denote subtree Q in this case we use the following notation: Q := [w, ↦z]. Observe that one can also represent a “boundary one” subtree with the previous notation by writing Q := [w, ℓ], where ℓ is any chosen leaf of T belonging to ⇓_T^w(z) (footnote 2; see Figure 1).\n\nFootnote 1: Observe that (c,p) ∈ E(T) implies c, p ∈ V(T) and (c,p) ∈ leaves(D).\n\n(2,s)-Hierarchical covers. We now introduce the notion of (2,s)-hierarchical covers (which, for simplicity, we shall also call (2,s)-covers) with respect to a rooted tree (T, s). This notion explicitly depends on a given vertex s ∈ V(T), which, for the sake of simplicity, will be assumed to be a leaf of T. (2,s)-Hierarchical covers are guaranteed not to be much larger than a 2-hierarchical cover (see Theorem 6). They are also amenable to a bottom-up construction.\n\nDefinition 5. Given any subtree R ⊆ T, a 2-hierarchical cover 𝒮_R is a (2,s)-hierarchical cover of R if, for all S ∈ 𝒮_R ∖ {T}, there exist v, w ∈ S where v ∈ ⇓_T^s(w) such that (case 1: |∂_T(S)| = 1) S = [w, ↦v], or (case 2: |∂_T(S)| = 2) S = [w, v]. In the former case v ∈ ↓_T^s(w). We define χ_2^s(R) to be the minimal height of any possible (2,s)-hierarchical cover of R ⊆ T.\n\nThus every subtree of a (2,s)-hierarchical cover is necessarily “oriented” with respect to a root s.\n\n3 Computing an optimal hierarchical cover\n\nFrom a “big picture” perspective, a (2,s)-hierarchical cover G is recursively constructed in a bottom-up fashion: in the initialization phase G contains only the atomic components covering T, i.e. the ones formed only by a pair of adjacent vertices of V(T). We have then at this stage |G| = |E(T)|. Then G grows step by step through the addition of new covering subtrees of T. At each time step t, at least one subtree of T is added to G. All the subtrees added at each step t must strictly contain only subtrees added before step t.\n\nWe now introduce the formal description of our method for constructing a (2,s)-hierarchical cover G. As we said, the construction of G proceeds in incremental steps. At each step t the method operates on a tree T_t, whose vertices are part of V(T). The construction of T_t is accomplished starting from T_{t−1} (if t > 0) in such a way that V(T_t) ⊂ V(T_{t−1}), where T_0 is set to be the subtree of (T, s) containing the root and all the internal vertices. During each step t all the while-loop instructions of Figure 1 are executed: (1) some vertices (the black ones in Figure 1) are selected through a depth-first visit (during the backtracking steps) of T_t starting from s (footnote 3), (2) for each selected vertex v, subtree S_v is obtained from merging subtrees added to G in previous steps and overlapping at vertex v, (3) in order to create tree T_{t+1} from T_t the previously selected vertices of T_t are removed, (4) the edge set E(T_{t+1}) is created from E(T_t) in such a way as to preserve the structure of T_t, but all the edges incident to the vertices removed from V(T_t) (the black vertices in Figure 1) in the while-loop step 3 need to be deleted. The possible disconnection that would arise by the removal of these parts of T_t is avoided by completing the construction of E(T_{t+1}) through the addition of some new edges. These additional edges are not part of E(T) and link each vertex v with its grandparent in T_t if the parent of vertex v was deleted (see the dashed line edges in Figure 1) during the construction of T_{t+1} from T_t. In the final while-loop step the variable t gets incremented by 1.\n\nBasically, the key for obtaining optimality with this construction method can be explained with the following observation. At each time step t, when we add a covering subtree S_v for some vertex v ∈ V(T_t) selected by the algorithm (black vertices in Figure 1), the whole (2,s)-cover of S_v becomes completely contained in G and its height is t+1, which can be proven to be the minimum possible height of a (2,s)-cover of S_v. Hence, at each time step t we construct the (t+1)-th level (in the hierarchical nested sense) of G in such a way as to achieve local optimality for all elements contained in all levels smaller than or equal to t+1. As the next theorem states, the running time of the algorithm is linear in |V(T)|.\n\nFootnote 2: This representation is not necessarily unique, as if ℓ_1, ℓ_2 ∈ leaves(T) ∩ Q, we have [w, ℓ_1] = [w, ℓ_2] = [w, ↦z].\n\nFootnote 3: Observe that s is the unique vertex belonging to V(T_t) for all time steps t ≥ 0.\n\nTheorem 6. Given a rooted tree (T, s), the algorithm in Figure 1 outputs G, an optimal (2,s)-hierarchical cover, in time linear in |V(T)|. Its height χ_2^s(T) is bounded as χ*(T) ≤ χ_2(T) ≤ χ_2^s(T) ≤ 2χ*(T) ≤ O(min(log |V(T)|, Δ(T))).\n\nBefore we provide the detailed description of the algorithm for constructing an optimal (2,s)-hierarchical cover we need some ancillary definitions. We call a vertex v ∈ V(T_t) ∖ {s} mergeable (at time t) if and only if either (i) v ∈ leaves(T_t) or (ii) v has a single child in T_t and that child is not mergeable. If v ∈ V(T_t) ∖ {s} is mergeable we write v ∈ M_t. We also use the following shorthands to make our notation more intuitive: we set c_v^t := ↓_{T_t}^s(v) when |↓_{T_t}^s(v)| = 1, p_v^t := ↑_{T_t}^s(v) when v ≠ s and g_v^t := ↑_{T_t}^s(p_v^t) when v, p_v^t ≠ s. Finally, given u, u′ ∈ V(T) such that u′ ∈ ⇓_T^s(u), we indicate with ↓_T^s(u ↦ u′) the child of u which is an ancestor of u′ in T.\n\nInput: Rooted tree (T, s).\nInitialisation: T_0 ← T• ∪ {s}; t ← 0; G ← {[↑_T^s(v), v] : v ∈ V(T) ∖ {s}}.\nWhile (V(T_t) ≠ {s})\n  1. Construct M_t via depth-first search of T_t from s.\n  2. For all v ∈ M_t, merge as follows:\n     If v ∈ leaves(T_t) then z ← ↓_T^s(p_v^t ↦ v). G ← G ∪ {[p_v^t, ↦z]}.\n     Else G ← G ∪ {[p_v^t, c_v^t]}.\n  3. V(T_{t+1}) ← V(T_t) ∖ M_t.\n  4. E(T_{t+1}) ← {(v, p_v^t) : v, p_v^t ∈ V(T_{t+1})} ∪ {(v, g_v^t) : v, g_v^t ∈ V(T_{t+1}), p_v^t ∉ V(T_{t+1})}.\n  5. t ← t + 1.\nOutput: Optimal (2,s)-hierarchical cover G of T.\n\nFigure 1: Left: Pseudocode for the linear time construction algorithm for an optimal (2,s)-hierarchical cover. Right: Pictorial explanation of the pseudocode and the details of the hierarchical cover. In order to clarify the method, we describe some of the details of the cover and some merge operations that are performed in the diagram. Vertex 1 is the root vertex s. In each component, depicted as enclosed in a line, the black node is the splitting vertex, i.e., a mergeable vertex of the tree T_t. The boundary definition may be clarified by highlighting, for instance, that ∂_T(S_2) = {4} and ∂_T(S_10) = {8, 12}. Subtree S_2 contains vertices 1, 2, 3 and 4. Vertex 2 is the splitting vertex of S_2. Ω(S_2, 2) = {S_(1,2), S_(2,3), S_(2,4)}
, i.e., at time t = 0, S_2 is formed by merging the three atomic subtrees S_(1,2), S_(2,3) and S_(2,4), which were added in the initialization step. These three subtrees overlap at only vertex 2, which is depicted in black because it is mergeable in T_0. As for the decomposition tree (D, r), we have ↓_D^r(5) = {(4,5), 6}, which implies that S_5 is formed by the atomic component S_(4,5) and the non-atomic component S_6. At time t = 1, S_12 is obtained by merging S_10 together with S_13, which have both been created at time t = 0. Observe that in T_1 vertex 12 is a leaf and the z variable in the while-loop step 2 is assigned to vertex 10 (v and p_v^t are respectively vertex 12 and 8). Regarding the subtree representation with the square bracket notation we can write, for instance, S_2 = [1, 4] and S_12 = [8, ↦10] (≡ [8, 11] ≡ [8, 14]). Observe that, according to the definition of a (2,s)-hierarchical cover, we have 4 ∈ ⇓_T^1(1) and 10 ∈ ↓_T^1(8). Finally, notice that the height of the (2,s)-hierarchical cover of S_v is equal to t+1 iff v is depicted in black in T_t.\n\n4 Online marginalization\n\nIn this section we introduce our algorithm for efficiently computing marginals by summing over products of variables in a tree topology. Formally our model is specified by a triple (T, Θ, D) where T is a tree, Θ = (θ_{e,l,m} : e ∈ E(T), l ∈ ℕ_k, m ∈ ℕ_k) so that θ_e is a positive symmetric k×k matrix and D = (d_{v,c} : v ∈ V(T), c ∈ ℕ_k) is a |V(T)|×k matrix. In a probabilistic setting it is natural to view each normalized θ_e as a stochastic symmetric “transition” matrix and the “data” D as a right stochastic matrix corresponding to “beliefs” about k different labels at each vertex in T. In our online setting Θ is a fixed parameter and D is changing over time and thus an element in a sequence (D_1, ..., D_t, ...) where successive elements only differ in a single row. Thus at each point in time we receive information at a single vertex. In our intended application (see Section 5) of the model there is no necessary “randomness” in the generation of the data. However the language of probability provides a natural metaphor we use for our computed quantities. Thus a (k-ary) labeling of T is a vector μ ∈ L with L := ℕ_k^{V(T)} and its “probability” with respect to (Θ, D) is\n\n$$p(\\mu \\mid \\Theta, D) := \\frac{1}{Z} \\prod_{(i,j) \\in E(T)} \\theta_{(i,j),\\mu(i),\\mu(j)} \\prod_{v \\in V(T)} d_{v,\\mu(v)}, \\qquad (1)$$\n\nwith the normalising constant\n\n$$Z := \\sum_{\\mu \\in L} \\prod_{(i,j) \\in E(T)} \\theta_{(i,j),\\mu(i),\\mu(j)} \\prod_{v \\in V(T)} d_{v,\\mu(v)}.$$\n\nWe denote the marginal probability at a vertex v as\n\n$$p(v \\to a \\mid \\Theta, D) := \\sum_{\\mu \\in L : \\mu(v) = a} p(\\mu \\mid \\Theta, D). \\qquad (2)$$\n\nUsing the hierarchical cover for efficient online marginalization. In the previous section we discussed a method to compute a hierarchical cover of a tree T with optimal height χ_2^s(T) in time linear in the size of T. In this subsection we will show how these covering components form a covering set of cached “marginals”, so that we may either compute p(v → a | Θ, D) or update a single row of the data matrix D and recompute the changed cached marginals, all in time linear in χ_2^s(T).\n\nDefinition 7. Given a tree S ⊆ T, the potential function ψ_T^S : L(∂_T(S)) → ℝ with respect to (Θ, D) is defined by:\n\n$$\\psi^S_T(\\tilde{\\mu}) := \\sum_{\\mu \\in L(S) : \\mu(\\partial_T(S)) = \\tilde{\\mu}} \\Big( \\prod_{(v,w) \\in E(S)} \\theta_{(v,w),\\mu(v),\\mu(w)} \\Big) \\Big( \\prod_{v \\in S \\setminus \\partial_T(S)} d_{v,\\mu(v)} \\Big) \\qquad (3)$$\n\nwhere L(X) := ℕ_k^X with X ⊆ V(T) is thus the restriction of L to X and likewise if μ ∈ L then μ(X) ∈ L(X) is the restriction of μ to X.\n\nFor each tree in our hierarchical cover S ∈ 𝒮 we will have an associated potential function. Intuitively each of these potential functions summarizes the information in the interior of its tree by the marginal function defined on its boundary. Thus trees S ∈ 𝒮 with a boundary size of 1 require k values to be cached, the “α” weights; while boundary size 2 trees require k² values, the “β” weights. This clarifies our motivation to find a cover with both small height and exposure. We also cache γ weights that represent the product of α weights; these weights allow efficient computation on high degree vertices. The set of cached values necessary for fast online computation corresponds to these three types of weights, of which there is a linear quantity, and on any given update or marginalization step only O(χ_2^s(T)) of them are accessed.\n\nDefinitions of weights and potentials. Given a tree T, a hierarchical cover 𝒮 is isomorphic to a decomposition tree (D, r). The decomposition tree will serve a dual purpose. First, each vertex z ∈ D will serve as a “name” for a tree S_z ∈ 𝒮. Second, in the same way that the “message passing” in belief propagation follows the topology of the input tree, the structure of our computations follows the decomposition tree D. We now introduce our notations for computing and traversing the decomposition tree. As the cover has trees with one or two boundary vertices (excepting T which has none) we define the corresponding vertices of the decomposition tree, C_i := {z ∈ D : |∂_T(S_z)| = i} for i ∈ {1, 2}. In this section, since we are concerned with the traversal of (D, r), we abbreviate ↓_D, ↑_D as ↓, ↑ respectively as convenient. As ↓_D(v) is a set of children, we define the following functions to select specific children: ◁(v) := w if w ∈ ↓(v), ↑(v) ∈ ∂_T(S_w) for v ∈ D• ∩ (C_1 ∪ C_2), and ▷(v) := w if w ∈ ↓(v), w ≠ ◁(v) for w ∈ C_2 and v ∈ D• ∩ C_2. When clear from the context we will use ◁v for ◁(v) as well as ▷v for ▷(v). We also need notation for the potentially two boundary vertices of a tree S_v ∈ 𝒮 if v ∈ D ∖ {r}. Observe that for v ∈ C_1 ∪ C_2 one boundary vertex of S_v is necessarily v̇ := ↑v and if v ∈ C_2 there exists an ancestor v̈ of v in D so that {v̇, v̈} = ∂_T(S_v). We also extend the split notation to pick out the specific complementary subtrees of T resulting from a split, thus Ω(T,p,q) := Q ∈ Ω(T,p) if q ∈ Q, and define Ω̄(T,p,q) := ∪{R ∈ Ω(T,p) : q ∉ R}. Observe that T = Ω(T,p,q) ∪ Ω̄(T,p,q) and {p} = Ω(T,p,q) ∩ Ω̄(T,p,q). We shall use the notation (v_1 → a_1, v_2 → a_2, ..., v_m → a_m) to represent a labeling of {v_1, v_2, ..., v_m} that maps v_i to a_i.\n\nα_a(v) := ψ_T^{S_v}(v̇ → a), (v ∈ C_1)\nγ_a(v) := d_{va} ∏_{w ∈ ↓(v) ∩ C_1} α_a(w), (v ∈ V(T))\nβ_ab(v) := ψ_T^{S_v}(v̇ → a, v̈ → b), (v ∈ C_2)\nρ_a(v) := d_{va} ∏_{R ∈ Ω(T,v)} ψ_T^R(v → a), (v ∈ V(T))\nδ′_a(v) := d_{v̇a} ψ_T^{Ω̄(T,v̇,v)}(v̇ → a), (v ∈ V(T) ∖ {r})\nδ″_a(v) := d_{v̈a} ψ_T^{Ω̄(T,v̈,v)}(v̈ → a), (v ∈ C_2)\nε′_a(v) := ψ_T^{Ω(T,v,v̇)}(v → a), (v ∈ V(T) ∖ {r})\nε″_a(v) := ψ_T^{Ω(T,v,v̈)}(v → a), (v ∈ C_2)\nTable 1: Weight definitions\n\nIn Table 1 we now give the weights used in our online marginalization algorithm. The α_a, β_ab, γ_a weights are cached values maintained by the algorithm and the weights ρ_a, δ′_a, δ″_a, ε′_a, and ε″_a are temporary values (footnote 4) computed “on-the-fly.” The indices a, b ∈ ℕ_k and thus the memory requirements of our algorithm are linear in the cardinality of the tree and quadratic in the number of labels.\n\nIdentities for weights and potentials. For the following lemma we introduce the notion of the extension of a labelling. We extend by a vertex v ∈ V(T) and a label a ∈ ℕ_k the labelling μ ∈ L(X) to the labelling μ_v^a ∈ L(X ∪ {v}) which satisfies μ_v^a(v) = a and μ_v^a(X) = μ.\n\nLemma 8. Given a tree S ⊆ T and a vertex v ∈ S, then if v ∈ S ∖ ∂_T(S)\n\n$$\\psi^S_T(\\mu) = \\sum_{a \\in \\mathbb{N}_k} d_{va} \\prod_{R \\in \\Omega(S,v)} \\psi^R_T(\\mu^a_v(\\partial_T(R))),$$\n\nelse if v ∈ ∂_T(S) then\n\n$$\\psi^S_T(\\mu) = \\prod_{R \\in \\Omega(S,v)} \\psi^R_T(\\mu(\\partial_T(R))).$$\n\nThus a direct consequence of Lemma 8 is that we can compute the marginal probability at v as p(v → a | Θ, D) = ρ_a(v) / Σ_{b∈ℕ_k} ρ_b(v). The recursive application of such factorizations is the basis of our algorithm (these factorizations are summarized in Table 2 in the technical appendices).\n\nAlgorithm initialization and complexity. In Figure 2 we give our algorithm for computing the marginals at vertices with respect to (Θ, D). A number of our identities assume for a given vertex that it is in the interior of the tree and hence in the interior of the decomposition tree. Thus before we find the hierarchical cover of our input tree we extend the tree by adding a “dummy” edge from each leaf of the tree to a new dummy vertex. These dummy edges play no role except to simplify notation. The hierarchical cover is then found on this enlarged tree; the cover height may at most only increase by one. By setting the values in dummy
 edges and vertices in Θ and D to one, we ensure that all marginal computations are unchanged.\n\nThe running time of the algorithm is as follows. The computation of the hierarchical cover (footnote 5) is linear in |V(T)|, as is the initialization step. The update and marginalization are linear in the cover height χ*(T). The algorithm also scales quadratically in k on the marginalization step and cubically in k on update, as the merge of two C_2 trees requires the multiplication of two k×k matrices. Thus for example if the set of possible labels is linear in the size of the tree, classical belief propagation may be faster. Finally we observe that we may reduce the cubic dependence to a quadratic dependence on k via a cover with the height bounded by the diameter of T as opposed to χ*(T). This follows as the only cubic step is in the update of a non-atomic (non-edge) β-potential. Thus if we can build a cover with only atomic β-potentials the running time will scale with k quadratically. We accomplish this by modifying the cover algorithm (Figure 1) to only merge leaf vertices. Observe that the height of this cover is now O(diameter(T)); and we have a hierarchical factorization into α-potentials and only atomic β-potentials.\n\nFootnote 4: Note: for γ_a(v), if the product is empty then the product evaluates to 1; and if v ∈ C_1 then ε″_a(v) := 1.\n\nFootnote 5: The construction of the decomposition tree may be simultaneously accomplished with the same complexity.\n\nMarginalization (vertex v ∈ D•):\nw ← r\nρ_a(w) ← γ_a(r)\nwhile (w ≠ v)\n  w ← ↑^v(w)\n  if (w ∈ C_1)\n    δ′_a(w) ← ρ_a(↑(w)) / α_a(w)\n    ε′_a(w) ← Σ_b β_ab(◁(w)) δ′_b(w)\n    ρ_a(w) ← γ_a(w) ε′_a(w)\n  else\n    if (w = ◁(↑(w)))\n      δ′_a(w) ← ε″_a(↑(w)) γ_a(↑(w))\n      δ″_a(w) ← δ′_a(↑(w))\n    else\n      δ′_a(w) ← ε′_a(↑(w)) γ_a(↑(w))\n      δ″_a(w) ← δ″_a(↑(w))\n    ε′_a(w) ← Σ_b δ′_b(w) β_ab(◁(w))\n    ε″_a(w) ← Σ_b δ″_b(w) β_ab(▷(w))\n    ρ_a(w) ← ε′_a(w) ε″_a(w) γ_a(w)\nOutput: ρ_a(v) / (Σ_b ρ_b(v))\n\nInitialization: The α, β and γ weights are initialised in a bottom-up fashion on the decomposition tree: we initialise the weights of a vertex after we have initialised the weights of all its children. Specifically, we first do a depth-first search of D starting from r. When we reach an edge (v,w) ∈ E(T), if neither v nor w is a leaf then we set β_ab((v,w)) ← θ_{(v,w),a,b}; otherwise, assuming w is a leaf, we set α_a(v) ← 1 (dummy edge). When we reach a vertex v ∈ V(T) for the last time (i.e. just before we backtrack from v) then set: γ_a(v) ← d_{va} ∏_{w ∈ ↓(v) ∩ C_1} α_a(w), and if v ∈ C_2 then β_ab(v) ← Σ_c β_ca(◁(v)) β_cb(▷(v)) γ_c(v), or if v ∈ C_1 then α_a(v) ← Σ_c β_ca(◁(v)) γ_c(v).\n\nUpdate (vertex v ∈ D•; data d ∈ [0, ∞)^k):\nγ_a(v) ← γ_a(v) d_a / d_{va}; d_v ← d; w ← v\nwhile (w ≠ r)\n  if (w ∈ C_1)\n    α_a^old ← α_a(w)\n    α_a(w) ← Σ_c β_ca(◁(w)) γ_c(w)\n    γ_a(↑(w)) ← γ_a(↑(w)) α_a(w) / α_a^old\n  else\n    β_ab(w) ← Σ_c β_ca(◁(w)) β_cb(▷(w)) γ_c(w)\n  w ← ↑(w)\n\nFigure 2: Algorithm: Initialization, Marginalization and Update\n\n5 Multi-task learning in the allocation model with TREE-HEDGE\n\nWe conclude by sketching a simple online learning application to multi-task learning that is amenable to our methods. The inspiration is that we have multiple tasks and a given tree structure that describes our prior expectation of “relatedness” between tasks (see e.g., [7, Sec. 3.1.3]). Thus each vertex represents a task and if we have an edge between vertices then a priori we expect those tasks to be related. Thus the hope is that information received for one task (vertex) will allow us to improve our predictions on another task. For us each of these tasks is an allocation task, as often addressed with the HEDGE algorithm [4]. A similar application of the HEDGE algorithm in multi-task learning was given in [8]. There the authors considered a more challenging set-up where the task structure is unknown and the hope is to do well if there is a posteriori a small clique of closely related tasks. Our strong assumption of prior “tree-structured” knowledge allows us to obtain a very efficient algorithm and sharp bounds which are not directly comparable to their results. Finally, this set-up is also closely related to the online graph labeling problem as in e.g., [9, 10, 11].\n\nThus the set-up is as follows. We incorporate our prior knowledge of task-relatedness with the triple (T, Θ, D_1). Then on a trial t, the algorithm is given a v_t ∈ V(T), representing the task. The algorithm then gives a non-negative prediction vector p̂_t ∈ {p : Σ_{a=1}^k p(a) = 1} for task v_t and receives an outcome y_t ∈ [0,1]^k. It then suffers a mixture loss L_mix(y_t, p̂_t) := y_t · p̂_t. The aim is to predict to minimize this loss. We give the algorithm in Figure 3. The notation follows Section 4 and the method therein implies that on each trial we can predict and update in O(χ*(T)) time. We obtain the following theorem (a proof sketch is contained in appendix C of the long version).\n\n1. Parameters: A triple (T, Θ, D_1) and η ∈ (0, ∞).\n2. For t = 1 to ℓ do\n3.   Receive: v_t ∈ V(T)\n4.   Predict: p̂_t = (p(v_t → a | Θ, D_t))_{a∈ℕ_k}\n5.   Receive: y_t ∈ [0,1]^k\n6.   Incur loss: L_mix(y_t, p̂_t)\n7.   Update: D_{t+1} = D_t; D_{t+1}(v_t) = (p̂_t(a) e^{−η y_t(a)})_{a∈ℕ_k}\n\nFigure 3: TREE-HEDGE\n\nTheorem 9. Given a tree T, a vertex sequence ⟨v_1, ..., v_ℓ⟩ and an outcome sequence ⟨y_1, ..., y_ℓ⟩, the loss of the TREE-HEDGE algorithm with the parameters (Θ, D_1) and η > 0 is, for all labelings μ ∈ ℕ_k^{V(T)}, bounded by\n\n$$\\sum_{t=1}^{\\ell} L_{\\mathrm{mix}}(y_t, \\hat{p}_t) \\le c_\\eta \\Big( \\sum_{t=1}^{\\ell} y_t(\\mu(v_t)) + \\frac{\\ln 2}{\\eta} \\log_2 \\frac{1}{p(\\mu \\mid \\Theta, D_1)} \\Big) \\quad \\text{with} \\quad c_\\eta := \\frac{\\eta}{1 - e^{-\\eta}}. \\qquad (4)$$\n\nAcknowledgements. We would like to thank David Barber, Guy Lever and Massimiliano Pontil for valuable discussions. We also acknowledge the financial support of the PASCAL 2 European Network of Excellence.\n\nReferences\n\n[1] David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.\n[2] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.\n[3] Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, 2001.\n[4] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.\n[5] Judea Pearl. Reverend Bayes on inference engines: A distributed hierarchical approach. In Proc. Natl. Conf. on AI, pages 133–136, 1982.\n[6] Arthur L. Delcher, Adam J. Grove, Simon Kasif, and Judea Pearl. Logarithmic-time updates and queries in probabilistic networks. J. Artif. Int. Res., 4:37–59, February 1996.\n[7] Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.\n[8] Jacob Abernethy, Peter L. Bartlett, and Alexander Rakhlin. Multitask learning with expert advice. In COLT, pages 484–498, 2007.\n[9] Mark Herbster, Massimiliano Pontil, and Lisa Wainer. Online learning over graphs. In ICML, pages 305–312. ACM, 2005.\n[10] Mark Herbster, Guy Lever, and Massimiliano Pontil. Online prediction on large diameter graphs. In NIPS, pages 649–656. MIT Press, 2008.\n[11] Nicolò Cesa-Bianchi, Claudio Gentile, and Fabio Vitale. Fast and optimal prediction on a labeled tree. In COLT, 2009.", "award": [], "sourceid": 4586, "authors": [{"given_name": "Mark", "family_name": "Herbster", "institution": null}, {"given_name": "Stephen", "family_name": "Pasteris", "institution": null}, {"given_name": "Fabio", "family_name": "Vitale", "institution": null}]}