{"title": "Fast and Guaranteed Tensor Decomposition via Sketching", "book": "Advances in Neural Information Processing Systems", "page_first": 991, "page_last": 999, "abstract": "Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as tensor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iterative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic modeling and obtain competitive results.", "full_text": "Fast and Guaranteed Tensor Decomposition via Sketching\n\nYining Wang, Hsiao-Yu Tung, Alex Smola\nMachine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213\n{yiningwa, htung}@cs.cmu.edu, alex@smola.org\n\nAnima Anandkumar\nDepartment of EECS, University of California Irvine, Irvine, CA 92697\na.anandkumar@uci.edu\n\nAbstract\n\nTensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as tensor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iterative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic modeling and obtain competitive results.\n\nKeywords: Tensor CP decomposition, count sketch, randomized methods, spectral methods, topic modeling\n\n1 Introduction\n\nIn many data-rich domains such as computer vision, neuroscience and social networks consisting of multi-modal and multi-relational data, tensors have emerged as a powerful paradigm for handling the data deluge. An important operation with tensor data is its decomposition, where the input tensor is decomposed into a succinct form. One of the popular decomposition methods is the CANDECOMP/PARAFAC (CP) decomposition, also known as canonical polyadic decomposition [12, 5], where the input tensor is decomposed into a succinct sum of rank-1 components. The CP decomposition has found numerous applications in data mining [4, 18, 20], computational neuroscience [10, 21], and recently, in statistical learning for latent variable models [1, 30, 28, 6]. For latent variable modeling, these methods yield consistent estimates under mild conditions such as non-degeneracy and require only polynomial sample and computational complexity [1, 30, 28, 6].\n\nGiven the importance of tensor methods for large-scale machine learning, there has been an increasing interest in scaling up tensor decomposition algorithms to handle gigantic real-world data tensors [27, 24, 8, 16, 14, 2, 29]. However, the previous works fall short in many ways, as described subsequently. In this paper, we design and analyze efficient randomized tensor methods using ideas from sketching [23]. The idea is to maintain a low-dimensional sketch of an input tensor and then perform implicit tensor decomposition using existing methods such as tensor power updates, alternating least squares or online tensor updates. We obtain the fastest decomposition methods for both sparse and dense tensors. Our framework can easily handle modern machine learning applications with billions of training instances, and at the same time, comes with attractive theoretical guarantees.\n\nOur main contributions are as follows:\n\nEfficient tensor sketch construction: We propose efficient construction of tensor sketches when the input tensor is available in factored forms, such as in the case of empirical moment tensors, where the factor components correspond to rank-1 tensors over individual data samples. We construct the tensor sketch via efficient FFT operations on the component vectors. Sketching each rank-1 component takes $O(n + b\\log b)$ operations, where $n$ is the tensor dimension and $b$ is the sketch length. This is much faster than the $O(n^p)$ complexity for brute-force computations of a $p$th-order tensor. Since empirical moment tensors are available in the factored form with $N$ components, where $N$ is the number of samples, it takes $O((n + b\\log b)N)$ operations to compute the sketch.\n\nImplicit tensor contraction computations: Almost all tensor manipulations can be expressed in terms of tensor contractions, which involve multilinear combinations of different tensor fibres [19]. For example, tensor decomposition methods such as tensor power iterations, alternating least squares (ALS), whitening and online tensor methods all involve tensor contractions. We propose a highly
efficient method to directly compute the tensor contractions without forming the input tensor explicitly. In particular, given the sketch of a tensor, each tensor contraction can be computed in $O(n + b\\log b)$ operations, regardless of the order of the source and destination tensors. This significantly accelerates the brute-force implementation that requires $O(n^p)$ complexity for $p$th-order tensor contraction. In addition, in many applications, the input tensor is not directly available and needs to be computed from samples, as in the case of empirical moment tensors for spectral learning of latent variable models. In such cases, our method results in huge savings by combining implicit tensor contraction computation with efficient tensor sketch construction.\n\nNovel colliding hashes for symmetric tensors: When the input tensor is symmetric, which is the case for empirical moment tensors that arise in spectral learning applications, we propose a novel colliding hash design by replacing the Boolean ring with the complex ring $\\mathbb{C}$ to handle multiplicities. As a result, it makes the sketch building process much faster and avoids repetitive FFT operations. Though the computational complexity remains the same, the proposed colliding hash design results in significant speed-up in practice by reducing the actual number of computations.\n\nTheoretical and empirical guarantees: We show that the quality of the tensor sketch does not depend on sparseness, uniform entry distribution, or any other properties of the input tensor. On the other hand, previous works assume specific settings such as sparse tensors [24, 8, 16], or tensors having entries with similar magnitude [27]. Such assumptions are unrealistic, and in practice, we may have both dense and spiky tensors, for example, unordered word trigrams in natural language processing. We prove that our proposed randomized method for tensor decomposition does not lead to any significant degradation of accuracy. Experiments on synthetic and real-world datasets show highly competitive results. We demonstrate a 10x to 100x speed-up over exact methods for decomposing dense, high-dimensional tensors. For topic modeling, we show a significant reduction in computational time over existing spectral LDA implementations with small performance loss. In addition, our proposed algorithm outperforms collapsed Gibbs sampling when running time is constrained. We also show that if a Gibbs sampler is initialized with our output topics, it converges within several iterations and outperforms a randomly initialized Gibbs sampler run for many more iterations. Since our proposed method is efficient and avoids local optima, it can be used to accelerate the slow burn-in phase in Gibbs sampling.\n\nRelated Works: There have been many works on deploying efficient tensor decomposition methods [27, 24, 8, 16, 14, 2, 29]. Most of these works, except [27, 2], implement the alternating least squares (ALS) algorithm [12, 5]. However, this is extremely expensive since the ALS method is run in the input space, which requires $O(n^3)$ operations to execute one least squares step on an $n$-dimensional (dense) tensor. Thus, they are only suited for extremely sparse tensors.\n\nAn alternative method is to first reduce the dimension of the input tensor through procedures such as whitening to $O(k)$ dimension, where $k$ is the tensor rank, and then carry out ALS in the dimension-reduced space on a $k\\times k\\times k$ tensor [13]. This results in a significant reduction of computational complexity when the rank is small ($k \\ll n$). Nonetheless, in practice, such complexity is still prohibitively high as $k$ could be several thousands in many settings. To make matters even worse, when the tensor corresponds to empirical moments computed from samples, such as in spectral learning of latent variable models, it is actually much slower to construct the reduced dimension $k\\times k\\times k$ tensor from training data than to decompose it, since the number of training samples is typically very large.\n\nTable 1: Summary of notations. See also Appendix F.\nVariables | Operator | Meaning\n$a, b \\in \\mathbb{C}^n$ | $a \\circ b \\in \\mathbb{C}^n$ | Element-wise product\n$a \\in \\mathbb{C}^n$ | $a^{\\otimes 3} \\in \\mathbb{C}^{n\\times n\\times n}$ | $a \\otimes a \\otimes a$\n$a, b \\in \\mathbb{C}^n$ | $a * b \\in \\mathbb{C}^n$ | Convolution\n$A, B \\in \\mathbb{C}^{n\\times m}$ | $A \\odot B \\in \\mathbb{C}^{n^2\\times m}$ | Khatri-Rao product\n$a, b \\in \\mathbb{C}^n$ | $a \\otimes b \\in \\mathbb{C}^{n\\times n}$ | Tensor product\n$T \\in \\mathbb{C}^{n\\times n\\times n}$ | $T_{(1)} \\in \\mathbb{C}^{n\\times n^2}$ | Mode expansion\n\nAnother alternative is to carry out online tensor decomposition, as opposed to batch operations in the above works. Such methods are extremely fast [14], but can suffer from high variance. The sketching ideas developed in this paper will improve our ability to handle larger sizes of mini-batches and therefore result in reduced variance in online tensor methods. Another alternative method is to consider a randomized sampling of the input tensor in each iteration of tensor decomposition [27, 2]. However, such methods can be expensive due to I/O calls and are sensitive to the sampling distribution. In particular, [27] employs uniform sampling, which is incapable of handling tensors with spiky elements. Though non-uniform sampling is adopted in [2], it requires an additional pass over the training data to compute the sampling distribution. In contrast, our sketch-based method takes only one pass over the data.\n\n2 Preliminaries\n\nTensor, tensor product and tensor decomposition. A 3rd-order tensor\u00b9 $T$ of dimension $n$ has $n^3$ entries. Each entry can be represented as $T_{ijk}$ for $i, j, k \\in \\{1, \\cdots, n\\}$. For an $n\\times n\\times n$ tensor $T$ and a vector $u \\in \\mathbb{R}^n$, we define two forms of tensor products (contractions) as follows:\n$$T(u,u,u) = \\sum_{i,j,k=1}^{n} T_{i,j,k}\\, u_i u_j u_k; \\qquad T(I,u,u) = \\left[\\sum_{j,k=1}^{n} T_{1,j,k}\\, u_j u_k, \\cdots, \\sum_{j,k=1}^{n} T_{n,j,k}\\, u_j u_k\\right].$$\nNote that $T(u,u,u) \\in \\mathbb{R}$ and $T(I,u,u) \\in \\mathbb{R}^n$. For two complex tensors $A, B$ of the same order and dimension, their inner product is defined as $\\langle A, B\\rangle := \\sum_l A_l \\overline{B_l}$, where $l$ ranges over all tuples that index the tensors and the overline denotes complex conjugation. The Frobenius norm of a tensor is simply $\\|A\\|_F = \\sqrt{\\langle A, A\\rangle}$.\n\nThe rank-$k$ CP decomposition of a 3rd-order $n$-dimensional tensor $T \\in \\mathbb{R}^{n\\times n\\times n}$ involves scalars $\\{\\lambda_i\\}_{i=1}^k$ and $n$-dimensional vectors $\\{a_i, b_i, c_i\\}_{i=1}^k$ such that the residual $\\|T - \\sum_{i=1}^k \\lambda_i\\, a_i \\otimes b_i \\otimes c_i\\|_F^2$ is minimized. Here $R = a \\otimes b \\otimes c$ is a 3rd-order tensor defined as $R_{ijk} = a_i b_j c_k$. Additional notations are defined in Table 1 and Appendix F.\n\nRobust tensor power method. The method was proposed in [1] and was shown to provably succeed if the input tensor is a noisy perturbation of the sum of $k$ rank-1 tensors whose base vectors are orthogonal. Fix an input tensor $T \\in \\mathbb{R}^{n\\times n\\times n}$. The basic idea is to randomly generate $L$ initial vectors and perform $T$ power update steps: $\\hat{u} = T(I,u,u)/\\|T(I,u,u)\\|_2$. The vector that results in the largest eigenvalue $T(u,u,u)$ is then kept, and subsequent eigenvectors can be obtained via deflation. If implemented naively, the algorithm takes $O(kn^3LT)$ time to run\u00b2, requiring $O(n^3)$ storage. In addition, in certain cases when a second-order moment matrix is available, the tensor power method can be carried out on a $k\\times k\\times k$ whitened tensor [1], thus improving the time complexity by avoiding dependence on the ambient dimension $n$. Apart from the tensor power method, other algorithms such as Alternating Least Squares (ALS, [12, 5]) and Stochastic Gradient Descent (SGD, [14]) have also been applied to tensor CP decomposition.\n\nTensor sketch. Tensor sketch was proposed in [23] as a generalization of count sketch [7]. For a tensor $T$ of dimension $n_1\\times\\cdots\\times n_p$, random hash functions $h_1, \\cdots, h_p: [n] \\to [b]$ with $\\Pr_{h_j}[h_j(i) = t] = 1/b$ for every $i \\in [n], j \\in [p], t \\in [b]$, and binary Rademacher variables $\\xi_1, \\cdots, \\xi_p: [n] \\to \\{\\pm 1\\}$, the sketch $s_T: [b] \\to \\mathbb{R}$ of tensor $T$ is defined as\n$$s_T(t) = \\sum_{H(i_1,\\cdots,i_p) = t} \\xi_1(i_1)\\cdots\\xi_p(i_p)\\, T_{i_1,\\cdots,i_p}, \\qquad (1)$$\nwhere $H(i_1, \\cdots, i_p) = (h_1(i_1) + \\cdots + h_p(i_p)) \\bmod b$. The corresponding recovery rule is $\\widehat{T}_{i_1,\\cdots,i_p} = \\xi_1(i_1)\\cdots\\xi_p(i_p)\\, s_T(H(i_1,\\cdots,i_p))$. For accurate recovery, $H$ needs to be 2-wise independent, which is achieved by independently selecting $h_1, \\cdots, h_p$ from a 2-wise independent hash family [26]. Finally, the estimation can be made more robust by the standard approach of taking $B$ independent sketches of the same tensor and then reporting the median of the $B$ estimates [7].\n\n\u00b9 Though we mainly focus on 3rd-order tensors in this work, extension to higher-order tensors is easy.\n\u00b2 $L$ is usually set to be a linear function of $k$ and $T$ is logarithmic in $n$; see Theorem 5.1 in [1].\n\n3 Fast tensor decomposition via sketching\n\nIn this section we first introduce an efficient procedure for computing sketches of factored or empirical moment tensors, which appear in a wide variety of applications such as parameter estimation of latent variable models. We then show how to run the tensor power method directly on the sketch with reduced computational complexity. In addition, when an input tensor is symmetric (i.e., $T_{ijk}$ is the same for all permutations of $i, j, k$), we propose a novel \u201ccolliding hash\u201d design, which speeds up the sketch building process. Due to space limits we only consider the robust tensor power method in the main text. Methods and experiments for sketching-based ALS are presented in Appendix C.\n\nTo avoid confusion, we emphasize that $n$ denotes the dimension of the tensor to be decomposed, which is not necessarily the same as the dimension of the original data tensor. Indeed, once whitening is applied, $n$ could be as small as the intrinsic dimension $k$ of the original data tensor.\n\n3.1 Efficient sketching of empirical moment tensors\n\nSketching a 3rd-order dense $n$-dimensional tensor via Eq. (1) takes $O(n^3)$ operations, which in general cannot be improved because the input size is $\\Omega(n^3)$. However, in practice data tensors are usually structured. One notable example is empirical moment tensors, which arise naturally in parameter estimation problems of latent variable models. More specifically, an empirical moment tensor can be expressed as $T = \\hat{\\mathbb{E}}[x^{\\otimes 3}] = \\frac{1}{N}\\sum_{i=1}^N x_i^{\\otimes 3}$, where $N$ is the total number of training data points and $x_i$ is the $i$th data point. In this section we show that computing sketches of such tensors can be made significantly more efficient than the brute-force implementation via Eq. (1). The main idea is to sketch low-rank components of $T$ efficiently via FFT, a trick inspired by previous efforts on sketching-based matrix multiplication and kernel learning [22, 23].\n\nWe consider the more general case in which an input tensor $T$ can be written as a weighted sum of known rank-1 components: $T = \\sum_{i=1}^N a_i\\, u_i \\otimes v_i \\otimes w_i$, where $a_i$ are scalars and $u_i, v_i, w_i$ are known $n$-dimensional vectors. The key observation is that the sketch of each rank-1 component $T_i = u_i \\otimes v_i \\otimes w_i$ can be efficiently computed by FFT. In particular, $s_{T_i}$ can be computed as\n$$s_{T_i} = s_{1,u_i} * s_{2,v_i} * s_{3,w_i} = \\mathcal{F}^{-1}\\left(\\mathcal{F}(s_{1,u_i}) \\circ \\mathcal{F}(s_{2,v_i}) \\circ \\mathcal{F}(s_{3,w_i})\\right), \\qquad (2)$$\nwhere $*$ denotes (circular) convolution, with indices taken modulo $b$, and $\\circ$ stands for the element-wise vector product. Here $s_{1,u}(t) = \\sum_{h_1(i) = t} \\xi_1(i)\\, u_i$ is the count sketch of $u$, and $s_{2,v}, s_{3,w}$ are defined similarly. $\\mathcal{F}$ and $\\mathcal{F}^{-1}$ denote the Fast Fourier Transform (FFT) and its inverse. By applying FFT, we reduce the convolution computation to element-wise product evaluation in the
Fourier space. Therefore, $s_{T_i}$ can be computed using $O(n + b\\log b)$ operations, where the $O(b\\log b)$ term arises from FFT evaluations. Finally, because the sketching operator is linear (i.e., $s(\\sum_i a_i T_i) = \\sum_i a_i s(T_i)$), $s_T$ can be computed in $O(N(n + b\\log b))$ time, which is much cheaper than the brute-force approach that takes $O(Nn^3)$ time.\n\n3.2 Fast robust tensor power method\n\nWe are now ready to present the fast robust tensor power method, the main algorithm of this paper. The computational bottleneck of the original robust tensor power method is the computation of two tensor products: $T(I,u,u)$ and $T(u,u,u)$. A naive implementation requires $O(n^3)$ operations. In this section, we show how to speed up the computation of these products: given the sketch of an input tensor $T$, one can approximately compute both $T(I,u,u)$ and $T(u,u,u)$ in $O(b\\log b + n)$ steps, where $b$ is the hash length.\n\nBefore going into details, we explain the key idea behind our fast tensor product computation. For any two tensors $A, B$, their inner product $\\langle A, B\\rangle$ can be approximated by\u2074\n$$\\langle A, B\\rangle \\approx \\langle s_A, s_B\\rangle. \\qquad (3)$$\n\nAlgorithm 1: Fast robust tensor power method\n1: Input: noisy symmetric tensor $\\bar{T} = T + E \\in \\mathbb{R}^{n\\times n\\times n}$; target rank $k$; number of initializations $L$; number of iterations $T$; hash length $b$; number of independent sketches $B$.\n2: Initialization: $h_j^{(m)}, \\xi_j^{(m)}$ for $j \\in \\{1,2,3\\}$ and $m \\in [B]$; compute sketches $s_{\\bar{T}}^{(m)} \\in \\mathbb{C}^b$.\n3: for $\\tau = 1$ to $L$ do\n4: Draw $u_0^{(\\tau)}$ uniformly at random from the unit sphere.\n5: for $t = 1$ to $T$ do\n6: For each $m \\in [B], j \\in \\{2,3\\}$ compute the sketch of $u_{t-1}^{(\\tau)}$ using $h_j^{(m)}, \\xi_j^{(m)}$ via Eq. (1).\n7: Compute $v^{(m)} \\approx \\bar{T}(I, u_{t-1}^{(\\tau)}, u_{t-1}^{(\\tau)})$ as follows: first evaluate $\\bar{s}^{(m)} = \\mathcal{F}^{-1}(\\mathcal{F}(s_{\\bar{T}}^{(m)}) \\circ \\overline{\\mathcal{F}(s_{2,u}^{(m)})} \\circ \\overline{\\mathcal{F}(s_{3,u}^{(m)})})$. Set $[v^{(m)}]_i \\leftarrow \\xi_1(i)\\,[\\bar{s}^{(m)}]_{h_1(i)}$ for every $i \\in [n]$.\n8: Set $\\bar{v}_i \\leftarrow \\mathrm{med}(\\Re(v_i^{(1)}), \\cdots, \\Re(v_i^{(B)}))$\u00b3. Update: $u_t^{(\\tau)} = \\bar{v}/\\|\\bar{v}\\|$.\n9: Selection: Compute $\\lambda_\\tau^{(m)} \\approx \\bar{T}(u_T^{(\\tau)}, u_T^{(\\tau)}, u_T^{(\\tau)})$ using $s_{\\bar{T}}^{(m)}$ for $\\tau \\in [L]$ and $m \\in [B]$. Evaluate $\\lambda_\\tau = \\mathrm{med}(\\lambda_\\tau^{(1)}, \\cdots, \\lambda_\\tau^{(B)})$ and $\\tau^* = \\arg\\max_\\tau \\lambda_\\tau$. Set $\\hat{\\lambda} = \\lambda_{\\tau^*}$ and $\\hat{u} = u_T^{(\\tau^*)}$.\n10: Deflation: For each $m \\in [B]$ compute the sketch $\\tilde{s}_{\\Delta T}^{(m)}$ for the rank-1 tensor $\\Delta T = \\hat{\\lambda}\\hat{u}^{\\otimes 3}$.\n11: Output: the eigenvalue/eigenvector pair $(\\hat{\\lambda}, \\hat{u})$ and sketches of the deflated tensor $\\bar{T} - \\Delta T$.\n\n\u00b3 $\\Re(\\cdot)$ denotes the real part of a complex number. $\\mathrm{med}(\\cdot)$ denotes the median.\n\u2074 All approximations will be theoretically justified in Section 4 and Appendix E.2.\n\nTable 2: Computational complexity of sketched and plain tensor power method. $n$ is the tensor dimension; $k$ is the intrinsic tensor rank; $b$ is the sketch length. Per-sketch time complexity is shown.\nOperation | PLAIN | SKETCH | PLAIN+WHITENING | SKETCH+WHITENING\npreprocessing: general tensors | - | $O(n^3)$ | $O(kn^3)$ | $O(n^3)$\npreprocessing: factored tensors with $N$ components | $O(Nn^3)$ | $O(N(n + b\\log b))$ | $O(N(nk + k^3))$ | $O(N(nk + b\\log b))$\nper tensor contraction time | $O(n^3)$ | $O(n + b\\log b)$ | $O(k^3)$ | $O(k + b\\log b)$\n\nEq. (3) immediately yields a fast approximation procedure for $T(u,u,u)$ because $T(u,u,u) = \\langle T, X\\rangle$, where $X = u \\otimes u \\otimes u$ is a rank-one tensor whose sketch can be built in $O(n + b\\log b)$ time by Eq. (2). Consequently, the product can be approximately computed using $O(n + b\\log b)$ operations if the tensor sketch of $T$ is available. For tensor products of the form $T(I,u,u)$, the $i$th coordinate of the result can be expressed as $\\langle T, Y_i\\rangle$, where $Y_i = e_i \\otimes u \\otimes u$ and $e_i = (0, \\cdots, 0, 1, 0, \\cdots, 0)$ is the $i$th indicator vector. We can then apply Eq. (3) to approximately compute $\\langle T, Y_i\\rangle$ efficiently. However, this method is not completely satisfactory because it requires sketching $n$ rank-1 tensors ($Y_1$ through $Y_n$), which results in $O(n)$ FFT evaluations by Eq. (2). Below we present a proposition that allows us to use only $O(1)$ FFTs to approximate $T(I,u,u)$.\n\nProposition 1. $\\langle s_T, s_{1,e_i} * s_{2,u} * s_{3,u}\\rangle = \\langle \\mathcal{F}^{-1}(\\mathcal{F}(s_T) \\circ \\overline{\\mathcal{F}(s_{2,u})} \\circ \\overline{\\mathcal{F}(s_{3,u})}), s_{1,e_i}\\rangle$, where the overline denotes complex conjugation.\n\nProposition 1 is proved in Appendix E.1. The main idea is to \u201cshift\u201d all terms not depending on $i$ to the left side of the inner product and eliminate the inverse FFT operation on the right side, so that $s_{1,e_i}$ contains only one nonzero entry. As a result, we can compute $\\mathcal{F}^{-1}(\\mathcal{F}(s_T) \\circ \\overline{\\mathcal{F}(s_{2,u})} \\circ \\overline{\\mathcal{F}(s_{3,u})})$ once and read off each entry of $T(I,u,u)$ in constant time. In addition, the technique can be further extended to symmetric tensor sketches, with details deferred to Appendix B due to space limits.\n\nWhen operating on an $n$-dimensional tensor, the algorithm requires $O(kLT(n + Bb\\log b))$ running time (excluding the time for building $\\tilde{s}_{\\bar{T}}$) and $O(Bb)$ memory, which significantly improves on the $O(kn^3LT)$ time and $O(n^3)$ space complexity of the brute-force tensor power method. Here $L, T$ are algorithm parameters of the robust tensor power method. Previous analysis shows that $T = O(\\log k)$ and $L = \\mathrm{poly}(k)$, where $\\mathrm{poly}(\\cdot)$ is some low-order polynomial function [1]. Finally, Table 2 summarizes the computational complexity of the sketched and plain tensor power methods.\n\n3.3 Colliding hash and symmetric tensor sketch\n\nFor symmetric input tensors, it is possible to design a new style of tensor sketch that can be built more efficiently. The idea is to design hash functions that deliberately collide symmetric entries, i.e., $(i,j,k)$, $(j,i,k)$, etc. Consequently, we only need to consider entries $T_{ijk}$ with $i \\le j \\le k$ when building tensor sketches. An intuitive idea is to use the same hash function and Rademacher random variable for each order, that is, $h_1(i) = h_2(i) = h_3(i) =: h(i)$ and $\\xi_1(i) = \\xi_2(i) = \\xi_3(i) =: \\xi(i)$. In this way, all permutations of $(i,j,k)$ will collide with each other. However, such a design has an issue with repeated entries because $\\xi(i)$ can only take $\\pm 1$ values. Consider $(i,i,k)$ and $(j,j,k)$ as an example: $\\xi(i)^2\\xi(k) = \\xi(j)^2\\xi(k)$ with probability 1 even if $i \\neq j$. On the other hand, we need $\\mathbb{E}[\\xi(a)\\xi(b)] = 0$ for any pair of distinct 3-tuples $a$ and $b$.\n\nTo address this issue, we extend the Rademacher random variables to the complex domain and consider all roots of $z^m = 1$, that is, $\\Omega = \\{\\omega_j\\}_{j=0}^{m-1}$ where $\\omega_j = e^{i\\frac{2\\pi j}{m}}$. Suppose $\\sigma(i)$ is uniformly distributed over $\\Omega$, i.e., $\\Pr[\\sigma(i) = \\omega_j] = 1/m$ for every $j$. By elementary algebra, $\\mathbb{E}[\\sigma(i)^p] = \\frac{1}{m}\\sum_{j=0}^{m-1}\\omega_j^p = 0$ whenever $p$ is not a multiple of $m$; in particular, this holds whenever $m$ is relatively prime to $p$ or $p$ is a proper divisor of $m$. Therefore, by setting $m = 4$ we avoid collisions of repeated entries in a 3rd-order tensor, since the powers $p \\in \\{1, 2, 3\\}$ that arise there are not multiples of 4. More specifically, the symmetric tensor sketch of a symmetric tensor $T \\in \\mathbb{R}^{n\\times n\\times n}$ can be defined as\n$$\\tilde{s}_T(t) := \\sum_{\\tilde{H}(i,j,k) = t} T_{i,j,k}\\,\\sigma(i)\\sigma(j)\\sigma(k), \\qquad (4)$$\nwhere $\\tilde{H}(i,j,k) = (h(i) + h(j) + h(k)) \\bmod b$. To recover an entry, we use\n$$\\widehat{T}_{i,j,k} = \\frac{1}{\\kappa}\\,\\overline{\\sigma(i)}\\,\\overline{\\sigma(j)}\\,\\overline{\\sigma(k)}\\,\\tilde{s}_T(\\tilde{H}(i,j,k)), \\qquad (5)$$\nwhere $\\kappa = 1$ if $i = j = k$; $\\kappa = 3$ if $i = j$ or $j = k$ or $i = k$; $\\kappa = 6$ otherwise. For higher-order tensors, the coefficients can be computed via Young tableaux, which characterize symmetries under the permutation group. Compared to asymmetric tensor sketches, the hash function $h$ needs to satisfy stronger independence conditions because we are using the same hash function for each order. In our case, $h$ needs to be 6-wise independent to make $\\tilde{H}$ 2-wise independent. This fact follows from the following proposition, which is proved in Appendix E.1.\n\nProposition 2. Fix $p$ and $q$. For $h: [n] \\to [b]$ define the symmetric mapping $\\tilde{H}: [n]^p \\to [b]$ as $\\tilde{H}(i_1, \\cdots, i_p) = (h(i_1) + \\cdots + h(i_p)) \\bmod b$. If $h$ is $(pq)$-wise independent then $\\tilde{H}$ is $q$-wise independent.\n\nThe symmetric tensor sketch described above can significantly speed up the sketch building process. For a general tensor with $M$ nonzero entries, to build $\\tilde{s}_T$ one only needs to consider roughly $M/6$ entries (those $T_{ijk} \\neq 0$ with $i \\le j \\le k$). For a rank-1 tensor $u^{\\otimes 3}$, only one FFT is needed to build $\\mathcal{F}(\\tilde{s})$; in contrast, computing Eq. (2) requires at least 3 FFT evaluations.\n\nFinally, in Appendix B we give details on how to seamlessly combine symmetric hashing with the techniques of the previous sections to efficiently construct and decompose a tensor.\n\n4 Error analysis\n\nIn this section we provide a theoretical analysis of the approximation error of both the tensor sketch and the fast sketched robust tensor power method. We mainly focus on symmetric tensor sketches, while the extension to asymmetric settings is straightforward. Due to space limits, all proofs are placed in the appendix.\n\n4.1 Tensor sketch concentration bounds\n\nTheorem 1 bounds the approximation error of symmetric tensor sketches when computing $T(u,u,u)$ and $T(I,u,u)$. Its proof is deferred to Appendix E.2.\n\nTheorem 1. Fix a symmetric real tensor $T \\in \\mathbb{R}^{n\\times n\\times n}$ and a real vector $u \\in \\mathbb{R}^n$ with $\\|u\\|_2 = 1$. Suppose $\\varepsilon_{1,T}(u) \\in \\mathbb{R}$ and $\\varepsilon_{2,T}(u) \\in \\mathbb{R}^n$ are the estimation errors of $T(u,u,u)$ and $T(I,u,u)$ using $B$ independent symmetric tensor sketches; that is, $\\varepsilon_{1,T}(u) = \\widehat{T}(u,u,u) - T(u,u,u)$ and $\\varepsilon_{2,T}(u) = \\widehat{T}(I,u,u) - T(I,u,u)$. If $B = \\Omega(\\log(1/\\delta))$ then with probability $\\ge 1 - \\delta$ the following error bounds hold:\n$$|\\varepsilon_{1,T}(u)| = O(\\|T\\|_F/\\sqrt{b}); \\qquad |[\\varepsilon_{2,T}(u)]_i| = O(\\|T\\|_F/\\sqrt{b}), \\quad \\forall i \\in \\{1, \\cdots, n\\}. \\qquad (6)$$\nIn addition, for any fixed $w \\in \\mathbb{R}^n$ with $\\|w\\|_2 = 1$, with probability $\\ge 1 - \\delta$ we have\n$$\\langle w, \\varepsilon_{2,T}(u)\\rangle^2 = O(\\|T\\|_F^2/b). \\qquad (7)$$\n\n4.2 Analysis of the fast tensor power method\n\nWe present a theorem analyzing the robust tensor power method with tensor sketch approximations. A more detailed theorem statement along with its proof can be found in Appendix E.3.\n\nTheorem 2. Suppose $\\bar{T} = T + E \\in \\mathbb{R}^{n\\times n\\times n}$ where $T = \\sum_{i=1}^k \\lambda_i v_i^{\\otimes 3}$ with an orthonormal basis $\\{v_i\\}_{i=1}^k$, $\\lambda_1 > \\cdots > \\lambda_k > 0$ and $\\|E\\| = \\epsilon$. Let $\\{(\\hat{\\lambda}_i, \\hat{v}_i)\\}_{i=1}^k$ be the eigenvalue/eigenvector pairs obtained by Algorithm 1. Suppose $\\epsilon = O(1/(\\lambda_1 n))$, $T = \\Omega(\\log(n/\\delta) + \\log(1/\\epsilon)\\max_i \\lambda_i/(\\lambda_i - \\lambda_{i-1}))$ and $L$ grows linearly with $k$. Assume the randomness of the tensor sketch is independent among tensor product evaluations. If $B = \\Omega(\\log(n/\\delta))$ and $b$ satisfies\n$$b = \\Omega\\left(\\max\\left\\{\\frac{\\epsilon^{-2}\\|T\\|_F^2}{\\Delta(\\lambda)^2}, \\frac{\\delta^{-4} n^2 \\|T\\|_F^2 r(\\lambda)^2}{\\lambda_1^2}\\right\\}\\right), \\qquad (8)$$\nwhere $\\Delta(\\lambda) = \\min_i(\\lambda_i - \\lambda_{i-1})$ and $r(\\lambda) = \\max_{i, j > i}(\\lambda_i/\\lambda_j)$, then with probability $\\ge 1 - \\delta$ there exists a permutation $\\pi$ over $[k]$ such that\n$$\\|v_{\\pi(i)} - \\hat{v}_i\\|_2 \\le \\epsilon, \\qquad |\\lambda_{\\pi(i)} - \\hat{\\lambda}_i| \\le \\lambda_i\\epsilon/2, \\qquad \\forall i \\in \\{1, \\cdots, k\\} \\qquad (9)$$\nand $\\|T - \\sum_{i=1}^k \\hat{\\lambda}_i \\hat{v}_i^{\\otimes 3}\\| \\le c\\epsilon$ for some constant $c$.\n\nTheorem 2 shows that the sketch length $b$ can be set as $o(n^3)$ to provably approximately decompose a 3rd-order tensor with dimension $n$. Theorem 2, together with the time complexity comparison in Table 2, shows that the sketching-based fast tensor decomposition algorithm has better computational complexity than the brute-force implementation. One potential drawback of our analysis is the assumption that sketches are independently built for each tensor product (contraction) evaluation. This is an artifact of our analysis, and we conjecture that it can be removed by incorporating recent developments in the differentially private adaptive query framework [9].\n\n5 Experiments\n\nWe demonstrate the effectiveness and efficiency of our proposed sketch-based tensor power method on both synthetic tensors and real-world topic modeling problems. Experimental results involving the fast ALS method are presented in Appendix C.3. All methods are implemented in C++ and tested on a single machine with 8 Intel X5550 @ 2.67GHz CPUs and 32GB memory. For synthetic tensor decomposition we use only a single thread; for fast spectral LDA, 8 to 16 threads are used.\n\n5.1 Synthetic tensors\n\nIn Table 5 we compare our proposed algorithms with exact decomposition methods on synthetic tensors. Let $n = 1000$ be the dimension of the input tensor. We first generate a random orthonormal basis $\\{v_i\\}_{i=1}^n$ and then set the input tensor $T$ as $T = \\mathrm{normalize}(\\sum_{i=1}^n \\lambda_i v_i^{\\otimes 3}) + E$, where the eigenvalues $\\lambda_i$ satisfy $\\lambda_i = 1/i$. The normalization step makes $\\|T\\|_F^2 = 1$ before imposing noise. The Gaussian noise tensor $E$ is symmetric with $E_{ijk} \\sim \\mathcal{N}(0, \\sigma/n^{1.5})$ for $i \\le j \\le k$ and noise-to-signal level $\\sigma$. Due to time constraints, we only compare the recovery error and running time on the top 10 recovered eigenvectors of the full-rank input tensor $T$. Both $L$ and $T$ are set to 30. Table 3 shows that our proposed algorithms achieve reasonable approximation error within a few minutes, which is much faster than exact methods. A complete version (Table 5) is deferred to Appendix A.\n\nTable 3: Squared residual norm on the top 10 recovered eigenvectors of 1000d tensors and running time (excluding I/O and sketch building time) for plain (exact) and sketched robust tensor power methods. Two vectors are considered a mismatch (wrong) if $\\|v - \\hat{v}\\|_2^2 > 0.1$. An extended version is shown as Table 5 in Appendix A.\n$\\sigma = .01$ | Residual norm, $\\log_2(b)$ = 12, 13, 14, 15, 16 | No. of wrong vectors, $\\log_2(b)$ = 12, 13, 14, 15, 16 | Running time (min.), $\\log_2(b)$ = 12, 13, 14, 15, 16\nB = 20 | .40, .19, .10, .09, .08 | 8, 6, 3, 0, 0 | .85, 1.6, 3.5, 7.4, 16.6\nB = 30 | .26, .10, .09, .08, .07 | 7, 5, 2, 0, 0 | 1.3, 2.4, 5.3, 11.3, 24.6\nB = 40 | .17, .10, .08, .08, .07 | 7, 4, 0, 0, 0 | 1.8, 3.3, 7.3, 15.2, 33.0\nExact | .07 | 0 | 293.5\n\nTable 4: Negative log-likelihood and running time (min) on the large Wikipedia dataset for 200 and 300 topics.\nMethod | k = 200: like. | time | $\\log_2 b$ | iters | k = 300: like. | time | $\\log_2 b$ | iters\nSpectral | 7.49 | 34 | 12 | - | 7.39 | 56 | 13 | -\nGibbs | 6.85 | 561 | - | 30 | 6.38 | 818 | - | 30\nHybrid | 6.77 | 144 | 12 | 5 | 6.31 | 352 | 13 | 10\n\n5.2 Topic modeling\n\nWe implement a
fast spectral inference algorithm for Latent Dirichlet Allocation (LDA [3]) by combining tensor sketching with the existing whitening technique for dimensionality reduction. Implementation details are provided in Appendix D. We compare our proposed fast spectral LDA algorithm with baseline spectral methods and collapsed Gibbs sampling (using the GibbsLDA++ [25] implementation) on two real-world datasets: Wikipedia and Enron. Dataset details are presented in Appendix A. Only the most frequent $V$ words are kept, and the vocabulary size $V$ is set to 10000. For the robust tensor power method the parameters are set to $L = 50$ and $T = 30$. For ALS we iterate until convergence, or until a maximum of 1000 iterations is reached. $\\alpha_0$ is set to 1.0 and $B$ is set to 30.\n\n[Figure 1, left panel: x-axis log hash length, y-axis negative log-likelihood; curves for fast spectral LDA with $k = 50, 100, 200$, the exact method with $k = 50, 100, 200$, and Gibbs sampling (100 iterations, 145 mins).]\nFigure 1: Left: negative log-likelihood for fast and exact tensor power method on the Wikipedia dataset. Right: negative log-likelihood for collapsed Gibbs sampling, fast LDA, and Gibbs sampling using fast LDA as initialization.\n\nThe obtained topic models $\\Phi \\in \\mathbb{R}^{V\\times K}$ are evaluated on a held-out dataset consisting of 1000 documents randomly picked out from the training datasets. For each testing document $d$, we fit a topic mixing vector $\\hat{\\pi}_d \\in \\mathbb{R}^K$ by solving the following optimization problem: $\\hat{\\pi}_d = \\arg\\min_{\\|\\pi\\|_1 = 1,\\, \\pi \\ge 0} \\|w_d - \\Phi\\pi\\|_2$, where $w_d$ is the empirical word distribution of document $d$. The per-document log-likelihood is then defined as $\\mathcal{L}_d = \\frac{1}{n_d}\\sum_{i=1}^{n_d} \\ln p(w_{di})$, where $n_d$ is the number of words in document $d$, $w_{di}$ is its $i$th word, and $p(w_{di}) = \\sum_{k=1}^{K} \\hat{\\pi}_{d,k}\\, \\Phi_{w_{di},k}$. Finally, the average $\\mathcal{L}_d$ over all testing documents is reported.\n\nFigure 1 left shows the held-out negative log-likelihood for fast spectral LDA under different hash lengths $b$. We can see that as $b$ increases, the performance approaches that of the exact tensor power method because the sketching approximation becomes more accurate. On the other hand, Table 6 shows that fast spectral LDA runs much faster than exact tensor decomposition methods while achieving comparable performance on both datasets.\n\nFigure 1 right compares the convergence of collapsed Gibbs sampling with different numbers of iterations and fast spectral LDA with different hash
lengths on the Wikipedia dataset. For collapsed Gibbs sampling, we set $\\alpha = 50/K$ and $\\beta = 0.1$ following [11]. As shown in the figure, fast spectral LDA achieves comparable held-out likelihood while running faster than collapsed Gibbs sampling. We further take the dictionary $\\Phi$ output by fast spectral LDA and use it as initialization for collapsed Gibbs sampling (the word topic assignments $z$ are obtained by 5-iteration Gibbs sampling, with the dictionary $\\Phi$ fixed). The resulting Gibbs sampler converges much faster: with only 3 iterations it already performs much better than a randomly initialized Gibbs sampler run for 100 iterations, which takes 10x more running time.\n\nWe also report the performance of fast spectral LDA and collapsed Gibbs sampling on a larger dataset in Table 4. The dataset was built by crawling 1,085,768 random Wikipedia pages, and a held-out evaluation set was built by randomly picking out 1000 documents from the dataset. The number of topics $k$ is set to 200 or 300, and after getting the topic dictionary $\\Phi$ from fast spectral LDA we use 2-iteration Gibbs sampling to obtain word topic assignments $z$. Table 4 shows that the hybrid method (i.e., collapsed Gibbs sampling initialized by spectral LDA) achieves the best likelihood performance in a much shorter time, compared to a randomly initialized Gibbs sampler.\n\n6 Conclusion\n\nIn this work we proposed a sketching-based approach to efficiently compute tensor CP decomposition with provable guarantees. We apply our proposed algorithm to learning latent topics of unlabeled document collections and achieve significant speed-up compared to vanilla spectral and collapsed Gibbs sampling methods. Some interesting future directions include further improving the sample complexity analysis and applying the framework to a broader class of graphical models.\n\nAcknowledgement: Anima Anandkumar is supported in part by the Microsoft Faculty Fellowship and the Sloan Foundation. Alex Smola is supported in part by a Google Faculty Research Grant.\n\nReferences\n\n[1] A. Anandkumar, R. Ge, D. Hsu, S. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. Journal of Machine Learning Research, 15:2773\u20132832, 2014.\n[2] S. Bhojanapalli and S. Sanghavi. A new sampling technique for tensors. arXiv:1502.05023, 2015.\n[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993\u20131022, 2003.\n[4] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr, and T. M. Mitchell. Toward an architecture for never-ending language learning. In AAAI, 2010.\n[5] J. D. Carroll and J.-J. Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of \u201cEckart-Young\u201d decomposition. Psychometrika, 35(3):283\u2013319, 1970.\n[6] A. Chaganty and P. Liang. Estimating latent-variable graphical models using moments and likelihoods. In ICML, 2014.\n[7] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3\u201315, 2004.\n[8] J. H. Choi and S. Vishwanathan. DFacTo: Distributed factorization of tensors. In NIPS, 2014.\n[9] C. Dwork, V. Feldman, M. Hardt, T. Pitassi, O. Reingold, and A. Roth. Preserving statistical validity in adaptive data analysis. In STOC, 2015.\n[10] A. S. Field and D. Graupe. Topographic component (parallel factor) analysis of multichannel evoked potentials: practical issues in trilinear spatiotemporal decomposition. Brain Topography, 3(4):407\u2013423, 1991.\n[11] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1):5228\u20135235, 2004.\n[12] R. A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an explanatory multi-modal factor analysis. UCLA Working Papers in Phonetics, 16:1\u201384, 1970.\n[13] F. Huang, S. Matusevych, A. Anandkumar, N. Karampatziakis, and P. Mineiro. Distributed latent Dirichlet allocation via tensor factorization. In NIPS Optimization Workshop, 2014.\n[14] F. Huang, U. N. Niranjan, M. U. Hakeem, and A. Anandkumar. Fast detection of overlapping communities via online tensor methods. arXiv:1309.0787, 2013.\n[15] A. Jain. Fundamentals of digital image processing, 1989.\n[16] U. Kang, E. Papalexakis, A. Harpale, and C. Faloutsos. GigaTensor: Scaling tensor analysis up by 100 times - algorithms and discoveries. In KDD, 2012.\n[17] B. Klimt and Y. Yang. Introducing the Enron corpus. In CEAS, 2004.\n[18] T. Kolda and B. Bader. The TOPHITS model for higher-order web link analysis. In Workshop on Link Analysis, Counterterrorism and Security, 2006.\n[19] T. Kolda and B. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455\u2013500, 2009.\n[20] T. G. Kolda and J. Sun. Scalable tensor decompositions for multi-aspect data mining. In ICDM, 2008.\n[21] M. M\u00f8rup, L. K. Hansen, C. S. Herrmann, J. Parnas, and S. M. Arnfred. Parallel factor analysis as an exploratory tool for wavelet transformed event-related EEG. NeuroImage, 29(3):938\u2013947, 2006.\n[22] R. Pagh. Compressed matrix multiplication. In ITCS, 2012.\n[23] N. Pham and R. Pagh. Fast and scalable polynomial kernels via explicit feature maps. In KDD, 2013.\n[24] A.-H. Phan, P. Tichavsky, and A. Cichocki. Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations. IEEE Transactions on Signal Processing, 61(19):4834\u20134846, 2013.\n[25] X.-H. Phan and C.-T. Nguyen. GibbsLDA++: A C/C++ implementation of latent Dirichlet allocation (LDA), 2007.\n[26] M. P\u0103tra\u015fcu and M. Thorup. The power of simple tabulation hashing. Journal of the ACM, 59(3):14, 2012.\n[27] C. Tsourakakis. MACH: Fast randomized tensor decompositions. In SDM, 2010.\n[28] H.-Y. Tung and A. Smola. Spectral methods for Indian buffet process inference. In NIPS, 2014.\n[29] C. Wang, X. Liu, Y. Song, and J. Han. Scalable moment-based inference for latent Dirichlet allocation. In ECML/PKDD, 2014.\n[30] Y. Wang and J. Zhu. Spectral methods for supervised topic models. In NIPS, 2014.", "award": [], "sourceid": 626, "authors": [{"given_name": "Yining", "family_name": "Wang", "institution": "Carnegie Mellon University"}, {"given_name": "Hsiao-Yu", "family_name": "Tung", "institution": "Carnegie Mellon University"}, {"given_name": "Alexander", "family_name": "Smola", "institution": "Carnegie Mellon University"}, {"given_name": "Anima", "family_name": "Anandkumar", "institution": "UC Irvine"}]}