{"title": "Learning Mixed Multinomial Logit Model from Ordinal Data", "book": "Advances in Neural Information Processing Systems", "page_first": 595, "page_last": 603, "abstract": "Motivated by generating personalized recommendations using ordinal (or preference) data, we study the question of learning a mixture of MultiNomial Logit (MNL) models, a parameterized class of distributions over permutations, from partial ordinal or preference data (e.g. pair-wise comparisons). Despite its long-standing importance across disciplines including social choice, operations research and revenue management, little is known about this question. In the case of a single MNL model (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible. However, even learning a mixture of two MNL models is infeasible in general. Given this state of affairs, we seek conditions under which it is feasible to learn the mixture model in a manner that is both computationally and statistically efficient. To that end, we present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preference/comparison data. In particular, a mixture of $r$ MNL components over $n$ objects can be learnt using samples whose size scales polynomially in $n$ and $r$ (concretely, $n^3 r^{3.5} \log^4 n$, with $r \ll n^{2/7}$, when the model parameters are sufficiently {\em incoherent}). The algorithm has two phases: first, learn the pair-wise marginals for each component using tensor decomposition; second, learn the model parameters for each component using RankCentrality introduced by Negahban et al.
In the process of proving these results, we obtain a generalization of existing analysis for tensor decomposition to a more realistic regime where only partial information about each sample is available.", "full_text": "Learning Mixed Multinomial Logit Model from Ordinal Data

Sewoong Oh, Dept. of Industrial and Enterprise Systems Engr., University of Illinois at Urbana-Champaign, Urbana, IL 61801. swoh@illinois.edu
Devavrat Shah, Department of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139. devavrat@mit.edu

Abstract

Motivated by generating personalized recommendations using ordinal (or preference) data, we study the question of learning a mixture of MultiNomial Logit (MNL) models, a parameterized class of distributions over permutations, from partial ordinal or preference data (e.g. pair-wise comparisons). Despite its long-standing importance across disciplines including social choice, operations research and revenue management, little is known about this question. In the case of a single MNL model (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible. However, even learning a mixture with two MNL components is infeasible in general. Given this state of affairs, we seek conditions under which it is feasible to learn the mixture model in a manner that is both computationally and statistically efficient. We present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preference/comparison data. In particular, a mixture of $r$ MNL components over $n$ objects can be learnt using samples whose size scales polynomially in $n$ and $r$ (concretely, $r^{3.5} n^3 (\\log n)^4$, with $r \\ll n^{2/7}$, when the model parameters are sufficiently incoherent). The algorithm has two phases: first, learn the pair-wise marginals for each component using tensor decomposition; second, learn the model parameters for each component using RankCentrality introduced by Negahban et al. In the process of proving these results, we obtain a generalization of existing analysis for tensor decomposition to a more realistic regime where only partial information about each sample is available.

1 Introduction

Background. Popular recommendation systems such as collaborative filtering are based on a partially observed ratings matrix. The underlying hypothesis is that the true/latent score matrix is low-rank and we observe its partial, noisy version. Therefore, matrix completion algorithms are used for learning, cf. [8, 14, 15, 20]. In reality, however, observed preference data is not just scores. For example, clicking one of many choices while browsing provides a partial order between the clicked choice and the other choices. Further, scores convey ordinal information as well; e.g. a score of 4 for paper A and a score of 7 for paper B by a reviewer suggests the ordering B > A. Similar motivations led Samuelson to propose the Axiom of revealed preference [21] as the model for rational behavior. In a nutshell, it states that consumers have a latent order over all objects, and the preferences revealed through actions/choices are consistent with this order. If indeed all consumers had identical orderings, then learning preferences from partial preferences would effectively be the question of sorting. In practice, individuals have different orderings of interest, and further, each individual is likely to make noisy choices. This naturally suggests the following model: each individual has a latent distribution over orderings of the objects of interest, and the revealed partial preferences are consistent with it, i.e. they are samples from the distribution. Subsequently, the preference of the population as a whole can be associated with a distribution over permutations. Recall that the low-rank structure for score matrices, as a model, tries to capture the fact that there are only a few different types of choice profiles. In the context of modeling consumer choices as a distribution over permutations, the MultiNomial Logit (MNL) model with a small number of mixture components provides such a model.

Mixed MNL. Given $n$ objects or choices of interest, an MNL model is a parametric distribution over permutations of $n$ objects with parameters $w = [w_i] \\in \\mathbb{R}^n$: each object $i$, $1 \\le i \\le n$, has a parameter $w_i > 0$ associated with it. Permutations are then generated randomly as follows: choose one of the $n$ objects to be ranked 1 at random, where object $i$ is chosen with probability $w_i / (\\sum_{j=1}^n w_j)$. Let $i_1$ be the object chosen for the first position. To select the second ranked object, choose from the remaining objects with probability proportional to their weights. Repeat until objects for all ranked positions are chosen. It is easily seen that, as per this model, item $i$ is ranked higher than $j$ with probability $w_i/(w_i + w_j)$. In the mixed MNL model with $r \\ge 2$ mixture components, each component corresponds to a different MNL model: let $w^{(1)}, \\dots, w^{(r)}$ be the parameters of the $r$ components. Let $q = [q_a] \\in [0,1]^r$ denote the mixture distribution, i.e. $\\sum_a q_a = 1$. To generate a permutation at random, first choose a component $a \\in \\{1, \\dots, r\\}$ with probability $q_a$, and then draw a random permutation as per the MNL with parameters $w^{(a)}$.

Brief history. The MNL model is an instance of a class of models introduced by Thurstone [23]. The description of the MNL provided here was formally established by McFadden [17]. The same model (in the form of pair-wise marginals) was introduced independently by Zermelo [25] as well as Bradley and Terry [7]. In [16], Luce established that MNL is the only distribution over permutations that satisfies the axiom of Independence from Irrelevant Alternatives. On learning distributions over permutations, the question of learning a single MNL model, and more generally instances of Thurstone's model, has been of interest for quite a while. The maximum likelihood estimator, which is logistic regression for MNL, has been known to be consistent in the large sample limit, cf. [13]. Recently, RankCentrality [19] was established to be statistically efficient. For learning a sparse mixture model, i.e. a distribution over permutations with each mixture component being a delta distribution, [11] provided sufficient conditions under which mixtures can be learnt exactly using pair-wise marginals: effectively, as long as the number of components scales as $o(\\log n)$ and the components satisfy an appropriate incoherence condition, a simple iterative algorithm can recover the mixture. However, it is not robust with respect to noise in the data or finite sample error in marginal estimation. Other approaches have been proposed to recover the model using convex optimization based techniques, cf. [10, 18]. The MNL model is a special case of a larger family of discrete choice models known as Random Utility Models (RUMs), and an efficient algorithm to learn RUMs is introduced in [22]. Efficient algorithms for learning RUMs from partial rankings have been introduced in [3, 4]. We note that the above list of references is very limited, including only closely related literature. Given the nature of the topic, there are many exciting lines of research done over the past century and we shall not be able to provide comprehensive coverage due to space limitations.

Problem. Given observations from the mixed MNL, we wish to learn the model parameters: the mixing distribution $q$ and the parameters of each component, $w^{(1)}, \\dots, w^{(r)}$. The observations are in the form of pair-wise comparisons. Formally, to generate an observation, first one of the $r$ mixture components is chosen; then, for $\\ell$ of all possible $\\binom{n}{2}$ pairs, comparison outcomes are observed as per this MNL component. These $\\ell$ pairs are chosen, uniformly at random, from pre-determined $N \\le \\binom{n}{2}$ pairs $\\{(i_k, j_k), 1 \\le k \\le N\\}$. We shall assume that the selection of the $N$ pairs is such that the undirected graph $G = ([n], E)$, where $E = \\{(i_k, j_k) : 1 \\le k \\le N\\}$, is connected. We ask the following questions of interest: Is it always feasible to learn mixed MNL? If not, under what conditions, and how many samples are needed? How computationally expensive are the algorithms?

(Footnote: We shall assume that the outcomes of these $\\ell$ pairs are independent of each other, but come from the same MNL mixture component. This is effectively true even if they were generated by first sampling a permutation from the chosen MNL mixture component and then observing the implication of this permutation for the specific $\\ell$ pairs, as long as the pairs are distinct, due to the Independence of Irrelevant Alternatives hypothesis of Luce that is satisfied by MNL.)

We briefly recall a recent result [1] suggesting that it is impossible to learn mixed MNL models in general. One such example is described in Figure 1. It depicts an example with $n = 4$, $r = 2$ and a uniform mixture distribution. In the first case, mixture component 1 produces, with probability 1, the ordering $a > b > c > d$ (we denote the $n = 4$ objects by $a$, $b$, $c$ and $d$), and mixture component 2 produces, with probability 1, the ordering $b > a > d > c$. Similarly, in the second case, the two mixture components are made up of the permutations $b > a > c > d$ and $a > b > d > c$. It is easy to see that the distributions over any 3-wise comparisons generated from these two mixture models are identical. Therefore, it is impossible to differentiate the two using 3-wise or pair-wise comparisons. In general, [1] established that there exist mixture distributions with $r \\le n/2$ components over $n$ objects that are impossible to distinguish using $\\log n$-wise comparison data. That is, learning mixed MNL is not always possible.

Figure 1: Two mixture models that cannot be differentiated even with 3-wise preference data.

Contributions. The main contribution of this work is the identification of sufficient conditions under which the mixed MNL model can be learnt efficiently, both statistically and computationally. Concretely, we propose a two-phase learning algorithm: in the first phase, using a tensor decomposition method for learning mixtures of discrete product distributions, we identify the pair-wise marginals associated with each mixture component; in the second phase, we use these pair-wise marginals to learn the parameters associated with each MNL mixture component. The algorithm in the first phase builds upon the recent work by Jain and Oh [12]. In particular, Theorem 3 generalizes their work to the setting where we have only limited information for each sample: as per [12], we would require that each individual provides an entire permutation; instead, we have extended the result to cope with the current setting where we only have information about $\\ell$, potentially finite, pair-wise comparisons. The algorithm in the second phase utilizes RankCentrality [19]. Its analysis in Theorem 4 works in a setting where observations are no longer independent, as required in [19]. We find that as long as certain rank and incoherence conditions are satisfied by the parameters of each mixture component, the two-phase algorithm described above is able to learn the mixture distribution $q$ and the parameters associated with each component, $w^{(1)}, \\dots, w^{(r)}$, faithfully using a number of samples that scales polynomially in $n$ and $r$: concretely, the number of samples required scales as $r^{3.5} n^3 (\\log n)^4$ with constants dependent on the incoherence between mixture components, as long as $r \\ll n^{2/7}$ and $G$, the graph of potential comparisons, is a spectral expander with the total number of edges scaling as $N = O(n \\log n)$. For the precise statement, we refer to Theorem 1. The algorithms proposed are iterative, and primarily based on spectral properties of the underlying tensors/matrices, with provable, fast convergence guarantees. That is, the algorithms are not only polynomial time, they are practical enough to be scalable to high dimensional data sets.

Notation. We use $[N] = \\{1, \\dots, N\\}$ for the first $N$ positive integers. We use $\\otimes$ to denote the outer product, such that $(x \\otimes y \\otimes z)_{ijk} = x_i y_j z_k$. Given a third order tensor $T \\in \\mathbb{R}^{n_1 \\times n_2 \\times n_3}$ and matrices $U \\in \\mathbb{R}^{n_1 \\times r_1}$, $V \\in \\mathbb{R}^{n_2 \\times r_2}$, $W \\in \\mathbb{R}^{n_3 \\times r_3}$, we define a linear mapping $T[U,V,W] \\in \\mathbb{R}^{r_1 \\times r_2 \\times r_3}$ as $T[U,V,W]_{abc} = \\sum_{i,j,k} T_{ijk} U_{ia} V_{jb} W_{kc}$. We let $\\|x\\| = \\sqrt{\\sum_i x_i^2}$ be the Euclidean norm of a vector, $\\|M\\|_2 = \\max_{\\|x\\| \\le 1, \\|y\\| \\le 1} x^T M y$ the operator norm of a matrix, and $\\|M\\|_F = \\sqrt{\\sum_{i,j} M_{ij}^2}$ the Frobenius norm. We say an event happens with high probability (w.h.p.) if its probability is lower bounded by $1 - f(n)$ with $f(n) = o(1)$ as $n$ scales to $\\infty$.

2 Main result

In this section, we describe the main result: sufficient conditions under which mixed MNL models can be learnt using tractable algorithms. We provide a useful illustration of the result as well as a discussion of its implications.

Definitions. Let $S$ denote the collection of observations, each of which is an $N$ dimensional, $\\{-1, 0, +1\\}$ valued vector. Recall that each observation is obtained by first selecting one of the $r$ MNL mixture components, and then viewing the outcomes, as per the chosen component, of $\\ell$ randomly chosen pair-wise comparisons from the $N$ pre-determined comparisons $\\{(i_k, j_k) : 1 \\le i_k \\ne j_k \\le n, 1 \\le k \\le N\\}$. Let $x_t \\in \\{-1, 0, +1\\}^N$ denote the $t$-th observation, with $x_{t,k} = 0$ if the $k$-th pair $(i_k, j_k)$ is not among the $\\ell$ randomly chosen pairs, and $x_{t,k} = +1$ (respectively $-1$) if $i_k < j_k$ (respectively $i_k > j_k$) as per the chosen MNL mixture component. By definition, it is easy to see that for any $t \\in S$ and $1 \\le k \\le N$,

$E[x_{t,k}] = \\frac{\\ell}{N} \\Big[ \\sum_{a=1}^{r} q_a P_{ka} \\Big]$, where $P_{ka} = \\frac{w^{(a)}_{j_k} - w^{(a)}_{i_k}}{w^{(a)}_{j_k} + w^{(a)}_{i_k}}$.   (1)

We shall denote $P_a = [P_{ka}] \\in [-1,1]^N$ for $1 \\le a \\le r$. Therefore, in vector form,

$E[x_t] = \\frac{\\ell}{N} P q$, where $P = [P_1 \\dots P_r] \\in [-1,1]^{N \\times r}$.   (2)

That is, $P$ is a matrix with $r$ columns, each representing one of the $r$ mixture components, and $q$ is the mixture probability. By independence, for any $t \\in S$ and any two different pairs $1 \\le k \\ne m \\le N$,

$E[x_{t,k} x_{t,m}] = \\frac{\\ell^2}{N^2} \\Big[ \\sum_{a=1}^{r} q_a P_{ka} P_{ma} \\Big]$.   (3)

Therefore, the $N \\times N$ matrix $E[x_t x_t^T]$, or equivalently the tensor $E[x_t \\otimes x_t]$, is proportional to $M_2$ except in the diagonal entries, where

$M_2 = P Q P^T \\equiv \\sum_{a=1}^{r} q_a (P_a \\otimes P_a)$,   (4)

with $Q = \\mathrm{diag}(q)$ being the diagonal matrix whose entries are the mixture probabilities $q$. In a similar manner, the tensor $E[x_t \\otimes x_t \\otimes x_t]$ is proportional to $M_3$ (except in $O(N^2)$ entries), where

$M_3 = \\sum_{a=1}^{r} q_a (P_a \\otimes P_a \\otimes P_a)$.   (5)

Indeed, the empirical estimates $\\hat{M}_2$ and $\\hat{M}_3$, defined as

$\\hat{M}_2 = \\frac{1}{|S|} \\sum_{t \\in S} x_t \\otimes x_t$, and $\\hat{M}_3 = \\frac{1}{|S|} \\sum_{t \\in S} x_t \\otimes x_t \\otimes x_t$,   (6)

provide good proxies for $M_2$ and $M_3$ for a large enough number of samples, and shall be utilized crucially for learning the model parameters from observations.

Sufficient conditions for learning. With the above discussion, we state sufficient conditions for learning the mixed MNL in terms of properties of $M_2$:

C1. $M_2$ has rank $r$; let $\\sigma_1(M_2), \\sigma_r(M_2) > 0$ be the largest and smallest singular values of $M_2$.

C2. For a large enough universal constant $C_0 > 0$,

$N \\ge C_0 \\, r^{3.5} \\mu^6(M_2) \\Big( \\frac{\\sigma_1(M_2)}{\\sigma_r(M_2)} \\Big)^{4.5}$.   (7)

In the above, $\\mu(M_2)$ denotes the incoherence of the symmetric matrix $M_2$. We recall that for a symmetric matrix $M \\in \\mathbb{R}^{N \\times N}$ of rank $r$ with singular value decomposition $M = U S U^T$, the incoherence is defined as

$\\mu(M) = \\sqrt{\\frac{N}{r}} \\Big( \\max_{i \\in [N]} \\|U_i\\| \\Big)$.   (8)

C3. The undirected graph $G = ([n], E)$ with $E = \\{(i_k, j_k) : 1 \\le k \\le N\\}$ is connected. Let $A \\in \\{0,1\\}^{n \\times n}$ be the adjacency matrix with $A_{ij} = 1$ if $(i,j) \\in E$ and 0 otherwise; let $D = \\mathrm{diag}(d_i)$ with $d_i$ being the degree of vertex $i \\in [n]$, and let $L_G = D^{-1} A$ be the normalized Laplacian of $G$. Let $d_{\\max} = \\max_i d_i$ and $d_{\\min} = \\min_i d_i$. Let the eigenvalues of the stochastic matrix $L_G$ be $1 = \\lambda_1(L_G) \\ge \\dots \\ge \\lambda_n(L_G) \\ge -1$. Define the spectral gap of $G$:

$\\xi(G) = 1 - \\max\\{\\lambda_2(L_G), -\\lambda_n(L_G)\\}$.   (9)

Note that we choose the graph $G = ([n], E)$ on which to collect pairwise data, and we want to use a graph that is connected, has a large spectral gap, and has a small number of edges. In condition (C3), we need connectivity since we cannot estimate the relative strength between disconnected components (e.g. see [13]). Further, it is easy to generate a graph with spectral gap $\\xi(G)$ bounded below by a universal constant (e.g. 1/100) and with the number of edges $N = O(n \\log n)$, for example using the configuration model for Erdős-Rényi graphs. In condition (C2), we require the matrix $M_2$ to be sufficiently incoherent with bounded $\\sigma_1(M_2)/\\sigma_r(M_2)$. For example, if $q_{\\max}/q_{\\min} = O(1)$ and the profile of each type in the mixture distribution is sufficiently different, i.e. $\\langle P_a, P_b \\rangle / (\\|P_a\\| \\|P_b\\|) < 1/(2r)$, then we have $\\mu(M_2) = O(1)$ and $\\sigma_1(M_2)/\\sigma_r(M_2) = O(1)$. We define $b = \\max_{a \\in [r]} \\max_{i,j \\in [n]} w^{(a)}_i / w^{(a)}_j$, $q_{\\max} = \\max_a q_a$, and $q_{\\min} = \\min_a q_a$. The following theorem provides a bound on the error; we refer to the appendix for a proof.

Theorem 1. Consider a mixed MNL model satisfying conditions (C1)-(C3). Then for any $\\delta \\in (0,1)$, there exist positive numerical constants $C, C_0$ such that for any positive $\\varepsilon$ satisfying

$0 < \\varepsilon < \\Big( \\frac{q_{\\min} \\, \\xi^2(G) \\, d_{\\min}^2}{16 \\, q_{\\max} \\, r \\, \\sigma_1(M_2) \\, b^5 \\, d_{\\max}^2} \\Big)^{0.5}$,   (10)

Algorithm 1 produces estimates $\\hat{q} = [\\hat{q}_a]$ and $\\hat{w} = [\\hat{w}^{(a)}]$ such that, with probability at least $1 - \\delta$,

$|\\hat{q}_a - q_a| \\le \\varepsilon$, and $\\frac{\\|\\hat{w}^{(a)} - w^{(a)}\\|}{\\|w^{(a)}\\|} \\le C \\Big( \\frac{r \\, q_{\\max} \\, \\sigma_1(M_2) \\, b^5 \\, d_{\\max}^2}{q_{\\min} \\, \\xi^2(G) \\, d_{\\min}^2} \\Big)^{0.5} \\varepsilon$,   (11)

for all $a \\in [r]$, as long as

$|S| \\ge C_0 \\frac{r N^4 \\log(N/\\delta)}{q_{\\min} \\sigma_1(M_2)^2 \\varepsilon^2} \\Big( \\frac{1}{\\ell^2} + \\frac{\\sigma_1(M_2)}{\\ell N} + \\frac{r^4 \\sigma_1(M_2)^4}{\\sigma_r(M_2)^5} \\Big)$.   (12)

An illustration of Theorem 1. To understand the applicability of Theorem 1, consider a concrete example with $r = 2$; let the corresponding weights $w^{(1)}$ and $w^{(2)}$ be generated by choosing each weight uniformly from $[1,2]$. In particular, the rank order of each component is a uniformly random permutation. Let the mixture distribution be uniform as well, i.e. $q = [0.5, 0.5]$. Finally, let the graph $G = ([n], E)$ be chosen as per the Erdős-Rényi model with each edge chosen to be part of the graph with probability $\\bar{d}/n$, where $\\bar{d} > \\log n$. For this example, it can be checked
that Theorem 1 guarantees that for $\\varepsilon \\le C/\\sqrt{n\\bar{d}}$, $|S| \\ge C_0 n^2 \\bar{d}^2 \\log(n\\bar{d}/\\delta)/(\\ell \\varepsilon^2)$, and $n\\bar{d} \\ge C_0$, we have, for all $a \\in \\{1,2\\}$, $|\\hat{q}_a - q_a| \\le \\varepsilon$ and $\\|\\hat{w}^{(a)} - w^{(a)}\\|/\\|w^{(a)}\\| \\le C'' \\sqrt{n\\bar{d}} \\, \\varepsilon$. That is, for $\\ell = \\Theta(1)$ and choosing $\\varepsilon = \\varepsilon_0/\\sqrt{n\\bar{d}}$, we need a sample size of $|S| = O(n^3 \\bar{d}^3 \\log n)$ to guarantee errors in both $\\hat{q}$ and $\\hat{w}$ smaller than $\\varepsilon_0$. Instead, if we choose $\\ell = \\Theta(n\\bar{d})$, we only need $|S| = O((n\\bar{d})^2 \\log n)$. Limited samples per observation lead to a penalty factor of $(n\\bar{d}/\\ell)$ in the sample complexity.

To provide bounds on the problem parameters for this example, we use standard concentration arguments. It is well known for Erdős-Rényi random graphs (see [6]) that, with high probability, the number of edges concentrates in $[(1/2)\\bar{d}n, (3/2)\\bar{d}n]$, implying $N = \\Theta(\\bar{d}n)$, and the degrees also concentrate in $[(1/2)\\bar{d}, (3/2)\\bar{d}]$, implying $d_{\\max} = d_{\\min} = \\Theta(\\bar{d})$. Also, using standard concentration arguments for the spectrum of random matrices, it follows that the spectral gap of $G$ is bounded as $\\xi \\ge 1 - (C/\\sqrt{\\bar{d}}) = \\Theta(1)$ w.h.p. Since we assume the weights to be in $[1,2]$, the dynamic range is bounded by $b \\le 2$. The following proposition shows that $\\sigma_1(M_2) = \\Theta(N) = \\Theta(\\bar{d}n)$, $\\sigma_2(M_2) = \\Theta(\\bar{d}n)$, and $\\mu(M_2) = \\Theta(1)$.

Proposition 2.1. For the above example, when $\\bar{d} \\ge \\log n$, $\\sigma_1(M_2) \\le 0.02 N$, $\\sigma_2(M_2) \\ge 0.017 N$, and $\\mu(M_2) \\le 15$ with high probability.

Suppose now, for general $r$, we are interested in the well-behaved scenario where $q_{\\max} = \\Theta(1/r)$ and $q_{\\min} = \\Theta(1/r)$. To achieve an arbitrarily small error rate for $\\|\\hat{w}^{(a)} - w^{(a)}\\|/\\|w^{(a)}\\|$, we need $\\varepsilon = O(1/\\sqrt{rN})$, which is achieved by a sample size of $|S| = O(r^{3.5} n^3 (\\log n)^4)$ with $\\bar{d} = \\log n$.

3 Algorithm

We describe the algorithm achieving the bound in Theorem 1. Our approach is two-phased. First, learn the moments of the mixture using a tensor decomposition, cf. Algorithm 2: for each type $a \\in [r]$, produce an estimate $\\hat{q}_a \\in \\mathbb{R}$ of the mixture weight $q_a$ and an estimate $\\hat{P}_a = [\\hat{P}_{1a} \\dots \\hat{P}_{Na}]^T \\in \\mathbb{R}^N$ of the expected outcome $P_a = [P_{1a} \\dots P_{Na}]^T$ defined as in (1). Second, for each $a$, using the estimate $\\hat{P}_a$, apply RankCentrality, cf. Section 3.2, to estimate $\\hat{w}^{(a)}$ for the MNL weights $w^{(a)}$.

Algorithm 1
1: Input: samples $\\{x_t\\}_{t \\in S}$, number of types $r$, numbers of iterations $T_1, T_2$, graph $G([n], E)$
2: $\\{(\\hat{q}_a, \\hat{P}_a)\\}_{a \\in [r]} \\leftarrow$ SpectralDist($\\{x_t\\}_{t \\in S}, r, T_1$) (see Algorithm 2)
3: for $a = 1, \\dots, r$ do
4:   set $\\tilde{P}_a \\leftarrow \\mathcal{P}_{[-1,1]}(\\hat{P}_a)$, where $\\mathcal{P}_{[-1,1]}(\\cdot)$ is the projection onto $[-1,1]^N$
5:   $\\hat{w}^{(a)} \\leftarrow$ RankCentrality($G, \\tilde{P}_a, T_2$) (see Section 3.2)
6: end for
7: Output: $\\{(\\hat{q}_a, \\hat{w}^{(a)})\\}_{a \\in [r]}$

To achieve Theorem 1, $T_1 = \\Theta(\\log(N|S|))$ and $T_2 = \\Theta(b^2 d_{\\max}(\\log n + \\log(1/\\varepsilon))/(\\xi d_{\\min}))$ are sufficient. Next, we describe the two phases of the algorithm and the associated technical results.

3.1 Phase 1: Spectral decomposition.

To estimate $P$ and $q$ from the samples, we shall use tensor decompositions of $\\hat{M}_2$ and $\\hat{M}_3$, the empirical estimates of $M_2$ and $M_3$ respectively; recall (4)-(6). Let $M_2 = U_{M_2} \\Sigma_{M_2} U_{M_2}^T$ be the eigenvalue decomposition and let $H = M_3[U_{M_2} \\Sigma_{M_2}^{-1/2}, U_{M_2} \\Sigma_{M_2}^{-1/2}, U_{M_2} \\Sigma_{M_2}^{-1/2}]$. The next theorem shows that $M_2$ and $M_3$ are sufficient to learn $P$ and $q$ exactly when $M_2$ has rank $r$ (throughout, we assume that $r \\ll n \\le N$).

Theorem 2 (Theorem 3.1 [12]). Let $M_2 \\in \\mathbb{R}^{N \\times N}$ have rank $r$. Then there exist an orthogonal matrix $V_H = [v_{H1} \\, v_{H2} \\dots v_{Hr}] \\in \\mathbb{R}^{r \\times r}$ and eigenvalues $\\lambda_{Ha}$, $1 \\le a \\le r$, such that the orthogonal tensor decomposition of $H$ is $H = \\sum_{a=1}^{r} \\lambda_{Ha} (v_{Ha} \\otimes v_{Ha} \\otimes v_{Ha})$. Let $\\Lambda_H = \\mathrm{diag}(\\lambda_{H1}, \\dots, \\lambda_{Hr})$. Then the parameters of the mixture distribution are $P = U_{M_2} \\Sigma_{M_2}^{1/2} V_H \\Lambda_H$ and $Q = (\\Lambda_H)^{-2}$.

The main challenge in estimating $M_2$ (resp. $M_3$) from empirical data is the diagonal entries. In [12], an alternating minimization approach for matrix completion is used to find the missing diagonal entries of $M_2$, and a least squares method is used for estimating the tensor $H$ directly from the samples. Let $\\Omega_2$ denote the set of off-diagonal indices of an $N \\times N$ matrix and $\\Omega_3$ the set of off-diagonal indices of an $N \\times N \\times N$ tensor, with the corresponding projections defined as $\\mathcal{P}_{\\Omega_2}(M)_{ij} \\equiv M_{ij}$ if $i \\ne j$ and 0 otherwise, and $\\mathcal{P}_{\\Omega_3}(T)_{ijk} \\equiv T_{ijk}$ if $i \\ne j$, $j \\ne k$, $k \\ne i$, and 0 otherwise, for $M \\in \\mathbb{R}^{N \\times N}$ and $T \\in \\mathbb{R}^{N \\times N \\times N}$. In view of the above discussion, we shall use $\\mathcal{P}_{\\Omega_2}(\\hat{M}_2)$ and $\\mathcal{P}_{\\Omega_3}(\\hat{M}_3)$ to obtain estimates of the diagonal entries of $M_2$ and $M_3$ respectively. To keep the technical arguments simple, we shall compute $\\hat{M}_2$ from the first $|S|/2$ samples, denoted $\\hat{M}_2(1, |S|/2)$, and $\\hat{M}_3$ from the second $|S|/2$ samples, denoted $\\hat{M}_3(|S|/2+1, |S|)$, in Algorithm 2. Next, we state the correctness of Algorithm 2 when $\\mu(M_2)$ is small; the proof is in the Appendix.

Theorem 3. There exist universal, strictly positive constants $C, C_0 > 0$ such that for all $\\varepsilon \\in (0, C)$ and $\\delta \\in (0,1)$, if

$|S| \\ge C_0 \\frac{r N^4 \\log(N/\\delta)}{q_{\\min} \\sigma_1(M_2)^2 \\varepsilon^2} \\Big( \\frac{1}{\\ell^2} + \\frac{\\sigma_1(M_2)}{\\ell N} + \\frac{r^4 \\sigma_1(M_2)^4}{\\sigma_r(M_2)^5} \\Big)$, and $N \\ge C_0 r^{3.5} \\mu^6 \\Big( \\frac{\\sigma_1(M_2)}{\\sigma_r(M_2)} \\Big)^{4.5}$,

then there exists a permutation $\\pi$ over $[r]$ such that Algorithm 2, with the choice $T = C_0 \\log(N|S|)$, achieves the following bounds for all $i \\in [r]$, with probability at least $1 - \\delta$:

$|\\hat{q}_{\\pi_i} - q_i| \\le \\varepsilon$, and $\\|\\hat{P}_{\\pi_i} - P_i\\| \\le \\varepsilon \\sqrt{\\frac{r \\, q_{\\max} \\, \\sigma_1(M_2)}{q_{\\min}}}$,

where $\\mu = \\mu(M_2)$ is defined in (8), with run-time poly($N, r, 1/q_{\\min}, 1/\\varepsilon, \\log(1/\\delta), \\sigma_1(M_2)/\\sigma_r(M_2)$).

Algorithm 2 SpectralDist: moment method for mixtures of discrete distributions [12]
1: Input: samples $\\{x_t\\}_{t \\in S}$, number of types $r$, number of iterations $T$
2: $\\tilde{M}_2 \\leftarrow$ MatrixAltMin($\\hat{M}_2(1, |S|/2), r, T$) (see Algorithm 3)
3: compute the eigenvalue decomposition $\\tilde{M}_2 = \\tilde{U}_{M_2} \\tilde{\\Sigma}_{M_2} \\tilde{U}_{M_2}^T$
4: $\\tilde{H} \\leftarrow$ TensorLS($\\hat{M}_3(|S|/2+1, |S|), \\tilde{U}_{M_2}, \\tilde{\\Sigma}_{M_2}$) (see Algorithm 4)
5: compute the rank-$r$ decomposition $\\sum_{a \\in [r]} \\hat{\\lambda}_{\\tilde{H}a} (\\hat{v}_{\\tilde{H}a} \\otimes \\hat{v}_{\\tilde{H}a} \\otimes \\hat{v}_{\\tilde{H}a})$ of $\\tilde{H}$, using the robust tensor power method (RTPM) of [2]
6: Output: $\\hat{P} = \\tilde{U}_{M_2} \\tilde{\\Sigma}_{M_2}^{1/2} \\hat{V}_{\\tilde{H}} \\hat{\\Lambda}_{\\tilde{H}}$ and $\\hat{Q} = (\\hat{\\Lambda}_{\\tilde{H}})^{-2}$, where $\\hat{V}_{\\tilde{H}} = [\\hat{v}_{\\tilde{H}1} \\dots \\hat{v}_{\\tilde{H}r}]$ and $\\hat{\\Lambda}_{\\tilde{H}} = \\mathrm{diag}(\\hat{\\lambda}_{\\tilde{H}1}, \\dots, \\hat{\\lambda}_{\\tilde{H}r})$

Algorithm 3 MatrixAltMin: alternating minimization for matrix completion [12]
1: Input: $\\hat{M}_2(1, |S|/2)$, $r$, $T$
2: initialize the $N \\times r$ matrix $U_0 \\leftarrow$ top-$r$ eigenvectors of $\\mathcal{P}_{\\Omega_2}(\\hat{M}_2(1, |S|/2))$
3: for $\\tau = 1$ to $T - 1$ do
4:   $\\hat{U}_{\\tau+1} = \\arg\\min_U \\|\\mathcal{P}_{\\Omega_2}(\\hat{M}_2(1, |S|/2)) - \\mathcal{P}_{\\Omega_2}(U U_{\\tau}^T)\\|_F^2$
5:   $[U_{\\tau+1}, R_{\\tau+1}] = \\mathrm{QR}(\\hat{U}_{\\tau+1})$ (standard QR decomposition)
6: end for
7: Output: $\\tilde{M}_2 = (\\hat{U}_T)(U_{T-1})^T$

Algorithm 4 TensorLS: least squares method for tensor estimation [12]
1: Input: $\\hat{M}_3(|S|/2+1, |S|)$, $\\hat{U}_{M_2}$, $\\hat{\\Sigma}_{M_2}$
2: define the operator $\\hat{\\nu} : \\mathbb{R}^{r \\times r \\times r} \\to \\mathbb{R}^{N \\times N \\times N}$ as $\\hat{\\nu}_{ijk}(Z) = \\sum_{abc} Z_{abc} (\\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{1/2})_{ia} (\\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{1/2})_{jb} (\\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{1/2})_{kc}$ if $i \\ne j \\ne k \\ne i$, and 0 otherwise   (13)
3: define $\\hat{A} : \\mathbb{R}^{r \\times r \\times r} \\to \\mathbb{R}^{r \\times r \\times r}$ s.t. $\\hat{A}(Z) = \\hat{\\nu}(Z)[\\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{-1/2}, \\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{-1/2}, \\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{-1/2}]$
4: Output: $\\arg\\min_Z \\|\\hat{A}(Z) - \\mathcal{P}_{\\Omega_3}(\\hat{M}_3(|S|/2+1, |S|))[\\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{-1/2}, \\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{-1/2}, \\hat{U}_{M_2} \\hat{\\Sigma}_{M_2}^{-1/2}]\\|_F^2$

3.2 Phase 2: RankCentrality.

Recall that $E = \\{(i_k, j_k) : i_k \\ne j_k \\in [n], 1 \\le k \\le N\\}$ represents the collection of $N = |E|$ pairs and $G = ([n], E)$ is the corresponding graph. Let $\\tilde{P}_a$ denote the estimate of $P_a = [P_{ka}] \\in [-1,1]^N$ for mixture component $a$, $1 \\le a \\le r$, where $P_{ka}$ is defined as per (1). For each $a$, using $G$ and $\\tilde{P}_a$, we use RankCentrality [19] to obtain an estimate of $w^{(a)}$. Next, we describe the algorithm and the guarantees associated with it. Without loss of generality, we can assume that $w^{(a)}$ is normalized so that $\\sum_i w^{(a)}_i = 1$ for all $a$, $1 \\le a \\le r$. Given this normalization, RankCentrality estimates $w^{(a)}$ as the stationary distribution of an appropriate Markov chain on $G$. The transition probabilities are 0 for all $(i,j) \\notin E$. For $(i,j) \\in E$, they are a function of $\\tilde{P}_a$. Specifically, the transition matrix $\\tilde{p}^{(a)} = [\\tilde{p}^{(a)}_{i,j}] \\in [0,1]^{n \\times n}$ has $\\tilde{p}^{(a)}_{i,j} = 0$ if $(i,j) \\notin E$, and for $(i_k, j_k) \\in E$, $1 \\le k \\le N$,

$\\tilde{p}^{(a)}_{i_k, j_k} = \\frac{1}{d_{\\max}} \\frac{(1 + \\tilde{P}_{ka})}{2}$ and $\\tilde{p}^{(a)}_{j_k, i_k} = \\frac{1}{d_{\\max}} \\frac{(1 - \\tilde{P}_{ka})}{2}$.   (14)

Finally, $\\tilde{p}^{(a)}_{i,i} = 1 - \\sum_{j \\ne i} \\tilde{p}^{(a)}_{i,j}$ for all $i \\in [n]$. Let $\\tilde{\\pi}^{(a)} = [\\tilde{\\pi}^{(a)}_i]$ be a stationary distribution of the Markov chain defined by $\\tilde{p}^{(a)}$. That is,

$\\tilde{\\pi}^{(a)}_i = \\sum_j \\tilde{p}^{(a)}_{ji} \\tilde{\\pi}^{(a)}_j$ for all $i \\in [n]$.   (15)

Computationally, we suggest estimating $\\tilde{\\pi}$ by power iteration for $T$ iterations. As argued before, cf. [19], $T = \\Theta(b^2 d_{\\max}(\\log n + \\log(1/\\varepsilon))/(\\xi d_{\\min}))$ is sufficient to obtain a reasonably good estimate of $\\tilde{\\pi}$. The underlying assumption here is that there is a unique stationary distribution, which is established by our result under the conditions of Theorem 1. Now $\\tilde{p}^{(a)}$ is an approximation of the ideal transition probabilities $p^{(a)} = [p^{(a)}_{i,j}]$, where $p^{(a)}_{i,j} = 0$ if $(i,j) \\notin E$ and $p^{(a)}_{i,j} \\propto w^{(a)}_j/(w^{(a)}_i + w^{(a)}_j)$ for all $(i,j) \\in E$. Such an ideal Markov chain is reversible, and as long as $G$ is connected (which it is, in our case, by choice), the stationary distribution of this ideal chain is $\\pi^{(a)} = w^{(a)}$ (recall that $w^{(a)}$ is normalized so that its components sum to 1). In what follows, we state a result about how this approximation error translates into the error between $\\tilde{\\pi}^{(a)}$ and $w^{(a)}$. Recall that $b \\equiv \\max_{i,j \\in [n]} w_i/w_j$, $d_{\\max}$ and $d_{\\min}$ are the maximum and minimum vertex degrees of $G$, and $\\xi$ is defined in (9).

Theorem 4. Let $G = ([n], E)$ be non-bipartite and connected. Let $\\|\\tilde{p}^{(a)} - p^{(a)}\\|_2 \\le \\varepsilon$ for some positive $\\varepsilon \\le (1/4) \\, \\xi \\, b^{-5/2} (d_{\\min}/d_{\\max})$. Then, for some positive universal constant $C$,

$\\frac{\\|\\tilde{\\pi}^{(a)} - w^{(a)}\\|}{\\|w^{(a)}\\|} \\le \\frac{C \\, b^{5/2}}{\\xi} \\frac{d_{\\max}}{d_{\\min}} \\, \\varepsilon$.   (16)

And, starting from any initial condition, power iteration produces an estimate of $\\tilde{\\pi}^{(a)}$ within twice the above stated error bound in $T = \\Theta(b^2 d_{\\max}(\\log n + \\log(1/\\varepsilon))/(\\xi d_{\\min}))$ iterations.

The proof of the above result can be found in the Appendix. For a spectral expander (e.g. a connected Erdős-Rényi graph, with high probability), $\\xi = \\Theta(1)$ and therefore the bound is effectively $O(\\varepsilon)$ for bounded dynamic range, i.e. $b = O(1)$.

4 Discussion

Learning a distribution over permutations of $n$ objects from partial observations is fundamental to many domains. In this work, we have advanced the understanding of this question by characterizing sufficient conditions, and an associated algorithm, under which it is feasible to learn the mixed MNL model in a computationally and statistically efficient (polynomial in the problem size) manner from partial/pair-wise comparisons. The conditions are natural: the mixture components should be \u201cidentifiable\u201d given partial preference/comparison data, stated in terms of full rank and incoherence conditions on the second moment matrix. The algorithm allows learning of the mixture components as long as the number of components scales as $o(n^{2/7})$ for distributions over permutations of $n$ objects. To the best of our knowledge, this work provides the first such sufficient condition for learning the mixed MNL model: a problem that has remained open in econometrics and statistics for a while, and more recently in machine learning. Our work nicely complements the impossibility results of [1]. Analytically, our work advances the recently popularized spectral/tensor approach for learning mixture models from lower order moments. Concretely, we provide means to learn the components even when only partial information about each sample is available, unlike the prior works. To learn the model parameters, once we identify the moments associated with each mixture component, we advance the result of [19] in its applicability. Spectral methods have also been applied to ranking in the context of assortment optimization in [5].

References

[1] A. Ammar, S. Oh, D. Shah, and L. Voloch. What's your choice? Learning the mixed multinomial logit model. In Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, 2014.
[2] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models. CoRR, abs/1210.7559, 2012.
[3] H. Azari Soufiani, W. Chen, D. C. Parkes, and L. Xia. Generalized method-of-moments for rank aggregation. In Advances in Neural Information Processing Systems 26, pages 2706-2714, 2013.
[4] H. Azari Soufiani, D. Parkes, and L. Xia. Computing parametric ranking models via rank-breaking. In Proceedings of The 31st International Conference on Machine Learning, pages 360-368, 2014.
[5] J. Blanchet, G. Gallego, and V. Goyal. A Markov chain approximation to choice modeling. In EC, pages 103-104, 2013.
[6] B. Bollobás. Random Graphs. Cambridge University Press, January 2001.
[7] R. A. Bradley and M. E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324-345, 1952.
[8] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[9] C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1):1-46, 1970.
[10] J. C. Duchi, L. Mackey, and M. I. Jordan. On the consistency of ranking algorithms. In Proceedings of the ICML Conference, Haifa, Israel, June 2010.
[11] V. F. Farias, S. Jagabathula, and D. Shah. A data-driven approach to modeling choice. In NIPS, pages 504-512, 2009.
[12] P. Jain and S. Oh. Learning mixtures of discrete product distributions using spectral decompositions. arXiv preprint arXiv:1311.2972, 2014.
[13] L. R. Ford Jr. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28-33, 1957.
[14] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980-2998, 2010.
[15] R. H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. The Journal of Machine Learning Research, 99:2057-2078, 2010.
[16] D. R. Luce. Individual Choice Behavior. Wiley, New York, 1959.
[17] D. McFadden. Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics, pages 105-142, 1973.
[18] I. Mitliagkas, A. Gopalan, C. Caramanis, and S. Vishwanath. User rankings from comparisons: Learning permutations in high dimensions. In Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on, pages 1143-1150. IEEE, 2011.
[19] S. Negahban, S. Oh, and D. Shah. Iterative ranking from pair-wise comparisons. In NIPS, pages 2483-2491, 2012.
[20] S. Negahban and M. J. Wainwright. Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 2012.
[21] P. Samuelson. A note on the pure theory of consumers' behaviour. Economica, 5(17):61-71, 1938.
[22] H. A. Soufiani, D. C. Parkes, and L. Xia. Random utility theory for social choice. In NIPS, pages 126-134, 2012.
[23] L. L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273, 1927.
[24] J. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 2011.
[25] E. Zermelo. Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift, 29(1):436-460, 1929.
", "award": [], "sourceid": 406, "authors": [{"given_name": "Sewoong", "family_name": "Oh", "institution": "UIUC"}, {"given_name": "Devavrat", "family_name": "Shah", "institution": "Massachusetts Institute of Technology"}]}