{"title": "iSplit LBI: Individualized Partial Ranking with Ties via Split LBI", "book": "Advances in Neural Information Processing Systems", "page_first": 3901, "page_last": 3911, "abstract": "Due to the inherent uncertainty of data, the problem of predicting partial ranking from pairwise comparison data with ties has attracted increasing interest in recent years. However, in real-world scenarios, different individuals often hold distinct preferences, thus might be misleading to merely look at a global partial ranking while ignoring personal diversity. In this paper, instead of learning a global ranking which is agreed with the consensus, we pursue the tie-aware partial ranking from an individualized perspective. Particularly, we formulate a unified framework which not only can be used for individualized partial ranking prediction, but can also be helpful for abnormal users selection. This is realized by a variable splitting-based algorithm called iSplit LBI. Specifically, our algorithm generates a sequence of estimations with a regularization path, where both the hyperparameters and model parameters are updated. At each step of the path, the parameters can be decomposed into three orthogonal parts, namely, abnormal signals, personalized signals and random noise. The abnormal signals can serve the purpose of abnormal user selection, while the abnormal signals and personalized signals \ntogether are mainly responsible for user partial ranking prediction. Extensive experiments on simulated and real-world datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives.", "full_text": "iSplit LBI: Individualized Partial Ranking\n\nwith Ties via Split LBI\n\nQianqian Xu1\nXiaochun Cao3,4,7 Qingming Huang1,5,6,7 Yuan Yao8\n\nXinwei Sun2\n\nZhiyong Yang3,4\n\n1Key Lab. 
of Intelligent Information Processing, Institute of Computing Technology, CAS
2Microsoft Research Asia
3State Key Laboratory of Information Security, Institute of Information Engineering, CAS
4School of Cyber Security, University of Chinese Academy of Sciences
5School of Computer Science and Tech., University of Chinese Academy of Sciences
6Key Laboratory of Big Data Mining and Knowledge Management, CAS
7Peng Cheng Laboratory
8Department of Mathematics, Hong Kong University of Science and Technology

xuqianqian@ict.ac.cn, xinsun@microsoft.com, yangzhiyong@iie.ac.cn, caoxiaochun@iie.ac.cn, qmhuang@ucas.ac.cn, yuany@ust.hk

Abstract

Due to the inherent uncertainty of data, the problem of predicting partial ranking from pairwise comparison data with ties has attracted increasing interest in recent years. However, in real-world scenarios, different individuals often hold distinct preferences. It might be misleading to merely look at a global partial ranking while ignoring personal diversity. In this paper, instead of learning a global ranking that agrees with the consensus, we pursue tie-aware partial ranking from an individualized perspective. In particular, we formulate a unified framework which not only can be used for individualized partial ranking prediction, but can also be helpful for abnormal user selection. This is realized by a variable-splitting-based algorithm called iSplit LBI. Specifically, our algorithm generates a sequence of estimations along a regularization path, where both the hyperparameters and model parameters are updated. At each step of the path, the parameters can be decomposed into three orthogonal parts, namely, abnormal signals, personalized signals and random noise. The abnormal signals serve the purpose of abnormal user selection, while the abnormal signals and personalized signals together are mainly responsible for individual partial ranking prediction. 
Extensive experiments on simulated and real-world datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives.

1 Introduction

The flourishing of various online crowdsourcing services (e.g., Amazon Mechanical Turk) presents us with an effective way to distribute tasks to human workers around the world, on-demand and at scale. Recently, a plethora of pairwise comparison data has arisen in crowdsourcing experiments on the Internet [16, 2], ranging from marketing and advertisements to competitions and elections. Information of this kind is all around us: which college a student selected, who won the chess match, which movie a user watched, etc. How to aggregate the massive amount of personalized pairwise comparison data to reveal the global preference function has been an important topic over the last decades [4, 11, 26, 20, 2, 22].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

But is the aggregated result necessarily more important than individual opinions? This is not always the case, especially when the Internet is flooded with diverse personalized information. The disagreement over the crowd cannot simply be interpreted as a random perturbation of a consensus that everybody should follow. For example, we often observe quite different preferences on a college ranking or a favorite movie list. Hence a wave of personalized ranking has arisen in recent years in search of better individualized models. One line of related research assumes that the ranking function is determined by a small number of underlying intrinsic functions, such that every individual's personalized preference is a linear combination of these intrinsic functions [30, 19, 12, 6]. 
Another line of research attributes the personalized bias to user quality, where either a single parameter or a general confusion matrix is adopted to model the users' ability to provide a correct label [7, 13, 15, 23, 25, 33]. There is also a trend to explore personalized ranking effects in terms of preference distributions [18, 17]. Moreover, [28, 27] take a wider spectrum by considering both the social preference and individual variations simultaneously. Specifically, they design a basic linear mixed-effect model which not only derives the common preference at the population level, but also estimates each user's preference/utility deviation at the individual level.

Figure 1: An example of pairwise ranking with ties.

All the work mentioned above either focuses on instance-wise preference learning or assumes that the candidates are comparable in a total order. For pairwise preference learning, however, the answer might go beyond a win/loss option in real-world scenarios. The following gives an example in crowdsourced college ranking.

Example. In world college ranking with crowdsourcing platforms such as Allourideas, a participant is asked "which university (of the following two) would you rather attend?". As shown in Fig.1, let G = (V, E) be a pairwise ranking graph whose vertex set V is the set of universities to be ranked, and whose edge set E is the set of university pairs which receive some comparisons from users. Here different colors indicate different users. If a voter thinks college V3 is better than college V6, a solid arrowed line from V3 to V6 occurs (i.e., superiority). 
However, when a voter thinks the two colleges (i.e., V1 and V3) listed are incomparable and difficult to judge, he may click the button "I can't decide", and then a dotted line connecting V1 and V3 is drawn (i.e., tie).

Here, for a pair (i, j), if a voter believes i and j share a similar strength and neither one is superior to the other, he may abstain from this decision and leave it as a tie. An abstention of this kind is an obvious means to avoid unreliable predictions. Such pairwise comparison data, together with the "I cannot decide" decisions, provide us information about possible ties or equivalence classes of items in partial orders. Though there is some work in the literature studying how to organize information in partial orders of such tied subsets or equivalence classes (partitions, bucket orders) [5, 14], little has been done on learning individualized partial order models from such pairwise comparison data with ties.

In this paper, we aim to learn an individualized partial ranking model for each user based on such a pairwise ranking graph with ties. Based on the partial ranking, we could recommend universities for a specific user: for example, universities of the same quality as college A, or universities that are slightly better than college B. Moreover, another challenge of personalized preference ranking comes from the fact that abnormal users might exist in the crowd. They either bear an extremely different pattern from the majority of the crowd or are malicious users trying to attack the learning system. To deal with abnormal user detection in crowdsourced data, existing studies often take a majority voting strategy, which ignores the personalized effect.

Seeing the issues mentioned above, we propose a unified framework, called iSplit LBI, for personalized partial ranking, tie state recognition, and abnormal user detection. 
The merits of our framework are three-fold: 1) It decomposes the parameters into three orthogonal parts, namely, abnormal signals, personalized signals, and random noise. The abnormal signals serve the purpose of abnormal user detection, while the abnormal signals and personalized signals together are mainly responsible for user partial ranking prediction. 2) It provides a compatible framework between predicting individual preferences (i.e., model prediction) and identifying abnormal users (i.e., model selection) by virtue of a variable splitting scheme. 3) Exploiting the regularization path, it simultaneously searches hyper-parameters and model parameters. To the best of our knowledge, this is the first proposal of such a model in the literature on partial ranking.

2 Methodology

In crowdsourced pairwise comparison experiments, suppose there are n alternatives or items to be ranked. Traditionally, the pairwise comparison labels collected from users can be naturally represented as a directed comparison graph G = (V, E). Let V = {1, 2, . . . , n} be the vertex set of n items and E = {(u, i, j) : i, j ∈ V, u ∈ U} be the set of edges, where U is the set of all users who compared items. User u provides his/her preference between choices i and j, such that y^u_{ij} = 1 means u prefers i to j and y^u_{ij} = −1 otherwise.

However, in real-world applications, ties are ubiquitous. In this case, if a rater thinks neither of the two items in a pair is superior to the other, he/she may abstain from this decision and instead declare a tie, as shown with the red dotted line in Fig.1. This inspires us to adopt win/tie/lose user feedback in the following sense:

    y^u_{ij} =  1,   if u prefers i to j,
               −1,   if u prefers j to i,
                0,   otherwise.                  (1)

Given the definition of the user feedback, in the rest of this section we elaborate our proposed model in the following order. 
First, we propose a probabilistic model to describe the generation process of the comparison results y^u_{ij}. Then we present a simple iterative algorithm called individualized Split Linearized Bregman Iterations (i.e., iSplit LBI) for individualized partial ranking. In the end, we provide a decomposition property of iSplit LBI which dives deeper into the insights of our proposed model.

2.1 Probabilistic Model of Partial Ranking with Ties

Now we describe our dataset with the following notations. Suppose that we have U users and that a specific user u annotates n_u pairwise comparisons. For a specific comparison (i, j), the user provides a label y^u_{ij} following (1). We denote the set of all pairwise comparisons available for user u as O^u, and define the label set Y^u as:

    Y^u = { y^u_{ij} : (i, j) ∈ O^u }.           (2)

Then our dataset can be expressed as {O^u, Y^u}_{u=1}^U. We assume that each user has a personalized score list for all items. We denote the true personalized score list as s^u = [s^u_1, · · · , s^u_{n_{u_i}}], ∀u, where n_{u_i} is the number of items that are available for u. Furthermore, for any specific u, λ^u is a personalized threshold value to be learned for decision. Then, for a specific user u and a specific observation (i, j), we assume that y^u_{ij} is produced by comparing the score difference s^u_i − s^u_j with the threshold λ^u. Meanwhile, to model the randomness of the sampling and the decision-making process, we model the uncertainty of s^u_i − s^u_j with an associated random noise ε^u_{ij} which has a c.d.f. Φ(t). Then, in our model, user u would choose y^u_{ij} = 1 if the observed personalized score difference s^u_i − s^u_j + ε^u_{ij} is greater than the threshold λ^u. 
To the opposite, if s^u_i − s^u_j + ε^u_{ij} is smaller than −λ^u, then user u would choose y^u_{ij} = −1. Otherwise, s^u_i − s^u_j + ε^u_{ij} has a magnitude smaller than λ^u, in which case the user would claim a tie. Above all, y^u_{ij} is obtained from the following rule:

    y^u_{ij} =  1,   if s^u_i − s^u_j + ε^u_{ij} > λ^u;
               −1,   if s^u_i − s^u_j + ε^u_{ij} ≤ −λ^u;
                0,   else.                                   (3)

Furthermore, we define two variables ζ^{u+}_{ij} and ζ^{u−}_{ij} as:

    ζ^{u+}_{ij} = λ^u − s^u_i + s^u_j,
    ζ^{u−}_{ij} = −λ^u − s^u_i + s^u_j.                      (4)

Since ε^u_{ij} is a random variable with c.d.f. Φ, we can then derive the probability of observing y^u_{ij} = 1, 0, −1, respectively. Specifically, together with (3) and (4) we have:

    P{y^u_{ij} = 1}  = P{ε^u_{ij} > ζ^{u+}_{ij}}              = 1 − Φ(ζ^{u+}_{ij}),
    P{y^u_{ij} = 0}  = P{ζ^{u−}_{ij} ≤ ε^u_{ij} < ζ^{u+}_{ij}} = Φ(ζ^{u+}_{ij}) − Φ(ζ^{u−}_{ij}),
    P{y^u_{ij} = −1} = P{ε^u_{ij} ≤ ζ^{u−}_{ij}}              = Φ(ζ^{u−}_{ij}).

Note that a different Φ could lead to a different model. In this paper, we simply consider the most widely adopted Bradley-Terry model, Φ(t) = e^t / (1 + e^t), while leaving other models for future studies.

2.2 Individualized Split LBI

In our framework, we assume the majority of participants share a common preference interest and behave rationally, while deviations from that exist but are sparse. 
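Before moving on, the tie-aware generation rule of Section 2.1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and toy scores are ours, not the paper's, and the logistic c.d.f. below corresponds to the Bradley-Terry choice of Φ.

```python
import math

def phi(t):
    """Bradley-Terry c.d.f. Phi(t) = e^t / (1 + e^t), i.e. the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-t))

def label_probs(s_i, s_j, lam):
    """Return P{y=1}, P{y=0}, P{y=-1} for one pair under the tie-aware rule.

    zeta_plus / zeta_minus follow Eq. (4); the three probabilities then follow
    directly from the c.d.f. of the noise term, as in the display above.
    """
    zeta_plus = lam - s_i + s_j      # zeta^{u+}_{ij}
    zeta_minus = -lam - s_i + s_j    # zeta^{u-}_{ij}
    p_win = 1.0 - phi(zeta_plus)               # y = 1
    p_tie = phi(zeta_plus) - phi(zeta_minus)   # y = 0
    p_lose = phi(zeta_minus)                   # y = -1
    return p_win, p_tie, p_lose

# toy scores: item i clearly stronger than item j
p_win, p_tie, p_lose = label_probs(s_i=2.0, s_j=0.5, lam=1.0)
```

Note that the three probabilities always sum to one, and that equal scores give symmetric win/lose probabilities with a tie probability controlled by λ.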
To be specific, we consider the following linear model for an annotator's individualized partial ranking:

    s^u_i − s^u_j + ε^u_{ij} = (c_{s_i} + p^u_{s_i}) − (c_{s_j} + p^u_{s_j}) + ε^u_{ij},   s^u = c_s + p^u_s,   λ^u = c_λ + p^u_λ,   (5)

where (1) c_s and c_λ represent the consensus-level pattern: c_s is the common global ranking score and c_λ is the common λ, as a fixed effect; c_{s_i} and c_{s_j} are the i-th and j-th elements of c_s, respectively; (2) p^u_s and p^u_λ represent the individualized bias pattern: p^u_s is the annotator's preference deviation from the common ranking score c_s and p^u_λ is the individualized bias with respect to c_λ, as a random effect; p^u_{s_i} and p^u_{s_j} are the i-th and j-th elements of p^u_s, respectively; (3) ε^u_{ij} is the random noise.

To make the notation clear, let P^u_{1,ij} = 1 − Φ(ζ^{u+}_{ij}), P^u_{0,ij} = Φ(ζ^{u+}_{ij}) − Φ(ζ^{u−}_{ij}) and P^u_{−1,ij} = Φ(ζ^{u−}_{ij}); then we can represent P{y^u_{ij}} as:

    P{y^u_{ij}} = [P^u_{1,ij}]^{1{y^u_{ij}=1}} [P^u_{0,ij}]^{1{y^u_{ij}=0}} [P^u_{−1,ij}]^{1{y^u_{ij}=−1}}.

Given all the above, for a specific user u, it is easy to write out the negative log-likelihood:

    L(O^u, Y^u | s^u, λ^u) = − Σ_{(i,j)∈O^u} log P{y^u_{ij}},
    s.t. s^u = c_s + p^u_s, λ^u = c_λ + p^u_λ, λ^u ≥ δ, c_λ ≥ δ.   (6)

In the constraints we use λ^u ≥ δ, c_λ ≥ δ, where δ > 0, as closed and convex approximations of the positivity constraints λ^u > 0, c_λ > 0. 
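As an illustration, the negative log-likelihood of one user's comparisons can be sketched as follows. This is a minimal stand-in for the loss term, assuming the Bradley-Terry Φ; the helper names and toy data are ours.

```python
import math

def phi(t):
    """Bradley-Terry c.d.f. (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-t))

def neg_log_likelihood(scores, lam, comparisons):
    """Negative log-likelihood of one user's comparisons.

    scores:      personalized score list s^u (indexable by item id).
    lam:         personalized threshold lambda^u.
    comparisons: iterable of (i, j, y) with y in {1, 0, -1}.
    """
    nll = 0.0
    for i, j, y in comparisons:
        zeta_plus = lam - scores[i] + scores[j]    # zeta^{u+}_{ij}
        zeta_minus = -lam - scores[i] + scores[j]  # zeta^{u-}_{ij}
        if y == 1:
            p = 1.0 - phi(zeta_plus)
        elif y == 0:
            p = phi(zeta_plus) - phi(zeta_minus)
        else:
            p = phi(zeta_minus)
        nll -= math.log(p)
    return nll

scores = [1.5, 0.0, -1.0]                 # toy s^u for three items
obs = [(0, 1, 1), (1, 2, 0), (0, 2, 1)]   # toy O^u with labels
value = neg_log_likelihood(scores, lam=0.8, comparisons=obs)
```

A label consistent with the score gap yields a smaller loss contribution than an inconsistent one, which is what the optimization below exploits.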
The benefits of employing the relaxations are two-fold: 1) the closed-domain constraints induce closed-form solutions; 2) the threshold δ improves the quality of the solution by avoiding ill-conditioned cases too close to zero.

Obviously, the personalized bias cannot grow arbitrarily large. More reasonably, only highly personalized users have significant biases p^u_s and p^u_λ, while the majority of the mass tends to have smaller or even zero biases. If we denote P_s = {p^u_s : u = 1, · · · , U} and P_λ = {p^u_λ : u = 1, · · · , U}, this means that (P_s, P_λ) satisfies group sparsity, so we add a group lasso penalty J_μ(P_s, P_λ) to the loss function, which is of the form:

    J_μ(P_s, P_λ) = μ Σ_u || [p^u_s ; p^u_λ] ||,   μ > 0,   (7)

where μ is a regularization parameter. Such a structural penalty (7) can identify abnormal users u whose p^u_s and p^u_λ are nonzero. These nonzero terms increase the penalty function, so the corresponding reduction of the loss function L(O^u, Y^u | s^u, λ^u) must dominate the increasing penalty in order to minimize the overall objective. In this sense, the abnormal users capture the strong signals for individualized biases. However, this ignores the possibility that weak signals could also induce individualized biases. Such signals help to decrease the loss, but the reduction of loss is not strong enough to cover the penalty term. This motivates us to propose a variable splitting scheme to simultaneously embrace strong and weak patterns. Specifically, we model the overall signal (p^u_s, p^u_λ) as the sum of the strong signals (Γ^u_s, Γ^u_λ) and weak signals (Δ^u_s, Δ^u_λ) = (p^u_s, p^u_λ) − (Γ^u_s, Γ^u_λ). The group lasso penalty is exhibited on the strong signals. 
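The per-user group-lasso penalty above can be computed with a short NumPy sketch; the array layout (one row per user, threshold deviation appended as a last column) is our assumption for illustration.

```python
import numpy as np

def group_lasso_penalty(P_s, P_lam, mu):
    """J_mu(P_s, P_lam) = mu * sum_u || [p^u_s ; p^u_lam] ||_2, one group per user.

    P_s:   (U, n) array of per-user score deviations p^u_s.
    P_lam: (U,)   array of per-user threshold deviations p^u_lam.
    """
    stacked = np.hstack([P_s, P_lam[:, None]])         # (U, n + 1): one row per user
    return mu * np.linalg.norm(stacked, axis=1).sum()  # sum of per-user group norms

# user 0 deviates from the consensus, user 1 does not
P_s = np.array([[3.0, 4.0], [0.0, 0.0]])
P_lam = np.array([0.0, 0.0])
penalty = group_lasso_penalty(P_s, P_lam, mu=2.0)
```

Because the norm is taken per user, the penalty drives entire user rows to zero at once, which is what makes the support of the strong signal usable for abnormal-user selection.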
Moreover, we give the weak signals an ℓ2 penalty of the form

    S_ν(Γ, P) = (1 / 2ν) Σ_u || [p^u_s ; p^u_λ] − [Γ^u_s ; Γ^u_λ] ||²,

to avoid them being arbitrarily large.

Denote the parameter set as Θ = {P_s, P_λ, Γ_s, Γ_λ, c_s, c_λ}, and define Γ_s = {Γ^u_s : u = 1, · · · , U} and Γ_λ = {Γ^u_λ : u = 1, · · · , U}. The loss function is defined as:

    min_Θ  Σ_u L(O^u, Y^u | s^u, λ^u) + S_ν(Γ, P) + J_μ(Γ_s, Γ_λ)
    s.t.   s^u = c_s + p^u_s,  λ^u = c_λ + p^u_λ,  λ^u ≥ δ,  c_λ ≥ δ.   (8)

Instead of directly solving the above problem, we adopt the Split Linearized Bregman Iterations, which we call individualized Split LBI (iSplit LBI); this gives rise to a regularization path where both the model parameters and hyper-parameters are simultaneously evolved. 
The (k+1)-th iteration on such a path is given as:

    (c^{k+1}_s ; c^{k+1}_λ) = Prox_{J_c}( (c^k_s ; c^k_λ) − κ α_k ∇_c L(Θ^k) ),      (9a)
    (P^{k+1}_s ; P^{k+1}_λ) = Prox_{J_P}( (P^k_s ; P^k_λ) − κ α_k ∇_P L(Θ^k) ),      (9b)
    (Z^{k+1}_s ; Z^{k+1}_λ) = (Z^k_s ; Z^k_λ) − α_k ∇_Γ L(Θ^k),                      (9c)
    (Γ^{k+1}_s ; Γ^{k+1}_λ) = κ · Prox_{J_μ}( (Z^{k+1}_s ; Z^{k+1}_λ) ),  μ = 1,     (9d)

where the initial choice is c^0_s = 0 ∈ R^p, c^0_λ = 1 ∈ R, P^0_s = Z^0_s = 0 ∈ R^{U×p}, P^0_λ = Z^0_λ = 0 ∈ R^U; κ > 0, α_k > 0 and ν > 0 are parameters; and the proximal map associated with a convex function h is defined by Prox_h(z) = arg min_x ||z − x||²/2 + h(x). Here J_c(c_λ) and J_P(P_λ) denote the indicator functions of the sets {c_λ ≥ δ} and {λ^u ≥ δ, ∀u}, respectively (an indicator function of a set is 0 when the input variable is in the set, and +∞ otherwise). Hence, at each iteration, the first two steps perform a projected gradient descent on c_λ and P_λ, which keeps the variables feasible.

The iSplit LBI algorithm generates a regularized solution path of dense estimators (c^k_s, c^k_λ, P^k_s, P^k_λ) and sparse estimators (~P^k_s, ~P^k_λ). The sparse estimators are obtained by projecting (P_s, P_λ) onto the support set of (Γ_s, Γ_λ). Along the path, the stopping time τ_k = Σ_{i=1}^k α_i plays the same role as the regularization parameter in the lasso problem. 
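As a schematic illustration, one iteration of (9a)-(9d) might look as follows. This is a simplified sketch under several assumptions: the gradients are supplied by the caller (the real algorithm derives them from the likelihood of Section 2.1), the λ-components are stored as the last coordinate/column, and the group structure of Prox_{J_μ} is flattened to one row per user.

```python
import numpy as np

def prox_group_lasso(Z, mu=1.0):
    """Group-wise soft-thresholding: z^u -> z^u * max(0, 1 - mu / ||z^u||)."""
    out = np.zeros_like(Z)
    for u in range(Z.shape[0]):
        norm = np.linalg.norm(Z[u])
        if norm > mu:
            out[u] = Z[u] * (1.0 - mu / norm)
    return out

def project_lower_bound(x, delta):
    """Prox of the indicator of {x >= delta}: clip from below (used for c_lam, lam^u)."""
    return np.maximum(x, delta)

def isplit_lbi_step(c, P, Z, grad_c, grad_P, grad_Gamma, kappa, alpha, delta):
    """One (k+1)-th iteration in the spirit of (9a)-(9d), gradients given by caller.

    c and P carry their lambda-components in the last coordinate / column,
    which must stay >= delta (the projected gradient steps of (9a)-(9b)).
    """
    c_new = c - kappa * alpha * grad_c                    # (9a) step on consensus
    c_new[-1] = project_lower_bound(c_new[-1], delta)
    P_new = P - kappa * alpha * grad_P                    # (9b) step on dense signals
    P_new[:, -1] = project_lower_bound(P_new[:, -1], delta)
    Z_new = Z - alpha * grad_Gamma                        # (9c) update of the dual variable
    Gamma_new = kappa * prox_group_lasso(Z_new, mu=1.0)   # (9d) sparse estimator
    return c_new, P_new, Z_new, Gamma_new
```

The key design point is (9c)-(9d): Z accumulates gradient information, and only users whose accumulated signal exceeds the group threshold enter the support of Γ, which is how the path moves from sparse to dense models.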
In fact, Eq.(9a)-(9d) describe one iteration of the optimization process, which is actually a discretization of the dynamical system shown in [10]. Such dynamical systems are known as inverse scale spaces [1, 21, 9], yielding a regularization path consisting of sparse models at different levels, from the null model to the full one. At iteration k, the cumulative time τ_k can be regarded as the inverse of the lasso regularization parameter (here roughly τ_k ∼ 1/μ): the larger τ_k is, the smaller the regularization, and hence the more nonzero parameters enter the model. Following the dynamics, the model gradually grows from sparse to dense with increasing complexity. In particular, as τ_k → ∞, the dynamics may reach over-fitting models when noise exists, as in our case, equivalent to a full model in the generalized lasso with minimal regularization. To prevent such over-fitting in noisy applications, we adopt an early stopping strategy and find an optimal stopping time by cross-validation.

Moreover, ν also plays an important role in the model. When ν → 0, only sparse strong signals (features) are kept in the model, and iSplit LBI reduces to the LBI algorithm, which is shown to reach model selection consistency under nearly the same condition as LASSO for linear models [21]. Recently, it was shown in [10] that model selection consistency can also hold under non-linear models. With a finite value of ν, it is shown in [8, 9] that the sparse estimator enjoys improved model selection consistency. Moreover, equipped with the variable splitting scheme, a finite value of ν enables the overall signals (here P) to capture features ignored by the strong (sparse) signals. It has been shown in the literature (e.g. [24, 32]), which coincides with our discussion, that such features can improve prediction in various tasks. 
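The early-stopping selection along the path can be sketched as follows. The path/metric representations here are hypothetical stand-ins: in practice the path holds iSplit LBI models at cumulative times τ_k and the score comes from cross-validation.

```python
def choose_stopping_time(path, validate):
    """Pick the model on the regularization path with the best validation score.

    path:     list of (tau_k, model) pairs produced along the path.
    validate: callable model -> validation score (higher is better).
    Both arguments are assumptions standing in for the cross-validation setup.
    """
    best_tau, best_model, best_score = None, None, float("-inf")
    for tau, model in path:
        score = validate(model)
        if score > best_score:
            best_tau, best_model, best_score = tau, model, score
    return best_tau, best_model

# toy path: validation score rises, then falls as tau grows (over-fitting)
toy_path = [(0.1, "sparse"), (1.0, "medium"), (10.0, "dense")]
toy_scores = {"sparse": 0.6, "medium": 0.8, "dense": 0.7}
tau_star, model_star = choose_stopping_time(toy_path, toy_scores.get)
```

The toy scores mimic the typical shape of the path: too-early models under-fit, too-late models over-fit, and the optimal stopping time sits in between.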
Now we note the following implementation details of iSplit LBI. The hyper-parameter κ is a damping factor which determines the bias of the sparse estimators, a bigger κ leading to less biased estimators (bias-free as κ → ∞). The hyper-parameter α_k is the step size which determines the precision of the path, with a large α_k rapidly traversing a coarse-grained path. However, one has to keep α_k κ small to avoid possible oscillations of the path, e.g. α_k κ ≤ 2 / ||∇²L(Θ^k)||_2. The default choice in this paper is α_k = 1 / (κ ||∇²L(Θ^k)||_2), as a tradeoff between performance and computation cost.

2.3 Decomposition Property of iSplit LBI

By virtue of the variable splitting term, the dense parameter P enjoys a specific orthogonal decomposition property, as shown in Fig.2:

    P = P_abn ⊕ P_per ⊕ P_ran.

Figure 2: Decomposition of personalized parameters.

(1) P_abn is simply ~P, i.e., the projection of P onto the support set of Γ. In other words, (P_abn)_{ij} = P_{ij} if Γ_{ij} ≠ 0, and (P_abn)_{ij} = 0 otherwise. Users corresponding to the nonzero columns of P_abn have significant biases toward the popular scores c_s and the common threshold c_λ. Thus the structure of P_abn tells us who is an abnormal user in the crowd. In this sense, we refer to P_abn as the abnormal signal; this corresponds to the strong signals of the last subsection. (2) Among the remainder of this projection, P_per stands for the elements having a more significant magnitude than random noise. This component drives the dense parameter P further away from the sparse parameter ~P. According to the discussion in the previous subsection, this component takes into consideration the weak signals that help to further reduce the loss function. In this sense, including P_per brings better performance to P. 
(3) The remaining entries in P are referred to as P_ran, i.e., the random noise, which is inevitable due to the randomness of the data.

With all the above, we present a compatible framework for both model prediction and model selection: (1) the overall signal (P_s, P_λ) contains all the personalized biases, which makes it a better choice for model prediction; (2) (Γ_s, Γ_λ) and P_abn exclude the weak and dense personalized signals from the overall signals, which makes them a natural choice for abnormal user identification using model selection. This motivates us to take advantage of the support set of ~P to detect abnormal users, while utilizing P for prediction.

3 Experiments

3.1 Simulated Study

Settings. We validate our algorithm on simulated data with n = |V| = 20 items and U = 50 annotators. We first generate the true common ranking scores c_s ∼ N(0, 5²). Then each annotator has a probability p1 = 0.2 of having a nonzero p^u_s. Those nonzero p^u_s are drawn randomly from N(0, 5²). If p^u_s is nonzero, we generate p^u_λ ∼ c_λ · U(−0.5, 0.5); otherwise we simply set p^u_λ = 0, where c_λ = 1.5. At last, we draw N^u samples for each user randomly following the Bradley-Terry model. The sample number N^u uniformly spans [N1, N2] = [200, 400]. Finally, we obtain a multi-edge graph with ties annotated by 50 annotators.

Abnormal User Detection. In this part, we validate the abnormal user detection ability of iSplit LBI with a visualization analysis. As we have stated, the support set of P (or ~P, equivalently) indicates the abnormal users. In this sense, we visualize ~P (the ground-truth parameters) and ~P⁰ (the estimated parameters) in Fig.4 (a)-(b), whereas we visualize the magnitude of ~P (i.e. |~P|) and 
|~P⁰| in Fig.4 (c)-(d). Although the magnitude of ~P⁰ tends to be smaller than that of the true parameter, the results in Fig.4 (a)-(b) clearly suggest a perfect detection of the abnormal users. Furthermore, Fig.5 shows the L2-distance ||s^u − c_s|| between each user's individualized ranking (i.e., s^u) and the common ranking (i.e., c_s). Clearly, one can see that the abnormal users we detected all exhibit a larger L2-distance to the common ranking than the other users. This indicates that the 13 abnormal users detected are those with large deviations from the population's opinion.

Prediction Ability. After showing the successful detection of abnormal users, in the following we exhibit the prediction ability of the proposed iSplit LBI method.

(1) Evaluation metrics: We measure the experimental results via two evaluation criteria, Macro-F1 and Micro-F1 over the three classes -1, 0, 1, which take both precision and recall into account. Note that the larger the value of Micro-F1 and Macro-F1, the better the performance. For more details, please refer to [31].

Figure 4: Visualization of the parameters ((a) Supp(~P); (b) Supp(~P⁰); (c)-(d) their magnitudes).

Figure 5: Detected abnormal users.

(2) Competitors: We employ two competitors that share most of the problem settings with iSplit LBI. i) The α-cut algorithm [3] is an early trial of common partial ranking. Since α-cut is an ensemble-based algorithm, its performance depends on the choice of weak learners. Consequently, we compare our proposed algorithm with the α-cut algorithm where different types of such weak learners and regularization schemes are adopted. 
Regarding the parameter tuning of the weak learners in α-cut, we tune the coefficients for Ridge/LASSO regularization over the range {2^{−15}, 2^{−13}, · · · , 2^{−5}}, and the best parameters are picked out through 5-fold cross-validation on the training set. ii) A recently developed margin-based MLE method [29], where the Uniform, Bradley-Terry, and Thurstone-Mosteller models are considered, respectively.

(3) Qualitative Results: Tab.1 shows the corresponding performance of our proposed algorithm and the competitors. In this table, the second column shows the weak learners and regularization terms employed in α-cut and the three models adopted in the MLE-based algorithm. Specifically, LR stands for logistic regression, SVM for the Support Vector Machine, LS for the method of least squares, and SVR for Support Vector Regression. For regularization, we employ the Ridge and LASSO regularization terms. Here we split the data into a training set (80% of each user's pairwise comparisons) and a testing set (the remaining 20%). 
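A per-user 80/20 split of this kind can be sketched as follows; the dict-of-lists data layout is our assumption, not the paper's actual code.

```python
import random

def per_user_split(data, train_frac=0.8, seed=0):
    """Split each user's comparisons into train/test sets, per user.

    data: {user: [(i, j, y), ...]} -- a hypothetical layout of the
    pairwise comparisons; each user keeps train_frac of their own labels
    for training, mirroring the 80%/20% protocol described above.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for u, comps in data.items():
        comps = comps[:]                 # copy before shuffling
        rng.shuffle(comps)
        k = int(round(train_frac * len(comps)))
        train[u], test[u] = comps[:k], comps[k:]
    return train, test

data = {0: [(0, 1, 1)] * 10, 1: [(1, 2, 0)] * 5}
train, test = per_user_split(data)
```

Splitting within each user (rather than across users) matters here, because every user needs both training and testing comparisons for an individualized model to be learned and evaluated.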
To ensure statistical stability, we repeat this procedure 20 times. It is easy to see that iSplit LBI significantly outperforms the two competitors, with an average of 0.834 ± 0.005 in Micro-F1 and 0.761 ± 0.007 in Macro-F1, due to its individualized property.

Table 1: Experimental results on simulated dataset.

(a) Micro-F1
types   algorithms   min    median  max    std
α-cut   LRLasso      .216   .345    .365   .033
α-cut   LRRidge      .319   .347    .380   .018
α-cut   SVMlasso     .318   .340    .367   .012
α-cut   SVMRidge     .294   .349    .367   .016
α-cut   LSLasso      .306   .344    .368   .016
α-cut   LSRidge      .320   .346    .368   .013
α-cut   SVRlasso     .329   .347    .377   .013
α-cut   SVRRidge     .312   .336    .378   .019
MLE     Un           .599   .622    .660   .014
MLE     BT           .772   .801    .839   .015
MLE     TM           .767   .799    .820   .016
Ours    —            .825   .834    .841   .005

(b) Macro-F1
types   algorithms   min    median  max    std
α-cut   LRLasso      .238   .323    .364   .036
α-cut   LRRidge      .210   .338    .392   .061
α-cut   SVMlasso     .216   .305    .401   .051
α-cut   SVMRidge     .198   .334    .429   .077
α-cut   LSLasso      .206   .347    .413   .044
α-cut   LSRidge      .222   .325    .410   .051
α-cut   SVRlasso     .221   .357    .448   .057
α-cut   SVRRidge     .220   .346    .436   .050
MLE     Un           .588   .605    .631   .011
MLE     BT           .628   .660    .679   .015
MLE     TM           .608   .637    .669   .016
Ours    —            .747   .761    .774   .007

3.2 Human Age

Dataset. In this dataset, 25 images from the human age dataset FG-NET¹ are annotated by a group of volunteers on the ChinaCrowds platform. The annotator is presented with two images and asked which person is older (or whether it is difficult to judge). In total, we obtain 9589 feedbacks from 91 annotators.

Qualitative Results. Tab.2 shows the corresponding performance of our proposed algorithm and the competitors. We can easily find that our proposed algorithm significantly outperforms the two competitors in terms of both Micro-F1 and Macro-F1. Moreover, Fig.6 (a) shows the L2-distance between selected users' (i.e., the top 10% and bottom 10% in the regularization path) individualized rankings and the common ranking. 
Clearly, one can see that the users who jump out earlier in the path (i.e., the top 10%, marked in pink) show a larger L2-distance; they deviate substantially from the population's opinion and can be treated as abnormal users. On the contrary, the users who jump out later (i.e., the bottom 10%, marked in blue) tend to have a smaller or even zero L2-distance.

1 http://www.fgnet.rsunit.com/

Table 2: Experimental results on the Human Age dataset.

                           (a) Micro-F1                (b) Macro-F1
types   algorithms   min   median  max   std     min   median  max   std
α-cut   LRLasso      .428  .443   .458  .008     .327  .358   .381  .016
        LRRidge      .422  .443   .457  .008     .330  .364   .387  .015
        SVMLasso     .422  .442   .463  .010     .331  .355   .371  .013
        SVMRidge     .424  .443   .457  .008     .330  .351   .379  .014
        LSLasso      .423  .441   .455  .008     .335  .361   .383  .014
        LSRidge      .426  .442   .464  .009     .335  .365   .398  .014
        SVRLasso     .418  .431   .449  .008     .333  .364   .397  .014
        SVRRidge     .422  .432   .450  .007     .337  .367   .381  .012
MLE     Uni.         .692  .705   .738  .012     .589  .606   .641  .012
        BT           .731  .741   .755  .008     .599  .628   .647  .012
        TM           .728  .739   .756  .008     .603  .623   .647  .012
Ours    iSplit LBI   .765  .779   .791  .007     .680  .694   .712  .010

3.3 WorldCollege Ranking

Dataset. We now apply the proposed method to the WorldCollege ranking dataset, which is composed of 261 colleges. Using the Allourideas crowdsourcing platform, a total of 340 random annotators with different backgrounds from various countries (e.g., USA, Canada, Spain, France, Japan, China, etc.) are shown random pairs of these colleges and asked to decide which of the two universities is more attractive to attend. If a voter thinks the two colleges are incomparable, he/she can choose a third option by clicking "I cannot decide".
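The Micro-/Macro-F1 metrics reported throughout treat each pairwise comparison as a three-class prediction (1, -1, or 0 for a tie). A minimal numpy sketch of both metrics on hypothetical labels:

```python
import numpy as np

def f1_scores(y_true, y_pred, labels=(-1, 0, 1)):
    """Per-class F1, plus micro (= accuracy for single-label multiclass)
    and macro (unweighted mean of per-class F1)."""
    per_class = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # F1 = 2*tp / (2*tp + fp + fn); max(..., 1) guards empty classes.
        per_class.append(2 * tp / max(2 * tp + fp + fn, 1))
    micro = np.mean(y_true == y_pred)
    return micro, float(np.mean(per_class)), per_class

# Hypothetical ternary predictions over ten pairwise comparisons.
y_true = np.array([-1, -1, 0, 0, 0, 1, 1, 1, 1, -1])
y_pred = np.array([-1, 0, 0, 0, 1, 1, 1, 1, -1, -1])
micro, macro, _ = f1_scores(y_true, y_pred)
```

Macro-F1 weights the rare tie class as heavily as the two clear-preference classes, which is why it is the more revealing metric for tie-aware ranking.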
Finally, we obtain a total of 11012 feedbacks, among which 9409 are pairwise comparisons with clear opinions (i.e., 1/-1) and the remaining 1603 are records where the voter clicked "I cannot decide" (i.e., 0).
Quantitative Results. Tab.3 shows the comparison results on the college dataset. It is easy to see that our proposed algorithm again achieves better Micro-F1 and Macro-F1 by a large margin over all the α-cut and MLE-based variants. To investigate the reason behind this, we further compare our proposed algorithm with the MLE-based algorithms in terms of fine-grained precision and recall on the labels {-1, 0, 1} in Fig.6 (c). For labels -1 and 1, the performance improvement is relatively small, whereas a sharp improvement is highlighted for label 0. This suggests that the major contribution to the overall improvement of our proposed algorithm comes from its strength in recognizing incomparable pairs, which is exactly the main pursuit of this paper. Moreover, similar to the Human Age dataset, we also plot the L2 distance between the top/bottom 10% users' individualized rankings and the common ranking, and a similar phenomenon occurs on this dataset, as shown in Fig.6 (b).
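The per-class precision/recall breakdown of the kind shown in Fig.6 (c) can be computed as below; the labels here are hypothetical, used only to illustrate the bookkeeping.

```python
import numpy as np

def per_class_pr(y_true, y_pred, labels=(-1, 0, 1)):
    # Precision and recall for each class, as in the P1-P3 / R1-R3 comparison.
    prec, rec = {}, {}
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        prec[c] = tp / max(np.sum(y_pred == c), 1)  # fraction of predicted-c that are correct
        rec[c] = tp / max(np.sum(y_true == c), 1)   # fraction of true-c that are recovered
    return prec, rec

y_true = np.array([-1, -1, 0, 0, 0, 1, 1, 1])
y_pred = np.array([-1, 1, 0, 0, 1, 1, 1, 0])
prec, rec = per_class_pr(y_true, y_pred)
```

Inspecting `prec[0]` and `rec[0]` separately from the other two classes is what isolates a method's ability to recognize ties.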
Again, we see a significant difference between the recognized most individualized rankers and the least individualized rankers.

Table 3: Experimental results on the College dataset.

                           (a) Micro-F1                (b) Macro-F1
types   algorithms   min   median  max   std     min   median  max   std
α-cut   LRLasso      .318  .350   .408  .026     .328  .349   .367  .011
        LRRidge      .325  .352   .408  .023     .323  .348   .364  .010
        SVMLasso     .325  .343   .404  .029     .333  .345   .371  .009
        SVMRidge     .327  .354   .402  .025     .327  .345   .365  .010
        LSLasso      .305  .320   .377  .016     .331  .342   .362  .010
        LSRidge      .331  .345   .403  .023     .334  .345   .365  .009
        SVRLasso     .306  .328   .378  .018     .327  .346   .363  .010
        SVRRidge     .323  .348   .402  .023     .323  .350   .361  .012
MLE     Uni.         .521  .536   .550  .009     .482  .496   .514  .009
        BT           .539  .552   .565  .008     .496  .513   .526  .010
        TM           .539  .551   .565  .008     .496  .511   .526  .010
Ours    iSplit LBI   .637  .649   .663  .008     .608  .645   .674  .016

4 Conclusions

In this paper, we propose a novel method called iSplit LBI, which is capable of simultaneously predicting personalized rankings with ties and detecting abnormal users in the crowd. To tackle the personalized deviations of the scores, a hierarchical decomposition of the model parameters is designed, where both the popular opinions and the individualized effects are taken into consideration. Then, a specific variable splitting scheme is adopted to separate the functionality of model prediction from that of abnormal user detection. Experiments on both simulated examples and real-world applications demonstrate the effectiveness of the proposed method.

Figure 6: (a)-(b) The L2 distance between individualized ranking scores and common ranking scores of selected users on the Age and College datasets. (c) Fine-grained comparison on the College dataset.
P1, P2, P3 represent the precision for classes -1, 0, 1, respectively, while R1, R2, R3 represent the corresponding recalls.

Acknowledgments

This work was supported in part by the National Key R&D Program of China (Grant No. 2016YFB0800403); in part by the National Natural Science Foundation of China: 61620106009, U1636214, 61836002, U1803264, U1736219, 61672514 and 61976202; in part by the National Basic Research Program of China (973 Program): 2015CB351800; in part by the Key Research Program of Frontier Sciences, CAS: QYZDJ-SSW-SYS013; in part by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No. XDB28000000; in part by the Peng Cheng Laboratory Project of Guangdong Province PCL2018KP004; in part by the Beijing Natural Science Foundation (4182079); in part by the Youth Innovation Promotion Association CAS; and in part by Hong Kong Research Grant Council (HKRGC) grant 16303817.

References

[1] M. Burger, S. Osher, J. Xu, and G. Gilboa. Nonlinear inverse scale space methods for image restoration. In International Workshop on Variational, Geometric, and Level Set Methods in Computer Vision, pages 25–36. Springer, 2005.

[2] X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In International Conference on Web Search and Data Mining, pages 193–202, 2013.

[3] W. Cheng, M. Rademaker, B. De Baets, and E. Hüllermeier. Predicting partial orders: ranking with abstention. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 215–230, 2010.

[4] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In International Conference on World Wide Web, pages 613–622, 2001.

[5] A. Gionis, H. Mannila, K. Puolamäki, and A. Ukkonen. Algorithms for discovering bucket orders from data.
In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 561–566, 2006.

[6] X. He, Z. He, X. Du, and T.-S. Chua. Adversarial personalized ranking for recommendation. In International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 355–364, 2018.

[7] H. Hu, Y. Zheng, Z. Bao, G. Li, J. Feng, and R. Cheng. Crowdsourced POI labelling: Location-aware result inference and task assignment. In International Conference on Data Engineering, pages 61–72. IEEE, 2016.

[8] C. Huang, X. Sun, J. Xiong, and Y. Yao. Split LBI: An iterative regularization path with structural sparsity. In Advances in Neural Information Processing Systems, pages 3369–3377, 2016.

[9] C. Huang, X. Sun, J. Xiong, and Y. Yao. Boosting with structural sparsity: A differential inclusion approach. Applied and Computational Harmonic Analysis, 48(1):1–45, 2020.

[10] C. Huang and Y. Yao. A unified dynamic approach to sparse model selection. In International Conference on Artificial Intelligence and Statistics, pages 2047–2055, 2018.

[11] X. Jiang, L.-H. Lim, Y. Yao, and Y. Ye. Statistical ranking and combinatorial Hodge theory. Mathematical Programming, 127(6):203–244, 2011.

[12] Z. Jiang, H. Liu, B. Fu, Z. Wu, and T. Zhang. Recommendation in heterogeneous information networks based on generalized random walk model and Bayesian personalized ranking. In ACM International Conference on Web Search and Data Mining, pages 288–296, 2018.

[13] E. Kamar, A. Kapoor, and E. Horvitz. Identifying and accounting for task-dependent bias in crowdsourcing. In AAAI Conference on Human Computation and Crowdsourcing, 2015.

[14] G. Lebanon and Y. Mao. Non-parametric modeling of partially ranked data. Journal of Machine Learning Research, 9:2401–2429, 2008.

[15] G. Li, C. Chai, J. Fan, X. Weng, J. Li, Y. Zheng, Y. Li, X. Yu, X. Zhang, and H. Yuan.
CDB: Optimizing queries with crowd-based selections and joins. In ACM International Conference on Management of Data, pages 1463–1478. ACM, 2017.

[16] T. Liu. Learning to Rank for Information Retrieval. Springer, 2011.

[17] T. Lu and C. Boutilier. Learning Mallows models with pairwise preferences. In International Conference on Machine Learning, pages 145–152, 2011.

[18] T. Lu and C. Boutilier. Effective sampling and learning for Mallows models with pairwise-preference data. The Journal of Machine Learning Research, 15(1):3783–3829, 2014.

[19] Y. Lu and S. N. Negahban. Individualized rank aggregation using nuclear norm regularization. In Allerton Conference on Communication, Control, and Computing (Allerton), pages 1473–1479, 2015.

[20] S. Negahban, S. Oh, and D. Shah. Iterative ranking from pair-wise comparisons. In Advances in Neural Information Processing Systems, pages 2474–2482, 2012.

[21] S. Osher, F. Ruan, J. Xiong, Y. Yao, and W. Yin. Sparse recovery via differential inclusions. Applied and Computational Harmonic Analysis, 41(2):436–469, 2016.

[22] B. Osting, C. Brune, and S. Osher. Enhanced statistical rankings via targeted data collection. In International Conference on Machine Learning, pages 489–497, 2013.

[23] A. Sheshadri and M. Lease. SQUARE: A benchmark for research on computing crowd consensus. In AAAI Conference on Human Computation and Crowdsourcing, 2013.

[24] X. Sun, L. Hu, Y. Yao, and Y. Wang. GSplit LBI: Taming the procedural bias in neuroimaging for disease prediction. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 107–115, 2017.

[25] M. Venanzi, J. Guiver, G. Kazai, P. Kohli, and M. Shokouhi. Community-based Bayesian aggregation models for crowdsourcing. In International Conference on World Wide Web, pages 155–164. ACM, 2014.

[26] Q. Xu, T. Jiang, Y. Yao, Q. Huang, B.
Yan, and W. Lin. Random partial paired comparison for subjective video quality assessment via HodgeRank. In ACM International Conference on Multimedia, pages 393–402, 2011.

[27] Q. Xu, J. Xiong, X. Cao, Q. Huang, and Y. Yao. From social to individuals: a parsimonious path of multi-level models for crowdsourced preference aggregation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4):844–856, 2019.

[28] Q. Xu, J. Xiong, X. Cao, and Y. Yao. Parsimonious mixed-effects HodgeRank for crowdsourced preference aggregation. In ACM International Conference on Multimedia, pages 841–850, 2016.

[29] Q. Xu, J. Xiong, X. Sun, Z. Yang, X. Cao, Q. Huang, and Y. Yao. A margin-based MLE for crowdsourced partial ranking. In ACM International Conference on Multimedia, pages 591–599, 2018.

[30] J. Yi, R. Jin, S. Jain, and A. Jain. Inferring users' preferences from crowdsourced pairwise comparisons: A matrix completion approach. In AAAI Conference on Human Computation and Crowdsourcing, pages 207–215, 2013.

[31] M.-L. Zhang and Z.-H. Zhou. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8):1819–1837, 2014.

[32] B. Zhao, X. Sun, Y. Fu, Y. Yao, and Y. Wang. MSplit LBI: Realizing feature selection and dense estimation simultaneously in few-shot and zero-shot learning. In International Conference on Machine Learning, pages 5907–5916, 2018.

[33] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng. QASCA: A quality-aware task assignment system for crowdsourcing applications.
In ACM SIGMOD International Conference on Management of Data, pages 1031–1046, 2015.