{"title": "Active Ranking using Pairwise Comparisons", "book": "Advances in Neural Information Processing Systems", "page_first": 2240, "page_last": 2248, "abstract": "This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of $n$ objects can be identified by standard sorting methods using $n\\log_2 n$ pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. {Specifically, we assume that the objects can be embedded into a $d$-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in $\\R^d$. We show that under this assumption the number of possible rankings grows like $n^{2d}$ and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than $d\\log n$ adaptively selected pairwise comparisons, on average.} If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.", "full_text": "Active Ranking using Pairwise Comparisons\n\nKevin G. Jamieson\n\nUniversity of Wisconsin\nMadison, WI 53706, USA\nkgjamieson@wisc.edu\n\nRobert D. Nowak\n\nUniversity of Wisconsin\nMadison, WI 53706, USA\nnowak@engr.wisc.edu\n\nAbstract\n\nThis paper examines the problem of ranking a collection of objects using pairwise\ncomparisons (rankings of two objects). In general, the ranking of n objects can be\nidenti\ufb01ed by standard sorting methods using n log2 n pairwise comparisons. 
We\nare interested in natural situations in which relationships among the objects may\nallow for ranking using far fewer pairwise comparisons. Speci\ufb01cally, we assume\nthat the objects can be embedded into a d-dimensional Euclidean space and that\nthe rankings re\ufb02ect their relative distances from a common reference point in Rd.\nWe show that under this assumption the number of possible rankings grows like\nn2d and demonstrate an algorithm that can identify a randomly selected ranking\nusing just slightly more than d log n adaptively selected pairwise comparisons,\non average.\nIf instead the comparisons are chosen at random, then almost all\npairwise comparisons must be made in order to identify any ranking. In addition,\nwe propose a robust, error-tolerant algorithm that only requires that the pairwise\ncomparisons are probably correct. Experimental studies with synthetic and real\ndatasets support the conclusions of our theoretical analysis.\n\n1\n\nIntroduction\n\nThis paper addresses the problem of ranking a set of objects based on a limited number of pair-\nwise comparisons (rankings between pairs of the objects). A ranking over a set of n objects\n\u0398= ( \u03b81,\u03b8 2, . . . ,\u03b8 n) is a mapping \u03c3 : {1, . . . , n}\u2192{ 1, . . . , n} that prescribes an order\n\n\u03c3(\u0398) := \u03b8\u03c3(1) \u227a \u03b8\u03c3(2) \u227a\u00b7\u00b7\u00b7\u227a \u03b8\u03c3(n\u22121) \u227a \u03b8\u03c3(n)\n\n(1)\nwhere \u03b8i \u227a \u03b8j means \u03b8i precedes \u03b8j in the ranking. A ranking uniquely determines the collection\nof pairwise comparisons between all pairs of objects. The primary objective here is to bound the\nnumber of pairwise comparisons needed to correctly determine the ranking when the objects (and\nhence rankings) satisfy certain known structural constraints. Speci\ufb01cally, we suppose that the objects\nmay be embedded into a low-dimensional Euclidean space such that the ranking is consistent with\ndistances in the space. 
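This distance-based notion of ranking can be made concrete with a small simulation (a hypothetical NumPy sketch; `ranking_from_reference` and `pairwise_label` are our names, not the paper's):

```python
import numpy as np

def ranking_from_reference(objects, r):
    """Permutation sigma ordering the objects by distance to the reference r.
    Under the embedding assumption, the unknown ranking is exactly such a
    distance ordering."""
    return np.argsort(np.linalg.norm(objects - r, axis=1))

def pairwise_label(objects, r, i, j):
    """Label of the query q_{i,j} = {theta_i precedes theta_j}: True iff
    theta_i is closer to the reference than theta_j."""
    return (np.linalg.norm(objects[i] - r) <
            np.linalg.norm(objects[j] - r))

rng = np.random.default_rng(0)
objects = rng.random((5, 2))      # n = 5 objects embedded in d = 2
r = rng.random(2)                 # common reference point
sigma = ranking_from_reference(objects, r)
rank = np.argsort(sigma)          # rank[i] = position of object i in sigma
```

Every pairwise label derived this way agrees with the full ranking, which is the consistency property exploited throughout the paper.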
We wish to exploit such structure in order to discover the ranking using a\nvery small number of pairwise comparisons. To the best of our knowledge, this is a previously open\nand unsolved problem.\nThere are practical and theoretical motivations for restricting our attention to pairwise rankings that\nare discussed in Section 2. We begin by assuming that every pairwise comparison is consistent with\nan unknown ranking. Each pairwise comparison can be viewed as a query: is \u03b8i before \u03b8j? Each\nquery provides 1 bit of information about the underlying ranking. Since the number of rankings is\nn!, in general, specifying a ranking requires \u0398(n log n) bits of information. This implies that at least\nthis many pairwise comparisons are required without additional assumptions about the ranking. In\nfact, this lower bound can be achieved with a standard adaptive sorting algorithm like binary sort\n[1]. In large-scale problems or when humans are queried for pairwise comparisons, obtaining this\nmany pairwise comparisons may be impractical and therefore we consider situations in which the\nspace of rankings is structured and thereby less complex.\n\n1\n\n\fA natural way to induce a structure on the space of rankings is to suppose that the objects can be\nembedded into a d-dimensional Euclidean space so that the distances between objects are consistent\nwith the ranking. This may be a reasonable assumption in many applications, and for instance the\naudio dataset used in our experiments is believed to have a 2 or 3 dimensional embedding [2]. We\nfurther discuss motivations for this assumption in Section 2. It is not dif\ufb01cult to show (see Section 3)\nthat the number of full rankings that could arise from n objects embedded in Rd grows like n2d, and\nso specifying a ranking from this class requires only O(d log n) bits. 
The main results of the paper show that under this assumption a randomly selected ranking can be determined using O(d log n) pairwise comparisons selected in an adaptive and sequential fashion, but almost all $\binom{n}{2}$ pairwise rankings are needed if they are picked randomly rather than selectively. In other words, actively selecting the most informative queries has a tremendous impact on the complexity of learning the correct ranking.

1.1 Problem statement

Let σ denote the ranking to be learned. The objective is to learn the ranking by querying the reference for pairwise comparisons of the form

qi,j := {θi ≺ θj}.  (2)

The response or label of qi,j is binary and denoted as yi,j := 1{qi,j} where 1 is the indicator function; ties are not allowed. The main results quantify the minimum number of queries or labels required to determine the reference's ranking, and they are based on two key assumptions.
A1 Embedding: The set of n objects is embedded in Rd (in general position) and we will also use θ1, . . . , θn to refer to their (known) locations in Rd. Every ranking σ can be specified by a reference point rσ ∈ Rd, as follows. The Euclidean distances between the reference and objects are consistent with the ranking in the following sense: if σ ranks θi ≺ θj, then ‖θi − rσ‖ < ‖θj − rσ‖. Let Σn,d denote the set of all possible rankings of the n objects that satisfy this embedding condition. The interpretation of this assumption is that we know how the objects are related (in the embedding), which limits the space of possible rankings. The ranking to be learned, specified by the reference (e.g., preferences of a human subject), is unknown. Many have studied the problem of finding an embedding of objects from data [3, 4, 5].
This is not the focus here, but it could certainly play a supporting role in our methodology (e.g., the embedding could be determined from known similarities between the n objects, as is done in our experiments with the audio dataset). We assume the embedding is given and our interest is minimizing the number of queries needed to learn the ranking, and for this we require a second assumption.
A2 Consistency: Every pairwise comparison is consistent with the ranking to be learned. That is, if the reference ranks θi ≺ θj, then θi must precede θj in the (full) ranking.
As we will discuss later in Section 3.2, these two assumptions alone are not enough to rule out pathological arrangements of objects in the embedding for which at least Ω(n) queries must be made to recover the ranking. However, because such situations are not representative of what is typically encountered, we analyze the problem in the framework of average-case analysis [6].
Definition 1. With each ranking σ ∈ Σn,d we associate a probability πσ such that ∑σ∈Σn,d πσ = 1. Let π denote these probabilities and write σ ∼ π for shorthand. The uniform distribution corresponds to πσ = |Σn,d|⁻¹ for all σ ∈ Σn,d, and we write σ ∼ U for this special case.
Definition 2. If Mn(σ) denotes the number of pairwise comparisons requested by an algorithm to identify the ranking σ, then the average query complexity with respect to π is denoted by Eπ[Mn].
The main results are proven for the special case of π = U, the uniform distribution, to make the analysis more transparent and intuitive. However the results can easily be extended to general distributions π that satisfy certain mild conditions [7].
All results henceforth, unless otherwise noted, will be given in terms of (uniform) average query complexity and we will say such results hold "on average."
Our main results can be summarized as follows. If the queries are chosen deterministically or randomly in advance of collecting the corresponding pairwise comparisons, then we show that almost all $\binom{n}{2}$ pairwise comparison queries are needed to identify a ranking under the assumptions above.

Query Selection Algorithm
input: n objects in Rd
initialize: objects θ1, . . . , θn in uniformly random order
for j = 2, . . . , n
    for i = 1, . . . , j − 1
        if qi,j is ambiguous,
            request qi,j's label from reference;
        else
            impute qi,j's label from previously labeled queries.
output: ranking of n objects

Figure 1: Sequential algorithm for selecting queries. See Figure 2 and Section 4.2 for the definition of an ambiguous query.

[Figure 2 shows three objects θ1, θ2, θ3 and the bisecting hyperplanes for the queries q1,2, q1,3, q2,3.]
Figure 2: Objects θ1, θ2, θ3 and queries. The reference rσ lies in the shaded region (consistent with the labels of q1,2, q1,3, q2,3). The dotted (dashed) lines represent new queries whose labels are (are not) ambiguous given those labels.

However, if the queries are selected in an adaptive and sequential fashion according to the algorithm in Figure 1, then we show that the number of pairwise rankings required to identify a ranking is no more than a constant multiple of d log n, on average. The algorithm requests a query if and only if the corresponding pairwise ranking is ambiguous (see Section 4.2), meaning that it cannot be determined from previously collected pairwise comparisons and the locations of the objects in Rd. The efficiency of the algorithm is due to the fact that most of the queries are unambiguous when considered in a sequential fashion.
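To make the notion of an ambiguous query concrete, here is a minimal sketch of the loop in Figure 1, assuming SciPy is available. Each answered query contributes a halfspace constraint on the reference point, and a query is treated as ambiguous when both possible answers leave the constraint set feasible (a linear-program feasibility check). The function names and the LP formulation are our illustration, not the paper's implementation; the paper's own characterization of ambiguity is given in Section 4.2.

```python
import numpy as np
from scipy.optimize import linprog

def halfspace(theta_i, theta_j):
    """Constraint a.r <= b encoding 'theta_i precedes theta_j', i.e. the
    reference r is closer to theta_i: 2(theta_j - theta_i).r <= |theta_j|^2 - |theta_i|^2."""
    a = 2.0 * (theta_j - theta_i)
    b = float(theta_j @ theta_j - theta_i @ theta_i)
    return a, b

def feasible(A, b, d):
    """Is the cell {r : A r <= b} nonempty? (LP feasibility, zero objective.)"""
    res = linprog(np.zeros(d), A_ub=np.asarray(A), b_ub=np.asarray(b),
                  bounds=[(None, None)] * d, method="highs")
    return res.status == 0

rng = np.random.default_rng(1)
n, d, eps = 12, 2, 1e-9
objs = rng.random((n, d))
r_star = rng.random(d)            # hidden reference, playing the oracle
truth = lambda i, j: (np.linalg.norm(objs[i] - r_star) <
                      np.linalg.norm(objs[j] - r_star))

A, b, labels, requested = [], [], {}, 0
for j in range(1, n):
    for i in range(j):
        a_new, b_new = halfspace(objs[i], objs[j])
        # q_{i,j} is ambiguous iff its bisecting hyperplane cuts the
        # current cell, i.e. both answers remain feasible
        yes_ok = feasible(A + [a_new], b + [b_new - eps], d)
        no_ok = feasible(A + [-a_new], b + [-b_new - eps], d)
        if yes_ok and no_ok:
            requested += 1
            lab = truth(i, j)     # ask the reference
        else:
            lab = yes_ok          # impute: only one side is feasible
        labels[(i, j)] = lab
        A.append(a_new if lab else -a_new)   # shrink the cell
        b.append(b_new if lab else -b_new)
```

In runs of this sketch, once the cell around the reference becomes small most queries come back unambiguous and are imputed rather than requested.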
For this very same reason, picking queries in a non-adaptive or\nrandom fashion is very inef\ufb01cient. It is also noteworthy that the algorithm is also computationally\nef\ufb01cient with an overall complexity no greater than O(n poly(d) poly(log n)) [7]. In Section 5 we\npresent a robust version of the algorithm of Figure 1 that is tolerant to a fraction of errors in the\npairwise comparison queries. In the case of persistent errors (see Section 5) we show that at least\nO(n/ log n) objects can be correctly ranked in a partial ranking with high probability by requesting\njust O(d log2 n) pairwise comparisons. This allows us to handle situations in which either or both\nof the assumptions, A1 and A2, are reasonable approximations to the situation at hand, but do not\nhold strictly (which is the case in our experiments with the audio dataset).\nProving the main results involves an uncommon marriage of ideas from the ranking and statistical\nlearning literatures. Geometrical interpretations of our problem derive from the seminal works of [8]\nin ranking and [9] in learning. From this perspective our problem bears a strong resemblance to the\nhalfspace learning problem, with two crucial distinctions. In the ranking problem, the underlying\nhalfspaces are not in general position and have strong dependencies with each other. These depen-\ndencies invalidate many of the typical analyses of such problems [10, 11]. One popular method\nof analysis in exact learning involves the use of something called the extended teaching dimension\n[12]. However, because of the possible pathological situations alluded to earlier, it is easy to show\nthat the extended teaching dimension must be at least \u2126(n) making that sort of worst-case analysis\nuninteresting. 
These differences present unique challenges to learning.\n\n2 Motivation and related work\n\nThe problem of learning a ranking from few pairwise comparisons is motivated by what we perceive\nas a signi\ufb01cant gap in the theory of ranking and permutation learning. Most work in ranking assumes\na passive approach to learning; pairwise comparisons or partial rankings are collected in a random or\nnon-adaptive fashion and then aggregated to obtain a full ranking (cf. [13, 14, 15, 16]). However, this\nmay be quite inef\ufb01cient in terms of the number of pairwise comparisons or partial rankings needed\nto learn the (full) ranking. This inef\ufb01ciency was recently noted in the related area of social choice\ntheory [17]. Furthermore, empirical evidence suggests that, even under complex ranking models,\nadaptively selecting pairwise comparisons can reduce the number needed to learn the ranking [18]. It\nis cause for concern since in many applications it is expensive and time-consuming to obtain pairwise\ncomparisons. For example, psychologists and market researchers collect pairwise comparisons to\ngauge human preferences over a set of objects, for scienti\ufb01c understanding or product placement.\nThe scope of these experiments is often very limited simply due to the time and expense required\n\n3\n\n\fto collect the data. This suggests the consideration of more selective and judicious approaches to\ngathering inputs for ranking. We are interested in taking advantage of underlying structure in the\nset of objects in order to choose more informative pairwise comparison queries. From a learning\nperspective, our work adds an active learning component to a problem domain that has primarily\nbeen treated from a passive learning mindset.\nWe focus on pairwise comparison queries for two reasons. First, pairwise comparisons admit a\nhalfspace representation in embedding spaces which allows for a geometrical approach to learning in\nsuch structured ranking spaces. 
Second, pairwise comparisons are the most common form of queries\nin many applications, especially those involving human subjects. For example, consider the problem\nof \ufb01nding the most highly ranked object, as illustrated by the following familiar task. Suppose\na patient needs a new pair of prescription eye lenses. Faced with literally millions of possible\nprescriptions, the doctor will present candidate prescriptions in a sequential fashion followed by\nthe query: better or worse? Even if certain queries are repeated to account for possible inaccurate\nanswers, the doctor can locate an accurate prescription with just a handful of queries. This is possible\npresumably because the doctor understands (at least intuitively) the intrinsic space of prescriptions\nand can ef\ufb01ciently search through it using only binary responses from the patient.\nWe assume that the objects can be embedded in Rd and that the distances between objects and\nthe reference are consistent with the ranking (Assumption A1). The problem of learning a general\nfunction f : Rd \u2192 R using just pairwise comparisons that correctly ranks the objects embedded in\nRd has previously been studied in the passive setting [13, 14, 15, 16]. The main contributions of\nthis paper are theoretical bounds for the speci\ufb01c case when f (x) = ||x \u2212 r\u03c3|| where r\u03c3 \u2208 Rd is\nthe reference point. This is a standard model used in multidimensional unfolding and psychometrics\n[8, 19]. We are unaware of any existing query-complexity bounds for this problem. We do not\nassume a generative model is responsible for the relationship between rankings to embeddings,\nbut one could. For example, the objects might have an embedding (in a feature space) and the\nranking is generated by distances in this space. Or alternatively, structural constraints on the space\nof rankings could be used to generate a consistent embedding. 
Assumption A1, while arguably quite natural/reasonable in many situations, significantly constrains the set of possible rankings.

3 Geometry of rankings from pairwise comparisons

The embedding assumption A1 gives rise to geometrical interpretations of the ranking problem, which are developed in this section. The pairwise comparison qi,j can be viewed as the membership query: is θi ranked before θj in the (full) ranking σ? The geometrical interpretation is that qi,j requests whether the reference rσ is closer to object θi or object θj in Rd. Consider the line connecting θi and θj in Rd. The hyperplane that bisects this line and is orthogonal to it defines two halfspaces: one containing points closer to θi and the other the points closer to θj. Thus, qi,j is a membership query about which halfspace rσ is in, and there is an equivalence between each query, each pair of objects, and the corresponding bisecting hyperplane. The set of all possible pairwise comparison queries can be represented as $\binom{n}{2}$ distinct halfspaces in Rd. The intersections of these halfspaces partition Rd into a number of cells, and each one corresponds to a unique ranking of Θ. Arbitrary rankings are not possible due to the embedding assumption A1, and recall that the set of rankings possible under A1 is denoted by Σn,d. The cardinality of Σn,d is equal to the number of cells in the partition. We will refer to these cells as d-cells (to indicate they are subsets in d-dimensional space) since at times we will also refer to lower dimensional cells; e.g., (d − 1)-cells.

3.1 Counting the number of possible rankings

The following lemma determines the cardinality of the set of rankings, Σn,d, under assumption A1.
Lemma 1. [8] Assume A1-2. Let Q(n, d) denote the number of d-cells defined by the hyperplane arrangement of pairwise comparisons between these objects (i.e.
Q(n, d) = |Σn,d|). Q(n, d) satisfies the recursion

Q(n, d) = Q(n − 1, d) + (n − 1) Q(n − 1, d − 1), where Q(1, d) = 1 and Q(n, 0) = 1.  (3)

In the hyperplane arrangement induced by the n objects in d dimensions, each hyperplane is intersected by every other and is partitioned into Q(n − 1, d − 1) subsets or (d − 1)-cells. The recursion above arises by considering the addition of one object at a time. Using this lemma in a straightforward fashion, we prove the following corollary in [7].
Corollary 1. Assume A1-2. There exist positive real numbers k1 and k2 such that

k1 n^{2d}/(2^d d!) < Q(n, d) < k2 n^{2d}/(2^d d!)

for n > d + 1. If n ≤ d + 1 then Q(n, d) = n!. For n sufficiently large, k1 = 1 and k2 = 2 suffice.

3.2 Lower bounds on query complexity

Since the cardinality of the set of possible rankings is |Σn,d| = Q(n, d), we have a simple lower bound on the number of queries needed to determine the ranking.
Theorem 1. Assume A1-2. To reconstruct an arbitrary ranking σ ∈ Σn,d any algorithm will require at least log2 |Σn,d| = Θ(2d log2 n) pairwise comparisons.
Proof. By Corollary 1, |Σn,d| = Θ(n^{2d}), and so at least 2d log n bits are needed to specify a ranking. Each pairwise comparison provides at most one bit.
If each query provides a full bit of information about the ranking, then we achieve this lower bound. For example, in the one-dimensional case (d = 1) the objects can be ordered and binary search can be used to select pairwise comparison queries, achieving the lower bound. This is generally impossible in higher dimensions. Even in two dimensions there are placements of the objects (still in general position) that produce d-cells in the partition induced by queries that have n − 1 faces (i.e., bounded by n − 1 hyperplanes) as shown in [7].
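The recursion in Lemma 1 is easy to evaluate numerically, which also makes the counts in Corollary 1 and the bit counts in Theorem 1 concrete (a small sketch; the function name `Q` is our naming):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def Q(n, d):
    """Lemma 1: number of d-cells, i.e. realizable rankings |Sigma_{n,d}|."""
    if n == 1 or d == 0:
        return 1
    return Q(n - 1, d) + (n - 1) * Q(n - 1, d - 1)

# n <= d + 1 recovers all n! orderings; beyond that, growth is ~ n^{2d}/(2^d d!)
assert Q(3, 2) == math.factorial(3)
ratio = Q(100, 2) / (100 ** 4 / (2 ** 2 * math.factorial(2)))
# Theorem 1: a ranking in Sigma_{n,d} needs log2 Q(n, d) = O(d log n) bits,
# versus log2(n!) = Theta(n log n) bits for an unconstrained ranking
bits_structured = math.log2(Q(100, 2))
bits_unconstrained = math.log2(math.factorial(100))
```

For n = 100 and d = 2 the ratio to the Corollary 1 scaling comes out near 1, and the structured bit count is a small fraction of the unconstrained one.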
It follows that the worst case situation may require at least n − 1 queries in dimensions d ≥ 2. In light of this, we conclude that worst case bounds may be overly pessimistic indications of the typical situation, and so we instead consider the average case performance introduced in Section 1.1.

3.3 Inefficiency of random queries

The geometrical representation of the ranking problem reveals that randomly choosing pairwise comparison queries is inefficient relative to the lower bound above. To see this, suppose m queries were chosen uniformly at random from the possible $\binom{n}{2}$. The answers to m queries narrows the set of possible rankings to a d-cell in Rd. This d-cell may consist of one or more of the d-cells in the partition induced by all queries. If it contains more than one of the partition cells, then the underlying ranking is ambiguous.
Theorem 2. Assume A1-2. Let N = $\binom{n}{2}$. Suppose m pairwise comparisons are chosen uniformly at random without replacement from the possible $\binom{n}{2}$. Then for all positive integers N ≥ m ≥ d the probability that the m queries yield a unique ranking is at most $\binom{m}{d}/\binom{N}{d} \le \left(\frac{em}{N}\right)^d$.
Proof. No fewer than d hyperplanes bound each d-cell in the partition of Rd induced by all possible queries. The probability of selecting d specific queries in a random draw of m is equal to

$\binom{N-d}{m-d} \big/ \binom{N}{m} = \binom{m}{d} \big/ \binom{N}{d} \le \frac{m^d}{d!} \cdot \frac{d^d}{N^d} = \left(\frac{m}{N}\right)^d \frac{d^d}{d!} \le \left(\frac{em}{N}\right)^d .$

Note that $\binom{m}{d}/\binom{N}{d} < 1/2$ unless m = Ω(n²). Therefore, if the queries are randomly chosen, then we will need to ask almost all queries to guarantee that the inferred ranking is probably correct.

4 Analysis of sequential algorithm for query selection

Now consider the basic sequential process of the algorithm in Figure 1. Suppose we have ranked k − 1 of the n objects. Call these objects 1 through k − 1. This places the reference rσ within a d-cell (defined by the labels of the comparison queries between objects 1, . . . , k − 1). Call this d-cell Ck−1. Now suppose we pick another object at random and call it object k. A comparison query between object k and one of objects 1, . . . , k − 1 can only be informative (i.e., ambiguous) if the associated hyperplane intersects this d-cell Ck−1 (see Figure 2). If k is significantly larger than d, then it turns out that the cell Ck−1 is probably quite small and the probability that one of the queries intersects Ck−1 is very small; in fact the probability is on the order of 1/k².

4.1 Hyperplane-point duality

Consider a hyperplane h = (h0, h1, . . . , hd) with (d + 1) parameters in Rd and a point p = (p1, . . . , pd) ∈ Rd that does not lie on the hyperplane. Checking which halfspace p falls in, i.e., h1p1 + h2p2 + ··· + hdpd + h0 ≷ 0, has a dual interpretation: h is a point in Rd+1 and p is a hyperplane in Rd+1 passing through the origin (i.e., with d free parameters).
Recall that each possible ranking can be represented by a reference point rσ ∈ Rd. Our problem is to determine the ranking, or equivalently the vector of responses to the $\binom{n}{2}$ queries represented by hyperplanes in Rd. Using the above observation, we see that our problem is equivalent to finding a labeling over $\binom{n}{2}$ points in Rd+1 with as few queries as possible.
We will refer to this alternative representation as the dual and the former as the primal.

4.2 Characterization of an ambiguous query

The characterization of an ambiguous query has interpretations in both the primal and dual spaces. We will now describe the interpretation in the dual, which will be critical to our analysis of the sequential algorithm of Figure 1.
Definition 3. [9] Let S be a finite subset of Rd, let S+ ⊂ S be points labeled +1 and S− = S \ S+ be the points labeled −1, and let x be any other point except the origin. If there exist two homogeneous linear separators of S+ and S− that assign different labels to the point x, then the label of x is said to be ambiguous with respect to S.
Lemma 2. [9, Lemma 1] The label of x is ambiguous with respect to S if and only if S+ and S− are homogeneously linearly separable by a (d − 1)-dimensional subspace containing x.
Let us consider the implications of this lemma for our scenario. Assume that we have labels for all the pairwise comparisons of k − 1 objects. Next consider a new object called object k. In the dual, the pairwise comparison between object k and object i, for some i ∈ {1, . . . , k − 1}, is ambiguous if and only if there exists a hyperplane that still separates the original points and also passes through this new point. In the primal, this separating hyperplane corresponds to a point lying on the hyperplane defined by the associated pairwise comparison.

4.3 The probability that a query is ambiguous

An essential component of the sequential algorithm of Figure 1 is the initial random order of the objects; every sequence in which it could consider objects is equally probable. This allows us to state a nontrivial fact about the partial rankings of the first k objects observed in this sequence.
Lemma 3. Assume A1-2 and σ ∼ U.
Consider the subset S ⊂ Θ with |S| = k that is randomly selected from Θ such that all $\binom{n}{k}$ subsets are equally probable. If Σk,d denotes the set of possible rankings of these k objects, then every σ ∈ Σk,d is equally probable.
Proof. Let a k-partition denote the partition of Rd into Q(k, d) d-cells induced by k objects for 1 ≤ k ≤ n. In the n-partition, each d-cell has uniform weight equal to 1/Q(n, d). If we uniformly at random select k objects from the possible n and consider the k-partition, each d-cell in the k-partition will contain one or more d-cells of the n-partition. If we select one of these d-cells from the k-partition, on average there will be Q(n, d)/Q(k, d) d-cells from the n-partition contained in this cell. Therefore the probability mass in each d-cell of the k-partition is equal to the number of cells from the n-partition in this cell multiplied by the probability of each of those cells from the n-partition: Q(n, d)/Q(k, d) × 1/Q(n, d) = 1/Q(k, d), and |Σk,d| = Q(k, d).
As described above, for 1 ≤ i ≤ k some of the pairwise comparisons qi,k+1 may be ambiguous. The algorithm chooses a random sequence of the n objects in its initialization and does not use the labels of q1,k+1, . . . , qj−1,k+1, qj+1,k+1, . . . , qk,k+1 to make a determination of whether or not qj,k+1 is ambiguous. It follows that the events of requesting the label of qi,k+1 for i = 1, 2, . . . , k are independent and identically distributed (conditionally on the results of queries from previous steps). Therefore it makes sense to talk about the probability of requesting any one of them.
Lemma 4. Assume A1-2 and σ ∼ U. Let A(k, d, U) denote the probability of the event that the pairwise comparison qi,k+1 is ambiguous for i = 1, 2, . . . , k.
Then there exist positive real constants a1 and a2, independent of k, such that for k > 2d,

a1 · 2d/k² ≤ A(k, d, U) ≤ a2 · 2d/k².

Proof. By Lemma 2, a point in the dual (pairwise comparison) is ambiguous if and only if there exists a separating hyperplane that passes through this point. This implies that the hyperplane representation of the pairwise comparison in the primal intersects the cell containing rσ (see Figure 2 for an illustration of this concept). Consider the partition of Rd generated by the hyperplanes corresponding to pairwise comparisons between objects 1, . . . , k. Let P(k, d) denote the number of d-cells in this partition that are intersected by a hyperplane corresponding to one of the queries qi,k+1, i ∈ {1, . . . , k}. Then it is not difficult to show that P(k, d) is bounded above and below by constants independent of n and k times k^{2(d−1)}/(2^{d−1}(d−1)!) [7]. By Lemma 3, every d-cell in the partition induced by the k objects corresponds to an equally probable ranking of those objects. Therefore, the probability that a query is ambiguous is the number of cells intersected by the corresponding hyperplane divided by the total number of d-cells, and therefore A(k, d, U) = P(k, d)/Q(k, d). The result follows immediately from the bounds on P(k, d) and Corollary 1.
Because the individual events of requesting each query are conditionally independent, the total number of queries requested by the algorithm is just Mn = ∑_{k=1}^{n−1} ∑_{i=1}^{k} 1{Request qi,k+1}. Using the results above, it is straightforward to prove the main theorem below (see [7]).
Theorem 3. Assume A1-2 and σ ∼ U.
Let the random variable Mn denote the number of pairwise comparisons that are requested in the algorithm of Figure 1. Then

EU[Mn] ≤ 2d log2 2d + 2d a2 log n.

Furthermore, if σ ∼ π and max_{σ∈Σn,d} πσ ≤ c|Σn,d|⁻¹ for some c > 0, then Eπ[Mn] ≤ c EU[Mn].

5 Robust sequential algorithm for query selection

We now extend the algorithm of Figure 1 to situations in which the response to each query is only probably correct. If the correct label of a query qi,j is yi,j, we denote the possibly incorrect response by Yi,j. The probability that Yi,j = yi,j is at least 1 − p, p < 1/2. The robust algorithm operates in the same fashion as the algorithm in Figure 1, with the exception that when an ambiguous query is encountered several (equivalent) queries are made and a decision is based on the majority vote. This voting procedure allows us to construct a ranking (or partial ranking) that is correct with high probability by requesting just O(d log² n) queries, where the extra log factor comes from voting.
First consider the case in which each query can be repeated to obtain multiple independent responses (votes) for each comparison query. This random noise model arises, for example, in social choice theory where the "reference" is a group of people, each casting a vote. The elementary proof of the next theorem is given in [7].
Theorem 4. Assume A1-2 and σ ∼ U, but that each query response is a realization of an i.i.d. Bernoulli random variable Yi,j with P(Yi,j ≠ yi,j) ≤ p < 1/2.
If all ambiguous queries are\ndecided by the majority vote of R independent responses to each such query, then with probability\n2 (1 \u2212 2p)2R) this procedure correctly identi\ufb01es the correct\ngreater than 1 \u2212 2n log2(n) exp(\u2212 1\nranking and requests no more than O(Rd log n) queries on average.\nIn other situations, if we ask the same query multiple times we may get the same, possibly incorrect,\nresponse each time. This persistent noise model is natural, for example, if the reference is a single\nhuman. Under this model, if two rankings differ by only a single pairwise comparison, then they\ncannot be distinguished with probability greater than 1 \u2212 p. So, in general, exact recovery of the\nranking cannot be guaranteed with high probability. The best we can hope for is to exactly recover\na partial ranking of the objects (i.e. the ranking over a subset of the objects). Henceforth, we will\nassume the noise is persistent and aim to exactly recover a partial ranking of the objects.\nThe key ingredient in the persistent noise setting is the design of a voting set for each ambiguous\nquery encountered. Suppose that at the jth object in the algorithm in Figure 1 the query qi,j is\nambiguous. In principle, a voting set could be constructed using objects ranked between i and j. If\nobject k is between i and j, then note that yi,j = yi,k = yk,j. In practice, we cannot identify the\nsubset of objects ranked between i and j, but it is contained within the set Ti,j, de\ufb01ned to be the\nsubset of objects \u03b8k such that qi,k, qk,j, or both are ambiguous. Furthermore, Lemma 3 implies that\neach object in Ti,j is ranked between i and j with probability at least 1/3 [7]. Ti,j will be our voting\nset. Note however, if objects i and j are closely ranked, then Ti,j may be rather small, and so it is not\n\n7\n\n\fTable 1: Statistics for the algorithm robust to\npersistent noise of Section 5 with respect to\n\nall\uffffn\n2\uffff pairwise comparisons. 
Recall $y$ is the noisy response vector, $\tilde{y}$ is the embedding's solution, and $\hat{y}$ is the output of the robust algorithm.

    Dimension                          2       3
    % of queries requested (mean)      14.5    18.5
    % of queries requested (std)       5.3     6
    Average error $d(y, \tilde{y})$    0.23    0.21
    Average error $d(y, \hat{y})$      0.31    0.29

[Figure 3 plot: number of query requests versus dimension, with reference curves $\log_2 |\Sigma_{n,d}|$ and $2\log_2 |\Sigma_{n,d}|$.]
Figure 3: Mean and standard deviation of requested queries (solid) in the noiseless case for $n = 100$; $\log_2 |\Sigma_{n,d}|$ is a lower bound (dashed).

always possible to find a sufficiently large voting set. Therefore, we must specify a size-threshold $R \ge 1$. If the size of $T_{i,j}$ is at least $R$, then we decide the label for $q_{i,j}$ by voting over the responses to $\{q_{i,k}, q_{k,j} : k \in T_{i,j}\}$ and $q_{i,j}$; otherwise we pass over object $j$ and move on to the next object in the list. This allows us to construct a probably correct ranking of the objects that are not passed over. The theorem below proves that a large portion of the objects will not be passed over. At the end of the process, some objects that were passed over may then be unambiguously ranked (based on queries made after they were passed over) or they can be ranked without voting (and without guarantees). The proof of the next theorem is provided in the longer version of this paper [7].

Theorem 5. Assume A1-2, $\sigma \sim U$, and $P(Y_{i,j} \neq y_{i,j}) = p$. For any size-threshold $R \ge 1$, with probability greater than $1 - 2n \log_2(n) \exp\!\big(-\tfrac{2}{9}(1-2p)^2 R\big)$ the procedure above correctly ranks at least $n/(2R+1)$ objects and requests no more than $O(Rd \log n)$ queries on average.

6 Empirical results

In this section we present empirical results for both the noiseless algorithm of Figure 1 and the robust algorithm of Section 5.
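Before the results, the majority-vote rule that the robust algorithm relies on can be sketched in a few lines. This is a simplified illustration under the random noise model of Theorem 4 (the function names and the parameters `p` and `R` are ours, and the construction of the voting set $T_{i,j}$ is abstracted into independent repeated responses):

```python
import random

def noisy_response(true_label, p):
    """Return the correct binary label with probability 1 - p, the flipped label otherwise."""
    return true_label if random.random() >= p else 1 - true_label

def majority_vote(true_label, p, R):
    """Decide an ambiguous query by majority vote over R independent noisy responses."""
    votes = sum(noisy_response(true_label, p) for _ in range(R))
    return int(votes > R / 2)

# Even with a 30% chance that each individual response is wrong,
# R = 101 votes recover the correct label essentially always.
random.seed(0)
trials = 1000
accuracy = sum(majority_vote(1, p=0.3, R=101) for _ in range(trials)) / trials
print(accuracy)  # very close to 1.0
```

The $\exp(-\tfrac{1}{2}(1-2p)^2 R)$ factor in Theorem 4 is exactly the Hoeffding bound on a single such vote failing, i.e., on more than $R/2$ of the $R$ Bernoulli responses being incorrect.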
For the noiseless algorithm, $n = 100$ points, representing the objects to be ranked, were simulated uniformly at random from the unit hypercube $[0, 1]^d$ for $d = 1, 10, 20, \dots, 100$. The reference was simulated from the same distribution. For each value of $d$ the experiment was repeated 25 times using a new simulation of points and the reference. Because responses are noiseless, exact identification of the ranking is guaranteed. The number of requested queries is plotted in Figure 3 with the lower bound of Theorem 1 for reference. The number of requested queries never exceeds twice the lower bound, which agrees with the result of Theorem 3.

The robust algorithm of Section 5 was evaluated using a symmetric similarity matrix dataset available at [20] whose $(i, j)$th entry, denoted $s_{i,j}$, represents the human-judged similarity between audio signals $i$ and $j$ for all $i \neq j \in \{1, \dots, 100\}$. If we consider the $k$th row of this matrix, we can rank the other signals with respect to their similarity to the $k$th signal; we define $q^{(k)}_{i,j} := \{s_{k,i} > s_{k,j}\}$ and $y^{(k)}_{i,j} := 1\{q^{(k)}_{i,j}\}$. Since the similarities were derived from human subjects, the derived labels may be erroneous. Moreover, there is no possibility of repeating queries here, so the noise is persistent. The analysis of this dataset in [2] suggests that the relationship between signals can be well approximated by an embedding in 2 or 3 dimensions. We used non-metric multidimensional scaling [5] to find an embedding of the signals $\theta_1, \dots, \theta_{100} \in \mathbb{R}^d$ for $d = 2$ and 3. For each object $\theta_k$, we use the embedding to derive pairwise comparison labels between all other objects as follows: $\tilde{y}^{(k)}_{i,j} := 1\{\|\theta_k - \theta_i\| < \|\theta_k - \theta_j\|\}$, which can be considered the best approximation in this embedding to the labels $y^{(k)}_{i,j}$ defined above.
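The label derivation just described, and the fraction-of-disagreements metric reported in Table 1, can be sketched as follows. This is a toy stand-in: the embedding and the reference point are random rather than produced by non-metric MDS, and all names are illustrative.

```python
import itertools
import math
import random

def pairwise_labels(points, ref):
    """y_{i,j} = 1 iff object i is closer to the reference point than object j."""
    return {(i, j): int(math.dist(points[i], ref) < math.dist(points[j], ref))
            for i, j in itertools.combinations(range(len(points)), 2)}

def disagreement(y, y_hat):
    """Fraction of the n-choose-2 pairwise labels on which two label sets differ."""
    return sum(y[pair] != y_hat[pair] for pair in y) / len(y)

random.seed(1)
d, n = 2, 20
points = [tuple(random.random() for _ in range(d)) for _ in range(n)]
ref = tuple(random.random() for _ in range(d))
y = pairwise_labels(points, ref)

# A slightly jittered embedding flips only a small fraction of the pairwise labels.
jittered = [tuple(c + 0.02 * (random.random() - 0.5) for c in p) for p in points]
y_tilde = pairwise_labels(jittered, ref)
print(disagreement(y, y_tilde))  # a small fraction
```

Comparing `disagreement(y, y_tilde)` against `disagreement(y, y_hat)` for a partial-ranking output `y_hat` mirrors the comparison of $d(y, \tilde{y})$ and $d(y, \hat{y})$ in Table 1.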
The output of the robust sequential algorithm, which uses only a small fraction of the similarities, is denoted by $\hat{y}^{(k)}_{i,j}$. We set $R = 15$ using Theorem 5 as a rough guide. Using the popular Kendall-Tau distance $d(y^{(k)}, \hat{y}^{(k)}) = \binom{n}{2}^{-1} \sum_{i<j} 1\{y^{(k)}_{i,j} \neq \hat{y}^{(k)}_{i,j}\}$ [21] for each object $k$, we denote the average of this metric over all objects by $d(y, \hat{y})$ and report this statistic and the number of queries requested in Table 1. Because the average error of $\hat{y}$ is only 0.07 higher than that of $\tilde{y}$, this suggests that the algorithm is doing almost as well as we could hope. Also, note that $2R \cdot 2d \log n / \binom{n}{2}$ is equal to 11.4% and 17.1% for $d = 2$ and 3, respectively, which