{"title": "A Statistical Decision-Theoretic Framework for Social Choice", "book": "Advances in Neural Information Processing Systems", "page_first": 3185, "page_last": 3193, "abstract": "In this paper, we take a statistical decision-theoretic viewpoint on social choice, putting a focus on the decision to be made on behalf of a system of agents. In our framework, we are given a statistical ranking model, a decision space, and a loss function defined on (parameter, decision) pairs, and formulate social choice mechanisms as decision rules that minimize expected loss. This suggests a general framework for the design and analysis of new social choice mechanisms. We compare Bayesian estimators, which minimize Bayesian expected loss, for the Mallows model and the Condorcet model respectively, and the Kemeny rule. We consider various normative properties, in addition to computational complexity and asymptotic behavior. In particular, we show that the Bayesian estimator for the Condorcet model satisfies some desired properties such as anonymity, neutrality, and monotonicity, can be computed in polynomial time, and is asymptotically different from the other two rules when the data are generated from the Condorcet model for some ground truth parameter.", "full_text": "A Statistical Decision-Theoretic Framework for\n\nSocial Choice\n\nHossein Azari Sou\ufb01ani\u2217\n\nDavid C. Parkes \u2020\n\nLirong Xia\u2021\n\nAbstract\n\nIn this paper, we take a statistical decision-theoretic viewpoint on social choice,\nputting a focus on the decision to be made on behalf of a system of agents. In\nour framework, we are given a statistical ranking model, a decision space, and a\nloss function de\ufb01ned on (parameter, decision) pairs, and formulate social choice\nmechanisms as decision rules that minimize expected loss. This suggests a general\nframework for the design and analysis of new social choice mechanisms. 
We\ncompare Bayesian estimators, which minimize Bayesian expected loss, for the\nMallows model and the Condorcet model respectively, and the Kemeny rule. We\nconsider various normative properties, in addition to computational complexity\nand asymptotic behavior. In particular, we show that the Bayesian estimator for the\nCondorcet model satis\ufb01es some desired properties such as anonymity, neutrality,\nand monotonicity, can be computed in polynomial time, and is asymptotically\ndifferent from the other two rules when the data are generated from the Condorcet\nmodel for some ground truth parameter.\n\n1\n\nIntroduction\n\nSocial choice studies the design and evaluation of voting rules (or rank aggregation rules). There\nhave been two main perspectives: reach a compromise among subjective preferences of agents, or\nmake an objectively correct decision. The former has been extensively studied in classical social\nchoice in the context of political elections, while the latter is relatively less developed, even though\nit can be dated back to the Condorcet Jury Theorem in the 18th century [9].\nIn many multi-agent and social choice scenarios the main consideration is to achieve the second\nobjective, and make an objectively correct decision. Meanwhile, we also want to respect agents\u2019\npreferences and opinions, and require the voting rule to satisfy well-established normative proper-\nties in social choice. For example, when a group of friends vote to choose a restaurant for dinner,\nperhaps the most important goal is to \ufb01nd an objectively good restaurant, but it is also important\nto use a good voting rule in the social choice sense. Even for applications with less societal con-\ntext, e.g. using voting rules to aggregate rankings in meta-search engines [12], recommender sys-\ntems [15], crowdsourcing [23], semantic webs [27], some social choice normative properties are still\ndesired. 
For example, monotonicity may be desired, which requires that raising the position of an alternative in any vote does not hurt the alternative in the outcome of the voting rule. In addition, we require voting rules to be efficiently computable.
Such scenarios pose the following new challenge: How can we design new voting rules with good statistical properties as well as social choice normative properties?
To tackle this challenge, we develop a general framework that adopts statistical decision theory [3]. Our approach couples a statistical ranking model with an explicit decision space and loss function.

*azari@google.com, Google Research, New York, NY 10011, USA. The work was done when the author was at Harvard University.
†parkes@eecs.harvard.edu, Harvard University, Cambridge, MA 02138, USA.
‡xial@cs.rpi.edu, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.

|                                      | Anonymity, neutrality, monotonicity | Majority, Condorcet | Consistency | Complexity                           | Min. Bayesian risk |
| Kemeny                               | Y                                   | Y                   | N           | NP-hard, P^NP_||-hard                | N                  |
| Bayesian est. of M^1_ϕ (uni. prior)  | Y                                   | N                   | N           | NP-hard, P^NP_||-hard (Theorem 3)    | Y                  |
| Bayesian est. of M^2_ϕ (uni. prior)  | Y                                   | N                   | N           | P (Theorem 4)                        | Y                  |

Table 1: Kemeny for winners vs. Bayesian estimators of M^1_ϕ and M^2_ϕ to choose winners.

Given these, we can adopt Bayesian estimators as social choice mechanisms, which make decisions to minimize the expected loss w.r.t. the posterior distribution on the parameters (called the Bayesian risk).
This provides a principled methodology for the design and analysis of new voting rules.
To show the viability of the framework, we focus on selecting multiple alternatives (the alternatives that can be thought of as being "tied" for the first place) under a natural extension of the 0-1 loss function for two models: let M^1_ϕ denote the Mallows model with fixed dispersion [22], and let M^2_ϕ denote the Condorcet model proposed by Condorcet in the 18th century [9, 34]. In both models the dispersion parameter, denoted ϕ, is taken as a fixed parameter. The difference is that in the Mallows model the parameter space is composed of all linear orders over alternatives, while in the Condorcet model the parameter space is composed of all possibly cyclic rankings over alternatives (irreflexive, antisymmetric, and total binary relations). M^2_ϕ is a natural model that captures real-world scenarios where the ground truth may contain cycles, or agents' preferences are cyclic but they have to report a linear order due to the protocol. More importantly, as we will show later, a Bayesian estimator on M^2_ϕ is superior from a computational viewpoint.
Through this approach, we obtain two voting rules as Bayesian estimators and then evaluate them with respect to various normative properties, including anonymity, neutrality, monotonicity, the majority criterion, the Condorcet criterion, and consistency. Both rules satisfy anonymity, neutrality, and monotonicity, but fail the majority criterion, the Condorcet criterion,¹ and consistency. Admittedly, the two rules do not enjoy outstanding normative properties, but they are not bad either. We also investigate the computational complexity of the two rules. Strikingly, despite the similarity of the two models, the Bayesian estimator for M^2_ϕ can be computed in polynomial time, while computing the Bayesian estimator for M^1_ϕ is P^NP_||-hard, which means that it is at least NP-hard. Our results are
Our results are\nsummarized in Table 1.\nWe also compare the asymptotic outcomes of the two rules with the Kemeny rule for winners,\nwhich is a natural extension of the maximum likelihood estimator of M1\n\u03d5 proposed by Fishburn\n[14]. It turns out that when n votes are generated under M1\n\u03d5, all three rules select the same winner\nasymptotically almost surely (a.a.s.) as n \u2192 \u221e. When the votes are generated according to M2\n\u03d5,\nthe rule for M1\n\u03d5 still selects the same winner as Kemeny a.a.s.; however, for some parameters, the\nwinner selected by the rule for M2\n\u03d5 is different with non-negligible probability. These are con\ufb01rmed\nby experiments on synthetic datasets.\nRelated work. Along the second perspective in social choice (to make an objectively correct de-\ncision), in addition to Condorcet\u2019s statistical approach to social choice [9, 34], most previous work\nin economics, political science, and statistics focused on extending the theorem to heterogeneous,\ncorrelated, or strategic agents for two alternatives, see [25, 1] among many others. Recent work in\ncomputer science views agents\u2019 votes as i.i.d. samples from a statistical model, and computes the\nMLE to estimate the parameters that maximize the likelihood [10, 11, 33, 32, 2, 29, 7]. A limitation\nof these approaches is that they estimate the parameters of the model, but may not directly inform\nthe right decision to make in the multi-agent context. The main approach has been to return the\nmodal rank order implied by the estimated parameters, or the alternative with the highest, predicted\nmarginal probability of being ranked in the top position.\nThere have also been some proposals to go beyond MLE in social choice.\nIn fact, Young [34]\nproposed to select a winning alternative that is \u201cmost likely to be the best (i.e., top-ranked in the true\nranking)\u201d and provided formulas to compute it for three alternatives. 
This idea has been formalized and extended by Procaccia et al. [29] to choose a given number of alternatives with the highest marginal probability under the Mallows model. More recently, and independently of our work, Elkind and Shah [13] investigated a similar question for choosing multiple winners under the Condorcet model. We will see that these are special cases of our proposed framework in Example 2. Pivato [26] conducted a study similar to Conitzer and Sandholm [10], examining voting rules that can be interpreted as expected-utility maximizers.

¹The new voting rule for M^1_ϕ fails them for all ϕ < 1/√2.

We are not aware of previous work that frames the problem of social choice from the viewpoint of statistical decision theory, which is our main conceptual contribution. Technically, the approach taken in this paper advocates a general paradigm of "design by statistics, evaluation by social choice and computer science". We are not aware of previous work following this paradigm to design and evaluate new rules. Moreover, the normative properties for the two voting rules investigated in this paper are novel, even though the rules themselves are not really novel. Our result on the computational complexity of the first rule strengthens the NP-hardness result by Procaccia et al. [29], and the complexity of the second rule (Theorem 5) was independently discovered by Elkind and Shah [13].
The statistical decision-theoretic framework is quite general, allowing considerations such as estimators that minimize the maximum expected loss, or the maximum expected regret [3]. In a different context, focused on uncertainty about the availability of alternatives, Lu and Boutilier [20] adopt a decision-theoretic view of the design of an optimal voting rule. Caragiannis et al. [8] studied the robustness of social choice mechanisms w.r.t.
model uncertainty, and characterized a unique social choice mechanism that is consistent w.r.t. a large class of ranking models.
A number of recent papers in computational social choice take utilitarian and decision-theoretic approaches towards social choice [28, 6, 4, 5]. Most of them evaluate the joint decision w.r.t. agents' subjective preferences, for example the sum of agents' subjective utilities (i.e., the social welfare). We don't view this as fitting into the classical approach to statistical decision theory as formulated by Wald [30]. In our framework, the joint decision is evaluated objectively w.r.t. the ground truth in the statistical model. Several papers in machine learning developed algorithms to compute MLEs or Bayesian estimators for popular ranking models [18, 19, 21], but without considering the normative properties of the estimators.

2 Preliminaries
In social choice, we have a set of m alternatives C = {c1, . . . , cm} and a set of n agents. Let L(C) denote the set of all linear orders over C. For any alternative c, let Lc(C) denote the set of linear orders over C where c is ranked at the top. Agent j uses a linear order Vj ∈ L(C) to represent her preferences, called her vote. The collection of agents' votes is called a profile, denoted by P = {V1, . . . , Vn}. An (irresolute) voting rule r : L(C)^n → (2^C \ ∅) selects a set of winners that are "tied" for the first place for every profile of n votes.
For any pair of linear orders V, W, let Kendall(V, W) denote the Kendall-tau distance between V and W, that is, the number of different pairwise comparisons in V and W. The Kemeny rule (a.k.a. the Kemeny-Young method) [17, 35] selects all linear orders with the minimum total Kendall-tau distance from the preference profile P, that is, Kemeny(P) = arg min_W Kendall(P, W), where Kendall(P, W) = Σ_{V∈P} Kendall(V, W). The most
The most\nwell-known variant of Kemeny to select winning alternatives, denoted by KemenyC, is due to Fish-\nburn [14], who de\ufb01ned it as a voting rule that selects all alternatives that are ranked in the top\nposition of some winning linear orders under the Kemeny rule. That is, KemenyC(P ) = {top(V ) :\nV \u2208 Kemeny(P )}, where top(V ) is the top-ranked alternative in V .\nVoting rules are often evaluated by the following normative properties. An irresolute rule r satis\ufb01es:\n\u2022 anonymity, if r is insensitive to permutations over agents;\n\u2022 neutrality, if r is insensitive to permutations over alternatives;\n\u2022 monotonicity, if for any P , c \u2208 r(P ), and any P (cid:48) that is obtained from P by only raising the\npositions of c in one or multiple votes, then c \u2208 r(P (cid:48));\n\u2022 Condorcet criterion, if for any pro\ufb01le P where a Condorcet winner exists, it must be the unique\nwinner. A Condorcet winner is the alternative that beats every other alternative in pair-wise elections.\n\u2022 majority criterion, if for any pro\ufb01le P where an alternative c is ranked in the top positions for more\nthan half of the votes, then r(P ) = {c}. 
If r satisfies the Condorcet criterion then it also satisfies the majority criterion.
• consistency, if for any pair of profiles P1, P2 with r(P1) ∩ r(P2) ≠ ∅, r(P1 ∪ P2) = r(P1) ∩ r(P2).

For any profile P, its weighted majority graph (WMG), denoted by WMG(P), is a weighted directed graph whose vertices are C, and there is an edge between any pair of alternatives (a, b) with weight wP(a, b) = #{V ∈ P : a ≻V b} − #{V ∈ P : b ≻V a}.
A parametric model M = (Θ, S, Pr) is composed of three parts: a parameter space Θ, a sample space S consisting of all datasets, and a set of probability distributions over S indexed by elements of Θ: for each θ ∈ Θ, the distribution indexed by θ is denoted by Pr(·|θ).²
Given a parametric model M, a maximum likelihood estimator (MLE) is a function fMLE : S → Θ such that for any data P ∈ S, fMLE(P) is a parameter that maximizes the likelihood of the data. That is, fMLE(P) ∈ arg max_{θ∈Θ} Pr(P|θ).
In this paper we focus on parametric ranking models. Given C, a parametric ranking model MC = (Θ, Pr) is composed of a parameter space Θ and a distribution Pr(·|θ) over L(C) for each θ ∈ Θ, such that for any number of voters n, the sample space is Sn = L(C)^n, where each vote is generated i.i.d. from Pr(·|θ). Hence, for any profile P ∈ Sn and any θ ∈ Θ, we have Pr(P|θ) = ∏_{V∈P} Pr(V|θ). We omit the sample space because it is determined by C and n.
Definition 1 In the Mallows model [22], a parameter is composed of a linear order W ∈ L(C) and a dispersion parameter ϕ with 0 < ϕ < 1. For any profile P and θ = (W, ϕ), Pr(P|θ) = ∏_{V∈P} (1/Z) ϕ^{Kendall(V,W)}, where Z is the normalization factor with Z = Σ_{V∈L(C)} ϕ^{Kendall(V,W)}.

Statistical decision theory [30, 3] studies scenarios where the decision maker must make a decision d ∈ D based on the data P generated from a parametric model, generally M = (Θ, S, Pr). The quality of the decision is evaluated by a loss function L : Θ × D → R, which takes the true parameter and the decision as inputs.
In this paper, we focus on the Bayesian principle of statistical decision theory to design social choice mechanisms as choice functions that minimize the Bayesian risk under a prior distribution over Θ. More precisely, the Bayesian risk RB(P, d) is the expected loss of the decision d when the parameter is generated according to the posterior distribution given data P. That is, RB(P, d) = E_{θ|P} L(θ, d). Given a parametric model M, a loss function L, and a prior distribution over Θ, a (deterministic) Bayesian estimator fB is a decision rule that makes a deterministic decision in D to minimize the Bayesian risk, that is, for any P ∈ S, fB(P) ∈ arg min_d RB(P, d).
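This recipe (posterior over parameters, then risk-minimizing decision) can be sketched concretely for the Mallows model by brute-force enumeration of all linear orders. This is an illustrative sketch only, tractable for small m; the helper names (`kendall`, `bayesian_estimator`) are ours, not the paper's:

```python
from itertools import permutations

def kendall(v, w):
    # Kendall-tau distance: number of pairwise comparisons on which v and w disagree
    pos_w = {c: i for i, c in enumerate(w)}
    return sum(1 for i in range(len(v)) for j in range(i + 1, len(v))
               if pos_w[v[i]] > pos_w[v[j]])

def bayesian_estimator(profile, alternatives, phi, loss):
    """Decision minimizing Bayesian risk under the Mallows model with fixed
    dispersion phi and a uniform prior; here the decision space is the
    parameter space itself (all linear orders)."""
    orders = list(permutations(alternatives))
    # unnormalized posterior: Pr(P|W) is proportional to phi^(total Kendall distance)
    post = {w: phi ** sum(kendall(v, w) for v in profile) for w in orders}
    z = sum(post.values())
    def risk(d):  # Bayesian risk R_B(P, d) = E_{theta|P} L(theta, d)
        return sum(p / z * loss(w, d) for w, p in post.items())
    return min(orders, key=risk)

# With the 0-1 loss, the Bayesian estimator under a uniform prior is an MLE.
zero_one = lambda theta, d: 0 if theta == d else 1
profile = [('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c')]
best = bayesian_estimator(profile, ['a', 'b', 'c'], 0.5, zero_one)  # -> ('a', 'b', 'c')
```

With the 0-1 loss this recovers the Kemeny/MLE order, matching Example 1 below; swapping in a different loss (such as Ltop later in the paper) changes only the `loss` argument.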
We focus on deterministic estimators in this work and leave randomized estimators for future research.
Example 1 When Θ is discrete, an MLE of a parametric model M is a Bayesian estimator of the statistical decision problem (M, D = Θ, L0-1) under the uniform prior distribution, where L0-1 is the 0-1 loss function such that L0-1(θ, d) = 0 if θ = d, and otherwise L0-1(θ, d) = 1.
In this sense, all previous MLE approaches in social choice can be viewed as Bayesian estimators in a statistical decision-theoretic framework for social choice with D = Θ, the 0-1 loss function, and the uniform prior.

3 Our Framework
Our framework is quite general and flexible because we can choose any parametric ranking model, any decision space, any loss function, and any prior, and use the resulting Bayesian estimators as social choice mechanisms. Common choices for both Θ and D are L(C), C, and (2^C \ ∅).
Definition 2 A statistical decision-theoretic framework for social choice is a tuple F = (MC, D, L), where C is the set of alternatives, MC = (Θ, Pr) is a parametric ranking model, D is the decision space, and L : Θ × D → R is a loss function.
Let B(C) denote the set of all irreflexive, antisymmetric, and total binary relations over C. For any c ∈ C, let Bc(C) denote the relations in B(C) where c ≻ a for all a ∈ C − {c}. It follows
It follows\nthat L(C) \u2286 B(C), and moreover, the Kendall-tau distance can be de\ufb01ned to count the number of\npairwise disagreements between elements of B(C).\nIn the rest of the paper, we focus on the following two parametric ranking models, where the disper-\nsion is a \ufb01xed parameter.\n\n2This notation should not be taken to mean a conditional distribution over S unless we are taking a Bayesian\n\npoint of view.\n\n4\n\n\fDe\ufb01nition 3 (Mallows model with \ufb01xed dispersion, and the Condorcet model) Let M1\n\u03d5 denote\nthe Mallows model with \ufb01xed dispersion, where the parameter space is \u0398 = L(C) and given any\nW \u2208 \u0398, Pr(\u00b7|W ) is Pr(\u00b7|(W, \u03d5)) in the Mallows model, where \u03d5 is \ufb01xed.\nIn the Condorcet model, M2\n\n(cid:0) 1\nZ \u03d5Kendall(V,W )(cid:1), where Z is the normalization factor such that\n\n\u03d5, the parameter space is \u0398 = B(C). For any W \u2208 \u0398 and any pro\ufb01le\nV \u2208P\n\nP , we have Pr(P|W ) = (cid:81)\nZ =(cid:80)\n\nV \u2208B(C) \u03d5Kendall(V,W ), and parameter \u03d5 is \ufb01xed.3\n\n\u03d5 for any \u03d5.\n\n\u03d5 = (M1\n\n\u03d5 and M2\n\n\u03d5, 2C \\ \u2205, Ltop) and F 2\n\n\u03d5 = (M2\nB (respectively, f 2\n\n\u03d5 degenerate to the Condorcet model for two alternatives [9]. The Kemeny rule that\n\nM1\nselects a linear order is an MLE of M1\nWe now formally de\ufb01ne two statistical decision-theoretic frameworks associated with M1\n\u03d5 and M2\n\u03d5,\nwhich are the focus of the rest of our paper.\nDe\ufb01nition 4 For \u0398 = L(C) or B(C), any \u03b8 \u2208 \u0398, and any c \u2208 C, we de\ufb01ne a loss function Ltop(\u03b8, c)\nsuch that Ltop(\u03b8, c) = 0 if for all b \u2208 C, c (cid:31) b in \u03b8; otherwise Ltop(\u03b8, c) = 1.\n(cid:80)\nLet F 1\nc\u2208C Ltop(\u03b8, c)/|C|. 
Let f 1\nF 2\n\u03d5) under the uniform prior.\nWe note that Ltop in the above de\ufb01nition takes a parameter and a decision in 2C \\ \u2205 as inputs, which\nmakes it different from the 0-1 loss function L0-1 that takes a pair of parameters as inputs, as the\nB are not the MLEs of their respective models, as was the\none in Example 1. Hence, f 1\ncase in Example 1. We focus on voting rules obtained by our framework with Ltop. Certainly our\nframework is not limited to this loss function.\nExample 2 Bayesian estimators f 1\nB coincide with Young [34]\u2019s idea of selecting the al-\nB and f 2\nternative that is \u201cmost likely to be the best (i.e., top-ranked in the true ranking)\u201d, under F 1\n\u03d5 and\nF 2\n\u03d5 respectively. This gives a theoretical justi\ufb01cation of Young\u2019s idea and other followups under\nB was\nour framework. Speci\ufb01cally, f 1\nindependently studied by Elkind and Shah [13].\n\n\u03d5, 2C \\ \u2205, Ltop), where for any C \u2286 C, Ltop(\u03b8, C) =\nB) denote the Bayesian estimators of F 1\n\u03d5 (respectively,\n\nB is similar to rule studied by Procaccia et al. [29] and f 2\n\nB and f 2\n\n4 Normative Properties of Bayesian Estimators\nAll omitted proofs can be found in the full version on arXiv.\nTheorem 1 For any \u03d5, f 1\nmajority or the Condorcet criterion for any \u03d5 < 1\u221a\n2\n\n,4 and it does not satisfy consistency.\n\nB satis\ufb01es anonymity, neutrality, and monotonicity. f 1\n\nB does not satisfy\n\nProof sketch: Anonymity and neutrality are obviously satis\ufb01ed.\nMonotonicity. Monotonicity follows from the following lemma.\nLemma 1 For any c \u2208 C, let P (cid:48) denote a pro\ufb01le obtained from P by raising the position of c in\none vote. For any W \u2208 Lc(C), Pr(P (cid:48)|W ) = Pr(P|W )/\u03d5; for any b \u2208 C and any V \u2208 Lb(C),\nPr(P (cid:48)|V ) \u2264 Pr(P|V )/\u03d5.\nMajority and the Condorcet criterion. Let C = {c, b, c3, . . . , cm}. 
We construct a profile P* where c is ranked in the top position for more than half of the votes, but c ∉ f^1_B(P*).
For any k, let P* denote a profile composed of k copies of [c ≻ b ≻ c3 ≻ ··· ≻ cm], 1 copy of [c ≻ b ≻ cm ≻ ··· ≻ c3], and k − 1 copies of [b ≻ cm ≻ ··· ≻ c3 ≻ c]. It is not hard to verify that the WMG of P* is as in Figure 1 (a).
Then, we prove that for any ϕ < 1/√2, we can find m and k so that
Σ_{V∈Lc(C)} Pr(P*|V) / Σ_{W∈Lb(C)} Pr(P*|W) = [(1 + ϕ^{2k} + ··· + ϕ^{2k(m−2)}) / (1 + ϕ^2 + ··· + ϕ^{2(m−2)})] · ϕ^2 < 1.
It follows that c is the Condorcet winner in P* but does not minimize the Bayesian risk under M^1_ϕ, which means that it is not the winner under f^1_B.

³In the Condorcet model the sample space is B(C)^n [31]. We study a variant with sample space L(C)^n.
⁴Characterizing the majority and Condorcet criteria of f^1_B for ϕ ≥ 1/√2 is an open question.

Figure 1: WMGs of the profiles for proofs: (a) the WMG of P*, for majority and Condorcet (Thm. 1); (b) the WMGs of P1 (left) and P2 (right), for consistency (Thm. 1); (c) the WMG of P′, for computational complexity (Thm. 3).

Consistency. We construct an example to show that f^1_B does not satisfy consistency. In our construction m and n are even, and C = {c, b, c3, c4}. Let P1 and P2 denote profiles whose WMGs are as shown in Figure 1 (b). We have the following lemma.
Lemma 2 Let P ∈ {P1, P2}. Then Σ_{V∈Lc(C)} Pr(P|V) / Σ_{W∈Lb(C)} Pr(P|W) = 3(1 + ϕ^{4k}) / [2(1 + ϕ^{2k} + ϕ^{4k})].
For any 0 < ϕ < 1, 3(1 + ϕ^{4k}) / [2(1 + ϕ^{2k} + ϕ^{4k})] > 1 for all k. It is not hard to verify that f^1_B(P1) = f^1_B(P2) = {c} and f^1_B(P1 ∪ P2) = {c, b}, which means that f^1_B is not consistent. □
Similarly, we can prove the following theorem for f^2_B.
Theorem 2 For any ϕ, f^2_B satisfies anonymity, neutrality, and monotonicity. It does not satisfy majority, the Condorcet criterion, or consistency.
By Theorems 1 and 2, f^1_B and f^2_B do not satisfy as many desired normative properties as the Kemeny rule for winners. On the other hand, they minimize the Bayesian risk under F^1_ϕ and F^2_ϕ, respectively, for which Kemeny does neither. In addition, neither f^1_B nor f^2_B satisfies consistency, which means that they are not positional scoring rules.

5 Computational Complexity
We consider the following two types of decision problems.
Definition 5 In the BETTER BAYESIAN DECISION problem for a statistical decision-theoretic framework (MC, D, L) under a prior distribution, we are given d1, d2 ∈ D and a profile P, and we are asked whether RB(P, d1) ≤ RB(P, d2).
We are also interested in checking whether a given alternative is the optimal decision.
Definition 6 In the OPTIMAL BAYESIAN DECISION problem for a statistical decision-theoretic framework (MC, D, L) under a prior distribution, we are given d ∈ D and a profile P, and we are asked whether d minimizes the Bayesian risk RB(P, ·).
P^NP_|| is the class of decision problems that can be computed by a P oracle machine with a polynomial number of parallel calls to an NP oracle. A decision problem A is P^NP_||-hard if for any P^NP_|| problem B, there exists a polynomial-time many-one reduction from B to A.
It is known that P^NP_||-hard problems are NP-hard.
Theorem 3 For any ϕ, BETTER BAYESIAN DECISION and OPTIMAL BAYESIAN DECISION for F^1_ϕ under the uniform prior are P^NP_||-hard.
Proof: The hardness of both problems is proved by a unified reduction from the KEMENY WINNER problem, which is P^NP_||-complete [16]. In a KEMENY WINNER instance, we are given a profile P and an alternative c, and we are asked if c is ranked at the top of at least one V ∈ L(C) that minimizes Kendall(P, V).
For any alternative c, the Kemeny score of c under M^1_ϕ is the smallest distance between the profile P and any linear order where c is ranked at the top. We next prove that when ϕ < 1/m!, the Bayesian risk of c is largely determined by the Kemeny score of c.
Lemma 3 For any ϕ < 1/m!, any c, b ∈ C, and any profile P, if the Kemeny score of c is strictly smaller than the Kemeny score of b in P, then RB(P, c) < RB(P, b) for M^1_ϕ.
Let t be any natural number such that ϕ^t < 1/m!. For any KEMENY WINNER instance (P, c) for alternatives C′, we add two more alternatives {a, b} and define a profile P′ whose WMG is as shown in Figure 1 (c) using McGarvey's trick [24]. The WMG of P′ contains WMG(P) as a subgraph, where the weights are 6 times the weights in WMG(P).
Then, we let P* = tP′, that is, t copies of P′. It follows that for any V ∈ L(C), Pr(P*|V, ϕ) = Pr(P′|V, ϕ^t).
By Lemma 3, if an alternative e has the strictly lowest Kemeny score for profile P′, then it is the unique alternative that minimizes the Bayesian risk for P′ and dispersion parameter ϕ^t, which means that e minimizes the Bayesian risk for P* and dispersion parameter ϕ.
Let O denote the set of linear orders over C′ that minimize the Kendall-tau distance from P, and let k denote this minimum distance. Choose an arbitrary V′ ∈ O. Let V = [b ≻ a ≻ V′]. It follows that Kendall(P′, V) = 4 + 6k. If there exists W′ ∈ O where c is ranked in the top position, then we let W = [a ≻ c ≻ b ≻ (V′ − {c})]; we have Kendall(P′, W) = 2 + 6k. If c is not a Kemeny winner in P, then for any W where a is ranked in the top position, Kendall(P′, W) ≥ 6 + 6k.
Therefore, a minimizes the Bayesian risk if and only if c is a Kemeny winner in P, and if a does not minimize the Bayesian risk, then b does. Hence BETTER BAYESIAN DECISION (checking if a is better than b) and OPTIMAL BAYESIAN DECISION (checking if a is the optimal alternative) are P^NP_||-hard. □
We note that OPTIMAL BAYESIAN DECISION in Theorem 3 is equivalent to checking whether a given alternative c is in f^1_B(P). We do not know whether these problems are P^NP_||-complete. In sharp contrast to f^1_B, the next theorem states that f^2_B under the uniform prior is in P.
Theorem 4 For any rational number⁵ ϕ, BETTER BAYESIAN DECISION and OPTIMAL BAYESIAN DECISION for F^2_ϕ under the uniform prior are in P.
The theorem is a corollary of the following stronger theorem, which provides a closed-form formula for the Bayesian risk under F^2_ϕ.⁶ Recall that for any profile P and any pair of alternatives c, b, wP(c, b) is the weight on the edge c → b in the weighted majority graph of P.
Theorem 5 For F^2_ϕ under the uniform prior, for any c ∈ C and any profile P, RB(P, c) = 1 − ∏_{b≠c} 1/(1 + ϕ^{wP(c,b)}).
The comparisons of Kemeny, f^1_B, and f^2_B are summarized in Table 1. According to the criteria we consider, none of the three outperforms the others. Kemeny does well on normative properties, but does not minimize the Bayesian risk under either F^1_ϕ or F^2_ϕ, and is hard to compute. f^1_B minimizes the Bayesian risk under F^1_ϕ, but is hard to compute. We would like to highlight f^2_B, which minimizes the Bayesian risk under F^2_ϕ and, more importantly, can be computed in polynomial time despite the similarity between F^1_ϕ and F^2_ϕ.

6 Asymptotic Comparisons
In this section, we ask the following question: as the number of voters n → ∞, what is the probability that Kemeny, f^1_B, and f^2_B choose different winners? We show that when the data are generated from M^1_ϕ, all three methods agree asymptotically almost surely (a.a.s.), that is, with probability 1 as n → ∞.
Theorem 6 Let Pn denote a profile of n votes generated i.i.d. from M^1_ϕ given W ∈ Lc(C). Then Pr_{n→∞}(Kemeny(Pn) = f^1_B(Pn) = f^2_B(Pn) = c) = 1.
However, when the data are generated from M^2_ϕ, we have a different story.
Theorem 7 For any W ∈ B(C) and any ϕ, f^1_B(Pn) = Kemeny(Pn) a.a.s. as n → ∞ when votes in Pn are generated i.i.d. from M^2_ϕ given W. Moreover, for any m ≥ 5, there exists W ∈ B(C) such that for any ϕ, there exists ε > 0 such that with probability at least ε, f^1_B(Pn) ≠ f^2_B(Pn) and Kemeny(Pn) ≠ f^2_B(Pn) as n → ∞, where votes in Pn are generated i.i.d. from M^2_ϕ given W.

⁵We require ϕ to be rational to avoid representational issues.
⁶The formula resembles Young's calculation for three alternatives [34], where it was not clear whether the calculation was done for F^2_ϕ. Recently it was clarified by Xia [31] that this is indeed the case.

Figure 2: (a) The ground truth W ∈ B(C) for m = 5; (b) the probability that g (Definition 7) differs from Kemeny under M^2_ϕ.

Proof sketch: The first part of Theorem 7 is proved by the Central Limit Theorem. For the second part, the proof for m = 5 uses an acyclic W ∈ B(C) illustrated in Figure 2 (a). □
Theorem 6 suggests that, when n is large and the votes are generated from M^1_ϕ, it does not matter much which of f^1_B, f^2_B, and Kemeny we use. A similar observation has been made for other voting rules by Caragiannis et al. [7]. On the other hand, Theorem 7 states that when the votes are generated from M^2_ϕ, interestingly, for some ground truth parameter, f^2_B is different from the other two with non-negligible probability, and as we will see in the experiments, this probability can be quite large.

6.1 Experiments
We focus on the comparison between the rule f^2_B and Kemeny using synthetic data generated from M^2_ϕ given the binary relation W illustrated in Figure 2 (a).
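The closed-form risk in Theorem 5 is what makes f^2_B polynomial-time: it only needs the weighted majority graph. A minimal sketch (helper names such as `wmg_weight` and `f2_winners` are ours, and votes are represented as tuples of alternatives):

```python
def wmg_weight(profile, a, b):
    # w_P(a, b): #votes ranking a above b minus #votes ranking b above a
    above = sum(1 for v in profile if v.index(a) < v.index(b))
    return above - (len(profile) - above)

def condorcet_bayesian_risk(profile, c, alternatives, phi):
    # Theorem 5: R_B(P, c) = 1 - prod_{b != c} 1 / (1 + phi^{w_P(c, b)})
    prod = 1.0
    for b in alternatives:
        if b != c:
            prod *= 1.0 / (1.0 + phi ** wmg_weight(profile, c, b))
    return 1.0 - prod

def f2_winners(profile, alternatives, phi):
    # alternatives minimizing the Bayesian risk under the Condorcet model
    risks = {c: condorcet_bayesian_risk(profile, c, alternatives, phi)
             for c in alternatives}
    best = min(risks.values())
    return {c for c, r in risks.items() if abs(r - best) < 1e-12}

winners = f2_winners([('a', 'b', 'c')] * 3, ['a', 'b', 'c'], 0.5)  # -> {'a'}
```

The whole computation is O(m^2 · n): one pass to build the WMG and one product per alternative, in line with Theorem 4.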
By Theorem 5, the computation involves computing φ^{Ω(n)}, which is exponentially small for large n since φ < 1. Hence, we need a special data structure to handle the computation of f^2_B, because a straightforward implementation easily loses precision. In our experiments, we use the following approximation for f^2_B.

Definition 7 For any c ∈ C and any profile P, let s(c, P) = Σ_{b: w_P(b,c) > 0} w_P(b, c). Let g be the voting rule such that for any profile P, g(P) = arg min_c s(c, P).

In words, g selects the alternative c with the minimum total weight on the incoming edges in the WMG. By Theorem 5, the Bayesian risk is largely determined by φ^{−s(c,P)}. Therefore, g is a good approximation of f^2_B for reasonably large n. Formally, this is stated in the following theorem.

Theorem 8 For any W ∈ B(C) and any φ, f^2_B(P_n) = g(P_n) a.a.s. as n → ∞, where the votes in P_n are generated i.i.d. from M^2_φ given W.

In our experiments, data are generated from M^2_φ given the W in Figure 2 (a) for m = 5, n ∈ {100, 200, …, 2000}, and φ ∈ {0.1, 0.5, 0.9}. For each setting we generate 3000 profiles and calculate the fraction of trials in which g and Kemeny differ. The results are shown in Figure 2 (b). We observe that for φ = 0.1 and 0.5, the probability that g(P_n) ≠ Kemeny(P_n) is about 30% for most n in our experiments; when φ = 0.9, the probability is about 10%. In light of Theorem 8, these results confirm Theorem 7. We have also conducted similar experiments for M^1_φ, and found that the g winner is the same as the Kemeny winner in all 10000 randomly generated profiles with m = 5, n = 100.
This provides a check for Theorem 6.

7 Acknowledgments

We thank Shivani Agarwal, Craig Boutilier, Yiling Chen, Vincent Conitzer, Edith Elkind, Ariel Procaccia, and the anonymous reviewers of AAAI-14 and NIPS-14 for helpful suggestions and discussions. Azari Soufiani acknowledges the Siebel Foundation for the scholarship in his last year of PhD studies. Parkes was supported in part by NSF grant CCF #1301976 and the SEAS TomKat fund. Xia acknowledges an RPI startup fund for support.

References

[1] David Austen-Smith and Jeffrey S. Banks. Information aggregation, rationality, and the Condorcet Jury Theorem. The American Political Science Review, 90(1):34–45, 1996.
[2] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Random utility theory for social choice. In Proc. NIPS, pages 126–134, 2012.
[3] James O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, 2nd edition, 1985.
[4] Craig Boutilier and Tyler Lu. Probabilistic and utility-theoretic models in social choice: Challenges for learning, elicitation, and manipulation. In IJCAI-11 Workshop on Social Choice and AI, 2011.
[5] Craig Boutilier, Ioannis Caragiannis, Simi Haber, Tyler Lu, Ariel D. Procaccia, and Or Sheffet. Optimal social choice functions: A utilitarian view. In Proc. EC, pages 197–214, 2012.
[6] Ioannis Caragiannis and Ariel D. Procaccia. Voting almost maximizes social welfare despite limited communication. Artificial Intelligence, 175(9–10):1655–1671, 2011.
[7] Ioannis Caragiannis, Ariel Procaccia, and Nisarg Shah. When do noisy votes reveal the truth? In Proc. EC, 2013.
[8] Ioannis Caragiannis, Ariel D. Procaccia, and Nisarg Shah. Modal ranking: A uniquely robust voting rule. In Proc. AAAI, 2014.
[9] Marquis de Condorcet. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: L'Imprimerie Royale, 1785.
[10] Vincent Conitzer and Tuomas Sandholm. Common voting rules as maximum likelihood estimators. In Proc. UAI, pages 145–152, 2005.
[11] Vincent Conitzer, Matthew Rognlie, and Lirong Xia. Preference functions that score rankings and maximum likelihood estimation. In Proc. IJCAI, pages 109–115, 2009.
[12] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proc. WWW, pages 613–622, 2001.
[13] Edith Elkind and Nisarg Shah. How to pick the best alternative given noisy cyclic preferences? In Proc. UAI, 2014.
[14] Peter C. Fishburn. Condorcet social choice functions. SIAM Journal on Applied Mathematics, 33(3):469–489, 1977.
[15] Sumit Ghosh, Manisha Mundhe, Karina Hernandez, and Sandip Sen. Voting for movies: The anatomy of a recommender system. In Proc. AAMAS, pages 434–435, 1999.
[16] Edith Hemaspaandra, Holger Spakowski, and Jörg Vogel. The complexity of Kemeny elections. Theoretical Computer Science, 349(3):382–391, 2005.
[17] John Kemeny. Mathematics without numbers. Daedalus, 88:575–591, 1959.
[18] Jen-Wei Kuo, Pu-Jen Cheng, and Hsin-Min Wang. Learning to rank from Bayesian decision inference. In Proc. CIKM, pages 827–836, 2009.
[19] Bo Long, Olivier Chapelle, Ya Zhang, Yi Chang, Zhaohui Zheng, and Belle Tseng. Active learning for ranking through expected loss optimization. In Proc. SIGIR, pages 267–274, 2010.
[20] Tyler Lu and Craig Boutilier. The unavailable candidate model: A decision-theoretic view of social choice. In Proc. EC, pages 263–274, 2010.
[21] Tyler Lu and Craig Boutilier. Learning Mallows models with pairwise preferences. In Proc. ICML, pages 145–152, 2011.
[22] Colin L. Mallows. Non-null ranking models. Biometrika, 44(1/2):114–130, 1957.
[23] Andrew Mao, Ariel D. Procaccia, and Yiling Chen. Better human computation through principled voting. In Proc. AAAI, 2013.
[24] David C. McGarvey. A theorem on the construction of voting paradoxes. Econometrica, 21(4):608–610, 1953.
[25] Shmuel Nitzan and Jacob Paroush. The significance of independent decisions in uncertain dichotomous choice situations. Theory and Decision, 17(1):47–60, 1984.
[26] Marcus Pivato. Voting rules as statistical estimators. Social Choice and Welfare, 40(2):581–630, 2013.
[27] Daniele Porello and Ulle Endriss. Ontology merging as social choice: Judgment aggregation under the open world assumption. Journal of Logic and Computation, 2013.
[28] Ariel D. Procaccia and Jeffrey S. Rosenschein. The distortion of cardinal preferences in voting. In Proc. CIA, volume 4149 of LNAI, pages 317–331, 2006.
[29] Ariel D. Procaccia, Sashank J. Reddi, and Nisarg Shah. A maximum likelihood approach for selecting sets of alternatives. In Proc. UAI, 2012.
[30] Abraham Wald. Statistical Decision Functions. New York: Wiley, 1950.
[31] Lirong Xia. Deciphering Young's interpretation of Condorcet's model. ArXiv, 2014.
[32] Lirong Xia and Vincent Conitzer. A maximum likelihood approach towards aggregating partial orders. In Proc. IJCAI, pages 446–451, 2011.
[33] Lirong Xia, Vincent Conitzer, and Jérôme Lang. Aggregating preferences in multi-issue domains by using maximum likelihood estimators. In Proc. AAMAS, pages 399–406, 2010.
[34] H. Peyton Young. Condorcet's theory of voting. American Political Science Review, 82:1231–1244, 1988.
[35] H. Peyton Young and Arthur Levenglick. A consistent extension of Condorcet's election principle. SIAM Journal on Applied Mathematics, 35(2):285–300, 1978.