{"title": "Diverse Randomized Agents Vote to Win", "book": "Advances in Neural Information Processing Systems", "page_first": 2573, "page_last": 2581, "abstract": "We investigate the power of voting among diverse, randomized software agents. With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. This model allows us to reason about a collection of agents with different biases (determined by the first-stage noise models), which, furthermore, apply randomized algorithms to evaluate alternatives and produce votes (captured by the second-stage noise models). We analytically demonstrate that a uniform team, consisting of multiple instances of any single agent, must make a significant number of mistakes, whereas a diverse team converges to perfection as the number of agents grows. Our experiments, which pit teams of computer Go agents against strong agents, provide evidence for the effectiveness of voting when agents are diverse.", "full_text": "Diverse Randomized Agents Vote to Win

Albert Xin Jiang (Trinity University, xjiang@trinity.edu)
Leandro Soriano Marcolino (USC, sorianom@usc.edu)
Ariel D. Procaccia (CMU, arielpro@cs.cmu.edu)
Tuomas Sandholm (CMU, sandholm@cs.cmu.edu)
Nisarg Shah (CMU, nkshah@cs.cmu.edu)
Milind Tambe (USC, tambe@usc.edu)

Abstract

We investigate the power of voting among diverse, randomized software agents. With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. This model allows us to reason about a collection of agents with different biases (determined by the first-stage noise models), which, furthermore, apply randomized algorithms to evaluate alternatives and produce votes (captured by the second-stage noise models). 
We analytically demonstrate that a uniform team, consisting of multiple instances of any single agent, must make a significant number of mistakes, whereas a diverse team converges to perfection as the number of agents grows. Our experiments, which pit teams of computer Go agents against strong agents, provide evidence for the effectiveness of voting when agents are diverse.

1 Introduction

Recent years have seen a surge of work at the intersection of social choice and machine learning. In particular, significant attention has been given to the learnability and applications of noisy preference models [16, 2, 1, 3, 24]. These models enhance our understanding of voters' behavior in elections, and provide a theoretical basis for reasoning about crowdsourcing systems that employ voting to aggregate opinions [24, 8]. In contrast, this paper presents an application of noisy preference models to the design of systems of software agents, emphasizing the importance of voting and diversity.
Our starting point is two very recent papers by Marcolino et al. [19, 20], which provide a new perspective on voting among multiple software agents. Their empirical results focus on Computer Go programs (see, e.g., [10]), which often use Monte Carlo tree search algorithms [7]. Taking the team formation point of view, Marcolino et al. establish that a team consisting of multiple (four to six) different computer Go programs that use plurality voting — each agent giving one point to a favorite alternative — to decide on each move outperforms a team consisting of multiple copies of the strongest program (which is better than a single copy because the copies are initialized with different random seeds). The insight is that even strong agents are likely to make poor choices in some states, which is why diversity beats strength. 
And while the benefits of diversity in problem solving are well studied [12, 13, 6, 14], the setting of Marcolino et al. combines several ingredients. First, performance is measured across multiple states; as they point out, this is also relevant when making economic decisions (such as stock purchases) across multiple scenarios, or selecting item recommendations for multiple users. Second, agents' votes are based on randomized algorithms; this is also a widely applicable assumption, and in fact even Monte Carlo tree search specifically is used for problems ranging from traveling salesman to classical (deterministic) planning, not to mention that randomization is often used in many other AI applications.
Focusing on the computer Go application, we find it exciting because it provides an ideal example of voting among teams of software agents: It is difficult to compare quality scores assigned by heterogeneous agents to different moves, so optimization approaches that rely on cardinal utilities fall short while voting provides a natural aggregation method. More generally, the setting's new ingredients call for a novel model of social choice, which should be rich enough to explain the empirical finding that diversity beats strength.
However, the model suggested by Marcolino et al. [19] is rather rudimentary: they prove that a diverse team would outperform copies of the strongest agent only if one of the weaker agents outperforms the strongest agent in at least one state; their model cannot quantify the advantage of diversity. Marcolino et al. [20] present a similar model, but study the effect of increasing the size of the action space (i.e., the board size in the Go domain). More importantly, Marcolino et al. [19, 20] — and other related work [6] — assume that each agent votes for a single alternative. 
In contrast, it\nis potentially possible to design agents that generate a ranking of multiple alternatives, calling for a\nprincipled way to harness this additional information.\n\n1.1 Our Approach and Results\n\nWe introduce the following novel, abstract model of voting, and instantiate it using Computer Go.\nIn each state, which corresponds to a board position in Go, there is a ground truth, which captures\nthe true quality of different alternatives \u2014 feasible moves in Go. Heuristic agents have a noisy\nperception of the quality of alternatives. We model this using a noise model for each agent, which\nrandomly maps the ground truth to a ranking of the alternatives, representing the agent\u2019s biased view\nof their qualities. But if a single agent is presented with the same state twice, the agent may choose\ntwo different alternatives. This is because agents are assumed to be randomized. For example, as\nmentioned above, most computer Go programs, such as Fuego [10], rely on Monte Carlo Tree Search\nto randomly decide between different moves. We model this additional source of noise via a second\nnoise model, which takes the biased ranking as input, and outputs the agent\u2019s vote (another ranking\nof the alternatives). A voting rule is employed to select a single alternative (possibly randomly) by\naggregating the agents\u2019 votes. Our main theoretical result is the following theorem, which is, in a\nsense, an extension of the classic Condorcet Jury Theorem [9].\nTheorem 2 (simpli\ufb01ed and informal). 
(i) Under extremely mild assumptions on the noise models\nand voting rule, a uniform team composed of copies of any single agent (even the \u201cstrongest\u201d one\nwith the most accurate noise models), for any number of agents and copies, is likely to vote for\nsuboptimal alternatives in a signi\ufb01cant fraction of states; (ii) Under mild assumptions on the noise\nmodels and voting rule, a diverse team composed of a large number of different agents is likely to\nvote for optimal alternatives in almost every state.\nWe show that the assumptions in both parts of the theorem are indeed mild by proving that three well-\nknown noise models \u2014 the Mallows-\u03c6 model [18], The Thurstone-Mosteller model [26, 21], and\nthe Plackett-Luce model [17, 23] \u2014 satisfy the assumptions in both parts of the theorem. Moreover,\nthe assumptions on the voting rule are satis\ufb01ed by almost all prominent voting rules.\nWe also present experimental results in the Computer Go domain. As stated before, our key method-\nological contributions are a procedure for automatically generating diverse teams by using different\nparameterizations of a Go program, and a novel procedure for extracting rankings of moves from\nalgorithms that are designed to output only a single good move. We show that the diverse team\nsigni\ufb01cantly outperforms the uniform team under the plurality rule. We also show that it is possible\nto achieve better performance by extracting rankings from agents using our novel methodology, and\naggregating them via ranked voting rules.\n\n2 Background\nWe use [k] as shorthand for {1, . . . , k}. A vote is a total order (ranking) over the alternatives, usually\ndenoted by \u03c3. The set of rankings over a set of alternatives A is denoted by L(A). For a ranking \u03c3,\nwe use \u03c3(i) to denote the alternative in position i in \u03c3, so, e.g., \u03c3(1) is the most preferred alternative\nin \u03c3. We also use \u03c3([k]) to denote {\u03c3(1), . . . 
, σ(k)}. A collection of votes is called a profile, denoted by π. A deterministic voting rule outputs a winning alternative on each profile. For a randomized voting rule f (or simply a voting rule), the output f(π) is a distribution over the alternatives. A voting rule is neutral if relabeling the alternatives relabels the output accordingly; in other words, the output of the voting rule is independent of the labels of the alternatives. All prominent voting rules, when coupled with uniformly random tie breaking, are neutral.

Families of voting rules. Next, we define two families of voting rules. These families are quite wide, disjoint, and together they cover almost all prominent voting rules.

• Condorcet consistency. An alternative is called the Condorcet winner in a profile if it is preferred to every other alternative in a majority of the votes. Note that there can be at most one Condorcet winner. A voting rule is called Condorcet consistent if it outputs the Condorcet winner (with probability 1) whenever it exists. Many famous voting rules such as Kemeny's rule, Copeland's rule, Dodgson's rule, the ranked pairs method, the maximin rule, and Schulze's method are Condorcet consistent.
• PD-c Rules [8]. This family is a generalization of positional scoring rules that include prominent voting rules such as plurality and Borda count. While the definition of Caragiannis et al. [8] outputs rankings, we naturally modify it to output winning alternatives. Let T_π(k, a) denote the number of times alternative a appears among the first k positions in profile π. Alternative a is said to position-dominate alternative b in π if T_π(k, a) > T_π(k, b) for all k ∈ [m − 1], where m is the number of alternatives in π. An alternative is called the position-dominating winner if it position-dominates every other alternative in a profile. 
It\nis easy to check that there can be at most one position-dominating winner. A voting rule is\ncalled position-dominance consistent (PD-c) if it outputs the position-dominating winner\n(with probability 1) whenever it exists. Caragiannis et al. [8] show that all positional scoring\nrules (including plurality and Borda count) and Bucklin\u2019s rule are PD-c (as rules that output\nrankings). We show that this holds even when the rules output winning alternatives. This\nis presented as Proposition 1 in the online appendix (speci\ufb01cally, Appendix A).\n\nCaragiannis et al. [8] showed that PD-c rules are disjoint from Condorcet consistent rules (actually,\nfor rules that output rankings, they use a natural generalization of Condorcet consistent rules that\nthey call PM-c rules). Their proof also establishes the disjointness of the two families for rules that\noutput winning alternatives.\n\n2.1 Noise Models\n\nOne view of computational social choice models the votes as noisy estimates of an unknown true or-\nder of the alternatives. These votes come from a distribution that is parametrized by some underlying\nground truth. The ground truth can itself be the true order of alternatives, in which case we say that\nthe noise model is of the rank-to-rank type. The ground truth can also be an objective true quality\nlevel for each alternative, which is more \ufb01ne-grained than a true ranking of alternatives. In this case,\nwe say that the noise model is of the quality-to-rank type. See [15] for examples of quality-to-rank\nmodels and how they are learned. Note that the output votes are rankings over alternatives in both\ncases. We denote the ground truth by \u03b8. 
It defines a true ranking of the alternatives (even when the ground truth is a quality level for each alternative), which we denote by σ*.
Formally, a noise model P is a set of distributions over rankings — the distribution corresponding to the ground truth θ is denoted by P(θ). The probability of sampling a ranking σ from P(θ) is denoted by Pr_P[σ; θ].
Similarly to voting rules, a noise model is called neutral if relabeling the alternatives permutes the probabilities of various rankings accordingly. Formally, a noise model P is called neutral if Pr_P[σ; θ] = Pr_P[τσ; τθ], for every permutation τ of the alternatives, every ranking σ, and every ground truth θ. Here, τσ and τθ denote the result of applying τ on σ and θ, respectively.

Classic noise models. Below, we define three classical noise models:

• The Mallows-φ model [18]. This is a rank-to-rank noise model, where the probability of a ranking decreases exponentially in its distance from the true ranking. Formally, the Mallows-φ model for m alternatives is defined as follows. For all rankings σ and σ*,

Pr[σ; σ*] = φ^{d_KT(σ, σ*)} / Z^m_φ,   (1)

where d_KT is the Kendall-Tau distance that measures total pairwise disagreement between two rankings, and the normalization constant Z^m_φ = ∏_{k=1}^{m} ∑_{j=0}^{k−1} φ^j is independent of σ*.
• The Thurstone-Mosteller (TM) [26, 21] and the Plackett-Luce (PL) [17, 23] models. Both models are of the quality-to-rank type, and are special cases of a more general random utility model (see [2] for its use in social choice). In a random utility model, each alternative a has an associated true quality parameter θ_a and a distribution µ_a parametrized by θ_a. In each sample from the model, a noisy quality estimate X_a ∼ µ_a(θ_a) is obtained, and the ranking where the alternatives are sorted by their noisy qualities is returned.
For the Thurstone-Mosteller model, µ_a(θ_a) is taken to be the normal distribution N(θ_a, ν²) with mean θ_a and variance ν². Its PDF is

f(x) = (1/√(2πν²)) e^{−(x−θ_a)²/(2ν²)}.

For the Plackett-Luce model, µ_a(θ_a) is taken to be the Gumbel distribution G(θ_a). Its PDF follows f(x) = e^{−(x−θ_a) − e^{−(x−θ_a)}}. The CDF of the Gumbel distribution G(θ_a) is given by F(x) = e^{−e^{−(x−θ_a)}}. Note that we do not include a variance parameter because this subset of Gumbel distributions is sufficient for our purposes.
The Plackett-Luce model has an alternative, more intuitive, formulation. Taking λ_a = e^{θ_a}, the probability of obtaining a ranking is the probability of sequentially choosing its alternatives from the pool of remaining alternatives. Each time, an alternative is chosen from the pool with probability proportional to its λ value. Hence, Pr[σ; {λ_a}] = ∏_{i=1}^{m} ( λ_{σ(i)} / ∑_{j=i}^{m} λ_{σ(j)} ), where m is the number of alternatives.

3 Theoretical Results

In this section, we present our theoretical results. But, first, we develop a novel model that will provide the backdrop for these results. Let N = {1, ..., n} be a set of agents. Let S be the set of states of the world, and let |S| = t. These states represent different scenarios in which the agents need to make decisions; in Go, these are board positions. Let µ denote a probability distribution over states in S, which represents how likely it is to encounter each state. 
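The three classic models above are straightforward to simulate; a minimal sketch follows (the function names are ours, purely illustrative; the Mallows sampler uses the standard repeated-insertion method, and the Gumbel draw uses inverse-CDF sampling from F(x) = e^{−e^{−(x−θ_a)}}):

```python
import math
import random

def sample_mallows(sigma_star, phi):
    """Sample from Mallows-phi via repeated insertion: alternative
    sigma_star[i] is inserted at position j (0 = top) of the partial
    ranking with probability proportional to phi ** (i - j)."""
    ranking = []
    for i, alt in enumerate(sigma_star):
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = random.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, alt)
    return ranking

def sample_thurstone_mosteller(theta, nu=1.0):
    """theta maps alternatives to true qualities; add N(0, nu^2) noise
    and sort by the noisy qualities."""
    noisy = {a: random.gauss(q, nu) for a, q in theta.items()}
    return sorted(theta, key=noisy.get, reverse=True)

def sample_plackett_luce(theta):
    """Add standard Gumbel noise (inverse CDF: theta - log(-log(U)))
    and sort; this reproduces sequential choice proportional to e^theta."""
    noisy = {a: q - math.log(-math.log(random.random() or 1e-12))
             for a, q in theta.items()}
    return sorted(theta, key=noisy.get, reverse=True)
```

Sorting quality-plus-Gumbel-noise in the last sampler yields exactly the sequential-choice probabilities of the Plackett-Luce formulation above.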
Each state s ∈ S has a set of alternatives A_s, which is the set of possible actions the agents can choose in state s. Let |A_s| = m_s for each s ∈ S. We assume that the set of alternatives is fixed in each state. We will later see how our model and results can be adjusted for varying sets of alternatives. The ground truth in state s ∈ S is denoted by θ_s, and the true ranking in state s is denoted by σ*_s.
Votes of agents. The agents are presented with states sampled from µ. Their goal is to choose the true best alternative, σ*_s(1), in each state s ∈ S (although we discuss why our results also hold when the goal is to maximize expected quality). The inability of the agents to do so arises from two different sources: the suboptimal heuristics encoded within the agents, and their inability to fully optimize according to their own heuristics — these are respectively modeled by two noise models P^1_i and P^2_i associated with each agent i.
The agents inevitably employ heuristics (in domains like Go) and therefore can only obtain a noisy evaluation of the quality of different alternatives, which is modeled by the noise model P^1_i of agent i. The biased view of agent i for the true order of the alternatives in A_s, denoted σ_is, is modeled as a sample from the distribution P^1_i(σ*_s). Moreover, we assume that the agents' decision making is randomized. For example, top computer Go programs use Monte Carlo tree search algorithms [7]. We therefore assume that each agent i has another associated noise model P^2_i such that the final ranking that the agent returns is a sample from P^2_i(σ_is). To summarize, agent i's vote is obtained by first sampling its biased truth from P^1_i, and then sampling its vote from P^2_i. It is clear that the composition P^2_i ∘ P^1_i plays a crucial role in this process.
Agent teams. Since the agents make errors in estimating the best alternative, it is natural to form a team of agents and aggregate their votes. We consider two team formation methods: a uniform team comprising multiple copies of a single agent that share the same biased truth but have different final votes due to randomness; and a diverse team comprising a single copy of each agent, with different biased truths and different votes. We show that the diverse team outperforms the uniform team irrespective of the choice of the agent that is copied in the uniform team.

3.1 Restrictions on Noise Models

No team can perform well if the noise models P^1_i and P^2_i lose all useful information. Hence, we impose intuitive restrictions on the noise models; our restrictions are mild, as we demonstrate (Theorem 1) that the three classical noise models presented in Section 2.1 satisfy all our assumptions.

PM-α Noise Model. For α > 0, a neutral noise model P is called pairwise majority preserving with strength α (or PM-α) if for every ground truth θ (and the corresponding true ranking σ*) and every i < j, we have

Pr_{σ∼P(θ)}[σ*(i) ≻_σ σ*(j)] ≥ Pr_{σ∼P(θ)}[σ*(j) ≻_σ σ*(i)] + α,   (2)

where ≻_σ is the preference relation of a ranking σ sampled from P(θ). Note that this definition applies to both quality-to-rank and rank-to-rank noise models. 
In other words, in PM-\u03b1 noise models\nevery pairwise comparison in the true ranking is preserved in a sample with probability at least \u03b1\nmore than the probability of it not being preserved.\n\nPD-\u03b1 Noise Model For \u03b1 > 0, a neutral noise model is called position-dominance preserving with\nstrength \u03b1 (or PD-\u03b1) if for every ground truth \u03b8 (and the corresponding true ranking \u03c3\u2217), every\ni < j, and every k \u2208 [m \u2212 1] (where m is the number of alternatives),\n\nPr\u03c3\u223cP (\u03b8)[\u03c3\u2217(i) \u2208 \u03c3([k])] \u2265 Pr\u03c3\u223cP (\u03b8)[\u03c3\u2217(j) \u2208 \u03c3([k])] + \u03b1.\n\n(3)\nThat is, for every k \u2208 [m \u2212 1], an alternative higher in the true ranking has probability higher by at\nleast \u03b1 of appearing among the \ufb01rst k positions in a vote than an alternative at a lower position in\nthe true ranking.\n\nCompositions of noise models with restrictions. As mentioned above, compositions of noise\nmodels play an important role in our work. The next lemma shows that our restrictions on noise\nmodels are preserved, in a sense, under composition; its proof appears in Appendix B.\nLemma 1. For \u03b11, \u03b12 > 0, the composition of a PD-\u03b11 noise model with a PD-\u03b12 noise model is\na PD-(\u03b11 \u00b7 \u03b12) noise model.\nUnfortunately, a similar result does not hold for PM-\u03b1 noise models; the composition of a PM-\u03b11\nnoise model and a PM-\u03b12 noise model may yield a noise model that is not PM-\u03b1 for any \u03b1 > 0. In\nAppendix C, we give such an example. 
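Such margins can also be checked numerically: the PM-α margin of a given (possibly composed) noise model can be estimated by Monte Carlo sampling. A rough sketch, using our own toy adjacent-swap noise model (all names here are illustrative, not from the paper):

```python
import random

def noisy_swap_sampler(sigma_star, p=0.3):
    """Toy noise model: with probability p, swap one random adjacent
    pair of the true ranking. With 3 alternatives, its smallest
    pairwise-majority margin is 1 - p."""
    sigma = list(sigma_star)
    if random.random() < p:
        k = random.randrange(len(sigma) - 1)
        sigma[k], sigma[k + 1] = sigma[k + 1], sigma[k]
    return sigma

def estimate_pm_margin(sampler, sigma_star, trials=4000):
    """Monte Carlo estimate of the smallest margin
    Pr[sigma*(i) beats sigma*(j)] - Pr[sigma*(j) beats sigma*(i)]
    over all pairs i < j; the model is PM-alpha only if this minimum
    is at least alpha."""
    m = len(sigma_star)
    wins = [[0] * m for _ in range(m)]
    for _ in range(trials):
        sigma = sampler(sigma_star)
        pos = {a: r for r, a in enumerate(sigma)}
        for i in range(m):
            for j in range(i + 1, m):
                if pos[sigma_star[i]] < pos[sigma_star[j]]:
                    wins[i][j] += 1
    return min((2 * wins[i][j] - trials) / trials
               for i in range(m) for j in range(i + 1, m))
```

Passing the composition of two samplers (one fed the output of the other) to `estimate_pm_margin` gives a quick empirical view of how composition degrades the margin.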
While this is slightly disappointing, we show that a stronger assumption on the first noise model in the composition suffices.

PPM-α Noise Model. For α > 0, a neutral noise model P is called positional pairwise majority preserving with strength α (or PPM-α) if for every ground truth θ (and the corresponding true ranking σ*) and every i < j, the quantity

Pr_{σ∼P(θ)}[σ(i′) = σ*(i) ∧ σ(j′) = σ*(j)] − Pr_{σ∼P(θ)}[σ(j′) = σ*(i) ∧ σ(i′) = σ*(j)]   (4)

is non-negative for every i′ < j′, and at least α for some i′ < j′. That is, for i′ < j′, the probability that σ*(i) and σ*(j) go to positions i′ and j′ respectively in a vote should be at least as high as the probability of them going to positions j′ and i′ respectively (and at least α greater for some i′ and j′). Summing Equation (4) over all i′ < j′ shows that every PPM-α noise model is also PM-α.

Lemma 2. For α1, α2 > 0, if noise models P^1 and P^2 are PPM-α1 and PM-α2, respectively, then their composition P^2 ∘ P^1 is PM-(α1 · α2).

The lemma's proof is relegated to Appendix D.

3.2 Team Formation and the Main Theoretical Result

Let us explain the process of generating votes for the uniform team and for the diverse team. Consider a state s ∈ S. For the uniform team consisting of k copies of agent i, the biased truth σ_is is drawn from P^1_i(θ_s), and is common to all the copies. Each copy j then individually draws a vote π^j_is from P^2_i(σ_is); we denote the collection of these votes by π^k_is = (π^1_is, ..., π^k_is). Under a voting rule f, let X^k_is = I[f(π^k_is) = σ*_s(1)] be the indicator random variable denoting whether the uniform team selects the best alternative, namely σ*_s(1). Finally, agent i is chosen to maximize the overall accuracy E[X^k_is], where the expectation is over the state s and the draws from P^1_i and P^2_i.

The diverse team consists of one copy of each agent i ∈ N. Importantly, although we could take multiple copies of each agent for a total of k copies, we show that taking even a single copy of each agent outperforms the uniform team. Each agent i has its own biased truth σ_is drawn from P^1_i(θ_s), and it draws its vote ψ_is from P^2_i(σ_is). This results in the profile ψ^n_s = (ψ_1s, ..., ψ_ns). Let Y^n_s = I[f(ψ^n_s) = σ*_s(1)] be the indicator random variable denoting whether the diverse team selects the best alternative, namely σ*_s(1).

Below we put forward a number of assumptions on noise models; different subsets of assumptions are required for different results. We remark that each agent i ∈ N has two noise models for each possible number of alternatives m. However, for the sake of notational convenience, we refer to these noise models as P^1_i and P^2_i irrespective of m. This is natural, as the classic noise models defined in Section 2.1 describe a noise model for each m.

A1. For each agent i ∈ N, the associated noise models P^1_i and P^2_i are neutral.
A2. There exists a universal constant η > 0 such that for each agent i ∈ N, every possible ground truth θ (and the corresponding true ranking σ*), and every k ∈ [m] (where m is the number of alternatives), Pr_{σ∼P^1_i(θ)}[σ*(1) = σ(k)] ≤ 1 − η.

In words, assumption A2 requires that the true best alternative appear in any particular position with probability at most a constant which is less than 1. This ensures that the noise model indeed introduces a non-zero constant amount of noise in the position of the true best alternative.

A3. There exists a universal constant α > 0 such that for each agent i ∈ N, the noise models P^1_i and P^2_i are PD-α.
A4. There exists a universal constant α > 0 such that for each agent i ∈ N, the noise models P^1_i and P^2_i are PPM-α and PM-α, respectively.

We show that the preceding assumptions are indeed very mild in that the classical noise models introduced in Section 2.1 satisfy all four assumptions. The proof of the following result appears in Appendix E.

Theorem 1. With a fixed set of alternatives (such that the true qualities of every two alternatives are distinct in the case where the ground truth is the set of true qualities), the Mallows-φ model with φ ∈ [ρ, 1 − ρ], the Thurstone-Mosteller model with variance parameter ν² ∈ [L, U], and the Plackett-Luce model all satisfy assumptions A1, A2, A3, and A4, given that ρ ∈ (0, 1/2), L > 0, and U > L are constants.

We are now ready to present our main result; its proof appears in Appendix F.

Theorem 2. Let µ be a distribution over the state space S. Let the set of alternatives in all states {A_s}_{s∈S} be fixed.
1. Under the assumptions A1 and A2, and for any neutral voting rule f, there exists a universal constant c > 0 such that for every k and every N = {1, ..., n}, it holds that max_{i∈N} E[X^k_is] ≤ 1 − c, where the expectation is over the state s ∼ µ, the biased truths σ_is ∼ P^1_i(θ_s) for all s ∈ S, and the votes π^j_is ∼ P^2_i(σ_is) for all j ∈ [k].
2. Under each of the following two conditions, for a voting rule f, it holds that lim_{n→∞} E[Y^n_s] = 1, where the expectation is over the state s ∼ µ, the biased truths σ_is ∼ P^1_i(θ_s) for all i ∈ N and s ∈ S, and the votes ψ_is ∼ P^2_i(σ_is) for all i ∈ N and s ∈ S: (i) assumptions A1 and A3 hold, and f is PD-c; (ii) assumptions A1 and A4 hold, and f is Condorcet consistent.

4 Experimental Results

We now present our experimental results in the Computer Go domain. We use a novel methodology for generating large teams, which we view as one of our main contributions. It is fundamentally different from that of Marcolino et al. [19, 20], who created a diverse team by combining four different, independently developed Go programs. Here we automatically create arbitrarily many diverse agents by parameterizing one Go program. Specifically, we use different parametrizations of Fuego 1.1 [10]. Fuego is a state-of-the-art, open source, publicly available Go program; it won first place in 19×19 Go in the Fourth Computer Go UEC Cup, 2010, and also won first place in 9×9 Go in the 14th Computer Olympiad, 2009.

(a) Plurality voting rule. (b) All voting rules.
Figure 1: Winning rates for Diverse (continuous line) and Uniform (dashed line), for a variety of team sizes and voting rules.

We sample random values for a set of parameters for each\ngenerated agent, in order to change its behavior. In Appendix G we list the sampled parameters, and\nthe range of sampled values. The original Fuego is the strongest agent, as we show in Appendix H.\nAll results were obtained by simulating 1000 9\u00d79 Go games, in an HP dl165 with dual dodeca core,\n2.33GHz processors and 48GB of RAM. We compare the winning rates of games played against a\n\ufb01xed opponent. In all games the system under evaluation plays as white, against the original Fuego\nplaying as black. We evaluate two types of teams: Diverse is composed of different agents, and\nUniform is composed of copies of a speci\ufb01c agent (with different random seeds). In order to study\nthe performance of the uniform team, for each sample (which is an entire Go game) we construct\na team consisting of copies of a randomly chosen agent from the diverse team. Hence, the results\npresented for Uniform are approximately the mean behavior of all possible uniform teams, given the\nset of agents in the diverse team. In all graphs, the error bars show 99% con\ufb01dence intervals.\nFuego (and, in general, all programs using Monte Carlo tree search algorithms) is not originally\ndesigned to output a ranking over all possible moves (alternatives), but rather to output a single\nmove \u2014 the best one according to its search tree (of course, there is no guarantee that the selected\nmove is in fact the best one). In this paper, however, we wish to compare plurality (which only\nrequires each agent\u2019s top choice) with voting rules that require an entire ranking from each agent.\nHence, we modi\ufb01ed Fuego to make it output a ranking over moves, by using the data available in its\nsearch tree (we rank by the number of simulations per alternative). We ran games under 5 different\nvoting rules: plurality, Borda count, the harmonic rule, maximin, and Copeland. 
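Under simplified assumptions (votes as Python lists of alternatives, ties broken arbitrarily rather than uniformly at random, and the harmonic rule omitted), minimal sketches of the other four rules might look as follows:

```python
def plurality(profile):
    """Each vote's top choice gets one point."""
    tally = {}
    for ranking in profile:
        tally[ranking[0]] = tally.get(ranking[0], 0) + 1
    return max(tally, key=tally.get)

def borda(profile, top=None):
    """Position i earns m - 1 - i points; `top` truncates scoring to the
    first `top` positions (the experiments use the top 6)."""
    m = len(profile[0])
    scores = {a: 0 for a in profile[0]}
    for ranking in profile:
        for i, a in enumerate(ranking):
            if top is None or i < top:
                scores[a] += m - 1 - i
    return max(scores, key=scores.get)

def _beats(profile, a, b):
    """True if a strict majority of votes rank a above b."""
    return sum(r.index(a) < r.index(b) for r in profile) > len(profile) / 2

def copeland(profile):
    """Score = number of pairwise majority wins."""
    alts = profile[0]
    scores = {a: sum(_beats(profile, a, b) for b in alts if b != a)
              for a in alts}
    return max(scores, key=scores.get)

def maximin(profile):
    """Score = worst pairwise support against any opponent."""
    alts = profile[0]
    scores = {a: min(sum(r.index(a) < r.index(b) for r in profile)
                     for b in alts if b != a)
              for a in alts}
    return max(scores, key=scores.get)
```

Note that plurality ignores everything below each vote's top position, which is why the quality of the extracted rankings matters for the other rules but not for plurality.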
Plurality, Borda count (which we limit to the top 6 positions in the rankings), and the harmonic rule (see Appendix A) are PD-c rules, while maximin and Copeland are Condorcet-consistent rules (see, e.g., [24]).
We first discuss Figure 1(a), which shows the winning rates of Diverse and Uniform for a varying number of agents using the plurality voting rule. The winning rates of both teams increase as the number of agents increases. Diverse and Uniform start with similar winning rates, around 35% with 2 agents and 40% with 5 agents, but with 25 agents Diverse reaches 57%, while Uniform only reaches 45.9%. The improvement of Diverse over Uniform is not statistically significant with 5 agents (p = 0.5836), but is highly statistically significant with 25 agents (p = 8.592 × 10^{−7}). We perform linear regression on the winning rates of the two teams to compare their rates of improvement in performance as the number of agents increases. Linear regression (shown as the dotted lines in Figure 1(a)) gives the function y = 0.0094x + 0.3656 for Diverse (R² = 0.9206, p = 0.0024) and y = 0.0050x + 0.3542 for Uniform (R² = 0.8712, p = 0.0065). In particular, the linear approximation for the winning rate of Diverse increases roughly twice as fast as the one for Uniform as the number of agents increases.
Despite the strong performance of Diverse (it beats the original Fuego more than 50% of the time), it seems surprising that its winning rate converges to a constant that is significantly smaller than 1, in light of Theorem 2. There are (at least) two reasons for this apparent discrepancy. First, Theorem 2 deals with the probability of making good moves in individual board positions (states), whereas the figure shows winning rates. 
Even if the former probability is very high, a bad decision in a single state of a game can cost Diverse the entire game. Second, our diverse team is formed by randomly sampling different parametrizations of Fuego. Hence, there might still exist a subset of world states where all agents would play badly, regardless of the parametrization. In other words, the parametrization procedure may not be generating the idealized diverse team (see Appendix H).
Figure 1(b) compares the results across different voting rules. As mentioned above, to generate ranked votes, we use the internal data in the search tree of an agent's run (in particular, we rank using the number of simulations per alternative). We can see that increasing the number of agents has a positive impact for all voting rules under consideration. Moving from 5 to 15 agents for Diverse, plurality has a 14% increase in the winning rate, whereas the other voting rules have a mean increase of only 6.85% (std = 2.25%), close to half the improvement of plurality. For Uniform, the impact of increasing the number of agents is much smaller: moving from 5 to 15 agents, the increase for plurality is 5.3%, while the mean increase for the other voting rules is 5.70% (std = 1.45%). Surprisingly, plurality seems to be the best voting rule in these experiments, even though it uses less information from the submitted rankings. This suggests that the ranking method used does not typically place good alternatives in high positions other than the very top.
Hence, we introduce a novel procedure to generate rankings, which we view as another major methodological contribution. To generate a ranked vote from an agent on a given board state, we run the agent on the board state 10 times (each run is independent of the other runs), and rank the moves by the number of times they are played by the agent. We use these votes to compare plurality with the four other voting rules, for Diverse with 5 agents. 
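This sampling procedure can be sketched as follows; `agent_move` is a hypothetical stand-in for one independent run of a randomized Go agent, here scripted so the example is reproducible.

```python
from collections import Counter

def sampled_ranking(agent_move, state, runs=10):
    """Run the agent `runs` times on the same board state and rank
    moves by how often they are played (most frequent first)."""
    counts = Counter(agent_move(state) for _ in range(runs))
    return [move for move, _ in counts.most_common()]

# Hypothetical randomized agent: replays a scripted sequence of moves.
script = iter(["D4", "D4", "E5", "D4", "C3", "D4", "E5", "D4", "D4", "D4"])
print(sampled_ranking(lambda state: next(script), state=None))
# ['D4', 'E5', 'C3']
```

In our experiments each of the 10 runs is a full, independent invocation of the agent on the same position; only moves that are played at least once appear in the resulting ranking.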
Figure 2 shows the results. All voting rules outperform plurality; Borda and maximin are statistically significantly better (p < 0.007 and p = 0.06, respectively). All ranked voting rules are also statistically significantly better than the non-sampled (single run) version of plurality.

[Figure 2: All voting rules, for Diverse with 5 agents, using the new ranking methodology.]

5 Discussion

While we have focused on computer Go for motivation, we have argued in Section 1 that our theoretical model is more widely applicable. At the very least, it is relevant to modeling game-playing agents in the context of other games. For example, random sampling techniques play a key role in the design of computer poker programs [25]. A complication in some poker games is that the space of possible moves, in some stages of the game, is infinite, but this issue can likely be circumvented via an appropriate discretization.
Our theoretical model does have (at least) one major shortcoming when applied to multistage games like Go or poker: it assumes that the state space is "flat". So, for example, making an excellent move in one state is useless if the agent makes a horrible move in a subsequent state. Moreover, rather than having a fixed probability distribution µ over states, the agents' strategies actually determine which states are more likely to be reached. To the best of our knowledge, existing models of voting do not capture sequential decision making (with a few possible exceptions that are not relevant to our setting, such as the work of Parkes and Procaccia [22]). 
From a theoretical and conceptual viewpoint, the main open challenge is to extend our model to explicitly deal with sequentiality.
Acknowledgments: Procaccia and Shah were partially supported by the NSF under grants IIS-1350598 and CCF-1215883, and Marcolino by MURI grant W911NF-11-1-0332.

References
[1] H. Azari Soufiani, W. Z. Chen, D. C. Parkes, and L. Xia. Generalized method-of-moments for rank aggregation. In Proc. of 27th NIPS, pages 2706–2714, 2013.
[2] H. Azari Soufiani, D. C. Parkes, and L. Xia. Random utility theory for social choice. In Proc. of 26th NIPS, pages 126–134, 2012.
[3] H. Azari Soufiani, D. C. Parkes, and L. Xia. Computing parametric ranking models via rank-breaking. In Proc. of 31st ICML, 2014. Forthcoming.
[4] P. Baudiš and J.-l. Gailly. PACHI: State of the art open source Go program. In Proc. of 13th ACG, pages 24–38, 2011.
[5] C. Boutilier, I. Caragiannis, S. Haber, T. Lu, A. D. Procaccia, and O. Sheffet. Optimal social choice functions: A utilitarian view. In Proc. of 13th EC, pages 197–214, 2012.
[6] Y. Braouezec. Committee, expert advice, and the weighted majority algorithm: An application to the pricing decision of a monopolist. Computational Economics, 35(3):245–267, 2010.
[7] C. Browne, E. J. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.
[8] I. Caragiannis, A. D. Procaccia, and N. Shah. When do noisy votes reveal the truth? In Proc. of 14th EC, pages 143–160, 2013.
[9] M. de Condorcet. Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. 
Imprimerie Royale, 1785. Facsimile published in 1972 by Chelsea Publishing Company, New York.
[10] M. Enzenberger, M. Müller, B. Arneson, and R. Segal. Fuego — an open-source framework for board games and Go engine based on Monte Carlo tree search. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):259–270, 2010.
[11] E. Hellinger. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136:210–271, 1909. In German.
[12] L. Hong and S. E. Page. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences of the United States of America, 101(46):16385–16389, 2004.
[13] L. Hong and S. E. Page. Some microfoundations of collective wisdom. In H. Landemore and J. Elster, editors, Collective Wisdom, pages 56–71. Cambridge University Press, 2009.
[14] M. LiCalzi and O. Surucu. The power of diversity over large solution spaces. Management Science, 58(7):1408–1421, 2012.
[15] T.-Y. Liu. Learning to Rank for Information Retrieval. Springer, 2011.
[16] T. Lu and C. Boutilier. Learning Mallows models with pairwise preferences. In Proc. of 28th ICML, pages 145–152, 2011.
[17] R. D. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.
[18] C. L. Mallows. Non-null ranking models. Biometrika, 44:114–130, 1957.
[19] L. S. Marcolino, A. X. Jiang, and M. Tambe. Multi-agent team formation: Diversity beats strength? In Proc. of 23rd IJCAI, pages 279–285, 2013.
[20] L. S. Marcolino, H. Xu, A. X. Jiang, M. Tambe, and E. Bowring. Give a hard problem to a diverse team: Exploring large action spaces. In Proc. of 28th AAAI, 2014.
[21] F. Mosteller. Remarks on the method of paired comparisons: I. 
The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16(1):3–9, 1951.
[22] D. C. Parkes and A. D. Procaccia. Dynamic social choice with evolving preferences. In Proc. of 27th AAAI, pages 767–773, 2013.
[23] R. Plackett. The analysis of permutations. Applied Statistics, 24:193–202, 1975.
[24] A. D. Procaccia, S. J. Reddi, and N. Shah. A maximum likelihood approach for selecting sets of alternatives. In Proc. of 28th UAI, pages 695–704, 2012.
[25] T. Sandholm. The state of solving large incomplete-information games, and application to poker. AI Magazine, 31(4):13–32, 2010.
[26] L. L. Thurstone. A law of comparative judgment. Psychological Review, 34:273–286, 1927.