{"title": "A Mathematical Model For Optimal Decisions In A Representative Democracy", "book": "Advances in Neural Information Processing Systems", "page_first": 4702, "page_last": 4711, "abstract": "Direct democracy, where each voter casts one vote, fails when the average voter competence falls below 50%. This happens in noisy settings when voters have limited information. Representative democracy, where voters choose representatives to vote, can be an elixir in both these situations. We introduce a mathematical model for studying representative democracy, in particular understanding the parameters of a representative democracy that gives maximum decision making capability. Our main result states that under general and natural conditions,\n\n1. for fixed voting cost, the optimal number of representatives is linear;\n\n2. for polynomial cost, the optimal number of representatives is logarithmic.", "full_text": "A Mathematical Model For Optimal Decisions In A\n\nRepresentative Democracy\n\nMalik Magdon-Ismail\n\nDepartment of Computer Science\nRensselaer Polytechnic Institute\n\nTroy, NY 12180\n\nmagdon@cs.rpi.edu\n\nLirong Xia\n\nDepartment of Computer Science\nRensselaer Polytechnic Institute\n\nTroy, NY 12180\n\nxial@cs.rpi.edu\n\nAbstract\n\nDirect democracy, where each voter casts one vote, fails when the average voter\ncompetence falls below 50%. This happens in noisy settings when voters have lim-\nited information. Representative democracy, where voters choose representatives to\nvote, can be an elixir in both these situations. We introduce a mathematical model\nfor studying representative democracy, in particular understanding the parameters\nof a representative democracy that gives maximum decision making capability. Our\nmain result states that under general and natural conditions,\n\n1. for \ufb01xed voting cost, the optimal number of representatives is linear;\n2. 
for polynomial cost, the optimal number of representatives is logarithmic.\n\n1\n\nIntroduction\n\nSuppose a voter-population of size n must vote in a referendum to make an important binary decision\nto optimize some objective, e.g. social welfare growth over 10 years. A typical solution is direct\ndemocracy which decides based on a majority vote, the so called \u201cwisdom of the crowd.\u201d Direct\ndemocracy works when the crowd does indeed have wisdom. In reality, the voters cannot directly\nobserve which decision is correct. Instead, they form beliefs using perceived information, which can\nbe inaccurate, misinterpretable or even manipulated. For example, suppose that each voter\u2019s chance\nto vote for the correct decision, called her competence, is i.i.d. generated uniformly over [0, 0.99].\nNow, the majority among many voters makes the wrong decision with near certainty [17].\nThis highlights a \ufb02aw of direct democracy, where voters participate in decision-making irrespective\nof their competence. The problems arise in high noise situations, where peoples\u2019 beliefs between\ntwo choices are nearly split, as in the example above. Such close ties are common in real high-stakes\nscenarios. For example, in the 2016 United Kingdom European Union membership referendum,\n51.89% voted for leave and 48.11% voted for remain. In the 2016 US Presidential Election, 46.1%\nvoted for Trump and 48.2% voted for Clinton.1 How can democracy cope with high-stakes noisy\nissues where the average voter competence may drop below 0.5, especially if there is misinformation?\nOne promising rescue is representative democracy, where voters form groups, each group chooses a\nrepresentative, and the representatives decide via a majority vote. 
The tradeoff is that there are fewer\nrepresentatives than base-voters, but, in return, each representative is (hopefully) better informed,\nbeing the \u201cwisest\u201d from its group, or at least having a higher competence than the average of its group\nmembers. Continuing the example above, let us now use representative democracy, where people are\ndivided into households (e.g. 5 people per group), and let each group choose the member with highest\ncompetence as its representative. Then, with high probability the representative\u2019s competence is\nstrictly larger than 0.5. Now, with enough representatives, a majority vote will make the correct\ndecision with near certainty [17].\n\n1 These examples are only used to show real situations where close ties exist. We do not know if direct\ndemocracy would succeed or fail in these cases since we do not know what the \u201ccorrect\u201d outcome is.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nIt is widely accepted that representative democracy is efficient (lower operational cost and better\nturnout than direct democracy), yet there is still debate on a fundamental question: which democracy\nmakes better decisions? We are not aware of any mathematical framework for analyzing representative\ndemocracy w.r.t. its ability to make correct decisions. This is in contrast to direct democracy, which\nhas been mathematically analyzed in depth to provide a justification of the \u201cwisdom of the crowd,\u201d\nwhich dates back to the Condorcet Jury Theorem [13]. Roughly speaking, the Jury Theorem states\nthat a large group of competent voters is likely to make a correct decision by majority voting, which\n\u201clays, among other things, the foundations of the ideology of the democratic regime\u201d [33]. Direct\ndemocracy is just a representative democracy where each group has one voter. 
Thus, a mathematical\ncharacterization of optimal representative democracies would also highlight the subcases where\ndirect democracy is best. The goal of this paper is to establish rigorous mathematical foundations of\nrepresentative democracy, and provide quantitative answers to the following key questions:\n\nQ1: What is the optimal number of representatives for representative democracy?\nQ2: How should electoral group-sizes be distributed (each group has one representative)?\n\nWe will answer these questions in a general setting, where the groups satisfy a weak form of\nhomogeneity with respect to representative election. A concrete example of homogeneity is when\neach group, which maybe of different sizes, runs the same type of election process on its members\nwho are independent and drawn from some underlying voter-distribution.\nIn our analysis, we consider two cases. The \ufb01rst is when there is a \ufb01xed cost for the voting. In this\ncase, the goal of the representative democracy is to maximize the chances of making the correct\ndecision. This case is relevant to making extremely important decisions where the operational cost of\nvoting is not considered a valid tradeoff for correctness (e.g. presidential election). The second case\nis when the cost of voting increases with the number of representatives who vote. In this case, one\nmust balance the cost with the bene\ufb01t that accrues to all n individuals.\nOur Contributions. We provide a novel mathematical model of representative democracy w.r.t. its\nability to make correct decisions and characterize the optimal number of representatives as follows.\n\n1. When the cost of voting is \ufb01xed, \u0398(n) representatives is optimal. (n is the population size).\n2. 
When the cost and bene\ufb01t of voting are both polynomial, O(log n) representatives is optimal.\nIn our basic model, there is a single binary issue to be decided and n voters are divided into L groups.\nEach group chooses a representative by a representative selection function, and the representatives\nwill use majority voting to decide a binary issue. Each voter is characterized by her competence, which\nis the probability for her vote to be correct; voter competence is generated i.i.d. from a distribution F .\nLet Ben(n) \u2208 R denote the bene\ufb01t of making the correct decision compared to making the wrong\ndecision for n voters, and let Cost(L) \u2208 R denote the operational cost for L representatives to vote.\nWe reduce representative selection to a group competence function \u00b5 : N \u2192 [0, 1], mapping a group\u2019s\nsize to the expected competence of its representative. We then extend the Condorcet Jury Theorem to\nrepresentative democracy by characterizing group competence functions for which the representative\ndemocracy makes the correct decision with probability 1 as n \u2192 \u221e. We informally summarize our\nmain theorems, which are surprising characterizations of the optimal number of representatives.\nTheorems 2, 3, 4 (Optimal representative democracy with \ufb01xed voting cost.) Under natural and\nmild conditions on the group competence function \u00b5 and when Cost(L) = constant:\n1. (Homogeneous groups) The optimal group size K\u2217(n) is at most a constant independent of n.\n2. (Inhomogeneous groups) The optimal number of representatives L\u2217 is linear in n.\n3. The optimal group size distribution is nearly homogeneous.\n\nThese results hold, independently of the speci\ufb01c details of the representative selection process. Let us\nhighlight why the result is unexpected in a concrete context. 
Suppose voter competence is drawn from\nsome continuous density on [0, 1] and a group elects the most competent of its voters as representative\n(very optimistic since it cannot get better than that). Then by choosing larger and larger groups,\nthe competence of the representative approaches 1. The price paid is that there are fewer of these\nultra-smart representatives, but there will still be many of them as n \u2192 \u221e. Indeed, one might posit\nthat some optimal tradeoff exists whereby the group size tends to infinity but at a slower rate than\nn, so that the representatives get smarter and smarter and there are more and more representatives.\nOur theorem establishes the contrary. The optimal group size will never exceed some constant (the\nconstant\u2019s value may depend on the specific parameters of the selection process).\nTo prove our results, we use novel combinations of combinatorial bounds. In addition, some of\nour results may be of independent interest. For example, Lemma 7 in the full version answers an\nopen question on the probability for majority voting to be correct when the average competence of\nvoters is exactly 0.5, where there are no general results for the non-asymptotic behavior [17] and the\nasymptotic behavior is only conjectured [32, Lemma 5].\nWe then consider the case of polynomial cost and polynomial benefit: for q1, q2 > 0, Cost(L) = L^q1\nand Ben(n) = n^q2. A special case is linear cost and linear gain, where q1 = q2 = 1. The cost of\nvoting has a significant impact on the optimal group size.\nTheorem 5 (Optimal representative democracy with polynomial cost and benefit). Under natural\nand mild conditions on the group competence function \u00b5, when Cost(L)/Ben(n) = \u0398(L^q1 / n^q2):\n\n1. The optimal number of representatives is in O(log n).\n2. When \u00b5(K) polynomially converges to 1, the optimal number of representatives is in \u0398(log n).\n3. 
If \u00b5(K) is bounded below 1, the optimal number of representatives is in \u0398(1).\n\nOur analysis can be extended to the situation where representatives will have to vote on d > 1 possibly\ncorrelated issues. This is the case with the US Senate and House members, who after election may\nhave to cast a vote on multiple occasions on different issues (about once per week). We defer these\nresults to the full version.\nRelated Work and Discussion. In the US, each House member represents about 700K people,\nwhich used to be as low as 33K. Many favor enlarging the House [18, 3, 22]. We provide a\nmathematical foundation to analyze such choices. Mathematical models of representative democracy\nexist, e.g. [10, 1], however none quantitatively characterizes optimal representative democracy\nw.r.t. its decision-accuracy. Our work is related to extensions of The Condorcet Jury Theorem\nto heterogeneous agents [29, 21, 30]. Along this vein are three subareas: (1) understanding the\nconditions for consistency of the majority [32, 33, 24, 17, 37], (2) studying optimal population size to\nmaximize correctness [16, 27, 20, 34, 7, 4, 28, 26, 8, 9, 38, 6], and (3) incentivizing voters to increase\ntheir competence [29, 25, 5]. See [31] for a recent survey, including extensions to strategic voters.\nOur work is related to the \ufb01rst two subareas (consistency and optimal size). The key differences\nare, \ufb01rst, in our work, the competence of the representatives is endogenous to our model, a result of\npartitioning and representative selection, while the competence of voters in the literature is given.\nSecond, in our work, our setting has a tradeoff: increasing the number of representatives means\nweaker representatives, and hence may decrease the overall correctness. We give quantitative analysis\nof the quality vs. quantity tradeoff in representative democracy. 
Our results also extend to multiple\nissues (discussed in the full version).\nOur work is related to some recent work in proxy voting or liquid (delegative) democracy [12, 23].\nOur voting dynamic is different because voters are not allowed to delegate to an arbitrary voter,\nand representatives cannot delegate their votes. Technically, we require minimal assumptions on\nthe representative selection process. We also consider cost of voting, which is not considered in\n[12, 23]. There is recent work in computational social choice on using statistics to make better\ndecisions [14, 11, 40, 41, 36, 35, 15, 2, 39], which focuses on direct (not representative) democracy.\n\n2 Mathematical Model of Representative Democracy\n\nIn this section we propose a mathematical model for representative democracy for one issue. As\nin the Condorcet Jury Theorem, we assume that voters\u2019 ability to vote for the correct decision is\ndrawn i.i.d. from a distribution F. The voters are divided into L \u2265 1 groups. For any group \u2113 with\nK voters, whose competences are {q_{\u2113,1}, . . . , q_{\u2113,K}}, a deterministic or randomized representative\nselection process V_\u2113 chooses a member q_\u2113. The chosen representative casts a vote, and majority\nvoting succeeds if strictly more than half of the representatives vote 1. The process for a group to choose\na representative to vote is summarized below.\n\nF --(generate voters)--> {q_{\u2113,1}, . . . , q_{\u2113,K}} --(elect representative)--> q_\u2113 --(cast vote)--> x_\u2113\n\nDefinition 1. A representative democracy for one issue is composed of the following components.\n\u2022 Issue. Suppose there is one issue to decide, whose outcome is 1 (correct) or 0 (incorrect).\n\u2022 Partition function K\u20d7. For any number of voters n, K\u20d7 denotes a partition function that divides n\nvoters into L(n) groups, that is, K\u20d7(n) = (K_1, . . . , K_{L(n)}) > 0 and \u2211_{i=1}^{L(n)} K_i = n.\n\u2022 Distribution of competence F. We assume that each voter\u2019s type is characterized by her competence,\nwhich is the probability for her to vote for 1. Each voter\u2019s competence is i.i.d. from a\ndistribution F over [0, 1].\n\u2022 Representative selection process V_\u2113. Suppose group \u2113 has K voters whose competences are\nq_{\u2113,1}, . . . , q_{\u2113,K} respectively. The group uses a (randomized) process V_\u2113(q_{\u2113,1}, . . . , q_{\u2113,K}) to select a\nrepresentative q_\u2113, whose vote is represented by a random variable x_\u2113.\n\u2022 Voting by the representatives. Each chosen representative votes for 1 with probability equal\nto her competence, and votes for 0 otherwise. The majority rule is used to decide the outcome\nbased on the representatives\u2019 votes. We assume that a strict majority of votes is necessary to make the\ncorrect decision. That is, when there is a tie, the outcome is 0 (incorrect).\n\nWhen each group has exactly 1 member, we obtain the direct democracy; otherwise we have the\nrepresentative democracy, as in the following example.\nExample 1 (Uniform Voters with Uniform Process). Suppose there are L \u2265 1 groups, each with\nK \u2265 1 members. Each voter\u2019s competence is i.i.d. from Uniform[a, b]. For all groups, V_\u2113 chooses\na member uniformly at random. It is not hard to see that majority voting by the representatives is\ncorrect with probability no more than 0.5 when a + b < 1. \u25a0\n\nWe note that the vote of group \u2113\u2019s representative, i.e. the random variable x_\u2113, is characterized entirely\nby its expectation, which contains all information needed in the analysis in this paper. Therefore, we\nwill simplify the representative selection process to a single group competence function \u00b5, which\nspecifies the expected competence of the representative as a function of the group size K.\nDefinition 2. 
A representative democracy is a group competence function \u00b5 : N \u2192 [0, 1].\nFor example, the group competence function for the uniform process in Example 1 can be represented\nby \u00b5_U(K) = (a + b)/2 for all K. We will see that the group competence function significantly\nsimplifies the process in the following two examples. In the next example, the group chooses its\nmost-informed member, the one with the highest competence, as the representative.\nExample 2 (Max Process). As in Example 1, suppose F = Uniform[a, b]. Each group now chooses\nthe member with the highest competence as the representative. So q_\u2113 = max{q_{\u2113,1}, . . . , q_{\u2113,K}}, which\nmeans (q_\u2113 \u2212 a)/(b \u2212 a) \u223c Beta(K, 1) [19], from which \u00b5_max = E[x_\u2113] = (a + Kb)/(K + 1). \u25a0\nThe group competence function for the MAX process is monotonically increasing in K, between a and\nb, while the group competence function for the UNIFORM process outputs the same value for all K.\nThe MAX process is an upper bound on group competence functions. Competence may not be\ndirectly observable, as Example 2 assumes. However, when the group size is not too large, for example when a group is a\nhousehold, it is natural to assume that the family members are able to choose the max-informed\nrepresentative. Even if the process is noisy, it is reasonable to expect more competent voters to have\nhigher probability to become the representative, as in the noisy-max process below.\nExample 3 (Noisy-Max Process). Continuing with Example 1, let p\u20d7 = (p_1, . . . , p_K) satisfy\np_1 \u2265 \u00b7\u00b7\u00b7 \u2265 p_K \u2265 0 and \u2211_{i=1}^{K} p_i = 1. For i \u2264 K, choose the member with the i-th highest\ncompetence as the representative with probability p_i. Then, \u00b5_p\u20d7(K) = (p\u20d7 \u00b7 k\u20d7+)a + (p\u20d7 \u00b7 k\u20d7\u2212)b, where\nk\u20d7+ = (1/(K + 1)) (1, 2, . . . , K) and k\u20d7\u2212 = (1/(K + 1)) (K, K \u2212 1, . . . , 1). \u25a0\n\n3 Optimal Representative Democracy for One Issue\n\nWe first extend the classical Condorcet Jury Theorem to representative democracy. Let us formally\ndefine consistency, the main desired property of a democracy. As the population increases, i.e.\nasymptotically in n, it should be possible to partition the voters into L groups, with potentially different\nnumbers of voters in each group, such that with probability 1 the majority of representatives vote for 1.\nGiven K\u20d7 = (K_1, . . . , K_{L(n)}), we let S_{n,K\u20d7,\u00b5} be the fraction of 1\u2019s in L(n) independent Bernoulli\nrandom variables with success probabilities (\u00b5([K\u20d7(n)]_1), . . . , \u00b5([K\u20d7(n)]_{L(n)})), where for i \u2264 L(n),\n[K\u20d7(n)]_i = K_i is the i-th component of K\u20d7(n).\nDefinition 3. Given a partition function K\u20d7 and a group competence function \u00b5, for any n, we let\nR_n(K\u20d7, \u00b5) denote the probability for majority voting by representatives according to K\u20d7 and \u00b5 to be\ncorrect. That is, R_n(K\u20d7, \u00b5) = P(S_{n,K\u20d7,\u00b5} > 1/2). (We use R(K\u20d7, \u00b5) when n is clear from the context.)\nDefinition 4. A representative democracy with group competence function \u00b5 is consistent if there\nexists a partition function K\u20d7(n) for which the majority voting of representatives is correct with\nprobability 1 as n \u2192 \u221e, that is, lim_{n\u2192\u221e} R_n(K\u20d7, \u00b5) = 1.\nTheorem 1. For a voter-competence distribution F, the representative democracy with group\ncompetence function \u00b5 is consistent if and only if there exists K\u2217 \u2208 N such that \u00b5(K\u2217) > 0.5.\nMissing proofs are in the full version. We now define the benefit, cost, and social welfare.\nDefinition 5. Let Ben(n) \u2208 R be the benefit of making the correct decision and let Cost(L) be the cost of\nmaintaining L representatives. 
Given a partition K\u20d7 = (K_1, . . . , K_{L(n)}) \u2208 N^{L(n)}, the social welfare\nof K\u20d7 is the expected benefit minus the cost of voting, that is,\n\nSW(K\u20d7) = Ben(n) R_n(K\u20d7, \u00b5) \u2212 Cost(L) = Ben(n) \u00b7 ( R_n(K\u20d7, \u00b5) \u2212 Cost(L)/Ben(n) ).\n\n3.1 Optimal Group Size for Fixed Cost of Voting\n\nIn this section, we focus on a fixed voting cost regardless of the number of representatives. In this case,\nthe goal is to find the optimal partition K\u20d7 that maximizes R_n(K\u20d7, \u00b5). Let us first informally discuss\nthe effect of increasing the size of groups, which trades off quality (competence of each representative)\nagainst quantity (the number of representatives), and is a form of mean vs. variance tradeoff.\nThe quality vs. quantity tradeoff. Suppose n is fixed\nand we are deciding between group sizes K_1 < K_2, with\n\u00b5(K_2) \u2265 \u00b5(K_1) > 0.5. For simplicity, suppose K_1 and\nK_2 divide n and assume group sizes are equal in both cases,\nso K\u20d7_1 = (K_1, . . . , K_1) and K\u20d7_2 = (K_2, . . . , K_2). With K_1\nmembers in each group, each representative has competence\n\u00b5(K_1) \u2264 \u00b5(K_2). On the other hand, the number of representatives\nis L_1 = n/K_1 > n/K_2 = L_2. Therefore, we\nhave E(S_{n,K\u20d7_1,\u00b5}) = \u00b5(K_1) \u2264 \u00b5(K_2) = E(S_{n,K\u20d7_2,\u00b5}), while\nthe variance of S_{n,K\u20d7_1,\u00b5}, which is \u00b5(K_1)(1 \u2212 \u00b5(K_1))/L_1,\ncan potentially be smaller than the variance of S_{n,K\u20d7_2,\u00b5}.2\nThe optimal group size K\u2217 therefore minimizes the left tail\nprobability P[S < 0.5]. This mean vs. variance tradeoff is\nillustrated using the distribution of S in Figure 1 for the MAX\nprocess with n = 2000 and F = Uniform[0.44, 0.55]. 
We compare three group sizes: K = 2 (low\nmean, low variance), K\u2217 = 8 (optimal, middle), and K = 20 (high mean, high variance).\n\n[Figure 1: The mean vs. variance tradeoff when selecting group size. Horizontal axis: fraction of representatives with correct vote; vertical axis: probability density; curves: S_{2000,K\u20d7,\u00b5max} for K = 2, 8, 20.]\n\nNaturally, our next goal is to identify an optimal number of representatives (groups), denoted by\nL\u2217(n). We will first focus on the specific partition functions where almost all groups have the same\nsize, with the exception of the last group, whose size is allowed to be larger than that of the other groups. This\nis a natural setting in practice because each representative is supposed to represent an equal number of\nvoters. Later in this section we will show how to extend our study to general partition functions.\nDefinition 6 (Homogeneous groups). Given n and a group size K, in the homogeneous setting,\nL = \u230an/K\u230b representatives are selected using partition function H\u20d7_K = (K, . . . , K, n \u2212 (L \u2212 1)K),\nwith L \u2212 1 groups of size K.\nOur main result is that for homogeneous groups, the optimal group size is bounded by a constant,\nprovided that some value of K can achieve consistency and the group competence function \u00b5\nis polynomially bounded away from 1 as K goes to infinity. Let K\u2217_hom(n) denote the optimal\nhomogeneous group size that maximizes SW(H\u20d7_K), where Cost(L) is a constant. Then, K\u2217_hom(n)\nmaximizes R_n(H\u20d7_{K\u2217_hom(n)}, \u00b5), the probability for the majority of representatives to be correct.\nTheorem 2 (Optimal homogeneous group size). Suppose Cost(L) is a constant, there exists\nK\u2217 \u2208 N such that \u00b5(K\u2217) = 1/2 + \u03b5 for 0 < \u03b5 < 1/2, and for all K \u2208 N, \u00b5(K) \u2264 1 \u2212 A/K^\u03b1 for\nconstants A > 0 (w.l.o.g. A \u2264 1) and \u03b1 \u2265 0. Then, K\u2217_hom(n) \u2264 cK\u2217, where\n\nc = (\u22124 / ln(1 \u2212 4\u03b5\u00b2)) \u00b7 ( ln(32/(9\u03b5\u00b2A)) + \u03b1 ln(4\u03b1K\u2217/e) \u2212 \u03b1 ln|ln(1 \u2212 4\u03b5\u00b2)| ).\n\nThe intuition is that as K increases, we experience diminishing returns with respect to \u00b5 because \u00b5 is\nbounded away from 1. On the other hand, there is a loss due to decreasing L = \u230an/K\u230b, the number\nof representatives. One may expect that the best tradeoff is achieved when K is a slowly increasing\nfunction of n, but our theorem proves otherwise. The proof is technical and involves a subtle analysis\nof the tail probability P[S < 0.5], which is deferred to the appendix.\nProof. (Sketch) We may assume the first L \u2212 1 groups have K members. We first observe an elementary\nbound that allows us to ignore the last group, whose size is unknown:\n\nR\u2212(L, \u00b5(K)) := P[ \u2211_{\u2113=1}^{L\u22121} x_\u2113 \u2265 \u2308(L + 1)/2\u2309 ] \u2264 R_n(H\u20d7_K, \u00b5) \u2264 P[ \u2211_{\u2113=1}^{L\u22121} x_\u2113 \u2265 \u2308(L \u2212 1)/2\u2309 ] =: R+(L, \u00b5(K)).\n\nThe functions R\u2212 and R+ are Binomial upper tail probabilities. We have not yet invoked any\nproperties of the group competence function \u00b5. The following lemmas are properties of R\u00b1(L, p); below, binom(L, k) denotes the binomial coefficient.\nLemma 1 (Monotonicity of R(L, p)). For fixed L, R\u2212(L, p) and R+(L, p) are increasing in p. For\nfixed p > 1/2, R\u2212(L, p) and R+(L, p) are increasing in L.\nLemma 2. For p \u2264 1/2, R+(L, p) \u2264 3/4 (the maximum is attained for L = 3, p = 1/2).\nLemma 3 (Binomial tail inequality). Given p > 1/2, L and k \u2264 \u2308L/2\u2309,\n\nbinom(L, k) p^k (1 \u2212 p)^(L\u2212k) \u2264 \u2211_{\u2113=0}^{k} binom(L, \u2113) p^\u2113 (1 \u2212 p)^(L\u2212\u2113) \u2264 (p/(2p \u2212 1)) \u00b7 binom(L, k) p^k (1 \u2212 p)^(L\u2212k).\n\nLemma 4 (Near-Central binomial coefficient bound). For L > 1,\n\n(3/4) \u00b7 4^(L/2)/\u221a(\u03c0L) \u2264 binom(L, \u2308(L \u2212 1)/2\u2309) \u2264 2 \u00b7 4^(L/2)/\u221a(\u03c0L).\n\nLemma 5 (Bounding R\u2212(L, p) and R+(L, p)). For 1/2 < p < 1,\n\n1 \u2212 (2/(2p \u2212 1)) \u00b7 (4p(1 \u2212 p))^(L/2)/\u221a(\u03c0L) \u2264 R\u2212(L, p) \u2264 R+(L, p) \u2264 1 \u2212 (3/(8p)) \u00b7 (4p(1 \u2212 p))^(L/2)/\u221a(\u03c0L).\n\nWe are ready to prove the theorem. Let c be defined as in the statement of the theorem. We may\nassume n > cK\u2217, otherwise the theorem automatically holds, because K\u2217_hom(n) \u2264 n \u2264 cK\u2217. Further,\nif L\u2217 = 1, then there is just one group and any K > K\u2217 will also have just one group and be\nequivalent. Therefore, we may assume L\u2217 \u2265 2. Now suppose K > cK\u2217. Define \u00b5_K = \u00b5(K)\nand L_K = \u230an/K\u230b, \u00b5\u2217 = \u00b5(K\u2217) and L\u2217 = \u230an/K\u2217\u230b. Observe that L_K \u2264 L\u2217. We show that\nR(L\u2217, \u00b5\u2217) \u2265 R(L_K, \u00b5_K), which means that K cannot be better than K\u2217 for a homogeneous partition\nof n, proving the theorem.\nIf \u00b5_K \u2264 1/2 then R+(L_K, \u00b5_K) \u2264 3/4 (Lemma 2). We show that R\u2212(L\u2217, \u00b5\u2217) > 3/4. Indeed, since\nn > cK\u2217, we have n/K\u2217 > c \u2265 (\u22124/ln(1 \u2212 4\u03b5\u00b2)) \u00b7 ln(32/(9\u03b5\u00b2A)), and so\n\nL\u2217 = \u230an/K\u2217\u230b \u2265 n/(2K\u2217) > (\u22122/ln(1 \u2212 4\u03b5\u00b2)) \u00b7 ln(32/(9\u03b5\u00b2A)) \u21d2 (1/\u03b5) \u00b7 (1 \u2212 4\u03b5\u00b2)^(L\u2217/2)/\u221a(\u03c0L\u2217) < 9\u03b5A/(32\u221a(\u03c0L\u2217)) < 1/4,\n\nwhere in the last inequality we used A \u2264 1 and L\u2217 \u2265 2. Using the bound for R\u2212 from Lemma 5 gives\nR\u2212(L\u2217, \u00b5\u2217) > 3/4, which proves K is not optimal. Therefore, we may assume that \u00b5_K > 1/2. Also, if\n\u03b5 = 1/2 then the representatives always vote for the correct decision and the theorem automatically\nholds, so we may assume \u03b5 < 1/2. Now, using Lemma 5, L\u2217 \u2265 L_K and some algebra:\n\nR_n(H\u20d7_{K\u2217}, \u00b5) \u2212 R_n(H\u20d7_K, \u00b5) \u2265 R\u2212(L\u2217, \u00b5\u2217) \u2212 R+(L_K, \u00b5_K) \u2265 (positive) \u00b7 [ 1 \u2212 C \u00b7 ( (4\u00b5\u2217(1 \u2212 \u00b5\u2217))^(L\u2217/L_K) / (4\u00b5_K(1 \u2212 \u00b5_K)) )^(L_K/2) ],\n\nwhere C = (16/3) \u00b7 \u00b5_K/(2\u00b5\u2217 \u2212 1) = 8\u00b5_K/(3\u03b5). Note that C \u2264 8/(3\u03b5) (because \u00b5_K < 1) and C > 1 (because \u00b5_K > 1/2\nand \u03b5 < 1/2). We prove the term in square brackets is positive. Observe that\n\nL\u2217/L_K = \u230an/K\u2217\u230b/\u230an/K\u230b \u2265 \u230an/K\u2217\u230b/(n/K) = (K/K\u2217) \u00b7 \u230an/K\u2217\u230b/(n/K\u2217) \u2265 K/(2K\u2217).\n\nWe used \u230ax\u230b/x \u2265 1/2 when x \u2265 1. Because \u00b5_K > 1/2 and 1 \u2212 \u00b5_K \u2265 A/K^\u03b1, we have \u00b5_K(1 \u2212 \u00b5_K) \u2265\nA/(2K^\u03b1). Also, \u00b5\u2217 \u2265 1/2 + \u03b5, which means that \u00b5\u2217(1 \u2212 \u00b5\u2217) \u2264 (1/2 + \u03b5)(1/2 \u2212 \u03b5), and K \u2264 n. Therefore,\n\nC \u00b7 ( (4\u00b5\u2217(1 \u2212 \u00b5\u2217))^(L\u2217/L_K) / (4\u00b5_K(1 \u2212 \u00b5_K)) )^(L_K/2) \u2264 C \u00b7 ( K^\u03b1 (1 \u2212 4\u03b5\u00b2)^(K/(2K\u2217)) / (2A) )^(L_K/2).\n\nWe show that the RHS is at most 1, or equivalently its logarithm is at most zero, concluding the proof.\nTaking the logarithm of the RHS, we get:\n\nL_K \u00b7 ( (K/(4K\u2217)) ln(1 \u2212 4\u03b5\u00b2) + (\u03b1/2) ln K \u2212 (1/2) ln 2A ) + ln C\n\u2264 L_K \u00b7 ( (K/(4K\u2217)) ln(1 \u2212 4\u03b5\u00b2) + (\u03b1/2) ln K \u2212 (1/2) ln 2A + ln C )   (using L_K \u2265 1 and C > 1)\n\u2264 L_K \u00b7 ( (K/(4K\u2217)) ln(1 \u2212 4\u03b5\u00b2) + (\u03b1/2) [ ln( (\u22124\u03b1K\u2217/e) / ln(1 \u2212 4\u03b5\u00b2) ) \u2212 K ln(1 \u2212 4\u03b5\u00b2)/(4\u03b1K\u2217) ] \u2212 (1/2) ln 2A + ln(8/(3\u03b5)) ).\n\nThe last step follows by using the fact that for any z > 0, ln x \u2264 ln(z/e) + x/z, which holds because for\nany y = x/z > 0, ln y \u2212 y + 1 is maximized at y = 1. In the last step, we set z = \u22124\u03b1K\u2217/ln(1 \u2212 4\u03b5\u00b2).\nTo conclude the proof, collect terms and use K/K\u2217 > c and ln(1 \u2212 4\u03b5\u00b2) < 0.\n\nAs a corollary, Theorem 2 can be applied to any F with continuous density function, which means\nthat in such cases the optimal number of representatives with homogeneous group-size is \u2126(n).\nCorollary 1 (Linear number of representatives). Suppose F has a continuous density function on\n[0, 1]. For any \u00b5 such that \u00b5(K\u2217) > 0.5 for some K\u2217, the optimal number of representatives for\nhomogeneous groups is at least n/(cK\u2217), where c is the constant defined in Theorem 2.\n\nWhen costs are fixed, a representative democracy with fixed group size makes better choices than\nthe representative democracy with fixed number of representatives. One limitation of Theorem 2 is\nthat it only holds for homogeneous group size. 
We can drop this restriction if the group competence function \u00b5(K) is concave (for example, \u00b5 is concave for the MAX process). Let L\u2217(n) denote the optimal number of groups for n voters.\nTheorem 3 (Optimal number of representatives for general group sizes). Suppose Cost(L) is a constant, there exists K\u2217 \u2208 N such that \u00b5(K\u2217) \u2265 1/2 + \u03f5 for 0 < \u03f5 < 1/2, \u00b5 is concave, and \u00b5(K) \u2264 1 \u2212 A/K^\u03b1 for constant A < 1 and \u03b1 \u2265 0. Then, for any n, L\u2217(n) \u2265 \u230an/K\u2217\u230b / c, where\n\nc = (\u22124 / ln(1 \u2212 4\u03f5^2)) \u00b7 (ln(32/(9\u03f5^2 A)) + \u03b1 ln(4\u03b1K\u2217/e) \u2212 \u03b1 ln|ln(1 \u2212 4\u03f5^2)|).\n\nThe proof of Theorem 3 is similar to the proof of Theorem 2. Next, we extend Theorem 3 by showing that the optimal partitioning function K\u20d7 is nearly homogeneous if \u00b5 is log-concave and 1 \u2212 \u00b5 is log-convex. Log-concavity is weaker than concavity (e.g., the MAX process is log-concave).\nTheorem 4 (Near-Homogeneity of group sizes). Suppose Cost(L) is a constant, the group competence function \u00b5(K) is log-concave and non-decreasing, and further that 1 \u2212 \u00b5(K) is log-convex. Given L groups, there is an optimal partition (K_1, ..., K_L) of n into the L groups with no two groups differing in size by more than 1. That is, max_i K_i \u2212 min_i K_i \u2264 1.\nTheorem 4 also applies to non-constant cost functions because the number of groups is fixed.\n\nNumerical Example of Optimal Group Size. As an application of our results, we consider the uniform voters with MAX representative selection process (Example 2) within a simple setting which captures, at a very coarse level, the US House of Representatives. We pick a voter distribution F = Uniform[0.45, b] (of course, this may not capture voters in the US, but it is just illustrative.) 
Below, we show how the optimal homogeneous group size K^\u2217 and the minimum group size required to achieve consistency K_\u2217 depend on b. When b is small, as voters get wiser (b goes up), K^\u2217 and K_\u2217 are decreasing. At some point, even direct democracy (K_\u2217 = 1) is consistent. For very large b, the optimal group size starts to increase due to the \u00b5(1 \u2212 \u00b5) term. In general, the optimal group size is less than about 5, the size of the typical household. The optimal representative democracy is obtained when each household elects a head to vote on its behalf.\nThe optimal group size is useful to know, but for practical purposes, there may not be a significant difference between different values of K for a large n. Let us take the US House as an example, which has 435 representatives. Suppose the voting population is about n = 235 million, and that voter competence is uniformly distributed from 0.45 to 0.52 (the average competence is slightly less than 0.5), which corresponds to b = 0.52 in our example.\n\nFigure 2: Optimal group size.\n\nSetting:       K = 1 | K_\u2217 = 3 | K^\u2217 = 9 | 5 \u00d7 House | 2 \u00d7 House | House\nSuccess rate:  0%    | 100%    | 100%    | 97%       | 88%       | 80%\n\nIn this simple setting, direct democracy (K = 1) would be wrong, and the current size of the House is far from optimal: at 80%, it is 20 percentage points less accurate than what is achievable. Doubling the House gets you to 88%, and multiplying it by 5 pretty much gets you to optimal. A House that is 20 times larger (each member representing 35K citizens) would be essentially indistinguishable from optimal. This suggests that a much larger House is needed for noisy issues like the one in this example. Also note that if group sizes are about 5 (the size of a household) then we have near-perfect results.\n\n3.2 Optimal Group Size for Polynomial Cost of Voting\n\nWhen there is a non-constant cost to voting, things change dramatically. 
We consider the case where the cost and benefit are both polynomial here, though our analysis can be extended to other functional forms, with different results. Perhaps the most intuitive cost and benefit are both linear (per-representative cost and per-voter benefit). We simply state the result.\nTheorem 5 (Optimal homogeneous group size, polynomial cost and polynomial benefit). Suppose Cost(L)/Ben(n) = \u0398(L^{q1}/n^{q2}) for constants q1 > 0 and q2 > 0, there exists K\u2217 \u2208 N such that \u00b5(K\u2217) > 1/2, \u00b5 is non-decreasing, and for all K \u2208 N, \u00b5(K) \u2264 1 \u2212 A/K^\u03b1 for constants A > 0 and \u03b1 \u2265 0. Then, the optimal group size K\u2217_hom(n) = \u2126(n/ log n). Moreover, we have:\n\n(i) If lim_{K\u2192\u221e} \u00b5(K) < 1, then K\u2217_hom(n) = \u0398(n/ log n).\n(ii) If there exist B, \u03b2 > 0 such that \u00b5(K) \u2265 1 \u2212 B/K^\u03b2, then K\u2217_hom(n) = \u0398(n).\n\n4 Summary and Future Work\n\nWe set the mathematical foundation for studying the optimal number of representatives in a representative democracy and showed that under general and natural conditions, the optimal number is linear when the cost of voting is a constant, and logarithmic when the cost and benefit are both polynomial. Our results can be extended to multiple issues. There are many open questions: Can we extend to inhomogeneous representative selection processes, e.g. different states use different processes to choose representatives? Does diversity (inhomogeneous agents) in population help make better decisions? What happens if agents are strategic?\n\nAcknowledgements\n\nWe thank all anonymous reviewers for helpful comments and suggestions. LX is supported by NSF #1453542 and ONR #N00014-17-1-2621.\n\n[Figure 2 (plot): group size K versus the upper bound on voter probability b, with curves for K^\u2217 and K_\u2217.]\n\nReferences\n[1] Emmanuelle Auriol and Robert J. Gary-Bobo. On the optimal number of representatives. 
Public Choice, 153(3\u20134):419\u2013445, 2012.\n\n[2] Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Statistical decision theory approaches to social choice. In Proceedings of Advances in Neural Information Processing Systems (NIPS), Montreal, Quebec, Canada, 2014.\n\n[3] Bruce Bartlett. Enlarging the House of Representatives. https://economix.blogs.nytimes.com/2014/01/07/enlarging-the-house-of-representatives/, 2014.\n\n[4] Ruth Ben-Yashar and Jacob Paroush. A nonasymptotic Condorcet jury theorem. Social Choice and Welfare, 17(2):189\u2013199, 2000.\n\n[5] Ruth Ben-Yashar and Jacob Paroush. Investment in Human Capital in Team Members Who Are Involved in Collective Decision Making. Journal of Public Economic Theory, 5(3):527\u2013539, 2003.\n\n[6] Ruth Ben-Yashar and Mor Zahavi. The Condorcet jury theorem and extension of the franchise with rationally ignorant voters. Public Choice, 148(3):435\u2013443, 2011.\n\n[7] Daniel Berend and Jacob Paroush. When is Condorcet\u2019s Jury Theorem valid? Social Choice and Welfare, 15(4):481\u2013488, 1998.\n\n[8] Daniel Berend and Luba Sapir. Monotonicity in Condorcet Jury Theorem. Social Choice and Welfare, 24:83\u201392, 2005.\n\n[9] Daniel Berend and Luba Sapir. Monotonicity in Condorcet\u2019s Jury Theorem with dependent voters. Social Choice and Welfare, 28(3):507\u2013528, 2007.\n\n[10] Timothy Besley and Stephen Coate. An Economic Model of Representative Democracy. The Quarterly Journal of Economics, 112(1):85\u2013114, 1997.\n\n[11] Ioannis Caragiannis, Ariel D. Procaccia, and Nisarg Shah. When Do Noisy Votes Reveal the Truth? ACM Transactions on Economics and Computation, 4(3):Article No. 15, 2016.\n\n[12] Gal Cohensius, Shie Mannor, Reshef Meir, Eli Meirom, and Ariel Orda. Proxy Voting for Better Outcomes. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 858\u2013866, 2017.\n\n[13] Marquis de Condorcet. 
Essai sur l\u2019application de l\u2019analyse \u00e0 la probabilit\u00e9 des d\u00e9cisions rendues \u00e0 la pluralit\u00e9 des voix. Paris: L\u2019Imprimerie Royale, 1785.\n\n[14] Vincent Conitzer and Tuomas Sandholm. Common voting rules as maximum likelihood estimators. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 145\u2013152, Edinburgh, UK, 2005.\n\n[15] Edith Elkind and Nisarg Shah. Electing the Most Probable Without Eliminating the Irrational: Voting Over Intransitive Domains. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages 182\u2013191, 2014.\n\n[16] Scott L. Feld and Bernard Grofman. The accuracy of group majority decisions in groups with added members. Public Choice, 42(3):273\u2013285, 1984.\n\n[17] Mark Fey. A note on the Condorcet Jury Theorem with supermajority voting rules. Social Choice and Welfare, 20(1):27\u201332, 2003.\n\n[18] Brian Flynn. What\u2019s wrong with Congress? It\u2019s not big enough. https://www.cnn.com/2012/03/09/opinion/flynn-expand-congress/index.html, 2012.\n\n[19] James E. Gentle. Computational Statistics. Springer, 2009.\n\n[20] Mark Gradstein and Shmuel Nitzan. Organizational decision-making quality and the severity of the free-riding problem. Economics Letters, 23(4):335\u2013339, 1987.\n\n[21] Bernard Grofman, Guillermo Owen, and Scott L. Feld. Thirteen theorems in search of the truth. Theory and Decision, 15(3):261\u2013278, 1983.\n\n[22] Keith Humphreys. Why we might want to grow the House of Representatives by 250 more seats. https://www.washingtonpost.com/news/wonk/wp/2016/10/03/why-we-might-want-to-grow-the-house-of-representatives-by-250-more-seats/?utm_term=.2948bef5e749, 2016.\n\n[23] Anson Kahng, Simon Mackenzie, and Ariel D. Procaccia. Liquid Democracy: An Algorithmic Perspective. In Proc. 32nd AAAI Conference on Artificial Intelligence, 2018.\n\n[24] Satoshi Kanazawa. 
A brief note on a further refinement of the Condorcet Jury Theorem for heterogeneous groups. Mathematical Social Sciences, 35(1):69\u201373, 1998.\n\n[25] Drora Karotkin and Jacob Paroush. Incentive schemes for investment in human capital by members of a team of decision makers. Labour Economics, 2(1):41\u201351, 1995.\n\n[26] Drora Karotkin and Jacob Paroush. Optimum committee size: Quality-versus-quantity dilemma. Social Choice and Welfare, 20:429\u2013441, 2003.\n\n[27] Nicholas R. Miller. Information, Electorates, and Democracy: Some Extensions and Interpretations of the Condorcet Jury Theorem. In Grofman B. and Owen G., editors, Information Pooling and Group Decision Making, pages 173\u2013192. JAI Press, 1986.\n\n[28] Kaushik Mukhopadhaya. Jury Size and the Free Rider Problem. Journal of Law, Economics, and Organization, 19(1):24\u201344, 2003.\n\n[29] Shmuel Nitzan and Jacob Paroush. Investment in Human Capital and Social Self Protection under Uncertainty. International Economic Review, 21(3):547\u2013557, 1980.\n\n[30] Shmuel Nitzan and Jacob Paroush. The significance of independent decisions in uncertain dichotomous choice situations. Theory and Decision, 17(1):47\u201360, 1984.\n\n[31] Shmuel Nitzan and Jacob Paroush. Collective decision making and jury theorems. In Francesco Parisi, editor, The Oxford Handbook of Law and Economics: Volume 1: Methodology and Concepts. Oxford University Press, 2017.\n\n[32] Guillermo Owen, Bernard Grofman, and Scott L. Feld. Proving a distribution-free generalization of the Condorcet Jury Theorem. Mathematical Social Sciences, 17(1):1\u201316, 1989.\n\n[33] Jacob Paroush. Stay away from fair coins: A Condorcet jury theorem. Social Choice and Welfare, 15(1):15\u201320, 1998.\n\n[34] Jacob Paroush and D. Karotkin. Robustness of Optimal Majority Rules Over Teams with Changing Size. Social Choice and Welfare, 6(2):127\u2013138, 1989.\n\n[35] Marcus Pivato. 
Voting rules as statistical estimators. Social Choice and Welfare, 40(2):581\u2013630, 2013.\n\n[36] Ariel D. Procaccia, Sashank J. Reddi, and Nisarg Shah. A maximum likelihood approach for selecting sets of alternatives. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, 2012.\n\n[37] Luba Sapir. Generalized means of jurors\u2019 competencies and marginal changes of jury\u2019s size. Mathematical Social Sciences, 50(1):83\u2013101, 2005.\n\n[38] Peter Stone and Koji Kagotani. Optimal Committee Performance: Size versus Diversity. draft, 2013.\n\n[39] Lirong Xia. Bayesian estimators as voting rules. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pages 785\u2013794, 2016.\n\n[40] Lirong Xia and Vincent Conitzer. A maximum likelihood approach towards aggregating partial orders. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pages 446\u2013451, Barcelona, Catalonia, Spain, 2011.\n\n[41] H. Peyton Young. Condorcet\u2019s theory of voting. American Political Science Review, 82:1231\u20131244, 1988.\n", "award": [], "sourceid": 2286, "authors": [{"given_name": "Malik", "family_name": "Magdon-Ismail", "institution": "Rensselaer"}, {"given_name": "Lirong", "family_name": "Xia", "institution": "RPI"}]}