By Warren D. Smith. First Draft 16 Oct. 2008. Declaring it final 27 January 2009. Email comments to: warren.wds AT gmail.com.
Regard this paper as "Part III" of the previous The Best Rank-Order Voting System versus Range Voting. The same key definitions and notions such as "Bayesian regret" (BR) are re-used (see the glossary in that paper). Part I had mostly analysed the "random normal elections model" (RNEM), but noted that the underlying idea in its section 1 allowing us to identify "best voting systems" did not depend on the RNEM and was valid much more generally. We now attempt to break free of the limitations of the RNEM and/or of the kind of voter "strategy" considered in Parts I and II, by considering other models, e.g. "D-dimensional politics." We find:
We shall also prove that median-based range voting can outperform (exhibit lower BR than) ordinary average-based range voting in a certain contrived model involving 1-sided strategists and sharply-peaked honest utility distributions. All our computer simulations involving unbiased strategists, however, have not detected any advantage for medians (indeed, generally they show lower BRs for average-based range).
In Part I, the key property of the RNEM which allowed our analytic machinery to crank out exact formulas for Bayesian Regrets, was the massive amount of statistical independence available. As we shall explain near the start of section 2, that resource is no longer available in models of "D-dimensional politics." Hence it naively seems as though any computations of regret in D-dimensional politics models would need to be far more difficult, requiring doing integrals with far greater dimensionality. However, fortunate miracles occur.
This paper proposes two mildly-realistic-seeming kinds of D-dimensional politics models:
It appears that the optimality of, e.g., Condorcet voting systems in model A is an artifact of our assumption of exact spherical symmetry. Under perturbations, no matter how slight, away from spherical symmetry, we shall show the optimality instantly vanishes. In contrast, range's superiority over every rank-order voting system in Parts I and II is robust to small perturbations. Also, the optimality of range voting in model B is an artifact of its assumption that a "full democratic marketplace" exists.
The fact that our voting-system-optimality theorems can be regarded as artifacts of our models, constitutes a legitimate criticism of the present paper (although not of Parts I & II). As a defense, we note that the models employed nevertheless do seem somewhat realistic and uncontrived.
The two main nonobvious mathematical techniques employed are
Background: We shall assume a fairly sophisticated reader, who understands and is familiar with several standard techniques of statistical argumentation and related facts, e.g. "binomial distribution," "central limit theorem," "laws of large numbers," and Chernoff and Chebyshev "tail bounds"; since we intend to use them without providing the details. (We take a small amount of mercy on the reader by providing an abbreviated appendix A summarizing some of that.) For a reader who can just do those details in her head (or see that they will work) this is no obstacle, but a reader unfamiliar with these things will probably find some parts of our proofs mysterious. The reader should also know some linear algebra ("orthogonal matrices," "linear transformations"). We also shall assume more understanding of voting-method jargon than Part I assumed. Finally, understanding all about Fourier transforms and orthogonal polynomials would help; we shall not explicitly use either, but will use reasoning analogous to well-trod logic-trails in those areas.
At first I thought almost exactly the same theorems and proofs we devised in the RNEM would also work for certain models of issue-based politics. However, as we shall see, they do not; and most of the techniques Part I was able to throw at the problem under the RNEM, now become inapplicable.
DEFINITION of NORMAL ISSUE-BASED-POLITICS MODEL: For some fixed integer D≥1, each voter and each candidate get assigned an "issue stance vector" from a D-dimensional standard normal distribution. Then voter X has, as her election-utility for candidate Y, a decreasing function of the distance S between them, which is differentiable as a function of the locations of X and Y, and which tends sufficiently quickly to a constant when S→∞.
REMARK: One specific utility formula which I have successfully used in computer investigations is U = 1/(0.6D + S²). This was chosen to behave like a generic smooth peak [locally quadratic with height 5/(3D)] for S≈0; it is a decreasing function of |S|, and falls to a constant value (namely 0) as S→∞. However, this formula actually may not fall off fast enough as S→∞ for some of our proof techniques.
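As a concreteness check, here is a minimal Python sketch of the normal issue-based-politics model with the utility formula above. The dimension, voter/candidate counts, and RNG seed are illustrative choices of mine, not prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def utility(voter, candidate):
    """The remark's example formula U = 1/(0.6*D + S^2), where S is the
    Euclidean voter-candidate distance in D-dimensional issue space."""
    D = len(voter)
    S2 = np.sum((np.asarray(voter, dtype=float) - np.asarray(candidate, dtype=float)) ** 2)
    return 1.0 / (0.6 * D + S2)

# Sample one election under the normal issue-based-politics model:
# every voter and candidate is a standard-normal point in D dimensions.
D, V, N = 2, 1000, 5
voters = rng.standard_normal((V, D))
candidates = rng.standard_normal((N, D))
U = np.array([[utility(v, c) for c in candidates] for v in voters])
best = int(U.sum(axis=0).argmax())   # socially best candidate (max summed utility)
```

Note the peak height check: at S=0 the formula gives 1/(0.6D) = 5/(3D), as stated in the remark.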
DEFINITION of HYPERCUBE ISSUE-BASED-POLITICS MODEL: Same thing, except the issue-stance vectors have each coordinate drawn uniformly from an interval (i.e. the distribution is uniform on a D-dimensional hypercube) rather than normal.
REMARK ON TIES: In some proofs below, we shall assume that there are enough voters that perfect-tie elections can be regarded as negligibly uncommon.
REMARK ON INFINITE DIMENSIONAL LIMIT: In the D→∞ limit, both issue-based politics models become equivalent (up to scaling factors which have no real effect) to the RNEM. (Note, this kind of development of statistical independence in the large-D limit is a useful mental tool in other contexts, including some later in this paper.)
Do theorems 1, 2, 3, etc from part I still hold when transplanted into the normal and hypercube issue-based politics models? No (although for a brief shining moment I naively thought "yes"). The reason is that now voters' utilities (and hence votes too) are dependent. Consider a decision by the election system to do something beneficial for voter #1. Or voter #2. Now one might naively say "due to linearity of expectation, the expected summed-utility for both voters together, is got by maximizing expectation #1 plus expectation #2, and this is true regardless of any dependencies." That naive statement is, in fact, true. The problem is not the truth of that statement – it is that it is invalid to apply it at all. Because: The best voting system does not maximize expected summed-utility; it maximizes conditional expected summed-utility, conditioned on the votes. And that, with general dependencies, is not the same thing as the sum of utility #1 conditioned on vote #1, plus utility #2 conditioned on vote #2. (It just happens, due to total independence of everything from everything else, as well as total even-symmetry, that this equality is valid in the RNEM; but it is not valid for the D-dimensional issue-based politics models here with D finite.)
This devastates most of Part I's theorems, although the overall conceptual idea from its section 1 for identifying "best" voting systems, remains true.
Despite the total failure of our RNEM-friendly techniques (which had all relied heavily on independence) in, e.g. the hypercube and normal issues-based politics models, we still can identify an asymptotically optimum voting system in the latter model provided we restrict ourselves to the special case where the normal distribution of the voters in issue-space is spherically symmetric! The reason is simple: regret is always non-negative. We shall prove Condorcet voting methods achieve (with probability→1 in the V→∞ limit) exactly zero regret, and further infer that they have zero expected regret. Therefore, Condorcet must be optimal in the infinite-voter limit.
THEOREM 1 (Condorcet optimality): Let the V voters and N candidates each be points in a D-dimensional issue space with N≥1 and D≥1 fixed. Let the utility of each candidate for each voter be a fixed decreasing function of the Euclidean distance between them, which decreases to zero sufficiently quickly at ∞ (or it is acceptable for it to tend to any other constant, of course). Suppose the voters are distributed according to a radially-decreasing probability density spherically symmetric about some centerpoint C, with each voter chosen independently from that density (and again we need this density to decrease sufficiently quickly at ∞; it will suffice if it is a normal distribution or has compact support). But no assumption is made about the locations of the candidates aside from supposing their locations are sufficiently generic that the N distances from the candidates to C are distinct. Then in the limit V→∞:
1. A Condorcet ("beats-all") winner exists, with probability→1.
2. That Condorcet winner is the candidate closest to C, with probability→1.
3. The candidate closest to C is the maximum-summed-utility, i.e. zero-regret, candidate, with probability→1.
4. Hence every Condorcet voting method elects the zero-regret candidate with probability→1, and has expected regret tending to 0.
5. In contrast, range voting and weighted positional methods can be made to deliver "wrong winners" with probability→1.
Proof:
1 & 2. The theorem that for any centrosymmetric configuration of honest voters, when utility is a decreasing function of voter-candidate distance, a Condorcet winner always exists and is the closest candidate to the center of symmetry, apparently was first proved by Davis, DeGroot, and Hinich 1972 (their theorems 1 and 4 and corollary 2). Our probability→1 result in the V→∞ limit then follows from the "strong law of large numbers" (SLLN) from probability theory applied to the pairwise-preference counts. Each pairwise-preference count is a sum of random variables whose expectations arose from a precisely centrosymmetric density and which thus obey the DDH 1972 theorems. The SLLN causes those expectations to be approached arbitrarily closely with probability→1 in the V→∞ limit. That causes a Condorcet winner to exist and be the closest to C, yielding zero regret, with probability→1. Thanks to the Chernoff tail bound, nonzero regret occurs with probability going to zero exponentially in V.
3. Beckner 1975 (on his pages 171-172) proves that "convolution preserves the class of radial decreasing functions" in any dimension. His proof requires the functions to be sufficiently well behaved, e.g. decreasing quickly enough at ∞, that their Fourier transforms exist. Hence, the expected utility of a candidate for a random voter, is a radially-decreasing function. From this and the SLLN again, the candidate closest to C will, with probability→1 in the V→∞ limit, have the greatest (summed over the voters) utility.
4. Immediate from combining 1, 2 and 3 (at least, provided the probability→1 in those claims converges quickly enough to 1; certainly that is valid in situations where the Chernoff bound is applicable).
5. We construct 1-dimensional 3-candidate examples in this model in which both range voting and every weighted positional method deliver "wrong winners" with probability→1 in the V→∞ limit:
We also remark that in 2 dimensions with 14 randomly located candidates in a square, usually some of the candidates are found to lose under Borda, Range, IRV, Plurality, or AntiPlurality voting (which candidate, may change with the voting system) even for a Gaussian circularly symmetric voter distribution centered exactly at that candidate.
QED.
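The following Monte Carlo sketch illustrates parts 1-3 of the theorem numerically. The candidate radii, the utility formula (the one from the earlier remark), and the RNG seed are my own illustrative choices; with this many voters the sample pairwise margins sit within a fraction of a percent of their expectations, so the closest-to-C candidate emerges as the beats-all winner in essentially every run:

```python
import numpy as np

rng = np.random.default_rng(1)

def condorcet_winner(U):
    """Index of a beats-all (Condorcet) winner for utility matrix U[voter, cand], or None."""
    V, N = U.shape
    for w in range(N):
        if all(np.sum(U[:, w] > U[:, x]) > np.sum(U[:, x] > U[:, w])
               for x in range(N) if x != w):
            return w
    return None

D, V = 3, 20001
voters = rng.standard_normal((V, D))             # spherically symmetric about C = origin
# Candidates at distinct radii (generic positions, per the theorem's hypothesis)
radii = np.array([0.4, 0.8, 1.2, 1.6, 2.0, 2.4])
axes = np.eye(D)[np.arange(6) % D]
candidates = radii[:, None] * axes
dist = np.linalg.norm(voters[:, None, :] - candidates[None, :, :], axis=2)
U = 1.0 / (0.6 * D + dist ** 2)                  # decreasing function of distance
w = condorcet_winner(U)
closest = int(np.linalg.norm(candidates, axis=1).argmin())
```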
Before the reader rushes to adopt Condorcet and abandon range voting, though, let us make the following points.
THEOREM 2 (sensitivity to perturbation): If the voters are not distributed according to an exactly circularly symmetric normal density, but instead according to an ellipsoidal one with principal axes 1 and 1+ε, then no matter how small ε≠0 is, Condorcet voting will no longer be optimal and indeed range voting will have less regret (with probability→1 in the V→∞ limit). This is all provided that the locations of the N candidates and the utility function are appropriately designed.
Proof: We shall employ some sufficiently large number N of candidates distributed according to a circularly symmetric normal density outside a circular "hole" with radius A, for some small constant A with 0<A<1/2. We make this normal density have characteristic width 1/A. Then (as we saw in the preceding theorem) Condorcet will elect the candidate closest to C, which is the center of the hole, of the voter-Gaussian, and of the candidate-Gaussian, with probability→1 in the V→∞ limit. Make the utility function of voter-candidate distance S have utility=1 if 0≤S<2A, then decrease smoothly toward 0 as S increases, in fact having utility=0 if S>1/A. If N is large, then with high probability the Condorcet candidate will both exist (by Davis et al.) and lie near the inner (radius=A) circle at a uniformly random angle. It therefore will have mean regret bounded below by a positive number which depends on ε.
This particular scenario has been set up, meanwhile, to cause honest range voting to act like "honest utility voting" because (with probability→1) essentially every voter's min- and max-utility candidates will have utilities 0 and 1 respectively, so that every voter rescales her utility with the same scale factor to generate her range vote. Range voting therefore will (with probability→1 in the V→∞ limit) deliver the optimum-utility winner, i.e. achieving zero regret. QED.
THEOREM 3 (Honest approval's optimality with independent distance-thresholds): Let the V voters and N candidates each be points in a D-dimensional issue space with N≥1 and D≥1 fixed. Let the utility of each candidate for each voter be a fixed decreasing function of the Euclidean distance between them, which decreases to zero sufficiently quickly at ∞ (or it is acceptable for it to tend to any other constant, of course). Suppose the voters are distributed according to a radially-decreasing probability density spherically symmetric about some centerpoint C, with each voter chosen independently from that density (and again we need this density to decrease sufficiently quickly at ∞, but never reaching 0). Suppose the distances from the candidates to C are distinct. Suppose we employ "approval voting" where each voter approves all candidates Euclidean distance<R away from her location (and if the distance exactly equals R, then she tosses a fair coin to decide whether to approve him). Here R can be a positive constant, or each voter independently chooses R at random according to some fixed probability density on the positive reals. (Either assumption yields a valid theorem.) Then in the limit V→∞, the best (i.e. regret-minimizing) candidate will be elected with probability→1.
Proof. Employ the same Beckner theorem as in part 3 of the proof of theorem 1 to see that expected number of approvals garnered by a candidate will be a decreasing radial function of his position. Then by the strong law of large numbers the closest candidate to C (the center of symmetry of the voter distribution) will win, which as we proved before is the one with least regret – both claims holding with probability→1 in the V→∞ limit. QED.
THEOREM 4 (Optimality of iterated strategic approval voting): Same assumptions as the preceding theorem, except now we repeatedly redo the approval voting election until the winner stops changing; each voter strategically changes her distance-threshold R each election to be the distance to the winner of the preceding round. Then in the limit V→∞, the procedure will terminate after a finite number of rounds, whereupon it will elect the best (i.e. regret-minimizing) candidate – both with probability→1.
Proof. From theorem 1 we know that a Condorcet winner W exists with probability→1. Because W is pairwise-preferred over each rival X, W will, in every round after the first, get more approvals than the preceding round's winner (with probability→1). Hence the winner must keep changing from round to round until the winner is W at which point it must stay there, whereupon the process terminates. Since each round the winner changes (with probability→1) to somebody pairwise-preferred, and since we know from Davis, DeGroot, and Hinich 1972 and the strong law of large numbers that (with probability→1) no cycles exist in the pairwise-preference digraph, the process must terminate. QED.
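Here is a runnable sketch of this iterated procedure. The geometry, voter count, and seed are my own toy choices; the fair-coin rule at distance exactly R follows theorem 3's convention (which matters in round two and later, where each voter's threshold equals her distance to the incumbent winner exactly):

```python
import numpy as np

rng = np.random.default_rng(2)

def iterated_approval(dist, R0=1.0, max_rounds=50):
    """Sketch of theorem 4's procedure: each voter approves candidates nearer
    than her threshold R (fair coin at distance exactly R); after each round
    she resets R to her distance to the current winner."""
    V, N = dist.shape
    R = np.full(V, R0)
    winner = -1
    for _ in range(max_rounds):
        coin = rng.random(dist.shape) < 0.5
        approvals = ((dist < R[:, None]).sum(axis=0)
                     + ((dist == R[:, None]) & coin).sum(axis=0))
        new_winner = int(approvals.argmax())
        if new_winner == winner:          # winner stopped changing: terminate
            return winner
        winner = new_winner
        R = dist[:, winner]
    return winner

D, V = 2, 5000
voters = rng.standard_normal((V, D))      # spherically symmetric about the origin
candidates = np.array([[0.3, 0.0], [0.0, 1.0], [-1.5, 0.5], [2.0, 2.0]])
dist = np.linalg.norm(voters[:, None, :] - candidates[None, :, :], axis=2)
w = iterated_approval(dist)
closest = int(np.linalg.norm(candidates, axis=1).argmin())
```

In this configuration the process settles on the candidate closest to the symmetry center within a couple of rounds.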
REMARK. The optimality of these kinds of approval voting is also an artifact of spherical symmetry, and the exact same theorem 2 and its proof show that.
The "YN-model" is a simple model of voting (with a fair amount of realism) in which range voting performs optimally with "honest voters."
DEFINITION: There are D independent "issues" and 2^D "candidates"; each candidate announces a clear yes or no stance on each issue, and by some beneficent miracle we have a full democratic marketplace, i.e., every possible issue-stance is represented. Hence in the case D=4 we could name the 16 candidates YYYY ("yes" on all 4 issues), YYYN, YYNY,..., NNNN. We shall suppose, for each voter, the utility of electing any given candidate is just the number of issues on which that candidate agrees with the voter. Finally, we suppose without loss of generality that all D issues each would win if put to a single-issue yes/no vote, therefore the objectively best candidate is YYYY.
(The "without loss of generality" is because we could just rename the issues to make "yes" become "no" for that issue, e.g. the issue "raise taxes?" becomes "do not raise taxes," so with such canonical renaming we can always cause the best candidate to be named YYYY.)
REMARK: Most previous analyses of voting system "properties" ignore the fact that "issues" exist and candidates have stances on them. The YN model is approximately the simplest and most natural possible voting model which doesn't ignore that. It turns out that honest range voting is always optimal in the YN model and is "self consistent." But most other commonly proposed voting methods turn out to be both "self contradictory" and non-optimal (indeed they can even behave pessimally) in the YN model.
THEOREM 5 (Optimality of Range versus pessimality of various other voting systems in YN model):
Proof:
a. With honest range voters, "Y" (on the first issue) can be regarded as getting a 1-vote or a 0-vote (from those voters who support or oppose issue #1, respectively), and ditto for each other issue, as a contribution to each candidate's range score. This view is mathematically equivalent thanks to our postulate that honest range voters linearly interpolate their range votes (between the best and worst candidates, which get the max and min range votes D and 0) based on utilities. Since we have postulated that Y beats N on each issue, the candidate YYYY beats every other candidate. In other words, range voting always delivers the best possible winner in the YN model.
To put it a different way: since all range votes have the same max-score D and min-score 0 they all are compatibly scaled utilities and hence the greatest-utility candidate must be elected.
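A small sketch of this argument in code. The particular voter population below is my own example, constructed so that Y wins each of the 4 issues individually; with a full marketplace every voter's best and worst candidates (her own stance and its complement) have utilities D and 0, so her honest range vote equals her raw utility:

```python
from itertools import product

def yn_candidates(D):
    # Full democratic marketplace: all 2**D possible stances run
    return ["".join(s) for s in product("YN", repeat=D)]

def utility(voter, cand):
    # YN-model utility: number of issues on which candidate agrees with voter
    return sum(a == b for a, b in zip(voter, cand))

def range_totals(voters, D):
    """Honest range totals; with the full marketplace these equal summed utilities."""
    return {c: sum(utility(v, c) for v in voters) for c in yn_candidates(D)}

D = 4
voters = ["YYYY"] * 3 + ["YYNN"] * 2 + ["NYYY"] * 2 + ["NNNN"] * 1
totals = range_totals(voters, D)
winner = max(totals, key=totals.get)
```

Here YYYY's total, 22, equals the sum over the four issues of the number of voters saying Y on that issue (5+7+5+5), exactly as the per-issue decomposition in the proof predicts.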
b. Say 71% of the voters each have ideologies consisting of 71% Ys (all such ideologies equinumerous) while 29% have the NNN...NN ideology. Then 50.4%=71%² of the voters prefer Y on any given issue, so by majority vote, Y would win on each issue individually. But NNN..NN is the plurality-vote winner with 29% of the vote, while YYY..YY gets zero top-rank votes, and indeed every candidate besides NNN..NN gets (in the D→∞ limit) zero percent of the votes – almost the ultimate in "vote splitting." Note: throughout this proof, when we say "in the D→∞ limit" we actually mean "for every sufficiently large D."
As a very concrete example of this kind of idea with D=4 (this election data was created artificially by S.J.Brams for the purpose of solving this problem), let the plurality votes for each candidate be as follows in a 31-voter, 4-issue, 16-candidate election
candidate | votes | candidate | votes |
---|---|---|---|
YYYY | 0 | YNYY | 4 |
YYYN | 4 | YNYN | 1 |
YYNY | 4 | YNNY | 1 |
YYNN | 1 | YNNN | 1 |
NYYY | 4 | NNYY | 1 |
NYYN | 1 | NNYN | 1 |
NYNY | 1 | NNNY | 1 |
NYNN | 1 | NNNN | 5 |
Then on each issue Y beats N by 16 to 15. (For example, just look at the first letter of each candidate's name to see Y on the first issue wins by 16:15 over N.) But nevertheless YYYY gets zero votes and NNNN wins a plurality election with 5 votes.
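The Brams table is easy to verify mechanically; this snippet re-tallies it (the data is transcribed verbatim from the table above):

```python
votes = {
    "YYYY": 0, "YYYN": 4, "YYNY": 4, "YYNN": 1,
    "NYYY": 4, "NYYN": 1, "NYNY": 1, "NYNN": 1,
    "YNYY": 4, "YNYN": 1, "YNNY": 1, "YNNN": 1,
    "NNYY": 1, "NNYN": 1, "NNNY": 1, "NNNN": 5,
}
plurality_winner = max(votes, key=votes.get)
# Per-issue tally: voters' stances are their (plurality) choices, so count
# the voters whose chosen candidate says "Y" on issue i.
issue_tallies = [
    sum(n for cand, n in votes.items() if cand[i] == "Y") for i in range(4)
]
```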
c. Instant runoff (IRV) never elects a candidate with zero top-rank votes (since it always eliminates that candidate in the first round). [Also Coombs, which is like IRV except that it eliminates the candidate with the most bottom-rank votes each round, cannot elect YYYY since it eliminates it immediately.] Professional IRV-advocate Rob Richie once pronounced this an "advantage" of IRV. However, in the present example, it is clearly a disadvantage, because the clearly best candidate YYYY gets zero top-rank votes.
A fully explicit example with 83 voters, created by Abd ul-Rahman Lomax and myself, is shown in the table. The top-preference of each voter is, of course, the issue-stance of that voter and we have ranked all 16 candidates consistently with that top ranking, but breaking utility-ties somewhat arbitrarily.
#voters | Their honest vote |
---|---|
32 | YYYY>YYYN>YYNY>YNYY>NYYY>YYNN>YNYN>YNNY>NYYN>NYNY>NNYY>YNNN>NYNN>NNYN>NNNY>NNNN |
10 | YNNN>YNNY>YNYN>YYNN>NNNN>YNYY>YYNY>YYYN>NNNY>NNYN>NYNN>YYYY>NNYY>NYNY>NYYN>NYYY |
10 | NYNN>NYNY>NYYN>NNNN>YYNN>NYYY>NNNY>NNYN>YYNY>YYYN>YNNN>NNYY>YYYY>YNNY>YNYN>YNYY |
10 | NNYN>NNYY>NNNN>NYYN>YNYN>NNNY>NYYY>NYNN>YNYY>YNNN>YYYN>NYNY>YNNY>YYYY>YYNN>YYNY |
10 | NNNY>NNNN>NNYY>NYNY>YNNY>NNYN>NYNN>NYYY>YNNN>YNYY>YYNY>NYYN>YNYN>YYNN>YYYY>YYYN |
11 | NNNN>NNNY>NNYN>NYNN>YNNN>NNYY>NYNY>NYYN>YNNY>YNYN>YYNN>NYYY>YNYY>YYNY>YYYN>YYYY |
In this election Y wins on issue #1 by 42-to-41 majority (and ditto for any other issue). But NNNN is the election winner using any of these Condorcet methods {Basic Condorcet, Schulze beatpaths, Tideman Ranked Pairs, Nanson-Baldwin, Simpson-Kramer min-max} and also Bucklin and IRV.
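Two of the checkable facts in this example can be confirmed mechanically: the 42-to-41 issue majorities, and (for instance) that NNNN pairwise-trounces the objectively best candidate YYYY 51-to-32. The six ballot types below are transcribed verbatim from the table:

```python
ballots = [
    (32, "YYYY>YYYN>YYNY>YNYY>NYYY>YYNN>YNYN>YNNY>NYYN>NYNY>NNYY>YNNN>NYNN>NNYN>NNNY>NNNN"),
    (10, "YNNN>YNNY>YNYN>YYNN>NNNN>YNYY>YYNY>YYYN>NNNY>NNYN>NYNN>YYYY>NNYY>NYNY>NYYN>NYYY"),
    (10, "NYNN>NYNY>NYYN>NNNN>YYNN>NYYY>NNNY>NNYN>YYNY>YYYN>YNNN>NNYY>YYYY>YNNY>YNYN>YNYY"),
    (10, "NNYN>NNYY>NNNN>NYYN>YNYN>NNNY>NYYY>NYNN>YNYY>YNNN>YYYN>NYNY>YNNY>YYYY>YYNN>YYNY"),
    (10, "NNNY>NNNN>NNYY>NYNY>YNNY>NNYN>NYNN>NYYY>YNNN>YNYY>YYNY>NYYN>YNYN>YYNN>YYYY>YYYN"),
    (11, "NNNN>NNNY>NNYN>NYNN>YNNN>NNYY>NYNY>NYYN>YNNY>YNYN>YYNN>NYYY>YNYY>YYNY>YYYN>YYYY"),
]

def pairwise(a, b):
    # Number of voters ranking candidate a above candidate b
    total = 0
    for n, r in ballots:
        order = r.split(">")
        if order.index(a) < order.index(b):
            total += n
    return total

# Each voter's issue-stance is her top preference; tally the Y voters per issue.
issue_y = [sum(n for n, r in ballots if r.split(">")[0][i] == "Y") for i in range(4)]
```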
d. The YN model with D=6 can exhibit a Condorcet cycle among the three candidates named YYNNNN, NNYYNN, and NNNNYY. This will happen if there are three equinumerous types of voters whose ideologies are YYYNNN, NNYYYN, and YNNNYY.
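The cycle is quickly checked by computing each bloc's agreement counts (each bloc prefers its "own" candidate with utility 5, then the next with 3, then the last with 1):

```python
def agreement(voter, cand):
    # YN-model utility: number of issues on which candidate agrees with voter
    return sum(a == b for a, b in zip(voter, cand))

voters = ["YYYNNN", "NNYYYN", "YNNNYY"]   # three equinumerous blocs
cands = ["YYNNNN", "NNYYNN", "NNNNYY"]

def margin(a, b):
    """(#blocs preferring a over b) minus (#blocs preferring b over a)."""
    return sum(
        (agreement(v, a) > agreement(v, b)) - (agreement(v, a) < agreement(v, b))
        for v in voters
    )
```

Each candidate beats the next around the cycle by a 2-blocs-to-1 margin.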
e. Let 51% of the voters have ideologies which are 49% Y. The remaining 49% of the voters have ideologies which are 60% Y. (All such ideologies equinumerous.) Then Y beats N on every individual issue. Now compare a candidate with fraction-P of Ys in its name, versus a candidate with fraction-Q of Ys in its name, 0≤P<Q≤1. The former candidate is preferred arbitrarily-near-unanimously within the 51%-bloc of voters, hence by an overall majority, if D is sufficiently large. Consequently the worst candidate NNN...NN is majority-preferred over every rival, even though Y wins every issue individually.
This idea also will usually cause IRV to elect the worst candidate NNN...NN if D is large enough, and we throw in enough NNN...NN voters to make sure that NNN...NN does not get eliminated in an early round for having zero top-rank votes, and we randomly pre-perturb the vote-counts slightly to break ties.
f. As a wholly-explicit Borda example with D=4, take 21 YYYY voters versus 10 each for YNNN, NYNN, NNYN, NNNY (and use Borda score expectation value among all random honest orderings of the candidates); then the unique Borda winner is NNNN even though any individual issue is won by Y by a 31-to-30 majority.
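This Borda example can be verified directly. The expected Borda score over random honest orderings gives each voter's contribution as (#candidates with strictly lower utility) plus half the number of utility-ties:

```python
from itertools import product

cands = ["".join(s) for s in product("YN", repeat=4)]
profile = {"YYYY": 21, "YNNN": 10, "NYNN": 10, "NNYN": 10, "NNNY": 10}

def util(voter, cand):
    # YN-model utility: number of agreed-upon issues
    return sum(a == b for a, b in zip(voter, cand))

def borda_score(cand):
    """Expected Borda score over all honest orderings consistent with utilities:
    each voter contributes (#strictly worse candidates) + (#tied others)/2."""
    total = 0.0
    for voter, n in profile.items():
        u = util(voter, cand)
        lower = sum(util(voter, c) < u for c in cands)
        ties = sum(util(voter, c) == u for c in cands) - 1   # exclude cand itself
        total += n * (lower + ties / 2)
    return total

scores = {c: borda_score(c) for c in cands}
winner = max(scores, key=scores.get)
```

NNNN scores 500 (0 from the YYYY bloc, 12.5 from each of the 40 single-Y voters) against 415 for YYYY.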
g. If 70% of the voters each have ideologies consisting of 70% Ns (all such ideologies equinumerous) while 30% have the YYY...YY ideology, then the worst candidate NNN...NN is majority-preferred over the best candidate YYY...YY, and NNN..NN is the Approval Voting winner (where each voter approves the candidates disagreeing with her on at most a fraction F of the issues) if D is large enough, for any fixed F with 0<F<100%. Note 49%=70%² of the voters prefer N on any given issue, so by 51-49 majority vote, Y would win on each issue individually.
More-concrete examples: The Brams D=4 example for plurality voting is also valid for approval voting if the voters approve only their top choice, i.e. the top 1/16 of the candidates. For that example Abd ul-Rahman Lomax also notes that voters who approve only their favorite if he is a "frontrunner," but otherwise approve their favorite and each successively-worse candidate until reaching a "frontrunner"... will sadly still elect NNNN if we regard "frontrunners" as those with 4 or more plurality votes.
Further, in the 61-voter concrete D=4 example in item "f", if voters approve their top 5 candidates, then NNNN wins with 40 approvals; no rival gets more than 30.
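The approve-top-5 tally is also mechanically checkable. For each voter type the top 5 candidates are exactly her own stance (utility 4) plus the four candidates at Hamming distance 1 (utility 3), so the slice below is unambiguous despite the sort's tie-breaking:

```python
from itertools import product

cands = ["".join(s) for s in product("YN", repeat=4)]
profile = {"YYYY": 21, "YNNN": 10, "NYNN": 10, "NNYN": 10, "NNNY": 10}

def util(voter, cand):
    return sum(a == b for a, b in zip(voter, cand))

def approvals_top5():
    counts = {c: 0 for c in cands}
    for voter, n in profile.items():
        # Top 5 = own stance (utility 4) + the 4 utility-3 neighbors
        top5 = sorted(cands, key=lambda c: -util(voter, c))[:5]
        for c in top5:
            counts[c] += n
    return counts

counts = approvals_top5()
```

NNNN is approved by all four 10-voter blocs (utility 3 for each of them) for 40 approvals, while no rival exceeds 30.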
Finally we note that range voting can elect NNNN if the voters are strategic exaggerators rather than "honest," i.e. if we assume they vote in "approval style." Also, if voters have different strengths of feeling, e.g. if the NNNN voters were extra passionate (this differs from the utilities assumed in the theorem statement), then range can elect NNNN, albeit in that case this might be "good behavior," not a "malfunction."
QED.
REMARK: The perfect behavior of range voting in the YN model can be regarded as an artifact of the YN model's simplistic assumption that all voters have equal "strength of feeling," combined with the assumption of a "full democratic marketplace."
Incidentally, in the contemporary USA the plurality voting system has engendered "two-party domination" (via "Duverger's law," see, e.g., Riker 1982) in which there are only 2 politically-feasible issue-stance vectors, not 2^D. E.g., in 2006 USA, a candidate who is against "gay rights" will also nearly-automatically be against abortion rights, for lower taxes for the rich, and for the Iraq War, even though, logically speaking, there seems little connection between these.
Theorem 5 totally settles the worst-case behavior of the most-common voting system proposals in the YN model. However, these pathologies all may have little practical interest because
To dodge the latter criticism, we now examine the YN model for random voters.
DEFINITION: In the RANDOM-VOTER YN MODEL every possible issue-stance (from among the 2^D) is equally likely and we pick some large number V (far exceeding the number 2^D of candidates) of such random voters. [This contrasts with the original YN model, where the voters could be distributed in a nonrandom, indeed adversarially chosen, manner.] Again we assume a full democratic marketplace: all 2^D possible candidates exist. We then canonize the issues by reversing some of their signs so that for each issue considered alone, "Y" is the majority winner. (This canonical-renaming is purely for convenience and has no genuine effect.)
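A minimal sketch of generating such an electorate and performing the canonizing renaming (D, V, and the seed are arbitrary illustrative choices; bit 1 stands for "Y"):

```python
import numpy as np

rng = np.random.default_rng(3)
D, V = 6, 4000                      # V far exceeds the 2**D = 64 candidates

# Uniformly random voter stances; the candidates are all 2**D binary words.
stances = rng.integers(0, 2, size=(V, D))
# Canonize: flip any issue on which "N" (0) holds a strict majority, so that
# afterwards "Y" (1) wins every single-issue vote.
flip = stances.sum(axis=0) * 2 < V
stances[:, flip] ^= 1
```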
THEOREM 6 (BRs in random YN model): In the random YN model, call a candidate "bad" if he has over 50%, and "regrettable" if he has more than a constant positive fraction, of Ns in his name.
Proof sketches:
a. Same proof as in the preceding theorem.
b. Consider the plurality vote-counts of the 2^D candidates. Specifically (using Chernoff), the gap in vote-count between the winningest and losingest candidates will with high probability be
c. We use the same argument as in (b), altered appropriately to make it work for approval voting where each voter approves candidates disagreeing with her on at most a fraction F of the issues. The number of candidates approved by each voter is, for any fixed F with 0≤F<50%, exponentially tiny compared to the number 2^D of candidates (for D large). To be precise, the total number of approved candidates is
∑_{0≤k≤FD} binomial(D,k) ≤ 2^(D·H(F)),  where H(F) = -F·log₂F - (1-F)·log₂(1-F),
which, while large, is exponentially tiny compared to 2^D, i.e. a random candidate has a tiny chance 2^(-DA) of being approved by a random voter, where A = 1-H(F) > 0. The expected number of approvals got by any particular candidate then is about V·2^(-DA).
The variance in this number (easily computed since all voter-locations were independent) then is essentially the same as the number itself and by the Chernoff bound it has exponentially-small tails. Further, the correlation between the approval counts for two candidates further than F issues apart is essentially nil. We can conclude from this that if we obliterate enough voters to stop the approval-winner winning while adding enough to make an arbitrary far-away loser become the winner, we will with high probability need to mess with O(V^(1/2)·A^(1/2)·D^(1/2)·2^(-D/2)) voters. And as we argued before, this will with high probability in the D→∞ and V/2^D→∞ limits not be enough to affect the canonizing renaming.
In both (b) and (c), note that the argument about removing voters for W while adding voters for L works for any set of voters {supporting L and not W} of that cardinality. The same argument could then be re-used to go backwards, and this works for any L. The point of those remarks is their complete symmetry: the canonizing renaming yields essentially no constraint on who wins.

d & e. These are going to be consequences of a more general theory. Range voting in the random YN model (which is optimal) is equivalent to a certain weighted positional voting system. Specifically, the top weight is D, the next D weights are D-1, the next binomial(D,2) weights are each D-2,..., the next binomial(D,k) weights are each D-k,..., and the last (2^D-th) weight is 0. But Borda and "approve the top half of the candidates" both are weighted positional systems with quite different, and non-optimal, weights. Consider the candidates and voters as D-bit binary words (defining their issue-stances). With range voting the range-vote score of a candidate is a linear function of the candidate-defining binary vector. With some other voting system we instead get a nonlinear function.
Any such function can be thought of as the sum of its best (under the L2 norm) linear approximation, plus nonlinear terms. The best linear approximation of the Borda or 50%-approval weights is (by symmetry) the same as the range-voting weight function (up to a scaling factor which is irrelevant). The nonlinear terms are therefore the sole reason they fail to deliver always-optimum results.
In fact, in the jargon of the "Boolean Fourier transform" (BFT, see appendix B), the vote-totals for the candidates are, as a function of the D-bit candidate-defining word, precisely given by the convolution of the weight-function defining that voting system, with the voter-population-function defining how many voters of type T there are, for each D-bit word T.
Because of theory in appendix B, the best linear approximation is given by the 1+D constant-and-linear terms in the Boolean Fourier series, while the remaining 2^D-D-1 terms, all of which are orthogonal, are nonlinear.
Now the convolution of two functions arises from multiplying their BFTs. By the central limit theorem the voter-population-function just becomes a vector of 2^D identically distributed independent random-normal deviates in the large-V/2^D limit. The same is true of its Boolean Fourier transform because the BFT is an orthogonal linear transformation in 2^D-dimensional space. So when we multiply this by the BFT of the voting system's weight function, each of the weight-function's BFT coefficients gets multiplied by an identically distributed Gaussian random deviate, hence their relative magnitudes do not change in any systematic way.
Our idea is that if the sum-of-squares of the nonlinear Fourier coefficients is large enough compared to the sum-of-squares for the D nonconstant linear coefficients, then it becomes improbable that the voting system will deliver an optimum winner or one agreeing with it on more than a constant fraction of issues.
The function of bit-sum that arises as the weight function for 50%-approval voting is a step function, while for Borda voting it resembles the "error function" rescaled so that its rise has characteristic width of order D^(1/2). For either, we can see immediately from Parseval's equality that in the large-D limit, the L2 norm of the nonlinear terms vastly swamps that of the linear terms in the weight function's Fourier series, by, indeed, exponential(D) factors.
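The Fourier bookkeeping above can be sketched numerically for a modest D. This snippet (my own illustration; it does not attempt the asymptotic comparison, only the degree-1 projection and Parseval accounting) extracts the D degree-1 coefficients of the 50%-approval step weight; by symmetry they all come out equal, confirming that the best linear approximation is proportional to the range-voting weight function:

```python
import numpy as np
from itertools import product

D = 10
points = np.array(list(product([0, 1], repeat=D)))   # all 2**D binary words
# Weight function of "approve the top half": 1 iff the word has >= D/2 one-bits
w = (points.sum(axis=1) >= D / 2).astype(float)

signs = 1 - 2 * points                # parity characters: bit 0 -> +1, bit 1 -> -1
linear_coefs = signs.T @ w / len(points)   # the D degree-1 Fourier coefficients
mean = w.mean()                            # the degree-0 (constant) coefficient
total_energy = np.mean(w ** 2) - mean ** 2  # Parseval: all non-constant energy
linear_energy = np.sum(linear_coefs ** 2)
nonlinear_energy = total_energy - linear_energy   # >= 0 by Bessel's inequality
```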
The result, in the V/2^D→∞ limit where the BFT is just taking an orthogonal transformation of normal randoms, is that the nonlinear terms in the Fourier series of the vote-total function of candidate-defining-binary-word Q, all added together and evaluated at all 2^D locations, form a 2^D-dimensional vector of identically distributed independent random normals conditioned on having sum=0 and exactly zero correlation with each bit of Q. The amplitude of each of these normals is within a constant factor of the maximum of the linear function arising from the linear terms. In view of this, with probability→1 the resulting normal randoms will include some whose magnitudes exceed the maximum of the linear part by an arbitrarily great factor.
Knowing that, one finally can argue that, for a sufficiently small but fixed ε>0, the probability→1 that the maximum vote will be for a candidate with at least a fraction ε of "Ns" in his name. QED.
CONJECTURE: With probability→1, no Condorcet winner exists in the random-voter YN model in the limit where D is large and V/2^D→∞; in that case the worth of Condorcet voting methods would seem to depend entirely on the worth of their "backup" methods intended to provide a winner when a "beats-all" winner does not exist.
M.Balinski & R.Laraki (2008) have recently advocated (essentially) range voting but with the winner chosen to be the candidate with the greatest median score. (The usual form of range voting elects the one with greatest average score.) Since it might be expected to be common (in the presence of a nonzero fraction of strategic voters) for two candidates to be tied for the greatest median score, B&L also advocate a specific tie-breaking method.
In 1999-2000 and since, I have done many computer simulations. In those simulations, median-based range voting never exhibited smaller BR than average-based range voting, for any modeling assumptions tried and any mixture of honest and strategic voters. Also, median-based range voting fails many logical properties which average-based range obeys; for discussion see /MedianVrange.html. To my mind, this had seemed to show clearly the inferiority of median. (Incidentally, we should remark that median- and average-based range are the endpoints of the continuum family of "trimmed mean" range voting methods, in which some specified fraction T of "outlier" scores are discarded for each candidate and the mean then taken of the rest. With T→100% from below we get median, while with T→0+ we get average-based range.)
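To make the trimmed-mean continuum concrete, here is a minimal sketch (illustrative scores only; symmetric trimming, and taking the upper middle element as the median of an even-length list, are conventions chosen here, not dictated by the text):

```python
def trimmed_mean(scores, T):
    """Mean after symmetrically discarding a fraction T of outlier scores.
    T = 0 gives ordinary average-based range; T -> 100% approaches median."""
    s = sorted(scores)
    k = int(len(s) * T / 2)                      # scores clipped from each end
    kept = s[k:len(s) - k] if 2 * k < len(s) else [s[len(s) // 2]]
    return sum(kept) / len(kept)

# A bloc of strategists max-scores candidate A; all scores on a 0-10 scale.
A = [5] * 6 + [10] * 5        # honest consensus ~5, plus 5 strategic 10s
B = [6] * 11                  # honest consensus 6, no strategists
print(trimmed_mean(A, 0.0), trimmed_mean(B, 0.0))  # ~7.27 vs 6.0: average elects A
print(trimmed_mean(A, 1.0), trimmed_mean(B, 1.0))  # 5.0 vs 6.0: median elects B
```

The same ballots thus produce opposite winners at the two ends of the T continuum, which is the behavior exploited by the theorem below.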
However, in 2009 Balinski was quoted by Wall Street Journal reporter Carl Bialik as stating that average-based range voting was "a ridiculous method" because it can be manipulated by strategic voters.
To understand what Balinski could have meant by that, I suggest reading his paper. My personal view is that Balinski & Laraki's "optimality" and "vulnerability to strategy" definitions were contrived – with other (but reasonable) definitions, their theorems would fail. And despite their "optimality" result concerning median-based range voting's vulnerability to strategy, typically about 50% of voters have incentive to exaggerate maximally their votes on the two frontrunners, as opposed to casting honest votes for them. (For example, a voter whose honest opinions of A and B are both sub-median finds it advantageous to dishonestly score one max and the other min.) This assumes each voter possesses excellent information, i.e. knows to high accuracy what the median scores are going to be; under the analogous assumption, 100% of average-based range voters would find it strategic to thus-exaggerate. With less-perfect information, however, 100% of median-based range voters would find it advantageous to thus-strategize (thus-strategizing does not hurt, but can help, a voter). So I fail to see much if any advantage, in realistic political-election practice, for median-based range voting.
However, as we shall now show, under a model perhaps approximating the situation for judging figure-skating events (e.g. the Olympics), median-based or trimmed-mean-based range voting can outperform ordinary average-based range voting. Here "outperform" means "has lower Bayesian regret than," of course. Here is the model.
(This seems unrealistic for most political elections, where utilities tend to be distributed more broadly or exhibit multimodal distributions. For sample range-voting polls in USA political elections see /RangePolls.html. Indeed, Balinski & Laraki's own exit-poll range-voting study in Orsay France 2007, see /OrsayTable.html, also failed either to find sharp-peakedness or to exhibit any visible difference in strategic voting behavior in median- versus in average-based range voting elections. But this assumption might be realistic for figure skating judging.)
(This again seems unrealistic for most political elections. Indeed, in the 2004 USA presidential election, an exit-poll study I co-authored found no evidence that Bush-supporters were any more or less strategic than Kerry-supporters. All my computer simulations alluded to above had assumed a voter's strategicness was independent of her politics, so this biased-strategy assumption seems essential to justify the theorem below. Again, this assumption seems plausibly realistic in the case of figure-skating judging.)
THEOREM 7 (Median's superiority under this model): In this sharp-peaked biased-strategy model with V→∞ voters, median-based range voting (and also trimmed-mean range voting with any sufficiently large fraction of outliers trimmed before computing the mean) delivers regret≤2ε. However, average-based range voting can deliver regret of order S/2 times full range.
The proof is trivial.
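Although the proof is trivial, the phenomenon is easy to exhibit numerically. Below is a Monte-Carlo sketch under one plausible reading of the model's ingredients (the function simulate, its parameters, and the particular quality numbers are all our own illustrative choices, not the paper's): honest voters score each candidate within ±ε of its true quality, while a fraction S of one-sided strategists max-score their favorite and min-score everyone else.

```python
import random

def simulate(V=1001, S=0.3, eps=0.05, seed=1):
    # Hypothetical reading of the model: candidate qualities q[c] on a 0..1
    # scale; honest voters score each candidate within +-eps of its quality;
    # a fraction S of voters are one-sided strategists who score candidate 0
    # at the maximum and every other candidate at the minimum.
    random.seed(seed)
    q = [0.4, 0.6]                        # candidate 1 is truly best
    ballots = []
    for v in range(V):
        if v < S * V:                     # the one-sided strategic bloc
            ballots.append([1.0, 0.0])
        else:                             # honest, sharply-peaked scores
            ballots.append([min(1.0, max(0.0, qc + random.uniform(-eps, eps)))
                            for qc in q])
    mean = lambda xs: sum(xs) / len(xs)
    median = lambda xs: sorted(xs)[len(xs) // 2]
    def winner(agg):
        totals = [agg([b[c] for b in ballots]) for c in range(len(q))]
        return max(range(len(q)), key=totals.__getitem__)
    return winner(mean), winner(median)

avg_winner, med_winner = simulate()
print(avg_winner, med_winner)  # average elects the strategists' pet; median resists
```

With a strategic bloc below 50%, the median of each candidate's scores still falls among the sharply-peaked honest scores, so median-based range keeps regret of order ε while average-based range is dragged by roughly S times the score range.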
THEOREM 8: In the YN model, the answer is "no": there exists a rank-order voting method which also always delivers zero regret.
PROOF: It is not one of the usual rank-order methods you can find in political science papers/books, although it is in the well-known "weighted positional" subclass of rank-order voting methods. Here is the method. Consider the (D+1)th row (which has D+1 entries with sum 2^D) of Pascal's triangle:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
Each voter produces an honest rank-ordering of the C=2^D candidates. If D=6 (say), then our voting system awards the first candidate in the ordering 6 "points", the next 6 candidates get 5 points each, the next 15 get 4 each, the next 20 get 3 each, the next 15 get 2 each, the next 6 get 1 each, and finally the last candidate in that voter's ordering gets zero points. (This is based on the 7th row 1,6,15,20,15,6,1 of the triangle; it should be obvious how to generalize this to any D≠6.) The "winner" is the candidate with the most points. The reason this works is that, in the YN model (more precisely, its version with "unweighted issues"), this rank-order voting system happens to be equivalent (for honest voters) to score voting, which we already established was a perfect voting system in this model (regret=0 always).
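The construction can be checked mechanically. The sketch below (for D=3, with our own helper names) builds the Pascal-row weights and verifies that, for honest voters in the unweighted-issues YN model, the weighted-positional totals coincide with the score-voting totals:

```python
from itertools import product
from math import comb
import random

D = 3
cands = list(product([0, 1], repeat=D))      # C = 2^D candidates, one per binary word

# Pascal-row positional weights: the top candidate gets D points, the next
# C(D,1) get D-1 points each, the next C(D,2) get D-2 each, ..., the last gets 0.
weights = []
for i in range(D + 1):
    weights += [D - i] * comb(D, i)

def utility(voter, cand):
    # Unweighted-issues YN model: utility = number of issues agreed upon.
    return sum(v == c for v, c in zip(voter, cand))

random.seed(2)
voters = [tuple(random.randint(0, 1) for _ in range(D)) for _ in range(50)]

score_totals = [sum(utility(v, c) for v in voters) for c in cands]
positional_totals = [0] * len(cands)
for v in voters:
    ranking = sorted(range(len(cands)), key=lambda i: -utility(v, cands[i]))
    for pos, i in enumerate(ranking):
        positional_totals[i] += weights[pos]

print(score_totals == positional_totals)  # True: Pascal-weights voting = score voting
```

The equality holds for every tie-breaking order within a block of equally-ranked candidates, because the weights are constant on each block: exactly C(D,i) candidates agree with a given voter on D−i issues, and each of them receives D−i points, i.e. exactly its utility.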
The answer still is "no" for the more general "YN model with arbitrarily-weighted issues" (for any particular fixed choice of the D issue-weights), by an analogous but more complicated construction. The point is that although the number V of voters can vastly exceed the number C=2^D of candidates, that can only happen if many voters are identical, since there are at most 2^D different kinds of voters. And the 2^D allowed voter-types all are equivalent under one of the 2^D symmetries of the D-dimensional hyper-rectangle. Consequently for any particular voter-type, if she uses honest range voting, then re-orders the 2^D scores on her ballot to sort them from greatest to least, the result always is the same 2^D-dimensional vector, regardless of the voter – and this vector gives the 2^D weights defining our "weighted positional" rank-order voting system, which cause it (in that YN model) to be equivalent for honest voters to range voting. Q.E.D.
That theorem will disappoint readers who wanted to see range voting proven superior to every rank-order-ballot voting system. However, here is a modification of the YN model that likely can accomplish that:
DEFINITION of the "YN plus jokers" model: In the YN model, add J additional candidates, called "jokers," to the list – so that there are C=2^D+J candidates in all. Jokers have utilities (for each voter) that are almost-arbitrary real numbers; hence in a V-voter election there will be JV additional utility numbers. The only constraints the model imposes on these numbers are:
THEOREM 9: In the YN model with added jokers, honest score voting still remains an optimal voting method (always delivers exactly zero regret). But no weighted-positional rank-order system can make that claim. Indeed, with a bit of evilly-clever design of the VJ joker voter-candidate utilities, we can (with random voter issue-preference vectors) systematically bias things to make the voters almost certainly elect any candidate the evil designer chooses (in an appropriate limit involving C, J, and V all going to ∞).
PROOF SKETCH: The point is that there are only 2^D+J weights, but there are VJ+D numbers defining the utilities, and the latter can be made far greater than the former by making V and J large enough. So with jokers there simply are not enough degrees of freedom available to enable any weighted-positional voting system always to compute effectively the true society-wide utility of electing each candidate. (Versus: honest-voter range voting still can, and will.) Furthermore, it is not hard to see that it is not possible for any weighted-positional system even to approximate them well enough to always be able to know the best. To show this, one sets up scenarios where having weights that get some cases right forces getting others wrong – the proof can then be automated, since it just comes down to showing a linear program has no solution, a feat any "simplex method" computer package will perform. With randomized voters in the original (no-jokers) election, we can evilly and nonrandomly design the joker utilities to systematically bias the weighted-positional voting system in favor of whatever candidate X we want – simply make the jokers have utilities (in the view of each voter) mostly slightly below X's. It is easy to see this necessarily creates large-enough statistical biases to ensure X wins with probability→1 in appropriate limits. Q.E.D.
CONJECTURE: The preceding theorem is true not only for the "weighted positional" subclass of rank-order-ballot election methods, but in fact for all rank-order methods.
I thank Abd ul-Rahman Lomax for contributing some helpful computations (noted in the text).
Noga Alon & Joel Spencer: The Probabilistic Method, 3rd ed., J. Wiley 2008.
Michel Balinski & Rida Laraki: A theory of measuring, electing, and ranking, Proc. National Acad. Sci. USA 104,21 (May 2007) 8720-8725.
William Beckner: Inequalities in Fourier Analysis, Annals of Maths 102 (1975) 159-182.
Steven J. Brams: Mathematics and Democracy, Princeton Univ. Press 2008.
Patrick Billingsley: Probability and Measure, 3rd ed., Wiley 1995.
Otto A. Davis, Morris H. DeGroot, Melvin J. Hinich: Social Preference Orderings and Majority Rule, Econometrica 40,1 (Jan. 1972) 147-157.
William Feller: An Introduction to Probability Theory and Its Applications, 2 vols., Wiley 1968.
T. Hagerup & C. Rüb: A guided tour of Chernoff bounds, Information Processing Letters 33,6 (1990) 305-308.
E. J. Gumbel: Statistics of Extremes, Columbia University Press 1958.
Hannu J. Nurmi: Comparing Voting Systems, Kluwer 1987.
William H. Riker: The Two-party System and Duverger's Law: An Essay on the History of Political Science, Amer. Polit. Sci. Review 76 (December 1982) 753-766.
Sheldon M. Ross: Probability Models for Computer Science, Academic Press 2002. (Chapter 3 is a low-level introduction to tail bounds.)
Warren D. Smith, Jacqueline N. Quintal, Douglas S. Greene: What if the 2004 US presidential election had been held using Range or Approval voting?, paper #82 at /WarrenSmithPages/homepage/works.html.
Nicolaus Tideman: Collective Decisions and Voting: The Potential for Public Choice, Ashgate 2006.
Central limit theorem: the sum of N identically distributed independent random variables, each with finite mean and variance, when rescaled to have unit variance and translated to have zero mean, has a probability CDF which, in the limit N→∞, pointwise approaches the standard normal CDF.
Chernoff bound: If the summands X have finite expectation of exp(kX) for every real constant k, then the rescaled sum has tail probability bounded by an exponentially-declining function of the number of standard deviations we are away from the mean.
Chebyshev bound: If X is a random variable with finite mean μ and variance σ^2, then Prob(|X−μ| ≥ kσ) ≤ 1/k^2 for every k>0.
Strong law of large numbers: The mean of N independent identically distributed random variables, each with finite expectation, converges with probability 1 (as N→∞) to the expected value of each summand; in particular, for any ε>0, no matter how small, the probability that the mean lies within ±ε of that expectation tends to 1.
We define and recount known results about (what some people call) the Boolean Fourier transform. (This has also been called the "Walsh transform," the "Hadamard transform," the "Hadamard-Walsh transform," and "expansion into Boolean polynomials.")
Let F(x) be a function mapping binary n-bit words to real numbers. It is always possible to write F(x) as a polynomial in the n bits x_1, x_2, …, x_n. Because b^2 = b if b is a bit (and this would also be true for {-1,+1} bits as well as our convention of {0,1} bits) we may regard the polynomial as multilinear and with degree ≤ n, and with at most 2^n terms. One way to write this is

   F(x) = Σ_u f(u) · ∏_{j: u_j=1} x_j     (the sum running over all 2^n binary n-bit words u)

and then F(x) is the "Walsh transform" of f(u).
If we instead were using {-1,+1} bits then we could replace our {0,1}-valued monomials ∏_{j: u_j=1} x_j by their {-1,+1}-valued counterparts; those 2^n monomials form a mutually orthogonal set of "characters," and in that convention the (suitably rescaled) transform becomes an orthogonal map.
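As a concrete illustration of the transform in that {-1,+1}-character convention (where the transform matrix is the 2^n-by-2^n Hadamard matrix, orthogonal after rescaling by 2^(-n/2)), here is the standard fast "butterfly" implementation, a sketch with made-up data:

```python
def walsh_hadamard(a):
    # Unnormalized fast Walsh-Hadamard transform ("butterfly") of a list
    # whose length is a power of 2. Applying it twice multiplies every entry
    # by len(a); dividing by sqrt(len(a)) would make it orthogonal.
    a, h = list(a), 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

F = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # a function on {0,1}^3
f = walsh_hadamard(F)
# Parseval (orthogonality, up to the factor 2^n):
assert sum(x * x for x in F) == sum(x * x for x in f) / len(F)
# Involution: transforming twice recovers 2^n times the original.
assert walsh_hadamard(f) == [len(F) * x for x in F]
```

The butterfly runs in O(n·2^n) arithmetic operations, versus O(4^n) for naive matrix multiplication.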
Like the usual (1-dimensional real) Fourier transform, the Walsh transform enjoys a convolution theorem. Define the convolution (f∗g)(w) = Σ_u f(u) g(u⊕w), where ⊕ denotes bitwise XOR and the sum runs over all binary n-bit words u. Then the Walsh transform of f∗g is FG.
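The convolution theorem is easy to verify numerically; below is a self-contained sketch (made-up data) checking that the unnormalized Walsh-Hadamard transform turns XOR-convolution into pointwise multiplication:

```python
def walsh(a):
    # Unnormalized fast Walsh-Hadamard transform of a list of length 2^n.
    a, h = list(a), 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def xor_convolve(f, g):
    # Dyadic convolution: (f*g)(w) = sum_u f(u) * g(u XOR w).
    return [sum(f[u] * g[u ^ w] for u in range(len(f))) for w in range(len(f))]

f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.0, 3.0, 1.0]
# The Walsh transform of f*g equals the pointwise product of the transforms:
assert walsh(xor_convolve(f, g)) == [x * y for x, y in zip(walsh(f), walsh(g))]
```

This is the identity invoked in section 2's argument, where multiplying the voter-population function's BFT by the weight function's BFT corresponds to convolving the two functions.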
The uniquely best (in the L2 norm) linear approximation of a Boolean function F(x) is obtained by employing only the constant and linear terms in the sum, i.e. those arising from the n-bit words u with at most a single nonzero bit. By Parseval's equality the summed-squared-error in this approximation is just the sum of the squares of all the other 2^n − n − 1 Fourier coefficients f(u).
Functions of bit-sum: If a function f of binary n-bit words depends only on the bit-sum s of that word (in which case it can alternatively be viewed as an ordinary 1-dimensional function) then its BFT also does, and its best linear approximation also does. [This inspires a common abuse of notation in which we write "f(s)" instead of f(u).]
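A small check of this symmetry claim (a sketch using the unnormalized Walsh-Hadamard transform, in the character convention, on an F that depends only on the bit-sum):

```python
from collections import defaultdict

def walsh(a):
    # Unnormalized fast Walsh-Hadamard transform of a list of length 2^n.
    a, h = list(a), 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

n = 4
F = [bin(x).count("1") ** 2 for x in range(2 ** n)]   # depends only on the bit-sum
coeffs_by_level = defaultdict(set)
for u, c in enumerate(walsh(F)):
    coeffs_by_level[bin(u).count("1")].add(c)
# Every index word u with the same bit-sum gets the same transform coefficient:
assert all(len(vals) == 1 for vals in coeffs_by_level.values())
```

The reason is that any permutation of the n bit positions leaves F fixed, so it must also leave F's transform fixed, forcing the coefficients to be constant on each bit-sum level.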