Best voting systems in D-dimensional politics models

By Warren D. Smith. First Draft 16 Oct. 2008. Declaring it final 27 January 2009. Email comments to: warren.wds AT gmail.com.

Abstract

Regard this paper as "Part III" of the previous The Best Rank-Order Voting System versus Range Voting. The same key definitions and notions such as "Bayesian regret" (BR) are re-used (see the glossary in that paper). Part I had mostly analysed the "random normal elections model" (RNEM), but noted that the underlying idea in its section 1 allowing us to identify "best voting systems" did not depend on the RNEM and was valid much more generally. We now attempt to break free of the limitations of the RNEM and/or of the kind of voter "strategy" considered in Parts I and II, by considering other models, e.g. "D-dimensional politics." We find:

In "D-dimensional issue-based politics models" instead of the RNEM, most of Part I's theoretical techniques now fail because they had relied on the massive amount of statistical independence inside the RNEM. But nevertheless we are able in a very particular D-dimensional model involving exact spherical symmetry, to prove that Condorcet voting methods are optimum in the limit of an infinite number of wholy-honest voters, in the sense that they achieve asymptotically zero regret with probability→1. (Meanwhile, range voting and every weighted positional system have positive regret.)
In the same spherically-symmetric model, certain kinds of honest and strategic approval voting also are asymptotically optimum. But the optimality of Condorcet and these flavors of approval voting both are artifacts of exact spherical symmetry destroyed by even slight asphericity.
A different D-dimensional politics model, this time with binary (yes/no) rather than continuum issues, is advanced, the "YN model." We show that range voting always delivers the max-utility winner in this model, i.e. always has zero regret for any number V of honest voters. In contrast we show that many other voting systems such as approval, Borda, instant runoff, and Condorcet can exhibit horrible pathologies in contrived sitations inside this model. More importantly, many of them exhibit significantly positive regret in more realistic randomized settings.

We shall also prove that median-based range voting can outperform (exhibit lower BR than) ordinary average-based range voting in a certain contrived model involving 1-sided strategists and sharply-peaked honest utility distributions. All our computer simulations involving unbiased strategists, however, have not detected any advantage for medians (indeed, generally they show lower BRs for average-based range).

1. Core ideas, overview, and plan

In Part I, the key property of the RNEM which allowed our analytic machinery to crank out exact formulas for Bayesian Regrets, was the massive amount of statistical independence available. As we shall explain near the start of section 2, that resource is no longer available in models of "D-dimensional politics." Hence it naively seems as though any computations of regret in D-dimensional politics models would need to be far more difficult, requiring doing integrals with far greater dimensionality. However, fortunate miracles occur.

This paper proposes two mildly-realistic-seeming kinds of D-dimensional politics models:

Continuum issue space; candidate-utilities based on the Euclidean voter-to-candidate distance; V voters distributed according to a spherically symmetric normal density in the V→∞ limit. (See section 2 for precise definitions.)
Binary yes/no issues; utilities based on the number of issues on which the voter and candidate agree; V voters; full "democratic marketplace" i.e. a candidate exists with every possible issue-stance. (See section 3 for precise definitions.)

In both models, miraculously we are able to show that certain already-known voting methods (with probability→1 in the V→∞ limit for model A) have zero regret. Because regret by definition is nonnegative, those voting methods are optimal in this limit. Meanwhile we can show that various other methods have positive regret hence are non-optimal.

It appears that the optimality of, e.g, Condorcet voting systems in model A is an artifact of our assumption of exact spherical symmetry. Under perturbations, no matter how slight, away from spherical symmetry, we shall show the optimality instantly vanishes. In contrast, range's superiority over every rank-order voting system in Part I and Part II, is robust to small perturbations. Also, the optimality of range voting in model B is an artifact of its assumption that a "full democratic marketplace" exists.

The fact that our voting-system-optimality theorems can be regarded as artifacts of our models, constitutes a legitimate criticism of the present paper (although not of Parts I & II). As a defense, we note that the models employed nevertheless do seem somewhat realistic and uncontrived.

The two main nonobvious mathematical techniques employed are

What Erdös calls "the probabilistic method in combinatorics";
The "Boolean Fourier Transform" (BFT).

The former is discussed in Alon & Spencer's book. We shall only employ it in comparatively-elementary ways. Appendix B provides facts about (ii).

Background: We shall assume a fairly sophisticated reader, who understands and is familiar with several standard techniques of statistical argumentation and related facts, e.g. "binomial distribution," "central limit theorem," "laws of large numbers," and Chernoff and Chebyshev "tail bounds"; since we intend to use them without providing the details. (We take a small amount of mercy on the reader by providing an abbreviated appendix A summarizing some of that.) For a reader who can just do those details in her head (or see that they will work) this is no obstacle, but a reader unfamiliar with these things will probably find some parts of our proofs mysterious. The reader should also know some linear algebra ("orthogonal matrices," "linear transformations"). We also shall assume more understanding of voting-method jargon than Part I assumed. Finally, understanding all about Fourier transforms and orthogonal polynomials would help; we shall not explicitly use either, but will use reasoning analogous to well-trod logic-trails in those areas.

2. Some continuum models of D-dimensional politics

At first I thought almost exactly the same theorems and proofs we devised in the RNEM would also work for certain models of issue-based politics. However, as we shall see, they do not; and most of the techniques Part I was able to throw at the problem under the RNEM, now become inapplicable.

DEFINITION of NORMAL ISSUE-BASED-POLITICS MODEL: For some fixed integer D≥1, each voter and each candidate get assigned an "issue stance vector" from a D-dimensional standard normal distribution. Then voter X has, as her election-utility for candidate Y, a decreasing function of the distance S between them, which is differentiable as a function of the locations of X and Y, and which tends sufficiently quickly to a constant when S→∞.

REMARK: One specific utility formula which I have successfully used in computer investigations is U=1/(0.6D+S²). This was chosen to behave like a generic smooth peak [locally quadratic with height 5/(3D)] for S≈0; it is a decreasing function of |S|, and falls to a constant value (namely 0) as S→∞. However, this formula actually may not fall off fast enough as S→∞ for some of our proof techniques.

DEFINITION of HYPERCUBE ISSUE-BASED-POLITICS MODEL: Same thing except use "uniform in the interval [-1,+1]" not "standard normal."

REMARK ON TIES: In some proofs below, we shall assume that there are enough voters that perfect-tie elections can be regarded as neglectibly uncommon.

REMARK ON INFINITE DIMENSIONAL LIMIT: In the D→∞ limit, both issue-based politics models become equivalent (up to scaling factors which have no real effect) to the RNEM. (Note, this kind of development of statistical independence in the large-D limit is a useful mental tool in other contexts, including some later in this paper.)

Do theorems 1, 2, 3, etc from part I still hold when transplanted into the normal and hypercube issue-based politics models? No (although for a brief shining moment I naively thought "yes"). The reason is that now voters' utilities (and hence votes too) are dependent. Consider a decision by the election system to do something beneficial for voter #1. Or voter #2. Now one might naively say "due to linearity of expectation, the expected summed-utility for both voters together, is got by maximizing expectation #1 plus expectation #2, and this is true regardless of any dependencies." That naive statement is, in fact, true. The problem is not the truth of that statement – it is that it is invalid to apply it at all. Because: The best voting system does not maximize expected summed-utility; it maximizes conditional expected summed-utility, conditioned on the votes. And that, with general dependencies, is not the same thing as the sum of utility #1 conditioned on vote #1, plus utility #2 conditioned on vote #2. (It just happens, due to total independence of everything from everything else, as well as total even-symmetry, that this equality is valid in the RNEM; but it is not valid for the D-dimensional issue-based politics models here with D finite.)

This devastates most of Part I's theorems, although the overall conceptual idea from its section 1 for identifying "best" voting systems, remains true.

2.1. Optimality of Condorcet voting systems in certain spherically symmetric D-dimensional politics models

Despite the total failure of our RNEM-friendly techniques (which had all relied heavily on independence) in, e.g. the hypercube and normal issues-based politics models, we still can identify an asymptotically optimum voting system in the latter model provided we restrict ourselves to the special case where the normal distribution of the voters in issue-space is spherically symmetric! The reason is simple: regret is always non-negative. We shall prove Condorcet voting methods achieve (with probability→1 in the V→∞ limit) exactly zero regret, and further infer that they have zero expected regret. Therefore, Condorcet must be optimal in the infinite-voter limit.

THEOREM 1 (Condorcet optimality): Let the V voters and N candidates each be points in a D-dimensional issue space with N≥1 and D≥1 fixed. Let the utility of each candidate for each voter be a fixed decreasing function of the Euclidean distance between them, which decreases to zero sufficiently quickly at ∞ (or it is acceptable for it to tend to any other constant, of course). Suppose the voters are distributed according to a radially-decreasing probability density spherically symmetric about some centerpoint C, with each voter chosen independently from that density (and again we need this density to decrease sufficiently quickly at ∞; it will suffice if it is a normal distribution or has compact support). But no assumption is made about the locations of the candidates aside from supposing their locations are sufficiently generic that the N distances from the candidates to C are distinct. Then in the limit V→∞:

A Condorcet winner (with honest rank-order votes) will exist with probability→1.
This winner will, with probability→1, be the candidate closest to C.
That will be the candidate with greatest social utility, i.e. regret=0, with probability→1.
In this model, every Condorcet voting method is optimal (i.e. exhibits regret→0 in the V→∞ limit).
In contrast, under the same assumptions (if D≥1 and N≥3) honest range voting, instant runoff voting (IRV), and every weighted positional rank-order ballot system have regrets bounded above zero, because we can exhibit situations in which they elect "wrong winners" with probability→1.

Proof:

1 & 2. The theorem that for any centrosymmetric configuration of honest voters, when utility is a decreasing function of voter-candidate distance, a Condorcet winner always exists and is the closest candidate to the center of symmetry, apparently was first proved by Davis, DeGroot, and Hinich 1972 (their theorems 1 and 4 and corollary 2). Our probability→1 result in the V→∞ limit then follows from the "strong law of large numbers" (SLLN) from probability theory applied to the pairwise-preference counts. Each pairwise preference count is a sum of random variables whose expectations arose from a precisely centrosymmetric density and which thus obey the DDH 1972 theorems. The SLLN causes those expectations to be approached arbitrarily closely with probability→1 in the V→∞ limit. That causes a Condorcet winner to exist and be the closest to C, yielding zero regret, with probability→1. Thanks to Chernoff tail bound, nonzero regret occurs with probability going to zero faster than V^-1 with large V, i.e. sufficiently quickly so that the expectation value is exactly zero in this limit.

3. Beckner 1975 (on his pages 171-172) proves that "convolution preserves the class of radial decreasing functions" in any dimension. His proof requires the functions to be sufficiently well behaved, e.g. decreasing quickly enough at ∞, that their Fourier transforms exist. Hence, the expected utility of a candidate for a random voter, is a radially-decreasing function. From this and the SLLN again, the candidate closest to C will, with probability→1 in the V→∞ limit, have the greatest (summed over the voters) utility.

4. Immediate from combining 1, 2 and 3 (at least, provided the probability→1 in those claims converges quickly enough to 1; certainly that is valid in situations where the Chernoff bound is applicable).

5. We construct 1-dimensional 3-candidate examples in this model in which both range voting and every weighted positional method deliver "wrong winners" with probability→1 in the V→∞ limit:

Range voting: consider the 1-dimensional standard normal distribution of voters, place three candidates at ±0.01 and 0, and use the utility function U=ε/(ε²+S²) which is a concave-∪ function of voter-candidate distance S except in a O(ε)-wide region near S=0. If ε>0 is sufficiently small, then honest (normalized) range voting will (with probability→1) not elect 0.
Plurality voting: same three candidates – the one located at 0 will almost surely finish in last place. This same example also works for any weighted positional system with weights (2, x, 0) with x<1 (if x is sufficiently close to 1, the example may have to be shrunk, but with enough shrinking it will always work).
IRV also will fail to elect the Condorcet Winner 0 (eliminates it in the first round since it is the plurality loser).
AntiPlurality voting: candidates at 0, 0.1, and 0.2: the winner will almost surely be 0.1 and not 0.
Borda voting: candidates at -0.1, 0.11, and 0.12: the winner will almost surely be 0.11. This example also works for any weighted positional system with weights (2, x, 0) with x≥1 (also x slightly below 1 will work).

Note the full set of weighted positional voting systems, i.e. every x from 0 to 2, is covered by our examples.

We also remark that in 2 dimensions with 14 randomly located candidates in a square, usually some of the candidates are found to lose under Borda, Range, IRV, Plurality, or AntiPlurality voting (which candidate, may change with the voting system) even for a Gaussian circularly symmetric voter distribution centered exactly at that candidate.

QED.

Before the reader rushes to adopt Condorcet and abandon range voting, though, let us make the following points.

The superiority of Condorcet voting appears to be an artifact of exact spherical symmetry. The proof depends on regret being exactly zero, which is destroyed by even a slight perturbation. (In contrast, the superiority of range voting over every rank-order voting method in the RNEM from parts I and II, is not destroyed by slight perturbations, e.g. to slight non-normality.) If the voter distribution is normal but ellipsoidal rather than spherically symmetric, then the theorem breaks down. Specifically claim 2 breaks: If we convolve the normal distribution with principal axes 1 in the X-direction and ε in the Y-direction, with a radial function, e.g. R⁴, that does not yield a radial function; in fact in this example it yields R⁴ + 2(1+ε²)R² + 3ε⁴+2ε²+3 +
4ε²Y² + 4X²
where R²=X²+Y². The boxed terms are spherically unsymmetric. Indeed, any perturbation away from precise circular symmetry, no matter how slight, to get an ellipsoidal normal distribution in the XY plane, will yield a scenario in which, with probability→1, range voting has lower regret than Condorcet voting, provided we design the geometry and utility function to make that happen. We prove that in theorem 2.
Also (relatedly), Condorcet's optimality depends on the utility function being based on L₂ distance. If it instead is based on L₁ distance (which is probably more realistic) then Condorcet winners will not necessarily minimize regret (and then RV experimentally usually seems to have better BR than Condorcet).
The alert reader may be asking "in your old (1999-2000) BR experiments, range was the BR-champion in every one of 720 different modeling assumptions. So why didn't those old experiments see the circumstances (in theorem 1 here) where Condorcet is the BR-champion?" The answer is that the old experiments involving multidimensional issue-spaces had employed a utility function not based on voter-candidate distance, but rather based on the voter-candidate vector dot-product. One can argue about which is more "realistic," distance or inner product. Probably some combination of both ingredients would be more realistic than either alone. Inner product is more realistic in at least this sense: it permits different issues to have different "importances" whereas with unweighted distance, all issues unrealistically have the "same" importance.
Theorem 1 only works for honest voters. Strategic voters can, e.g, create Condorcet cycles by "burying" top rivals, invalidating it.
Even with honest voters and exact spherical symmetry, range voting usually empirically seems to be only slightly outperformed by Condorcet. With non-spherical, e.g. ellipsoidal-normal, multimodal, or "skew," voter distributions, range voting empirically usually exhibits lower regret than Condorcet voting. And with strategic voters range tends to have lower regret. Indeed in computer experiments involving voters some of whom are strategic, range voting often is found to be more likely to elect an (honest-voter) Condorcet winner, than Condorcet methods!
There are many different Condorcet voting methods, all co-optimal in the sense of theorem 1. Furthermore, some nonCondorcet methods also are co-optimal in the same sense, e.g: "If the first V/10 voters unanimously rank some candidate top, elect him; otherwise employ a Condorcet method on the full set of voters." (We shall see some more-interesting examples in section 2.2.)
With any finite number V of voters, the probability will not be 1 that a Condorcet winner exists, and also will not be 1 that it is the candidate closest to C, and finally the probability will not be 1 that is the regret-minimizing candidate. I have no idea what the optimum voting system is in this model if V is finite.

THEOREM 2 (sensitivity to perturbation): If the voters are not distributed according to an exactly circularly symmetric normal density, but instead according to an ellipsoidal one with principal axes 1 and 1+ε, then no matter how small ε≠0 is, Condorcet voting will no longer be optimal and indeed range voting will have less regret (with probability→1 in the V→∞ limit). This is all provided that the locations of the N candidates and the utility function are appropriately designed.

Proof: We shall employ some sufficiently large number N of candidates distributed circularly symmetric normally outside of a circular "hole" with radius A, for some small constant A with 0<A<1/2. We make this normal density have characteristic width 1/A. Then (as we saw in the preceding theorem) Condorcet will elect the candidate closest to C, which is the center of both the hole and the voter-Gaussian and the candidate-Gaussian, with probability→1 in the V→∞ limit. Make the utility function of voter-candidate distance S have utility=1 if 0≤S<2A and then decrease smoothly toward 0 as S increases, in fact having utility=0 if S>1/A. If N is large, then with high probability the Condorcet candidate will both exist (by Davis et al.) and lie near the inner (radius=A) circle and at a uniformly random angle. It therefore will have mean regret bounded below by a positive number which depends on ε.

This particular scenario has been set up, meanwhile, to cause honest range voting to act like "honest utility voting" because (with probability→1) essentially every voter's min- and max-utility candidates will have utilities 0 and 1 respectively, so that every voter rescales her utility with the same scale factor to generate her range vote. Range voting therefore will (with probability→1 in the V→∞ limit) deliver the optimum-utility winner, i.e. achieving zero regret. QED.

2.2. Optimality of certain kinds of Approval Voting in the same spherically symmetric D-dimensional politics models

THEOREM 3 (Honest approval's optimality with independent distance-threshholds): Let the V voters and N candidates each be points in a D-dimensional issue space with N≥1 and D≥1 fixed. Let the utility of each candidate for each voter be a fixed decreasing function of the Euclidean distance between them, which decreases to zero sufficiently quickly at ∞ (or it is acceptable for it to tend to any other constant, of course). Suppose the voters are distributed according to a radially-decreasing probability density spherically symmetric about some centerpoint C, with each voter chosen independently from that density (and again we need this density to decrease sufficiently quickly at ∞, but never reaching 0). Suppose the distances from the candidates to C are distinct. Suppose we employ "approval voting" where each voter approves all candidates Euclidean distance<R away from her location (and if the distance exactly equals R, then she tosses a fair coin to decide whether to approve him). Here R can be a positive constant, or each voter independently chooses R at random according to some fixed probability density on the positive reals. (Either assumption yields a valid theorem.) Then in the limit V→∞, the best (i.e. regret-minimizing) candidate will be elected with probability→1.

Proof. Employ the same Beckner theorem as in part 3 of the proof of theorem 1 to see that expected number of approvals garnered by a candidate will be a decreasing radial function of his position. Then by the strong law of large numbers the closest candidate to C (the center of symmetry of the voter distribution) will win, which as we proved before is the one with least regret – both claims holding with probability→1 in the V→∞ limit. QED.

THEOREM 4 (Optimality of iterated strategic approval voting): Same assumptions as the preceding theorem, except now we repeatedly redo the approval voting election until the winner stops changing; each voter strategically changes her distance-threshhold R each election to be the distance to the winner of the preceding round. Then in the limit V→∞, the procedure will terminate after a finite number of rounds, whereupon it will elect the best (i.e. regret-minimizing) candidate – both with probability→1.

Proof. From theorem 1 we know that a Condorcet winner W exists with probability→1. Because W is pairwise-preferred over each rival X, W will, in every round after the first, get more approvals than the preceding round's winner (with probability→1). Hence the winner must keep changing from round to round until the winner is W at which point it must stay there, whereupon the process terminates. Since each round the winner changes (with probability→1) to somebody pairwise-preferred, and since we know from Davis, DeGroot, and Hinich 1972 and the strong law of large numbers that (with probability→1) no cycles exist in the pairwise-preference digraph, the process must terminate. QED.

REMARK. The optimality of these kinds of approval voting also are artifacts of spherical symmetry and the exact same theorem 2 and proof show that.

3. The YN model of binary issues

The "YN-model" is a simple model of voting (with a fair amount of realism) in which range voting performs optimally with "honest voters."

DEFINITION: There are D independent "issues" and 2^D "candidates"; each candidate announces a clear yes or no stance on each issue, and by some benificent miracle we have a full democratic marketplace, i.e, every possible issue-stance is represented. Hence (in the case D=4) we can name the 16 candidates YYYY ("yes" on all 4 issues), YYYN, YYNY,..., NNNN. We shall suppose, for each voter, the utility of electing any given candidate is just the number of issues on which that candidate agrees with the voter. Finally, we suppose without loss of generality that all D issues each would win if put to a single-issue yes/no vote, therefore the objectively best candidate is YYYY.

(The "without loss of generality" is because we could just rename the issues to make "yes" become "no" for that issue, e.g. the issue "raise taxes?" becomes "do not raise taxes," so with such canonical renaming we can always cause the best candidate to be named YYYY.)

REMARK: Most previous analyses of voting system "properties" ignore the fact that "issues" exist and candidates have stances on them. The YN model is approximately the simplest and most natural possible voting model which doesn't ignore that. It turns out that honest range voting is always optimal in the YN model and is "self consistent." But most other commonly proposed voting methods turn out to be both "self contradictory" and non-optimal (indeed they can even behave pessimally) in the YN model.

THEOREM 5 (Optimality of Range versus pessimality of various other voting systems in YN model):

Range voting, with "honest voters" (who always score their favorite D, their least-favorite 0, and everybody else linearly-interpolated between according to their utility) will always elect the best candidate in the YN model (i.e, here, YYYY). This more generally is true even if the utility is a weighted sum where issue k has weight W_k, for an arbitrary set of D positive constants W₁, W₂,… W_D, with ∑_1≤k≤DW_k=D.
But plurality voting can unambiguously elect the worst candidate (here NNNN) while causing the best candidate (YYYY) to have zero votes.
Hence Instant Runoff (and Coombs), in such an election, would fail to elect YYYY. Indeed, Instant Runoff can elect the worst candidate.
Condorcet Cycles can happen in the YN model.
All Condorcet methods (with honest voters) can elect the worst candidate NNNN and can mis-order the finishers exactly backwards so that candidates with more Ns always are majority-preferred over those with fewer!
Borda voting (with honest voters) both can elect the worst candidate NNNN and order the candidates exactly backwards.
Approval voting (with honest voters who approve the top F percent of the available candidates and disapprove the rest, where F is any fixed value with 0<F<100%) can elect the worst candidate (and range voting can also do this with strategic voting).

Proof:

a. With honest range voters, "Y" (on the first issue) can be regarded as getting a 1-vote or a 0-vote (for those voters who support or oppose issue #1), and ditto for each other issue, as a contribution to each candidate's range score. This view is mathematically equivalent thanks to our postulate the honest range voters linearly interpolate their range votes (between the best and the worst candidates getting the max and min range votes D and 0) based on utilities. Since we have postulated that Y beats N on each issue, the candidate YYYY beats every other candidate. In other words, range voting always delivers the best possible winner in the YN model.

To put it a different way: since all range votes have the same max-score D and min-score 0 they all are compatibly scaled utilities and hence the greatest-utility candidate must be elected.

b. Say 71% of the voters each have ideologies consisting of 71% Ys (all such ideologies equinumerous) while 29% have the NNN...NN ideology. Then 50.4%=71%² of the voters prefer Y on any given issue, so by majority vote, Y would win on each issue individually. But NNN..NN is the plurality-vote winner with 29% of the vote, while YYY..YY gets zero top-rank votes, and indeed every candidate besides NNN..NN gets (in the D→∞ limit) zero percent of the votes – almost the ultimate in "vote splitting." Note: throughout this proof, when we say "in the D→∞ limit" we actually mean "for every sufficiently large D."

As a very concrete example of this kind of idea with D=4 (this election data was created artificially by S.J.Brams for the purpose of solving this problem), let the plurality votes for each candidate be as follows in a 31-voter, 4-issue, 16-candidate election

candidate	votes	candidate	votes
YYYY	0	YNYY	4
YYYN	4	YNYN	1
YYNY	4	YNNY	1
YYNN	1	YNNN	1
NYYY	4	NNYY	1
NYYN	1	NNYN	1
NYNY	1	NNNY	1
NYNN	1	NNNN	5

Then on each issue Y beats N by 16 to 15. (For example, just look at the first letter of each candidate's name to see Y on the first issue wins by 16:15 over N.) But nevertheless YYYY gets zero votes and NNNN wins a plurality election with 5 votes.

c. Instant runoff (IRV) never elects a candidate with zero top-rank votes (since it always eliminates that candidate in the first round). [Also Coombs,which is like IRV except that it eliminates the candidate with the most bottom-rank votes each round, cannot elect YYYY since it eliminates it immediately.] Professional IRV-advocate Rob Richie once pronounced this an "advantage" of IRV. However, in the present example, it is clearly a disadvantage, because the clearly best candidate YYYY gets zero top-rank votes.

A fully explicit example with 83 voters, created by Abd ul-Rahman Lomax and myself, is shown in the table. The top-preference of each voter is, of course, the issue-stance of that voter and we have ranked all 16 candidates consistently with that top ranking, but breaking utility-ties somewhat arbitrarily.

#voters	Their honest vote
32	YYYY>YYYN>YYNY>YNYY>NYYY>YYNN>YNYN>YNNY>NYYN>NYNY>NNYY>YNNN>NYNN>NNYN>NNNY>NNNN
10	YNNN>YNNY>YNYN>YYNN>NNNN>YNYY>YYNY>YYYN>NNNY>NNYN>NYNN>YYYY>NNYY>NYNY>NYYN>NYYY
10	NYNN>NYNY>NYYN>NNNN>YYNN>NYYY>NNNY>NNYN>YYNY>YYYN>YNNN>NNYY>YYYY>YNNY>YNYN>YNYY
10	NNYN>NNYY>NNNN>NYYN>YNYN>NNNY>NYYY>NYNN>YNYY>YNNN>YYYN>NYNY>YNNY>YYYY>YYNN>YYNY
10	NNNY>NNNN>NNYY>NYNY>YNNY>NNYN>NYNN>NYYY>YNNN>YNYY>YYNY>NYYN>YNYN>YYNN>YYYY>YYYN
11	NNNN>NNNY>NNYN>NYNN>YNNN>NNYY>NYNY>NYYN>YNNY>YNYN>YYNN>NYYY>YNYY>YYNY>YYYN>YYYY

In this election Y wins on issue #1 by 42-to-41 majority (and ditto for any other issue). But NNNN is the election winner using any of these Condorcet methods {Basic Condorcet, Schulze beatpaths, Tideman Ranked Pairs, Nanson-Baldwin, Simpson-Kramer min-max} and also Bucklin and IRV.

d. The YN model with D=6 can exhibit a Condorcet cycle among the three candidates named
YYNNNN, NNYYNN, and NNNNYY.
This will happen if there are three equinumerous types of voters whose ideologies are
YYYNNN, NNYYYN, and YNNNYY.

e. Let 51% of the voters have ideologies which are 49% Y. The remaining 49% of the voters have ideologies which are 60% Y. (All such ideologies equinumerous.) Then Y beats N on every individual issue. Now compare a candidate with fraction-P of Ys in its name, versus a candidate with fraction-Q of Ys in its name, 0≤P<Q≤1. The former candidate is preferred arbitrarily-near-unanimously within the 51%-bloc of voters, hence by an overall majority, if

0.49(1-P)+0.51P<0.49(1-Q)+0.51Q [since these formulae give the expected number of letters-changed between the voter and the candidate for a random voter in the 51% bloc, and the distribution becomes arbitrarily sharply peaked about its expectation value in the D→∞ limit thanks to the "strong law of large numbers"] i.e. if P<Q. Therefore, a candidate with fewer Ys in its name, is always majority-preferred over one with more Ys in its name if D is sufficiently large! [Actually, the same analysis also works for the 49%-voter-bloc too, but in reverse, so that we see that the majority preference here is always a 51-49 majority to within ε.] Hence NNN..NN is the Condorcet winner and there are no cycles. This proves that every Condorcet voting method will elect the worst candidate NNN..NN and will order the finishers in exactly the reverse of the "right" ordering. Indeed every voting method based purely on the "pairwise matrix" – Borda is another – will get everything totally backwards. (A candidate's Borda score is the sum of his pairwise-victory margins, up to an overall additive constant.)

This idea also will usually cause IRV to elect the worst candidate NNN...NN if D is large enough, and we throw in enough NNN...NN voters to make sure that NNN...NN does not get eliminated in an early round for having zero top-rank votes, and we randomly pre-perturb the vote-counts slightly to break ties.

f. As a wholy-explicit Borda example with D=4, take 21 YYYY voters versus 10 each for YNNN, NYNN, NNYN, NNNY (and use Borda score expectation value among all random honest orderings of the candidates); then the unique Borda winner is NNNN even though any individual issue is won by Y by a 31-to-30 majority.

g. If 70% of the voters each have ideologies consisting of 70% Ns (all such ideologies equinumerous) while 30% have the YYY...YY ideology, then the worst candidate NNN...NN is majority-preferred over the best candidate YYY...YY, and NNN..NN is the Approval Voting winner if D is large enough for any fixed F with 0<F<100%. Note 49%=70%² of the voters prefer N on any given issue, so by 51-49 majority vote, Y would win on each issue individually.

More-concrete examples: Also, of course, the Brams D=4 example for plurality voting is also valid for approval voting if the voters only approve their top choice, i.e. the top 1/16 of the candidates. In that example Abd ul-Rahman Lomax also notes that voters approving only their favorite if he is a "frontrunner" but approving both their favorite and successive candidates worse than him until they reach a "frontrunner"... will sadly still elect NNNN if we regard "frontrunners" as those with 4 or more plurality votes.

Further, in the 61-voter concrete D=4 example in item "f", if voters approve their top 5 candidates, then NNNN wins with 40 approvals; no rival gets more than 30.

Finally we note that range voting can elect NNNN if the voters are strategic exaggerators rather than "honest," i.e. if we assume they vote in "approval style." Also, if voters have different strengths of feeling, e.g. if the NNNN voters were extra passionate (this differs from the utilities assumed in the theorem statement), then range can elect NNNN, albeit in that case this might be "good behavior," not a "malfunction."

QED.

REMARK: The perfect behavior of range voting in the YN model can be regarded as an artifact of the YN model's simplistic assumption that all voters have equal "strength of feeling," combined with the assumption of a "full democratic marketplace."

Incidentally, in the contemporary USA the plurality voting system has engendered "two-party domination" (via "Duverger's law," see, e.g, Riker 1982) in which there are only 2 politically-feasible issue-stance vectors not 2^D. E.g, in 2006 USA, a candidate who is against "gay rights" will also nearly-automatically be against abortion rights, for lower taxes for the rich, and for the Iraq War, even though, logically speaking, there seems little connection between these.

4. YN model for random voters

Theorem 5 totally settles the worst-case behavior of the most-common voting system proposals in the YN model. However, these pathologies all may have little practical interest because

Fairly large D values seem needed to make the proof-examples work. (A reply to this criticism is that it is unclear what the smallest examples are; we used large D purely because it made proofs easier, but examples with small D might well also exist.)
The theorem is about worst case behavior (which is rare?) – in other words the examples in the proof were specifically invented to make the various non-range voting systems behave pessimally. Meanwhile there are no examples that make range voting behave badly since it always behaves perfectly in the YN model.

To dodge the latter criticism, we now examine the YN model for random voters.

DEFINITION: In the RANDOM-VOTER YN MODEL every possible issue-stance (from among the 2^D) is equally likely and we pick some large number V (far exceeding the number 2^D of candidates) of such random voters. [This contrasts with the original YN model, where the voters could be distributed in a nonrandom, indeed adversarially chosen, manner.] Again we assume a full democratic marketplace: all 2^D possible candidates exist. We then canonize the issues by reversing some of their signs so that for each issue considered alone, "Y" is the majority winner. (This canonical-renaming is purely for convenience and has no genuine effect.)

THEOREM 6 (BRs in random YN model): In the random YN model, call a candidate "bad" if he has over 50%, and "regrettable" if he has more than a constant positive fraction, of Ns in his name.

"Honest" range voting (as defined as in the preceding theorem) always elects the best candidate YYY...YY.
With honest plurality voting with D≥4 issues, a "bad" candidate will be elected with chance greater than some absolute positive constant C. (Further, in the D→∞ limit, C may be taken arbitrarily close to 1/2.)
With honest approval voting with D≥4 issues, where voters approve candidates who disagree with them on fraction≤F of the issues (for any fixed F with 0≤F<50%) a bad candidate will be elected with chance greater than some absolute positive constant C.
That is also true (albeit for a "regrettable" rather than "bad" candidate) if F=50% in the V/2^D→∞ limit.
The preceding point also is true for honest Borda voting.

Proof sketches:
a. Same proof as in the preceding theorem.

b. Consider

the identity W of the the plurality-voting winner, and
the canonizing renaming (i.e. flipping the signs of the issues to that YYY...YY is the "best winner").

We argue that it is almost irrelevant who the plurality winner is, in the sense that, if enough of that voter-type were obliterated to make it become the second-most-popular – while at the same time enough votes for some arbitrary loser L were added to make it become the winner – this would with high probability in the D→∞ and V/2^D→∞ limits not be enough to affect the canonizing renaming.

Specifically (using Chernoff), the gap in vote-count between the winningest and losingest candidates will with high probability be O(V^1/2D^1/22^-D/2). Meanwhile, the |gap| in vote count between the "Y" and "N" votes on any one issue – even the issue with the smallest |gap| – will with probability→1 as D&rarr∞ exceed V^1/2D^-1.01. Since, when D becomes large, the latter gap greatly exceeds the former, the claim is proven.

c. We use the same argument as in (b), altered appropriately to make it work for approval voting where each voter approves candidates disagreeing with her on at most a fraction F of the issues. The number of candidates approved by each voter is, for any fixed F with 0≤F<50%, exponentially tiny compared to the number 2^D of candidates (for D large). To be precise, the total number of approved candidates is

A = ∑_0≤k≤FD binomial(D, k).

which, while large, is exponentially tiny compared to 2^D, i.e. a random candidate has a tiny chance 2^-DA of being approved by a random voter. The expected number of approvals got by any particular candidate then is

2^-DVA.

The variance in this number (easily computed since all voter-locations were independent) then is essentially the same as the number itself and by the Chernoff bound it has exponentially-small tails. Further, the correlation between the approval counts for two candidates further than F issues apart, is essentially nil. We can conclude from this that if we obliterate enough voters to stop the approval-winner winning while adding enough to make an arbitrary far-away loser become the winner, we will with high probability need to mess with O(V^1/2A^1/2D^1/22^-D/2) voters. And as we argued before, this will with high probability in the D→∞ and V/2^D→∞ limits not be enough to affect the canonizing renaming.

In both (b) and (c), note that the argument about removing voters for W while adding voters for L works for any set of voters {supporting L and not W} of that cardinality. The same argument could then be re-used to go backwards, and this works for any L. The point of those remarks is their complete symmetry: the canonizing renaming yields essentially no constraint on who wins.

d & e. These are going to be consequences of a more general theory. Range voting in the random YN model (which is optimal) is equivalent to a certain weighted positional voting system. Specifically, the top weight is D, the next D weights are D-1, the next binomial(D,2) weights are each D-2,..., the next binomial(D,k) weights are each D-k,... the last (2^Dth) weight is 0. But Borda and "approve the top half of the candidates" both are weighted positional systems with quite different, and non-optimal, weights. Consider the candidates and voters as D-bit binary words (defining their issue-stances). With range voting the range vote score of the candidates is a linear function of the candidate-defining binary vector. With some other voting system we instead get a nonlinear function.

Any such function can be thought of as the sum of its best (under the L² norm) linear approximation, plus nonlinear terms. The best linear approximation of the Borda or 50%-approval weights is (by symmetry) the same as the range-voting weight function (up to a scaling factor which is irrelevant). The nonlinear terms are therefore the sole reason they fail to deliver always-optimum results.

In fact, in the jargon of the "Boolean Fourier transform" (BFT, see appendix B), the vote-totals for the candidates are, as a function of the D-bit candidate-defining word, precisely given by the convolution of the weight-function defining that voting system, with the voter-population-function defining how many voters of type T there are, for each D-bit word T.

Because of theory in appendix B, the best linear approximation is given by the 1+D linear terms in the Boolean Fourier series, while the remaining 2^D-D-1 terms, all of which are orthogonal, are nonlinear.

Now the convolution of two functions arises from multiplying their BFTs. By the central limit theorem the voter-population-function just becomes a vector of 2^D identical independent random-normal deviates in the large-V/2^D limit. The same is true of its Boolean Fourier Transform because the BFT is an orthogonal linear transformation in 2^D-dimensional space. So when we multiply this by the BFT of the voting system's weight function, we leave the weight-function's BFT-coefficients unaffected in the sense that all coefficients are being multiplied by identical Gaussian random deviates, hence their relative magnitudes do not change in any systematic way.

Our idea is that if the sum-of-squares of the nonlinear Fourier coefficients is large enough compared to the sum-of-squares for the D nonconstant linear coefficients, then it becomes improbable that the voting system will deliver an optimum winner or one agreeing with it on more than a constant fraction of issues.

The function of bit-sum that arises as the weight function for 50%-approval voting is a step function, while for Borda voting it resembles the "error function" rescaled so that its rise has characteristic width of order D^1/2. For either, we can see immediately from Parseval's equality that in the large-D limit, the L₂ norm of the nonlinear terms vastly swamps that of the linear terms in the weight function's Fourier series, by, indeed, exponential(D) factors.

The result, in the V/2^D→∞ limit where the BFT is just taking an orthogonal transformation of normal randoms, is that the nonlinear terms in the Fourier series of the vote-total function of candidate-defining-binary-word Q, all added together and evaluated at all 2^D locations, form a 2^D-dimensional vector of identical independent random normals conditioned on having sum=0 and exactly zero correlation with each bit of Q. The amplitudes each of these normals is within a constant factor of the maximum of the linear function arising from the linear terms. In view of this, the probability→1 that the resulting normal randoms will include some with arbitrarily great-factor larger magnitude than the maximum of the linear part.

Knowing that, one finally can argue that, for a sufficiently small but fixed ε>0, the probability→1 that the maximum vote will be for a candidate with at least a fraction ε of "Ns" in his name. QED.

CONJECTURE: With probability→1, no Condorcet winner exists in the random-voter YN model in the limit where D is large and V/2^D→∞; in that case the worth of Condorcet voting methods would seem to depend entirely on the worth of their "backup" methods intended to provide a winner when a "beats-all" winner does not exist.

5. Median-based range voting

M.Balinski & R.Laraki (2008) have recently advocated (essentially) range voting but with the winner chosen to be the candidate with the greatest median score. (The usual form of range voting elects the one with greatest average score.) Since it might be expected to be common (in the presence of a nonzero fraction of strategic voters) for two candidates to be tied for the greatest median score, B&L also advocate a specific tie-breaking method.

In 1999-2000 and since, I have done many computer simulations. In those simulations, median-based range voting never exhibited smaller BR than average-based range voting, for any modeling assumptions tried and any mixture of honest and strategic voters. Also, median-based range voting fails many logical properties which average-based obeys, for discussion see http://RangeVoting.org/MedianVrange.html. In my mind, this had seemed to clearly show the inferiority of median. (Incidentally, we should remark that median and average-based range are the endpoints of the continuum family of "trimmed mean" range voting methods where some specified fraction T of "outlier" scores are discarded for each candidate and then the mean taken. With T→100% from below we get median, while with T→0+ we get average-based range.)

However, in 2009 Balinski was quoted by Wall Street Journal reporter Carl Bialik as stating that average-based range voting was "a ridiculous method" because it can be manipulated by strategic voters.

To understand what Balinski could have meant by that, I suggest reading his paper. My personal view that Balinski & Laraki's "optimality" and "vulnerability to strategy" definitions were contrived – with other (but reasonable) definitions, their theorems would fail. And despite their "optimality" result for the vulnerability to strategy median-based range voting, typically about 50% of voters have incentive to exaggerate their votes on the two frontrunners maximally, as opposed to casting honest votes for them. (For example a voter whose honest opinions of A and B are both sub-median, finds it advantageous to dishonestly score one max and the other min.) This is assuming each voter possesses excellent information, i.e. knows what the median scores are going to be to high accuracy; under the analogous assumption 100% of average-based range voters would find it strategic to thus-exaggerate. With however less-perfect information, 100% of median-based range voters would find it advantageous to thus-strategize (thus-strategizing does not hurt, but can help, a voter). So I fail to see much if any advantage, in realistic political-election practice, for median-based range voting.

However, as we shall now show, under a model perhaps approximating the situation for judging figure-skating events (e.g. the Olympics), median-based or trimmed-mean-based range voting can outperform ordinary average-based range voting. Here "outperform" means "has lower Bayesian regret than," of course. Here is the model.

"Sharp-peaked": Assume the honest utility distributions for each candidate (among all voters) are sharply peaked, that is, almost all voters have the same utilities to within ±ε for that candidate.
(This seems unrealistic for most political elections, where utilities tend to be distributed more broadly or exhibit multimodal distributions. For sample range-voting polls in USA political elections see http://RangeVoting.org/RangePolls.html. Indeed, Balinski & Laraki's own exit-poll range-voting study in Orsay France 2007, see http://RangeVoting.org/OrsayTable.html, also failed either to find sharp-peakedness or to exhibit any visible difference in strategic voting behavior in median- versus in average-based range voting elections. But this assumption might be realistic for figure skating judging.)
"Biased strategy": Assume that some constant fraction S, bounded within a real interval that is a subset of (0,1), of the voters are strategic, and that all of those voters are one-sided, i.e. all the strategic voters favor one of the two frontrunners B over the other A. The rest are honest, i.e. deliver their honest utility values, linearly transformed so the best (in that voter's view) candidate gets the high-endpoint and the worst gets the low-endpoint or the allowed-score range, as their votes.
(This again seems unrealistic for most political elections. Indeed in the 2004 USA presidential election, an exit poll study I was a co-author of found no evidence Bush- or Kerry-supporters were more or less strategic than the other. All my computer simulations alluded to above had assumed strategicness of a voter was independent of her politics, so this biased-strategy assumption it seems essential to justify the below theorem. Again this assumption seems plausibly realistic in the case of figure skating judging.)

THEOREM 7 (Median's superiority under this model): In this sharp-peaked biased-strategy model with V→∞ voters, median-based range voting (and also trimmed-mean range voting with any sufficiently large fraction of outliers trimmed before computing the mean) delivers regret≤2ε. However, average-based range voting can deliver regret of order S/2 times full range.

The proof is trivial.

Acknowledgment

I thank Abd ul-Rahman Lomax for contributing some helpful computations (noted in the text).

References

Noga Alon & Joel Spencer: The Probabilistic Method, 3rd ed. J.Wiley 2008.
Michel Balinski & Rida Laraki: A theory of measuring electing and ranking, Proc. National Acad. Sci. USA 104,21 (May 2007) B720-725.
William Beckner: Inequalities in Fourier Analysis, Annals of Maths 102 (1975) 159-182.
Steven J. Brams: Mathematics and Democracy, Princeton Univ. Press 2008.
Patrick Billingsley: Probability and Measure, Wiley (3rd ed. 1995).
Otto A. Davis, Morris H. DeGroot, Melvin J. Hinich: Social Preference Orderings and Majority Rule, Econometrica 40,1 (Jan. 1972) 147-157.
William Feller: An Introduction to Probability Theory and Its Applications, Wiley, 2 vols, 1968.
T.Hagerup & C.Rüb: A guided tour of Chernoff bounds, Information Processing Letters 33,6 (1990) 305-308.
E.J.Gumbel: Statistics of Extremes, Columbia University Press 1958.
Hannu J. Nurmi: Comparing Voting Systems, Kluwer 1987.
William H. Riker: The Two-party System and Duverger's Law: An Essay on the History of Political Science, Amer. Polit. Sci. Review 76 (December 1982) 753-766.
Sheldon M. Ross: Probability models for computer science, Academic Press 2002. (Chapter 3 is a low-level introduction to tail bounds.)
Warren D.Smith, Jacqueline N. Quintal, Douglas S. Greene: What if the 2004 US presidential election had been held using Range or Approval voting?, paper #82 here http://www.math.temple.edu/~wds/homepage/works.html.
Nicolaus Tideman: Collective Decisions and Voting: The Potential for Public Choice, Ashgate 2006.

Appendix A: Some abbreviated probability theory

Central limit theorem: the sum of N identically distributed independent random variables, each with finite mean and variance, will in the limit N→∞ if rescaled to have unit variance and translated to have zero mean, has probability CDF which pointwise approaches the standard normal CDF.

Chernoff bound: If the summands X have finite expectation of exp(kX) for any real constant k, then the rescaled sum will have a density bounded by an exponentially-declining function of the number of standard deviations we are away from the mean.

Chebyshev bound: If X is a random variable with finite mean μ and variance σ², then

Prob[|X-μ|≥t] ≤ σ²/t²

Strong law of large numbers: The mean of N identically distributed random variables is a random variable which, when N→∞ has probability→1 of being within ±ε of the expected value of each summand (and this is true for any ε>0, no matter how small).

Appendix B: The "Boolean Fourier transform" (BFT)

We define and recount known results about (what some people call) the Boolean Fourier transform. (This has also been called the "Walsh transform," the "Hadamard transform," the "Hadamard-Walsh transform," and "expansion into Boolean polynomials.")

Let F(x) be a function mapping binary n-bit words to real numbers. It is always possible to write F(x) as a polynomial in the n bits x₁, x₂,…, x_n. Because b²=b if b is a bit (and this would also be true for {-1,+1} bits as well as our convention of {0,1} bits) we may regard the polynomial as multilinear and with degree≤n, and with at most 2ⁿ terms. One way to write this is

F(x) = ∑_{n-bit words u} f(u) (-1)^u·x

and then F(x) is the "Walsh transform" of f(u). If we instead were using {-1,+1} bits then we could replace our (-1)^u·x with ∏_j∈ux_j which makes it clear the terms really are multilinear polynomials. This transform is a linear bijection. Because the different terms in the sum all are orthogonal functions of x and all with the same L² norm, the inverse transform is the same as the forward transform divided by 2ⁿ. The Walsh transform enjoys the Parseval equality relating the L² norms of f(x) and F(x):

∑_x F(x)² = 2ⁿ ∑_x f(x)².

Like the usual (1-dimensional real) Fourier transform, the Walsh transform enjoys a convolution theorem. Define the convolution

f∗g(x) = ∑_{n-bit words w} f(w) g(x-w).

Then the Walsh transform of f∗g is FG.

The uniquely best (in the L² norm) linear approximation of a Boolean function F(x) is got by employing only the constant and linear terms in the sum, i.e. arising from the "n-bit words u" with at most a single nonzero bit. Due to Parseval's equality the summed-squared-error in this approximation is just the sum of the squares of all the other 2ⁿ-n-1 Fourier coefficients f(u).

Functions of bit-sum: If a function f of binary n-bit words depends only on the bit-sum s of that word (in which case it can be alternatively be viewed as an ordinary 1-dimensional function) then its BFT also does, and its best linear approximation also does. [This inspires a common abuse of notation where we write "f(s)" instead of f(u).]

Return to main page