Toby Pereira, Forest W. Simmons, Warren D. Smith. Dec 2015-Feb 2016. PRELIMINARY DRAFT?? ONLY PARTLY WRITTEN
We previously had identified, as a (the?) top open problem in multiwinner voting systems, whether a "holy grail" multiwinner system exists. The main properties asked of the holy grail were that
Here "basic PR" means that in scenarios consisting of "colored" candidates and voters and featuring "maximally racist" voting behavior, the parliament will be guaranteed to have the same color-composition as the electorate – up to errors of order ±1 seat per color, i.e. of the same order forced by integer roundoff demands, and provided enough candidates of each color run (e.g. we cannot elect 5 Greens if only 3 run).
To be clear: the voting system never knows and is never told the colors of any voter or any candidate, it only knows the votes. And there is no requirement for any voters to vote in "color-like" fashion. We are merely demanding that if they do, then proportional representation with respect to colors, must happen.
"Strong PR" means if we add uncolored candidates – whose ratings depend only on the candidate and not on the voter – to that picture, then the colored subset of the parliament will be guaranteed to have the same composition as the electorate. As bonus properties, it would be pleasant if the voting method
It originally seemed as though the holy grail were impossible, and indeed Smith had been able to produce impossibility proofs under certain ansatzes.
However, the present paper succeeds in finding three holy grail systems. The first, which is the simplest to define and program, unfortunately is computationally very laborious – the work grows like W!V, note the factorial sign, merely to evaluate Q for a specified W-member parliament (V voters).
The second, which unfortunately is the most complicated to define (it involves numerically solving a system of ordinary differential equations as a sub-algorithm!), involves a quality function Q which should be evaluable to D decimal places of accuracy in a number of steps bounded by (V+1)·polynomial(C,T,D) for a W-winner, V-voter, C-candidate election. Here T is the number of "transitions" within the algorithm, which hopefully usually is small, e.g. upper bounded by polynomial(C). Unfortunately we are unable to rule out the possibility that T might be exponentially large, e.g. have the same order as 2^W. The situation is analogous to the "simplex algorithm" for linear programming (Dantzig 1963). In practice, simplex experimentally virtually always runs in a polynomially bounded number of steps, but examples have been artificially constructed which cause exponential runtime.
The first such construction was by Klee & Minty 1972. Over the next few decades, various fancier "pivot rules" were proposed that hopefully would allow simplex to dodge the exponential examples, plus in some cases made simplex perform experimentally better. But as the decades rolled by, virtually every promising-seeming deterministic pivot rule was defeated by some (sometimes quite complicated) artificial exptime example: Jeroslow 1973, Goldfarb & Sit 1979, Friedmann 2011, and Friedmann-Hansen-Zwick 2011a, 2011b. Historically, that simplex method quagmire was dodged when so-called "interior point methods" (IPMs) were invented which guarantee polynomial runtime bounds for linear programming. Also, Kelner & Spielman 2006 eventually found a randomized algorithm, whose "simplexity" is arguable, with a polynomial bound on expected runtime. But in both cases the polynomial was not in the number of dimensions and constraints, but rather in the number of bits of input. We have been unable to find an analogue of either IPM or Kelner/Spielman for the voting method we are speaking of here, and do not know whether one exists.
A third holy grail scheme called PAMSAC was invented by Toby Pereira after S&S showed him their first one. We'll generalize, re-describe and re-prove it. Pereira's original PAMSAC cost function Q had impractically enormous computational requirements – exponential in VW+C, where V is the number of voters, C of candidates, and W of winners. We however shall show how to compute the PAMSAC Q function to D-decimal accuracy in time≤polynomial(V,W,D) provided V≥2W; and by replacing the "CFAT" inside it by a "slimmed CFAT" this proviso may be weakened to V>6(W+1)W, which effectively means there no longer is any proviso. Further, we have certain heuristic randomized algorithm speedups which (we argue nonrigorously) should reduce the runtime bound to V·polynomial(W,D) in almost all realistic elections – in other words enjoying linear growth with V if W and D are held fixed.
Along the way to producing our holy grails, we present two new strong-PR voting systems that are both simple and computationally efficient (polynomial time): MSPAV (multiplicative sequential proportional approval voting) and MSPRV (multiplicative sequential proportional range voting). Neither enjoys the full set of Holy Grail properties, but they are simple and they do enjoy strong PR.
CURRENTLY HG SCHEME #1 and most of #3 MANUALLY CHECKED BY WDS. HG #2 NEEDS MORE CHECKING. NOTHING COMPUTER CHECKED. NOT CHECKED BY FWS.
Parts I and II of this series had investigated "optimizing" multiwinner voting systems. I.e. let there be V voters, C candidates, W winners, 0<W<C, usually in practice C≪V, and in which the ballots were score-style or, more simply, approval-style. (Score: each voter scores each candidate with a real number from 0 [undesired] to 1 [desired]. Approval: the only permitted scores are 0 [disapproved] and 1 [approved], intermediate reals are forbidden.)
Define a continuous (or more smooth) "quality function" Q of the scores on the ballots and the winner-set. For approval-style ballots there are only 2^C possible ballot-types, and we can demand Q be a continuous function of the 2^C different counts, now regarded as real numbers. For score-style ballots we'll assume continuum scores, i.e. each voter scores each candidate with a real number from the interval [0,1], and demand Q be a continuous function of all those scores.
The voting method is to elect the W-element subset of the candidates that maximizes Q. (Sometimes we prefer to minimize Q, in which case it is better referred to as a "cost" or "penalty" function.) We'd previously stated various properties such multiwinner voting systems might or might not obey. The "holy grail" combination of properties was singled out for special attention as both particularly desirable, and which nobody had managed to attain. Smith suspected it was impossible.
We here attain the holy grail, indeed we have found 3 different holy grail voting methods. However, our first and simplest holy grail voting method comes about in a rather unexpected and somewhat nasty way.
All our previous failed attempts to design holy grail voting methods had involved rather "natural" and "nice" Q, which could be evaluated from the V ballots and the W winning candidates via a straightforward polynomial-time computation. Indeed, the number of computational steps to compute Q was bounded by an expression of form O(VC log C) at most. Unfortunately finding the optimum parliament, maximizing Q, was shown to be NP-hard – and not only for our Q's, but also for essentially any voting method achieving "optimal" proportional representation (PR), for any reasonable notion of "optimal." And this also was shown for various merely-approximate optimality notions. In other words, brute force evaluation of Q for every possible parliament, then picking the best, is in some sense necessary for any good-performing PR voting method. There is no quick way to shortcut that to rapidly produce the optimum parliament, and any attempt to try will unavoidably, in at least some elections, produce poor results. (Unless P=NP.)
But we were willing to accept that need for heavy computation, for several reasons:
The first holy grail scheme developed in the present paper, however, makes this situation worse. Its quality function Q no longer appears to be polynomial-time computable! It likely is NP-hard to compute, and perhaps even computing it approximately is NP-hard. The entire optimization problem (finding the Q-maximizing parliament) is no longer merely an NP-hard task, it instead lies in the complexity class NP^NP (next level up within the "polynomial hierarchy"). Once the optimum parliament is found, it is neither easily verified nor easily refuted, albeit it still should be a lot easier to compute its Q value than it is to perform the full optimization.
This all was rather annoying and led us to wonder whether this "next level" computational complexity was somehow inherent and unavoidable for any holy grail voting method.
We then devised a second holy grail method which made us suspect that it wasn't. Specifically, we showed its Q function was evaluable to D-decimal accuracy in a number of arithmetic operations bounded by polynomial(V,W,D,T), where T seemed plausibly likely to almost-always be upper bounded by polynomial(W).
Finally, we found a third holy grail method with Q function genuinely evaluable in time bounded by polynomial(V,W,D). With this voting method optimizing Q once again is a mere coNP∪NP problem.
But unfortunately, this third method is complicated to explain and to program for a computer, and the polynomials governing its runtime and memory usage both are rather large. Our second method is probably the most feasible for practical use, but it is very complicated to explain.
It unexpectedly turned out to be crucial for inventing all of these schemes to study several interesting kinds of "transforms" which convert multiwinner voting methods into different ones. CROSSREFS?? In the previous papers we already discussed the "Kotze-Pereira transform" of a set of score-ballots into a weighted set of approval-type ballots, and "coin-flip approval" transforms which convert a set of approval-type ballots into a different weighted set of approval-type ballots. (Despite the latter's name, both are wholly deterministic.) Both shall be used here. Indeed, we shall focus entirely on the problem of inventing voting methods using approval ballots, because score-type ballots can then be handled via an initial Kotze-Pereira transform. And for our second holy grail method we shall also employ the new "infinite-cloning transform," which seems the most remarkable among these ideas.
So the holy grail problem now seems completely solved from the standpoint of theory, but in practice our solutions may be unacceptable in the sense that hardly any politicians and voters would be willing/able to understand them, and/or the computational memory and/or time complexities would be too great. That quite likely would render them, as a practical matter, unenactable.
If so, the question for future authors becomes: is there some holy grail voting method enjoying both the simplicity of description (or simpler!) of our first solution, and better computational efficiency than we've so far managed to attain?
Algorithm definition:
1. Each voter, as her ballot, approves or disapproves each candidate.
2. Let Q=0. Initialize S as the number of seats that currently are unfilled, and let the common initial weight of each of the ballots be 1 (unless they were already weighted a priori). These weights will remain nonnegative throughout the process.
Also fix a constant Δ>-1; here Δ=1/2 seems appropriate for Sainte-Laguë-like PR and Δ=1 for d'Hondt-like.
3. Loop:
- Elect the (as-yet unelected) candidate X with the greatest weighted approval (and let that weighted approval be A). Meanwhile a hypothetical universally-approved candidate would have weighted approval U with the same ballot weightings.
- Q←Q+mono(A); If S≤1 then stop.
- If A>0 then: Multiply the weights of each ballot that approved X by F, where 0≤F≤1 is given by
F1 = max(1-h, ½), where h = U/(A[S+Δ]). Possible variant F-formulas: Four inequivalent alternatives which also work (we'll discuss below how they were obtained; see also F1b, F2b, F3b for three further possibilities) are
F2 = 1-h if h≤½, else (4h)^(-1);
F3 = 1-h if h≤½, else 1/4 + (4h)^(-2);
F4 = max{ 1-U/([S+Δ]A), (U-A)[S+Δ]/(2[S+Δ-1]U - A[S+Δ]) };
F∞ = [1 + U/(A[S-2+Δ])]^(-1).
- Decrement S.
EndLoop.
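For concreteness, here is a minimal runnable sketch of the above loop (a sketch only, assuming NumPy; the function name, interface, and use of the F1 formula are our illustrative choices, not part of the method's definition):

```python
import numpy as np

def mspav(approvals, n_seats, delta=0.5, weights=None, mono=lambda x: x):
    """Minimal sketch of MSPAV with the F1 deweighting formula.
    approvals: (V, C) array of 0/1 approvals; n_seats: W; weights: optional
    a-priori ballot weights.  Returns (winners in order of election, Q)."""
    A = np.asarray(approvals, dtype=float)
    V, C = A.shape
    w = np.ones(V) if weights is None else np.asarray(weights, dtype=float)
    winners, Q, S = [], 0.0, n_seats
    while True:
        scores = w @ A                            # weighted approval of each candidate
        scores[winners] = -np.inf                 # consider only as-yet-unelected candidates
        X = int(np.argmax(scores))                # step 3.1: elect the most-approved candidate
        a, U = scores[X], w.sum()                 # U = approval of a universally-approved candidate
        winners.append(X)
        Q += mono(a)                              # step 3.2
        if S <= 1:
            return winners, Q
        if a > 0:                                 # step 3.3: deweight ballots approving X
            h = U / (a * (S + delta))
            F1 = max(1.0 - h, 0.5)
            w = np.where(A[:, X] > 0, w * F1, w)
        S -= 1                                    # step 3.4: decrement S
```

For score-style ballots one would first convert to weighted approval ballots (see the Kotze-Pereira transform used by MSPRV below) and pass the resulting weights.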
Remarks. Q is unused and hence its update in step 3.2 and initialization in step 2 both may be jettisoned. The purpose of Q is, at algorithm termination, to provide a numerical measure of the "quality" of the election result, and different such quality-measures can be got by use of different "mono(x)" functions. Here mono(x) is assumed to be some particular continuous monotone function of x≥0, most simply mono(x)=x, but we also could consider, e.g., mono(x)=x^(-1/2). For purposes of the present section (since the algorithm does not ever use Q) it does not matter what mono(x) is. However, later we shall have purposes in mind for using Q, and at that point we shall care.
In step 3.3 the test "if A>0" can be jettisoned if we know a priori that every candidate is approved by at least one voter (e.g. by themselves). Such elections have been called "narcissistic."
The main theorem about MSPAV is that it obeys "strong PR." That is:
Terminology note: "Partisanship" may be a more appropriate descriptor than "racism." (And, e.g., "racist voting"↔"party-slate voting," etc.) We chose the "colored" terminology to emphasize the point that "color" could be anything – e.g. gender, militarism – and does not necessarily correspond to political party identity. Indeed, the goal is to design voting methods that work regardless of whether political parties even exist, and the definition of "work" similarly does not require political parties to exist.
Proof. The key is the formula for F. It was obtained by solving
A'(S-1+Δ)/U' = A(S+Δ)/U - 1
for F. Here the lefthand side is the number of seats (from among the S seats remaining to be filled) that would be proportionately "deserved" by the faction supporting the elected candidate X, under maximal-racism conditions, after X's election and the resulting deweighting; the righthand side is that same deserved-seat count before X's election, minus one. The primed quantities A' and U' denote A and U after the deweighting by multiplying the weights of all X-approving ballots by F, i.e. A'=FA and U'=U-(1-F)A; the point is that electing X should consume exactly one seat of its faction's proportional entitlement, no more and no less. Solving for F yields the naive formula F = 1 - U/(A[S+Δ]) = 1-h.
Unfortunately, this naive formula can become negative, which occurs when no candidate has enough approval to "deserve" a seat, but nevertheless we must elect one, so we do. Negative reweighting factors seem to be something we should avoid, since otherwise the algorithm will start awarding seats with the goal of intentionally hurting factions, without accomplishing anything positive. (If F-values always are nonnegative, then given that all approvals begin nonnegative, they will stay that way forever.) Even F=0 seems something we usually want to avoid, because we want to avoid situations where various factions have zero weighted approval, so that their approval will stay zero forevermore, giving us no way to decide which among them should win the next seat.
So the question is how best to modify the naive formula in those problematic regimes. Here are some desiderata (for the following discussion regard Δ=0; we will discuss nonzero Δ later):
Also, note that rather than needing a fraction 1/S of the V voters to approve you (in maximal-racism basic-PR situations) to "deserve" one of the S seats, one arguably "deserves" a seat with only a fraction 1/(S+Δ) of them; that is the role of the constant Δ (e.g. Δ=1 corresponds to a Droop-quota-like threshold of V/(S+1) voters).
Finally, we note that MSPAV will always elect universally-approved candidates in preference to any candidate approved by only some of the voters, and whenever it does so, then all ballots are deweighted by the same factor F. Therefore, MSPAV obeys not just basic, but actually strong PR. (Actually, of course, it does not matter for this purpose what deweighting factor F>0 is used whenever a universally-approved candidate is elected.) Q.E.D.
Remarks: Note F1 is continuous, positive and non-increasing for all real h. The formulas F2 and F3 similarly were devised also to equal 1-h when 0≤h≤½ (i.e. for factions naively deserving ≥2 more seats) but now also enjoying continuous h-derivative for all real h. Note both F2 and F3 are positive and decreasing for all real h, and F2 goes to 0, but F3 goes to 1/4, when h→∞. The differences between these formulae hopefully do not matter, in the sense that in "racist voting" scenarios they only change seat counts for color classes which "deserve less than 2 seats." In other words, if proportionality remains valid up to acceptable errors with one formula, it remains valid with the others. The above formula-surgeries F1, F2, F3 all were devised based on h=½ (i.e. 2 naively deserved seats) as the breakpoint; we also could consider the alternative breakpoint "3/2 seats" in which case those formulae would instead become
The formula F∞ is even smoother, namely analytic in h; it is asymptotic to F1 when S→∞, and equals it when h=½, and F1<F∞<1 when 2-Δ<S<∞ and 0<h<½. Unfortunately F∞∉[0,1] when S-2+Δ<0. Therefore, if you want to use the F∞ variant, insist that Δ>0. With that restriction, every deweighting that can actually occur (i.e. those with S≥2) will obey 0<F∞<1.
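For reference, here are the variant deweighting formulas written out as code (a direct transcription under our reading of the reconstructed formulas above; as just noted, F∞ requires S-2+Δ>0):

```python
def deweight_factor(variant, A, U, S, delta=0.5):
    """The deweighting-formula variants F1, F2, F3, F4, F_inf, transcribed from
    the text.  A: weighted approval of the candidate being elected; U: weighted
    approval a universally-approved candidate would have; S: seats remaining."""
    h = U / (A * (S + delta))
    if variant == "F1":
        return max(1.0 - h, 0.5)
    if variant == "F2":
        return 1.0 - h if h <= 0.5 else 1.0 / (4.0 * h)
    if variant == "F3":
        return 1.0 - h if h <= 0.5 else 0.25 + 1.0 / (4.0 * h) ** 2
    if variant == "F4":
        return max(1.0 - U / ((S + delta) * A),
                   (U - A) * (S + delta) / (2 * (S + delta - 1) * U - A * (S + delta)))
    if variant == "Finf":
        return 1.0 / (1.0 + U / (A * (S - 2 + delta)))   # needs S - 2 + delta > 0
    raise ValueError(variant)
```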
Among these F-formula choices, Smith personally currently prefers formula F4??
The main algorithmic differences between MSPAV and RRV (reweighted range voting, a different "sequential" proportional representation voting method which had been invented independently by both Simmons and Smith in the early 2000s – but which, we later found out, in an approval voting special case already had been invented in 1895 by T.N.Thiele) – are
RRV uses a quite different ballot weighting scheme; and RRV does not need to know in advance how many seats are to be filled – it just fills them one by one until we tell it to stop. This is because MSPAV fills the seats in "backwards" order (decrementing S as it goes), whereas RRV can be regarded as filling them in "forwards" order.
Runtime: With V>1 voters, C candidates, W winners, 0<W<C, MSPAV will run in O(CVW) steps using O(CV) words of memory. If C is less than or equal to the word size (in bits) of the computer, then O(V) words of memory suffice.
1. Each voter, as her ballot, scores each candidate with a real number lying in the interval [0,1]. (Greater scores for more-preferred candidates.)
2. Convert the score-style ballots to a weighted set of approval-style ballots via the Kotze-Pereira transform.
3. Elect parliament via MSPAV.
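Here is a minimal sketch of the Kotze-Pereira slicing used in step 2 (the function name is ours; the decomposition is the standard one, cutting each score ballot into weighted approval "slices" at its distinct score thresholds):

```python
import numpy as np

def kotze_pereira(score_ballots):
    """Convert score-style ballots (scores in [0,1]) into weighted approval-style
    ballots.  A slice at threshold t approves exactly the candidates scored >= t
    and has weight equal to the gap down to the next-lower threshold; slices
    approving nobody are dropped since they cannot matter.  Returns a list of
    (weight, 0/1 approval vector) pairs."""
    weighted_approvals = []
    for scores in np.asarray(score_ballots, dtype=float):
        prev = 0.0
        for t in sorted(set(scores[scores > 0.0])):
            weighted_approvals.append((t - prev, (scores >= t).astype(int)))
            prev = t
    return weighted_approvals
```

MSPRV is then just: apply this transform, stack the resulting weighted approval ballots, and run MSPAV on them.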
Strong-PR theorem. MSPRV obeys basic PR. Further, if there also exist uncolored "commonly rated candidates" who are scored the same by all voters (i.e. the score for each depends only on the candidate, not on the voter) then the colored subset of the parliament will have the same color composition as the electorate (up to small errors forced by integer roundoff requirements, and provided enough candidates of each color run).
Proof. For this kind of "max-racism plus commonly-rated candidates" election, the Kotze-Pereira transform converts the ballots, for an uncolored candidate X with common-score Y, into a weight-fraction Y of ballots which approve X, and a weight-fraction 1-Y of ballots which disapprove X (each slice retaining the voter's own-color approvals). Since these fractions are the same for every voter, whenever such an X is elected every voter's total ballot weight is deweighted by the same factor, so the color-composition argument for MSPAV goes through unchanged and strong PR follows. Q.E.D.
Even more strongly, MSPRV enjoys "subfaction proportionality." Suppose the Red Party has factions inside it. Each faction scores candidates with non-Red colors, zero. (Also all non-red colored voters score all flavors of Reds zero. We also permit uncolored commonly rated candidates.) But the different Red factions score various Red candidates differently. In that case, MSPRV will, if we consider the Red-portion of parliament alone, elect the same proportions of each kind of Red MP, as it would have done if the Red voters were the only voters and the Red candidates the only candidates. (As usual, only up to integer roundoff and provided enough candidates of each type run.)
Proof. This is because all elections of non-Red candidates deweight all Red voters' ballots equally; and all elections of Red candidates deweight all non-Red voters' ballots equally. Hence within-Red proportionality is unaffected by non-Red voters, and color-proportionality within non-Red colors is unaffected by Red voters. Q.E.D.
Monotonicity property: If a voter raises her score for some candidate X (leaving all her other scores unaltered), that can cause X to be elected, but cannot cause X to stop being elected.
Proof. X's (weighted) approval increases, while other candidates' approvals are unaltered. That can cause X to get elected in some MSPAV step 3.1. Q.E.D.
Note re "phantom voters": We can regard any election as having more voters, all of whom score all candidates 0. If some of those "phantom voters" increase a score for some candidate, then it is as if a new voter appeared. And monotonicity applies to those "appearances."
Single-winner case: MSPAV if the parliament only has 1 seat, is just approval voting; meanwhile MSPRV is just range voting.
In MSPAV (and hence in MSPRV) the candidates are elected sequentially greedily, i.e. in step 3.1 always electing the available candidate that increases Q by the most. However, if our goal were to maximize the final value of Q, we might be able to do better by instead in step 3.1 picking some non-greedy choice, hoping that this would permit later seat-elections to increase Q by more than enough to compensate. Or, conceivably, electing the same parliament but in a different (non-greedy) time-order also might allow increasing Q.
Also note that if in some MSPAV step 3.1 we did not elect the greedy candidate, but rather somebody else, that would still be ok in the sense that the F formulas would then automatically perform the correct reweightings to make the algorithm strive to restore proportionality – and indeed it will always succeed, i.e. it still will generate a strong-PR parliament given a racism-plus-uncolored voting scenario, provided the artificially-forced MPs do not create overrepresentation of some color, which would make that impossible.
To pursue this idea, consider the following algorithm, which given a set of weighted approval ballots (or, via the Kotze-Pereira transform, score-style ballots) and a tuple
of the W members, listed in some specific order, of a putative parliament, computes a real-valued quality measure Q for that ordered-parliament with those ballots. (Note, this Q can depend on the ordering.)
Q-computing algorithm: Input is V approval-style ballots and a putative ordered W-member parliament (M_W, M_(W-1), ..., M_2, M_1).
1. Let Q=0 and S=W and let each ballot initially have weight 1 (unless they are already weighted a priori). Fix a constant Δ>-1; here Δ=1/2 seems appropriate for Sainte-Laguë-like and Δ=1 for d'Hondt-like PR behavior.
2. Loop:
- Suppose M_S has weighted approval A, while a hypothetical candidate approved by every voter who approved any member of parliament would have weighted approval U with the same ballot weightings.
- Q←Q+mono(A); If S≤1 then return Q and stop.
- If A>0 then: Multiply the weights of each ballot that approved M_S by F, where 0≤F≤1 is given by the same formulas stated in MSPAV step 3.3.
- Decrement S.
EndLoop.
We assume that mono(x) is a known continuous function monotonic on reals x≥0.
The first holy grail election method, then, is to elect the ordered parliament (selected from among all possible W-member parliaments and all W! possible orderings of each parliament) that, if mono(x) is monotonic-increasing, maximizes Q. [If, however, mono(x) were monotonic-decreasing, then it instead would be appropriate to minimize Q. But to keep the present arguments simple we shall, without loss of generality, assume mono(x) is increasing and Q is maximized – because otherwise we could simply negate both mono(x) and Q.] Different quality measures Q result from different mono(x) functions.
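To make the brute-force nature of this scheme concrete, here is a sketch (our illustrative code, not an official reference implementation) of the Q-computing algorithm together with the exhaustive search over all ordered parliaments; as in the loop's first sub-step above, U is taken to be the weighted approval of a hypothetical candidate approved by every voter who approved any member of parliament, and the F1 formula is used:

```python
import itertools
import numpy as np

def ordered_parliament_Q(approvals, members, delta=0.5, weights=None, mono=lambda x: x):
    """Q-computing sketch for an ordered parliament (M_W, ..., M_1), using the
    F1 deweighting formula."""
    A = np.asarray(approvals, dtype=float)
    w = (np.ones(A.shape[0]) if weights is None
         else np.asarray(weights, dtype=float)).copy()
    supporters = A[:, list(members)].sum(axis=1) > 0   # voters approving >=1 member
    Q, S = 0.0, len(members)
    for M in members:
        a, U = float(w @ A[:, M]), float(w[supporters].sum())
        Q += mono(a)
        if S <= 1:
            return Q
        if a > 0:
            F1 = max(1.0 - U / (a * (S + delta)), 0.5)
            w = np.where(A[:, M] > 0, w * F1, w)
        S -= 1
    return Q

def holy_grail_1(approvals, n_seats, **kw):
    """Elect by brute force: evaluate Q for every ordered W-member parliament
    (C!/(C-W)! of them) and return the Q-maximizing one as a set."""
    C = np.asarray(approvals).shape[1]
    best = max(itertools.permutations(range(C), n_seats),
               key=lambda order: ordered_parliament_Q(approvals, order, **kw))
    return set(best)
```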
What properties does this election method have?
Was that too facile? The above reasoning has (dangerously) hidden the effect of the Kotze-Pereira transform. We claim the above proof was ok provided all Holy Grail ballots were approval style, i.e. there was no Kotze-Pereira transform. But one might worry the Kotze-Pereira-transform is potentially ruining both parts 1 and 3 of the proof. Example: with the KP transform, as you raise X's score above Y's on your ballot:
The Y-shrinkage phase is worrying re proof part 3. (The others are ok.) But this is ok; the summed-weight staying fixed means the deweighting effect on Y caused by X's election (with greater F) stays fixed, thanks to the following identity:
(1-r/A)·A + (q-A) = q - r,
which is independent of A. Here (1-r/A)A is the naively-deweighted approval A for X (actually for XY before the crossover) while (q-A) is the approval for Y (but for XY after the crossover) with the sum (q-A)+A remaining fixed. Our point is that the quantities here denoted r and q both stay fixed as some X-score rises during the pre-crossover phase. Therefore q-r stays fixed. Therefore the net effect of both changing the ratings-style ballots (via a continuous motion over a positive timespan), and electing X, and (therefore) deweighting, is to leave the net weighted approval for Y unaffected – as it should.
So this identity proves the theorem, at least when the naive reweighting factor formula is being used. "Surgical alterations" of the reweighting factor formula could alter that – but we assume those only are invoked (and only can be invoked) once all parties have already gotten all the seats they "deserve." In that case violations of strong PR are bounded by the number of seats that can be awarded to a party after that point. If our "surgical alterations" were designed to keep that number small, e.g. bounded by some absolute constant, in all "racist voting plus commonly rated candidates" scenarios – or if we do not care about violating strong PR in such situations – that is ok.
So the proof is now complete.
Runtime: With V voters, C candidates, W winners, 0<W<C, the Holy Grail scheme, assuming approval-style ballots, can be made (using "incremental" algorithm techniques, cf. Arndt 2011) to run in O(W!V) steps using O(V) words of memory to compute Q for any given W-member ordered parliament; and hence by brute force consideration of every possible ordered parliament in O(C!V/(C-W)!) steps to find the optimum (Q-maximizing) parliament.
With score-style ballots, the Kotze-Pereira transform requires an initial O(VC) steps to convert them into ≤min(VC, 2^C) weighted approval-style ballots.
How outrageously slow is that? We would prefer it if the Holy Grail for approval-style ballots somehow could be made to run about W! times faster, i.e. if the Q-computing algorithm had runtime≤polynomial(V,C). Call that desire a polynomially-efficient holy grail. (We'll achieve it later, but for the moment we want to demonstrate that it is unclear whether it is possible.)
We have elsewhere presented arguments that for "optimum PR" elections it is inherently NP-hard to find an optimum parliament, and indeed even to attain approximate optimality. So a certain amount of brute force computing is necessary to perform optimum PR elections.
However, the above Holy Grail scheme worsens that picture by now employing a Q function (quality measure) which is hard even to evaluate for a wholly-known parliament! We indeed might well speculate that our particular Q is NP-hard to evaluate, perhaps even NP-hard to approximately evaluate. The basis for the first speculation is that "minimum feedback arc set" (and vertex set) and "max cut" and "sparsest cut," maximum and minimum "linear arrangement," and the "traveling salesman problem" all are known NP-complete problems that look related to the problem of finding the Q-maximizing ordering of a parliament with the present section's definition of Q.
The basis for the stronger speculation is the observation that known results (Ambühl et al 2011; Crescenzi, Kann et al 1998+; Orponen & Mannila 1990; Engebretsen & Karpinski 2006) indicate that (subject to very-believed computer science conjectures) there exists a constant κ>1 such that the versions of those problems that correspond best to the Q-maximizing problem, namely minimum feedback vertex and arc sets, minimum linear arrangement, sparsest cut, and traveling salesman problem with edge costs {1,2} only, all are inapproximable to within factor κ via polynomial time algorithms. That vaguely suggests that perhaps the holier grail might be unattainable. Specifically, no matter how you tried to design a polynomial-time-computable quality function Q, those results suggest it would inherently be unable to approximate the present section's holy grail Q well enough. That suggests the whole approach of trying to emulate our holy grail function, approximately, via something faster, is doomed to failure.
Any polynomial-time efficient Q defining a holy grail voting method, if one exists, therefore would need to have a quality function Q quite different from ours. Which is, in fact, exactly what is going to happen.
Tabulated below are some example elections using MSPAV with best ordering.
[Two tables of example 12-seat elections: the lefthand table uses mono(x)=x and F∞, the righthand table uses mono(x)=x^(-1/2) and F4.]
In all the above elections there are exactly two political parties, "A" and "B"; and exactly 12 seats in the parliament. We use Δ=½. In the table on the left, we use mono(x)=x and F∞, but in the righthand table we use mono(x)=x^(-1/2) and F4.
Each row of each table states one election. The first three columns describe the votes: The first column states the number of voters approving party A. Similarly the second column states the number of voters approving both parties A and B, while the third column gives the number of voters approving B only. (We assume "racist" voting, i.e. each voter who approves any A-member, approves them all; and that there are an unlimited supply of candidates from each party.)
Column 4 gives the best ordered parliament in chronological order (left to right as time increases) of election. Although there may be more than one co-equally-best ordered parliament, we only state one.
Columns 5 and 6 give the seat counts for parties A and B in that parliament. Finally, the last column states the value of Q (which in the lefthand table is maximum possible, but in the righthand table is minimum possible, because we only are considering the best possible parliament).
Gratifyingly, in every case with zero voters approving both parties (bottom 6 rows of table, colored orange), as expected, the parliament exhibits perfect PR. That is, the A:B seat-ratio in the parliament, is the same as the A:B approval-ratio in the ballots.
One might like that to be unaffected by the number of "approve both A & B" ballots. I.e,
Possible "cross-party approval independence" (CPAI) property: Adding all-party-approving ballots to an election ought not to affect the seat-count ratios amongst the parties (assuming otherwise "racist" party-line voting).
But arguments can be made that CPAI is an undesirable property. The question is debatable:
"Cross-party approval diminution" (CPAD) property: Adding all-party-approving ballots to an election either ought not to affect the seat-count ratios amongst the parties (assuming otherwise-"racist" party-line voting) or it ought to diminish ratios that had exceeded 1.
CPAI is achievable. For example Thiele's method, also called "reweighted range voting" if scoring-style ballots are used, achieves CPAI. But at this point it is unknown whether CPAD is achievable in combination with the "holy grail" properties.
However, evidently, in the lefthand table, the ratios are affected, i.e. CPAI is violated (although never by more than 1 seat per party in the examples tabulated); and more disturbingly, those effects have the "wrong sign" – adding "approve A&B" ballots sometimes magnifies the seat-count ratio for the stronger versus the weaker party, whereas one would intuitively desire either no effect (CPAI), or a diminution. Thus the lefthand table also demonstrates CPAD violations.
However, by replacing mono(x)=x by mono(x)=x^(-1/2) and F∞ by F4 (righthand table) the number of violations of the CPAI property is diminished; indeed we get perfect CPAI PR except for two elections in which we have CPAI to within ±1 seat errors in both cases, and the discrepancies from CPAI all have the "correct sign" i.e. obey CPAD. But bizarrely, the symmetric A=1, A&B=10, B=1 election then prefers to elect 7 A's and 5 B's (or the reverse – 7 B's and 5 A's – these two parliaments are co-equally optimal) breaking the symmetry. This suggests the resulting system still violates CPAD, even though, technically, there are no CPAD violations in the righthand table above.
Strangely enough, in every election shown, either the righthand or lefthand election method delivered perfect CPAI PR, but sometimes not both.
Now for a further investigation. In the table below, the same two sets of voters elect parliaments with 6, 12, 18, 24, and 30 seats, each time using F∞, Δ=½, and mono(x)=x. (Also note, in every case, if the "A&B-approve" bipartisan voters were removed, then we'd have perfect PR, i.e. exactly the same votes ratios as seats ratios. Those are not shown; just trust me about that.)
A | AB | B | best ordered parliament | seat counts A, B | Q | seat counts A, B with F4 and mono(x)=x^(-1/2)
---|---|---|---|---|---|---
5 | 10 | 1 | AAAAAB | 5, 1 | 48.1093 | 5, 1 |
1 | 10 | 2 | BBBABA | 2, 4 | 38.7278 | 3, 3 |
5 | 10 | 1 | AAAAAAAAABAB | 10, 2 | 90.7611 | 11, 1 |
1 | 10 | 2 | BBBBBBABABAB | 3, 9 | 72.9874 | 4, 8 |
5 | 10 | 1 | AAAAAAAAAAAAAABABA | 16, 2 | 133.449 | 16, 2 |
1 | 10 | 2 | BBBBBBBBBABABABABA | 5, 13 | 107.251 | 5, 13 |
5 | 10 | 1 | AAAAAAAAAAAAAAAAAABABABA | 21, 3 | 176.126 | |
1 | 10 | 2 | BBBBBBBBBBBABABABABABABA | 7, 17 | 141.501 | |
5 | 10 | 1 | AAAAAAAAAAAAAAAAAAAAAAABABABAB | 26, 4 | 218.778 | 26, 4 |
1 | 10 | 2 | BBBBBBBBBBBBBBABABABABABABABAB | 8, 22 | 175.756 | 8, 22 |
The CPAD property evidently is violated in all non-orange elections tabulated, but the violation is only by 1 seat extra except in the bottom line of the table (30-member parliament; colored aqua) where the violation is by 2 seats: (8,22) versus (10,20). Also, the rightmost column of the table shows what would happen to the seat-counts if we changed the election method to use F4, Δ=½, and mono(x)=x^(-1/2). Note that CPAD still would be violated.
Note re runtime: For elections of the above kind featuring K parties and completely-partisan voting, it is trivially possible to maximize Q and find the "best" parliament in only order K^W·V steps rather than W!·V. It is an open question whether such a speedup is possible for entirely-general elections.
Now let us redo something much like our original 12-member parliament, 2 parties study, but now for a 24-member parliament. MORE???
Here is a straightforward attempt to generalize MSPAV to create a holy grail voting system whose quality function Q is evaluable in polynomial time. As we'll see, this attempt fails – but the explanation of why and how it fails will be useful mental preparation for the next section.
Algorithm for computing a quality function Q for a parliament α using a weighted set of approval-style ballots β:
In this algorithm Δ is a constant, e.g. Δ=½. Incidentally, if desired (and if Δ≥0) we can agree to pre-normalize the ballot weights so that they sum to (2/W)(W+Δ)/(1+W+2Δ) where W is the number of seats being elected, i.e. the cardinality of the input parliament α. This normalization ensures (a priori) that the final value of Q will obey 0≤Q≤1, with Q=1 happening only if every ballot approves every candidate in α.
Derivation of that normalization: Let Z be the sum of the ballot-weights. If all ballots approve all candidates then the deweighting factor F is F[S] = 1 - 1/[S+Δ] if S+Δ≥2, which in turn is assured if Δ≥0. Here we have written F[S] to make it explicit that F depends on the number S of seats remaining to be filled. These F[S] are used for S starting at W (the number of seats in parliament) and going down to 2. Then
Q = Z + Z·F[W] + Z·F[W]·F[W-1] + ... + Z·F[W]·F[W-1]···F[3]·F[2].
Now sum the series to find
Q = Z + Z·∑2≤k≤W ∏k≤S≤W (1 - 1/[S+Δ]) = Z + Z·(W-1)(Δ+W/2)/(Δ+W).
Hence to cause Q=1 we need Z = (Δ+W)/(Δ+W+[W-1][Δ+W/2]), which simplifies to Z = (2/W)(W+Δ)/(1+W+2Δ), valid if Δ≥0.
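A quick sanity check of that normalization using exact rational arithmetic (a verification sketch; the function name is ours):

```python
from fractions import Fraction

def q_with_unanimous_ballots(W, Delta):
    """Q when every ballot approves every candidate and the ballot weights sum to
    Z = (2/W)(W+Delta)/(1+W+2*Delta):  Q = Z*(1 + F[W] + F[W]F[W-1] + ...),
    with F[S] = 1 - 1/(S+Delta)."""
    Z = Fraction(2, W) * (W + Delta) / (1 + W + 2 * Delta)
    total, running = Z, Z
    for S in range(W, 1, -1):            # apply F[W], F[W-1], ..., F[2]
        running *= 1 - 1 / (S + Delta)
        total += running
    return total

for W in range(1, 10):
    assert q_with_unanimous_ballots(W, Fraction(1, 2)) == 1   # Sainte-Lague-like
    assert q_with_unanimous_ballots(W, Fraction(1)) == 1      # d'Hondt-like
```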
This quality function obviously is computed in O(W²V) steps if there are V>1 voters and W≥1 winners. This is polynomial time.
And the election method consisting of electing the W-winner parliament that maximizes Q indeed will obey some of the Holy Grail desiderata:
Unfortunately, this quality function Q fails to yield a holy grail voting method, because it is discontinuous. Consider E>1 parliamentarians whose weighted approvals are exactly tied. Now consider breaking the tie by, e.g. infinitesimally reducing the approval for one while infinitesimally increasing it for another. In that case, the value of Q will, in general, jump discontinuously versus if the tie had been infinitesimally broken in a different manner.
THIS SECTION NOT YET FINISHED??
Our approach to create a (genuine) holy grail scheme will be to use the failed scheme above, but try to convert its quality function Q to become continuous without sacrificing either its desirable properties or its polynomial-time algorithmic efficiency. There is an interesting and fairly general looking transformation we can employ for that purpose.
M-fold cloning transformation. "Clone" each member of parliament M times (getting a new parliament with MW seats instead of W). (All clones receive the exact same approvals on ballots.) Call the resulting quality function Q_M.
Now consider the M→∞ limit of some post-processed version of Q_M, such as (Q_M)^(1/M) or Q_M/M.
The hope is that (i) some such "clone+post-process+limit" transformation converts a discontinuous Q into a continuous one, and (ii) while still preserving the polynomial-time computational complexity of our Q. Obviously, in general these hopes are not necessarily going to be satisfied, and a certain amount of creative art is needed to make (i) and (ii) happen – but it often seems plausible that such an attack will work.
To explain why this could hope to yield a continuous Q even though the original untransformed Q is discontinuous: Suppose there are two candidates with the first having a lot more approval than the second. The chronological election order behaves like 1111122222 and if the second's approval increases then maybe 1121212122, and if it increases further then maybe 2212121211, and after still further increase maybe 2222211111. Here we have shown M-fold cloning with multiplicity factor M=5. So in this way, if M→∞ we could hope to get a continuous quality function, where we do not just discontinuously jump from a 12 to a 21 election situation, but rather pass through an infinity of intermediate states.
But then what if there is a three-way near tie among candidates 1, 2, and 3? We then realize the behavior could get quite messy...
Now let us discuss how we are going to apply this whole idea to the specific "failed" Q above. Our examination will start out abstract but we shall make it more specific toward the end by working out the particular formulas we need.
It is more useful to view everything not as M-fold cloning with M→∞, but rather as continuum-divisible candidates using continuum time.
That is, an election might proceed as follows. Candidate 1 is the most approved. We elect 0.37 of him, causing, via "fractional deweighting" of his ballots, his approval to drop to become exactly equal to candidate 2's. Now we fractionally-elect both candidate 1 and 2 at relative "election rates" of 0.53 and 0.32 (causing their deweightings to remain exactly equal to each other as time goes by), and keep doing that for just the right amount of time to cause both their approvals to become exactly equal to candidate 3's. Now we fractionally-elect both 1, 2, and 3 at relative rates... until candidate 1 drops out of the picture since he has now been fully elected, now only 2 & 3 continue on being fractionally elected. Etc.
This then is a continuum-time election process with a finite number of possible "states." Transitions between the states occur whenever two candidates' approval levels become equal (approval levels change continuously with time) or whenever a candidate becomes 100%-elected (his election-fraction also changes continuously with time).
Each time such a state-transition occurs we need to solve some set of simultaneous equations to determine the new "election rates" for whichever candidates are then involved, and then solve some other equations to determine how long to proceed until the next state-transition occurs.
WRONGNESS NEEDS TO BE FIXED??:
The total algorithmic runtime should then be
The number T of possible algorithm-states is at most 2^W where W is the number of winners, since this counts the number of possible subsets of the W members of parliament. But note that if a candidate gets incorporated into the current state, then he necessarily stays in it until he becomes fully elected, whereupon he leaves it, never to return. This "once in, once out" behavior means that the possible state-transitions are not arbitrary and there cannot be anywhere near as many as 2^W when W is large.
How many states can arise? The answer is the solution of this combinatorial problem. Get out your pencil and draw W different horizontal line segments like this:
*------------* *--------------* *-------*
(here shown with W=3 representing the time-intervals when each of the W members of parliament is involved in the algorithm's "state"). Then ask: what is the maximum possible number of different "states" (i.e. different intersections of these W line segments with a rightward-moving vertical scanline representing "time") that can occur in any such drawing? If the answer is called T(W), then we find T(1)=3
and T(W+1)=2+T(W) because each new line-segment (at worst) splits the states occurring at each of its two endpoints. Hence T(W)=2W+1.
This is indeed much slower-growing than 2^W and ensures the whole algorithm will run in a number of steps bounded by a polynomial.
Excellent.
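The count T(W)=2W+1 is also easy to confirm experimentally; here is a small check (our code) that sweeps a scanline across W random segments and counts the maximal stretches during which the set of crossed segments stays constant:

```python
import random

def sweep_states(intervals):
    """Number of maximal 'states' (constant sets of crossed closed intervals)
    seen by a left-to-right scanline, including the empty stretches outside."""
    ends = sorted({t for iv in intervals for t in iv})
    probes = [ends[0] - 1]
    for a, b in zip(ends, ends[1:]):
        probes += [a, (a + b) / 2]      # probe at each endpoint and between endpoints
    probes += [ends[-1], ends[-1] + 1]
    active = lambda t: frozenset(i for i, (a, b) in enumerate(intervals) if a <= t <= b)
    count, prev = 1, active(probes[0])
    for t in probes[1:]:
        cur = active(t)
        if cur != prev:
            count, prev = count + 1, cur
    return count

W = 6
for _ in range(1000):
    segs = [tuple(sorted(random.sample(range(100), 2))) for _ in range(W)]
    assert sweep_states(segs) <= 2 * W + 1
```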
Now to make this efficient, we need the simultaneous equations to be linear equations (efficiently solvable via Gaussian elimination); or if they somehow are solvable via linear programming (or semidefinite or convex programming) that also would be acceptable, since those also have polynomial-time algorithms. The point is, we need them to be solvable more efficiently than general simultaneous nonlinear equations, because for them only exponential-time algorithms are known.
The other kind of equation – to determine the timespan until the next state-transition – is trivial because it involves only one variable, so some sort of binary search procedure (or any other efficient 1-dimensional rootfinding scheme; see Press et al and note the monotone-decreasing nature of anybody's weighted approval causes such roots to be unique) will solve it to high accuracy fast.
All that has been a somewhat abstract proof sketch telling us whether and when our whole "M-fold cloning in the M→∞ limit" idea can be made to work to yield a polynomial-time algorithm for evaluating a now-continuous parliament-quality function.
We now shall make this proof more concrete and less abstract by working out what, exactly, all the equations are for our particular application.
Iterated deweighting. Let us consider the F1, F2, and F3 deweighting formulas, all of which, we remind the reader, are the same if h≤½, namely F = 1 - h = 1 - U/(A[S+Δ]), where A is the weighted approval of the candidate currently being elected, U is (with the same ballot weightings) the weighted approval of a hypothetical universally approved candidate, S is the number of seats remaining to be filled (just before our candidate is elected), and Δ is a constant such as 1/2.
The effect of deweighting using this F is to replace A by A' and U by U' and finally S by S-1, where
A' = F·A = A - U/(S+Δ) and U' = U - (1-F)·A = U·(S+Δ-1)/(S+Δ).
Letting B=A/U and B'=A'/U', we also find
B' = [B·(S+Δ) - 1]/(S+Δ-1).
Now let us iterate this replacement K times, causing the replacements U→U(K) and B→B(K) where for example B(0)=B and B(1)=B'. Writing J=S+Δ for brevity, the U result is
U(K) = U·(J-K)/J
because the product ∏0≤j<K (J-j-1)/(J-j) telescopes. The first few B iterates are
B(1) = (BJ-1)/(J-1), B(2) = (BJ-2)/(J-2), B(3) = (BJ-3)/(J-3),
making it clear that the general B result is
B(K) = (BJ-K)/(J-K),
which is readily proved by induction via
B(K+1) = [B(K)·(J-K) - 1]/(J-K-1) = (BJ-K-1)/(J-K-1).
Of course, the above derivation assumed h≤½ and hence those iteration formulas break down if h>½. But this regime also can be handled. It is simplest to do so if we are using formula F1=max(½,1-h), because then the K-fold iterated deweighting factor F is simply 2^(-K). In this way, the general notion of "iterating the F1 deweighting formula K times" can be defined, and efficiently computed, for any real K. One would need to use an efficient 1-dimensional rootfinder to find the "critical" least K, call it K_crit, causing h to rise to ½. (Again note the monotone-decreasing nature of anybody's weighted approval causes such roots to be unique.) If the desired K is greater than K_crit, then cut the K-fold iteration into two parts: before and after this transition, and perform them each separately.
The resulting formulas place us in a position to consider fractional iteration, i.e. the notion of "electing somebody K times" where we now allow K to be a noninteger real number. The effect of that is to replace U→U(K) and A→A(K) and S→S-K, and each ballot that approved him is deweighted to A(K)/A times its original weight, while the weight of each ballot that disapproved him is unaltered.
[More generally, a theory of "fractional iteration" was devised by Ernst Schröder in 1871 and is discussed at length in the book by Kuczma et al. However, we do not need general Schröder theory to handle F1.]
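Under our reading of the iterated-deweighting formulas above (with J=S+Δ), fractional election of a single candidate in the h≤½ regime has a simple closed form; here is a small illustrative sketch (ours, and only valid while h stays ≤½ – past K_crit one would switch to the F=½ branch as described above):

```python
def fractionally_elect(A, U, S, delta, K):
    """'Elect a candidate K times' (K may be a non-integer real) using the
    iterated F1 formulas, valid while h = U/(A*(S+delta)) remains <= 1/2.
    Returns (A_K, U_K, per-ballot deweighting factor A_K/A)."""
    J = S + delta
    U_K = U * (J - K) / J
    B_K = (A / U * J - K) / (J - K)
    A_K = B_K * U_K                      # equals A - K*U/J, i.e. linear in K
    return A_K, U_K, A_K / A
```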
We also can consider the notion of fractionally-electing a set of E candidates who all are tied, i.e. all have equal weighted approval. To do so, we ε1-elect candidate 1, then ε2-elect candidate 2, ... then εE-elect candidate E, then repeat this cycle, where the εj all are positive infinitesimals. The ratios of the εj are the relative "election rates" of these E candidates, and need to be chosen in such a way as to cause all their weighted approvals to remain tied as we proceed. That may be accomplished by solving a set of E simultaneous linear equations.
In general, the εj will be unequal, but if E=2 then we enjoy equality ε1=ε2; and in any "no overlap" situation where no two current candidates are simultaneously approved by any voter, we also would enjoy equality ε1=ε2=...=εE.
Determining the εj may be accomplished via Gaussian elimination in O(E³) operations, plus the time required to write down the system of equations in the first place, which obviously is at most O(VW). Once the εj have been found, performing the process is simply a matter of solving a system of first-order ODEs (ordinary differential equations) to track the time-evolution of the ballot weights for the R relevant kinds of ballots. Here of course R≤min(2^E, V) because there are only V ballots (since V is the number of voters) and because a ballot could approve any subset of those E candidates, and there are 2^E such subsets.
If ODE-system solving is regarded as efficient enough to get enough accuracy, and we don't worry about very unlikely ultra-near-ties where you would need extreme accuracy, then this shows there is an efficient algorithm. We should really write down the system of equations and the ODE system fairly explicitly... LITERATURE: I think Ker-I Ko has a book where he shows ODE-solving is a polytime task within his notion of "polynomial time computable real numbers." [Specifically the initial value problem y'=f(x,y) with y(0) specified, is a computable function if f is and if the solution is unique. If furthermore f is Lipschitz with respect to its 2nd argument, then y is P-space computable but this can be P-space complete.] Have to look that up. See also Bournez, Graca, Pouly 2011 & 2016 and Smith ????. In our case the "stiffness" of our ODE system is less than E, and presumably additive errors ε with 1/|ε| bounded by a polynomial in V and C would be acceptable, which should be achievable using plain 4th order Runge-Kutta as in the Press book... albeit with fancier high order ODE-solving schemes probably exponential accuracy would be achievable... So I think this will prove we indeed have gotten it into polynomial time, with appropriate definitions and caveats... Albeit... it's a rather horribly messy algorithm...
Algorithm definition: For a V-voter, C-candidate, W-winner election, with 1≤W<C and 6(W+1)W<V, this algorithm will compute a cost-function Q of the ballots and the W-winner parliament. The election method then is to elect the W-member parliament having minimum cost.
1. [Input and pre-process votes] Each voter, as her ballot, approves (1) or disapproves (0) each candidate. We allow arbitrary positive voter weights; most simply all voters v have weight ρ_v=1. If score-style ballots instead were provided, then convert them to weighted approval-style ballots via a preliminary Kotze-Pereira transform; then let ρ_v denote the weight of voter v's ballot.
2. ["Approval removal"] The approval by voter v for candidate c is thus a boolean quantity (1 or 0). Replace it by Xvc. Here the Xvc are VW different real-valued variables each constrained to lie within the real interval [0,1] if v approved c, but constrained to equal 0 if v disapproved c. Their values will be explained/determined later (via the cost-minimization in step 5).
3. [CFAT] Apply a "coin-flip approval transform" (CFAT) to convert the resulting set of weighted partial-strength approvals, to a different such set. The simplest CFAT is that each voter is split into two voters, each with half-weight: the first approves candidate c (with strength X_vc as before) if the original voter did; the second simply disapproves candidate c. This splitting is repeatedly done for each member c among the W members of parliament, so that at the end of the process each voter is split into 2^W voters, each weighted 2^(-W) times the original voter's weight.
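The simplest (fair-coin) CFAT just described, written out as a sketch (our code; note the exponential 2^W blow-up that the "cleverer coins" mentioned next are meant to avoid):

```python
import numpy as np

def simple_cfat(approvals, weights, parliament):
    """For each member c of parliament, split every ballot into two half-weight
    copies: one keeping its approval of c, one disapproving c.  Starting from V
    ballots this produces 2^W weighted ballots per original ballot."""
    A = np.asarray(approvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    for c in parliament:
        dropped = A.copy()
        dropped[:, c] = 0.0               # the daughter-ballot that disapproves c
        A = np.vstack([A, dropped])
        w = np.concatenate([w / 2.0, w / 2.0])
    return A, w
```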
Fancier kinds of CFAT also are possible involving "biased coins," where the weights ρ_a and ρ_b of the two daughter-voters a,b are no longer equal, but instead are pre-specified positive multiples of the original weight ρ_v of the voter v, with ρ_v=ρ_a+ρ_b. Also possible are certain cleverer kinds of "coins" which replace the exponentially growing 2^W with certain quantities bounded by polynomial(W). See the text for descriptions of how those work and their properties.
4. [Extra fake voters] Adjoin 3W fake voters, each of whom approves every candidate. Equivalently but more simply, a single fake voter v weighted ρ_v=3W will do, such that X_vc=1 for every candidate c. (It also is possible to omit this step by replacing it with various other devices.)
5. [Compute cost function] Compute the "Ebert cost function" Q as follows. Let A_c denote the sum, over all voters v, of the ρ_v·X_vc weighted approvals for candidate c, that is A_c = ∑voters v ρ_v·X_vc. Let load(v) = ∑winning c X_vc/A_c be the sum, over all W winning candidates c, of X_vc/A_c. Finally define Q = min ∑voters v ρ_v·load(v)², where the minimization is done over all possible choices of the VW variables X_vc subject to the constraints
0≤X_vc≤1 and X_vc=0 if v disapproved c and ∑v ρ_v·X_vc ≥ 3W for each winning c. The text will explain how this minimization problem can be solved to D-decimal accuracy in polynomial(V',W,D) steps where V' is the number of distinct voter-types after the CFAT transformation in step 3 and the fake-voter adjoining in step 4. It will be seen that the minimum always is unique if (i) it exists and (ii) not all members of parliament were approved by exactly the same sets of voters. It will exist if all W members of parliament would have got at least 3W approvals had there been no removal – which, because of step 4, always holds.
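Here is a minimal sketch of the convex program in step 5 (our code, using a generic SciPy solver rather than the ellipsoid method, and omitting the CFAT splitting of step 3 for brevity; step 4's fake voter is included):

```python
import numpy as np
from scipy.optimize import minimize

def pamsac_cost(approvals, weights, parliament):
    """Ebert 'sum of squared loads' cost, minimized over the approval-removal
    variables X_vc subject to 0 <= X_vc <= (v's approval of c) and to each
    winner keeping weighted approval >= 3W."""
    W = len(parliament)
    appr = np.asarray(approvals, dtype=float)[:, parliament]      # (V, W) approvals of winners
    rho = np.asarray(weights, dtype=float)
    appr = np.vstack([appr, np.ones(W)])                          # step 4: one fake voter...
    rho = np.append(rho, 3.0 * W)                                 # ...of weight 3W approving all
    V = appr.shape[0]

    def cost(x):
        X = x.reshape(V, W)
        Ac = rho @ X                                              # weighted approval of each winner
        load = (X / Ac).sum(axis=1)                               # load of each voter
        return float(rho @ load**2)

    bounds = [(0.0, appr[v, c]) for v in range(V) for c in range(W)]
    cons = [{'type': 'ineq',
             'fun': (lambda x, c=c: rho @ x.reshape(V, W)[:, c] - 3.0 * W)}
            for c in range(W)]
    res = minimize(cost, appr.flatten(), bounds=bounds, constraints=cons,
                   method='SLSQP')
    return res.fun
```

In the full method the (possibly slimmed) CFAT of step 3 would be applied to the ballots first; the concavity theorem below is what guarantees the minimum found is unique.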
Lemmas about concave-∪ functions: A smooth function of n variables is "concave-∪" if along any line in n-space, its second derivative is nonnegative (or, for "strict" concavity, positive).
Lemma 1: 1/(x_1+x_2+...+x_n) is concave-∪ wherever its denominator is positive. This concavity is strict along all lines except those perpendicular to the (1,1,...,1) direction. Essentially because: the second derivative of 1/(K+x) with respect to x is 2/(K+x)³.
Lemma 2: When x and A+x are positive, x/(A+x) is not a concave-∪ function of x. [Its second derivative is -2A/(A+x)³.] However, its square is concave-∪ wherever 0<2x<A. Because: Its second derivative is 2(A-2x)A/(A+x)⁴.
Lemma 3:
Indeed
Lemma 4: [x/(A+x)+(W-1)/(B+x)]2 is a concave-∪ function of x wherever 0≤x≤1 and 3≤3W≤min(B,A)+1. Essentially because: Its second derivative is
and the first term is a square (hence nonnegative) while the second is a product of two terms, each nonnegative under our assumptions.
Lemma 5: The square of a concave-∪ function is concave-∪ wherever the original function was positive. Because: The second derivative of (½)f(x)² is f'(x)² + f(x)·f''(x), and both terms are nonnegative wherever f(x)>0 and f is concave-∪.
Lemma 6: The sum, and indeed any linear combination with positive coefficients, (and also the max), of concave-∪ functions also is concave-∪.
Lemma 7: if f(x) is a concave-∪ real-valued function and g(x) is a linear transformation (x and g can be vector-valued, not necessarily with the same dimensionalities), then f(g(x)) is concave-∪. If, furthermore, g is invertible [more precisely, if x can be back-deduced from g(x)] and f is strictly concave-∪ then f(g(x)) is strictly concave-∪.
Now by combining lemmas 1-7, noting that the CFAT is a linear transformation, and doing a little thinking, we obtain the
Concavity Theorem: Our "sum of squared loads" cost function Q is a concave-∪ function of the WV "approval removal" variables X_vc in the domain where 0≤X_vc≤1 and where each of the W winning candidates gets at least 3W approvals after all "approval removals," i.e. A_c≥3W for each winning candidate c. (Indeed, each squared load is individually concave-∪.) The concavity of Q is strict provided that not all members of parliament are approved by exactly the same sets of voters.
Because of the concavity theorem, it is possible to compute the optimum (i.e. Q-minimizing) choice of the X_vc's, for any specified W-member parliament, subject to our linear inequality constraints, to D-decimal (or better) accuracy via known "convex programming" algorithms such as the "ellipsoid method" in time bounded by a known polynomial(V',W,D). Due to the strict concavity, this optimum will automatically be unique.
That allows us to compute our cost function Q in polynomial time, with the proviso that there might be no solution of the convex programming problem. For example, if some member c of parliament has less than 3W approvals, then there clearly cannot be a solution. In such a case our algorithm must declare failure. Indeed, if fewer than W candidates got ≥3W approvals, then no possible W-member parliament would have a computable Q, which would be complete failure. However, the purpose of putting step 4 in the voting method was to prevent this problem from ever occurring.
Was our insertion of step 4 legitimate? Let us explain this in more detail. If we insist that candidates must be approved by at least 3W voters, even after all approval-removal, in order to win a seat, then at least one parliament meeting the conditions for Q-computability will always exist provided there exist at least W candidates each of whom got at least 3W approvals with the CFAT, but without doing any removals. Without step 4, if enough voters hated enough candidates, then it would be possible for no parliament to exist meeting our conditions. However, in that case, at least for elections large enough so that V≥6(W+1)W, it would not matter what we do, because the problematic candidates would all have gotten so few approvals that they do not even "deserve" half a seat (reckoned by Droop quotas) so that their election could not violate basic PR in any "racist voting" situation by more than half a seat per "color." So we could, for example, simply elect the W most-approved candidates whenever fewer than W had ≥3W approvals. But to treat everything in a simple and unified way we added step 4 to the algorithm, which adjoins 3W fake voters who each approve everybody. If we knew that every candidate got at least one approval (e.g. from themselves) then we could have step 4 add only 3W-1. Provided we regard these 3W fake voters as too few to hurt proportionality – they amount to less than half a Droop quota if V>6(W+1)W – problem solved.
Voting system properties. In the previous papers (specifically part II) of this series, we already had examined the "PerEbert" voting system, involving minimizing a "sum of squared loads" cost function, and proved it yielded both basic and strong PR. Indeed, we showed that in "racist voting" basic PR elections, it became exactly equivalent to Sainte-Laguë party-list.
We further had examined the effect of the CFAT on PerEbert, for a definition of CFAT involving W "coin tosses," each with an arbitrary single fixed "coin-bias" value. We proved that the resulting voting system in "racist voting" basic PR elections, with fair coins was exactly equivalent to d'Hondt party-list PR; and for general coin bias yielded a kind of PR corresponding to Smith's new "divisor method" of party-list voting continuously variable between Sainte-Laguë and d'Hondt by choice of the "coin bias" parameter.
We also had examined the effect of (what we called) "shifts" or "ghost voters" upon this PR. These in the present paper are instantiated by step 4. We had shown that if the number of ghost voters was smaller than order V/W, then their effect on proportionality was small; this is exactly what happens for us if V>6(W+1)W. Further, we showed that the effect of adding order V/W ghost voters perturbed Sainte-Laguë in the "opposite direction" to d'Hondt (approximately – exact equivalence to d'Hondt party list is unattainable in this manner). Therefore, the PR-altering effect of adding ghost voters in our step 4 will be opposite in sign to the effect caused by the CFAT in step 3, and indeed, by appropriately choosing both the number of ghosts and the coin-bias, these two effects could, if desired, be made (at least within first order approximation) to cancel each other out, so that we would still enjoy Sainte-Laguë-like PR. Or, by manipulating the ghost-voter count and CFAT coin-bias, our PR could be tuned to yield other PR notions.
Monotonicity. However, paper II also showed that the PerEbert voting method disobeys "monotonicity." The main innovation here is to realize that the "optimized approval removal" in our steps 2 and 5 repairs this defect: any newly added approval can simply be removed again by the optimizer, so adding approvals can only enlarge the feasible region of the Q-minimization. That immediately yields
Weak monotonicity: Voters, by adding additional approvals of members of some parliament, can never increase that parliament's cost function Q.
Strong monotonicity: A stronger and more desirable monotonicity statement would replace "can never increase" with "always decreases."
This also is true. The reason is that we have already shown the optimum of our convex program is generically unique, and that it is a continuous, smooth, indeed analytic, function of its parameters ρv (essentially because of the well known theorem that the roots of a polynomial are continuous and smooth – and analytic except on a set of lower dimensionality – functions of its coefficients). If our monotonicity were merely weak, then this analytic function would have a "flat spot" – an open set in the complex plane throughout which the function assumed a constant value. But by analytic continuation this is impossible unless the function everywhere equals that constant, which in our problem is plainly untrue.
Approval-removal does not hurt PR. Approval removal does not destroy either basic PR or strong PR. To see this, it suffices to observe that those PR claims (by their definitions) pertain only to "racist voting" situations with enough candidates of each "color" available; and in those situations – at least if there are no "integer roundoffs," i.e. each party that deserves 6 seats gets exactly 6 seats and no party deserving 5.8 gets rounded up to 6, in other words we restrict attention to racist-voting elections in which all voter "loads" are equal – minimizing cost can be accomplished without doing any removals. In such situations we claim PR still happens by minimizing the Ebert "sum of squared loads" cost function, even if optimal approval removal is also happening. Why? Our concavity theorem shows the optimum removal is unique in all no-roundoff racist situations with ≥2 colors. So if "no removal" is optimum, then it is the unique optimum.
Specifically, in any situation where the voter loads are all equal, an equivalent form of the Ebert cost function (namely: variance among the voter loads; minimizing this is equivalent to minimizing Ebert when the number of voters and the number of seats are both fixed, as was demonstrated in paper II's "Ebert & Variance" remark) is minimized, since it is zero. In "basic PR" racist voting situations, that happens provided that Sainte-Laguë manages to attain "perfect PR" with no "leftovers," i.e. no roundoffs to integers needed.
This also happens in "perfect strong PR" racist voting situations with "commonly-rated candidates" adjoined, since the commonly rated candidates do not affect the perfect situation that the load variance is 0. The result of these thoughts is the
Theorem (Approval removal does not affect "perfect PR"): Approval removal does not destroy "perfect basic PR" and does not destroy "perfect strong PR."
However, we still need to prove the stronger theorem which also allows imperfect PR (i.e. including nontrivial integer roundoffs). We now do so. For simplicity, let us first examine the case of two "colors" (e.g. political parties) in "basic PR" situations. We are going to consider four cost function variants Q1, Q2, Q3, Q4.
Assume racist voter behavior since we are examining "basic PR." Let the first party have X voters and win S seats; the second has Y voters and wins T seats.
The Ebert cost function, evaluated in this scenario, is Q1 = S²/X + T²/Y (each party's voters share their party's seat-loads equally). If that cost is minimized subject to S+T=fixed we find via the method of Lagrange multipliers that S/X=T/Y, i.e. perfect proportionality. Presumably in the imperfect case where we require S and T to be integers the answer would be the same as the optimum real-number S and T, except one would round one of them up and the other down. If we let X' and Y' denote the reduced values of X and Y got by doing approval-removals (if any) then the cost function instead is Q2 = S²/X' + T²/Y'
if all removals are "all or none" i.e. each voter either removes all or none of her approvals. This Q2 formula also is valid if all removals are fractional and equal (for example if each approving voter removes a fraction 0.37 of each of her approvals, where 0.37 is any value that does not depend on the voter except is allowed to depend on her party). Note Q2≥Q1 and indeed Q2>Q1 if 0≤X'<X or 0≤Y'<Y or both, regardless of S and T. Therefore, approval-removal does not help, i.e. cannot reduce cost, in "perfect PR" or "imperfect PR" 2-party situations with either all-or-none, or equidistributed-within-each-party-fractional, removals.
If we do CFAT (using "fair coins"; with other coin-biases the shifts "1/2" in the cost formulas below would change to other constants) without any approval removal then by the analysis given previously in paper II, the cost function becomes Q3 = (S+1/2)²/X + (T+1/2)²/Y.
Finally if we do both CFAT and approval removal the cost function becomes Q4 = (S+1/2)²/X' + (T+1/2)²/Y',
assuming either all-or-none, or equidistributed-within-each-party-fractional, removals happening prior to CFAT.
Again, Q4≥Q3 and indeed Q4>Q3 if 0≤X'<X or 0≤Y'<Y or both, regardless of S and T. Therefore, approval-removal does not help, i.e. cannot reduce cost, in either "perfect PR" or "imperfect PR" 2-party situations with all-or-none, or equidistributed-within-each-party-fractional, removals.
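A quick numerical illustration of the roles of Q1 and Q3 (the electorate sizes X=85, Y=15 and W=5 below are our own illustrative numbers, not taken from the text): minimizing Q1 over integer seat splits reproduces the Sainte-Laguë allocation, while minimizing Q3 reproduces d'Hondt, matching the party-list equivalences stated earlier.

```python
X, Y, W = 85, 15, 5   # illustrative two-party racist-voting electorate

def best_split(cost):
    return min(((S, W - S) for S in range(W + 1)), key=lambda st: cost(*st))

Q1 = lambda S, T: S**2 / X + T**2 / Y                    # Ebert, no CFAT
Q3 = lambda S, T: (S + 0.5)**2 / X + (T + 0.5)**2 / Y    # Ebert after fair-coin CFAT

print(best_split(Q1))   # (4, 1) = the Sainte-Lague split of 5 seats for 85:15
print(best_split(Q3))   # (5, 0) = the d'Hondt split of 5 seats for 85:15
```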
Now we claim as a lemma that the optimum way to remove approvals in racist-voting situations is, in fact, equidistributed-within-each-party-fractional. Why? Basically because of the previously shown concave-∪ nature of the Ebert cost function in approval-variable space: if you are minimizing any strictly-concave-∪ function of N variables within some real interval (the same interval for every variable, with the function invariant under permuting the variables), then it is a fairly well known theorem that this minimum necessarily is attained at a point where all N variables are equal (and, by the strict concavity-∪, only at such a point).
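Spelled out (a standard symmetrization computation; here F denotes any such permutation-symmetric strictly-concave-∪ function of x∈[a,b]^N, and x̄ the average of x's coordinates):

```latex
F(\bar{x},\dots,\bar{x})
  \;=\; F\!\Bigl(\tfrac{1}{N!}\sum_{\sigma\in S_N} \sigma x\Bigr)
  \;\le\; \tfrac{1}{N!}\sum_{\sigma\in S_N} F(\sigma x)
  \;=\; F(x),
\qquad \bar{x}=\tfrac{1}{N}\sum_{i=1}^{N} x_i ,
```

with the middle inequality strict unless all coordinates of x already are equal; hence every minimizer has all its coordinates equal.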
That theorem is an immediate corollary of Jensen's inequality. So that has proven
Theorem (interacting CFAT & approval-removal): Interacting CFAT and approval removal, cannot destroy basic PR in either perfect-PR or imperfect-PR (rounded) scenarios.
We proved this for only 2 parties, but actually our whole argument works with any number (not necessarily 2) of parties – it generalizes trivially. (With large numbers of parties it becomes far less clear what the optimum rounding to integer seat counts is, but that doesn't matter, since the proof argument does not need to know what it is!)
Now finally, we can extend the theorem to also allow "strong PR," essentially because commonly-rated candidates cannot alter the minimality of the variance in the equivalent variance-based formulation of the Ebert cost function, and since the Kotze-Pereira transform is known not to hurt either basic or strong PR.
About convex programming. Which convex programming algorithm will give us the best possible performance, meaning the least runtime and/or the least memory consumption? To learn about this, a good start is the review paper by Bubeck 2015 and the earlier ones by Bland, Goldfarb, and Todd 1981 and Goldfarb & Todd 1982. (And the book by Grötschel, Lovász, and Schrijver remains a classic, albeit somewhat dated.)
Definition of "convex programming." Given a concave-∪ real-valued Lipschitz function F(x), where x is an N-dimensional real vector, and a convex subset S of N-dimensional space, the "convex programming problem" is to find the minimum of F(x) among all x∈S. A convex programming algorithm will be said to be "polynomial time" if it provably always succeeds in finding that minimum to D-decimal accuracy, in a number of steps upperbounded by a known polynomial(N,D,L), where L is the input length in bits, and each "step" may involve an evaluation of F(x), its gradient ∇F(x), a D-decimal-accuracy arithmetic operation, or consultation of an "S-membership oracle" which either reports that a provided vector x indeed lies inside S, or returns a hyperplane separating x from S.
The historically first (or at least, first to achieve wide recognition) polynomial algorithm for convex programming in N dimensions was the ellipsoid algorithm. This algorithm maintains an N-dimensional ellipsoid known to enclose the minimizing point x. In our application, our dimensionality N obeys N≤VW, and since all our Xvc lie within the real interval [0,1], we could begin with the sphere centered at (½,½,...,½) with radius (N/2)^(1/2). Each "major step," the ellipsoid algorithm evaluates ∇F(x) at the ellipsoid's center, or consults the S-membership oracle, or both. Either way, it finds a hyperplane H which separates the present ellipsoid centerpoint from the desired minimizing x. We then use H to cut the current ellipsoid in half. The undesired "wrong" hemiellipsoid is discarded. The "right" hemiellipsoid is then replaced by the smallest ellipsoid surrounding it (which can be computed via a certain known formula involving a constant number of N×N matrix-vector multiplications; preferably implemented with numerically better behaved "matrix-factorization update" techniques as discussed by Goldfarb & Todd 1982) and we continue on. Each such step provably decreases the N-dimensional volume of the ellipsoid by multiplication by a factor of at most exp(-1/(2(N+1))), so that after a polynomially bounded number of steps the surviving ellipsoid is small enough to locate the minimizing x to D-decimal accuracy.
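For concreteness, a single central-cut update, in the standard textbook form, looks as follows (a sketch only; the explicit shape-matrix representation shown here is exactly what the factorization-update techniques just mentioned are designed to handle more carefully):

```python
import numpy as np

def ellipsoid_step(c, A, g):
    """One central-cut ellipsoid step.  Current ellipsoid:
    {x : (x-c)^T A^{-1} (x-c) <= 1} with center c and shape matrix A.
    g is a vector such that the half {x : g^T (x-c) <= 0} is known to contain
    the minimizer (g = gradient at c, or the separating hyperplane's normal).
    Returns the center and shape matrix of the smallest ellipsoid enclosing
    that half; its volume shrinks by a factor of at most exp(-1/(2(n+1)))."""
    n = len(c)
    gt = g / np.sqrt(g @ A @ g)             # normalize g in the A-metric
    Agt = A @ gt
    c_new = c - Agt / (n + 1)
    A_new = (n**2 / (n**2 - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(Agt, Agt))
    return c_new, A_new
```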
For smooth cost functions (which ours is), the linear dependence on D can be reduced to ultimately-logarithmic by switching to a quadratically convergent iteration such as the N-dimensional Newton iteration – or quasi-Newton or conjugate-gradient methods – once the ellipsoid method gets near enough to the answer to assure that these methods will converge.
Although this is polynomially bounded, unfortunately for us it is a horrifyingly large polynomial. Furthermore, the memory space needed just to describe an N-dimensional ellipsoid (or N-dimensional simplex; another version of the ellipsoid algorithm employs enclosing simplices rather than ellipsoids) would be order N² numbers, which in our application means order (V'W)² numbers. Keep in mind that in real world applications it would be common for 10⁴≤V≤10⁹.
The two most horrifying aspects of this are the quadratic dependence of the memory usage on the number V' of (pseudo)voters, and also the cubic or worse dependence of the runtime on V'. (And actually, the runtime can be quite a bit worse if multiprecision arithmetic is needed due to possibly poor numerical behavior of at least some variants of the ellipsoid algorithm when one tries to employ single precision arithmetic.)
(Usually) saving the day. We now want to point out some features of our particular convex programs which ought to allow, in almost all elections, reducing the memory-space needs to O(V'+W^(2κ)) numbers and the runtime to O(V'W^(3κ-1)) arithmetic operations. These improved bounds often should be quite acceptable, since both their dependencies on V' are merely linear, and since the number W of winners in real-world elections tends to be small, anyhow much smaller than V.
The idea is as follows. Randomly divvy up the V' pseudovoters into about V'/W^κ disjoint sets, each consisting of about W^κ voters. (The constant κ can be chosen empirically to get the best performance, from among, say, 1≤κ≤3.) For each such set in succession, use convex programming to optimize its own Xvc variables (leaving all the Xvc for v in other sets fixed). Each such optimization runs in polynomial(W) arithmetics and memory. Call the entire set of all V'/W^κ of these reduced-dimension optimizations an "iteration." If each iteration reduces the location-error for the minimizing x by a constant factor (or better), then O(D) iterations will suffice to reach D-decimal accuracy. We believe that most elections, with high probability (over the randomized voter partitionings), will behave that way. If so, excellent. If not, then more iterations will be needed, but we still will safely decrease Q each iteration and this process is guaranteed to converge to the minimum – that convergence will just take more iterations than we'd hoped. Probably even better than fully optimizing for each voter-subset would be to do just one major step of the ellipsoid method within a voter-subset, then switch to a different voter-subset and continue on.
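In outline, one such "iteration" might look as follows (a sketch; optimize_block stands for whatever convex-programming routine is applied to a single voter-subset's variables, and is assumed rather than specified here):

```python
import numpy as np

def one_iteration(X, W, kappa, optimize_block, rng):
    """One 'iteration': randomly partition the V' pseudovoters into blocks of
    about W**kappa, then optimize each block's Xvc variables in turn while
    holding all other blocks fixed.  Since every block solve starts from the
    current point, Q never increases."""
    Vp = X.shape[0]
    order = rng.permutation(Vp)
    block_size = max(1, W ** kappa)
    for start in range(0, Vp, block_size):
        block = order[start:start + block_size]
        X[block, :] = optimize_block(X, block)   # improved rows for this block only
    return X
```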
We also remark that other ideas in Bubeck 2015 may allow further speedups. There are many algorithm variants one could try, and which perform best is largely an experimental question. For example, the "conjugate gradient" and "Nesterov accelerated gradient descent" (NAGD) general purpose optimization methods plausibly will work well, the latter because it is plausible that for most elections our cost function is "strongly concave" with small "condition number."
Minimization methods like BFGS quasi-Newton and the ellipsoid method have the advantage that they can "understand" general quadratic and general concave-∪ functions, but the disadvantage that this understanding requires a model containing order N² numbers in N dimensions, and work growing quadratically or worse with N. Meanwhile, although first-order methods like crude gradient-descent cannot understand general such functions, they work well for minimizing functions whose "condition number" is small, e.g. which are approximately spherically symmetric about their minima. For them a step takes only O(N) time and memory. This led to the question: what was the best possible performance, as a function of the condition number κ>1, for a first-order method requiring storage of only O(N) numbers? This question ultimately was answered by Yurii E. Nesterov and Arkadi Nemirovsky in the 1980s, who showed that the naive gradient descent requirement of order κ iterations to halve the error was improvable to order √κ but not further. See Bubeck, Lee, Singh 2015, Allen-Zhu & Orecchia 2015, and Kim & Fessler 2015 for recent papers about methods related to NAGD which achieve optimum performance-dependence on the condition number for a first-order minimization method.
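For reference, the strongly-convex variant of NAGD that achieves the √κ dependence is only a few lines (a generic sketch; the smoothness constant L and strong-convexity constant mu of our Q are assumed known or estimated, which is itself a nontrivial matter):

```python
import numpy as np

def nagd(grad, x0, L, mu, iters):
    """Nesterov accelerated gradient descent for an L-smooth, mu-strongly-convex F.
    Error contracts roughly by a factor (1 - 1/sqrt(kappa)) per step with
    kappa = L/mu, versus (1 - 1/kappa) for plain gradient descent."""
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # momentum coefficient
    x, y = x0.copy(), x0.copy()
    for _ in range(iters):
        x_next = y - grad(y) / L           # gradient step from the extrapolated point
        y = x_next + beta * (x_next - x)   # momentum extrapolation
        x = x_next
    return x
```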
For us, the important – albeit nonrigorous – realization is that the runtime and memory dependence on V' can be reduced to linear while keeping the dependence on W polynomial. This claim in fact could be made rigorous provided we forbid elections featuring condition number κ depending too strongly on the number V of voters. For example if κ=V^o(1) then we could prove runtime growing like V^(1+o(1)).
About the CFAT and "slimmed" versions thereof. The CFAT splits each voter into 2^W different voters, by "tossing W coins" in all possible ways. This exponential blowup is acceptable if V≥2^W because in that case the number of voters will still be of order V after merging voters having the same approval "type" (i.e. who approve the same set of winning candidates) by summing their weights ρv. And indeed, in most real-world large government elections this V≥2^W condition has historically been satisfied. For example, in Australia, each state elects 12 senators to the federal parliament via a PR-STV multiwinner voting method, but only W=6 at a time (staggered terms), with typically about V=4×10⁶ voters per state. In Canada, there are single-winner elections in ridings. As far as we've been able to tell, the record largest number of candidates that ever ran in a riding was 13, and each riding contains up to about 10⁵ registered voters. So even if Canada switches to a PR system involving multiwinner elections, it seems likely that it will have W≤13 and V≥10⁵.
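The merging of same-type pseudovoters just mentioned is elementary; a sketch (the data layout is our own choice):

```python
from collections import defaultdict

def merge_by_type(pseudovoters):
    """pseudovoters: iterable of (approved_winners, weight) pairs, where
    approved_winners is a frozenset of winning candidates.  Pseudovoters of
    the same approval type are interchangeable in the cost function, so they
    may be merged by summing weights; at most 2^W types can survive."""
    merged = defaultdict(float)
    for approved, rho in pseudovoters:
        merged[approved] += rho
    return list(merged.items())
```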
However, we claim it is possible to replace our CFAT by "slimmed" variants featuring only polynomial blowup. We actually know many ways to do that, but the slimmest among them is as follows. Consider only the W winners. There are exactly W+1 subsets S of these winners with ≥W-1 members each. For each such S, replace each voter by one who approves the intersection of whoever she originally approved, with S. In this way each voter is replaced by W+1 new voters, not 2^W. These new voters are assigned one of two weights: one of the W+1 new voters is identical to the original voter, except that it gets a reduced weight, while the other W new voters each get the second weight.
??
What is the purpose of the CFAT? It may seem as though the CFAT step 3 could simply be omitted from our voting method, while still enjoying Holy Grail properties. However, we would then suffer disagreeable behavior on the following class of (W=2)-winner elections with V=2(1+x)N voters and 4 candidates named A,B,C,D:
#voters | Candidates they approve
--------|------------------------
xN      | A, B, C
xN      | A, B, D
N       | C
N       | D
If x>0 is small, then the winners are {C,D}. But when x→∞ we would like {A,B} to win. Without the CFAT, unfortunately {C,D} would remain the unique winning parliament regardless of the value of x≥0, and the possibility of removing approvals for A and/or B cannot avoid this.
But with CFAT, it turns out that {A,B} wins when x>3 while {C,D} wins when 0≤x<3.
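The no-CFAT half of this claim is easy to check directly: with plain Ebert equal load-splitting and no removals, parliament {C,D} costs 2/((1+x)N) while {A,B} costs 2/(xN), so {C,D} is always cheaper; and, as noted above, removing approvals for A and/or B cannot rescue {A,B}. A small verification sketch (N=1000 is an arbitrary choice of ours):

```python
def ebert_cost(parliament, groups):
    """groups: list of (count, approved_candidate_set).  Plain Ebert rule:
    each winner's unit load is split equally over all voters approving it;
    the cost is the count-weighted sum of squared voter loads."""
    approvers = {c: sum(n for n, s in groups if c in s) for c in parliament}
    return sum(n * sum(1.0 / approvers[c] for c in s & parliament) ** 2
               for n, s in groups)

N = 1000
for x in (0.5, 1, 3, 10, 100):
    groups = [(x * N, {'A', 'B', 'C'}), (x * N, {'A', 'B', 'D'}),
              (N, {'C'}), (N, {'D'})]
    print(x, ebert_cost({'A', 'B'}, groups), ebert_cost({'C', 'D'}, groups))
    # {C,D} is cheaper for every x, reproducing the defect described above
```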
??
??
A computer check of the claims in our proof would help confirm or deny. The holy grail election method needs to be programmed and made available as software. Run test elections. Devise illustrative election examples.
The following weak "participation" property: if a new approval-style voter comes, she will increase Q more for parliaments with more approved MPs – seems likely to be false. Decide that.
List more properties HG obeys.
Write an introduction.
Zeyuan Allen-Zhu & Lorenzo Orecchia: Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent, (2014-2015) arXiv 1407.1537.
C. Ambühl, M. Mastrolilli, O. Svensson: Inapproximability Results for Maximum Edge Biclique, Minimum Linear Arrangement, and Sparsest Cut, SIAM J. Computing 40,2 (2011) 567-596.
Jörg Arndt: Matters computational: Ideas, Algorithms, Source Code, Springer 2011.
Michel L. Balinski & H. Peyton Young: Fair representation: Meeting the Ideal of One Man, One Vote, Yale Univ. Press 1982. Also there was a second edition from Brookings Institution Press 2001. JF1075.U6B3.
Robert G. Bland, Donald Goldfarb, Michael J. Todd: The ellipsoid method: a survey, Operations Research 29,6 (Nov-Dec 1981) 1039-1091.
Olivier Bournez, Daniel S. Graca, Amaury Pouly: Solving Analytic Differential Equations in Polynomial Time over Unbounded Domains, Mathematical Foundations of Computer Science 2011 (Volume 6907 of Springer Lecture Notes in Computer Science) 170-181. See also Polynomial Time corresponds to Solutions of Polynomial Ordinary Differential Equations of Polynomial Length (2016) by same 3 authors, arXiv 1601.05360.
Sebastien Bubeck: Convex Optimization: Algorithms and Complexity, Foundations and Trends in Machine Learning 8, 3-4 (2015) 231-357.
Sebastien Bubeck, Yin Tat Lee, Mohit Singh: A geometric alternative to Nesterov's accelerated gradient descent, (2015) ArXiv 1506.08187.
Volkan Cevher, Stephen Becker, Mark Schmidt: Convex Optimization for Big Data, IEEE Signal Processing Magazine 31,5 (2014) 32-43.
Pierluigi Crescenzi, Viggo Kann, M.Halldorsson, M.Karpinski, G.Woeginger (editors): Online compendium of NP optimization problems, continually updated electronic book. Described in paper How to find the best approximation results – a follow-up to Garey and Johnson, ACM SIGACT News 29,4 (Dec. 1998) 90-97. Two particular pages of interest to us: minimum feedback arc set & minimum feedback vertex set.
George B. Dantzig: Linear programming and extensions, 1963. Reprinted by Princeton Univ. Press 1998.
Henry R. Droop: On Methods of Electing Representatives, Journal of the Statistical Society of London 44,2 (June 1881) 141-202. Reprinted in Voting Matters 24 (Oct. 2007) 7-46.
Lars Engebretsen & Marek Karpinski: Approximation Hardness of TSP with Bounded Metrics, J. Comput. System Sci. 72,4 (2006) 509-546.
Oliver Friedmann: A subexponential lower bound for Zadeh's pivoting rule for solving linear programs and games, Proc. 15th conf on Integer Programming and Combinatorial Optimization (IPCO 2011) 192-206 (Springer LNCS #6655).
Oliver Friedmann, Thomas Dueholm Hansen, Uri Zwick: Subexponential lower bounds for randomized pivoting rules for solving linear programs, Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC 2011).
Oliver Friedmann, Thomas Dueholm Hansen, Uri Zwick: Subexponential lower bounds for randomized pivoting rules for the simplex algorithm, Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC 2011).
Michael R. Garey & David S. Johnson: Computers and Intractability: a guide to the theory of NP-completeness, W.H. Freeman & co, San Francisco 1979.
Bernd Gärtner, Martin Henk, Günter M. Ziegler: Randomized Simplex Algorithms on Klee-Minty Cubes, Combinatorica 18,3 (March 1998) 349-372.
Donald Goldfarb & William Y. Sit: Worst case behavior of the steepest edge simplex method, Discrete Appl. Math. 1,4 (1979) 277-285.
Donald Goldfarb & Michael J. Todd: Modifications and Implementation of The Ellipsoid Algorithm for Linear Programming, Mathematical Programming 23,1 (1982) 1-19.
Martin Grötschel, László Lovász, Alexander Schrijver: Geometric Algorithms and Combinatorial Optimization, Springer 1993.
Refael Hassin & Shlomi Rubinstein: Approximation algorithms for maximum linear arrangement, Information Processing Letters 80,4 (Nov. 2001) 171-177.
R.G. Jeroslow: The simplex algorithm with the pivot rule of maximizing criterion improvement, Discrete Maths. 4,4 (1973) 367-377.
Jonathan A. Kelner & Daniel A. Spielman: A randomized polynomial-time simplex algorithm for linear programming, Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, STOC 38 (2006) 51-60.
Donghwan Kim & Jeffrey A. Fessler: Optimized first-order methods for smooth convex minimization, (2014-2015) arXiv 1406.5468.
Victor Klee & G.J. Minty: How good is the simplex algorithm, pp. 159-177 in O. Shisha (ed.) Inequalities III, Academic Press, 1972.
Ker-I Ko: Computational Complexity of Real Functions, Birkhäuser Boston, Boston, MA, 1991. QA267.7.K6 1991
M. Kuczma, B. Choczewski, R. Ger: Iterative functional equations. Volume 32 of "Encyclopedia of Mathematics and its Applications," Cambridge University Press, Cambridge 1990. [QA267.7.K6 1991??]
Pekka Orponen & Heikki Mannila: On Approximation Preserving Reductions: Complete Problems and Robust Measures, Report C-1987-28, Department of Computer Science, University of Helsinki, revised May 1990.
Frederick W. Owens: On the Apportionment of Representatives, Quarterly Pub. Amer. Statist. Assoc. 17,136 (Dec 1921) 958-968.
Christos H. Papadimitriou & Mihalis Yannakakis: The Traveling Salesman Problem with Distances One and Two, Mathematics of Operations Research 18,1 (Feb. 1993) 1-11.
Toby Pereira: Proportional Approval Method using Squared loads, Approval removal and Coin-flip approval transformation (PAMSAC) – a new system of proportional representation using approval voting, arXiv/1602.05248, Feb. 2016.
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical recipes, The Art of Scientific Computing, Third Edition, 1256 pages, Cambridge University Press 2007.
Thorvald N. Thiele: Om flerfoldsvalg [On multiple election], Oversigt over Det Kongelige danske videnskabernes selskabs forhandlinger = Bulletin de L'Académie royale des sciences et des lettres de Danemark, Copenhague Kongelige Danske Videnskabernes Selskab = Royal Danish Academy of Sciences and Letters 3,4 (1895) 415-441.
Norman Zadeh: What is the Worst Case Behavior of the Simplex Algorithm? Tech. Rept. #27, Department of Operations Research, Stanford (1980).