Toby Pereira, Forest W. Simmons, Warren D. Smith. Dec 2015Feb 2016. PRELIMINARY DRAFT?? ONLY PARTLY WRITTEN
We previously had identified, as a (the?) top open problem in multiwinner voting systems, whether a "holy grail" such system exists. The main properties asked of the holy grail were that
Here "basic PR" means that in scenarios consisting of "colored" candidates and voters and featuring "maximally racist" voting behavior, the parliament will be guaranteed to have the same colorcomposition as the electorate – up to errors of order ±1 seat per color, i.e. of the same order forced by integer roundoff demands, and provided enough candidates of each color run (e.g. we cannot elect 5 Greens if only 3 run).
To be clear: the voting system never knows and is never told the colors of any voter or any candidate, it only knows the votes. And there is no requirement for any voters to vote in "colorlike" fashion. We are merely demanding that if they do, then proportional representation with respect to colors, must happen.
"Strong PR" means if we add uncolored candidates – whose ratings depend only on the candidate and not on the voter – to that picture, then the colored subset of the parliament will be guaranteed to have the same composition as the electorate. As bonus properties, it would be pleasant if the voting method
It originally seemed as though the holy grail were impossible, and indeed Smith had been able to produce impossibility proofs under certain ansatzes.
However, the present paper succeeds in finding three holy grail systems. The first, which is the simplest to define and program, unfortunately is computationally very laborious – the work grows like W!V, note the factorial sign, merely to evaluate Q for a specified Wmember parliament (V voters).
The second, which unfortunately is the most complicated to define (it involves numerically solving a system of ordinary differential equations as a subalgorithm!), involves a quality function Q which should be evaluable to D decimal places of accuracy in a number of steps bounded by (V+1)polynomial(C,T,D) for a Wwinner, Vvoter, Ccandidate election. Here T is the number of "transitions" within the algorithm, which hopefully usually is small, e.g. upper bounded by polynomial(C). Unfortunately we are unable to rule out the possibility that T might be exponentially large, e.g. have the same order as 2^{W}. The situation is analogous to the "simplex algorithm" for linear programming (Dantzig 1963). In practice, simplex experimentally virtually always runs in a polynomially bounded number of steps, but examples have been artificially constructed which cause exponential runtime.
The first such construction was by Klee & Minty 1972. Over the next few decades, various fancier "pivot rules" were proposed that hopefully would allow simplex to dodge the exponential examples, plus in some cases made simplex perform experimentally better. But as the decades rolled by, virtually every promisingseeming deterministic pivot rule was defeated by some (sometimes quite complicated) artificial exptime example: Jeroslow 1973, Goldfarb & Sit 1979, and Friedmann 2011, and FreidmanHansenZwick 2011a, 2011b. Historically, that simplex method quagmire was dodged when socalled "interior point methods" (IPMs) were invented which guarantee polynomial runtime bounds for linear programming. Also, Kelner & Spielman 2006 eventually found a randomized algorithm, whose "simplexity" is arguable, with a polynomial bound on expected runtime. But in both, the polynomial was not of the number of dimensions and constraints, but instead based on the number of bits of input. We have been unable to find an analogue of either IPM or Kelner/Spielman for the voting method we are speaking of here, and do not know whether one exists.
A third holy grail scheme called PAMSAC was invented by Toby Pereira after S&S showed him their first one. We'll generalize, redescribe and reprove it. Pereira's original PAMSAC cost function Q had impractically enormous computational requirements – exponential in VW+C, where V is the number of voters, C of candidates, and W of winners. We however shall show how to compute the PAMSAC Q function to Ddecimal accuracy in time≤polynomial(V,W,D) provided V≥2^{W}; and by replacing the "CFAT" inside it by a "slimmed CFAT" this proviso may be weakened to V>6(W+1)W, which effectively means there no longer is any proviso. Further, we have certain heuristic randomized algorithm speedups which (we argue nonrigorously) should reduce the runtime bound to V·polynomial(W,D) in almost all realistic elections – in other words enjoying linear growth with V if W and D are held fixed.
Along the way to producing our holy grails, we present two new strongPR voting systems that are both simple and computationally efficient (polynomial time): MSPAV (multiplicative sequential proportional approval voting), and MSPRV (" " " range "). Neither enjoys the full set of Holy Grail properties, but they are simple and they do enjoy strong PR.
CURRENTLY HG SCHEME #1 and most of #3 MANUALLY CHECKED BY WDS. HG #2 NEEDS MORE CHECKING. NOTHING COMPUTER CHECKED. NOT CHECKED BY FWS.
Parts I and II of this series had investigated "optimizing" multiwinner voting systems. I.e. let there be V voters, C candidates, W winners, 0<W<C, usually in practice C≪V, and in which the ballots were scorestyle or, more simply, approvalstyle. (Score: each voter scores each candidate with a real number from 0 [undesired] to 1 [desired]. Approval: the only permitted scores are 0 [disapproved] and 1 [approved], intermediate reals are forbidden.)
Define a continuous (or more smooth) "quality function" Q of the scores on the ballots and the winnerset. For approvalstyle ballots there are only 2^{C} possible ballottypes, and we can demand Q be a continuous function of the 2^{C} different counts, now regarded as real numbers. For scorestyle ballots we'll assume continuum scores, i.e. each voter scores each candidate with a real number from the interval [0,1], and demand Q be a continuous function of all those scores.
The voting method is to elect the Welement subset of the candidates that maximizes Q. (Sometimes we prefer to minimize Q, in which case it is better referred to as a "cost" or "penalty" function.) We'd previously stated various properties such multiwinner voting systems might or might not obey. The "holy grail" combination of properties was singled out for special attention as both particularly desirable, and which nobody had managed to attain. Smith suspected it was impossible.
We here attain the holy grail, indeed we have found 3 different holy grail voting methods. However, our first and simplest holy grail voting method comes about in a rather unexpected and somewhat nasty way.
All our previous failed attempts to design holy grail voting methods had involved rather "natural" and "nice" Q, which could be evaluated from the V ballots and the W winning candidates via a straightforward polynomialtime computation. Indeed, the number of computational steps to compute Q was bounded by an expression of form O(VClogC) at most. Unfortunately finding the optimum parliament, maximizing Q, was shown to be NPhard – and not only for our Q's, but also for essentially any voting method achieving "optimal" proportional representation (PR), for any reasonable notion of "optimal." And this also was shown for various merelyapproximate optimality notions. In other words, brute force evaluation of Q for every possible parliament, then picking the best, is in some sense necessary for any goodperforming PR voting method. There is no quick way to shortcut that to rapidly produce the optimum parliment, and any attempt to try will unavoidably, in at least some elections, produce poor results. (Unless P=NP.)
But we were willing to accept that need for heavy computation, for several reasons:
The first holy grail scheme developed in the present paper, however, makes this situation worse. Its quality function Q no longer appears to be polynomialtime computable! It likely is NPhard to compute, and perhaps even computing it approximately is NPhard. The entire optimization problem (finding the Qmaximizing parliament) is no longer merely an NPhard task, it instead lies in the complexity class NP^{NP} (next level up within the "polynomial hierarchy"). Once the optimum parliament is found, it is neither easily verified nor easily refuted, albeit it still should be a lot easier to compute its Q value than it is to perform the full optimization.
This all was rather annoying and led us to wonder whether this "next level" computational complexity was somehow inherent and unavoidable for any holy grail voting method.
We then devised a second holy grail method which made us suspect that it wasn't. Specifically, we showed its Q function was evaluable to Ddecimal accuracy a number of arithmetic operations bounded by polynomial(V,W,D,T), where T seemed plausibly likely to almostalways be upper bounded by polynomial(W).
Finally, we found a third holy grail method with Q function genuinely evaluable in time bounded by polynomial(V,W,D). With this voting method optimizing Q once again is a mere coNP∪NP problem.
But unfortunately, this third method is complicated to explain and to program for a computer, and the polynomials governing its runtime and memory usage both are rather large. Our second method is probably the most feasible for practical use, but it is very complicated to explain.
It unexpectedly turned out to be crucial for inventing all of these schemes to study several interesting kinds of "transforms" which convert multiwinner voting methods into different ones. CROSSREFS?? In the previous papers we already discussed the "KotzePereira transform" of a set of scoreballots into a weighted set of approvaltype ballots, and "coinflip approval" transforms which convert a set of approvaltype ballots into a different weighted set of approvaltype ballots. (Despite the latter's name, both are wholy deterministic.) Both shall be used here. Indeed, we shall focus entirely on the problem of inventing voting methods using approval ballots, because scoretype ballots can then be handled via an initial KotzePereira transform. And for our second holy grail method we shall also employ the new "infinitecloning transform," which seems the most remarkable among these ideas.
So the holy grail problem now seems completely solved from the standpoint of theory, but in practice our solutions may be unacceptable in the sense that hardly any politicians and voters would be willing/able to understand them, and/or the computational memory and/or time complexities would be too great. That quite likely would render them, as a practical matter, unenactible.
If so, the question for future authors becomes: is there some holy grail voting method enjoying both the simplicity of description (or simpler!) of our first solution, and better computational efficiency than we've so far managed to attain?
Algorithm definition:
1. Each voter, as her ballot, approves or disapproves each candidate.
2. Let Q=0. Initialize S as the number of seats that currently are unfilled, and let the common initial weight of each of the ballots be 1 (unless they were already weighted a priori). These weights will remain nonnegative throughout the process.
Also fix a constant Δ>1; here Δ=1/2 seems appropriate for SainteLaguëlike PR and Δ=1 for d'Hondtlike.3. Loop:
 Elect the (asyet unelected) candidate X with the greatest weighted approval (and let that weighted approval be A). Meanwhile a hypothetical universallyapproved candidate would have weighted approval U with the same ballot weightings.
 Q←Q+mono(A); If S≤1 then stop.
 If A>0 then: Multiply the weights of each ballot that approved X by F, where 0≤F≤1 is given by
F_{1} = max(1h, ½) where h=A^{1}U/(S+Δ). Possible variant Fformulas: Four inequivalent alternatives which also work (we'll discuss below how they were obtained; see also F_{1b}, F_{2b}, F_{3b} for three further possibilities) areF_{2} = 1h if h≤½, else [4h]^{1}; F_{3} = 1h if h≤½, else 4^{1}+[4h]^{2};
F_{4} = max{ 1U/([S+Δ]A), (UA)[S+Δ]/(2[S+Δ1]UA[S+Δ]) }; F_{∞} = [1+A^{1}U/(S2+Δ)]^{1}. Decrement S.
EndLoop.
Remarks. Q is unused and hence its update in step 3.2 and initialization in step 2 both may be jettisoned. The purpose of Q is, at algorithm termination, to provide a numerical measure of the "quality" of the election result, and different such qualitymeasures can be got by use of different "mono(x)" functions. Here mono(x) is assumed to be some particular continuous monotone function of x≥0, most simply mono(x)=x, but we also could consider, e.g, mono(x)=x^{1/2}. For purposes of the present section (since the algorithm does not ever use Q) it does not matter what mono(x) is. However, later we shall have purposes in mind for using Q, and at that point we shall care.
In step 3.3 the test "if A>0" can be jettisoned if we know a priori that every candidate is approved by at least one voter (e.g. by themselves). Such elections have been called "narcissistic."
The main theorem about MSPAV is that it obeys "strong PR." That is:
Terminology note: "Partisanship" may be a more appropriate descriptor than "racism." (And, e.g, "racist voting"↔"partyslate voting," etc.) We chose the "colored" terminology to emphasize the point that "color" could be anything – e.g. gender, militarism – and does not necessarily correspond to political party identity. Indeed, the goal is to design voting methods that work regardless of whether political parties even exist, and the definition of "work" similarly does not require political parties to exist.
Proof. The key is the formula for F. It was obtained by solving
Here the lefthand side is the number of seats (from among the S seats remaining to be filled)
that would be proportionately "deserved" by the faction supporting the elected candidate X,
under maximalracism conditions. The primed quantities A' and U' denote A and U
after the deweighting by multiplying the weights of all Xapproving ballots by F;
the point is that
Unfortunately, this naive formula can become negative, which occurs when no candidate has enough approval to "deserve" a seat, but nevertheless we must elect one, so we do. Negative reweighting factors seem to be something we should avoid, since otherwise the algortihm will start awarding seats with the goal of intentionally hurting factions, without accomplishing anything positive. (If Fvalues always are nonnegative, then given that all approvals begin nonnegative, they will stay that way forever.) Even F=0 seems something we usually want to avoid, because we want to avoid situations where various factions have zero weighted approval, so that their approval will stay zero forevermore, giving us no way to decide which among them should win the next seat.
So the question is how best to modify the naive formula in those problematic regimes. Here are some desiderata (for the following discussion regard Δ=0; we will discuss nonzero Δ later):
Also, note that rather than needing a fraction 1/S of the V voters to approve you
(in maximal racism basicPR situations) to "deserve" one of the S seats,
one arguably "deserves" a seat with
only
Finally, we note that MSPAV will always elect universallyapproved candidates in preference to any candidate approved by only some of the voters, and whenever it does so, then all ballots are deweighted by the same factor F. Therefore, MSPAV obeys not just basic, but actually strong PR. (Actually, of course, it does not matter for this purpose what deweighting factor F>0 is used whenever a universallyapproved candidate is elected.) Q.E.D.
Remarks: Note F_{1} is continuous, positive and nonincreasing for all real h. The formulas F_{2} and F_{3} similarly were devised also to equal 1h when 0≤h≤½ (i.e. for factions naively deserving ≥2 more seats) but now also enjoying continuous hderivative for all real h. Note both F_{2} and F_{3} are positive and decreasing for all real h, and F_{2} goes to 0, but F_{3} goes to 1/4, when h→∞. The differences between these formulae hopefully do not matter in the sense that in "racist voting" scenarios they only change seat counts for color classes which "deserve less than 2 seats." In other words, proportionality remains valid up to acceptable errors with one formula, if that were true with another. The above formulasurgeries F_{1}, F_{2}, F_{3} all were devised based on h=½ (i.e. 2 naively deserved seats) as the breakpoint; we also could consider the alternative breakpoint "3/2 seats" in which case those formulae would instead become
The formula F_{∞} is even smoother, namely analytic in h; it is asymptotic to F_{1} when S→∞, and equals it when h=½, and F_{1}<F_{∞}<1 when 2Δ<S<∞ and 0<h<½. Unfortunately F_{∞}∉[0,1] when S2+Δ<0. Therefore, if you want to use the F_{∞} variant, insist that Δ>0. With that restriction, every deweighting that can actually occur (i.e. those with S≥2) will obey 0<F_{∞}<1.
Among these Fformula choices, Smith personally currently prefers formula F_{4}??
The main algorithmic differences between MSPAV and RRV (reweighted range voting, a different "sequential" proportional representation voting method which had been invented independently by both Simmons and Smith in the early 2000s – but which, we later found out, in an approval voting special case already had been invented in 1895 by T.N.Thiele) – are
RRV uses a quite different ballot weighting scheme; and RRV does not need to know in advance how many seats are to be filled – it just fills them one by one until we tell it to stop. This is since MSPAV fills the seats in "backwards" order (decrementing S as it goes), if RRV is regarded as filling them in "forwards" order.
Runtime: With V>1 voters, C candidates, W winners, 0<W<C, MSPAV will run in O(CVW) steps using O(CV) words of memory. If C is less than or equal to the word size (in bits) of the computer, then O(V) words of memory suffice.
1. Each voter, as her ballot, scores each candidate with a real number lying in the interval [0,1]. (Greater scores for morepreferred candidates.)
2. Convert the scorestyle ballots to a weighted set of approvalstyle ballots via the KotzePereira transform.
3. Elect parliament via MSPAV.
StrongPR theorem. MSPRV obeys basic PR. Further, if there also exist uncolored "commonly rated candidates" who are scored the same by all voters (i.e. the score for each depends only on the candidate, not on the voter) then the colored subset of the parliament will have the same color composition as the electorate (up to small errors forced by integer roundoff requirements, and provided enough candidates of each color run).
Proof.
For this kind of "maxracism plus commonlyrated candidates" election,
the KotzePereira transform
converts the ballots,
for an uncolored candidate X with commonscore Y,
into a weightfraction Y of ballots which approve X,
and a weightfraction
Even more strongly, MSPRV enjoys "subfaction proportionality." Suppose the Red Party has factions inside it. Each faction scores candidates with nonRed colors, zero. (Also all nonred colored voters score all flavors of Reds zero. We also permit uncolored commonly rated candidates.) But the different Red factions score various Red candidates differently. In that case, MSPRV will, if we consider the Redportion of parliament alone, elect the same proportions of each kind of Red MP, as it would have done if the Red voters were the only voters and the Red candidates the only candidates. (As usual, only up to integer roundoff and provided enough candidates of each type run.)
Proof. This is because all elections of nonRed candidates deweight all Redvoter ballots equally; and since all elections of Red candidates deweight all nonRed ballots equally. Hence withinRed proportionality is unaffected by nonRed voters, and colorproportionality within nonRed colors, is unaffected by Red voters. Q.E.D.
Monotonicity property: If a voter raises her score for some candidate X (leaving all her other scores unaltered), that can cause X to be elected, but cannot cause X to stop being elected.
Proof. X's (weighted) approval increases, while other candidates' approvals are unaltered. That can cause X to get elected in some MSPAV step 3.1. Q.E.D.
Note re "phantom voters": We can regard any election as having more voters, all of whom score all candidates 0. If some of those "phantom voters" increase a score for some candidate, then it is as if a new voter appeared. And monotonicity applies to those "appearances."
Singlewinner case: MSPAV if the parliament only has 1 seat, is just approval voting; meanwhile MSPRV is just range voting.
In MSPAV (and hence in MSPRV) the candidates are elected sequentially greedily, i.e. in step 3.1 always electing the available candidate that increases Q by the most. However, if our goal were to maximize the final value of Q, we might be able to do better by instead in step 3.1 picking some nongreedy choice, hoping that this would permit later seatelections to increase Q by more than enough to compensate. Or, conceivably, electing the same parliament but in a different (nongreedy) timeorder also might allow increasing Q.
Also note that if in some MSPAV step 3.1 we did not elect the greedy candidate, but rather somebody else, that would still be ok in the sense that the F formulas would then automatically perform the correct reweightings to make the algorithm strive to restore proportionality – and indeed will always succeed – i.e. still will generate a strong PR parliament given a racism+uncolored voting scenario – provided the artificiallyforced MPs do not create overrepresentation of some color thus making that impossible.
To pursue this idea, consider the following algorithm, which given a set of weighted approval ballots (or, via the KotzePereira transform, scorestyle ballots) and a tuple
of the W members, listed in some specific order, of a putative parliament, computes a realvalued quality measure Q for that orderedparliament with those ballots. (Note, this Q can depend on the ordering.)
Qcomputing algorithm: Input is V approvalstyle ballots and a putative ordered Wmember parliament (M_{W}, M_{W1}, ..., M_{2}, M_{1}).
1. Let Q=0 and S=W and let each ballot initially have weight 1 (unless they are already weighted a priori). Fix a constant Δ>1; here Δ=1/2 seems appropriate for SainteLagüelike and Δ=1 for d'Hondtlike PR behavior.
2. Loop:
 Suppose M_{S} has weighted approval A, while a hypothetical candidate approved by every voter who approved any member of parliament would have weighted approval U with the same ballot weightings.
 Q←Q+mono(A); If S≤1 then return Q and stop.
 If A>0 then: Multiply the weights of each ballot that approved M_{S} by F, where 0≤F≤1 is given by the same formulas stated in MSPVA step 3.3.
 Decrement S.
EndLoop.
We assume that mono(x) is a known continuous function monotonic on reals x≥0.
The first holy grail election method, then, is to elect the ordered parliament (selected from among all possible Wmember parliaments and all W! possible orderings of each parliament) that, if mono(x) is monotonicincreasing, maximizes Q. [If, however, mono(x) were monotonicdecreasing, then it instead would be appropriate to minimize Q. But to keep the present arguments simple we shall, without loss of generality, assume mono(x) is increasing and Q is maximized – because otherwise we could simply negate both mono(x) and Q.] Different quality measures Q result from different mono(x) functions.
What properties does this election method have?
Was that too facile? The above reasoning has (dangerously) hidden the effect of the KotzePereira transform. We claim the above proof was ok provided all Holy Grail ballots were approval style, i.e. there was no KotzePereira transform. But one might worry the KotzePereiratransform is potentially ruining both parts 1 and 3 of the proof. Example: with the KP transform, as you raise X's score above Y's on your ballot:
The Yshrinkage phase is worrying re proof part 3. (The others are ok.) But this is ok; the summedweight staying fixed means the deweighting effect of Y caused by X's election (with greater F), stays fixed thanks to the following identity
which is independent of A. Here (1r/A)A is the naivelydeweighted approval A for X (actually for XY before the crossover) while (qA) is the approval for Y (but for XY after the crossover) with the sum (qA)+A remaining fixed. Our point is that the quantities here denoted r and q both stay fixed as some Xscore rises during the precrossover phase. Therefore qr stays fixed. Therefore the net effect of both changing the ratingsstyle ballots (via a continuous motion over a positive timespan), and electing X, and (therefore) deweighting, is to leave the net weighted approval for Y unaffected – as we should.
So this identity proves the theorem, at least when the naive reweighting factor formula is being used. "Surgical alterations" of the reweighting factor formula could alter that – but we assume those only are invoked (and only can be invoked) once all parties have already gotten all the seats they "deserve." In that case violations of strong PR are bounded by the number of seats that can be awarded to a party after that point. If our "surgical alterations" were designed keep that number small, e.g. bounded by some absolute constant, in all "racist voting plus commonly rated candidates" scenarios – or if we do not care about violating strong PR in such situations – that is ok.
So the proof is now complete.
Runtime: With V voters, C candidates, W winners, 0<W<C, the Holy Grail scheme, assuming approvalstyle ballots, can be made (using "incremental" algorithm techniques, cf. Arndt 2011) to run in O(W!V) steps using O(V) words of memory to compute Q for any given Wmember ordered parliament; and hence by brute force consideration of every possible ordered parliament in O(C!V/(CW)!) steps to find the optimum (Qmaximizing) parliament.
With scorestyle ballots, the KotzePereira transform requires an initial O(VC) steps to convert them into ≤min(VC,2^{C}) weighted approvalstyle ballots.
How outrageously slow is that? We would prefer it if the Holy Grail for approvalstyle ballots somehow could be made to run about W! times faster, i.e. if the Qcomputing algorithm had runtime≤polynomial(V,C). Call that desire a polynomiallyefficient holy grail. (We'll achieve it later, but for the moment we want to demonstrate that it is unclear whether it is possible.)
We have elsewhere presented arguments that for "optimum PR" elections it is inherently NPhard to find an optimum parliament, and indeed even to attain approximate optimality. So a certain amount of brute force computing is necessary to perform optimum PR elections.
However, the above Holy Grail scheme worsens that picture by now employing a Q function (quality measure) which is hard even to evaluate for a wholyknown parliament! We indeed might well speculate that our particular Q is NPhard to evaluate, perhaps even NPhard to approximately evaluate. The basis for the first speculation is that "minimum feedback arc set" (and vertex set) and "max cut" and "sparsest cut," maximum and minimum "linear arrangement," and the "traveling salesman problem" all are known NPcomplete problems that look related to the problem of finding the Qmaximizing ordering of a parliament with the present section's definition of Q.
The basis for the stronger speculation is the observation that known results (Ambühl et al 2011; Crescenzi, Kann et al 1998+; Orponen & Mannila 1990; Engebretsen & Karpinski 2006) indicate that (subject to verybelieved computer science conjectures) there exists a constant κ>1 such that the versions of those problems that correspond best to the Qmaximizing problem, namely minimum feedback vertex and arc sets, minimum linear arrangement, sparsest cut, and traveling salesman problem with edge costs {1,2} only, all are inapproximable to within factor κ via polynomial time algorithms. That vaguely suggests that perhaps the holier grail might be unattainable. Specifically, no matter how you tried to design a polynomialtimecomputable quality function Q, those results suggest it would inherently be unable to approximate the present section's holy grail Q well enough. That suggests the whole approach of trying to emulate our holy grail function, approximately, via something faster, is doomed to failure.
Any polynomialtime efficient Q defining a holy grail voting method, if one exists, therefore would need to have a quality function Q quite different from ours. Which is, in fact, exactly what is going to happen.
Tabulated below are some example elections using MSPAV with best ordering.


In all the above elections there are exactly two political parties, "A" and "B"; and exactly 12 seats in the parliament. We use Δ=½. In the table on the left, we use mono(x)=x and F_{∞}, but in the righthand table we use mono(x)=x^{1/2} and F_{4}.
Each row of each table states one election. The first three columns describe the votes: The first column states the number of voters approving party A. Similarly the second column states the number of voters approving both parties A and B, while the third column gives the number of voters approving B only. (We assume "racist" voting, i.e. each voter who approves any Amember, approves them all; and that there are an unlimited supply of candidates from each party.)
Column 4 gives the best ordered parliament in chronological order (left to right as time increases) of election. Although there may be more than one coequallybest ordered parliament, we only state one.
Columns 5 and 6 give the seat counts for parties A and B in that parliament. Finally, the last column states the value of Q (which in the lefthand table is maximum possible, but in the righthand table is minimum possible, because we only are considering the best possible parliament).
Gratifyingly, in every case with zero voters approving both parties (bottom 6 rows of table, colored orange), as expected, the parliament exhibits perfect PR. That is, the A:B seatratio in the parliament, is the same as the A:B approvalratio in the ballots.
One might like that to be unaffected by the number of "approve both A & B" ballots. I.e,
Possible "crossparty approval independence" (CPAI) property: Adding allpartyapproving ballots to an election ought not to affect the seatcount ratios amongst the parties (assuming otherwise "racist" partyline voting).
But arguments can be made that CPAI is an undesirable property. The question is debatable:
"Crossparty approval diminution" (CPAD) property: Adding allpartyapproving ballots to an election either ought not to affect the seatcount ratios amongst the parties (assuming otherwise"racist" partyline voting) or it ought to diminish ratios that had exceeded 1.
CPAI is achievable. For example Thiele's method, also called "reweighted range voting" if scoringstyle ballots are used, achieves CPAI. But at this point is is unknown whether CPAD is achievable in combination with the "holy grail" properties.
However, evidently, in the lefthand table, the ratios are affected, i.e. CPAI is violated (although never by more than 1 seat per party in the examples tabulated); and more disturbingly, those effects have the "wrong sign" – adding "approve A&B" ballots sometimes magnifies the seatcount ratio for the stronger versus the weaker party, whereas one would intuitively desire either no effect (CPAI), or a diminution. Thus the lefthand table also demonstrates CPAD violations.
However, by replacing mono(x)=x by mono(x)=x^{1/2} and F_{∞} by F_{4} (righthand table) the number of violations of the CPAI property is diminished; indeed we get perfect CPAI PR except for two elections in which we have CPAI to within ±1 seat errors in both cases, and the discrepancies from CPAI all have the "correct sign" i.e. obey CPAD. But bizarrely, the symmetric A=1, A&B=10, B=1 election then prefers to elect 7 A's and 5 B's (or the reverse – 7 B's and 5 A's – these two parliaments are coequally optimal) breaking the symmetry. This suggests the resulting system still violates CPAD, even though, technically, there are no CPAD violations in the righthand table above.
Strangely enough, in every election shown, either the righthand or lefthand election method delivered perfect CPAI PR, but sometimes not both.
Now for a further investigation. In the table below, the same two sets of voters elect parliaments with 6, 12, 18, 24, and 30 seats, each time using F_{∞}, Δ=½, and mono(x)=x. (Also note, in every case, if the "A&Bapprove" bipartisan voters were removed, then we'd have perfect PR, i.e. exactly the same votes ratios as seats ratios. Those are not shown; just trust me about that.)
A  AB  B  best ordered parliament 
seat counts A, B  Q  seat counts A, B with F_{4} and mono(x)=x^{1/2} 

5  10  1  AAAAAB  5, 1  48.1093  5, 1 
1  10  2  BBBABA  2, 4  38.7278  3, 3 
5  10  1  AAAAAAAAABAB  10, 2  90.7611  11, 1 
1  10  2  BBBBBBABABAB  3, 9  72.9874  4, 8 
5  10  1  AAAAAAAAAAAAAABABA  16, 2  133.449  16, 2 
1  10  2  BBBBBBBBBABABABABA  5, 13  107.251  5, 13 
5  10  1  AAAAAAAAAAAAAAAAAABABABA  21, 3  176.126  
1  10  2  BBBBBBBBBBBABABABABABABA  7, 17  141.501  
5  10  1  AAAAAAAAAAAAAAAAAAAAAAABABABAB  26, 4  218.778  26, 4 
1  10  2  BBBBBBBBBBBBBBABABABABABABABAB  8, 22  175.756  8, 22 
The CPAD property evidently is violated in all nonorange elections tabulated, but the violation is only by 1 seat extra except in the bottom line of the table (30member parliament; colored aqua) where the violation is by 2 seats: (8,22) versus (10,20). Also, the rightmost column of the table shows what would happen to the seatcounts if we changed the election method to use F_{4}, Δ=½, and mono(x)=x^{1/2}. Note that CPAD still would be violated.
Note re runtime: For elections of the above kind featuing K parties and completelypartisan voting, it is trivially possible to maximize Q and find the "best" parliament in only order K^{W}V steps rather than W!V. It is an open question whether such a speedup is possible for entirelygeneral elections.
Now let us redo something much like our original 12member parliament, 2 parties study, but now for a 24member parliament. MORE???
Here is a straightforward attempt to generalize MSPAV to create a holy grail voting system whose quality function Q is evaluable in polynomial time. As we'll see, this attempt fails – but the explanation of why and how it fails will be useful mental preparation for the next section.
Algorithm for computing a quality function Q for a parliament α using a weighted set of approvalstyle ballots β:
In this algorithm Δ is a constant, e.g. Δ=½. Incidentally, if desired (and if Δ≥0) we can agree to prenormalize the ballot weights so that they sum to 2W^{1}(W+Δ)/(1+W+2Δ) where W is the number of seats being elected, i.e. the cardinality of the input parliament α. This normalization ensures (a priori) that the final value of Q will obey 0≤Q≤1, with Q=1 happening only if every ballot approves every candidate in α.
Derivation of that normalization: Let Z be the sum of the ballotweights. If all ballots approve all candidates then the deweighting factor F isF[S]=11/[S+Δ] ifS+Δ≥2 , which in turn is assured ifΔ≥0. Here we have written F[S] to make it explicit that F depends on the number S of seats remaining to be filled. These F[S] are used for S starting at W (the number of seats in parliament) and going down to 2. ThenQ = Z + Z F[W] + Z F[W] F[W1] + ... + Z F[W] F[W1] ... F[3] F[2]. Now sum the series to findQ = Z+Z∑_{2≤k≤W}∏_{k≤S≤W}(11/[S+Δ]) = Z+Z(W1)(Δ+W/2)/(Δ+W). Hence to cause Q=1 we needZ=(Δ+W)/(Δ+W+[W1][Δ+W/2]) which simplifies toZ=(2/W)(W+Δ)/(1+W+2Δ) valid if Δ≥0.
This quality function obviously is computed in O(W^{2}V) steps if there are V>1 voters and W≥1 winners. This is polynomial time.
And the election method consisting of electing the Wwinner parliament that maximizes Q indeed will obey some of the Holy Grail desiderata:
Unfortunately, this quality function Q fails to yield a holy grail voting method, because it is discontinuous. Consider E>1 parliamentarians whose weighted approvals are exactly tied. Now consider breaking the tie by, e.g. infinitesimally reducing the approval for one while infinitesimally increasing it for another. In that case, the value of Q will, in general, jump discontinuously versus if the tie had been infinitesimally broken in a different manner.
THIS SECTION NOT YET FINISHED??
Our approach to create a (genuine) holy grail scheme will be to use the failed scheme above, but try to convert its quality function Q to become continuous without sacrificing either its desirable properties or its polynomialtime algorithmic efficiency. There is an interesting and fairly general looking transformation we can employ for that purpose.
Mfold cloning transformation. "Clone" each member of parliament M times (getting a new parliament with MW seats instead of W). (All clones receive the exact same approvals on ballots.) Call the resulting quality function Q_{M}.
Now consider the M→∞ limit of some postprocessed version of Q_{M}, such as (Q_{M})^{1/M} or Q_{M}/M.
The hope is that (i) some such "clone+postprocess+limit" transformation converts a discontinuous Q into a continuous one, and (ii) while still preserving the polynomialtime computational complexity of our Q. Obviously, in general these hopes are not necessarily going to be satisfied, and a certain amount of creative art is needed to make (i) and (ii) happen – but it often seems plausible that such an attack will work.
To explain why this could hope to yield a continuous Q even though the original untransformed Q is discontinuous: Suppose there are two candidates with the first having a lot more approval then the second. The chronological election order behaves like 1111122222 and if the second's approval increases then maybe 1121212122, and if it increases further then maybe 2212121211, and after still further increase maybe 2222211111. Here we have shown Mfold cloning with multiplicity factor M=5. So in this way, if M→∞ we could hope to get a continuous quality function, where we do not just discontinuously jump from a 12 to a 21 election situation, but rather pass through an infinity of intermediate states.
But then what if there is a threeway near tie among candidates 1, 2, and 3? We then realize the behavior could get quite messy...
Now let us discuss how we are going to apply this whole idea to the specific "failed" Q above. Our examination will start out abstract but we shall make it more specific toward the end by working out the particular formulas we need.
It is more useful to view everything not as Mfold cloning with M→∞, but rather as continuumdivisible candidates using continuum time.
That is, an election might proceed as follows. Candidate 1 is the most approved. We elect 0.37 of him, causing, via "fractional deweighting" of his ballots, his approval to drop to become exactly equal to candidate 2's. Now we fractionallyelect both candidate 1 and 2 at relative "election rates" of 0.53 and 0.32 (causing their deweightings to remain exactly equal to each other as time goes by), and keep doing that for just the right amount of time to cause both their approvals to become exactly equal to candidate 3's. Now we fractionallyelect both 1, 2, and 3 at relative rates... until candidate 1 drops out of the picture since he has now been fully elected, now only 2 & 3 continue on being fractionally elected. Etc.
This then is a continuumtime election process with a finite number of possible "states." Transitions between the states occur whenever two candidates' approval levels become equal (approval levels change continuously with time) or whenever a candidate becomes 100%elected (his electionfraction also changes continuously with time).
Each time such a statetransition occurs we need to solve some set of simultaneous equations to determine the new "election rates" for whichever candidates are then involved, and then solve some other equations to determine how long to proceed until the next statetransition occurs.
WRONGNESS NEEDS TO BE FIXED??:
The total algorithmic runtime should then be
The number T of possible algorithmstates is at most 2^{W} where W is the number of winners, since this counts the number of possible subsets of the W members of parliament. But note that if a candidate gets incorporated into the current state, then he necessarily stays in it until he becomes fully elected, whereupon he leaves it, never to return. This "once in, once out" behavior means that the possible statetransitions are not arbitrary and there cannot be anywhere near as many as 2^{W} when W is large.
How many states can arise? The answer is the solution of this combinatorial problem. Get out your pencil and draw W different horizontal line segments like this:
** ** **
(here shown with W=3 representing the timeintervals when each of the W members of parliament is involved in the algorithm's "state"). Then ask: what is the maximum possible number of different "states" (i.e. different intersections of these W line segments with a rightwardmoving vertical scanline representing "time") that can occur in any such drawing? If the answer is called T(W), then we find
and T(W+1)=2+T(W) because each new linesegment (at worst) splits the states occuring at each of its two endpoints. Hence T(W)=2W+1.
This is indeed much slowergrowing than 2^{W} and ensures the whole algorithm will run in a number of steps bounded by a polynomial.
Excellent.
Now to make this efficient, we need the simultaneous equations to be linear equations (efficiently solvable via Gaussian elimination); or if they somehow are solvable via linear programming (or semidefinite or convex programming) that also would be acceptable, since those also have polynomialtime algorithms. The point is, we need them to be solvable more efficiently than general simultaneous nonlinear equations, because for them only exponentialtime algorithms are known.
The other kind of equation – to determine the timespan until the next statetransition – is trivial because it involves only one variable, so some sort of binary search procedure (or any other efficient 1dimensional rootfinding scheme; see Press et al and note the monotonedecreasing nature of anybody's weighted approval causes such roots to be unique) will solve it to high accuracy fast.
All that has been a somewhat abstract proof sketch telling us whether and when our whole "Mfold cloning in the M→∞ limit" idea can be made to work to yield a polynomialtime algorithm for evaluating a nowcontinuous parliamentquality function.
We now shall make this proof more concrete and less abstract by working out what, exactly, all the equations are for our particular application.
Iterated deweighting. Let us consider the F_{1}, F_{2}, and F_{3} deweighting formulas, all of which, we remind the reader, are the same if h≤½, namely
were A is the weighted approval of the candidate currently being elected, U is (with the same ballot weightings) the weighted approval of a hypothetical universally approved candidate, S is the number of seats remaining to be filled (just before our candidate is elected), and Δ is a constant such as 1/2.
The effect of deweighting using this F is to replace A by A' and U by U' and finally S by S1 where
Letting B=A/U and B'=A'/U', we also find
Now let us iterate this replacement K times, causing the replacements U→U^{(K)} and B→B^{(K)} where for example B^{(0)}=B and B^{(1)}=B'. Writing J=SΔ for brevity, the U result is
because the product telescopes. The first few B iterates are
making it clear that the general B result is
which is readily proved by induction via
Of course, the above derivation assumed h≤½ and hence those iteration formulas break down if h>½. But this regime also can be handled. It is simplest to do so if we are using formula F_{1}=max(½,1h), because then the Kfold iterated deweighting factor F is simply 2^{K}. In this way, the general notion of "iterating the F_{1} deweighting formula K times" can be defined, and efficiently computed, for any real K. One would need to use an efficient 1dimensional rootfinder to find the "critical" least K, call it K_{crit}, causing h to rise to ½. (Again note the monotonedecreasing nature of anybody's weighted approval causes such roots to be unique.) If the desired K is greater than K_{crit}, then cut the Kfold iteration into two parts: before and after this transition, and perform them each separately.
The resulting formulas place us in a position to consider fractional iteration, i.e. the notion of "electing somebody K times" where we now allow K to be a noninteger real number. The effect of that is to replace U→U^{(K)} and A→A^{(K)} and S→SK, and each ballot that approved him is deweighted to A^{(K)}/A times its original weight, while the weight of each ballot that disapproved him is unaltered.
[More generally, a theory of "fractional iteration" was devised by Ernst Schröder in 1871 and is discussed at length in the book by Kuczma et al. However, we do not need general Schröder theory to handle F_{1}.]
We also can consider the notion of fractionallyelecting a set of E candidates who all are tied, i.e. all have equal weighted approval. To do so, we ε_{1}elect candidate 1, then ε_{2}elect candidate 2, ... then ε_{E}elect candidate E, then repeat this cycle, where the ε_{j} all are positive infinitesimals. The ratios of the ε_{j} are the relative "election rates" of these E candidates, and need to be chosen in such a way as to cause all their weighted approvals to remain tied as we proceed. That may be accomplished by solving a set of E simultaneous linear equations.
In general, the
ε_{j}
will be unequal, but
if E=2 then we enjoy equality ε_{1}=ε_{2};
and in any "no overlap"
situation where no two current candidates are simultaneously approved by any voter,
we also would enjoy equality
Determining the ε_{j} may be accomplished via Gaussian elimination in O(E^{3}) operations, plus the time required to write down the system of equations in the first place, which obviously is at most O(VW). Once the ε_{j} have been found, performing the process is simply a matter of solving a system of firstorder ODEs (ordinary differential equations) to track the timeevolution of the ballot weights for the R relevant kinds of ballots. Here of course R≤min(2^{E}, V) because there are only V ballots (since V is the number of voters) and because a ballot could approve any subset of those E candidates, and there are 2^{E} such subsets.
If ODEsystem solving is regarded as efficient enough to get enough accuracy, and we don't worry about very unlikely ultranearties where you would need extreme accuracy, then this shows there is an efficient algorithm. We should really write down the system of equations and the ODe system fairly explicitly... LITERATURE: I think KerI Ko has a book where he shows ODEsolving is a polytime task within his notion of "polynomial time computable real numbers." [Specifically the initial value problem y'=f(x,y) with y(0) specified, is computable function if f is and if solution unique. If furthermore f is Lipshitz with respect to its 2nd argument, then y is Pspace computable but this can be Pspace complete.] Have to look that up. See also Bournez, Graca, Pouly 2011 & 2016 and Smith ????. In our case the "stiffness" of our ODE system is less than E, and presumably additive errors ε with 1/ε bounded by a polynomial in V and C would be acceptable, which should be achievable using plain 4th order Runge Kutta as in the Press book... albeit with fancier high order ODEsolving schemes probably exponential accuracy would be achievable... So I think this will prove we indeed have gotten it into polynomial time, with approriate definitions and caveats... Albeit... it's a rather horribly messy algorithm...
Algorithm definition: For a Vvoter, Ccandidate, Wwinner election, with 1≤W<C and 6(W+1)W<V, this algorithm will compute a costfunction Q of the ballots and the Wwinner parliament. The election method then is to elect the Wmember parliament having minimum cost.
1. [Input and preprocess votes] Each voter, as her ballot, approves (1) or disapproves (0) each candidate. We allow arbitrary positive voter weights; most simply all voters v have weight ρ_{v}=1. If scorestyle ballots instead were provided, then convert them to weighted approvalstyle ballots via a preliminary KotzePereira transform; then let ρ_{v} denote the weight of voter v's ballot.
2. ["Approval removal"] The approval by voter v for candidate c is thus a boolean quantity (1 or 0). Replace it by X_{vc}. Here the X_{vc} are VW different realvalued variables each constrained to lie within the real interval [0,1] if v approved c, but constrained to equal 0 if v disapproved c. Their values will be explained/determined later (via the costminimization in step 5).
3. [CFAT] Apply a "coinflip approval transform" (CFAT) to convert the resulting set of weighted partialstrength approvals, to a different such set. The simplest CFAT is that each voter is split into two voters, each with halfweight: the first approves candidate c (with strength X_{vc} as before) if the original voter did; the second simply disapproves candidate c. This splitting is repeatedly done for each member c among the W members of parliament, so that at the end of the process each voter is split into 2^{W} voters, each weighted 2^{W} times the original voter's weight.
Fancier kinds of CFAT also are possible involving "biased coins," where the weights ρ_{a} and ρ_{b} of the two daughtervoters a,b are no longer equal, but instead are prespecified positive multiples of the original weight ρ_{v} of the voter v, withρ_{v}=ρ_{a}+ρ_{b}. Also possible are certain cleverer kinds of "coins" which replace the exponentially growing 2^{W} with certain quantities bounded by polynomial(W). See the text for descriptions of how those work and their properties.4. [Extra fake voters] Adjoin 3W fake voters, each of whom approves every candidate. Equivalently but more simply, a single fake voter v weighted ρ_{v}=3W will do, such that X_{vc}=1 for every candidate c. (It also is possible to omit this step by replacing it with various other devices.)
5. [Compute cost function] Compute the "Ebert cost function" Q as follows. Let A_{c} denote the sum, over all voters v, of the ρ_{v}X_{vc}weighted approval for candidate c, that is A_{c}=∑_{voters v}ρ_{v}X_{vc}. Let load(v)=∑_{winning c}X_{vc}/A_{c} be the sum, over all W winning candidates c, of X_{vc}/A_{c}. Finally define Q=min∑_{voters v}ρ_{v}load(v)^{2}, where the minimization is done over all possible choices of the VW variables X_{vc} subject to the constraints
0≤X_{vc}≤1 and X_{vc}=0 if v disapproved c and ∑_{v}ρ_{v}X_{vc} ≥ 3W for each winning c. The text will explain how this minimization problem can be solved to Ddecimal accuracy in polynomial(V',W,D) steps where V' is the number of distinct votertypes after the CFAT transformation in step 3 and fakevoter adjoining in step 4. It will be seen that the minimum always is unique if (i) it exists and (ii) not all members of parliament were approved by exactly the same sets of voters. It will exist if all W members of parliament would have got at least 3W approvals if there had been no removal – i.e, because we have written in step 4, always.
Lemmas about concave∪ functions: A smooth function of n variables is "concave∪" if along any line in nspace, its second derivative is nonnegative (or, for "strict" concavity, positive).
Lemma 1: 1/(x_{1}+x_{2}+...+x_{n}) is concave∪ wherever its denominator is positive. This concavity is strict along all lines except those perpendicular to the (1,1,...,1) direction. Essentially because: the second derivative of 1/(K+x) with respect to x is 2/(K+x)^{3}.
Lemma 2: When x and A+x are positive, x/(A+x) is not a concave∪ function of x. [Its second derivative is 2A/(A+x)^{3}.] However, its square is concave∪ wherever 0<2x<A. Because: Its second derivative is 2(A2x)A/(A+x)^{4}.
Lemma 3:
Indeed
Lemma 4: [x/(A+x)+(W1)/(B+x)]^{2} is a concave∪ function of x wherever 0≤x≤1 and 3≤3W≤min(B,A)+1. Essentially because: Its second derivative is
and the first term is a square (hence nonnegative) while the second is a product of two terms, each nonnegative under our assumptions.
Lemma 5:
The square of a concave∪ function is concave∪ wherever the original function
was positive.
Because:
The second derivative of (½)f(x)^{2} is
Lemma 6: The sum, and indeed any linear combination with positive coefficients, (and also the max), of concave∪ functions also is concave∪.
Lemma 7: if f(x) is a concave∪ realvalued function and g(x) is a linear transformation (x and g can be vectorvalued, not necessarily with the same dimensionalities), then f(g(x)) is concave∪. If, furthermore, g is invertible [more precisely, if x can be backdeduced from g(x)] and f is strictly concave∪ then f(g(x)) is strictly concave∪.
Now by combining lemmas 17, noting that the CFAT is a linear transformation, and doing a little thinking, we obtain the
Concavity Theorem: Our "sum of squared loads" cost function Q is a concave∪ function of the WV "approval removal" variables X_{vc} in the domain where 0≤X_{vc}≤1 and where each of the W winning candidates gets at least 3W approvals after all "approval removals," i.e. A_{c}≥3W for each winning candidate c. (Indeed, each squared load is individually concave∪.) The concavity of Q is strict provided that not all members of parliament are approved by exactly the same sets of voters.
Because of the concavity theorem, it is possible to compute the optimum (i.e. Qminimizing) choice of the X_{vc}'s, for any specified Wmember parliament, subject to our linear inequality constraints, to Ddecimal (or better) accuracy via known "convex programming" algorithms such as the "ellipsoid method" in time bounded by a known polynomial(V',W,D). Due to the strict concavity, this optimum will automatically be unique.
That allows us to compute our cost function Q in polynomial time, with the proviso that there might be no solution of the convex programming problem. For example, if some member c of parliament has less than 3W approvals, then there clearly cannot be a solution. In such a case our algorithm must declare failure. Indeed, if fewer than W candidates got ≥3W approvals, then no possible Wmember parliament would have a computable Q, which would be complete failure. However, the purpose of putting step 4 in the voting method was to prevent this problem from ever occuring.
Was our insertion of step 4 legitimate? Let us explain this in more detail. If we insist that candidates must be approved by at least 3W voters, even after all approvalremoval, in order to win a seat, then at least one parliament meeting the conditions for Qcomputability will always exist provided there exist at least W candidates each of whom got at least 3W approvals with the CFAT, but without doing any removals. Without step 4, if enough voters hated enough candidates, then it would be possible for no parliament to exist meeting our conditions. However, in that case, at least for elections large enough so that V≥6(W+1)W, it would not matter what we do, because the problematic candidates would all have gotten so few approvals that they do not even "deserve" half a seat (reckoned by Droop quotas) so that their election could not violate basic PR in any "racist voting" situation by more than half a seat per "color." So we could, for example, simply elect the W mostapproved candidates whenever fewer than W had ≥3W approvals. But to treat everything in a simple and unified way we added step 4 to the algorithm, which adjoins 3W fake voters who each approve everybody. If we knew that every candiate got at least one approval (e.g. from themselves) then we could have step 4 add only 3W1. Provided we regard these 3W fake voters as too few to hurt proportionality – they amount to less than half a Droop quota if V>6(W+1)W – problem solved.
Voting system properties. In the previous papers (specifically part II) of this series, we already had examined the "PerEbert" voting system, involving minimizing a "sum of squared loads" cost function, and proved it yielded both basic and strong PR. Indeed, we showed that in "racist voting" basic PR elections, it became exactly equivalent to SainteLagüe partylist.
We further had examined the effect of the CFAT on PerEbert, for a definition of CFAT involving W "coin tosses," each with an arbitrary single fixed "coinbias" value. We proved that the resulting voting system in "racist voting" basic PR elections, with fair coins was exactly equivalent to d'Hondt partylist PR; and for general coin bias yielded a kind of PR corresponding to Smith's new "divisor method" of partylist voting continuously variable between SainteLagüe and d'Hondt by choice of the "coin bias" parameter.
We also had examined the effect of (what we called) "shifts" or "ghost voters" upon this PR. These in the present paper are instantiated by step 4. We had shown that if the number of ghost voters was smaller than order V/W, then their effect on proportionality was small. This is exactly what happens for us if V>6(W+1)W. Further, we showed that the effect of adding order V/W ghost voters perturbed SainteLagüe in the "opposite direction" to d'Hondt (approximately – exact equivalence to d'Hondt party list is unattainable in this manner). Therefore, the PRaltering effect of adding ghost voters in our step 4 will be opposite in sign to the effect caused by the CFAT in step 3 and indeed by appropriately choosing both the number of ghosts and coinbias, these two effects could, if desired, be made (at least within first order approximation) to cancel each other out so that we would still enjoy SainteLagüelike PR. Or by manipulating the ghostvoter count and CFAT coinbias our PR could be tuned to yield other PR notions.
Monotonicity. However, paper II also showed that the PerEbert voting method disobeys "monotonicity." The main innovation here is to realize that the "optimized approval removal" in our steps 2 and 5 both
Weak monotonicity: Voters by adding additional approvals of some parliament, can never increase its cost function Q.
Strong monotonicity: A stronger and more desirable monotonicity statement would replace "can never increase" with "always decreases."
This also is true. The reason is that we've already shown that the optimum of our convex program is generically unique and it is a continuous and smooth, indeed analytic function of its parameters ρ_{v} (essentially because of the well known theorem that the roots of a polynomial are continuous and smooth – and analytic except on a set of lower dimensionality – functions of its coefficients). If our monotonicity were merely weak then an analytic function would have a "flat spot" – an open set in the complex plane throughout which the function assumed a constant value. But by analytic continuation, this is impossible unless the function everywhere equals that constant, which in our problem is plainly untrue.
Approvalremoval does not hurt PR. Approval removal does not destroy either basic PR or strong PR. To see this, it suffices to observe that those PR claims (by their definitions) pertain only to "racist voting" situations with enough candidates of each "color" available; and in those situations – at least if there are no "integer roundoffs" i.e. each party that deserves 6 seats was getting 6 seats, there were never any parties which deserved 5.8 and therefore got 6, i.e. we restrict attention to racist voting elections with all voter "loads" equal – then minimizing cost can be accomplished without doing any removals. In such situations we claim PR still happens by minimizing the Ebert "sum of squared load" cost function, even if optimal approval removal is also happening. Why? Our concavity theorem shows the optimum removal is unique in all noroundoff racist situations with ≥2 colors. So if "no removal" is optimum, then it is the unique optimum.
Specifically, in any situation where the voter loads all are equal, then an equivalent form of the Ebert cost function (namely: variance among the voter loads; minimizing this is equivalent to minimizing Ebert if the number of voters and number of seats both are fixed, as was demonstrated in paper II's "Ebert & Variance" remark) is minimized since it is zero. In "basic PR" racist voting situations, that happens provided that SainteLagüe manages to attain "perfect PR" with no "leftovers" i.e. no roundoffs to integers needed.
This also happens in "perfect strong PR" racist voting situations with "commonlyrated candidates" adjoined, since the commonly rated candidates do not affect the perfect situation that the load variance is 0. The result of these thoughts is the
Theorem (Approval removal does not affect "perfect PR"): Approval removal does not destroy "perfect basic PR" and does not destroy "perfect strong PR."
However, we still need to prove the stronger theorem which also allows imperfect PR (i.e. including nontrivial integer roundoffs). We now do so. For simplicity, let us first examine the case of two "colors" (e.g. political parties) in "basic PR" situations. We are going to consider four cost function variants Q_{1}, Q_{2}, Q_{3}, Q_{4}.
Assume racist voter behavior since we are examining "basic PR." Let the first party have X voters and win S seats; the second has Y voters and wins T seats.
The Ebert cost function is
If that cost is minimized subject to S+T=fixed we find via the method of Lagrange multipliers that S/Y=T/Y, i.e. perfect proportionality. Presumably in the imperfect case where we require S and T be integer the answer would be the same as the optimum realnumber S and T, except one would round one of them up and the other down. If we let X' and Y' denote the reduced values of X and Y got by doing approvalremovals (if any) then the cost function instead is
if all removals are "all or none" i.e. each voter either removes all or none of her approvals. This Q_{2} formula also is valid if all removals are fractional and equal (for example if each approving voter removes a fraction 0.37 of each of her approvals, where 0.37 is any value that does not depend on the voter except is allowed to depend on her party). Note Q_{2}≥Q_{1} and indeed Q_{2}>Q_{1} if 0≤X'<X or 0≤Y'<Y or both, regardless of S and T. Therefore, approvalremoval does not help, i.e. cannot reduce cost, in "perfect PR" or "imperfect PR" 2party situations with either allornone, or equidistributedwithineachpartyfractional, removals.
If we do CFAT (using "fair coins"; with other coinbiases the shifts "1/2" in the cost formulas below would change to other constants) without any approval removal then by the analysis given previously in paper II, the cost function becomes
Finally if we do both CFAT and approval removal the cost function becomes
assuming either allornone, or equidistributedwithineachpartyfractional, removals happening prior to CFAT.
Again, Q_{4}≥Q_{3} and indeed Q_{4}>Q_{3} if 0≤X'<X or 0≤Y'<Y or both, regardless of S and T. Therefore, approvalremoval does not help, i.e. cannot reduce cost, in either "perfect PR" or "imperfect PR" 2party situations with allornone, or equidistributedwithineachpartyfractional, removals.
Now we claim as a lemma that the optimum way to remove approvals in racistvoting situations is, in fact, equidistributedwithineachpartyfractional. Why? Basically because of the previously shown concave∪ nature of the Ebert cost function in approvalvariable space: If you are minimizing any strictlyconcave∪ function of N variables within some real interval (same interval for every variable, and function invariant under permuting the variables) then it is a fairly well known theorem that this minimum necessarily
That theorem is an immediate corollary of Jensen's inequality. So that has proven
Theorem (interacting CFAT & approvalremoval): Interacting CFAT and approval removal, cannot destroy basic PR in either perfectPR or imperfectPR (rounded) scenarios.
We proved this for only 2 parties, but actually our whole argument works with any number (not necessarily 2) of parties – it generalizes trivially. (With large numbers of parties it becomes far less clear what the optimum rounding to integer seat counts is, but that doesn't matter, since the proof argument does not need to know what it is!)
Now finally, we can extend the theorem to also allow "strong PR," essentially because commonlyrated candidates cannot alter the minimality of the variance in the equivalent variancebased formulation of the Ebert cost function, and since the KotzePereira transform is known not to hurt either basic or strong PR.
About convex programming. Which convex programming algorithm will give us the best possible performance, meaning the least runtime and/or the least memory consumption? To learn about this, a good start is the review paper by Bubeck 2015 and the earlier ones by Bland, Goldfarb, and Todd 1981 and Goldfarb & Todd 1982. (And the book by Grötschel, Lovasz, and Schrivjer remains a classic, albeit somewhat dated.)
Definition of "convex programming." Given a concave∪ realvalued Lipshitz function F(x), where x is an Ndimensional real vector, and a convex subset S of Ndimensional space, the "convex programming problem" is to find the minimum of F(x) among all x∈S. A convex programming algorithm will be said to be "polynomial time" if it provably always succeeds in finding that minimum to Ddecimal accuracy, in a number of steps upperbounded by a known polynomial(N,D,L), where L is the input length in bits, and each "step" may involve an evaluation of F(x), its gradient ∇F(x), a Ddecimalaccuracy arithmetic operation, or consultation of an "Smembership oracle" which either reports that a provided vector x indeed lies inside S, or returns a hyperplane separating x from S.
The historically first (or at least, first to achieve wide recognition)
polynomial algorithm for convex programming in N dimensions was the
ellipsoid algorithm.
This algorithm maintains an Ndimensional
ellipsoid known to
enclose the minimizing point x.
In our application, our dimensionality N obeys N≤VW,
and since all our X_{vc} lie within the real interval [0,1],
we could begin with the sphere centered at (½,½,,...,½)
with radius (N/2)^{1/2}.
Each "major step," the ellipsoid algorithm evaluates ∇F(x) at the ellipsoid's center,
or consults the Smembership oracle, or both. Either way, it finds a hyperplane H
which separates the present ellipsoid centerpoint from the desired minimizing x.
We then use H to cut the current ellipsoid in half. The undesired "wrong" hemiellipsoid
is discarded. The "right" hemiellipsoid is then replaced by the smallest
ellipsoid surrounding it (which can be computed via a certain known formula
involving a constant number of N×N matrixvector multiplications; preferably implemented
with numerically better behaved "matrixfactorization update" techniques as discussed by
Goldfarb & Todd 1982)
and we continue on. Each such step provably decreases the Ndimensional volume of the
ellipsoid by multiplication by a
For smooth cost functions (which ours is), the linear dependence on D can be reduced to ultimatelylogarithmic by switching to a quadratically convergent iteration such as the Ndimensional Newton iteration – or quasiNewton or conjugategradient methods – once the ellipsoid method gets near enough to the answer to assure that these methods will converge.
Although this is polynomially bounded, unfortunately for us it is a horrifyingly large polynomial. Furthermore, the memory space needed just to describe an Ndimensional ellipsoid (or Ndimensional simplex; another version of the ellipsoid algorithm employs enclosing simplices rather than ellipsoids) would be order N^{2} numbers, which in our application means order (V'W)^{2} numbers. Keep in mind that in real world applications it would be common for 10^{4}≤V≤10^{9}.
The two most horrifying aspects of this are the quadratic dependence of the memory usage on the number V' of (pseudo)voters, and also the cubic or worse dependence of the runtime on V'. (And actually, the runtime can be quite a bit worse if multiprecision arithmetic is needed due to possibly poor numerical behavior of at least some variants of the ellipsoid algorithm when one tries to employ single precision arithmetic.)
(Usually) saving the day. We now want to point out some features of our particular convex programs which ought to allow, in almost all elections, reducing the memoryspace needs to O(V'+W^{2κ}) numbers and the runtime to O(V'W^{3κ1}) arithmetic operations. These improved bounds often should be quite acceptable, since both their dependencies on V' are merely linear, and since the number W of winners in realworld elections tends to be small, anyhow much smaller than V.
The idea is as follows. Randomly divvy up the V' pseudovoters into about V'/W^{κ} disjoint sets, each consisting of about W^{κ} voters. (The constant κ can be chosen empirically to get the best performance, from among, say, 1≤κ≤3.) For each such set in succession, use convex programming to optimize its own X_{vc} variables (leaving all the X_{vc} for v in other sets, fixed). Each such optimization runs in polynomial(W) arithmetics and memory. Call the entire set of all V'/W^{κ} of these reduceddimension optimizations, an "iteration." If each iteration reduces the locationerror for the minimizing x, by a constant factor (or better), then O(D) iterations will suffice to reach Ddecimal accuracy. We believe that most elections with high probability (given the randomized voter partitionings) will behave that way. If so, excellent. If not, then more iterations will be needed, but we still will safely decrease Q each iteration and this process is guaranteed to converge to the minimum – that convergence will just take more iterations than we'd hoped. Probably even better than fully optimizing for each votersubset, would be to do just one major step of the ellipoid method within a votersubset, then switch to a different votersubset and continue on.
We also remark that other ideas in Bubeck 2015 may allow further speedups. There are many algorithm variants one could try, and which perform best is largely an experimental question. For example, the "conjugate gradient" and "Nesterov accelerated gradient descent" (NAGD) general purpose optimization methods plausibly will work well, the latter because it is plausible that for most elections our cost function is "strongly concave" with small "condition number."
Minimization methods like BFGS quasiNewton and the ellipsoid method have the advantage that they can "understand" general quadratic and general concave∪ functions, but the disadvantage that this understanding requires a model containing order N^{2} numbers in N dimensions, and work growing quadratically or worse with N. Meanwhile although firstorder methods like crude gradientdescent cannot understand general such functions, they work well for minimizing functions whose "condition number" is small, e.g. which are approximately spherically symmetric about their minima. For them a step takes only O(N) time and memory. This led to the question: what was the best possible performance, as a function of the condition number κ>1, for a firstorder method, requiring storing only O(N) numbers? This question ultimately was answered by Yurii E. Nesterov and Arkadi Nemirovsky in the 1980s, who showed that the naive gradient descent requirement of order κ iterations to halve the error, was improvable to order √κ but not further. See Bubeck, Lee, Singh 2015, AllenZhu & Orecchia 2015, and Kim & Fessler 2015 for recent papers about methods related to NAGD which achieve optimum performancedependence on the condition number for a firstorder minimization method.
For us, the important – albeit nonrigorous – realization is that the runtime and memory dependence on V' can be reduced to linear while keeping the dependence on W polynomial. This claim in fact could be made rigorous provided we forbid elections featuring condition number κ depending too strongly on the number V of voters. For example if κ=V^{o(1)} then we could prove runtime growing like V^{1+o(1)}.
About the CFAT and "slimmed" versions thereof. The CFAT splits each voter into 2^{W} different voters, by "tossing W coins" in all possible ways. This exponential blowup is acceptable if V≥2^{W} because in that case the number of voters will still be of order V after merging voters having the same approval "type" (i.e. who approve the same set of winning candidates) by summing their weights ρ_{v}. And indeed, in most realworld large government elections this V≥2^{W} condition has historically been satisfied. For example, in Australia, each state elects 12 senators to the federal parliament via a PRSTV multiwinner voting method, but only W=6 at a time (staggered terms) with typically about V=4×10^{6} voters per state. In Canada, there are singlewinner elections in ridings. As far as we've been able to tell, the record largest number of candidates that ever ran in a riding was 13, and each riding contains up to about 10^{5} registered voters. So even if Canada switches to a PR system involving multiwinner elections, it seems likely that it will have W≤13 and V≥10^{5}.
However, we claim it is possible to replace our CFAT by "slimmed" variants featuring only polynomial blowup. We actually know many ways to do that, but the slimmest among them is as follows. Consider only the W winners. There are exactly W+1 subsets S of these winners with ≥W1 members each. For each such S, replace each voter by one who approves the intersection of whoever she originally approved, with S. In this way each voter is replaced by W+1 new voters, not 2^{W}. These new voters shall have two weights. Namely, one of the W+1 new voters is idential to the original voter, except will have a reduced weight; and the other W each shall have the second weight.
??
What is the purpose of the CFAT? It may seem as though the CFAT step 3 could simply be omitted from our voting method, while still enjoying Holy Grail properties. However, we would then suffer disagreeable behavior on the following class of (W=2)winner elections with V=2(1+x)N voters and 4 candidates named A,B,C,D:
#voters  Candidates they approve 

xN  A,B,C 
xN  A,B,D 
N  C 
N  D 
If x>0 is small, then the winners are {C,D}. But when x→∞ we would like {A,B} to win. Without the CFAT, unfortunately {C,D} would remain the unique winning parliament regardless of the value of x≥0, and the possibility of removing approvals for A and/or B cannot avoid this.
But with CFAT, it turns out that {A,B} wins when x>3 while {C,D} wins when 0≤x<3.
??
??
A computer check of the claims in our proof would help confirm or deny. The holy grail election method needs to be programmed and made available as software. Run test elections. Devise illustrative election examples.
The following weak "participation" property: if a new approvalstyle voter comes, she will increase Q more for parliaments with more approved MPs – seems likely to be false. Decide that.
List more properties HG obeys.
Write an introduction.
Zeyuan AllenZhu & Lorenzo Orecchia: Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent, (20142015) arXiv/1602.05248.
C. Ambühl, M. Mastrolilli, O. Svensson: Inapproximability Results for Maximum Edge Biclique, Minimum Linear Arrangement, and Sparsest Cut, SIAM J. Computing 40,2 (2011) 567596.
Jörg Arndt: Matters computational: Ideas, Algorithms, Source Code, Springer 2011.
Michel L. Balinski & H. Peyton Young: Fair representation: Meeting the Ideal of One Man, One Vote, Yale Univ. Press 1982. Also there was a second edition from Brookings Institution Press 2001. JF1075.U6B3.
Robert G. Bland, Donald Goldfarb, Michael J. Todd: The ellipsoid method: a survey, Operations Reserach 29,6 (NovDec 1981) 10391091.
Olivier Bournez, Daniel S. Graca, Amaury Pouly: Solving Analytic Differential Equations in Polynomial Time over Unbounded Domains, Mathematical Foundations of Computer Science 2011 (Volume 6907 of Springer Lecture Notes in Computer Science) 170181. See also Polynomial Time corresponds to Solutions of Polynomial Ordinary Differential Equations of Polynomial Length (2016) by same 3 authors, arXiv 1601.05360.
Sebastien Bubeck: Convex Optimization: Algorithms and Complexity, Foundations and Trends in Machine Learning 8, 34 (2015) 231357.
Sebastien Bubeck, Yin Tat Lee, Mohit Singh: A geometric alternative to Nesterov's accelerated gradient descent, (2015) ArXiv 1506.08187.
Volkan Cevher, Stephen Becker, Mark Schmidt: Convex Optimization for Big Data, IEEE Signal Processing Magazine 31,5 (1014) 3243.
Pierluigi Crescenzi, Viggo Kann, M.Halldorsson, M.Karpinski, G.Woeginger (editors): Online compendium of NP optimization problems, continually updated electronic book. Described in paper How to find the best approximation results – a followup to Garey and Johnson, ACM SIGACT News 29,4 (Dec. 1998) 9097. Two particular pages of interest to us: minimum feedback arc set & minimum feedback vertex set.
George B. Dantzig: Linear programming and extensions, 1963. Reprinted by Princeton Univ. Press 1998.
Henry R. Droop: On Methods of Electing Representatives, Journal of the Statistical Society of London 44,2 (June 1881) 141202. Reprinted in Voting Matters 24 (Oct. 2007) 746.
Lars Engebretsen & Marek Karpinski: Approximation Hardness of TSP with Bounded Metrics, J. Comput. System Sci. 72,4 (2006) 509546.
Oliver Friedmann: A subexponential lower bound for Zadeh's pivoting rule for solving linear programs and games, Proc. 15th conf on Integer Programming and Combinatoral Optimization (IPCO 2011) 192206 (Springer LNCS #6655).
Oliver Friedmann, Thomas Dueholm Hansen, Uri Zwick: Subexponential lower bounds for randomized pivoting rules for solving linear programs, Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC 2011).
Oliver Friedmann, Thomas Dueholm Hansen, Uri Zwick: Subexponential lower bounds for randomized pivoting rules for the simplex algorithm, Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC 2011).
Michael R. Garey & David S. Johnson: Computers and Intractability: a guide to the theory of NPcompleteness, W.H. Freeman & co, San Francisco 1979.
Bernd Gärtner, Martin Henk, Günter M. Ziegler: Randomized Simplex Algorithms on KleeMinty Cubes, Combinatorica 18,3 (March 1998) 349372.
Donald Goldfarb & William Y. Sit: Worst case behavior of the steepest edge simplex method, Discrete Appl. Matha. 1,4 (1979) 277285.
Donald Goldfarb & Michael J. Todd: Modifications and Implementation of The Ellipsoid Algorithm for Linear Programming, Mathematical Programming 23,1 (1982) 119.
Martin Grötschel, Laszlo Lovasz, , Alexander Schrijver: Geometric Algorithms and Combinatorial Optimization, Springer 1993.
Refael Hassin & Shlomi Rubinstein: Approximation algorithms for maximum linear arrangement, Information Processing Letters 80,4 (Nov. 2001) 171177.
R.G. Jeroslow: The simplex algorithm with the pivot rule of maximizing criterion improvement, Discrete Maths. 4,4 (1973) 367377.
Jonathan A. Kelner & Daniel A. Spielman: A randomized polynomialtime simplex algorithm for linear programming, Proceedings of the thirtyeighth annual ACM symposium on Theory of computing, STOC 38 (2006) 5160.
Donghwan Kim & Jeffrey A. Fessler: Optimized firstorder methods for smooth convex minimization, (20142015) arXiv 1406.5468.
Victor Klee & G.J. Minty: How good is the simplex algorithm, pp. 159177 in O. Shisha (ed.) Inequalities III, Academic Press, 1972.
KerI Ko: Computational Complexity of Real Functions, Birkhauser Boston, Boston, MA, 1991. QA267.7.K6 1991
M. Kuczma, B. Choczewski, R. Ger: Iterative functional equations. Volume 32 of "Encyclopedia of Mathematics and its Applications," Cambridge University Press, Cambridge 1990. [QA267.7.K6 1991??]
Pekka Orponen & Heikki Mannila: On Approximation Preserving Reductions: Complete Problems and Robust Measures, Report C198728, Department of Computer Science, University of Helsinki, revised May 1990.
Frederick W. Owens: On the Apportionment of Representatives, Quarterly Pub. Amer. Statist. Assoc. 17,136 (Dec 1921) 958968.
Christos H. Papadimitriou & Mihalis Yannakakis: The Traveling Salesman Problem with Distances One and Two, Mathematics of Operations Research 18,1 (Feb. 1993) 111.
Toby Pereira: Proportional Approval Method using Squared loads, Approval removal and Coinflip approval transformation (PAMSAC) – a new system of proportional representation using approval voting, arXiv/1602.05248, Feb. 2016.
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical recipes, The Art of Scientific Computing, Third Edition, 1256 pages, Cambridge University Press 2007.
Thorvald N. Thiele: Om flerfoldsvalg [On multiple election], Oversigt over Det Kongelige danske videnskabernes selskabs forhandlinger = Bulletin de L'Académie royale des sciences et des lettres de Danemark, Copenhague Kongelige Danske Videnskabernes Selskab = Royal Danish Academy of Sciences and Letters 3,4 (1895) 415441.
Norman Zadeh: What is the Worst Case Behavior of the Simplex Algorithm? Tech. Rept. #27, Department of Operations Research, Stanford (1980).