Optimal party- and district-based proportional representation with one MP per district

Warren D. Smith. Nov. 2015. PRELIMINARY, it would be better to have a more careful & complete examination of various problem-version's LP duals, and analysis of the integer programs which result when the ∑=W constraint is adjoined.

Overview

If, in an election, each district attempted to elect its MP using score voting, then the resulting parliament could be disproportional. I.e: each voter rates each candidate with a numerical score from (say) 0=bad to 9=good. That causes each candidate to acquire an average score. But if each district then just elected whoever had the highest average score, the resulting parliament could be "disproportional."

A possible solution to that problem is to demand that the parliament have a prescribed party composition (proportional to the number of voters who had awarded each party their top rating), and among all possible parliaments meeting the party-composition constraints and 1-seat-per-district constraint, finding the "best" one, i.e. the one maximizing the sum of the average scores of all its MPs. We'll show that this optimization problem is efficiently soluble using "linear programs" (LPs) of a well understood type arising in optimal "weighted bipartite graph matching."

When the "dual form" of our linear program is studied, we find that it is quite similar to – but not the same as – the "Fair Majority Voting" scheme proposed by Michel Balinski in 2008. The reason it is not the same is that Balinski made a severe design error. He, in our terminology, optimized the wrong thing. (Although actually, Balinski did not really realize he was optimizing anything.) The right thing to optimize must approximate perceived society-wide "utility." Balinski optimizes something rather unrelated to that, whereas we optimize the obvious natural approximation to it. But it is possible to alter the Balinski proposal to remove his "multiplicative weights" by replacing them with "additive offsets." When that is done, Balinski's altered system becomes essentially equivalent to ours via an LP-duality, aside from his preference for plurality-style "name-one-candidate" ballots as opposed to our preference for score voting style "rate-all-candidates" ballots.

Also, still more embarrassingly for Balinski, he contended his system would "eliminate gerrymandering" by removing all incentive for it. That claim was total garbage, and displays a massive lack of understanding by Balinski of what gerrymandering is like in the USA and why it happens.

Both our proposal and Balinski's enjoy the virtues of electing exactly one MP per district, and achieving proportional representation (PR) as measured by party-fractions.

However, they also both suffer the disadvantage that parties are woven into the very fabric of the voting system definition. A government based on either system would need to forbid, or at least be very unfriendly to, unaffiliated candidates; and it probably would be deemed necessary to enact laws forbidding MPs from switching parties (countries have enacted such laws since they felt otherwise their PR systems could be "gamed"). I greatly prefer systems based entirely on candidates with no reference to parties, and which work equally well regardless of who is in what party, or whether parties even exist.

Also, we demonstrate another massive disadvantage of both schemes (Balinski's method is afflicted more severely than ours, but both seem severely damaged): they are hugely susceptible to a simple organized form of strategic voting I call "targeted killing." I would forecast that any country trying to use either scheme to elect its parliament would be devastated by this problem.

I think these disadvantages render both schemes, especially Balinski's, unacceptable for the basic design of the government of a country.

(Also, more generally, any of a wide class of PR schemes involving explicit use of party labels as part of their definitions, might be vulnerable to an analogous problem, which should be a very scary thought for anybody designing such a system. Play it safe by refusing to involve party labels in your system. Or watch out.)

But there is a possible salvage. Do not use our scheme as the election method. Instead merely use it as a "top up" method to try to restore PR to an otherwise single-winner-per-district parliament elected by old fashioned independent single-district single-winner elections. (Note: Our LP formulation can also be used to find the optimum parliament with prescribed party-composition such that district k elects Nk MP's, where the integers Nk≥0 can be anything, not necessarily all 1 or all equal.) When used for this top-up purpose, our method becomes a far less attractive "target" for "killings." And then it actually might be a decent way to design a government, at least if you can stomach being force-fed political parties...

The voting system

Purpose. To elect a W-seat parliament, consisting of one MP per district, in such a way that the party-composition of the parliament, mirrors that of the voters. We assume some number C>W of candidates run and at least one candidate runs in each district.

Extension: we could optimally make district k elect Rk MPs, for any pre-specified set of non-negative integers R1, R2, ... summing to W; they do not need to all equal 1. Indeed, if desired we could even just pre-specify integer intervals [Lk, Rk] and merely demand district k elect a number x of MPs where Lk≤x≤Rk.

1. Each voter rates each candidate in her district, on an 0-to-9 numerical scale. (Voters also allowed to leave some candidates unrated, if that voter wishes not to express any opinion about that candidate.) Refinement suggested by J.Quinn: "Any voter who did not explicitly rate any candidates at 0 is considered to have given 0 to any candidates they left blank. Any voter who did rate at least one candidate explicitly at 0 is considered to have expressed 'no opinion' about any candidates they left blank. (I think this rule is the safest way to interpret 'intent of the voter'.)"

2. At this point, each candidate in each district possesses some "average score." Also, each political party, nationwide, has a number of "points." (Each time a voter rates K candidates co-equal top, then each of their political parties gets 1/K of a point.)

3. Determine the number of seats each party "deserves" in parliament,. e.g. party j deserves Nj seats, for some set of non-negative integers S1, S2, ... summing to W. These Sj could be produced from the party point-counts and W by, for example, the classical procedure of Sainte-Laguë. Extension: everything to follow could also more generally be done with integer intervals [Sj, Tj] where we shall demand that party j gets a number x of seats with Sj≤x≤Tj.

4. Find, from among all possible ways to elect W of the candidates to parliament which meet all the party seat-count constraints and also simultaneously meet all the district MP-count constraints, the "best" possible choice, and elect those W. Here "best" means that the average scores possessed by the W elected MPs, have maximum possible sum. (Note: if no parliament meeting all the constraints exists, then report that failure and stop. However, if the Sj were produced via Sainte-Laguë as described in step 2, then a solution obviously will exist.)

Why that voting method can be implemented efficiently – graph formulation and LP reformulation

We can view this as a problem about bipartite graphs (or bipartite multigraphs) with edge weights, then solve it efficiently via linear programming. Specifically, consider a bipartite graph with one red vertex for each political party, and one blue vertex for each district. The edge from party j to district k has numerical weight equal to the average score possessed by the candidate from party j who ran in district k. (If some party runs more than one candidate in some district, we'd need to make multiple edges between those two vertices, and it'd be a bipartite multigraph. That is ok.)

Our optimization problem then is a "max-weight constrained-valence subgraph" problem. That is, we seek a subgraph (W-element subset of the edges) of the original graph. This subgraph must obey the "valence constraints" that, e.g. the number ν of edges emerging from the party-j red vertex, must obey Sj≤ν≤Tj. And so on; each vertex of the original graph has its own allowed-valency integer interval. Among all possible subgraphs meeting the valency constraints, our task is to find the one maximizing the sum of the edge-weights.

This graph problem has been heavily studied since the 1950s and consequently nowadays is well understood. One way to solve it is by reformulating it as a linear program. That works to provide a polynomial-time algorithm. (However, it also is possible to solve it more quickly because this linear program happens to have special structure. We won't discuss that.)

Here is the linear program. If there are P parties and D districts, then there are PD variables xjk. (We are formulating this, for simplicity of notation, in the ordinary-graph case. Actually, in the multigraph case we really would need to index the x's by which edge of the original multigraph that x pertained to. In the ordinary graph case, an edge is specified by its two endpoints, namely party j and district k, which simplifies notation.) The meaning of xjk will be the number of candidates elected by party j from district k, and this number after solution will always be either 0 or 1. These obey the linear inequality constraints

0 ≤ xjk ≤ 1,    (∑1≤j≤P1≤k≤D xjk = W),    Sj ≤ ∑1≤k≤D xjk ≤ Tj,    Mk ≤ ∑1≤j≤P xjk ≤ Nk.

Subject to those constraints, we find the xjk that maximize

1≤j≤P1≤k≤D xjk Ajk

where Ajk is the average score possessed by candidate jk.

Obviously, this is a linear program, and if its optimum solution happens to have all entries xjk integer, then that solution instantly solves our original graph problem optimally. However, if we ignore the parenthesized ∑1≤j≤P1≤k≤D xjk=W constraint, then it automatically will have all entries integer, because then this is a "totally unimodular" linear program with "integer right hand sides," due to the Hoffman-Gale theorem.

Non-integer solutions would be inconvenient because politicians do not like being cut in half. Hence for those problem versions we could end up having some other (i.e. not W) integer number of seats consistent with all the other (nonparenthesized) constraints. However in the unextended (or less extended) forms of our voting problem, the W-seat demand will be satisfied automatically, even without explicitly imposing it, just because, e.g, the sum of the demanded district-occupancies equals W if all Mk=Nk.

Dual form of our linear program

When we apply the linear programming duality theorem, we find our LP is equivalent to the following. (We'll assume Sj=Tj and Mk=Nk for this purpose.) Seek "additive adjustments" Uj≥0 for party j and Vk≥0 for district k such that

Uj + Vk ≤ Ajk,    and    ∑1≤j≤PSjUj + ∑1≤k≤DNkVk   is maximized.

We then elect the Nk candidates in district k having the maximum adjusted average score Ajk-Uj-Vk, for each k=1,2,3,...,D. (The adjustments automatically will cause the resulting parliament to obey the party-seat-occupancy demands.) Equivalently, we elect the Sj candidates in party j having the maximum adjusted average score, for each j. (Generically, the same winner-set will result either way.)

Balinski's "fair majority voting" system, and its relationship (but inequivalence) to ours

Balinski's system begins by soliciting from each voter in each district a "name one candidate" plurality-style vote, saying their presumably-favorite candidate running within their district. Each such vote for a candidate also counts as a vote for his party. Balinski then determines each party's "deserved" seat count, by applying d'Hondt's method to the party vote counts. (Balinski prefers d'Hondt, while I prefer Sainte-Laguë, for this purpose. It perhaps is of interest that Germany's Bundestag originally used d'Hondt, but as of 2005 has switched to Sainte-Laguë.)

Then "multiplicative adjustments" ρj>0 for party j and λk>0 for district k, are found, such that if the vote counts are adjusted – i.e. any vote cast within district k is multiplied by λk and any vote cast for a party-j candidate is multiplied by ρj – then the winners in each district using adjusted votes, yield the deserved number of seats for each party.

So Balinski's system is the same as ours, except for the following changes:

1. Balinski uses plurality-style ballots, while we use score-style.
2. Balinski uses d'Hondt, we use Sainte-Laguë. (Balinski claimed that was since he wanted the USA to develop a "majority party" and d'Hondt made that more likely. Evidently Balinski felt for some mysterious reason that the USA did not already have enough 2-party domination. Polls of US citizens, however, disagree, with 60% finding the USA needs a third party.)
3. Balinski uses multiplicative adjustments, while we use additive ones. (But Balinski's also are additive after logarithmically transforming everything, in which sense his and my methods are "the same.")

While I believe all three of these choices by Balinski were unwise, the last one was truly absurd, because...

What Balinski's LP is optimizing and why that was the wrong thing to optimize

We optimize the right thing, namely, the sum, over all MPs, of their average score reckoned by voters. This is the obvious naive approximation to the "quality" or "social utility" of the parliament, as estimated by voters. Balinski maximizes the sum, over all MPs, of the logarithm of their number of votes. (Equivalently, he maximizes the product of their vote-counts.) This has little relation to anything utilitarian.

How "targeted killing" strategic voting method would devastate Balinski's system and render it utterly unfit for direct use in designing the government of a country

With Balinski, the NaziLoon party (which gets 1% of the votes and hence deserves 1% of the seats) necessarily will enjoy a multiplicative scale-up factor of about 50, as compared with the Mainstream party (which gets 50% of the votes) with a multiplier of about 1. In other words, Balinski is going to force each NaziLoon vote to count about the same as 50 Mainstream votes.

This is a huge "leverage" or "amplification" factor subject to easy exploitation via "strategic voting." For example, suppose for the purposes of argument that Justin Trudeau (leader of Canada's Liberal party, recently elected prime minister in 2015) is the most popular MP in Canada. So the Tories might like it considerably if Trudeau's district were to un-elect him and thus remove him from power. Tory leadership can easily accomplish that goal. They simply tell the Tories in Trudeau's district "we suggest that instead of voting Tory, which is a lost cause, you vote NaziLoon." The 50× amplification factor then easily carries the day, and – voila – Trudeau is gone.

I call this "targeted killing." Now the Tories might now want to try that again, selecting as their target some other important Liberal MP. If, however, they kept re-using the NaziLoon party as their pawn in the targeted-killing game, it would no longer work as well. So instead, for their next "target" the Tories would switch to using the BeerLovers party as the (new) pawn. And so on.

This would cause all the important leaders in Canada to be "assassinated" – removed from political power – and instead a ragtag random collection of normally unelectable NaziLoons, BeerLovers, etc would be all over parliament. So perhaps the best two-word description of the situation that would be caused if Balinski's voting system were the design for Canada, would be "complete devastation."

The lesser (but probably still unacceptably severe) vulnerability of our system to "targeted killings"

With our system, which has additive rather than multiplicative adjustments to votes, the situation is somewhat less dire than with Balinski. There will no longer be a 50× amplification factor for NaziLoon votes which could be used to "kill" any "target." Instead, the NaziLoons will get a large additive "head start" in vote count in each district. Because Trudeau (in our example above) was postulated to be a very popular Liberal MP, it would still take a lot of NaziLoon votes to overcome him in his district, and it would be far easier and much more likely to work, to target an unpopular Liberal MP in this way. So it would no longer be possible to target essentially anybody. Only some targets, usually comparatively unattractive ones, would be easily "hittable."

Nevertheless, even with our system there would be some easily predictable places where easily-determined strategic voting could be used to hit hard. These hits would be highly effective and also would, as a side effect, distort PR to artificially enrich parliament with fringe candidates.

Why neither Balinski's nor our systems should be expected to "eliminate gerrymandering"

Balinski (judging by his title) seems to hold the view that with his voting method, gerrymandering will vanish because it is no longer useful. From then on, districts would be drawn sensibly, for a refreshing change!

Unfortunately, we believe Balinski is wrong about that. We think it quite likely, unfortunately, that gerrymandering would persist and continue to corrupt the USA massively.

Nationwide, we get proportionality, therefore it does not pay to gerrymander. Agreed. But locally, gerrymandering pays – plenty!

Suppose you are Joe Democrat Congressman. You'd like to have a safe seat. So you get your buddies in the statehouse to draw you a safe district full of Democrats.

Result (under Balinski voting): Joe is almost certain to be elected. The only way Joe can fail to be elected is if the entire country decides it hates Democrats hugely, and/or Joe's district stops voting Democratic – both of which are almost impossible. (Well, if we ignore the above about "targeted killings.")

So what I predict would happen (with Balinski voting) is this. The Democrats and Republicans will make interparty deals (just as they already do) to draw "safe" districts for all their high-ranking members. As usual, none of them will be defeatable.

But there will also be some districts occupied by lower-ranking Dems and Repubs, who do not have as much influence with the party bosses, which will be "at play." Those will be unsafe seats. So Balinski might improve things, but by no means will eliminate either gerrymandering or "congressmen for life."

Because of proportionality, the Greens will be able to get elected too. There could in principle actually be districts that are safe for Greens (i.e. essentially guaranteed to elect a Green). That is because the Greens will (by proportionality) win, say 5% (or whatever) of congress. The most-Green district in the country will then be essentially guaranteed to elect a Green, even though it (say) contains only 10% Greens.

Ivan Ryan comments: Locals might be pissed at a third-party which sets up its base in their district so it can consistently get 5% of the vote – the result being that 95% of the residents of the district are represented by a wacko-fringe party that they don't support.

Now the Dems and Repubs could (and probably will?) conspire to hurt the Greens as follows. Although the Dems and Repubs will be unable to prevent 10% Greens from being elected if the voters are 10% Green, they can keep shifting the Green district-boundaries with the goal of preventing any Green from being reelected. Or, they could try to make all the districts safe for everybody, so nobody ever gets defeated. This won't entirely work (if the country shifts from 60-40 R/D to 40-60, say, then there will be a 20% change) but it will work in the sense that all the established party members will be safe everywhere. Further, if, say, the Dems arrange to make their seatholders all have districts packed with Democratic voters, we don't see how that will hurt the Democratic party:

1. Their seatholders will now be safe.
2. The remaining parts of the country, which contain (say) only 1% Democratic voters because the gerrymanderers sucked it dry, still will elect Democrats because of the proportionality guarantee, so the party as a whole is not hurt.

So, sorry Balinski – there is huge incentive to continue to gerrymander. Indeed, the ultimate limit would be that every district is 100% politically monolithic. In that case, yes we'd get proportionality, but every congressman would be a congressman for life (the opposite of democracy) and the country would be totally divided+polarized into utterly predictable red and blue chunks.

Indeed, Balinski throughout his paper about "eliminating gerrymandering in the USA," appeared to be completely unfamiliar with the fact that a large number (probably "most") of the seats in the USA are "bipartisan gerrymandered" and indeed unaware even of the existence of the concept of "bipartisan gerrymandering." And also, with the fact the USA's districts are drawn, not federally, but rather locally (i.e. state by state). For next time, I would recommend Balinski acquaint himself with the basics about an area before he revolutionizes it.

Would it be acceptable to use our system merely as a "top up"? Yes, and the simple result is "MMP with score voting."

The idea would be: first to elect 1 MP per district in an old-fashioned (non-PR) manner, i.e. with single-winner elections within each district independently. (Preferably the score voting single-winner voting system would be used, not "first past the post.") Then we would add "top-up" MPs to try to bring the parliament back toward proportionality. (There would be some pre-specified number W of seats in parliament, of which W-T were single-MP-in-district seats and the remaining T were top-up seats.) The number of top-up seats from each party could be obtained using Sainte-Laguë based on each party's vote counts, but now starting the Sainte-Laguë seat-addition procedure not from zero initial seat-counts for each party, but rather from the party seat-counts arising from the initial W-T "one MP per district" elections.

Sainte-Laguë seat-addition procedure: Let Vj denote the vote-count for party j, and Sj denote its current seat count. Find the party with the greatest V/(2S+1) value. Give that party one additional seat (incrementing its S). Keep doing this, one seat at a time, until all the seats in parliament have been labeled with party-occupancy labels. [d'Hondt is the same except that the formula V/(2S+1) is replaced by V/(S+1).]

With each party's deserved-seat count known for the top-up seats, the plan then would be to employ the linear program here to choose the T top-up MPs optimally while meeting the party-occupancy demands. This task would, in fact, be trivially simple, i.e, no linear programming would be needed! That it because we simply would, for a party deserving S more seats, simply elect the S highest-scoring as-yet-unelected MPs from that party. This essentially would be the (already well known and already used, e.g. in Sweden) Sainte-Laguë method of "party lists" but used as a top-up method, with the list ordering within each party simply being descending order by average scores received by their candidates from the voters in their districts.

If we wanted to impose the additional demand that no district could have more than one top-up MP, though, then we would need linear programming, and indeed integer programming. The latter might be too computationally difficult for feasibility. Linear programming is computationally efficient. Integer programming is not and too-large integer programs can be infeasible to solve. (Anyhow quite likely such an integer-programming-based top up would not even be desirable, if the "targeted killing" threat were considered worse than the problem of occasionally having districts with several MPs.)

Here is a list of advantages and disadvantages of the resulting system:

1. Pro: Very simple, well known, method with historical experience of use (approximates the "MMP" systems currently used in Germany and New Zealand) with no linear or integer programming needed.
2. Con: needs explicit use of party names. "Independent" candidates, therefore, are placed at a severe disadvantage, or anyhow treated unequally. It is probably not a good thing if a country, by law, in the definition of its voting system itself, disparages or forbids independent thinking. In the USA during 1970-2015, approximately 1/3 of voters identified themselves as "Democratic," approximately 1/3 "Republican" and approximately 1/3 Independent. In 2015, Gallup reported that the precise count was 43% Independent, 30% Democrat, 26% Republican:

At the same time, Ipsos (on behalf of Global News) polled Canadians about how they would vote (if they could) in the USA presidential election, finding 55% Democrat, 28% Independent, 17% Republican. In Hong Kong, the majority of the legislative council consists of independents. The Irish Dail (elected via a PR system defined in a way that nowhere mentions "parties") contained 10% independent MPs during 2011-2015. This contrasts with Poland, which had zero, since the Polish system basically makes it impossible to be an independent politician. Many countries over time have experienced a rise in party control, which has caused a decrease in the numbers of their independent politicians. In the opinion of many, this trend has been bad. So if we were to use the present section's "proportional representation" party-list top-up system to elect a parliament, and if it resulted in fewer than 1% of MPs being independents (duplicating Canada's 2015 situation) – then that wouldn't be "proportional representation." It would be "an enormous distortion of PR."
3. Think that's unrealistic? The two most prominent countries using an MMP system to elect their parliaments are Germany (Bundestag) and New Zealand. In the German Bundestag's first election in 1949, three MPs were independents. In all elections after than (up to 2015) there have been zero. New Zealand installed its current MMP system in 1996. The last independent MP elected was Winston Peters (elected in 1993); up to 2015 there have been no others. It is possible to repair this perceived problem to some extent by regarding all the independents as a pseudo-party (the "independents' party"). Of course, many independents are quite dissimilar, so that such an agglomeration could be highly misleading. But from the standpoint of just providing an extra boost to independents versus party candidates to try to repair inherent pro-party unfair biases, this repair ought to work pretty well.
4. Con: it is possible to "game the system" tremendously by switching party affiliations. To stop that, New Zealand (which uses a top-up system of a similar nature) found it necessary to enact anti-switching laws making it illegal for politicians to switch parties. Such laws are not exactly conducive to independent thinking. (I am writing this in 2015. New Zealand has not had any Independent MPs for over 20 years.)
5. Con: the top-up MPs are "countrywide" and not "regional," and since some districts will have more than one MP due to top-ups, that could be viewed as unfair. It is possible to repair this perceived problem to some extent by dividing the country into, say, 20 regions, and electing 1/20 of the parliament from each region (complete with its own top-ups and own party-lists inside each region). The regions conduct elections independently using the top-up system we've described (each region contains many districts inside) and then at the end all the regional subparliaments are merged to obtain the full countrywide parliament.
6. Con: In this kind of system, it can be strategic for voters not to vote for their favorite candidate. Such votes can be "wasted" and it can be strategically better to lie so that you can elect a "lesser evil."

References

Michel Balinski: Fair Majority Voting (or how to eliminate gerrymandering), American Mathematical Monthly 115,2 (Feb. 2008) 97-113.

Steven J. Brams & Peter C. Fishburn: Proportional representation in variable-size legislatures, Social Choice and Welfare 1,3 (Oct. 1984) 211-229.

George B. Dantzig: Linear programming and extensions, Princeton Univ. Press 1963.

A.J.Hoffman & David Gale: Appendix within the paper I.Heller & C.B.Tompkins: An Extension of a Theorem of Dantzig's, pp.247-254 in H.W.Kuhn & A.W.Tucker eds. Linear Inequalities and Related Systems, Annals of Mathematics Studies #38, Princeton University Press 1956.