By Warren D. Smith, January 2007
Let Pk be the (known) population of state k, and let Sk be its number of congressmen, where the total number S=∑kSk of congressmen is fixed by US law to be S=435.
The question is: what should the Sk be? The difficulty is that they have to be integers.
The new method we propose is very simple. Choose a number Q>0, traditionally called "the divisor," so that the total number of seats S will come out correct, and let
Sk = floor( Pk/Q + 1 - d ),
i.e. state k's quota Pk/Q is rounded up if its fractional part is at least d and rounded down otherwise, where d is a constant that has been chosen ahead of time. There are two ways to choose d. First, we can choose it to yield the best retrospective performance (least "bias") over past US history. Second, we can instead choose d based on a purely theoretical analysis, which yields the following formula:
d = ln( K / (1 - exp(-K)) ) / K   where K = #states/#seats.
With the USA's K=50/435=10/87=0.11494252873... we get d=0.495211255149063832... For application to "party list" PR voting systems, instead K=#parties/#seats, and with typical values #parties=20 and #seats=300 we would get d=0.49722. In the limit K→0+ we would get d=1/2 (Webster), and in the opposite limit K→1 we would get d=1-ln(e-1)≈0.458675.
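To make the recipe concrete, here is a small illustrative sketch in Python (the function names and the bisection search for Q are my own choices, not part of any standard library); it computes d from the theoretical formula above and then adjusts the divisor Q until the rounded seat counts sum to S:

    import math

    def theoretical_d(K):
        # d = ln( K / (1 - exp(-K)) ) / K, the theoretically-derived rounding constant
        return math.log(K / (1.0 - math.exp(-K))) / K

    def apportion(populations, S, d):
        # Divisor method with rounding point n+d: state k gets floor(Pk/Q + 1 - d) seats;
        # the divisor Q is adjusted by bisection until the seat counts sum to S.
        def seats(Q):
            return [int(p / Q + 1.0 - d) for p in populations]
        lo, hi = 1e-9, 2.0 * sum(populations)   # tiny Q gives too many seats, huge Q too few
        for _ in range(200):
            Q = 0.5 * (lo + hi)
            total = sum(seats(Q))
            if total == S:
                return seats(Q)
            elif total > S:
                lo = Q                          # too many seats: increase the divisor
            else:
                hi = Q                          # too few seats: decrease the divisor
        raise ValueError("no suitable divisor found (tied fractional parts?)")

    print(theoretical_d(50/435))                # ~0.4952, as quoted above
    print(theoretical_d(20/300))                # ~0.4972, as quoted above
    pops = [8_000_000, 3_500_000, 1_200_000, 600_000]    # made-up example populations
    print(apportion(pops, S=14, d=theoretical_d(4/14)))  # -> [8, 4, 1, 1]

Choosing d=½ here reproduces Webster exactly; the only novelty is the slightly smaller rounding fraction.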
First of all, the classic apportionment methods of Webster, Adams, and Jefferson are merely special cases of our new method, with d=½, d=0+, and d=1- respectively. If we choose the optimal value of d, then we must get a method at least as good as these (by whatever performance measure we are optimizing). Because Balinski & Young's book concluded that Webster (and in some ways Jefferson) was the best of the methods they knew, and computer simulations by Dan Bishop and myself came to that same conclusion, the new method is at least as good as all previous methods, to the extent you buy Balinski & Young's analysis.
Second, because the new method is a "divisor method" it enjoys all the properties that, B&Y proved, divisor methods (and only divisor methods) always enjoy: "House monotonicity," "population-pair monotonicity," and freedom from the "new state paradox."
The Hamilton/Vinton method fails all three of these properties. The Balinski-Young method fails two of them but obeys "House monotonicity," and both it and Hamilton/Vinton obey the "quota criterion" that the number of seats a state is awarded differs by less than 1 from its ideal (generally noninteger) value, i.e. is that value rounded either up or down. [All divisor methods fail the quota criterion in suitable situations; and indeed the quota and population-pair-monotonicity criteria are logically incompatible.]
Third, computer simulations by Dan Bishop's computer program (as well as an upcoming more powerful program of my own) found record-best behavior for the new method (using the theoretical model's d) versus all previous divisor methods, both in terms of several bias measures and in terms of making "quota violations" rarer. (Note: this computer finding was not merely a trivial and expected consequence of our derivation below, because the bias measures the computer evaluated were not the same as the one nulled out by our derivation, and also because the computer examined other probabilistic models.)
Fourth, both our computer simulations and US history agree that Webster is slightly biased in favor of the large-population states, i.e. it tends to give them slightly more than their ideal (but noninteger) fair share of congressmen. (And Adams is heavily biased in favor of small states, and Jefferson is heavily biased in favor of large ones.) The new method with d=0.495 perturbs Webster in the right direction to get rid of that bias, or alternatively our theoretical model that outputs d=0.495 may be regarded as having predicted Webster's bias.
As our starting point, we shall assume the state populations are independent identically distributed exponential random variables, i.e. with density
Kexp(-Kx) dx   for x>0.
That is: the probability density proportional to exp(-xK)dx is postulated to hold for state populations measured in units of one ideal seat's worth of population. In that case, the expectation value of x, which is 1/K, must equal #seats/#states, i.e. the correct average number of seats a state gets.
In other words:
K = #states/#seats, which for the present USA is K = 50/435.
Now consider the states whose quota Pk/Q lies between n and n+1, and suppose we wish to choose a threshold y, n<y<n+1, so that if such a state has quota>y, its seat-count is rounded up to n+1, otherwise down to n; and we wish to choose y so that the net expected additive effect of the rounding is zero, i.e. so that the rounding is unbiased. The formula that determines y is then
∫x=n..y (n-x) Kexp(-Kx) dx + ∫x=y..n+1 (n+1-x) Kexp(-Kx) dx = 0,
which solves to give
y = n + d   with   d = ln( K / (1 - exp(-K)) ) / K.
It is very interesting and pleasant that this formula for y-n is independent of n. This new apportionment method is (by its derivation) unbiased in our probabilistic model not only for all states en masse, in the sense that neither large nor small states have any systematic advantage, but also, more strongly, is unbiased for each state-subset in each quota-interval [n, n+1], in the sense that the expected additive adjustment of the number of seats caused by rounding is zero. In the K→0+ limit it reduces to the Webster formula y=n+½, and in the K→∞ limit it reduces to the Adams formula y=n+0; for intermediate K it provides intermediate formulae.
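As a sanity check on that claim, here is a small numerical sketch (my own, in Python): it integrates the expected additive rounding adjustment over a single quota-interval [n, n+1] under the density Kexp(-Kx), using the threshold y=n+d with d from the formula above, and gets ~0 for every n:

    import math

    K = 50/435
    d = math.log(K / (1 - math.exp(-K))) / K          # ~0.4952112551

    def expected_adjustment(n, y, K, steps=200000):
        # E[ rounded(x) - x ] restricted to quotas x in [n, n+1], where
        # rounded(x) = n for x < y and n+1 for x >= y; midpoint-rule integration.
        h = 1.0 / steps
        total = 0.0
        for i in range(steps):
            x = n + (i + 0.5) * h
            rounded = n if x < y else n + 1
            total += (rounded - x) * K * math.exp(-K * x) * h
        return total

    for n in range(6):
        print(n, expected_adjustment(n, n + d, K))    # all ~0 (up to integration error)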
Our simple apportionment proposal does not provide a 1-seat minimum; it will award small-enough states zero seats. If you want to have a 1-seat minimum, then the first rounding point y (between 0 and 1) needs to be replaced by y=0. The other rounding points y then are still given by the usual formula y=n+d, for n=1,2,3,... but now with a different value of d. Again you can choose d to optimize historical performance, or in the same theoretical model we have the exact formula
d = ln( K / [(exp(K) - 1)(1 - K)] ) / K
which when K=50/435 yields d≈0.557504703. [Note this formula is only valid when 0<K<1.] This formula was derived by choosing d (i.e. solving for y=n+d) to zero the total expected additive adjustment
∫x=0..1 (1-x) Kexp(-Kx) dx + ∑n≥1 [ ∫x=n..n+d (n-x) Kexp(-Kx) dx + ∫x=n+d..n+1 (n+1-x) Kexp(-Kx) dx ],
the first term arising because quotas below 1 are now always rounded up to 1.
The resulting apportionment method then is only unbiased for all states en masse (albeit if K>1 then unbiasedness is unachievable) and not for each state-subset in each quota-interval [n, n+1] individually.
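Again a quick check seems worthwhile. The sketch below (mine; the per-interval closed forms are my own evaluation of the same integrals) computes d from this formula and confirms that the en-masse expected adjustment indeed vanishes:

    import math

    K = 50/435
    d = math.log(K / ((math.exp(K) - 1) * (1 - K))) / K
    print(d)                                          # ~0.557504703

    # Per-interval expected adjustments (closed forms), with q = exp(-K):
    #   interval [0,1], always rounded up to 1 seat:   1 + (q-1)/K
    #   interval [n,n+1], n>=1, threshold n+d:         q^n * ( exp(-K*d) + (q-1)/K )
    q = math.exp(-K)
    total = 1 + (q - 1)/K + (q / (1 - q)) * (math.exp(-K*d) + (q - 1)/K)
    print(total)                                      # ~0: d was chosen to zero this sum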
It is also possible to start from other underlying simplistic probabilistic models besides i.i.d. exponential state-population densities. I shall argue soon that these other models make less sense, but disregarding that, one can proceed anyway and see what formulas result. Also, we can work toward different goals than zeroing the expected additive adjustment to the quota (caused by rounding-to-integer); we could instead, e.g., zero the expected additive adjustment to the log of the quota. We have worked out several examples, some of which are given here (a numerical sketch of how such rounding points can be computed follows the list). They all yield rather more complicated formulas, which unfortunately always depend on n:
1. If we aim to null out the expected additive adjustment in a state's log(#seats) by choice of y, then solving
∫x=n..y [ln(n) - ln(x)] Kexp(-Kx) dx + ∫x=y..n+1 [ln(n+1) - ln(x)] Kexp(-Kx) dx = 0
for y yields a formula involving "exponential integrals."
Here the "exponential integrals" Eim(z) for m=1,2,3,... are defined in the halfplane Re(z)>0 by
Eim(z) = ∫t=1..∞ exp(-zt) t^-m dt.
2. If we aim to null out the expected additive adjustment in a state's seat count, divided by the state's population, then we ask for y so that
∫x=n..y [(n-x)/x] Kexp(-Kx) dx + ∫x=y..n+1 [(n+1-x)/x] Kexp(-Kx) dx = 0.
3. If we aim to null out the expected additive adjustment in a state's population/seats ratio (i.e. in its "district size"), then we ask for y so that
∫x=n..y [x/n - 1] Kexp(-Kx) dx + ∫x=y..n+1 [x/(n+1) - 1] Kexp(-Kx) dx = 0.
4. If we aim to null out the expected additive adjustment in a state's population/seats ratio (i.e. in its "district size") times its population, then we ask for y so that
∫x=n..y x[x/n - 1] Kexp(-Kx) dx + ∫x=y..n+1 x[x/(n+1) - 1] Kexp(-Kx) dx = 0.
5. If we aim to null out the expected additive adjustment in a state's log(#seats) times its population by choice of y, then
∫x=n..y x[ln(n) - ln(x)] Kexp(-Kx) dx + ∫x=y..n+1 x[ln(n+1) - ln(x)] Kexp(-Kx) dx = 0.
6. If we aim to null out the expected additive adjustment in a state's seat count, divided by the square root of the state's population, then we ask for y so that
∫x=n..y [(n-x)/√x] Kexp(-Kx) dx + ∫x=y..n+1 [(n+1-x)/√x] Kexp(-Kx) dx = 0.
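Since these conditions depend on n, in practice one would simply solve each of them numerically. Here is a rough sketch (mine; the integrands are my transcription of the verbal descriptions above) that finds the rounding point y in (n, n+1) for alternative #2 by bisection; swapping in the weight 1/sqrt(x) gives alternative #6, and x^(-μ) gives the compromise family mentioned further below:

    import math

    def weighted_adjustment(n, y, K, weight, steps=20000):
        # E[ (rounded(x) - x) * weight(x) ] for quotas x in [n, n+1] under density K*exp(-K*x);
        # rounded(x) = n below the threshold y, n+1 at or above it.
        h = 1.0 / steps
        total = 0.0
        for i in range(steps):
            x = n + (i + 0.5) * h
            rounded = n if x < y else n + 1
            total += (rounded - x) * weight(x) * K * math.exp(-K * x) * h
        return total

    def rounding_point(n, K, weight):
        # The adjustment is positive at y=n (everything rounds up), negative at y=n+1
        # (everything rounds down), and decreases as y grows, so bisection applies.
        lo, hi = float(n), float(n + 1)
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            if weighted_adjustment(n, mid, K, weight) > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    K = 50/435
    for n in range(1, 6):
        print(n, rounding_point(n, K, weight=lambda x: 1.0 / x))   # alternative #2's y for each n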
At the present moment it seems to me that alternative method #2 is the most "morally right" in terms of its underlying theoretical model/goals. However, that is merely my subjective impression. Here is my reasoning:
That method aims to null out the "bias" which (for that method) is defined as the expected |additive change| in the number of seats a state has (caused by the round to integer) divided by state population. Why is this "morally right"? Well, it seems that the unfairness, for an individual person in state k, of having X times as many congressmen as he ideally should, is proportional to
|X-1| / Pk.
Now we have to weight this unfairness per person, by the number of people in the state, which means we multiply by Pk thus cancelling out the denominator to get |X-1|.
But that is exactly what the derivation of the method nulled out, because |X-1| is proportional to the |additive change| in the number of seats, divided by state population.
But Mike Ossipoff is (justifiably) dubious of my "quantified morality" above and currently thinks the original method is the morally best of those here. He argues against me:
You said it was about unfairness per person, which justified dividing by population. But unfairness sometimes isn't something that has to be divided among its recipients. When we're unfair to a state, all of its residents get all of the unfairness, and they don't have to divide it up... Equal representation expectation per person – shouldn't that be the goal for all of us? The original method can be derived from that goal.
It is also possible to take a compromise path "midway between" the Ossipoff and Smith views (i.e. if Ossipoff divides by 1 and Smith by x, then the compromise is to divide by √x), and the result of that would be alternative method #6. More generally one could consider division by an arbitrary power x^μ of x, whereupon the resulting integrals would be expressible in terms of incomplete gamma functions.
There are many reasons, well known to mathematicians and statisticians and associated with such names as Dirichlet, Poisson, and Shannon, why the i.i.d. exponential density is the natural underlying model. We shall explore them, albeit incompletely.
First, there is the observation of Dirichlet. Suppose you want to split up a country's total population P into N state-populations Pk, k=1,...,N (N fixed), so that ∑k=1..N Pk = P and Pk≥0. This set is an (N-1)-dimensional "simplex" in N-dimensional space. The goal of choosing a point randomly from this set (and uniformly, so that equal (N-1)-dimensional areas are equally likely) can be attained by choosing each state population to be an i.i.d. exponential random deviate, and then normalizing the N state populations so that their sum comes out right (to the desired value P). In this process, only exponential random variates work; no other 1-dimensional density function does.
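As a minimal illustration of Dirichlet's observation (my own sketch, in Python): draw i.i.d. exponential deviates and rescale them to the desired total, and you get a uniformly random split of P among the N states:

    import random

    def random_split(P, N):
        # uniform random point on the simplex { P1+...+PN = P, Pk >= 0 }:
        # i.i.d. exponential deviates, rescaled so they sum to P
        e = [random.expovariate(1.0) for _ in range(N)]
        s = sum(e)
        return [P * x / s for x in e]

    print(random_split(3e8, 50))      # one random 50-state split of a 3x10^8 population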
Second, there is the observation of Poisson. Suppose you wish to drop N points randomly and uniformly in the interval [0,N/K] of the real line, where N→∞. This is a model of "density-K random points." Then what is the probability density for the gap between two consecutive points? The answer is that it is exponential: Kexp(-xK) dx. Indeed, it is not even necessary to take the limit N→∞; we can make a very similar statement for each finite N. By that we mean that the following algorithm will generate N random numbers X1..N, uniform in [0,1], in sorted order without actually performing a sort:
s = 0;
for k = 1 to N do
   s = s + ExponentialRandomDeviate;
   Xk = s;
end loop;
s = s + ExponentialRandomDeviate;
for k = 1 to N do
   Xk = Xk / s;
end loop;
Our point is that the gaps between the consecutive Xk (including the initial gap from 0 to X1 and the final one from XN to 1) generated in this way are i.i.d. exponential random deviates, aside from the rescaling at the end.
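For concreteness, here is the same algorithm transcribed into runnable Python (my transcription; the original pseudocode is above):

    import random

    def sorted_uniforms(N):
        # N uniform-[0,1] random numbers produced already in sorted order:
        # partial sums of N+1 i.i.d. exponential deviates, divided by their grand total
        s = 0.0
        X = []
        for _ in range(N):
            s += random.expovariate(1.0)
            X.append(s)
        s += random.expovariate(1.0)
        return [x / s for x in X]

    print(sorted_uniforms(10))        # already sorted, no sort performed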
We also can get exponentials from a discrete model. Suppose we consider all the possible ways of placing P indistinguishable balls into N distinguishable boxes. How many are there? The answer is (P+N-1)!/[(N-1)!P!]. Now if all these configurations are considered equally likely, then in a limit where P→∞ and N→∞ with the average number P/N of balls per box held fixed, what is the probability distribution for the number of balls in one particular box? The answer is that it tends to a (now discrete) exponential distribution, i.e. a geometric distribution, on the non-negative integers, with mean P/N.
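A small numerical check of that limit (my own sketch; the parameter values are arbitrary): the exact probability that one particular box holds j balls is C(P-j+N-2, N-2)/C(P+N-1, N-1), and for largish P and N it is already close to the geometric (discrete exponential) law with mean P/N:

    from math import comb

    P, N = 3000, 50                       # balls (people) and boxes (states)
    m = P / N                             # mean balls per box
    theta = m / (1 + m)                   # geometric law on 0,1,2,... with mean m

    def exact(j):
        # one particular box holds exactly j balls, all configurations equally likely
        return comb(P - j + N - 2, N - 2) / comb(P + N - 1, N - 1)

    def geometric(j):
        return (1 - theta) * theta**j

    for j in (0, 10, 60, 120):
        print(j, exact(j), geometric(j))  # the two columns roughly agree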
In both of the cases above, what is going on is that we are splitting a total population (modeled as "indistinguishable balls" or "the real interval [0,N/K]") into N "state" populations subject only to the constraints that they be nonnegative and have the correct sum, and we are finding that the state populations (in a limit) are exponentially distributed. Because in the USA P=3×10⁸ is large, and N=50 is fairly large, the asymptotic limits are fairly well satisfied in practice, albeit the assumption that the state boundaries are drawn to split up the population "randomly" is of course nonsense (but if you are going to make a simplistic probabilistic model, this is the one).
Third, and finally, there is Claude Shannon. The probability density function on the positive real axis with given mean 1/K and maximum entropy is the exponential density Kexp(-Kx)dx (and similarly the discrete exponential has maximum entropy on the nonnegative integers with specified mean). Maximum entropy is a formal and quantitative version of Occam's razor. That is, this is the "simplest" assumption about a non-negative random variable with a mean, i.e. the assumption which "assumes the least."
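For completeness, here is the standard Lagrange-multiplier sketch of that maximum-entropy fact (my summary, in LaTeX notation):

   maximize  h[f] = -\int_0^\infty f(x) \ln f(x)\, dx
   subject to  \int_0^\infty f(x)\, dx = 1  and  \int_0^\infty x f(x)\, dx = 1/K.
   Stationarity of  \int_0^\infty [ -f\ln f + \lambda_0 f + \lambda_1 x f ]\, dx  in f gives
   -\ln f(x) - 1 + \lambda_0 + \lambda_1 x = 0,  i.e.  f(x) = e^{\lambda_0 - 1} e^{\lambda_1 x};
   the two constraints then force  \lambda_1 = -K  and  e^{\lambda_0 - 1} = K,  so  f(x) = K e^{-Kx}.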
Can we now devise some pairwise optimality theorem or global optimality theorem that these methods obey? (See /Apportion.html for such theorems concerning the classic apportionment methods as well as an introduction to them generally.)
The following should be useful references both for terminology and for previous results and for related history (mostly US):
M.L. Balinski & H. Peyton Young: Fair Representation: Meeting the Ideal of One Man, One Vote (2nd edition), Brookings Institution Press 2001. Young also wrote a more accessible essay summarizing his thoughts for the layman.
Lawrence R. Ernst: Apportionment methods for the House of Representatives and Court Challenges, Management Science 40,10 (Oct. 1994) 1207-1227.
Warren D. Smith: Introduction to Apportionment methods web page.
The sorted-random-numbers-without-sorting algorithm is from:
J. Bentley & J. Saxe: Generating Sorted Lists of Random Numbers, ACM Trans. Math'l Software 6,3 (1980) 359-364.
For standard results on probability distributions see the 5-volume set by N.L. Johnson & S. Kotz (now with N. Balakrishnan and Adrienne W. Kemp in recent editions), J. Wiley & Sons; for combinatorics, R.P. Stanley: Enumerative Combinatorics (Wadsworth & Brooks/Cole); and for information theory and maximum entropy theory, T.M. Cover & J.A. Thomas: Elements of Information Theory, Wiley 1991.
I would also like to acknowledge Dan Bishop and Michael Ossipoff. Bishop programmed a computer apportionment simulator, and Ossipoff came up with the idea of trying to seek an unbiased apportionment method, although his execution of that idea was poor due to incorrect mathematical manipulations and/or confusing expositions (and Ossipoff also did not appreciate the desirability of the exponential density as an underlying model). The present work can be viewed as simply carrying out Ossipoff's idea, but now with correct execution and a better underlying model.