## Puzzle #102: Probability Condorcet Winner exists

Puzzle:
In the random election model (also called the "impartial culture")

1. Find an exact formula, expressed as a 1-dimensional integral, for the probability P(N) that a "Condorcet Winner" (i.e. a candidate who pairwise-defeats all the others by majority-vote) exists in a V-voter N-candidate election in the V→∞ limit. [Hint: it will help immensely to have solved puzzle #99 first.]
2. Then find a simpler asymptotic formula, valid when N→∞, not involving any integrals.
3. Your formula should prove that the probability that a Condorcet Winner exists is less than N-0.99 for all sufficiently large N. Prove this is also true in the Dirichlet model.

By Warren D. Smith, January 2009

### Exact Formula for P(N)

a. Consulting the answer to puzzle #99, we find that candidate A defeats rival B if XAA>XBB+XAB where all three quantities XAA, XBB, and XAB are independent standard normal random deviates (mean=0, variance=1). We can equivalently regard XBB+XAB just as a single normal random deviate with mean=0 and variance=2. Hence A defeats all N-1 rivals if 2-1/2XAA is greater than the maximum among N-1 independent standard normal random deviates. Letting F'(u)=(2π)-1/2exp(-u2/2) be the standard normal density and F(u)=[1+erf(2-1/2u)]/2 be its CDF (i.e. integral), we obtain the incredibly simple formula

Prob(A is the Condorcet Winner) = ∫-∞<u<∞ F(2-1/2u)N-1F'(u)du.

To get the probability any Condorcet winner exists, we simply multiply by the number N of candidates. (That is valid by symmetry – all candidates are the same – and the fact there cannot be more than one Condorcet Winner so these probabilities are summable since they are for disjoint events.) Result:

 P(N)   =   Prob(Condorcet Winner Exists among N candidates)   =   N∫-∞

Using this formula we find

P(1)=P(2)=1, P(3)≈91.2260172%, P(4)≈82.4520344%, P(5)≈74.8688265%, P(6)≈68.4763934%, P(7)≈63.0816461%, P(8)≈58.4914955%, P(9)≈54.5473641%

which each agree with computations by other authors by other methods to all decimal places stated. Note that

1-P(3) = [1-P(4)]/2 = [3arcsec(3)-π]/(2π) ≈ 0.087739828045910905

is sometimes called "Gilbaud's number" since G.Th.Gilbaud stated it in a footnote in 1952. Continuing on (using numerical integration which ought to be accurate to all decimal places stated) we find

P(10)≈51.125186%,    P(100)≈8.8136522%,    P(1000)≈1.1480128%,    P(10000)≈0.1365%,    P(105)≈0.01554%,    P(106)≈0.001723%.

It is trivial to see from our formula that P(N)→0 as N→∞. Robert M. May claimed, in a paper rife with errors and lacking in details [Some mathematical remarks on the paradox of voting, Behavioral Science 16 (1971) 143-151] that

P(N) ∼ [8πlnN]1/2/N

in the N→∞ limit. May's formula predicts

P(10)∼76.07%,    P(100)∼10.758%,    P(1000)∼1.3176%,    P(10000)∼0.1521%,    P(105)∼0.01701%,    P(106)∼0.001863%.

This numerical evidence seems at least somewhat compatible with the hypothesis May's formula has relative error of order 1/lnN, so he probably did not get it hugely wrong. Still, we now shall derive our own formula and find May got the right behavior but wrong constant factor.

### Asymptotic behavior of P(N)

b. We now derive our own formula for the asymptotics of P(N) in a much simpler way than May by examining our integral and noting the integrand is everywhere positive and has a "peak" which almost all of the mass lies close to. In the large-N limit this peak is asymptotically Gaussian and the integrand is exponentially small when N is large if we are ≥ε>0 away from the peak. The strategy is to evaluate the location, height, and width of the peak to estimate the value of the integral using the usual formula for integrating a Gaussian peak. Unfortunately the details are rather messy, but still appear to be far simpler than May's (full and unstated) derivation.

Using the fact [Abramowitz & Stegun: Handbook of Mathematical Functions, eq 26.2.12] that

1-F(x) ∼ F'(x)/x · [1 - x-2 + 3x-4 - 15x-6 + 105x-8 -...]

when x→+∞, our integrand is then

(2π)-1/2 [1 - u-1π-1/2exp(-u2/4) (1 - 2u-2 + 223u-4 - 2315u-6 +...) ]N-1 exp(-u2/2) du

for large positive u (which is the only part of the integral that matters; the rest has exponentially small effect). We will not actually need the asymptotic series; merely its first term will suffice. The u which maximizes this integrand – the location of the "peak" – is approximately the same as the u which maximizes

(N-1)u-1π-1/2exp(-u2/4) + u2/2 + ln(2π)/2

(which is essentially just the negated integrand's logarithm, up to relatively neglectible terms) i.e. the u which solves

u = π-1/2(N-1)[2-1+u-2]exp(-u2/4)

i.e. approximately the same (up to relatively neglectible terms) as the u which solves

2 π1/2 u = (N-1) exp(-u2/4)

i.e.

upeak ≈ ApproxPeakLocation(N) = π-1/2(N-1)/2 · exp( -W((N-1)2/(8π))/2 )

where the "Lambert W function" W(x) is defined by exp(W)W=x. [R.M.Corless, G.H.Gonnet, D.E.G.Hare, D.J.Jeffrey, D.E.Knuth: On The Lambert W Function, Advances in Computational Mathematics 5 (1996) 329-359.] This formula for the location of the maximum seems accurate to about 1 part in 300 (or better) when N≥100.

Actual vs Approximate peak locations.
NActual Peak LocnApprox Peak Locn
101.571.4744
1002.99902.9896
10004.11304.1121
100005.05.0315
1055.85.8258
1066.5 6.5339

Those who dislike the Lambert W function and prefer more pedestrian functions can use the fact that W(x)∼ln(x)-lnln(x)+o(1) when x→+∞ to find

upeak   ≈   21/2 [2ln(N-1)-ln(8π)]1/2 + o(1)   ∼   2(lnN)1/2

but these are much less accurate and I do not recommend them.

Via complicated but straightforward algebraic manipulations fortunately made easy using the MAPLE symbolic algebra system, one may now show that the width of the peak (that is, [-1/D]1/2 where D is the second derivative of the logarithm of the integrand evaluated at the peak location; this for a Gaussian is just its "standard deviation") is asymptotic to [2lnN]-1/2 for large N. And the height of the peak (that is, the maximum value of the integrand) is asymptotic to 8e-2(2π)1/2N-2lnN.

The usual formula for the integral of a Gaussian with height H and width W is ∫Hexp(-u2/(2W2))du=(2π)1/2HW. Upon using this [and remembering that the whole integral needs to be multiplied by N to get P(N)], we find as our asymptotic estimate

P(N)   ∼   21/28πe-2 (lnN)1/2/N

This agrees with May's estimate except for the constant factor. May's constant factor is (8π)1/2≈5.01326 whereas ours is 21/28πe-2≈4.81023.

This formula predicts

P(10)∼72.99%,    P(100)∼10.323%,    P(1000)∼1.2643%,    P(10000)∼0.1460%,    P(105)∼0.01632%,    P(106)∼0.001788%.

In every case our formula is closer to the truth than May's formula. Anybody sufficiently determined should be able (using our same "saddlepoint method" to analyse our integral) to compute a series of successively-smaller-order correction terms to get even more asymptotic accuracy, see e.g. [C.M.Bender & S.A.Orszag: Advanced Mathematical Methods for Scientists and Engineers, McGraw-Hill 1978, reprinted Springer 1999].

One more-accurate formula is simply to use our highly-accurate formula (above, involving the Lambert W function) for ApproxPeakLocation(N); i.e. we define

Hgt(A) = F(2-1/2A)N-1F'(A)

and

Wid(A) = [-(d2/dA2) ln Hgt(A)]-1/2 =
= [(N-1)F'(A)2F'(A/√2)2 - (N-1)F'(A)2F''(A/√2)F(A/√2) - 2F'(A)F'''(A)F(A/√2)2 + 2F''(A)2F(A/√2)2]-1/2   21/2 F'(A/√2)F(A)

and then

P(N)   ≈   (2π)1/2 N Hgt(ApproxPeakLocation(N)) Wid(ApproxPeakLocation(N)).

This more-accurate formula predicts

P(1)∼100%,    P(2)∼97.6613%,    P(3)∼87.3804%,    P(4)∼78.4376%,    P(5)∼71.1734%,    P(6)∼65.1984%,    P(7)∼60.1948%,
P(10)∼49.086%,    P(100)∼8.610%,    P(1000)∼1.1195%,    P(10000)∼0.1327%,    P(105)∼0.01507%,    P(106)∼0.001669%.

This indeed is more accurate than our preceding formula (which in turn was more accurate than May) in every case. Since it is accurate to ≤5% relative |error| for all the tabulated N, presumably it has ≤5% relative |error| for every N≥1.

c. This theorem (that the probability that a Condorcet Winner exists is less than N-0.99 for all sufficiently large N in the Dirichlet model) is proved as "theorem 2" in Anthony Quas: Anomalous outcomes in preferential voting (pdf), Stochastics & Dynamics 4,1 (2004) 95-105.