Answer to puzzle #117 – Bayes-Laplace-Dirichlet law & "soft quorum"
Puzzle
Suppose there is some binary quantity B (i.e. yes=1 or no=0, for example "do you think there
should be a $5/gallon gasoline tax?").
You ask a random sample of S≥0 people for their values of B, and the result is that
Y say "yes"
while N say "no"
where Y+N=S.
- Given this data: what is the Bayesian estimate of the probability P that a random person says "yes"?
- And what is the variance in this estimate?
- How can a similar formula be used to make range voting have a "soft quorum"?
Answer a
[Th.Bayes (1702-1761), P.S.Laplace (1749-1827) & J.P.G.L.Dirichlet (1805-1859)]:
If we assume
P has a "prior" distribution uniform on the real interval [0,1],
then the Bayesian estimate of the expectation value of P
(conditioned on the Y yesses, N noes data) is
Expect_{uniform prior}(P | data)  =  [ ∫_{0<u<1} u · u^Y · (1-u)^N du ] / [ ∫_{0<u<1} u^Y · (1-u)^N du ].
Since both integrals are
Euler Beta functions,
they can be evaluated immediately via Euler's formula
∫_{0<u<1} u^(A-1) · (1-u)^(B-1) du  =  Γ(A) Γ(B) / Γ(A+B).
Bayes recognized that the answer was the above ratio of integrals, but I doubt
he was aware of Euler's formula [due to Leonhard Euler (1707-1783)],
in which case he was unable to actually
do the integrals. But Dirichlet was aware of it and thus reached the
final result, which (after algebraic simplification) is
Expect_{uniform prior}(P | data)  =  (Y+1)/(Y+N+2)  =  (Y+1)/(S+2).
Note that this is not quite the same as the naive formula P=Y/S,
although it becomes the same in the limit where S is very large.
One indication that the Bayes-Laplace-Dirichlet formula is superior to the naive one is
how it handles the no-data case Y=N=S=0.
You may also enjoy the case Y=S=1, N=0 where Laplace says Expect(P)=2/3
as opposed to the naive 1.
Obviously, it does not make sense, given a single datapoint "yes," to conclude
that every other human being is also going to answer yes so that our best estimate of humanity's
response is "1.000." We feel from our prior knowledge about human behavior that some people will
probably say "no." Getting a single "yes" datapoint is not enough to cause us to throw all our prior
knowledge about human behavior into the garbage. The Bayes-Laplace-Dirichlet formula is
a way to smoothly, and in a principled way, reduce the relative amount of prior knowledge we incorporate
into our estimate, as more real data becomes available.
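As a sanity check (ours, not part of the original derivation), the following Python sketch follows the Dirichlet route: it applies Euler's formula to both integrals in the ratio and compares the result with the simplified answer (Y+1)/(S+2), including the Y=S=1 case just discussed. Function names are invented for illustration.

    # Posterior mean of P under a uniform prior, computed the "Dirichlet way":
    # apply Euler's formula to numerator and denominator integrals, then divide.
    import math

    def euler_beta(A, B):
        return math.gamma(A) * math.gamma(B) / math.gamma(A + B)

    def posterior_mean_via_gammas(Y, N):
        numerator = euler_beta(Y + 2, N + 1)     # ∫ u · u^Y · (1-u)^N du
        denominator = euler_beta(Y + 1, N + 1)   # ∫ u^Y · (1-u)^N du
        return numerator / denominator

    for Y, N in [(0, 0), (1, 0), (7, 3)]:
        print(Y, N, posterior_mean_via_gammas(Y, N), (Y + 1) / (Y + N + 2))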
Incidentally, if instead of using a uniform prior, we had employed a
Beta(α,β) distribution
[which has mean α/(α+β)]
as our prior, then we would get this more general formula:
Expect_{Beta(α,β) prior}(P | data)  =  (Y+α)/(Y+N+α+β)
and the old formula merely arises as the special case α=β=1.
Call this the "generalized"
Bayes-Laplace-Dirichlet formula.
Note
that the generalized Bayes-Laplace-Dirichlet formula is the same as the naive P=Y/S
formula except that an extra α "yesses" and β "noes"
are artificially adjoined to the set of S real votes.
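Here is a minimal Python sketch (names ours) of the generalized formula, read as the naive Y/S estimate applied after adjoining α artificial yesses and β artificial noes to the real votes:

    # Generalized B.L.D. estimate and its "pseudo-count" reading.
    def bld_estimate(Y, N, alpha=1.0, beta=1.0):
        return (Y + alpha) / (Y + N + alpha + beta)

    def naive_with_pseudocounts(Y, N, alpha=1.0, beta=1.0):
        yes = Y + alpha                      # real + artificial "yes" votes
        total = (Y + N) + (alpha + beta)     # real + artificial votes
        return yes / total                   # naive Y/S on the augmented vote set

    print(bld_estimate(1, 0), naive_with_pseudocounts(1, 0))  # both 2/3 (uniform prior)
    print(bld_estimate(0, 0, alpha=3, beta=1))                # no data: prior mean 3/4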
Answer b: the variance
We can similarly compute the variance (and the standard deviation is its square-root):
Variance_{Beta(α,β) prior}(P | data)  =  (Y+α)(N+β) / [ (Y+N+α+β)² · (Y+N+α+β+1) ].
In the large-S limit this becomes just Variance → YN/S³ ≈ P(1-P)/S, the familiar shrinkage of the uncertainty as the sample grows.
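A quick cross-check (ours) of this variance formula, computing the posterior moments via Euler's formula and comparing with the closed form; function names are invented for illustration:

    # The posterior is proportional to u^(Y+alpha-1) (1-u)^(N+beta-1), so its
    # moments follow from Euler's formula; compare Var = E[P^2]-E[P]^2 with the closed form.
    import math

    def euler_beta(A, B):
        return math.gamma(A) * math.gamma(B) / math.gamma(A + B)

    def posterior_variance_moments(Y, N, alpha=1.0, beta=1.0):
        def moment(k):   # E[P^k] under the Beta(Y+alpha, N+beta) posterior
            return euler_beta(Y + alpha + k, N + beta) / euler_beta(Y + alpha, N + beta)
        return moment(2) - moment(1) ** 2

    def posterior_variance_closed(Y, N, alpha=1.0, beta=1.0):
        T = Y + N + alpha + beta
        return (Y + alpha) * (N + beta) / (T * T * (T + 1))

    print(posterior_variance_moments(7, 3), posterior_variance_closed(7, 3))  # both 2/117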
Answer c: Application to "quorum" for range voting (this was explained to us by
Andy McKenzie on 27 June 2008):
The Internet Movie Database (IMDb)
uses a formula of generalized-BLD form to handle range voting for rating movies.
(Specifically, their formula reduces in the "approval voting" case to
the Bayes-Laplace-Dirichlet law but with a constant number of artificial yes and no votes
introduced before any real votes are solicited.)
The IMDb formula is this:
Candidate's "Output Rating"
=
(RV + CQ)/(V + Q)
where
R = The average (mean) score of this candidate as rated by the V voters
V = Total number of voters
Q = Constant "quorum" number of voters
C = Some constant score somewhere in the score-range
(IMDb uses the mean score of all IMDb movies,
currently 6.7 on a 0-10 scale.)
Then (if we were running a range voting election using the IMDb system)
the candidate with the greatest output
rating would win.
This is just like ordinary range voting, except that
an extra Q "artificial votes"
(all with artificial-vote mean-rating C)
are inserted for each candidate before the V real voters speak.
Special cases:
- If Q=2 and C=midrange then this is just our original Bayes-Laplace-Dirichlet uniform-prior formula.
- If C=0 and Q→∞ then this reduces to sum-based (not average-based) range voting: candidate with highest summed-score wins.
- If Q=0 then this reduces to average-based range voting.
- So with finite positive C and Q the IMDb scheme is a compromise between average- and sum-based range voting.
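The following Python sketch (function and variable names ours, ballots invented) implements the IMDb-style output rating and checks the first and third special cases numerically:

    # IMDb-style "output rating": Q artificial ballots scoring C are adjoined
    # to the V real ballots before averaging.
    def output_rating(scores, Q, C):
        V = len(scores)
        R = sum(scores) / V if V else 0.0    # mean score over the real ballots
        return (R * V + C * Q) / (V + Q)

    # Approval-style 0/1 scores with Q=2 and C=midrange=0.5 reproduce the
    # uniform-prior B.L.D. estimate (Y+1)/(S+2).
    ballots = [1, 1, 1, 0, 1, 0, 0, 1]       # Y=5 yesses out of S=8
    Y, S = sum(ballots), len(ballots)
    print(output_rating(ballots, Q=2, C=0.5), (Y + 1) / (S + 2))   # both 0.6

    # Q=0 gives plain average-based range voting.
    print(output_rating([9.0, 8.0, 10.0], Q=0, C=6.7))             # 9.0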
In the special case C=0 (which, if the allowed score range runs from 0 up to some positive value,
maximally disfavors candidates rated by few voters)
the formula simplifies to
Candidate's "Output Rating"
=
RV/(V + Q).
Then a good choice (for election purposes) for Q might be one-fourth
of the maximum number of voters who genuinely rate any candidate.
Advantages of the simplified
B.L.D. formula for use with range voting for "quorum" purposes
1. The formula is simply explained as follows: "use ordinary range voting – highest average rating wins – except we give Q artificial 'zero' ratings to each candidate before the real voting begins." Also, even if C isn't zero, you can still explain it as "artificially adding Q ballots which rate all candidates at C."
2. If there are few votes, the B.L.D. formula tries to use the data most effectively to deduce the best statistical estimate of the "true" mean score for a candidate.
3. The B.L.D. formula can also be used to "downgrade" little-known candidates who got rated by few voters. This prevents the nightmare scenario where Hitler gets elected just by himself and a few friends, while 99.99% of the voters do not rate him since they have never heard of him. The idea of a "quorum" is that you need to be rated by at least Q voters to win. Lesser-known candidates could theoretically benefit from a bias that the few people who have heard of them tend to favor them abnormally much, although in practice they usually suffer much more from the bias that a substantial fraction of the people who have never heard of them automatically give them 0s rather than NO OPINION scores as a "safety measure." The appropriate value of Q to remove the former type of bias is Q ≈ the typical number of fanatical supporters that anybody running for that kind of seat can secretly muster. The appropriate value of C to reduce the latter type of bias is C ≈ the average rating of all candidates. (A toy numerical illustration appears after this list.)
4. Depending on which parameters are inserted into the formula, it can accomplish either or both of purposes 2 and 3.
5. Our formula does not exhibit a sudden "hard brick wall" cutoff in which those rated by fewer voters than the quorum can never win. [Incidentally, honeybees employ a hard-quorum type range-voting scheme.] Such sudden cutoffs could be tempting targets for those trying to "game the system" or those trying to criticize the voting system. Instead, with B.L.D. the downgrading is "continuous" and the quorum is "soft." A candidate rated by few voters might still be able to win if his opponents have low enough ratings.
6. If all candidates are rated by the same number of voters, then our formula becomes equivalent to ordinary range voting – the candidate with the greatest average rating wins.
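Here is the toy illustration promised in item 3 (all numbers are invented for the example): a candidate adored by a handful of voters versus a broadly known candidate, compared under plain averaging and under the soft-quorum rating with C=0; the last line also checks item 6's claim for equal-size electorates.

    # Toy illustration of the "soft quorum" effect (made-up numbers).
    def output_rating(scores, Q, C):     # same helper as above, repeated so this snippet runs alone
        V = len(scores)
        R = sum(scores) / V              # mean of the real scores
        return (R * V + C * Q) / (V + Q)

    A = [10.0] * 3       # candidate A: rated 10 by only 3 voters (himself + friends)
    B = [7.0] * 1000     # candidate B: rated 7 by 1000 voters

    print(sum(A) / len(A), sum(B) / len(B))                              # plain averages: A looks best
    print(output_rating(A, Q=50, C=0.0), output_rating(B, Q=50, C=0.0))  # soft quorum: B wins

    # With equal numbers of voters the soft quorum preserves the average-based ranking.
    print(output_rating([9.0] * 100, 50, 0.0) > output_rating([7.0] * 100, 50, 0.0))  # True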
On the other hand, a disadvantage of the IMDb scheme is that it adds 2 new "dials" (C & Q)
that can be turned.
There would be an incentive for widely known candidates to argue for
making Q as large as possible (or that C should be zero) in order to
hurt candidates who aren't as widely known.
Ivan Ryan and Andy McKenzie helped W.D.Smith to create this page.