Nate Silver discusses OSCAR voting system; was Toy Story 3 robbed in 2011?

The following by Warren D. Smith is based on, and includes excerpts from, Nate Silver's New York Times 24 January 2011 blog on the voting methods used to award OSCAR film academy awards.

Silver's piece was pointed out by professional Instant Runoff Voting (IRV) propagandist Rob Richie in a Huffington Post piece published 2013-01-11; Richie described Silver's analysis as "fascinating." But as usual, Richie completely forgot to mention that Silver's analysis exposed considerable problems with the OSCAR awards' IRV system, and indeed IRV appears to have robbed Toy Story 3 of what should have been a historic victory as the first ever best picture award for an animated film. We'll give the details and explain 7 problems with IRV as they arise one by one during Silver's analysis.

Silver (although he does not realize he is doing so) compares the current OSCAR final-round voting system, based on instant runoff, versus the score voting system used by Metacritic and by Rotten Tomatoes' "tomato meter." These and other things like the Internet Movie Database and Yahoo movies work by averaging ratings from large numbers of ordinary people, and/or professional or amateur film critics.

Silver constructs a slightly artificial scenario with these 10 films as contenders: The Social Network, Black Swan, The Fighter, Inception, The King's Speech, The Kids Are All Right, True Grit, Toy Story 3, Blue Valentine, and Winter's Bone.

Silver actually predicted those 90% right – the actual nominees in 2011 were the same except that Blue Valentine was replaced by the survival story 127 Hours.

IRV Problem #1: Silver then observes that not all the critics reviewed all the movies. ("You think the Academy's voters have seen all of them either?" he asks.) This problem is trivial for score voting to handle; MetaCritic simply averages the scores it gets. But as Silver observes, it is not so easily handled by other voting systems like instant runoff (IRV), the system the OSCARs currently (2012; probably foolishly) use. Silver handles that by "restricting the analysis to those critics who reviewed at least half of them – this leaves us with a 40-person panel – assigning a lukewarm score of 65 (on 0-100 scale) to any movies that the critic bypassed."

It is rather sad that we need to begin by throwing out and/or faking a lot of our data-set (vote set). I daresay Silver as a statistics professional absolutely hates to discard data when he does not really have to. But that is an example of the price you pay for having a stupid voting system.

IRV Problem #2: Silver then observes that there is "yet another problem: it was quite common for one or more of the critic's choices to have gotten the same score." Again, that is no problem at all for score voting. But, again, for stupid voting systems like instant runoff, this is illegal and hence a major problem. Silver "solves" this problem by "drawing lots" (i.e. random tie-breaking imposed by Silver). Again, it is sad to alter the vote-set, i.e. data-set, but he does so because he is forced to by the stupid voting system.

So now, Silver has 40 "voters" and for each has artificially constructed a 10-film preference order ballot from their movie reviews plus (unfortunately) both random tie-breaking and fake-score-insertion, altering and editing those votes to make them legal for the stupid OSCAR instant runoff process. (These changes were considerable. Silver gives as an example J.R.Jones' ballot which involved 5 artificial random tie breaks plus 2 fake scores among the 10 films!)
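The ballot construction just described can be sketched in a few lines of Python. This is only an illustration, not Silver's actual code: the function name scores_to_ranking, the four-film list, and the example critic are all hypothetical. The only things taken from the text are the fill-in score of 65 and the random tie-breaking.

```python
import random

FILL_SCORE = 65  # Silver's "lukewarm" score for films a critic skipped

def scores_to_ranking(scores, films, rng):
    """Turn one critic's 0-100 scores into a strict ranking, best first.

    Films the critic did not review get FILL_SCORE; films with equal
    scores are put in random order ("drawing lots").
    """
    filled = {film: scores.get(film, FILL_SCORE) for film in films}
    # Sort by descending score; a random secondary key breaks ties.
    keyed = sorted((-filled[film], rng.random(), film) for film in films)
    return [film for _, _, film in keyed]

films = ["The Social Network", "True Grit", "Toy Story 3", "Inception"]
# Hypothetical critic: two films tied at 90, True Grit not reviewed.
critic = {"The Social Network": 90, "Toy Story 3": 90, "Inception": 50}
ranking = scores_to_ranking(critic, films, random.Random(0))
print(ranking)
```

Note that the resulting ballot claims this critic prefers one of the two 90-rated films over the other, and prefers True Grit over Inception, although the critic said neither.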

Silver now runs that instant runoff process. As a result, the films get eliminated one by one in this order:

  1. True Grit. IRV Problem #3: Silver comments that even though TG was the first to be eliminated by the instant runoff process, it was "rated considerably better on average than movies like Inception and Black Swan." This would seem to be unfair, and as Silver points out, the source of the problem is that IRV's elimination decision is based solely on how many critics rated TG (or Inception, or whatever) top, utterly ignoring and discarding the fact that lots of voters rated TG above Inception (or not) with neither top. A major problem with IRV is the fact that it ignores a large part of (in fact in theoretical models, asymptotically 100% of!) the information expressed by voters on their ballots. Silver indeed notes that a film that was everybody's second choice would be eliminated instantly by IRV, even though it is probably the best film, and this was very [nearly] "the situation that True Grit found itself in."
  2. The Fighter
  3. Inception
  4. Blue Valentine
  5. IRV Problem #4: At this point, Silver complains, we have an exact 2-way tie between Black Swan and Kids Are All Right over which to eliminate next. With score voting such ties are much less of a problem since they arise much less frequently (especially if using ratings on 0-100 scale and full vote set without a lot of voters discarded). With IRV, ties are quite common because there are many rounds, each of which could be tied, and the votes each are 2-valued for each round, not 101-valued. That is why IRV is one of the worst voting systems from the point of view of tie risk. Silver solves the problem in this case by breaking the tie using average score, i.e. score voting, thus implicitly endorsing it (even though Silver never actually uses the explicit words "score voting" in his article). Thus Black Swan is eliminated.
  6. The Kids Are All Right
  7. Winter's Bone
  8. At this point, the three remaining contenders are Social Network (17 voters), The King's Speech (12), and Toy Story 3 (11=fewest, hence eliminated).
  9. Finally Social Network wins with 23 versus King's Speech (17).

IRV Problem #5: Complexity. Note how complicated the above process was, as compared to simply picking greatest average score as the winner as in score voting.
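For comparison, here is roughly all that score voting requires, as a minimal Python sketch. The three ballots and film names "A", "B", "C" are made up, and score_winner is a hypothetical helper. A film a voter did not rate is simply left out of that film's average, so nothing is discarded and no fake scores are inserted.

```python
def score_winner(ballots):
    """Return (winner, averages) for a list of {film: score} ballots.

    Each film's average is taken over the voters who actually rated it.
    """
    totals, counts = {}, {}
    for ballot in ballots:
        for film, score in ballot.items():
            totals[film] = totals.get(film, 0) + score
            counts[film] = counts.get(film, 0) + 1
    averages = {film: totals[film] / counts[film] for film in totals}
    return max(averages, key=averages.get), averages

# Hypothetical ballots; no voter rated all three films.
ballots = [
    {"A": 90, "B": 70},
    {"A": 80, "C": 95},
    {"B": 60, "C": 70},
]
winner, averages = score_winner(ballots)
print(winner, averages)
```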

Silver notes that this winner in his pseudo-election was "no surprise" because Social Network had in fact just won the Critics' Choice award and the highest Metacritic score using score voting. Silver therefore prognosticated that Social Network would win the 2011 OSCAR best picture award.

He was wrong: the official winner was The King's Speech. Comparing with score voting (now using rating data from Feb. 2013 to gain extra benefits from hindsight):

Film                     IMDb users   Metacritic   TomatoMeter   Avg tomato critic/user   Yahoo movies
King's Speech            8.2          88           94            8.6, 4.3                 4.5 stars
Social Network           7.9          95           96            9.0, 4.2                 4.0 stars
Toy Story 3              8.5          92           99            8.8, 4.3                 4.5 stars
Winter's Bone            7.3          90           94            8.3, 3.7                 4.0 stars
The Kids are All Right   7.9          86           93            7.8, 3.6                 3.5 stars

Notes on the table: Typical IMDb ratings based on 260,000 raters each; typical tomatos based on 170,000; typical yahoos based on 21,000. These counts suggest that statistical "noise" in an IMDb rating (as a percentage of that rating) should be about ±0.3% or less, which is below the roundoff error. Thus the IMDb rating of King's Speech should really be "8.2±0.02" which is below the error ±0.05 due to IMDb rounding it off to one decimal place. "TomatoMeter" combines the scores of "approved critics." "Avg tomato critic/user" gives two numbers: average critic score then average user score. (Metacritic also provides user-ratings but presently typically only based on 1000 raters hence we ignore them.)

So although King's Speech was a defensible choice as 2011 OSCAR winner versus Social Network (although SN perhaps was a slightly better choice) there is a major problem:

IRV Problem #6 – it looks like Toy Story 3 was the best choice (based on the data above)!

Assuming that Toy Story 3 really was the best choice, why was it robbed by the IRV voting system (which made it finish only third in Silver's simulation)? One possible explanation would be that the vote among the final three could have been something like this:

#voters their vote
11 Toy > King > Social
12 King > Toy > Social
17 Social > Toy > King

IRV Problem #7: In this artificial scenario (which perhaps was basically what happened – we do not know because the Academy keeps its votes secret), the IRV system eliminates Toy whereupon King wins by 23=11+12 versus 17 for Social. (As in fact happened, i.e. King really did win, although in Silver's incorrect forecast the final King vs Social votes were reversed, it was 23-17 the other way.) But note, in this scenario, that Toy would overwhelmingly win a head-to-head vote versus King by 28-12, and would also have won a head-to-head vote versus Social by 23-17. (It might be more realistic to alter the votes in this scenario a bit, e.g. change "12 and 17" to "14 and 15" and make the 11 Toy-top voters split their second choices among King and Social, but neither would alter anything important.) So if this were what actually happened, then we would have to conclude that the Academy's poor IRV voting system robbed Toy Story 3 of its deserved victory.
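The arithmetic of this scenario can be checked with a short Python sketch. The ballots are exactly the 11/12/17 scenario above. The helper names irv and head_to_head are my own, and this bare-bones IRV assumes every ballot ranks every film and does no tie handling.

```python
from collections import Counter

# (number of voters, ranking best-first), copied from the scenario above.
ballots = [
    (11, ["Toy", "King", "Social"]),
    (12, ["King", "Toy", "Social"]),
    (17, ["Social", "Toy", "King"]),
]

def irv(ballots):
    """Instant runoff: drop the film with the fewest top-choice votes
    until some film holds a majority. Returns (winner, final tally)."""
    remaining = {film for _, ranking in ballots for film in ranking}
    while True:
        tally = Counter({film: 0 for film in remaining})
        for count, ranking in ballots:
            # Each ballot counts for its highest-ranked surviving film.
            top = next(f for f in ranking if f in remaining)
            tally[top] += count
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()):
            return leader, dict(tally)
        remaining.remove(min(tally, key=tally.get))

def head_to_head(ballots, a, b):
    """Votes for a and for b if only those two films were running."""
    a_votes = sum(n for n, r in ballots if r.index(a) < r.index(b))
    b_votes = sum(n for n, _ in ballots) - a_votes
    return a_votes, b_votes

print(irv(ballots))                            # King wins 23 to 17
print(head_to_head(ballots, "Toy", "King"))    # (28, 12)
print(head_to_head(ballots, "Toy", "Social"))  # (23, 17)
```

So the film that IRV eliminates first in this scenario is the one that beats each rival pairwise.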

That would have been historic as the first time ever that an animated film won best picture. And based on the amalgamated opinion of hundreds of thousands of raters plus about 100 critics, it appears to have deserved it!   Note in the data table, Toy Story 3 equalled or beat King's Speech according to every single rating method for every single rater-group, plus also was the top-grossing picture of 2010.
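The "equalled or beat in every column" claim can be checked mechanically. In the Python sketch below the numbers are copied from the data table, with the two-number "critic/user" column split into two entries; the helper name equals_or_beats is hypothetical.

```python
# Columns: IMDb users, Metacritic, TomatoMeter, tomato critic average,
# tomato user average, Yahoo stars (values copied from the data table).
ratings = {
    "King's Speech":  (8.2, 88, 94, 8.6, 4.3, 4.5),
    "Social Network": (7.9, 95, 96, 9.0, 4.2, 4.0),
    "Toy Story 3":    (8.5, 92, 99, 8.8, 4.3, 4.5),
}

def equals_or_beats(a, b):
    """True if film a scores at least as high as film b in every column."""
    return all(x >= y for x, y in zip(ratings[a], ratings[b]))

print(equals_or_beats("Toy Story 3", "King's Speech"))   # True
print(equals_or_beats("Toy Story 3", "Social Network"))  # False
```

The second check comes out False because Social Network leads on Metacritic (95 versus 92) and on average tomato critic score (9.0 versus 8.8), so the column-by-column sweep holds against King's Speech only.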

Whodunit? It is now fairly clear Toy Story 3 was robbed. But what to blame this on is not clear (especially since the Academy keeps its votes secret). The obvious possible culprits are

  1. Their IRV voting system;
  2. Statistical noise due to the fact the Academy has 5783 voting members as of 2012, which is much smaller than the number of raters (of order 1 million) who contributed to the numerical movie-quality ratings tabulated above. This would be expected to alter each side's total in a 50-50 two-choice vote by ±38 votes worth of "noise" typically, i.e. ±1.3% as a percentage of the vote being altered. This is large enough, and Toy Story 3's superiority versus King's Speech small enough, to make it conceivable that this was the culprit.
  3. Sample bias because the Academy's members simply are a different kind of people than "IMDb raters" or "professional film critics." E.g. are they, for some reason, more biased against animated films than the general public or film critics? The only previous nominations of an animated film for best picture were Beauty and the Beast in 1991 (but it then lost to the non-animated Silence of the Lambs in a decision wholly supported by IMDb and tomato raters and critics) and Up in 2009 (which lost to the non-animated Hurt Locker in a decision that metacritic and tomato raters supported but tomato critics and IMDb raters both dispute). So I see no convincing evidence the Academy is biased against animated films (at least, once they've achieved nomination).
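The ±38 votes and ±1.3% figures in item 2 can be reproduced as follows. This Python sketch assumes (my assumption, as a simple model) that each of the 5783 members votes independently like a fair coin flip in a two-choice contest, so one side's total is binomially distributed.

```python
import math

members = 5783  # Academy voting members as of 2012, per the text
p = 0.5         # assumed 50-50 two-choice vote

# Standard deviation of one side's vote total under the binomial model.
sigma_votes = math.sqrt(members * p * (1 - p))
# The same noise as a fraction of that side's expected total.
relative = sigma_votes / (members * p)

print(round(sigma_votes), f"{relative:.1%}")  # 38 1.3%
```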
