The ISU (International Skating Union) used to allocate gold, silver, and bronze medals with a scoring system called "BOM." This caused a highly embarrassing (for them) problem at the 1995 World Figure Skating Championships. Late in the event, with all the judges' scores for every completed skate already published, the USA's Nicole Bobek stood in second place and France's Surya Bonaly in third. But then the 14-year-old phenom Michelle Kwan skated surprisingly well (though not well enough to get a medal), and as a result of Kwan's skate, Bobek and Bonaly switched places!
This crazy voting-system pathology became known as "the great flip-flop." Bonaly got silver, Bobek got bronze, and voting theorist Kenneth Arrow had already gotten a Nobel Prize for explaining why such things are unavoidable.
After another amazing last-minute medal flip-flop at the 1997 European men's championships, the ISU had had enough. Its president, Ottavio Cinquanta, rolled out a new scoring system called "OBO" in 1998, promising: "if you are in front of me, then you will stay in front of me!" But that claim was false, and anybody familiar with Arrow's impossibility theorem could have told him instantly that it had to be false.
But the deeper and more important question is: how false? Perhaps, for example, OBO yields fewer flip-flops, although not zero.
A lot of light was cast on that question in 1998 by Sandra J. Loosemore, a PhD computer scientist and skating expert. (Hackers know Loosemore as the author of the GNU C Library Reference Manual, while skaters know her as the founder of "SkateWeb" and webmaster of the World Skating Federation.)
She ran the scores from the 1998 World Junior, European, and World Championships, plus the 1998 Olympics, through a computer program she had written to do OBO scoring, and found numerous flip-flop examples: under OBO they would have occurred in 22 of the 48 events she examined.
Of course, the ISU then ignored all of Loosemore's examples and adopted OBO anyway.
Since the embarrassment continued, in 2003 Cinquanta decided on a new "fix" of the scoring problems: the judges' numerical scores for skaters were made secret, so that nobody would know how any judge had scored any skater. That inspired a fan protest at the 2003 Worlds, where Cinquanta was personally jeered by the audience.
Then in 2004 the ISU adopted an even crazier system in which some judges were randomly chosen to be ignored! As a result, the final placings could, and often did, depend on the random number generator. At the 2006 Worlds, according to an analysis by statistics professor John W. Emerson in Chance 20(2) (2007), Petrova and Tikhonov got the bronze because of the 3 randomly-chosen eliminated judges; with the full 12-judge panel they would have gotten silver. In the European Women's Championships short program, every one of the top 5 skaters' final rankings, except for the gold medalist Irina Slutskaya, was determined by chance, i.e. could have been changed to any desired value by a suitable random draw.
I personally would recommend the simple trimmed-mean voting system for use in judging skating (also diving and gymnastics) events. It is far simpler than either BOM or OBO, it uses the judges' numerical scores instead of throwing that information away, and it truly abolishes "flip-flops."
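For concreteness, here is a minimal sketch of trimmed-mean scoring in Python. The skaters "A" and "B" and their marks are invented for illustration; the point is that each skater's score depends only on her own marks, so later skaters can never reorder earlier ones.

```python
def trimmed_mean(scores, trim=1):
    """Drop the `trim` highest and `trim` lowest marks, average the rest."""
    if len(scores) <= 2 * trim:
        raise ValueError("need more than 2*trim scores")
    s = sorted(scores)
    kept = s[trim:len(s) - trim]
    return sum(kept) / len(kept)

# Invented marks for illustration. Skater A: the 5.8 and 5.2 outliers are
# trimmed, leaving (5.3+5.4+5.5)/3 = 5.4; B's 5.9 outlier likewise goes.
marks = {
    "A": [5.2, 5.4, 5.8, 5.3, 5.5],
    "B": [5.1, 5.9, 5.2, 5.3, 5.2],
}
ranking = sorted(marks, key=lambda s: trimmed_mean(marks[s]), reverse=True)
print(ranking)  # ['A', 'B']
```

Because the scores are absolute numbers rather than relative ranks, adding a new skater merely inserts her into the ordering; it cannot swap two skaters who already skated.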
How do these "flip-flops" occur, exactly? We now present an example Loosemore found in the skating-score data from the men's short program at the 1998 Olympics. In the actual competition these were the 11th- through 16th-placed skaters in the field of 29. To save space we list only these 6 skaters, their scores, and their pairwise tables (rather than the whole field of 29, which would need an enormous 29×29 table).
Here are their marks, listed in skate (chronological) order. Each skater has two rows, the required-elements mark and then the presentation mark, with one column per judge (judges 1–9):

Shmerkin (elements)      | 4.7 | 5.2 | 5.1 | 4.7 | 4.9 | 4.9 | 4.9 | 4.7 | 5.1 |
Shmerkin (presentation)  | 5.0 | 5.4 | 5.4 | 5.0 | 5.2 | 5.3 | 5.2 | 5.2 | 5.2 |
Weiss (elements)         | 4.6 | 5.1 | 4.9 | 4.8 | 4.8 | 5.0 | 4.9 | 4.6 | 5.0 |
Weiss (presentation)     | 5.0 | 5.6 | 5.5 | 5.3 | 5.3 | 5.4 | 5.4 | 5.3 | 5.3 |
Vidrai (elements)        | 5.0 | 5.0 | 5.2 | 5.1 | 5.1 | 5.2 | 5.1 | 5.2 | 5.2 |
Vidrai (presentation)    | 4.8 | 4.9 | 5.1 | 5.0 | 5.1 | 5.1 | 5.1 | 4.9 | 5.0 |
Tamura (elements)        | 4.9 | 5.0 | 4.8 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.1 |
Tamura (presentation)    | 5.2 | 5.2 | 5.3 | 5.0 | 5.2 | 5.0 | 5.1 | 4.8 | 5.3 |
Pashkevitch (elements)   | 4.6 | 4.8 | 4.9 | 4.5 | 4.9 | 4.9 | 4.7 | 4.5 | 4.8 |
Pashkevitch (presentation)| 5.6 | 5.4 | 5.7 | 5.1 | 5.5 | 5.6 | 5.3 | 5.2 | 5.4 |
Zagorodniuk (elements)   | 4.7 | 5.0 | 4.7 | 4.8 | 4.7 | 4.7 | 4.4 | 4.7 | 4.7 |
Zagorodniuk (presentation)| 5.5 | 5.8 | 5.5 | 5.5 | 5.3 | 5.6 | 5.3 | 5.1 | 5.5 |
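The ordinals used below are derived from these marks: each judge ranks the skaters by that judge's own total mark (higher is better), with ties within one judge's totals broken by the first (required-elements) mark, as in a short program. A sketch, with the marks transcribed from the table above (the elements/presentation labeling of the two rows is an assumption, consistent with the first mark acting as the tiebreaker in this data):

```python
# skater: (required-elements marks, presentation marks), judges 1..9,
# transcribed from the marks table above
marks = {
    "Shmerkin":    ([4.7, 5.2, 5.1, 4.7, 4.9, 4.9, 4.9, 4.7, 5.1],
                    [5.0, 5.4, 5.4, 5.0, 5.2, 5.3, 5.2, 5.2, 5.2]),
    "Weiss":       ([4.6, 5.1, 4.9, 4.8, 4.8, 5.0, 4.9, 4.6, 5.0],
                    [5.0, 5.6, 5.5, 5.3, 5.3, 5.4, 5.4, 5.3, 5.3]),
    "Vidrai":      ([5.0, 5.0, 5.2, 5.1, 5.1, 5.2, 5.1, 5.2, 5.2],
                    [4.8, 4.9, 5.1, 5.0, 5.1, 5.1, 5.1, 4.9, 5.0]),
    "Tamura":      ([4.9, 5.0, 4.8, 5.0, 5.0, 5.0, 5.0, 5.0, 5.1],
                    [5.2, 5.2, 5.3, 5.0, 5.2, 5.0, 5.1, 4.8, 5.3]),
    "Pashkevitch": ([4.6, 4.8, 4.9, 4.5, 4.9, 4.9, 4.7, 4.5, 4.8],
                    [5.6, 5.4, 5.7, 5.1, 5.5, 5.6, 5.3, 5.2, 5.4]),
    "Zagorodniuk": ([4.7, 5.0, 4.7, 4.8, 4.7, 4.7, 4.4, 4.7, 4.7],
                    [5.5, 5.8, 5.5, 5.5, 5.3, 5.6, 5.3, 5.1, 5.5]),
}

def ordinals(judge):
    """Return {skater: ordinal} for one judge (0-indexed judge number).

    Sort by total mark, higher first; break ties by the first mark.
    """
    def key(skater):
        tech, pres = marks[skater]
        # round to one decimal to avoid float noise when comparing totals
        return (round(tech[judge] + pres[judge], 1), tech[judge])
    order = sorted(marks, key=key, reverse=True)
    return {skater: place + 1 for place, skater in enumerate(order)}

print(ordinals(0))  # judge 1 ranks Zagorodniuk first, Weiss last
```

Judge 1's output matches the first ordinal column of the final six-skater table below, including the 10.2-point tie among Vidrai, Pashkevitch, and Zagorodniuk being broken by the first mark.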
The BOM scoring system is as follows. Each judge independently ranks the skaters seen so far, best to worst, according to that judge's own marks; a skater's rank from a given judge is called an "ordinal." A skater's "majority place" is then the best (lowest-numbered) place P such that a majority of the judges (at least 5 of the 9, here) gave that skater ordinal P or better. Skaters are ranked by majority place; ties are broken first by the size of the majority (larger is better), then by the total of the ordinals within that majority (smaller is better).
Here are the BOM rankings computed via the above procedure after each skate in our example. Each row of the results shows one skater's ordinals from the 9 judges, followed by the majority written as size/place, plus (in parentheses) the total-ordinal tiebreaker when it is needed. For example, in the first row, Shmerkin has a 5-judge majority ranking him #1 (out of the two skaters marked so far). After three skates, Vidrai, Shmerkin, and Weiss would all be tied at a 6-judge majority for place 2, except that the total-ordinal tiebreaker ranks Vidrai (total 8) first. We list the skaters in rank order after each new skate: e.g., after three skates Vidrai is in first place, Weiss second, and Shmerkin third.
(Scores ranged from 0 to 6 in theory, but from 4.4 to 5.8 in practice for these skaters.)

After 2 skaters:
1. Michael Shmerkin | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 1 | 1 | -- 5/1 |
2. Michael Weiss | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | -- 9/2 |

After 3 skaters:
1. Szabolcs Vidrai | 1 | 3 | 3 | 1 | 1 | 2 | 2 | 1 | 3 | -- 6/2 (8) |
2. Michael Weiss | 3 | 1 | 2 | 2 | 3 | 1 | 1 | 3 | 2 | -- 6/2 (9) |
3. Michael Shmerkin | 2 | 2 | 1 | 3 | 2 | 3 | 3 | 2 | 1 | -- 6/2 (10) |

After 4 skaters:
1. Szabolcs Vidrai | 2 | 4 | 3 | 1 | 1 | 2 | 2 | 1 | 4 | -- 6/2 |
2. Michael Weiss | 4 | 1 | 2 | 2 | 4 | 1 | 1 | 3 | 3 | -- 5/2 |
3. Michael Shmerkin | 3 | 2 | 1 | 4 | 3 | 3 | 4 | 2 | 2 | -- 7/3 |
4. Yamato Tamura | 1 | 3 | 4 | 3 | 2 | 4 | 3 | 4 | 1 | -- 6/3 |

After 5 skaters:
1. Michael Weiss | 5 | 1 | 3 | 2 | 5 | 2 | 1 | 3 | 3 | -- 7/3 |
2. Szabolcs Vidrai | 3 | 5 | 4 | 1 | 2 | 3 | 2 | 1 | 4 | -- 6/3 (12) |
3. Yamato Tamura | 2 | 3 | 5 | 3 | 3 | 5 | 3 | 4 | 1 | -- 6/3 (15) |
4. Michael Shmerkin | 4 | 2 | 2 | 4 | 4 | 4 | 4 | 2 | 2 | -- 9/4 |
5. Igor Pashkevitch | 1 | 4 | 1 | 5 | 1 | 1 | 5 | 5 | 5 | -- 5/4 |

After all 6 skaters:
1. Michael Weiss | 6 | 2 | 3 | 3 | 5 | 2 | 1 | 3 | 3 | -- 7/3 |
2. Szabolcs Vidrai | 4 | 6 | 4 | 2 | 2 | 3 | 2 | 1 | 4 | -- 5/3 |
3. Yamato Tamura | 3 | 4 | 6 | 4 | 3 | 6 | 3 | 4 | 1 | -- 7/4 |
4. Michael Shmerkin | 5 | 3 | 2 | 5 | 4 | 5 | 4 | 2 | 2 | -- 6/4 |
5. Igor Pashkevitch | 2 | 5 | 1 | 6 | 1 | 1 | 5 | 6 | 5 | -- 7/5 |
6. Viacheslav Zagorodniuk | 1 | 1 | 5 | 1 | 6 | 4 | 6 | 5 | 6 | -- 6/5 |
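The size/place entries and tiebreakers in these tables can be recomputed mechanically from the ordinals alone. A minimal sketch in Python, with the ordinal rows transcribed from the tables above:

```python
def bom_majority(ordinals):
    """Return (place, size, total): the best place at which a strict
    majority of judges ranked the skater, the number of judges in that
    majority, and the sum of their ordinals (the tiebreaker)."""
    n = len(ordinals)
    for place in range(1, max(ordinals) + 1):
        backers = [o for o in ordinals if o <= place]
        if len(backers) > n // 2:  # strict majority, e.g. 5 of 9
            return place, len(backers), sum(backers)

def bom_rank(rows):
    """Rank skaters: lower majority place beats higher; ties broken by
    bigger majority, then by smaller ordinal total."""
    def key(skater):
        place, size, total = bom_majority(rows[skater])
        return (place, -size, total)
    return sorted(rows, key=key)

# Weiss's ordinals after all six skaters: a 7-judge majority at place 3,
# i.e. the "7/3" entry in the final table above
assert bom_majority([6, 2, 3, 3, 5, 2, 1, 3, 3])[:2] == (3, 7)

# The three-way tie after three skaters, resolved by ordinal totals:
after_three = {
    "Vidrai":   [1, 3, 3, 1, 1, 2, 2, 1, 3],  # 6/2 (8)
    "Weiss":    [3, 1, 2, 2, 3, 1, 1, 3, 2],  # 6/2 (9)
    "Shmerkin": [2, 2, 1, 3, 2, 3, 3, 2, 1],  # 6/2 (10)
}
print(bom_rank(after_three))  # ['Vidrai', 'Weiss', 'Shmerkin']
```

Running this reproduces the after-three-skaters standings, including the total-ordinal tiebreak described above.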
Notice how Shmerkin, initially ahead of Weiss and later ahead of Tamura, fell behind each of them as more competitors were marked. This is exactly the kind of embarrassing BOM "flip-flop" that the new OBO scoring system was supposedly designed to eliminate.
Under the new OBO scoring system, the scoring is computed using a square table with one row and one column per competitor – in this case a 6×6 table, since there are 6 skaters. Each competitor is compared pairwise (or "one-by-one," hence "OBO") against all the others to fill this table.
In each comparison we note how many "votes" each skater received from the judges, counting it as a (pairwise) "win" for the skater who got the most votes. Then, looking across each skater's row on the scoring sheet, we count the total number of "wins" and use it to rank the skaters overall. Then the total number of "votes" (this is called "JiF" for "judges in favor") is used as a tie-breaker.
Note for voting theorists: using a skater's total number of pairwise wins as his score is called the "Copeland" system in the voting literature, and the "JiF" score is what that literature calls the "Borda" score. OBO is thus Copeland with Borda used as the tiebreaking method. Plain Copeland is generally regarded as unusable because it yields too many ties.
OBO does not change in any way the rules the judges use to assign marks to a skater's performance; it only affects how the marks are combined to produce the overall competition result. So, it is possible to use the same competition data (same marks, same skate order) to illustrate what the results would have been under the old BOM or new OBO (or any other rank-order-based) system.
Again to save space, we state only the final 6×6 table (omitting all the separate worksheets that compare the marks from each judge for each pair of skaters). Each entry shows the "wins" and the number of "votes" in that pairwise comparison, written wins/votes. So, for example, Shmerkin got one "win" over Weiss, based on the fact that 5 of the 9 judges said Shmerkin outperformed Weiss.

Skater | Shmerkin | Weiss | Vidrai | Tamura | Pashkevitch | Zagorodniuk |
Shmerkin | -- | 1/5 | 0/3 | 0/4 | 1/5 | 1/5 |
Weiss | 0/4 | -- | 1/5 | 1/6 | 1/5 | 1/6 |
Vidrai | 1/6 | 0/4 | -- | 1/6 | 0/4 | 1/6 |
Tamura | 1/5 | 0/3 | 0/3 | -- | 1/5 | 0/4 |
Pashkevitch | 0/4 | 0/4 | 1/5 | 0/4 | -- | 1/5 |
Zagorodniuk | 0/4 | 0/3 | 0/3 | 1/5 | 0/4 | -- |
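The standings below can be reproduced mechanically from this table. A minimal sketch of the OBO rule (Copeland score with the JiF/Borda tiebreak), with the votes matrix transcribed from the pairwise table above:

```python
skaters = ["Shmerkin", "Weiss", "Vidrai", "Tamura", "Pashkevitch", "Zagorodniuk"]

# votes[a][b] = number of judges (out of 9) preferring skater a over skater b,
# read off the 6x6 pairwise table above
votes = {
    "Shmerkin":    {"Weiss": 5, "Vidrai": 3, "Tamura": 4, "Pashkevitch": 5, "Zagorodniuk": 5},
    "Weiss":       {"Shmerkin": 4, "Vidrai": 5, "Tamura": 6, "Pashkevitch": 5, "Zagorodniuk": 6},
    "Vidrai":      {"Shmerkin": 6, "Weiss": 4, "Tamura": 6, "Pashkevitch": 4, "Zagorodniuk": 6},
    "Tamura":      {"Shmerkin": 5, "Weiss": 3, "Vidrai": 3, "Pashkevitch": 5, "Zagorodniuk": 4},
    "Pashkevitch": {"Shmerkin": 4, "Weiss": 4, "Vidrai": 5, "Tamura": 4, "Zagorodniuk": 5},
    "Zagorodniuk": {"Shmerkin": 4, "Weiss": 3, "Vidrai": 3, "Tamura": 5, "Pashkevitch": 4},
}

def obo_standings(group):
    """Rank a subset of skaters: most pairwise wins first (Copeland),
    with total judges-in-favor (JiF) as the tiebreaker."""
    def score(a):
        wins = sum(votes[a][b] > votes[b][a] for b in group if b != a)
        jif = sum(votes[a][b] for b in group if b != a)
        return (-wins, -jif)
    return sorted(group, key=score)

print(obo_standings(skaters[:5]))  # standings before Zagorodniuk skates
print(obo_standings(skaters))      # final standings
```

Running this on the first five skaters and then on all six reproduces the last two standings tables below, including Tamura and Pashkevitch trading places once Zagorodniuk's marks enter the table.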
This, of course, represents only a small subset of the competitors in the actual event. Scoring the complete 29-skater competition via OBO would require a 29×29 table with 406 = 29·28/2 total pairings, each of which requires an additional worksheet to compare the marks from the 9 judges, for 407 pieces of paper in all.
Loosemore (who prefers BOM over OBO) says the old BOM system would require only a single 29×9 worksheet – less cumbersome.
Here are the skate-by-skate updates of the standings under OBO:
After 2 skaters:
Skater | Wins | Votes |
1. Shmerkin | 1 | 5 |
2. Weiss | 0 | 4 |
After 3 skaters:
Skater | Wins | Votes |
1. Vidrai | 1 | 10 |
2. Weiss | 1 | 9 |
3. Shmerkin | 1 | 8 |
After 4 skaters:
Skater | Wins | Votes |
1. Vidrai | 2 | 16 |
2. Weiss | 2 | 15 |
3. Shmerkin | 1 | 12 |
4. Tamura | 1 | 11 |
After 5 skaters:
Skater | Wins | Votes |
1. Weiss | 3 | 20 |
2. Vidrai | 2 | 20 |
3. Shmerkin | 2 | 17 |
4. Tamura | 2 | 16 |
5. Pashkevitch | 1 | 17 |
After all 6 skaters:
Skater | Wins | Votes |
1. Weiss | 4 | 26 |
2. Vidrai | 3 | 26 |
3. Shmerkin | 3 | 22 |
4. Pashkevitch | 2 | 22 |
5. Tamura | 2 | 20 |
6. Zagorodniuk | 1 | 19 |
What happened to cause flip-flops in the standings? Not only did the same flip occur as under the old system, with Shmerkin dropping behind Weiss and Vidrai after initially holding the lead, but OBO generated an additional flip-flop when Zagorodniuk's marks caused Pashkevitch and Tamura to swap places!
Looking closely at the table of pairwise comparisons, the problem is that the "who-beats-whom" relationships between the skaters form complicated Condorcet cycles. There is no possible way to rank this group of six skaters without going against at least one of the "pairwise wins." And going against any "win" will (if the skate order is improvident) cause the relative placements of the two skaters concerned to flip-flop.
Indeed, a little thought shows that OBO flip-flops are only possible in the presence of a Condorcet cycle: if the pairwise "beats" relation were a strict linear order on the skaters, then within any subset all the Copeland scores would be distinct, no "tiebreaking" would be needed, and the order would stay fixed as more skaters entered the picture. From this we conclude that Condorcet cycles were present in at least 22 of the 48 year-1998 skating events Loosemore examined.
Such flip-flops are not unique to the old BOM and new OBO systems. Mathematicians have proven that every scoring system based on rank-order votes must produce similar "flip-flops."
A different theorem shows, as an immediate consequence, that the OBO system exhibits "no-show paradoxes," in which a judge, by honestly ranking the skaters, causes a gold medalist she regards as worse to be chosen than if she had refused to score anybody. Indeed, an honest judge's ranking of X worst can be the direct cause of X winning gold, and an honest judge's ranking of X best can be the direct cause of X failing to win gold.
Most voting theorists have been unaware that any "Condorcet voting system" had ever been used in an important real-life application. Recently the Debian project (to elect its leader) and Wikipedia (in some administrative votes) adopted Schulze-beatpaths Condorcet voting. But the OBO voting system had already been used to judge at least 100 top-notch skating events, and it is Condorcet (i.e., a Condorcet winner, whenever one exists, will always win gold).
So we have, in these skating events, a treasure-trove of real data about Condorcet elections, presently in quantity far exceeding all other known Condorcet elections.
Meanwhile BOM is, to a good approximation, median-rank voting with a Borda-like tiebreaker added. That too is of interest, since median-score-based range voting (not the same thing, but related) was proposed by Balinski & Laraki, and then attacked in comparison with ordinary mean-based range voting by Smith et al, all in complete ignorance of the fact that median-rank voting had already been in wide use by skaters and then abandoned as a failure.
How well did BOM and OBO perform? In my view, OBO has been an embarrassing failure for the skating world. (And BOM also failed, not only in my view, but in the view of the International Skating Union itself.) Both BOM and OBO are far too complicated, and both yielded, and continue to yield, huge numbers of embarrassing "flip-flops," each of which is a public-relations disaster. (As I said, I personally would recommend the simple trimmed-mean voting system instead, which is similar to range voting except based on the trimmed mean rather than the mean.)
But it would be better if more skating data were acquired and analysed to estimate the real-world frequencies of "no-show paradoxes," "Condorcet cycles," and other pathologies. That remains to be done.