The BOM and OBO skating scoring systems, embarrassing "flip-flops," and the heretofore largely unknown greatest available dataset of important real-world Condorcet elections

The ISU (International Skating Union) used to use a scoring system called "BOM" for allocating gold, silver, and bronze medals. This caused a highly embarrassing (for them) problem at the 1995 World Figure Skating Championships: after the USA's Nicole Bobek and France's Surya Bonaly had finished their skates, with all the judges' scores for them already published, Bobek stood in second place and Bonaly in third. But then the 14-year-old phenom Michelle Kwan skated surprisingly well (but not well enough to get a medal). As a result of Kwan's skate, Bobek and Bonaly switched places!

This crazy voting-system pathology became known as "the great flip-flop." Bonaly got silver, Bobek got bronze, and voting theorist Kenneth Arrow – whose 1972 Nobel-prizewinning impossibility theorem says such pathologies are unavoidable with rank-order ballots – was vindicated once again.

After another amazing last-minute medal flip-flop happened at the 1997 European men's championships, the ISU had had enough. Its head Ottavio Cinquanta rolled out a new scoring system called "OBO" in 1998, promising "if you are in front of me, then you will stay in front of me!" But that claim was false. And anybody familiar with Arrow's impossibility theorem could have told him instantly that it had to be false.

But the deeper and more important question is: how false? Perhaps, for example, OBO yields fewer flip-flops, even if not zero.

Sandra Loosemore

A lot of light was cast on that in 1998 by PhD computer programmer and skating expert Sandra J. Loosemore. (Hackers know Loosemore as the author of the GNU C Library Reference Manual, while skaters know her as the founder of "SkateWeb" and webmaster of the World Skating Federation.)

She ran the scores from the 1998 World Junior, European, and World Championships, plus the 1998 Olympics, through a computer program she'd written to do OBO scoring, and found numerous examples of flip-flops that would have occurred under OBO.

In all, 22 of the 48 competition segments exhibited flip-flops (28 "flips" in total)!

Of course, the ISU then ignored all Loosemore's examples, and adopted OBO anyway.

Since the embarrassment then continued, in 2003 Cinquanta decided on a new "fix" of the scoring problems – judges' numerical scores were to be kept secret, so that nobody would know how any individual judge had scored any skater. That inspired a fan protest at the 2003 Worlds, at which Cinquanta was personally jeered by the audience.

Then in 2004 the ISU adopted an even crazier system in which some judges were randomly chosen to be ignored! As a result, the final placings could and often did depend on the random number generator. In the 2006 worlds, according to an analysis by statistics professor John W. Emerson in Chance 20, #2 (2007), Petrova and Tikhonov got the bronze because of the 3 randomly-chosen eliminated judges – with the full 12-judge panel, they would have got silver. In the European Women's Championships Short Program, the final ranking of every one of the top 5 skaters, except for gold medalist Irina Slutskaya, was determined by chance, i.e. could have been changed to essentially any value by a different random choice of judges.

I personally would recommend the simple trimmed mean voting system for use in judging skating (diving, gymnastics) events. It is far simpler than either BOM or OBO, it uses the judges' numerical scores instead of ignoring them, and it truly abolishes "flip-flops."
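
To make that recommendation concrete, here is a minimal sketch in Python (the function name and the choice of trimming two marks from each end are my own illustrative assumptions, not an official rule) of how a trimmed-mean score could be computed from nine judges' marks. Since each skater's score depends only on his own marks, a later skater's performance can never reorder two earlier skaters – i.e. flip-flops simply cannot occur.

    # Illustrative trimmed-mean scoring sketch (assumed trim of k=2 marks per end).
    def trimmed_mean(marks, k=2):
        """Drop the k lowest and k highest marks, then average the rest."""
        if len(marks) <= 2 * k:
            raise ValueError("need more than 2*k marks")
        kept = sorted(marks)[k:len(marks) - k]
        return sum(kept) / len(kept)

    # Example: Shmerkin's nine first marks from the table further down this page.
    print(trimmed_mean([4.7, 5.2, 5.1, 4.7, 4.9, 4.9, 4.9, 4.7, 5.1]))   # ≈ 4.9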

What are BOM and OBO, and how do "flip-flops" happen exactly?

We now present an example found by Loosemore inside the skating-score data from the men's short program at the 1998 Olympics. In the actual competition these six were the 11th- through 16th-place skaters in the field of 29. To save space we list only these 6 skaters and their scores and pairwise tables (rather than the whole 29, which would need an enormous 29×29 table).

Here are their marks, listed in their skate (chronological) order:


(Each row lists that skater's marks from judges 1 through 9; each judge awards two marks.)

Michael Shmerkin
  1st mark:  4.7  5.2  5.1  4.7  4.9  4.9  4.9  4.7  5.1
  2nd mark:  5.0  5.4  5.4  5.0  5.2  5.3  5.2  5.2  5.2

Michael Weiss
  1st mark:  4.6  5.1  4.9  4.8  4.8  5.0  4.9  4.6  5.0
  2nd mark:  5.0  5.6  5.5  5.3  5.3  5.4  5.4  5.3  5.3

Szabolcs Vidrai
  1st mark:  5.0  5.0  5.2  5.1  5.1  5.2  5.1  5.2  5.2
  2nd mark:  4.8  4.9  5.1  5.0  5.1  5.1  5.1  4.9  5.0

Yamato Tamura
  1st mark:  4.9  5.0  4.8  5.0  5.0  5.0  5.0  5.0  5.1
  2nd mark:  5.2  5.2  5.3  5.0  5.2  5.0  5.1  4.8  5.3

Igor Pashkevitch
  1st mark:  4.6  4.8  4.9  4.5  4.9  4.9  4.7  4.5  4.8
  2nd mark:  5.6  5.4  5.7  5.1  5.5  5.6  5.3  5.2  5.4

Viacheslav Zagorodniuk
  1st mark:  4.7  5.0  4.7  4.8  4.7  4.7  4.4  4.7  4.7
  2nd mark:  5.5  5.8  5.5  5.5  5.3  5.6  5.3  5.1  5.5

The BOM scoring system is as follows.

  1. Each judge ranks the skaters 1,2,3,... from best to worst, based on that judge's numerical scores. Note that from then onward the old BOM system (as well as the new OBO system) ignores the actual numerical scores (such as "5.9") got by each skater, and only considers the ordinal rankings – e.g. if a judge scores Shmerkin 5.9 and Weiss 2.1, that counts the same as a judge who scores Shmerkin 2.5 and Weiss 2.4.
  2. Find the best ranking X for which the skater has a majority of judges ranking him X or better.
  3. The skaters' X-values are used to produce their overall placements. This usually results in a large number of ties.
  4. To break the ties, we use the majority size for that skater (e.g. a 7-judge majority saying CLINTON is rank 2-or-better, beats a 6-judge majority saying BUSH is rank 2-or-better).
  5. But that still usually leaves lots of ties. So we then use the "total ordinal" over the judges in that majority. The total ordinal is the same thing as what voting theorists call the "Borda score," except that only the ranks awarded by judges in that skater's majority are summed. (E.g. if a 5-judge majority ranks CLINTON rank 6 or better, say their 5 ranks for CLINTON are 3,6,3,2,1, then CLINTON's total ordinal would be 3+6+3+2+1=15.) Smaller total ordinals are better.
  6. Ties can still happen, and if they do then BOM leaves them unbroken.
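
For concreteness, here is a minimal sketch in Python of steps 2-5 above (the function and variable names are mine; the ISU's actual worksheets of course looked nothing like this). It computes, for one skater, the triple – best majority rank X, majority size, total ordinal – that BOM effectively sorts on, and it reproduces the three-skater standings tabulated in the next section.

    # Illustrative sketch of the BOM sort key; names are mine, not the ISU's.
    def bom_key(ordinals):
        """ordinals = one skater's rank from each judge (1 = best)."""
        need = len(ordinals) // 2 + 1             # size of a majority of the judges
        for x in range(1, max(ordinals) + 1):     # step 2: best rank X held by a majority
            majority = [r for r in ordinals if r <= x]
            if len(majority) >= need:
                # step 4: larger majority is better; step 5: smaller total ordinal is better
                return (x, -len(majority), sum(majority))

    # Example: the three-skater stage of the tables below.
    skaters = {
        "Vidrai":   [1, 3, 3, 1, 1, 2, 2, 1, 3],
        "Weiss":    [3, 1, 2, 2, 3, 1, 1, 3, 2],
        "Shmerkin": [2, 2, 1, 3, 2, 3, 3, 2, 1],
    }
    for name in sorted(skaters, key=lambda s: bom_key(skaters[s])):
        print(name, bom_key(skaters[name]))
    # Vidrai (2, -6, 8)   Weiss (2, -6, 9)   Shmerkin (2, -6, 10)
    # -- agreeing with the "After three skaters" table below.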

"Flip-flops" under the old BOM scoring system

Here are the BOM rankings computed via the above procedure after each skate in our example. Each row shows the skater's ordinals from the 9 judges, followed by "majority size / place" (so "5/1" means a 5-judge majority at place 1) and, in parentheses, the total-ordinal tie-breaker when it was needed. For example, in the first row, Shmerkin has a 5-judge majority ranking him #1 (out of the two skaters who have gone so far). After three skates, Vidrai, Shmerkin, and Weiss would be tied except that the total-ordinal tiebreaker step ranks Vidrai first. We list the skaters in rank order after each new skate; e.g. after three skaters Vidrai is in first place, Weiss in second, and Shmerkin third.

(Scores ranged from 0 to 6 in theory, but from 4.4 to 5.8 in practice for these skaters.)

After two skaters:
1. Michael Shmerkin        1 2 1 2 1 2 2 1 1   -- 5/1
2. Michael Weiss           2 1 2 1 2 1 1 2 2   -- 9/2

After three skaters:
1. Szabolcs Vidrai         1 3 3 1 1 2 2 1 3   -- 6/2 (8)
2. Michael Weiss           3 1 2 2 3 1 1 3 2   -- 6/2 (9)
3. Michael Shmerkin        2 2 1 3 2 3 3 2 1   -- 6/2 (10)

After four skaters:
1. Szabolcs Vidrai         2 4 3 1 1 2 2 1 4   -- 6/2
2. Michael Weiss           4 1 2 2 4 1 1 3 3   -- 5/2
3. Michael Shmerkin        3 2 1 4 3 3 4 2 2   -- 7/3
4. Yamato Tamura           1 3 4 3 2 4 3 4 1   -- 6/3

After five skaters:
1. Michael Weiss           5 1 3 2 5 2 1 3 3   -- 7/3
2. Szabolcs Vidrai         3 5 4 1 2 3 2 1 4   -- 6/3 (12)
3. Yamato Tamura           2 3 5 3 3 5 3 4 1   -- 6/3 (15)
4. Michael Shmerkin        4 2 2 4 4 4 4 2 2   -- 9/4
5. Igor Pashkevitch        1 4 1 5 1 1 5 5 5   -- 5/4

After six skaters:
1. Michael Weiss           6 2 3 3 5 2 1 3 3   -- 7/3
2. Szabolcs Vidrai         4 6 4 2 2 3 2 1 4   -- 5/3
3. Yamato Tamura           3 4 6 4 3 6 3 4 1   -- 7/4
4. Michael Shmerkin        5 3 2 5 4 5 4 2 2   -- 6/4
5. Igor Pashkevitch        2 5 1 6 1 1 5 6 5   -- 7/5
6. Viacheslav Zagorodniuk  1 1 5 1 6 4 6 5 6   -- 6/5

Notice how, although Shmerkin was initially ahead of Weiss and later Tamura, that reversed as more competitors were marked. This is exactly the kind of embarrassing BOM "flip-flop" that the new OBO scoring system supposedly was designed to eliminate.

Introduction to the new OBO system

Under the new OBO scoring system, the scoring is computed using a square table with one row and one column per competitor – in this case a 6×6 table, since there are 6 skaters. Each competitor is compared pairwise (or "one-by-one," hence "OBO") to all the others to fill this table.

In each comparison we note how many "votes" each skater received from the judges, counting it as a (pairwise) "win" for the skater who got the most votes. Then, looking across each skater's row on the scoring sheet, we count the total number of "wins" and use it to rank the skaters overall. Then the total number of "votes" (this is called "JiF" for "judges in favor") is used as a tie-breaker.

Note for voting theorists: using a skater's total number of pairwise wins as his score is called the "Copeland" system in the voting literature, and the "JiF" score is what the voting literature calls the "Borda score." OBO is thus Copeland with Borda used as the tiebreaking method. Plain Copeland is generally regarded as unusable because it yields too many ties.
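
As a concrete illustration, here is a minimal Python sketch of that Copeland-plus-JiF tally (variable names are mine), fed with the pairwise "votes" from the 6×6 table given below; its output agrees with the final "After six skaters" standings shown further down.

    # Sketch of the OBO tally: Copeland score (pairwise wins) with "JiF" (Borda) tiebreak.
    votes = {   # votes[a][b] = how many of the 9 judges placed a ahead of b
        "Shmerkin":    {"Weiss": 5, "Vidrai": 3, "Tamura": 4, "Pashkevitch": 5, "Zagorodniuk": 5},
        "Weiss":       {"Shmerkin": 4, "Vidrai": 5, "Tamura": 6, "Pashkevitch": 5, "Zagorodniuk": 6},
        "Vidrai":      {"Shmerkin": 6, "Weiss": 4, "Tamura": 6, "Pashkevitch": 4, "Zagorodniuk": 6},
        "Tamura":      {"Shmerkin": 5, "Weiss": 3, "Vidrai": 3, "Pashkevitch": 5, "Zagorodniuk": 4},
        "Pashkevitch": {"Shmerkin": 4, "Weiss": 4, "Vidrai": 5, "Tamura": 4, "Zagorodniuk": 5},
        "Zagorodniuk": {"Shmerkin": 4, "Weiss": 3, "Vidrai": 3, "Tamura": 5, "Pashkevitch": 4},
    }

    def obo_key(a):
        wins = sum(votes[a][b] > votes[b][a] for b in votes[a])   # Copeland score
        jif  = sum(votes[a].values())                             # "judges in favor" = Borda score
        return (-wins, -jif)                                      # more wins, then more votes, is better

    for skater in sorted(votes, key=obo_key):
        neg_wins, neg_jif = obo_key(skater)
        print(skater, -neg_wins, -neg_jif)
    # Weiss 4 26, Vidrai 3 26, Shmerkin 3 22, Pashkevitch 2 22, Tamura 2 20, Zagorodniuk 1 19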

OBO does not change in any way the rules the judges use to assign marks to a skater's performance; it only affects how the marks are combined to produce the overall competition result. So, it is possible to use the same competition data (same marks, same skate order) to illustrate what the results would have been under the old BOM or new OBO (or any other rank-order-based) system.

Again to save space we only state the final 6-by-6 table (all of the separate worksheets comparing the marks from each judge for each pair of skaters have been omitted). The entries in this table show the ratio of "wins" to the number of "votes" in each pairwise comparison. So for example, Shmerkin got one "win" over Weiss, based on the fact that 5 of the 9 judges said Shmerkin outperformed Weiss.


Skater        Shmerkin  Weiss  Vidrai  Tamura  Pashkevitch  Zagorodniuk
Shmerkin         --      1/5    0/3     0/4       1/5          1/5
Weiss            0/4     --     1/5     1/6       1/5          1/6
Vidrai           1/6     0/4    --      1/6       0/4          1/6
Tamura           1/5     0/3    0/3     --        1/5          0/4
Pashkevitch      0/4     0/4    1/5     0/4       --           1/5
Zagorodniuk      0/4     0/3    0/3     1/5       0/4          --

This, of course, only represents a small subset of the competitors involved in the actual event. Scoring the complete 29-skater competition via OBO would require a 29×29 table with 29·28/2 = 406 total pairings – each of which requires an additional worksheet to compare marks from the 9 judges, for 407 pieces of paper in all.

Loosemore (who prefers BOM over OBO) says the old BOM system would require only a single 29×9 worksheet – less cumbersome.

Even more "flip-flops" under the new OBO scoring system!

Here are the skate-by-skate updates of the standings under OBO:


After two skaters:
   Skater        Wins  Votes
1. Shmerkin        1     5
2. Weiss           0     4

After three skaters:
   Skater        Wins  Votes
1. Vidrai          1    10
2. Weiss           1     9
3. Shmerkin        1     8

After four skaters:
   Skater        Wins  Votes
1. Vidrai          2    16
2. Weiss           2    15
3. Shmerkin        1    12
4. Tamura          1    11

After five skaters:
   Skater        Wins  Votes
1. Weiss           3    20
2. Vidrai          2    20
3. Shmerkin        2    17
4. Tamura          2    16
5. Pashkevitch     1    17

After six skaters:
   Skater        Wins  Votes
1. Weiss           4    26
2. Vidrai          3    26
3. Shmerkin        3    22
4. Pashkevitch     2    22
5. Tamura          2    20
6. Zagorodniuk     1    19

What happened to cause a flip-flop in the standings? Not only did the same thing happen as under the old system – Shmerkin dropping behind Weiss and Vidrai after initially holding the lead – but OBO also generated an additional flip-flop when Zagorodniuk's marks caused Pashkevitch and Tamura to swap places!

Looking closely at the table of pairwise comparisons, we see that the problem is that the "who-beats-who" relationships among these skaters form complicated Condorcet cycles. There is no possible way to rank this group of six skaters without going against at least one of the "pairwise wins." And going against any "win" will (if the skate order is unlucky) cause the relative placements of the two skaters concerned to flip-flop.
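
For instance, here is one such cycle pulled straight from the table above, checked by a tiny Python snippet (layout mine): Shmerkin beats Weiss, Weiss beats Vidrai, yet Vidrai beats Shmerkin.

    # One Condorcet cycle hiding in the pairwise table above (vote counts copied from it).
    cycle = [("Shmerkin", "Weiss", 5, 4),
             ("Weiss", "Vidrai", 5, 4),
             ("Vidrai", "Shmerkin", 6, 3)]
    for a, b, for_a, for_b in cycle:
        assert for_a > for_b                      # each pairwise "win" really is a win
        print(f"{a} beats {b} ({for_a}-{for_b})")
    # Any ranking of these three skaters must contradict at least one of these pairwise wins.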

Indeed, a little thought shows that OBO flip-flops are only possible in the presence of a Condorcet cycle: if the pairwise "who-beats-who" relation were a strict linear order (no cycles), then the Copeland scores of the skaters seen so far would all be distinct, no "tiebreaking" would be needed, and the order would stay fixed as more skaters entered the picture. From this we conclude that Condorcet cycles were present in at least 22 of the 48 competition segments from 1998 that Loosemore examined.

Such flip-flops are not unique to the old BOM and new OBO systems. Mathematicians have proven that any scoring system based on rank-order votes will produce similar "flip-flops."

You can also see this other proof of a different theorem, which shows as an immediate consequence that the OBO system will exhibit "no show paradoxes," in which a judge, by honestly ranking the skaters, causes a worse gold-medalist (in her view) to be chosen than if that judge had refused to score anybody. Indeed, an honest judge's ranking of X worst can be the direct cause of X winning gold, and an honest judge's ranking of X best can be the direct cause of X failing to win gold.

What does this tell voting theorists?

Most voting theorists have been unaware that any "Condorcet voting system" had ever been used in an important real-life application. Recently Debian Linux (to elect its project leader) and Wikipedia (in some administrative votes) have adopted Schulze-beatpaths Condorcet voting. But the OBO voting system had already been used to judge at least 100 top-notch skating events, and it is Condorcet (i.e. a Condorcet winner, whenever one exists, will always win gold).

So we have, in these skating events, a treasure-trove of real data about Condorcet elections, presently in quantity far exceeding all other known Condorcet elections.

Meanwhile BOM is, to a good approximation, the same thing as median-rank voting with a Borda-like tiebreaker added. That is also of interest, since median-score-based range voting (which is not the same thing, but is related) was proposed by Balinski & Laraki, then criticized in comparison with ordinary mean-based range voting by Smith et al. – all in complete ignorance of the fact that median-rank voting had already been in wide use by skaters and then abandoned as a failure.

How well did BOM and OBO perform? In my view, OBO has been an embarrassing failure for the skating world. (And BOM also failed, not only in my view, but in the view of the International Skating Union.) BOM and OBO both are far too complicated and both yielded and continue to yield huge numbers of embarrassing "flip-flops," each of which is a public relations disaster. (As I said, I personally would recommend the simple trimmed mean voting system instead, which is similar to range voting except based on trimmed mean rather than mean.)

But it would be better if more skating data were acquired and analysed to estimate the real-world frequencies of "no-show paradoxes," "Condorcet cycles," and other pathologies. That remains to be done.


