Applying Bradley Terry to ShoddyBattle

Dragontamer · Jan 21, 2010

For reference: http://www.smogon.com/forums/showthread.php?t=62941

I'll just say "BradleyTerry" to refer to the system described in that thread... even though its really a "Most Likely Estimate on the BradleyTerry Model"... thats just a bit long.

Just in case someone is interested, here's some of my thoughts about adding this to ShoddyBattle.

-------- Use Case ---------------
1. An administrator / moderator would create a new tournament. This would involve starting up ShoddyBattle with a new switch that uses "BradleyTerry" instead of Glicko2 as the ranking system.

2. Matches will be "waterfalled" by the tournament director by a special command, such as "/runtournament" in the chat screen. For example, lets say there will be a match scheduled every 15 minutes starting at 10:00pm to 12:00 midnight EST. (A more advanced, cron-like schedule can probably be programmed by the director, but overall the idea is to give the tournament director much control over the matches)

3. Players register for the tournament by registering for the server. Players will be able to play all of the games. Players who hit "Find Match" will be entered into the next match (which will occur at the Tournament Director's discression). IMO, the ideal tournament design will have players playing an unlimited number of games, with an unlimited number of rematches. The only restriction needed is listed in #2. Theoretically, one can program more specific sets of restrictions (ie: a limited number of matches per day) as the server processes the pool of "Find Match".

4. The tournament director will print out the list of players as well as their ranks, and then enter the top 16 into a standard tournament.

This will allow a far greater number of people to participate in a tournament. Because of the short period of play, it is unlikely that Glicko2 will converge to a good value. However, because the qualification period is so short, it is more suitable for a real tournament setting. A tournament utilizing ShoddyBattle as the backbone to the "Qualification Rounds" will allow tournaments to scale up to a size restricted only by the server! There can easily be 300+ players in such a tournament, with the server automatically collecting the information for later analysis. Especially if the number crunching takes place at an easier time for the server.

If 512 people enter, and are present in every match, then 8 matches can be equivalent to a single-elimination tournament, if a suitable pairing algorithm is put into place (Assuming one can correctly program the pairing algorithm). The games can scale to an arbitrary number of matches, such as 50 matches, 10 matches per day between 10:00 and 12:00 or something similar. More matches == more accuracy, and this system will take advantage of the extra accuracy.

If enough matches are played for each player, then there will also be very little excuse for the "hax knocked me out of the tournament" excuse. Hell, everyone can play in as many qualification matches as they want to. Bradley Terry does not give an advantage to the player who plays more often. (Nonetheless, a "minimum number of games" should be enforced)

Finally, this should be a familiar interface to the players. They will click on the ladder and a match will come up automatically... and they'll play as they always know how to play. All the relevant changes should be made on the server side.

-------------- Technicals --------------

Okay, so here's the list of things that need to be changed for this to work out well (IMO anyway).

* Item #2 above needs to be implemented. The "ideal pairings", as far as BradleyTerry is concerned, looks rather complicated (and runs at O(n^3) or worse). In my experience, the following heuristics can be used. Make sure that the players don't get more than one "forced bye" (odd number of players will happen) and also make sure that they always play someone new. Both heuristics can be expressed as a matter of finding a Maximum Matching.

* The new Ranking System isn't really developed yet. Ideas exist in my head, but I'm not 100% sure of the details. Here's my thoughts on the issue. First, let me define the rank in terms of graphs. A player is a vertex on a multi-graph. Matches are represented by directed edges. A directed edge points in the direction of the winner. (ie: A <- B means that A beat B).

As noted in my previous post, only Strongly Connected Components can be ranked against each other using the algorithm developed by Bradley and Terry. However, finding strongly connected components is a solved problem (see the wikipedia link), and if we "collapse" the strongly connected components into a single vertex, we are left with a Directed Graph. And a Directed Graph can be topologically sorted.

So, for an example, lets say here's the list of edges on this Win/Loss graph. Remember that A <- B means that A beats B.

A <- B <- C <- D <- B
D <- F

Now, a strongly connected component would be the vertexes B, C, and D. A topological sort would be:

A > B, C, D > F.

Which is "obvious" when you look at the win / loss graph. Bradley Terry then provides an algorithm to rank the players B, C, and D against each other.

Here's a more complicated example:

A <- B
A <- E
Z <- E

B <- C <- D <- B
E <- F <- E

D <- G

G <- H
F <- H

In this case, note that A and Z never faced each other, but both beat the "EF" group. Also, multiple, consistent, topological sorts exist in this example. In a case like this, it is clear that not enough matches have taken place to form a conclusion. More matches will add more connections and overall a simpler graph to rank.

So again, assigning a "rank" to players in this example will be futile. If you don't believe me, tell me how G should be ranked against E and F.

Nonetheless, a concept of a "tier" can be created. We start with the undefeated players, and then work our way down. The number of groups that are between you and the closest undefeated player will be your tier number.

Both A and Z are undefeated, so they are in Tier 0.

Group [A] and Group [Z] are in Tier 0, as no other groups are above them undefeated.

Group [E, F] and Group [B, C, D] are in tier 1.
Group [G] is in Tier 2.
Group [H] is ambiguous, but by this definition, is in Tier 2.

So, we have the following concepts in a "rank" if we go by this system:
1. The Tier
2. The Group
3. The BradleyTerry Rank within the group.

----------------

I'm pretty sure that a subclass of AccountRegistry can actually accomplish all of this, and further, it would be very similar to the current DatabaseImpl class. BradleyTerry however cannot "flush" ANY data from the "ladder" table. Every single match is an edge in the whole match multi-graph, and every edge can affect the eventual outcome.

So the entire ladder needs to be saved into the database. Every win and every loss. (Another reason why BradleyTerry will be superior to Glicko2 for tournaments: it factors in every match :-) Unlikely to scale for a long-term league so Glicko2 should still be used for the long-term)

The calculateConservativeRatings method probably can implement the above algorithm, calculating the Tier, Group, and Rank of every player. Some sort of "Group Name" generator needs to be implemented, that implies that people from different groups are not comparable. Like Tier 1, Group Apple. And Tier 1: Group Bananna.

AccountRegistry will need to be updated to return an abstract "AbstractPlayer" object instead of a Glicko.Player object. (Because we aren't using Glicko to do this ranking).

I'm not too familiar with ShoddyBattle, so further implementation details will require a bit more work I guess.

chaos · Jan 21, 2010

Don't bother with an SB1 implementation... if you were to implement this, I'd talk with Colin and figure out how to add it to SB2. Also, I missed the Bradley-Terry discussion (probably because I never read Stark Mountain, but still). Interesting stuff, although I think there are a lot of social issues to account for... as in, people are unreliable and nobody wants to battle that much for a tournament. Maybe in DP it's gotten so quick that people won't mind battling often. I know in a metagame like GSC, however, this wouldn't fly.

Dragontamer · Jan 21, 2010

Hmm, the social aspect may be true: but really this method is very scalable. Bradley Terry does not care about the order of the matches or anything: it will calculate the best possible fit for the tournament given a list of matches and their results.

If you were to play single-elimination and calculate the BradleyTerry score of everyone, you'd come to the same conclusion as a single-elimination tournament: the undefeated player is #1, and the #3 and #4 places need an additional match for a complete conclusion.

Anyway... the best I could do on research for number of battles people play is this. I don't think it would be too much to ask for 15 or 25 battles in a 300 person tournament, especially because in this kind of format, people do NOT need to play an equal number of battles.

Those "unreliable" people who don't manage to scrape together the minimum number of battles can be disqualified automatically by the program.

_____________________________

EDIT:

Heh, wasn't sure if I was to double post or not... so I'll just edit this in.

Anyway, it looks like that the current code for ShoddyBattle2 still stores the list of matches (in table ladder_matches_ID). While the list of changes I listed above is still feasible, I would probably implement it differently. The Glicko2 code can remain at the core of the system: it does not necessarily "interfere" with this other tournament-style ladder.

Because all of the Ladder Matches are stored in the external database, an external program (or, a thread) can batch together the data. I don't think Colin wrote the code that actually drops the Matches from the table yet, so I'm unsure how it would look like in the future. (Its a flexibility vs simplicity implementation detail... ). I can't the need for other rating systems (Elo / Trueskill practically do the same thing as Glicko2), but if Colin wants to go with a flexible route, he can probably implement some "AbstractUpdater", which represents the various things that read the ladder_matches_ table.

I'm not very good at databases either, I dunno if thats a stupid idea or not...

Applying Bradley Terry to ShoddyBattle

Dragontamer

chaos

Tournament Banned

Dragontamer