Okay, I think I finally understand all the maths.

The issue here is confusion between a player's performance and a player's rating. This post will basically work out to be a brief overview of chess rating systems. For a not-so-brief history, I highly recommend this reference (PDF warning).

So the theoretical underpinning behind both Elo and Glicko is a model of "pairwise comparison" called Bradley-Terry (or Bradley-Terry-Luce). The model simplifies all games of skill down to the following scenario: you have two players, each of whom has a box containing slips of paper, and on each slip of paper is a number. Each player shuffles their box and pulls out a random slip of paper. The player with the higher number wins. The idea behind this model is that there's a *range* of performance levels at which each player can play on a given day, and the player with the better performance on that given day will be the victor.
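The slips-in-a-box game is easy to simulate; here's a toy sketch (the numbers in each box are made up for illustration):

```python
import random

random.seed(0)

# Toy version of the Bradley-Terry "box of slips" game.
# The box contents here are invented purely for illustration.
alice_box = [1, 2, 3, 4, 5, 6, 7, 8]   # a slightly stronger player
bob_box = [1, 2, 3, 4, 5, 6]

def play_game():
    # Each player draws a random slip; the higher number wins.
    a, b = random.choice(alice_box), random.choice(bob_box)
    if a == b:
        return 0.5  # score an exact tie as half a win
    return 1.0 if a > b else 0.0

# Over many games, Alice's stronger box shows up as a higher average score.
alice_score = sum(play_game() for _ in range(10_000)) / 10_000
```

With these particular boxes, Alice's expected score works out to 0.625, and the simulation lands close to that.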

So what's in these players' boxes?

**The distribution of skill levels at which each player can play** (the numbers in each player's box) **follows the extreme value distribution.** In the Bradley-Terry model, these distributions are identical, *except for their center* (it's unclear whether by "center" they mean the parameter "a," the mean, or the median, but as you'll see in a minute, we don't care). Consequently, you can do a bit of fancy math, and you'll find that **the *difference* in performance between the two players follows a logistic distribution.**
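You can check that "fancy math" numerically. In the sketch below (the centers and scale are my own choice), each player's performance is drawn from an extreme value (Gumbel) distribution, and the observed win rate matches the logistic CDF of the difference in centers:

```python
import math
import random

random.seed(42)

def gumbel(mu, scale=1.0):
    # Sample from an extreme value (Gumbel) distribution with center mu,
    # via inverse transform sampling
    u = random.random()
    return mu - scale * math.log(-math.log(u))

# Two players whose performance distributions differ only in their center
mu_a, mu_b = 1.0, 0.0
n = 100_000
wins = sum(gumbel(mu_a) > gumbel(mu_b) for _ in range(n)) / n

# The difference of two same-scale Gumbels is logistic, so the win
# probability should be the logistic CDF evaluated at the center gap:
predicted = 1 / (1 + math.exp(-(mu_a - mu_b)))  # ≈ 0.731
```

The simulated win rate and the logistic prediction agree to within sampling noise.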
Okay, so the Bradley-Terry model assumes that players' "performance distributions" are identical, except for their "centers."

**The centers of these distributions are given by a player's Elo rating.** And Elo, it turns out, is just Glicko with RD = 0.
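In Elo's usual chess parameterization, that logistic difference distribution becomes the familiar expected-score formula, scaled so that a 400-point rating edge means roughly 10:1 odds:

```python
def elo_expected_score(r1, r2):
    # Elo victory formula: logistic in the rating difference,
    # with the standard 400-point chess scaling
    return 1 / (1 + 10 ** ((r2 - r1) / 400))
```

So equal ratings give an expected score of 0.5, and a 400-point favorite scores 10/11 on average.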

You see, Glicko was designed as an improvement on Elo to take into account the fact that we don't *really* know a player's rating (which, as I just said, is the center of the distribution of performance levels at which a player can play). So Glicko added the parameter RD, which tells us we can be confident that a player's Elo rating (which I earlier called their "true rating") falls within the range (R - n*RD, R + n*RD), where n = 1 for 68% confidence, n = 2 for 95% confidence, and n = 3 for 99.7% confidence. These confidence levels correspond to the premise that **the probability distribution of a player's Elo (aka "true") rating is normal.**
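Concretely, for a hypothetical player rated 1700 with RD 50 (numbers made up):

```python
def rating_interval(r, rd, n=2):
    # Confidence interval for the true rating under the normal premise:
    # n=1 -> ~68%, n=2 -> ~95%, n=3 -> ~99.7%
    return (r - n * rd, r + n * rd)

# A 1700 +/- 50 player's 95% interval is (1600, 1800)
interval_95 = rating_interval(1700, 50)
```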
Now, for Cathy: how does GXE fit in? GXE is the probability of winning a match against a random player on the ladder (i.e., a player with rating 1500±350). This comes from Glickman's formula for the chance of a player rated R1±RD1 defeating a player rated R2±RD2. In the case where RD1=RD2=0, that formula simplifies to the "Elo victory formula," which rests on the players' performances following extreme value distributions and thus on their performance difference following a logistic distribution.
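A sketch of that whole chain, assuming the combined-RD form of Glickman's win-probability formula (the q constant and g() function are from the Glicko paper; the 1700±60 player is an invented example, and the 1500±350 opponent is the ladder default mentioned above):

```python
import math

Q = math.log(10) / 400  # Glicko's q constant

def g(rd):
    # Glickman's g(): shrinks the effective rating difference
    # as uncertainty (RD) grows
    return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd ** 2) / math.pi ** 2)

def win_prob(r1, rd1, r2, rd2):
    # Estimated probability that player 1 beats player 2,
    # pooling both players' uncertainties
    combined_rd = math.sqrt(rd1 ** 2 + rd2 ** 2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r1 - r2) / 400))

# GXE-style number: chance of beating a 1500 +/- 350 opponent
gxe = win_prob(1700, 60, 1500, 350)
```

Note that with RD1 = RD2 = 0, g(0) = 1 and win_prob collapses to exactly the Elo victory formula; the uncertainty just pulls the estimate back toward 50%.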

Phew.

And with that, I hope I've tied everything together with a nice, neat, and mathematically dense bow.

Also, TrueSkill is Glicko for multiplayer games.