I was redirected here to provide a critical review of Glicko and its misuse through GXE on Smogon.
Rating systems are fundamental to competitive gaming, helping match players of similar skill and track improvement over time. The Glicko rating system, developed by Mark Glickman, improves upon the Elo system by introducing a measure of rating reliability. Serious competitive electronic games such as Dota 2 and League of Legends have adopted it or similar systems to matchmake players. Here on Smogon, however, we instead use a derived approximation called GXE by X-Act (original post here). My goal is to show that Glicko by itself is suitable for ladder tiering and tournament seeding, and that GXE is ultimately harmful in any serious use.
On Glicko
As mentioned in the OP, the Glicko system uses two primary numbers to track player skill: Rating (R) and Rating Deviation (RD). The rating, typically starting at 1500, represents the estimated skill level. The Rating Deviation, usually starting at 350 (though Smogon may be using 130), indicates the uncertainty in this estimate. This two-number approach is crucial because it tells us not just how good we think a player is, but how confident we are in that assessment.
Rating Deviation naturally evolves throughout a player's career. It decreases as players complete more games and increases during periods of inactivity. How much it drops per game depends on the opponents faced: in Glicko-1, games against opponents whose own RD is low and whose rating is close to the player's carry the most information, so a player facing a varied field of established opponents will typically see their RD fall faster than one who keeps playing uncertain or badly mismatched opponents.
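To make the RD behaviour concrete, here is a minimal sketch of the Glicko-1 update as described in Glickman's published write-up. It is not the code Smogon's ladder actually runs. The function names (`update`, `inflate_rd`) and the inactivity constant `c` are my own placeholders, and `c` would have to be tuned per ladder.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Down-weights a result according to the opponent's RD."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score against an opponent with rating r_opp and RD rd_opp."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def update(r: float, rd: float, results):
    """Apply one rating period.

    results: list of (opp_rating, opp_rd, score), score is 1, 0.5 or 0.
    Returns the new (rating, rd).
    """
    if not results:
        return r, rd
    # 1/d^2: how much information this period's games carry.
    d2_inv = Q**2 * sum(
        g(rdj) ** 2 * expected(r, rj, rdj) * (1 - expected(r, rj, rdj))
        for rj, rdj, _ in results
    )
    precision = 1 / rd**2 + d2_inv
    delta = sum(g(rdj) * (s - expected(r, rj, rdj)) for rj, rdj, s in results)
    return r + Q / precision * delta, math.sqrt(1 / precision)


def inflate_rd(rd: float, periods_inactive: int, c: float, max_rd: float = 350) -> float:
    """RD grows with inactivity; c is an assumed, ladder-specific constant."""
    return min(math.sqrt(rd**2 + c**2 * periods_inactive), max_rd)


# Worked example from Glickman's write-up: a 1500/200 player beats a
# 1400/30 opponent, then loses to a 1550/100 and a 1700/300 opponent.
new_r, new_rd = update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(round(new_r), round(new_rd, 1))
```

With those three games the rating should land near 1464 and the RD near 151.4, so RD falls noticeably after only three results.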
Typical RD ranges for different player categories:
- New players: ~350
- Active players: 60-110
- Regular players: 30-80
- Very active/professional players: 20-40
On GLIXARE aka GXE
(original post here) A formula called GLIXARE was proposed to convert Glicko ratings into a single number:
Code:
GLIXARE Rating = 0, if RD > 100
GLIXARE Rating = round(10000 / (1 + 10^(((1500 - R) * pi / sqrt(3 * ln(10)^2 * RD^2 + 2500 * (64 * pi^2 + 147 * ln(10)^2)))))) / 100, otherwise
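For anyone who wants to plug their own numbers in, the formula above transcribes directly into Python. This is only a sketch of the formula as quoted, not the ladder's actual implementation, and the function name `glixare` is my own.

```python
import math


def glixare(r: float, rd: float) -> float:
    """GLIXARE as quoted above: a percentage, or 0 when RD > 100."""
    if rd > 100:
        return 0.0
    ln10 = math.log(10)
    denom = math.sqrt(
        3 * ln10**2 * rd**2 + 2500 * (64 * math.pi**2 + 147 * ln10**2)
    )
    exponent = (1500 - r) * math.pi / denom
    # Round to two decimal places, as the original formula does.
    return round(10000 / (1 + 10**exponent)) / 100


print(glixare(1500, 50))   # a 1500 player with RD <= 100 sits at exactly 50
print(glixare(1900, 101))  # any RD above 100 is zeroed, whatever the rating
```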
GXE was proposed as a single absolute number to replace CREs as the definitive measure of a player's skill, on the grounds that players are otherwise difficult to compare. This is a fundamental misuse of Glicko-1, which is a statistical formula: it collapses an estimate and its uncertainty into one absolute number that is then used to rank or tier players, for example as the ladder requirement in suspect tests.
The GLIXARE formula suffers from several fundamental problems. First, its RD threshold of 100 is unrealistically low. Many active players sit near or above an RD of 100, and every new player starts well above it, so the formula assigns them a rating of zero. This creates artificial "dead zones" where ratings become meaningless, and it completely ignores valid skill information from newer players.
The mathematical structure of GLIXARE introduces additional concerns. Its unnecessarily complex scaling and non-linear RD effects create unpredictable rating changes. The formula misuses RD's statistical properties and rigidly centers everything around 1500, limiting its flexibility. These issues make it poorly suited for practical applications such as tracking player progress, matchmaking, and, by extension, ladder reqs.
These inherent flaws surface as dubious player behaviour when getting ladder reqs with GXE: players start multiple new alts to fish for a lucky streak (which COIL was an attempt to mitigate), and some excellent players spend too much time trying to get reqs because they were simply unlucky. This isn't sustainable, and it is a needless waste of system resources and human effort.
Better Ways to Use Glicko for Smogon
Instead of forcing ratings into a single number, we should use R and RD from Glicko directly, optionally with a minimum number of games.
To quote the typical RD ranges again:
- New players: ~350
- Active players: 60-110
- Regular players: 30-80
- Very active/professional players: 20-40
We can set ladder reqs at, say, R >= 1700 with RD <= 100 to filter for regular players, or at R >= 1800 with RD <= 50 for professional or tournament-winner-adjacent players. This would encourage participation among excellent players by using a more accurate range of values instead of an artificial number like GLIXARE, which fundamentally misunderstands Glicko as an absolute number rather than a statistical estimate. This change would also encourage players to stick with one account for ladder reqs, because Glicko-1 responds relatively quickly to changes in results.
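As a rough illustration of how simple such a check would be, here is a sketch using the two example thresholds above. The `LadderReq` type, the `meets_req` function, and the `min_games` field are hypothetical names of mine, and the numbers are the examples from this post, not tested proposals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LadderReq:
    min_rating: float   # minimum Glicko R
    max_rd: float       # maximum Glicko RD
    min_games: int = 0  # optional minimum number of games


def meets_req(r: float, rd: float, games: int, req: LadderReq) -> bool:
    """True when rating, RD and game count all satisfy the requirement."""
    return r >= req.min_rating and rd <= req.max_rd and games >= req.min_games


# Example thresholds taken from the paragraph above.
REGULAR = LadderReq(min_rating=1700, max_rd=100)
TOP = LadderReq(min_rating=1800, max_rd=50)

print(meets_req(1750, 80, 40, REGULAR))  # rated high enough and settled enough
print(meets_req(1850, 120, 12, TOP))     # high rating, but RD is still too wide
```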
---
While attempts to simplify Glicko into a single number like GLIXARE are understandable, they often sacrifice valuable information about rating uncertainty. Instead, embrace Glicko's two-number system and use confidence intervals for tiering players. This approach provides a more nuanced and accurate picture of player skill while accounting for the natural uncertainty in skill assessment.
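For the confidence-interval version, Glickman's write-up gives the 95% interval as R plus or minus 1.96 RD. The sketch below tiers on the lower bound of that interval. The function names and the idea of comparing the lower bound to a cutoff are my own illustration, not an existing Smogon rule.

```python
def rating_interval(r: float, rd: float, z: float = 1.96):
    """Approximate 95% interval for a player's true rating (z = 1.96)."""
    return (r - z * rd, r + z * rd)


def clearly_above(r: float, rd: float, threshold: float, z: float = 1.96) -> bool:
    """True when even the low end of the interval clears the threshold."""
    low, _ = rating_interval(r, rd, z)
    return low >= threshold


print(rating_interval(1800, 50))      # roughly 1702 to 1898
print(clearly_above(1800, 50, 1700))  # settled player: the low end clears 1700
print(clearly_above(1800, 100, 1700)) # same rating, wider RD: not yet
```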