# Understanding Glicko Ratings: A Guide to Player Skill Assessment
## Introduction
Rating systems are fundamental to competitive gaming: they help match players of similar skill and track improvement over time. The Glicko rating system, developed by Mark Glickman, improves on the Elo system by adding a measure of rating reliability. Competitive games such as Dota 2 and League of Legends have adopted it or similar systems for matchmaking. Here on Smogon, however, we instead use a derived approximation called GXE (original post here). This article argues that Glicko by itself is suitable for ladder tiering and tournament seeding, and that GXE is ultimately harmful in any serious use.
## Understanding Glicko Basics
The Glicko system uses two primary numbers to track player skill: Rating (R) and Rating Deviation (RD). The rating, typically starting at 1500, represents the estimated skill level. The Rating Deviation, starting at 350, indicates the uncertainty in this estimate. This two-number approach is crucial because it tells us not just how good we think a player is, but how confident we are in that assessment.
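As a rough illustration of what the two numbers mean together, the sketch below turns R and RD into an approximate 95% interval by treating RD as a standard deviation (R ± 1.96 × RD). The function name `rating_interval` is made up for this example and is not part of any Smogon or Pokémon Showdown code.

```python
def rating_interval(r: float, rd: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for a player's true skill: R +/- z * RD."""
    return (r - z * rd, r + z * rd)


# Same rating, very different certainty (values are illustrative).
new_low, new_high = rating_interval(1500, 350)  # brand-new account
vet_low, vet_high = rating_interval(1500, 50)   # well-established account

print(f"new player:     {new_low:.0f} to {new_high:.0f}")
print(f"veteran player: {vet_low:.0f} to {vet_high:.0f}")
```

Both players show 1500, but the new player's interval runs from about 814 to 2186 while the veteran's runs from about 1402 to 1598, which is exactly the information a single number hides.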
Rating Deviation evolves throughout a player's career. It decreases as a player completes more games and increases during periods of inactivity. How quickly it falls also depends on the opponents faced: games against opponents whose own ratings are well established (low RD) and close to the player's rating carry more information, so they reduce RD faster than games against uncertain or badly mismatched opponents.
Typical RD ranges for different player categories:
- New players: ~350
- Active players: 60-110
- Regular players: 30-80
- Very active/professional players: 20-40
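The RD behaviour described above follows from the update rules in Glickman's original Glicko paper. The sketch below is a minimal, unofficial Python version of those rules for a single rating period. The function names and the default inactivity constant `c` are assumptions for illustration (each ladder chooses its own `c` and rating period), so treat it as a way to experiment rather than as the code any particular ladder runs.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd: float) -> float:
    """Discount applied to a game according to the opponent's uncertainty."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score against an opponent with rating r_opp and deviation rd_opp."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def inflate_rd(rd: float, periods_inactive: int, c: float = 34.6,
               rd_max: float = 350.0) -> float:
    """RD grows with inactivity, capped at the new-player value.

    c = 34.6 is an assumed constant, not a Smogon setting.
    """
    return min(math.sqrt(rd**2 + c**2 * periods_inactive), rd_max)


def glicko_update(r: float, rd: float, games):
    """Update (r, rd) from one rating period.

    games: list of (opp_rating, opp_rd, score) with score 1, 0.5 or 0.
    """
    if not games:
        return r, rd
    info = 0.0   # information gained from the games
    delta = 0.0  # weighted sum of (actual - expected) scores
    for r_opp, rd_opp, score in games:
        e = expected_score(r, r_opp, rd_opp)
        info += g(rd_opp) ** 2 * e * (1 - e)
        delta += g(rd_opp) * (score - e)
    new_var = 1 / (1 / rd**2 + Q**2 * info)
    return r + Q * new_var * delta, math.sqrt(new_var)


# Worked example from Glickman's paper: one win, two losses.
games = [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
new_r, new_rd = glicko_update(1500, 200, games)
print(round(new_r), round(new_rd, 1))
```

With these inputs the rating drops to roughly 1464 and RD shrinks from 200 to roughly 151.4, matching the worked example in Glickman's paper. The `info` term also shows why low-RD, evenly matched opponents shrink RD fastest: `g(rd_opp)` is largest when the opponent's RD is small, and `e * (1 - e)` peaks when the expected score is 0.5.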
## The GLIXARE aka GXE Formula: A Critical Analysis
A formula called GLIXARE (original post here) was proposed to convert a Glicko rating and its deviation into a single number:
```
GLIXARE Rating = 0, if RD > 100
GLIXARE Rating = round(10000 / (1 + 10^(((1500 - R) * pi / sqrt(3 * ln(10)^2 * RD^2 + 2500 * (64 * pi^2 + 147 * ln(10)^2)))))) / 100, otherwise
```
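To make the formula easier to experiment with, here is a direct Python transcription of the two cases above. The function name `glixare` is chosen for this example. One caveat: Python's `round` rounds exact halves to the nearest even number, which may differ from the original implementation in rare tie cases.

```python
import math


def glixare(r: float, rd: float) -> float:
    """GLIXARE (GXE) as written above; returns a value from 0 to 100."""
    if rd > 100:
        return 0.0
    ln10_sq = math.log(10) ** 2
    denom = math.sqrt(
        3 * ln10_sq * rd**2 + 2500 * (64 * math.pi**2 + 147 * ln10_sq)
    )
    exponent = (1500 - r) * math.pi / denom
    return round(10000 / (1 + 10 ** exponent)) / 100


# Illustrative inputs: note the jump to 0 as soon as RD passes 100.
for r, rd in [(1500, 50), (1700, 50), (1700, 100), (1700, 101)]:
    print(r, rd, glixare(r, rd))
```

A rating of exactly 1500 always maps to 50.0 for any RD of 100 or less, and moving RD from 100 to 101 drops the output straight to 0 no matter how high R is.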
The GLIXARE formula suffers from several fundamental problems. First, its RD cutoff of 100 is a hard threshold set unrealistically low. New players start at an RD of about 350, and by the ranges above even some active players sit at RD values of up to 110, so the formula assigns all of them a rating of zero. This creates an artificial "dead zone" in which the output carries no information, and it discards valid skill information from newer players.
The mathematical structure of GLIXARE raises further concerns. Its scaling is needlessly complex, and RD enters the formula non-linearly, so the output changes in ways that are hard to predict. The formula also misuses RD's statistical properties and hard-codes 1500 as its center, which limits its flexibility. These issues make it poorly suited to practical uses such as tracking player progress, matchmaking, and, by extension, ladder reqs.
## Better Ways to Use Glicko for Smogon
Instead of forcing ratings into a single number, we should use Glicko's R and RD directly.
To quote the typical RD ranges again:
- New players: ~350
- Active players: 60-110
- Regular players: 30-80
- Very active/professional players: 20-40
We can set ladder reqs to, say, R >= 1700 with RD <= 100 to filter for regular players, or R >= 1800 with RD <= 50 for professional or tournament-winner-adjacent players. This would encourage participation among excellent players who may not have the time to grind ladder reqs, because it relies on a more accurate pair of values rather than an artificial number like GLIXARE. Collapsing the rating into one number misunderstands Glicko, which was designed as a statistical estimate rather than an absolute score.
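A requirement of this kind is simple to check in code. The sketch below is only an illustration: the `Player` class, the `meets_reqs` helper, and the sample ladder are all hypothetical, and the thresholds are the example values suggested above rather than settled policy.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    rating: float  # Glicko R
    rd: float      # Glicko RD


def meets_reqs(player: Player, min_rating: float, max_rd: float) -> bool:
    """True if the rating is high enough AND certain enough."""
    return player.rating >= min_rating and player.rd <= max_rd


# Hypothetical ladder snapshot, invented for this example.
ladder = [
    Player("alice", 1750, 45),
    Player("bob", 1820, 48),
    Player("carol", 1900, 140),  # high rating, but too few games to trust it
    Player("dave", 1650, 30),    # well-established, but below the rating bar
]

regular = [p.name for p in ladder if meets_reqs(p, min_rating=1700, max_rd=100)]
elite = [p.name for p in ladder if meets_reqs(p, min_rating=1800, max_rd=50)]

print("regular reqs:", regular)
print("elite reqs:  ", elite)
```

Here `alice` and `bob` pass the regular reqs and only `bob` passes the stricter tier. `carol` is excluded because her rating is still uncertain, not because her rating is reported as zero, so her 1900 stays visible and she only needs more games to qualify.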
## Conclusion
While attempts to simplify Glicko into a single number like GLIXARE are understandable, they often sacrifice valuable information about rating uncertainty. Instead, embrace Glicko's two-number system and use confidence intervals for tiering players. This approach provides a more nuanced and accurate picture of player skill while accounting for the natural uncertainty in skill assessment.
Remember that any rating system is a tool for approximating skill, not an absolute measure. The best systems acknowledge this uncertainty and use it to make better decisions about matchmaking, tournaments, and player progression.