Introduction
This document outlines modifications to the Glicko-1 rating system to better handle team-based competitive formats, specifically designed with Pokemon 2v2 in mind. The system extends Glicko-1's individual rating approach to handle team compositions while maintaining the core principles of rating uncertainty and non-linear adjustments.
Try out a simulation here: https://eclectic-duckanoo-4b14ed.netlify.app
Core Modifications
Team Rating Deviation (RD)
For a team T with n players, the team's Rating Deviation is calculated as the root mean square of individual player RDs:
Code:
RD_team = √(∑(RD_i²)/n) where i ∈ {1,...,n}
This provides a composite uncertainty measure that:
- Increases with higher individual uncertainties
- Reflects the overall team rating confidence
- Maintains the scale of original Glicko RD values
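As a minimal sketch, the team RD formula can be written in Python as below. The function name `team_rd` is illustrative and not part of any existing codebase.

```python
import math

def team_rd(player_rds):
    """Root mean square of the individual player RDs."""
    if not player_rds:
        raise ValueError("a team needs at least one player")
    return math.sqrt(sum(rd * rd for rd in player_rds) / len(player_rds))

# A veteran (RD 50) paired with a newcomer (RD 130).
print(round(team_rd([50, 130]), 2))  # -> 98.49
```

The result (98.49) sits above the plain average of the two RDs (90), which is how the root mean square lets the more uncertain player weigh more heavily on the team value.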
Team Rating Aggregation
Team rating is computed as the mean of individual ratings plus a suggested certainty-based boost:
Code:
R_team = (∑R_i)/n + b(RD_team) where i ∈ {1,...,n}
b(RD) = 0.5(350 - RD)
The rating boost function b(RD) rewards teams with more certain ratings, providing:
- Maximum boost of 175 points for perfectly certain ratings
- Linear decrease as uncertainty increases
- No boost at RD = 350 (traditional Glicko-1 initial RD)
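The aggregation above could be sketched in Python as follows. The names `boost` and `team_rating` are illustrative, and the constant 350 is taken directly from b(RD).

```python
import math

def boost(rd):
    """Certainty boost b(RD) = 0.5 * (350 - RD)."""
    return 0.5 * (350 - rd)

def team_rating(ratings, rds):
    """Mean of the individual ratings plus the boost for the team RD."""
    n = len(ratings)
    rd_team = math.sqrt(sum(rd * rd for rd in rds) / n)  # root mean square RD
    return sum(ratings) / n + boost(rd_team)

# Two fresh players at rating 1500 and RD 130.
print(team_rating([1500, 1500], [130, 130]))  # -> 1610.0
```

With both players at RD 130, the team RD is also 130, so the boost is 0.5 * (350 - 130) = 110 points.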
Modified Rating Updates
The system introduces a scaling factor to control rating volatility:
Code:
rdFactor = max(1/scale, RD/RD_max)
Rating updates follow the formula:
Code:
d² = 1/(q²g(RD_opp)²E(1-E))
ΔR = scale * rdFactor * q/(1/RD² + 1/d²) * g(RD_opp)(S-E)
where, following the original Glicko-1 formulas:
- q = ln(10)/400 (≈ 0.0057565)
- g(RD) = 1/√(1 + 3q²RD²/π²)
- E(R,R_opp,RD_opp) = 1/(1 + 10^(-g(RD_opp)(R-R_opp)/400))
- S is the match outcome (1 for win, 0 for loss)
The system uses the following parameters:
Code:
Initial Rating (R₀): 1500
Initial RD (RD₀): 130 // Matches the value already used on Smogon.
Rating Scale: 2.0 // Controls rating change magnitude; increasing this increases volatility.
Players per Team: 2 // Tested for the 2v2 format; it also appears to work for 3v3 and beyond.
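As a rough illustration, the update formula can be written out in Python. The text does not define RD_max or say which RD feeds rdFactor, so this sketch assumes RD_max = 350 and uses the RD of the side being updated. All function names are illustrative.

```python
import math

Q = math.log(10) / 400  # Glicko-1 constant q

def g(rd):
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)

def expected(r, r_opp, rd_opp):
    """Expected score E of a side rated r against an opponent (r_opp, rd_opp)."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))

def rating_delta(r, rd, r_opp, rd_opp, s, scale=2.0, rd_max=350.0):
    """Rating change for one match; s is 1 for a win and 0 for a loss.

    rd_max = 350 is an assumption, the source leaves RD_max undefined.
    """
    e = expected(r, r_opp, rd_opp)
    g_opp = g(rd_opp)
    d2 = 1 / (Q**2 * g_opp**2 * e * (1 - e))
    rd_factor = max(1 / scale, rd / rd_max)
    return scale * rd_factor * Q / (1 / rd**2 + 1 / d2) * g_opp * (s - e)

# Evenly matched sides at the initial rating and RD; the first side wins.
print(rating_delta(1500, 130, 1500, 130, 1))
```

Under these assumptions, scale * rdFactor equals max(1, scale * RD / RD_max). With scale = 2.0 the update therefore matches plain single-game Glicko-1 while RD is at most 175 (about 40 points in the example above) and grows beyond it for more uncertain ratings.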
Implementation Considerations
Rating Certainty Incentive
The rating boost system incentivizes consistent play and stabilizes team ratings by:
- Rewarding teams that play regularly (lower RD)
- Providing a natural handicap for infrequent competitors
- Smoothing rating fluctuations in established teams
Team Composition Effects
The root mean square RD calculation ensures that:
- Teams with mixed experience levels have appropriate uncertainty
- New players impact team uncertainty more significantly
- Rating stability improves as all team members play consistently
Individual Player Matchmaking
The current approach of matching each player immediately with any available session works well, since the rating calculations above even out RD differences over time as ratings converge to each player's true skill. If we want to enhance matchmaking further, a suggested clustering approach is as follows.
We sort the players in the search queue by their difference in RD, then by their difference in R. Players with similar RDs should play together, since a similar RD indicates a similar activity level. We then find the 3 other players with the closest R and construct a matrix of these players to sort them into teams.
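One possible reading of this queue-selection step is sketched below in Python. The `Player` type, the `pick_lobby` function and the `rd_tolerance` cutoff are all hypothetical: the text only says that players of similar RD should be grouped first and then matched on the closest R, so the tolerance value and the fallback to the full queue are assumptions.

```python
from typing import NamedTuple

class Player(NamedTuple):
    name: str
    rating: float
    rd: float

def pick_lobby(anchor, queue, rd_tolerance=30.0):
    """Pick the 3 queued players closest in rating to `anchor`.

    Players whose RD is within `rd_tolerance` of the anchor's RD are
    preferred; if fewer than 3 qualify, the whole queue is used instead.
    """
    others = [p for p in queue if p is not anchor]
    similar = [p for p in others if abs(p.rd - anchor.rd) <= rd_tolerance]
    pool = similar if len(similar) >= 3 else others
    return sorted(pool, key=lambda p: abs(p.rating - anchor.rating))[:3]

queue = [
    Player("a", 1500, 130),
    Player("b", 1520, 125),
    Player("c", 1480, 60),   # close in rating, but much lower RD
    Player("d", 1700, 120),
    Player("e", 1490, 140),
    Player("f", 1510, 135),
]
print([p.name for p in pick_lobby(queue[0], queue)])
```

In this example player "c" is skipped despite the small rating gap, because their RD differs from the anchor's by more than the assumed tolerance.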
For players P₁, P₂, P₃, P₄ in queue, calculate matrix D where:
Code:
D[i,j] = |R_eff_i - R_eff_j| // where i and j are two distinct indices in {1,2,3,4}
R_eff_i = R_i + b(RD_i) // Individual rating with certainty boost
For each possible split of the 4 players into two teams, compute the potential team gap.
For two given teams in the queue, we calculate the effective ladder difference:
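The matrix D and the candidate team splits could be sketched in Python as below. This reads D as the full pairwise matrix over the four players, which is an assumption, and the helper names are illustrative. Players are indexed from 0 here rather than 1.

```python
from itertools import combinations

def boost(rd):
    """Certainty boost b(RD) = 0.5 * (350 - RD)."""
    return 0.5 * (350 - rd)

def pairwise_gaps(players):
    """players: list of (rating, rd) pairs. Returns D[i][j] = |R_eff_i - R_eff_j|."""
    r_eff = [r + boost(rd) for r, rd in players]
    n = len(r_eff)
    return [[abs(r_eff[i] - r_eff[j]) for j in range(n)] for i in range(n)]

def two_team_splits(n=4):
    """Every way to split indices 0..n-1 into two equal teams, each listed once."""
    idx = range(n)
    splits = []
    for team_a in combinations(idx, n // 2):
        if 0 not in team_a:  # pin player 0 to team A to skip mirrored duplicates
            continue
        team_b = tuple(i for i in idx if i not in team_a)
        splits.append((team_a, team_b))
    return splits

players = [(1500, 130), (1550, 100), (1480, 130), (1600, 60)]
D = pairwise_gaps(players)
print(D[0][1], D[0][3])   # -> 65.0 135.0
print(two_team_splits())  # -> [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
```

For four players there are only three distinct 2v2 splits, so evaluating the team gap for every split is cheap.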
Code:
R_eff_A = R_team_A + b(RD_team_A) // Team rating with certainty boost
R_eff_B = R_team_B + b(RD_team_B) // Team rating with certainty boost
Δ_eff = |R_eff_A - R_eff_B| // Effective rating difference
We then compute a match quality score Q:
Code:
RD_combined = √(RD_team_A² + RD_team_B²)
Q = 1 / (1 + (Δ_eff/400)² + (RD_combined/400)²)
This score ranges from 0 to 1, where:
- Q ≈ 1: Ideal match (similar ratings, low RDs)
- Q ≈ 0: Poor match (large rating gap or very uncertain ratings)
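A minimal Python sketch of the quality score follows. The function takes the effective ratings R_eff_A and R_eff_B as inputs rather than computing them, and the name `match_quality` is illustrative.

```python
import math

def match_quality(r_eff_a, rd_team_a, r_eff_b, rd_team_b):
    """Q = 1 / (1 + (delta_eff / 400)^2 + (RD_combined / 400)^2)."""
    delta_eff = abs(r_eff_a - r_eff_b)
    rd_combined = math.hypot(rd_team_a, rd_team_b)  # sqrt(RD_A^2 + RD_B^2)
    return 1 / (1 + (delta_eff / 400) ** 2 + (rd_combined / 400) ** 2)

# Same effective rating, both teams at RD 130.
print(round(match_quality(1610, 130, 1610, 130), 3))  # -> 0.826
# Same RDs, but a 400-point effective rating gap.
print(round(match_quality(2010, 130, 1610, 130), 3))  # -> 0.452
```

Because the RD term is always positive, even a perfectly even match between two RD-130 teams scores about 0.83 rather than 1. Q only approaches 1 as both team RDs approach 0.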