Programming Team-Based Glicko-1 Modifications for Pokemon 2v2 Rating System

Shadowys · Jan 3, 2025

Introduction

This document outlines modifications to the Glicko-1 rating system to better handle team-based competitive formats, specifically designed with Pokemon 2v2 in mind. The system extends Glicko-1's individual rating approach to handle team compositions while maintaining the core principles of rating uncertainty and non-linear adjustments.

Try out a simulation here: https://eclectic-duckanoo-4b14ed.netlify.app

Core Modifications

Team Rating Deviation (RD)

For a team T with n players, the team's Rating Deviation is calculated as the root mean square of individual player RDs:

Code:

RD_team = √(∑(RD_i²)/n)  where i ∈ {1,...,n}

This provides a composite uncertainty measure that:

Increases with higher individual uncertainties
Reflects the overall team rating confidence
Maintains the scale of original Glicko RD values
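As a minimal sketch in Python, the team RD is the root mean square of the members' RDs. The function name `team_rd` is ours and does not come from any existing codebase.

Code:

```python
import math
from typing import Sequence


def team_rd(rds: Sequence[float]) -> float:
    """Team RD as the root mean square of the members' RDs."""
    if not rds:
        raise ValueError("a team needs at least one player")
    return math.sqrt(sum(rd * rd for rd in rds) / len(rds))


print(team_rd([130, 130]))  # 130.0: equal RDs keep the original scale
```

Because RMS is never below the arithmetic mean, a pair like RD 100 and RD 200 gets a team RD of about 158 rather than 150.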

Team Rating Aggregation

Team rating is computed as the mean of individual ratings plus a suggested certainty-based boost:

Code:

R_team = (∑R_i)/n + b(RD_team)  where i ∈ {1,...,n}
b(RD) = 0.5(350 - RD)

The rating boost function b(RD) rewards teams with more certain ratings, providing:

Maximum boost of 175 points for perfectly certain ratings
Linear decrease as uncertainty increases
No boost at RD = 350 (traditional Glicko-1 initial RD)

This rating boost is proposed to reward consistent players whose performance does not swing wildly from game to game, which matters especially in a team game, where results do not hinge on a single player's performance. It covers an edge case where established players of similar known skill (low RD) play together and their rating grows too slowly to reflect their actual skill. The same boost can also be applied to solo play.
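The aggregation and boost can be sketched the same way. `certainty_boost`, `team_rd`, and `team_rating` are hypothetical helper names. Note that b(RD) would turn negative above RD 350, which Glicko-1's RD ceiling of 350 normally prevents.

Code:

```python
import math
from typing import Sequence

RD_NO_BOOST = 350.0  # b(RD) is zero at this RD (traditional Glicko-1 initial RD)


def certainty_boost(rd: float) -> float:
    """b(RD) = 0.5 * (350 - RD): 175 points at RD 0, none at RD 350."""
    return 0.5 * (RD_NO_BOOST - rd)


def team_rd(rds: Sequence[float]) -> float:
    """Root mean square of the members' RDs."""
    return math.sqrt(sum(rd * rd for rd in rds) / len(rds))


def team_rating(ratings: Sequence[float], rds: Sequence[float]) -> float:
    """R_team = mean(R_i) + b(RD_team)."""
    if not ratings or len(ratings) != len(rds):
        raise ValueError("need at least one player and one RD per rating")
    mean_rating = sum(ratings) / len(ratings)
    return mean_rating + certainty_boost(team_rd(rds))


print(team_rating([1500, 1600], [130, 130]))  # 1660.0
```

For two RD 130 players rated 1500 and 1600, the mean is 1550 and the boost is 0.5 × (350 − 130) = 110, giving R_team = 1660.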

Modified Rating Updates

The system introduces a scaling factor to control rating volatility:

Code:

rdFactor = max(1/scale, RD/RD_max)

Rating updates follow the formula:

Code:

d² = 1/(q²g(RD_opp)²E(1-E))
ΔR = scale * rdFactor * q/(1/RD² + 1/d²) * g(RD_opp)(S-E)

Where, following the original Glicko-1 formula,

q = ln(10)/400 (≈ 0.0057565)
g(RD) = 1/√(1 + 3q²RD²/π²)
E(R,R_opp,RD_opp) = 1/(1 + 10^(-g(RD_opp)(R-R_opp)/400))
S is the match outcome (1 for win, 0 for loss)
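A single-game version of the modified update might look like the following. Two assumptions are ours: RD_max is taken to be 350 (the Glicko-1 ceiling), and the function simply takes (R, RD) for the side being updated and (R_opp, RD_opp) for the opponent, since the text does not pin down whether those are individual or team values.

Code:

```python
import math

q = math.log(10) / 400  # ≈ 0.0057565
RD_MAX = 350.0          # assumption: RD_max is the Glicko-1 ceiling of 350


def g(rd):
    """Glicko-1 attenuation factor g(RD)."""
    return 1 / math.sqrt(1 + 3 * q ** 2 * rd ** 2 / math.pi ** 2)


def expected_score(r, r_opp, rd_opp):
    """E(R, R_opp, RD_opp): expected score against the opponent."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def rating_delta(r, rd, r_opp, rd_opp, s, scale=2.0):
    """ΔR for one game (s = 1 for a win, 0 for a loss) with the scaled step."""
    e = expected_score(r, r_opp, rd_opp)
    d_sq = 1 / (q ** 2 * g(rd_opp) ** 2 * e * (1 - e))
    rd_factor = max(1 / scale, rd / RD_MAX)
    return scale * rd_factor * q / (1 / rd ** 2 + 1 / d_sq) * g(rd_opp) * (s - e)


win = rating_delta(1500, 130, 1500, 130, s=1)
loss = rating_delta(1500, 130, 1500, 130, s=0)
```

Because scale × rdFactor = max(1, scale × RD/RD_max), the multiplier is exactly 1 whenever RD ≤ RD_max/scale (175 with the defaults), so the update only departs from plain Glicko-1 for more uncertain players, growing to a factor of `scale` at RD 350. An even 1500 vs 1500 game at RD 130 moves the winner up by about 40 points.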

Default Parameters

Code:

Initial Rating (R₀): 1500
Initial RD (RD₀): 130      // As per what we already have on Smogon.
Rating Scale: 2.0          // Controls rating change magnitude; increasing it increases volatility.
Players per Team: 2        // Tested for the 2v2 format; it also appears to work for 3v3 and beyond.
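If these defaults live in code, a small frozen config object keeps them in one place. `RatingConfig`, `Player`, and `new_player` are hypothetical names. One consequence worth noting: with RD₀ = 130, a brand-new player's certainty boost is already 0.5 × (350 − 130) = 110 points.

Code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatingConfig:
    """Default parameters from the list above (hypothetical container)."""
    initial_rating: float = 1500.0
    initial_rd: float = 130.0      # as currently used on Smogon
    scale: float = 2.0             # higher values mean more volatile ratings
    players_per_team: int = 2      # tested for 2v2


@dataclass
class Player:
    """Hypothetical per-player record."""
    rating: float
    rd: float


def new_player(cfg: RatingConfig) -> Player:
    return Player(rating=cfg.initial_rating, rd=cfg.initial_rd)


cfg = RatingConfig()
fresh = new_player(cfg)
print(0.5 * (350 - fresh.rd))  # b(RD) for a new player: 110.0
```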

Implementation Considerations

Rating Certainty Incentive

The rating boost system incentivizes consistent play and stabilizes team ratings by:

Rewarding teams that play regularly (lower RD)
Providing a natural handicap for infrequent competitors
Smoothing rating fluctuations in established teams

Team Composition Effects

The root mean square RD calculation ensures that:

Teams with mixed experience levels have appropriate uncertainty
New players impact team uncertainty more significantly
Rating stability improves as all team members play consistently
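A quick worked comparison shows why the RMS makes new players count more. The numbers are illustrative, not ladder data: a veteran at RD 50 paired with a newcomer at RD 350 has the same mean RD (200) as two regulars at RD 200, but a higher team RD.

Code:

```python
import math


def team_rd(rds):
    """Root mean square of member RDs."""
    return math.sqrt(sum(rd * rd for rd in rds) / len(rds))


mixed = [50, 350]      # veteran + newcomer
steady = [200, 200]    # two regulars

print(sum(mixed) / len(mixed), team_rd(mixed))     # 200.0 250.0
print(sum(steady) / len(steady), team_rd(steady))  # 200.0 200.0
```

The mixed pair's higher team RD also means a smaller certainty boost: b(250) = 50 versus b(200) = 75.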

Usage in Smogon

Individual Player Matchmaking

The current approach of immediately matching players into any available session works well, since the rating calculations above would, over time, even out RD differences and converge on each player's true skill rating. If we want to improve matchmaking further, a suggested clustering approach is as follows.

We sort players in the search queue by RD first, then by R. Players with similar RD should play together, since similar RD indicates a similar activity level. We then pick the 3 other players with the closest R and construct a pairwise distance matrix to split the 4 players into teams, as follows.
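One way to read this two-stage sort in code is sketched below. It is an interpretation, not the only one: `rd_window` is a hypothetical tolerance for "similar RD" (the text gives no value), players inside the window are preferred, and within each tier players are ordered by closeness in R.

Code:

```python
from dataclasses import dataclass


@dataclass
class QueuedPlayer:
    """Hypothetical queue entry."""
    name: str
    rating: float
    rd: float


def pick_group(queue, seed, rd_window=30.0):
    """Return [seed] + 3 partners, or None if the queue is too short.

    Players whose RD is within rd_window of the seed's come first (similar
    activity level); within each tier, players are ordered by closeness in R.
    """
    others = [p for p in queue if p is not seed]
    if len(others) < 3:
        return None
    others.sort(key=lambda p: (abs(p.rd - seed.rd) > rd_window, abs(p.rating - seed.rating)))
    return [seed] + others[:3]


queue = [
    QueuedPlayer("seed", 1500, 130),
    QueuedPlayer("A", 1510, 135),
    QueuedPlayer("B", 1490, 125),
    QueuedPlayer("C", 1700, 130),
    QueuedPlayer("D", 1505, 300),
]
group = pick_group(queue, queue[0])
print([p.name for p in group])  # ['seed', 'A', 'B', 'C']
```

Here D is skipped despite having the second-closest R, because its RD of 300 suggests a very different activity level.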

For players P₁, P₂, P₃, P₄ in the queue, calculate the matrix D where:

Code:

D[i,j] = |R_eff_i - R_eff_j| // for every pair i, j in {1,2,3,4}
R_eff_i = R_i + b(RD_i)    // Individual rating with certainty boost
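The matrix D is straightforward to build. This sketch uses hypothetical helper names, takes players as (R, RD) pairs, and indexes from 0 rather than P₁ to P₄. It fills every pairwise entry, so D is symmetric with a zero diagonal.

Code:

```python
def certainty_boost(rd):
    """b(RD) = 0.5 * (350 - RD)."""
    return 0.5 * (350.0 - rd)


def distance_matrix(players):
    """D[i][j] = |R_eff_i - R_eff_j| with R_eff = R + b(RD)."""
    eff = [r + certainty_boost(rd) for r, rd in players]
    return [[abs(a - b) for b in eff] for a in eff]


players = [(1500, 130), (1520, 130), (1450, 50), (1600, 350)]
D = distance_matrix(players)
print(D[0][1], D[2][3])  # 20.0 0.0
```

P₃ and P₄ end up level here: the 1450 player's low RD earns a 150-point boost, while the 1600 newcomer at RD 350 gets none.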

For each way of splitting the 4 players into two teams of two, compute the potential team gap.

For two given teams, we calculate the effective rating difference:

Code:

R_eff_A = R_team_A + b(RD_team_A)    // Team rating with certainty boost
R_eff_B = R_team_B + b(RD_team_B)    // Team rating with certainty boost
Δ_eff = |R_eff_A - R_eff_B|          // Effective rating difference
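With four players there are only three ways to form two teams of two, so they can be enumerated directly. One assumption in this sketch is ours: R_team as defined under Team Rating Aggregation already contains b(RD_team), so adding the boost again would count it twice. Here the effective team rating is taken as the mean R plus b(RD_team), applied once. Helper names are hypothetical.

Code:

```python
import math


def certainty_boost(rd):
    return 0.5 * (350.0 - rd)


def team_rd(rds):
    return math.sqrt(sum(rd * rd for rd in rds) / len(rds))


def effective_team_rating(team):
    """Mean R plus b(RD_team), with the boost applied once (our assumption)."""
    mean_r = sum(r for r, _ in team) / len(team)
    return mean_r + certainty_boost(team_rd([rd for _, rd in team]))


def splits_2v2(players):
    """Yield the 3 ways to split 4 players into two teams of 2."""
    for partner in (1, 2, 3):
        team_a = [players[0], players[partner]]
        team_b = [players[i] for i in (1, 2, 3) if i != partner]
        yield team_a, team_b


players = [(1500, 130), (1520, 130), (1450, 50), (1600, 350)]
for team_a, team_b in splits_2v2(players):
    gap = abs(effective_team_rating(team_a) - effective_team_rating(team_b))
    print(team_a, team_b, round(gap, 2))
```

For this group the three gaps are 45, about 2.2, and about 17.8 points, so pairing P₁ with P₃ gives the most even teams.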

We then compute a match quality score Q:

Code:

RD_combined = √(RD_team_A² + RD_team_B²)
Q = 1 / (1 + (Δ_eff/400)² + (RD_combined/400)²)

This score ranges from 0 to 1, where:

Q ≈ 1: Ideal match (similar ratings, low RDs)
Q ≈ 0: Poor match (large rating gap or very uncertain ratings)
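The quality score translates directly. Because the denominator is always at least 1, Q is strictly positive and reaches exactly 1 only when Δ_eff and both team RDs are zero, so in practice it lies in (0, 1]. The function name is hypothetical.

Code:

```python
import math


def match_quality(delta_eff, rd_team_a, rd_team_b):
    """Q = 1 / (1 + (Δ_eff/400)² + (RD_combined/400)²)."""
    rd_combined = math.sqrt(rd_team_a ** 2 + rd_team_b ** 2)
    return 1 / (1 + (delta_eff / 400) ** 2 + (rd_combined / 400) ** 2)


print(match_quality(0, 0, 0))      # 1.0
print(match_quality(400, 0, 0))    # 0.5
print(match_quality(0, 130, 130))  # ~0.826
```

The last line shows that even two perfectly level teams at the default RD of 130 score only about 0.83, which bounds how high a quality threshold can sensibly be set.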

We then select the team combination with the highest Q if it exceeds a chosen quality threshold. Otherwise, we expand the pool beyond the initial 4 players, increasing the number of candidate teams up to an acceptable limit.
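Putting the pieces together for a single group of four, the selection step could look like this. It is a sketch under the same assumptions as the earlier examples (boost applied once to the mean rating, players as (R, RD) pairs). The threshold values are hypothetical, and returning `None` stands in for "widen the pool and try again", which is left to the caller.

Code:

```python
import math


def certainty_boost(rd):
    return 0.5 * (350.0 - rd)


def team_rd(rds):
    return math.sqrt(sum(rd * rd for rd in rds) / len(rds))


def team_summary(team):
    """Return (effective rating, RD_team) for a list of (R, RD) pairs."""
    rd_t = team_rd([rd for _, rd in team])
    mean_r = sum(r for r, _ in team) / len(team)
    return mean_r + certainty_boost(rd_t), rd_t


def match_quality(team_a, team_b):
    eff_a, rd_a = team_summary(team_a)
    eff_b, rd_b = team_summary(team_b)
    delta_eff = abs(eff_a - eff_b)
    rd_combined_sq = rd_a ** 2 + rd_b ** 2
    return 1 / (1 + (delta_eff / 400) ** 2 + rd_combined_sq / 400 ** 2)


def best_split(players, threshold):
    """Highest-Q 2v2 split of 4 players, or None if it falls below threshold."""
    candidates = []
    for partner in (1, 2, 3):
        team_a = [players[0], players[partner]]
        team_b = [players[i] for i in (1, 2, 3) if i != partner]
        candidates.append((match_quality(team_a, team_b), team_a, team_b))
    q, team_a, team_b = max(candidates, key=lambda c: c[0])
    return (q, team_a, team_b) if q >= threshold else None


players = [(1500, 130), (1520, 130), (1450, 50), (1600, 350)]
print(best_split(players, threshold=0.7))  # None: widen the pool
print(best_split(players, threshold=0.6))
```

Since RD_team_A² + RD_team_B² equals half the sum of all four players' RD², RD_combined is the same for every split of a given group (79,400 = 130² + 250² here). The RD term therefore only matters when comparing different groups; within a group, the split with the smallest Δ_eff wins. In this example the best split (P₁ with P₃) reaches Q ≈ 0.67, so a 0.7 threshold would send the group back to the queue.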
