Why?
The Random Battles formats include every fully evolved Pokemon, but not every Pokemon is equally strong. If every Pokemon was simply level 100, you could not make a fun format that includes Luvdisc, Miraidon and everything in between. The tool we use to mitigate this problem is level balancing. By setting Miraidon's level much lower than its peers, we hope to bring its power roughly in line with the rest of the lineup.
How?
Old method
The traditional way to balance levels is by assigning levels to tiers, and manually adding a couple of exceptions. For example, in the tier-based system for Gen 7 an OU Pokemon would get level 80, and a PU Pokemon would get 88. Exceptions were mainly used to increase the level of Pokemon weaker than the average PU, for example setting Unown and Luvdisc to 100.
This system is currently still used for Gen 2.
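As a rough illustration of the tier-based approach, here is a minimal sketch in Python. It only contains the two tier levels and the two exceptions mentioned above; the function and variable names are made up for this example, and the real tables covered more tiers than shown here.

```python
# Tier levels named in the post for Gen 7. Other tiers existed but are
# deliberately left out here rather than guessed.
GEN7_TIER_LEVELS = {"OU": 80, "PU": 88}

# Manual exceptions for Pokemon weaker than the average PU.
LEVEL_EXCEPTIONS = {"Unown": 100, "Luvdisc": 100}


def tier_based_level(species: str, tier: str) -> int:
    """Return the level for a Pokemon: manual exceptions win over the tier table."""
    if species in LEVEL_EXCEPTIONS:
        return LEVEL_EXCEPTIONS[species]
    return GEN7_TIER_LEVELS[tier]
```

The weakness of this scheme is visible in the sketch: every Pokemon in a tier gets the same level, and anything that does not fit has to be patched by hand.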
New method
For the first time in Gen 8, we used a new method to balance levels: winrates. By going through the database of games played, it was possible to find out how often each Pokemon won. The average winrate is 50% by definition, since every game has one winner and one loser, so each Pokemon significantly below that line was buffed, and each Pokemon significantly above was nerfed. This scraping of the database was quite a bit of work, and was undertaken by Annika. The fruits of her labor are still visible here. The increased balance that this method brought to the format was widely well received.
Random Battle winrates have since gotten their own database, implemented by Mia, which anyone can view by typing /rwr in a chatroom, greatly simplifying this process. This has made it possible not just to extend winrate balancing to Gen 9, but also to retroactively balance Gens 3, 4, 5, 6 and 7. The end of March of this year saw the first old-gen balance patch for Gens 3-7. Gen 2 is left out because it does not have enough games played to get accurate statistics, and Gen 1 because its unique qualities make it unsuited for winrate balancing, as explained here. The full history of winrates and accompanying balance changes can be found in this spreadsheet. The exact implementation details are as follows:
Every month we take the wins/losses from the /rwr data for each Pokemon, which includes every game played above 1300 Elo (1500 for current gen, and 1150 for Gen 2), and check if it deviates from 50% with sufficient magnitude and certainty. Specifically, for every 1.5% deviation from 50% that is significant at the p < 0.01 level using a one-tailed binomial test, we buff or nerf a Pokemon by one level. To avoid anomalies, we cap this at max 3 levels per month. For the least drastic action we can take, a 1 level change, we maintain less rigorous standards, allowing a 1% deviation at p < 0.05. These parameters are still subject to change.
To increase sample size, each Arceus and Silvally forme gets the same level, allowing for grouped wins and losses. Current gen is the exception here, as its higher game count allows for individual balancing. To further increase old gen sample size, we have now started using multiple months of data per patch for Pokemon that haven't been buffed or nerfed for multiple months.
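For those who prefer code, the rule above could be sketched roughly like this in Python. This is one possible reading and not the actual script: it assumes "a 1.5% deviation that is significant" means the observed winrate is tested against a null winrate of 50% plus 1.5% per level, and all function and parameter names are invented for the example.

```python
import math


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for large n."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def level_change(wins: int, losses: int,
                 strict_step: float = 0.015, strict_alpha: float = 0.01,
                 relaxed_step: float = 0.01, relaxed_alpha: float = 0.05,
                 max_change: int = 3) -> int:
    """Return the level delta: positive is a buff, negative is a nerf."""
    games = wins + losses
    if games == 0:
        return 0
    # Test whichever side is ahead; the rule is symmetric around 50%.
    if wins >= losses:
        successes, direction = wins, -1   # winning too often -> nerf
    else:
        successes, direction = losses, 1  # losing too often -> buff
    # Largest number of levels whose threshold is exceeded at p < 0.01,
    # capped at max_change levels per month.
    for levels in range(max_change, 0, -1):
        null_rate = 0.5 + strict_step * levels
        if binom_tail_ge(successes, games, null_rate) < strict_alpha:
            return direction * levels
    # Relaxed standard for a single level: 1% deviation at p < 0.05.
    if binom_tail_ge(successes, games, 0.5 + relaxed_step) < relaxed_alpha:
        return direction
    return 0


def apply_change(level: int, delta: int) -> int:
    """Clamp to the valid range; a level 100 Pokemon cannot be buffed further."""
    return max(1, min(100, level + delta))


if __name__ == "__main__":
    # Made-up counts for a Pokemon winning 66% of 10000 games.
    print(level_change(6600, 3400))
```

For grouped formes such as Arceus and Silvally, the same function would simply be called once on the summed wins and losses of all formes.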
Results
The below plot shows a histogram of the Gen 9 winrates for February, March and April of 2023. Between each of these months there was a round of balancing, and we can clearly see a tightening towards 50% winrates, with accompanying decrease in standard deviation (σ). The biggest outlier is Luvdisc, which is already at level 100 and thus cannot be buffed further, unfortunately remaining weak in perpetuity.
In Gen 7, as shown below, the distribution is wider. Partly this is because Gen 7 is less balanced, and partly because the lower sample size amplifies noise. Regardless, there is a clear and drastic decrease in variance after March, which coincides with the first balance round.
In summary, winrate-based level balancing provides a convenient and effective balancing mechanism, and is now being applied not just to current gen, but Gens 3-7 as well. It is not without faults: Pokemon that are difficult to use may be weaker at lower ratings than higher, and of course level is but one aspect of balance. However, the huge outliers (Kyogre had a 66% winrate in Gen 3!) should be a thing of the past.
If you have any comments or questions, feel free to post in this thread!