Random Battle levels

Ads20000 · Jul 22, 2018

What do people think of Random Battle at the moment? Is it balanced enough? There's already a sets discussion page, but the key way in which Random Battle is balanced is through levelling. Yes, one's luck more or less evens out over many matches of Random Battle, but the Pokemon aren't all set to level 100; their levels are determined by their tier as follows:

Code:

LC: 88,
LC Uber: 86,
NFE: 84,
PU: 83,
PUBL: 82,
NU: 81,
NUBL: 80,
RU: 79,
RUBL: 78,
UU: 77,
UUBL: 76,
OU: 75,
Uber: 73,
AG: 71,

What do people think of this balancing? An objection I've heard in the Random Battles room recently is that Ubers should be nerfed, this could easily be done by reducing their level, but what to? And is this objection justified?

Also, TheImmortal (I doubt you'll get an Alert from this), what's your justification for the current levels? I don't think that's written down anywhere?

strangelostman · Aug 7, 2018

Levels IMO seem to be fine, but there are specific Pokemon that need to be removed/revamped.

Unown - Removed
Kommo-o - Clanging Scales/Dragonium Z are way too strong. Bring Kommo in on a revenge kill to set up and prepare for a sweep.
Lunala - There are very few Pokemon that can compete with Lunala's bulk in random battles. Even with Sucker Punch Lunala can proceed to recover to 100% to get Shadow Shield up then proceed to sweep. Toxic may be an option at the sacrifice of at least 2-3 of your Pokemon.

There are a few other Pokemon that I think should be completely removed, Farfetch'd for example, but I have either forgotten their moveset or the Pokemon completely.

I wonder if there are statistics that track Pokemon on a team and win rate? That would be helpful with balancing out this metagame.
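
A rough sketch of what that kind of tracking could look like, assuming battle logs can be reduced to the species on the winning and losing teams. The record shape (`winner`, `loser`), the function name and the sample battles are all made up for illustration; nothing here reflects what Showdown actually logs.

```javascript
// Hypothetical record shape: { winner: [...species], loser: [...species] }.
function winRates(battles) {
  const tally = new Map(); // species -> { games, wins }
  const bump = (species, won) => {
    const entry = tally.get(species) ?? { games: 0, wins: 0 };
    entry.games += 1;
    if (won) entry.wins += 1;
    tally.set(species, entry);
  };
  for (const { winner, loser } of battles) {
    for (const species of winner) bump(species, true);
    for (const species of loser) bump(species, false);
  }
  // Convert the raw tallies into games played and win rate per species.
  const rates = {};
  for (const [species, { games, wins }] of tally) {
    rates[species] = { games, winRate: wins / games };
  }
  return rates;
}

// Made-up sample data (teams shortened to two Pokemon for brevity).
const sampleBattles = [
  { winner: ['Lunala', 'Spinda'], loser: ['Unown', 'Kommo-o'] },
  { winner: ['Kommo-o', 'Spinda'], loser: ['Lunala', 'Unown'] },
];
const sampleRates = winRates(sampleBattles);
console.log(sampleRates);
```

With real data the `games` count matters as much as the rate itself, since a win rate over a handful of games says very little.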

Austin · Sep 12, 2019

Ads20000 said:
What do people think of Random Battle at the moment? Is it balanced enough? There's already a sets discussion page, but the key way in which Random Battle is balanced is through levelling. Yes, one's luck more or less evens out over many matches of Random Battle, but the Pokemon aren't all set to level 100; their levels are determined by their tier as follows:

Code:

LC: 88,
LC Uber: 86,
NFE: 84,
PU: 83,
PUBL: 82,
NU: 81,
NUBL: 80,
RU: 79,
RUBL: 78,
UU: 77,
UUBL: 76,
OU: 75,
Uber: 73,
AG: 71,

What do people think of this balancing? An objection I've heard in the Random Battles room recently is that Ubers should be nerfed, this could easily be done by reducing their level, but what to? And is this objection justified?

Also, TheImmortal (I doubt you'll get an Alert from this), what's your justification for the current levels? I don't think that's written down anywhere?

There are some Pokémon that have custom levels that don’t abide by this rule, just in case you are unaware.

Ads20000 · Apr 16, 2020

A Cake Wearing A Hat said:
An alternate, winrate-based leveling system is and has been on the table for a while now ... Just a matter of time and coding resources.

Posting this as a note relevant to this thread: https://www.smogon.com/forums/threads/ss-random-battle-suspect-process-dynamax.3662006/post-8430966

Diophantine · Apr 16, 2020

I think that winrate-based levels are flawed because any given Pokemon can be carried or carry its team and there are so many variables unaccounted for. What teammates did you have? What did your opponent have in a given game? You might have a team of really bad individual Pokemon that just happen to work really well together, or conversely a team of OU Pokemon that don't synergise well with each other at all. The same could be said for the opponent.

I see the argument for trying to use objective reasoning to determine a hierarchy, so to speak, of Pokemon in the RandBatts format, but I think that you need to dig a bit deeper than just analysing winrates.

You could argue that things would statistically balance out, but I think that there are far too many variables affecting a Pokemon's in-game performance that have nothing to do with the Pokemon itself.

Mango Smoothie · Apr 16, 2020

Diophantine said:
I think that winrate-based levels are flawed because any given Pokemon can be carried or carry its team and there are so many variables unaccounted for. What teammates did you have? What did your opponent have in a given game? You might have a team of really bad individual Pokemon that just happen to work really well together, or conversely a team of OU Pokemon that don't synergise well with each other at all. The same could be said for the opponent.

I see the argument for trying to use objective reasoning to determine a hierarchy, so to speak, of Pokemon in the RandBatts format, but I think that you need to dig a bit deeper than just analysing winrates.

You could argue that things would statistically balance out, but I think that there are far too many variables affecting a Pokemon's in-game performance that have nothing to do with the Pokemon itself.

Assuming we know what teammates are present, couldn't we do something along the lines of an adjusted win-rate based on teammates? We should have enough random teams that we can say "how does Spinda's win-rate change when it has Zacian-C as a teammate vs when it doesn't" and be able to detect differences.

Weighting could be something along the lines of:
-> Determine level based on overall win-rate
-> Calculate winrate when paired with high win-rate teammates VS with low win-rate teammates (or however you want to slice it)
-> If these win-rates are 'significantly' different, either adjust up or down

I don't know exactly what information Randbats logs have but I think the overall structure of reweighting doesn't change much. You could do the same thing with the opponent's team (Zacian-C having a win-rate of 80% against LO Delibird would be very different than it steamrolling top-tier Randbat threats).
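
To make the "with vs without a teammate" comparison concrete, here is a minimal sketch. It assumes each logged game can be flattened into one record per side containing that side's team and whether it won; the record shape, the function name `splitWinRate` and the sample records are all hypothetical, since I don't know what the Randbats logs actually contain.

```javascript
// Hypothetical record shape: { team: [...species], won: boolean }, one per side per game.
function splitWinRate(records, target, teammate) {
  const paired = { games: 0, wins: 0 };   // target and teammate on the same team
  const unpaired = { games: 0, wins: 0 }; // target without that teammate
  for (const { team, won } of records) {
    if (!team.includes(target)) continue;
    const bucket = team.includes(teammate) ? paired : unpaired;
    bucket.games += 1;
    if (won) bucket.wins += 1;
  }
  // A bucket with no games has no meaningful win rate, so report null.
  const rate = ({ games, wins }) => (games === 0 ? null : wins / games);
  return {
    pairedRate: rate(paired),
    unpairedRate: rate(unpaired),
    pairedGames: paired.games,
    unpairedGames: unpaired.games,
  };
}

// Made-up sample records (teams shortened to two Pokemon for brevity).
const sampleRecords = [
  { team: ['Spinda', 'Zacian-C'], won: true },
  { team: ['Spinda', 'Zacian-C'], won: true },
  { team: ['Spinda', 'Delibird'], won: false },
  { team: ['Spinda', 'Shuckle'], won: true },
  { team: ['Delibird', 'Shuckle'], won: false },
];
const spindaSplit = splitWinRate(sampleRecords, 'Spinda', 'Zacian-C');
console.log(spindaSplit);
```

Whether the gap between the two rates counts as 'significantly' different would still need a proper test and a lot more games than this toy sample; the same split could be run on the opponent's team instead of your own.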

Instead of directly weighting levels, you could do something like a "Threat Score" and shove it through the process I just mentioned. Exact details would need some ironing out (and I'm coming up with these things as I go) but ideally "threat score" should be able to encompass supporting roles as well (e.g. Shuckle does a really good job of enabling other threats so it gets a high threat score, Delibird does an eh job so it gets a moderate threat score).

Edit: Typos

Ads20000 · Apr 16, 2020

Diophantine said:
I think that winrate-based levels are flawed because any given Pokemon can be carried or carry its team and there are so many variables unaccounted for. What teammates did you have? What did your opponent have in a given game? You might have a team of really bad individual Pokemon that just happen to work really well together, or conversely a team of OU Pokemon that don't synergise well with each other at all. The same could be said for the opponent.

I see the argument for trying to use objective reasoning to determine a hierarchy, so to speak, of Pokemon in the RandBatts format, but I think that you need to dig a bit deeper than just analysing winrates.

You could argue that things would statistically balance out, but I think that there are far too many variables affecting a Pokemon's in-game performance that have nothing to do with the Pokemon itself.

I suspect that there are enough Randbats games played that this doesn't matter: each Pokemon will be on a wide variety of different teams, so its overall winrate will be an accurate reflection of its true strength. By a similar argument to yours (though I agree it doesn't fully apply), one could argue that Smogon's usage-based metas are flawed because they too are based on the straightforward stats of a single Pokemon (albeit usage rather than winrate). Yet I don't see many arguing that this should be fundamentally changed; the system seems to work, on the whole.
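
As a back-of-the-envelope check on the sample-size point, the standard error of an observed win rate shrinks with the square root of the number of games. The sketch below uses the usual binomial formula, which assumes games are independent, and that is exactly the assumption the teammate objection questions. The game counts are made up.

```javascript
// Binomial standard error of an observed win rate over `games` independent games.
function winRateStandardError(winRate, games) {
  return Math.sqrt((winRate * (1 - winRate)) / games);
}

// Made-up game counts: the uncertainty on a 50% win rate at each sample size.
const errorAt100 = winRateStandardError(0.5, 100);     // about 0.05, i.e. 5 percentage points
const errorAt10000 = winRateStandardError(0.5, 10000); // about 0.005, i.e. half a percentage point
console.log(errorAt100, errorAt10000);
```

So a hundred times more games gives a win rate that is ten times more precise, all else being equal.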

Also, I've just noticed that you pre-empted this exact argument in your last paragraph. I still think things will statistically balance out, given the very high number of Random Battle games played. The variables involved in dynamic levelling based on winrate can also be adjusted to increase the chance of it statistically balancing out, by raising the threshold of 'proof' that the automated system needs before it changes a level:

- Increase the period of time between updates (e.g. every three months instead of every month), which means that more games are played to inform the levelling.
- Increase the threshold required for a Pokemon's level to be adjusted up or down (e.g. to 60% winrate for a reduction, rather than 55%), which means that changes don't happen unless there is very considerable statistical winrate evidence that a Pokemon is too strong/weak.
- Reduce the level adjustments made (e.g. reducing the level increase from 2 to 1 for a <40% winrate in a given winrate assessment period), which means that the impact of the dynamic levelling system is more gradual.

The only way to find out for sure would be to give it a go...
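
A minimal sketch of how those knobs might fit together, purely as an illustration. The setting names and the function are hypothetical, the numbers are just the example figures from this post, and the update period isn't modelled at all (it would simply be how often this gets run).

```javascript
// Hypothetical tuning knobs, using the example figures from the post.
const levellingSettings = {
  reduceAbove: 0.6,   // win rate above which a Pokemon's level is reduced
  increaseBelow: 0.4, // win rate below which a Pokemon's level is increased
  step: 1,            // how many levels to move per assessment period
};

// Returns the level for the next period given the win rate over the last one.
function nextLevel(level, winRate, settings = levellingSettings) {
  if (winRate > settings.reduceAbove) return Math.max(1, level - settings.step);
  if (winRate < settings.increaseBelow) return Math.min(100, level + settings.step);
  return level; // inside the band: leave the level alone
}

console.log(nextLevel(73, 0.62)); // reduced by one level
console.log(nextLevel(88, 0.38)); // increased by one level
console.log(nextLevel(80, 0.5));  // unchanged
```

Tightening the band or raising `step` makes the system react faster; widening the band or lowering `step` makes it more conservative.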

There's the additional point that making a levelling system more complex than just basing it on winrate (just as Smogon singles tiers are primarily based on just usage rate) sounds even more difficult to code and justify.

Also, wouldn't you think that a winrate-based levelling system would at least be an improvement on the current levelling system, which is based on Smogon singles usage-based tiers? On reflection, I can see why A Cake Wearing A Hat and The Immortal are right that the need for refining the levelling system becomes more urgent if Dynamax isn't banned: Smogon tiers become less relevant in that case, and thus the current system becomes more problematic. There's also what Cake says about movesets not having to change as much on the move to a dynamic levelling system if Dynamax is kept. (I presume that's because, if Dynamax is kept, the format will already be moving away from Smogon tiers, so adding dynamic levelling would help the meta to balance around Dynamax, whereas if Dynamax is banned, a move to dynamic levelling would be more of a shock to the format and would require changing all the movesets?) See Cake's really helpful clarification below!!

Ads20000 · Apr 16, 2020

Austin said:
There are some Pokémon that have custom levels that don’t abide by this rule, just in case you are unaware.

I wasn't aware of this, for reference, the custom levels are currently as follows:

JavaScript:

// Banned Abilities
Dugtrio: 77, Gothitelle: 77, Pelipper: 79, Politoed: 79, Wobbuffet: 77,

// Holistic judgement
Unown: 100,

// Custom level based on moveset
if (ability === 'Power Construct') level = 73;
if (item === 'Kommonium Z') level = 77;
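
To show how the tier table and these exceptions could combine, here is a rough sketch. The numbers come from the listings in this thread, but the function name, the input shape and the order of precedence (species override first, then tier, then the ability/item checks) are my assumptions and not a copy of the actual Showdown code. The tiers passed in the examples are placeholders, not real tier assignments.

```javascript
// Levels by tier, as listed in the first post.
const tierLevels = {
  LC: 88, 'LC Uber': 86, NFE: 84, PU: 83, PUBL: 82, NU: 81, NUBL: 80,
  RU: 79, RUBL: 78, UU: 77, UUBL: 76, OU: 75, Uber: 73, AG: 71,
};

// Per-species custom levels, as listed above.
const customLevels = {
  Dugtrio: 77, Gothitelle: 77, Pelipper: 79, Politoed: 79, Wobbuffet: 77,
  Unown: 100,
};

// Assumed precedence: species override, then tier, then moveset-based checks.
function randomBattleLevel({ species, tier, ability, item }) {
  let level = customLevels[species] ?? tierLevels[tier];
  if (level === undefined) throw new Error(`No level known for tier "${tier}"`);
  if (ability === 'Power Construct') level = 73;
  if (item === 'Kommonium Z') level = 77;
  return level;
}

// The tiers below are placeholders for illustration only.
console.log(randomBattleLevel({ species: 'Unown', tier: 'PU' }));
console.log(randomBattleLevel({ species: 'Kommo-o', tier: 'OU', item: 'Kommonium Z' }));
```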

Darvin · Apr 16, 2020

Diophantine said:
You could argue that things would statistically balance out, but I think that there are far too many variables affecting a Pokemon's in-game performance that have nothing to do with the Pokemon itself.

This could be tested by looking at the generation 7 numbers and seeing what degree of consistency we get month-to-month for specific pokemon that didn't receive set changes. That can calibrate our expectations for what represents a statistically significant win-rate deviation.
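
A sketch of that consistency check, assuming we had a list of monthly win rates per Pokemon for months in which its sets didn't change. The data below is invented purely to show the calculation; the real generation 7 numbers would have to come from the stats.

```javascript
// Invented monthly win rates for Pokemon whose sets did not change.
const monthlyWinRates = {
  Spinda: [0.48, 0.5, 0.52],
  Delibird: [0.41, 0.47, 0.44],
};

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample standard deviation (divides by n - 1); needs at least two months.
function sampleStdDev(values) {
  const m = mean(values);
  const squares = values.reduce((sum, v) => sum + (v - m) ** 2, 0);
  return Math.sqrt(squares / (values.length - 1));
}

// Month-to-month spread per Pokemon: a rough measure of background noise.
const monthToMonthNoise = Object.fromEntries(
  Object.entries(monthlyWinRates).map(([species, rates]) => [species, sampleStdDev(rates)])
);
console.log(monthToMonthNoise);
```

If the typical spread turned out to be, say, a couple of percentage points, then a deviation of that size in a single month wouldn't be strong evidence that a level needs changing.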

A Cake Wearing A Hat · Apr 16, 2020

Clearing misconceptions:
Level changes are not directly related to set changes or breaking from tiering in any manner. Breaking from tiering is happening either way. Set changes need to occur if dynamax is banned because half of the sets of mons are low-key dynamax based anyway (sd mew, bounce gyara, etc); fixing this will obviously be higher priority than balance changes and the dynamax ban and following set changes would inherently fuck with the statistics for a couple months before shit stabilizes again.

Win rates will balance out over the massive sample size random battles gives, and an average win rate of all Pokemon can be calculated as a reference point. I do not believe the integrity of this measure is in need of questioning, and any more complex metric would be really stupid to code, even more so than this project is currently looking to be.

I believe that if you wish to be productive in this thread, the best thing to do is focus on these questions and help iron out the subjective parts of this: given individual win rates of each Pokemon and an average win rate of all Pokemon each month,

- What is the range of win rates at which a Pokemon can be deemed "balanced"? +- one standard deviation of the first month's win rate stats? Something else? Do we need to see the stats first?

- assuming a mostly-automated monthly level change system, what amounts of deviation from the mean correspond to +-1 level in a month? +-2? 3? More? How drastic do we want monthly changes to be, and how much time are we willing to spend making minuscule monthly adjustments over time to balance a Pokemon?

Ads20000 · Apr 19, 2020

A Cake Wearing A Hat said:
What is the range of win rates at which a Pokemon can be deemed "balanced"? +- one standard deviation of the first month's win rate stats? Something else? Do we need to see the stats first?

I think we need to see the stats first and try to work out what range around 50% should be considered balanced. Smogon usage stats originally came up with the '50% chance to see a Pokemon in 20 games' cutoff for usage (I can't remember what it's been amended to now), and it would probably help in finding some reasonable heuristic like that for winrates if we could look at the stats to see what the current distribution around a 50% winrate looks like. It's hard to do objectively from first principles; I reckon those who play the tier a lot (a council, if you like! I think there's a 'circuit' tour going on atm? The best Randbats players would probably be top of the ladder though, since Randbats is ladder-focused) could see which Pokemon 'feel balanced' around the 50% winrate mark and set the range accordingly.

I struck through the above because I followed that train of thought (I'm leaving it up as it may be helpful if that route is taken) and then realized that we could just do 50% +- stdev of the first month's stats as you suggested, which would do the job just as well and would be far simpler! So I think my 'vote' would be for that suggestion. That could then be updated to 50% +- stdev of a more recent month, far enough down the line for the tier to have stabilized around the initial metric (for it to be considered fully stabilized would probably mean no level changes under 50% +- stdev of month 1 for a good few months, maybe a year)?

A Cake Wearing A Hat said:
assuming a mostly-automated monthly level change system, what amounts of deviation from the mean correspond to +-1 level in a month? +-2? 3? More? How drastic do we want monthly changes to be, and how much time are we willing to spend making minuscule monthly adjustments over time to balance a Pokemon?

To allow for fine-grained balancing, I think one (month one) standard deviation from 50% should be +-1 level, two standard deviations should be +-2 levels, and so on? That should move the outliers faster but the ones closer to 50% not too fast... Obviously we should keep a close eye on the stats. Basing things around standard deviations should 'just work', I hope, but if Pokemon end up flip-flopping outside the standard deviations (i.e. moving from 50% -2 stdev pre-monthly-adjustment to 50% +2 (or +3) stdev post-monthly-adjustment), then that should be reduced to +-1 level for everything.

I hope the adjustments could be automated (a simple (I hope) script that analyses the usage stats, then adjusts the Showdown code accordingly, which can then be submitted as a pull request?) so that the adjustments can be a permanent thing unless adjusted/removed (or, if that's too hard, I'd be happy to do it manually in a PR).

To help players, I don't know if there's a Random Battle damage calculator (edit: yes there is, in the default one)? Ideally the level update script would update that as well (and one would be created if one doesn't already exist - I don't code, but presumably I'd just need to take the existing Honko one and adjust everything to work for Randbats? Then get it hosted somewhere...)
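
For what it's worth, the "one standard deviation per level" idea could be sketched roughly as follows. This is only an illustration: the month-one win rates are invented, the function names are hypothetical, and I've measured the spread around 50% rather than around the observed average (the average win rate of all Pokemon could be passed as `center` instead).

```javascript
// Spread of win rates around a chosen centre (50% by default).
function stdDevAround(values, center = 0.5) {
  const squares = values.reduce((sum, v) => sum + (v - center) ** 2, 0);
  return Math.sqrt(squares / values.length);
}

// Whole standard deviations away from the centre decide the level change.
// Strong Pokemon (above the centre) lose levels; weak ones gain levels.
function levelChange(winRate, stdDev, center = 0.5) {
  const steps = Math.trunc((winRate - center) / stdDev);
  return steps === 0 ? 0 : -steps; // less than one stdev away counts as balanced
}

// Invented month-one win rates across the whole pool of Pokemon.
const monthOneWinRates = [0.44, 0.47, 0.5, 0.53, 0.56];
const monthOneStdDev = stdDevAround(monthOneWinRates);

console.log(levelChange(0.56, monthOneStdDev)); // more than one stdev too strong
console.log(levelChange(0.53, monthOneStdDev)); // inside the balanced band
console.log(levelChange(0.4, monthOneStdDev));  // more than two stdevs too weak
```

Keeping `monthOneStdDev` fixed between updates matches the suggestion of anchoring to month one; swapping in a later month's figure would be a one-line change.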

Thanks Cake for engaging with this thread so constructively!
