Data Official Smogon University Usage Statistics Discussion Thread, mk.3

In any one chaos file, on what scale are the numbers associated with a Pokémon's move? They don't appear to be normalized percentages nor raw counts.
 

DoW

formally Death on Wings
is a Pre-Contributor
> // ttar: one Pokémon's entry from a parsed chaos file
> let movecount = 0;
> for (var i in ttar.Moves) movecount += ttar.Moves[i];
> movecount/4 === ttar['Raw count']
true


It would appear to be the raw count of that move. Note that in many cases movecount/4 won't exactly equal the Pokémon's raw count, for instance when people bring the mon with only 3 moves, I imagine.
 

DoW

Using June, May and April stats, the calc is (20*2.87 + 3*1.66 + 1*0.3)/24
Given none of its monthly usage is above 3.41% I don't get how it could have risen.
I'm on my phone right now so can't do a comprehensive check, but I can see the 1760 stats had it at 5.3% this month - is it possible the wrong weighting was used for determining rises?
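
For reference, the weighted average in the quoted calculation can be sketched as below. The helper name and the data layout are made up for illustration, and the 20-3-1 weights are simply the ones used in that calculation, not a confirmed description of the official scripts.

```javascript
// Minimal sketch: weighted average of monthly usage percentages.
// weightedUsage is a hypothetical helper, not part of any Smogon script.
function weightedUsage(months) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const { usage, weight } of months) {
    weightedSum += usage * weight;
    totalWeight += weight;
  }
  return weightedSum / totalWeight;
}

// Figures from the quoted post: June, May, April usage (in %) with weights 20, 3, 1.
const monthlyUsage = [
  { usage: 2.87, weight: 20 },
  { usage: 1.66, weight: 3 },
  { usage: 0.3, weight: 1 },
];

console.log(weightedUsage(monthlyUsage).toFixed(2)); // 2.61
```

With normal-ladder numbers alone, the result stays well under the 3.41% figure mentioned in the quote.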
 
I didn't realize there could be a misunderstanding between Elo weighting and monthly weighting, but to make it clear, I used the 1630 stats for those 3 months.

Also, this might be a bit late to bring up since it's about last month's tier shifts.
 

Merritt

literally the textbook definition of a tsundere
is a member of the Site Staff, a Forum Moderator, a Community Contributor, and a Contributor to Smogon
I can see your problem. You did just the pure NU ladder stats, and didn't do the suspect ladder stats, which are counted.

https://www.smogon.com/stats/2018-06/gen7nususpecttest-1630.txt

These have 4.80287% Torterra usage, and since there was a nearly equal number of battles on the suspect ladder and the normal ladder in June (37722 battles vs the normal ladder's 39712 battles), I'm coming up with a combined usage of 3.812413722% for June.

You also want to do the same thing with May as well since the suspect ladder in May also had a higher Torterra usage in the 1630 stats than the normal ladder (1.82372% vs 1.65823% in 39645 battles vs 41514 battles respectively).
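
As a rough illustration of the combination described above, the sketch below weights each ladder's usage by its battle count. The helper name and record layout are invented for the example, and weighting by battle count is an assumption about how the two ladders are merged; the actual stats scripts may work from team counts or weighted counts instead.

```javascript
// Minimal sketch: combine usage from several ladders, weighting each by its battle count.
// combinedUsage is a hypothetical helper; the real stats scripts may weight differently.
function combinedUsage(ladders) {
  let weighted = 0;
  let battles = 0;
  for (const ladder of ladders) {
    weighted += ladder.usage * ladder.battles;
    battles += ladder.battles;
  }
  return weighted / battles;
}

// May 1630 figures for Torterra quoted above (usage in %).
const may = [
  { name: "normal ladder", usage: 1.65823, battles: 41514 },
  { name: "suspect ladder", usage: 1.82372, battles: 39645 },
];

console.log(combinedUsage(may).toFixed(3)); // lands between the two ladder values
```

The same function applies to June once the exact normal-ladder figure is plugged in alongside the 4.80287% suspect-ladder figure.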
 
Thanks for the explanation! (Honestly, it's worth adding to the FAQ post.)

However, this raises another problem for me: using suspect ladder usage means we're basing the tier shifts on an inaccurate representation of a metagame if the suspect ends with a ban.

Looking at the latest suspects in UU and PU, it doesn't seem like it's going to happen anymore, but wouldn't it be better to not count some stages of a metagame, or to give them a smaller impact, so as to prevent rises followed by immediate drops? This is something I think should be considered in the current discussion about changing the tier shift policy taking place in the Policy Review forums.
 

Merritt

I can’t say I really agree with the logic here, since if a suspect test ends in a ban then the normal ladders from before the suspect test would be equally invalid.

To give an example, let’s say that RU does a Sneasel suspect and starts right at the beginning of the third month. They finish and close the suspect ladder around a week before the month ends, Sneasel is banned and removed from the normal ladder as well. That would mean that the suspect ladder usage stats, as well as the first and second month normal ladder stats are all based on a metagame with Sneasel. We should not discard all these stats because then the entire tier shift would be based off an incredibly short period of time - the week or so after the suspect concludes.

Basically, the only scenario where it might be worth considering not including the suspect ladder stats would be if something was retested and not unbanned. In scenarios where something is retested and unbanned, you definitely want to include that suspect ladder if you’re concerned about stats for the current metagame; scenarios where something is tested and not banned are obviously identical to the current metagame; and I explained earlier why excluding tests where the suspect is banned doesn’t make logical sense.

This of course isn’t getting into the fact that quickdrops are also a potential source of a different meta - for example Stoutland wasn’t in PU during July but will be for August and September. That doesn’t mean we should exclude July stats.
 
This is exactly the point I'm trying to make: former stages of a metagame as a whole should have a smaller impact on the tier below, not only the suspect test period. If the suspect ends with no ban it's fine; we can proceed with the usual shifts using both ladders, since the metagame hasn't changed. If it ends with a ban, the current method disturbs the proper functioning of lower tiers. Let's see how by taking the example you gave.

Sneasel is banned from RU during the last month before tier shifts. We can assume its presence in the metagame led to some Psychic- and Ghost-types being used less and naturally dropping during this shift. The tier below then has to adapt and run its own suspects, but during the next 3 months these same Pokémon are going to reclaim their spots in RU. In the end NU (and the tiers below it too, by domino effect) will have experienced changes for nothing, because the drops were calculated on an inaccurate version of RU.

So, since we can't base shifts on just a few days of usage, as you pointed out, what are the solutions? Without changing much about the current system of monthly usage stats, the wisest option might be to just wait a few months before implementing changes. I'm also interested to know how feasible other solutions are, like computing usage between dates other than the first and last day of the month, or dropping teams that used what ended up being banned when counting usage (although I doubt the accuracy of this one).

That's all I can think of about the subject for now. Sorry for not making this clearer in my first post.
 

Merritt

I don't really agree with calling any meta inaccurate just because it has a Pokemon that is later banned. Metagames cycle even without bans - it's why OU had Amoonguss rise and fall despite there not being any OU bans - and ultimately that's something that impacts lower tiers. Sneasel could be banned, which causes Psychic-types to be used more, or it could not be banned and its presence just causes significantly more Fighting-type usage, which in turn causes Sneasel usage to drop and Psychic-types to be used more. Both result in said Psychic-types potentially moving to RU; it's just that one is due to a forced metagame shift and the other is because of a natural adaptation of the meta.

From a philosophical standpoint though, things moving around is ok to me. If something doesn't get usage in RU then it shouldn't be RU by usage, and if it later gets more usage in RU (either because of a normal metagame cycle or because of a ban) then it should move to RU. Honestly from a pure philosophy standpoint I feel like full shifts should happen more frequently, not less, because that means that the tiers are more accurate to actual usage. This obviously comes at the cost of increasing volatility in lower tiers, so it's not ideal.

I don't think that any stats should be discarded if that wasn't clear yet, and I don't even think that a metagame that is somewhat different having an impact on stats is overly detrimental when we're talking about the significant period of 3 months between shifts. Sure, bans or unbans usually have an effect on how the metagame looks, but you get the same effect just from people experimenting and discovering new potent threats that dethrone the current meta-defining Pokemon.

I don't see too much of an issue with the current way overall stats are calculated (aside from the way usage is weighted between months - I do feel that the current weights give too much influence to the third month and not enough to the first and second), and any attempt at improving it would have to avoid basing a tier shift on a very small period of time. For example, if we were to exclude the NU suspect ladder in June, as well as the metagames that came before it, because Gigalith was banned, then we'd have no stats at all, since that ladder wasn't removed before July 1st. Similarly, only excluding teams that used the banned Pokemon, like you mentioned, rests on the flawed premise that a Pokemon that is later banned only influences the teams that use it, which is not the case. I don't have a solution to propose, mostly because I don't think there's a problem to be solved here.
 
Stats for August are up

Regarding weighting suspects: IIRC I wasn't a fan of including them either, but it used to be, at least, that the non-suspect ladders were basically dead during suspect periods, to the extent that some tiers would decommission the non-suspect ladder during the suspect test. So including them in the weighting was a necessary evil. If this is something that the policy heads want to revisit, I'm all for changing it--just let me know what gets decided.

Regarding move counts in the chaos files: they should be weighted counts just like everything else. Just keep in mind that not all mons have all four move slots filled, so the sum won't add up to total count x 4. cosine180 opw_Blade
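
To make the point about unfilled move slots concrete, here is a small sketch. The numbers are invented, and only the "Moves" and "Raw count" keys come from the snippet posted earlier in the thread. Since the move numbers are described above as weighted counts, the count passed in should be whichever total matches them; using "Raw count" here is purely for illustration.

```javascript
// Minimal sketch with invented numbers; only the "Moves" and "Raw count"
// keys are taken from the snippet earlier in the thread.
const entry = {
  "Raw count": 100,
  Moves: { stoneedge: 95, crunch: 90, pursuit: 80, stealthrock: 60, superpower: 55 },
};

// Sum every move's number for one Pokémon entry.
function totalMoveCount(pokemon) {
  return Object.values(pokemon.Moves).reduce((sum, n) => sum + n, 0);
}

// Average number of filled move slots per set, relative to a given count.
function averageMovesPerSet(pokemon, count) {
  return totalMoveCount(pokemon) / count;
}

console.log(totalMoveCount(entry)); // 380
console.log(averageMovesPerSet(entry, entry["Raw count"])); // 3.8, below 4 because some slots are empty
```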

sumwun If it's not already up there then no. IIRC metronome is one of the metagames that breaks my stats scripts.
 
First of all, I’m sorry if this isn’t the correct place to ask.

Any chance we can get a win% statistic for every Pokémon in the Usage stats like Smogon Tour already does?

To make it even better, have it be “adjusted win%”, where stats are not taken into account when both teams have the same Pokémon since it would be 1 win and 1 loss regardless of the Pokemon’s effect on the battle.

It would really help in ban and suspect test discussions; empirical data is always amazing.
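
One possible reading of the "adjusted win%" idea is sketched below. The battle record shape (a winner team and a loser team, each a list of species names) is hypothetical and not the format of any existing stats file.

```javascript
// Minimal sketch of the "adjusted win%" idea; the battle record shape is hypothetical.
function adjustedWinRate(battles, species) {
  let wins = 0;
  let counted = 0;
  for (const { winner, loser } of battles) {
    const onWinner = winner.includes(species);
    const onLoser = loser.includes(species);
    // Skip battles without the Pokémon, and mirror battles where both teams have it.
    if (onWinner === onLoser) continue;
    counted += 1;
    if (onWinner) wins += 1;
  }
  return counted === 0 ? null : wins / counted;
}

const sample = [
  { winner: ["Sneasel", "Stoutland"], loser: ["Torterra"] },
  { winner: ["Torterra"], loser: ["Sneasel"] },
  { winner: ["Sneasel"], loser: ["Sneasel", "Torterra"] }, // mirror, ignored
  { winner: ["Gigalith"], loser: ["Stoutland"] },          // no Sneasel, ignored
];

console.log(adjustedWinRate(sample, "Sneasel")); // 0.5
```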
 
MagikaripIsOP -- no, because win % is a garbage statistic, especially on a rated ladder where the theoretical win-rate for any team is 50%. What the moveset stats report instead is a Pokemon's viability ceiling, which is the highest GXE of any team using that Pokemon. There's additional information in the "chaos" raw datasets if you're interested in looking at data beyond the very top player.

Anyhoo, stats for the month are now up. Please disregard any data for "Let's Go" tiers.
 
I wholeheartedly disagree. Rated ladders can have win%’s that vary significantly from 50%.

Let’s use Overwatch as an example (I don’t play that game anymore because Fortnite is better, but whatever). Look here: https://www.overbuff.com/heroes

Set it to grandmaster or whatever rank you want. The win rates range from 50% to 59%. More importantly, that site takes into account matches where both teams use the same hero. In other words, their win rate statistics will be much closer to 50% than they should be, since matches with the same hero on both sides will always be 1 win and 1 loss regardless, which is why the high usage rate heroes are near the bottom.

And I’m not looking at win% on a team-by-team basis but on a Pokémon-by-Pokémon basis, just like the usage stats already are.

GXE is great, but it only shows how high one person was able to get. It is very susceptible to outliers, almost by definition. Looking at a Pokémon’s “adjusted win%”, either globally or from people like the 1825/1750 players, should be another important statistic to have.
 

DoW

Not to put words into Antar's mouth here, but I suspect he is working on the assumption that people use the same teams for extended periods of time, in which case what he's saying should be pretty much true - there'd be a small positive in the wr when you first used the team, but then you'd just be ranked higher and therefore playing stronger players, and the wr would slowly go back towards 50%, according to Antar's maths.

I think you're also right about there being a noticeable difference in win rate with better Pokémon, though that doesn't mean this statistic would give you what you want to know. I'll explain in detail; alternatively there's a tl;dr below if you only care about the conclusion.

Antar is working under the assumption that, after playing enough games on the ladder for you to get to the correct ranking, every game you play is against someone of equal level to you. However, the ladder doesn't work exactly like that. There's a range of rankings that you can play against at any one time - maybe anyone between 50 points above or below you, though obviously it's a little more complex than this, as you can play someone significantly further away if you have to wait for someone to show up, etc. I could give a more detailed description if I could be bothered to read Zarel's code, but ±50 points of you will be a good enough estimate for this.

How players are ranked should be a Gaussian distribution, or "bell curve", which looks something like this:
[Image: bell_curve.jpg]

(I'm aware that Elo assumes a modified Gaussian distribution with an extended tail, but as far as I'm aware, the actual ranks people have follow a Gaussian, at least well enough for our purposes here.)

As you can see, if you're perfectly average and you pick a random player within ±50 ranking points of you, you're just as likely to get someone better than you as someone worse. But if you're a better player, towards the right of the curve, then you're far more likely to face an opponent worse than you than one who's better, because there aren't all that many who are better than you, and there's a whole lot who are worse (even within just those 50 points). If you're playing people worse than you, you can expect to have a win ratio of >50%. Similarly, those towards the bottom of the ladder would see a lower win rate. I don't know if this is the same in Overwatch or not, but at the same time I'll bet you £1000 that it is.
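
The argument above can be checked numerically with a rough sketch like the one below. The mean of 1500, the spread of 150, and the ±50 window are illustrative guesses rather than real ladder parameters, and the sketch only looks at who is in the matchmaking pool, not at actual win probabilities.

```javascript
// Minimal sketch: share of nearby opponents rated below you, assuming ratings
// follow a Gaussian. The mean, spread, and ±50 window are illustrative guesses.
function gaussianDensity(x, mean, sd) {
  const z = (x - mean) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

// Midpoint-rule integral of the density over [from, to].
function mass(from, to, mean, sd, steps = 1000) {
  const width = (to - from) / steps;
  let total = 0;
  for (let i = 0; i < steps; i++) {
    total += gaussianDensity(from + (i + 0.5) * width, mean, sd) * width;
  }
  return total;
}

function shareOfWeakerOpponents(rating, mean = 1500, sd = 150, window = 50) {
  const below = mass(rating - window, rating, mean, sd);
  const above = mass(rating, rating + window, mean, sd);
  return below / (below + above);
}

for (const rating of [1350, 1500, 1650, 1800]) {
  console.log(rating, shareOfWeakerOpponents(rating).toFixed(3));
}
```

Under these assumptions the share sits at one half for the average player and grows the further right of the mean the rating is.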

tl;dr I don't think a win ratio for Pokémon would display how good they are, but rather how popular they are among good players relative to among worse players. And it would do a less accurate job of this than higher-weighted usage stats.

Another issue would be that this rating would be highly manipulable. With other ranking systems, there are built-in mechanisms to stop them being abused, which basically boil down to "if you manipulate the ladder into only pairing you with bad players, when you eventually get bad enough hax that you lose, you'll lose a vast number of rating points", which stops this from being worthwhile. (For the record, I've seen people manipulate the ladder in this way, and stop bothering once they lost an extremely unfortunate game to someone 300 points below them. The system works.)

If you wanted a Pokémon to have a higher win rate, this is not the case. You could just make new alts, ladder to 10-0 with the Pokémon, rinse and repeat. Even if you set a minimum ranking to count, players who wanted the Pokémon banned could just /forfeit when they saw it, and not use the Pokémon themselves. Each of these would be far more effective than, say, using a Pokémon you want removed from the meta below, although I haven't calculated whether it would be effective enough to actually bother with either.


I was hoping to finish this reply with "and here's how to get the statistic you want", but I honestly can't think of a way to get an accurate win rate statistic for a Pokémon without somehow keeping track of which teams and Pokémon people are using, as well as what they previously used. That would be an option, perhaps, but it would be harder to compute than standard usage stats for sure, and would need a fair amount of programming just for it as well. That's not to say there isn't a way I haven't thought of yet, though.
 
