Data Official Smogon University Usage Statistics Discussion Thread, mk.3

Status
Not open for further replies.
Not to put words into Antar's mouth here, but I suspect he is working on the assumption that people use the same teams for extended periods of time, in which case what he's saying should be pretty much true: there'd be a small positive bump in the win rate (wr) when you first used the team, but then you'd just be ranked higher and therefore playing stronger players, and the wr would slowly go back towards 50%, according to Antar's maths.

I think you're also right that there's a noticeable difference in win rate with better pokemon, but that doesn't mean this statistic would tell you what you want to know. I'll explain in detail; alternatively there's a tl;dr below if you only care about the conclusion.

Antar is working under the assumption that, once you've played enough games on the ladder to reach the correct ranking, every game you play is against someone of equal level to you. However, the ladder doesn't work exactly like that. There's a range of rankings that you can play against at any one time, maybe anyone within 50 points above or below you, though obviously it's a little more complex than this, as you can play someone significantly further away if you have to wait for someone to show up, etc. I could give a more detailed description if I could be bothered to read Zarel's code, but ±50 points of you will be a good enough estimate for this.

Player rankings should follow a Gaussian distribution, or "bell curve", which looks something like this:
View attachment 154360
(I'm aware that Elo assumes a modified Gaussian distribution with an extended tail, but as far as I'm aware, the actual ranks people have follow a Gaussian, at least well enough for our purposes here.)

As you can see, if you're perfectly average and you pick a random player within ±50 ranking points of you, you're just as likely to get someone better than you as someone worse. But if you're a better player, towards the right of the curve, then you're far more likely to face an opponent worse than you than one who's better, because there aren't all that many who are better than you, and there are a whole lot who are worse (even within just those 50 points). If you're playing people worse than you, you can expect to have a win ratio of >50%. Similarly, those towards the bottom of the ladder would see a lower win rate. I don't know if this is the same in Overwatch or not, but at the same time I'll bet you £1000 that it is.
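A rough sketch of the argument above, assuming ratings are exactly normally distributed and that matchmaking picks uniformly from everyone within the window. The mean of 1500, the standard deviation of 150 and the function name are made-up values for illustration, not anything taken from the ladder itself.

```python
from statistics import NormalDist

def share_of_weaker_opponents(rating, mean=1500.0, sd=150.0, window=50.0):
    """Fraction of players within +-window of `rating` who are rated below it.

    Assumes a normal rating distribution; mean and sd are hypothetical.
    """
    ladder = NormalDist(mu=mean, sigma=sd)
    below = ladder.cdf(rating) - ladder.cdf(rating - window)
    above = ladder.cdf(rating + window) - ladder.cdf(rating)
    return below / (below + above)

for rating in (1300, 1500, 1700, 1900):
    print(rating, round(share_of_weaker_opponents(rating), 3))
```

Under these assumptions the share is exactly one half at the mean, rises above one half to the right of it, and falls below one half to the left, which is the asymmetry the post describes.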

tl;dr I don't think a win ratio for pokemon would display how good they are, but rather how popular they are among good players relative to among worse players. And it would do a less accurate job of this than higher-weighted usage stats.

Another issue would be that this rating would be highly manipulable. With other ranking systems, there are built-in mechanisms to stop them being abused, which basically boil down to "if you manipulate the ladder into only pairing you with bad players, when you eventually get bad enough hax that you lose, you'll lose a vast number of rating points", which stops this from being worthwhile. (For the record, I've seen people manipulate the ladder in this way, and stop bothering once they lost an extremely unfortunate game to someone 300 points below them. The system works.)
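To put numbers on that built-in mechanism, here is a minimal sketch using the standard Elo expected-score and update formulas. Whether the ladder uses exactly these formulas is an assumption, and the K-factor of 32 is a hypothetical value chosen only for illustration.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for the player rated `rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

def rating_change(rating, opponent_rating, won, k=32.0):
    """Elo update: K times (actual result minus expected result).

    k=32 is an assumed K-factor, not a value taken from the ladder.
    """
    actual = 1.0 if won else 0.0
    return k * (actual - expected_score(rating, opponent_rating))

# A player facing someone 300 points below them.
gain = rating_change(1800, 1500, won=True)
loss = rating_change(1800, 1500, won=False)
print(f"win: {gain:+.1f}, loss: {loss:+.1f}")
```

With a 300-point gap the favourite is expected to win about 85% of the time, so under these formulas a single loss costs roughly five to six times what a single win earns.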

With a win-rate statistic, this is not the case. If you wanted a pokemon to have a higher win rate, you could just make new alts, ladder to 10-0 with the pokemon, rinse and repeat. Even if you set a minimum ranking for games to count, players who wanted the pokemon banned could just /forfeit when they saw it, and not use the pokemon themselves. Each of these would be far more effective than, say, using a pokemon you want removed from the meta below, although I haven't calculated whether either would be effective enough to actually bother with.


I was hoping to finish this reply with "and here's how to get the statistic you want", but I honestly can't think of a way to get an accurate win-rate statistic for a pokemon without somehow keeping track of which teams and pokemon people are using, as well as what they previously used. That would be an option, perhaps, but it would certainly be harder to compute than standard usage stats, and would need a fair amount of programming of its own. That's not to say there isn't a way I haven't thought of yet, though.

In regard to numbers getting nearer to 50% as time goes on: if we look back to Overwatch, heroes can hit 59%. Like I said, that's including matches where both teams use the same hero. After accounting for that, I wouldn't doubt it if we had top Pokémon hit 65%. Even Pokémon hitting 60% is significant. We have actual data to go off of here. Just like top players in Pokémon, top players in Overwatch use the same hero every time, even more so than Smogon players use the same team. Yet we still have noticeable differences in win%.

Personally, I think the best metric would be adjusted win% for 1825 players in OU, and 1750 everywhere else. But that’s not really relevant to the discussion.
 

spatula

I LOVE CHIPFLAVOUR
is a Tiering Contributor
Not to put words into Antar 's mouth here, but I suspect he is working based on the assumption that people use the same teams for extended periods of time, in which case what he's saying should be pretty much true - there'd be a small positive in the wr as you first used the team, but then you'd just be ranked higher and therefore playing stronger players, and the wr would slowly go back towards 50%, according to Antar's maths
This could technically be accounted for by excluding the first x games played by an account, since every account begins at the lowest Elo possible.
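A minimal sketch of that filter, assuming battles are available as a chronological list of (account, won) pairs. That record format and the function names are hypothetical, since the raw logs aren't public.

```python
from collections import defaultdict

def drop_first_games(battles, skip):
    """Return the battles left after skipping each account's first `skip` games.

    battles: iterable of (account, won) pairs in chronological order
    (a hypothetical record format).
    """
    games_seen = defaultdict(int)
    kept = []
    for account, won in battles:
        games_seen[account] += 1
        if games_seen[account] > skip:
            kept.append((account, won))
    return kept

def win_rate(battles):
    """Fraction of battles won, or None if there are no battles."""
    battles = list(battles)
    if not battles:
        return None
    return sum(1 for _, won in battles if won) / len(battles)

battles = [
    ("fresh_alt", True), ("fresh_alt", True), ("fresh_alt", True),
    ("main", True), ("main", False), ("main", True), ("main", False),
]
settled = drop_first_games(battles, skip=2)
print(win_rate(battles), win_rate(settled))
```

Picking `skip` is the hard part: too small and the early climb still inflates the number, too large and short-lived alts vanish from the data entirely.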

While I agree winrate has the potential to be manipulated/erroneously interpreted, I'm not sure the viability ceiling metric is accurate for that either, since if someone wins a ton with a "bad" pokemon it could inflate the viability ceiling significantly.

Is there any way to obtain the raw data? I am interested in playing around with it and seeing if there's any way to get a decent metric despite the many variables at play that Antar and other users have pointed out, if for nothing else than my own curiosity.
 

DoW

formally Death on Wings
MagikaripIsOP yes, I agree that the winrate goes above 50% - but as I said, I'm unconvinced this is directly correlated with the pokemon actually being better (although obviously it's strongly indirectly correlated, if all the good players are using it).

Adjusted win% at 1850+ runs into the same issues as everything else if we assume people stick to the same teams per alt: if we assume that a player has a true rating n, and that using a pokemon causes you to have a significantly higher chance of winning, then that in effect raises your true rating to n + x, where x is whatever benefit running this pokemon gives. If we look at the win rate of all those with a PS! rating above 1850, some of those players using the broken pokemon will have a true rating of (1850 - x), and only seem good enough to factor into our stats because they're using this broken pokemon. This means we'll be comparing win rates of good players not using the pokemon with those of (on average) slightly less good players who are using the pokemon. (Again, there are further things to consider here, like the fact that good players not using the mon will have their PS! rating slightly lower than their true rating, and some people will switch teams, etc., but while these factors make it more complex, they don't make the proposed rating system any more valid.)
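The selection effect in the paragraph above can be sketched with the mean of a truncated normal distribution. This assumes true ratings are normally distributed, that every player's PS! rating has settled at exactly their true rating (plus x for users of the pokemon), and that x is 50 points; the mean, standard deviation and x are all made-up values.

```python
from statistics import NormalDist

def mean_true_rating_above(cutoff, mean=1500.0, sd=150.0):
    """Mean true rating of players whose true rating is at least `cutoff`.

    Uses the truncated-normal mean: mean + sd * pdf(z) / (1 - cdf(z)).
    mean and sd are hypothetical ladder parameters.
    """
    standard = NormalDist()
    z = (cutoff - mean) / sd
    return mean + sd * standard.pdf(z) / (1.0 - standard.cdf(z))

threshold = 1850.0  # PS! rating needed to count towards the stat
boost = 50.0        # hypothetical x gained from running the pokemon

# Non-users need a true rating of 1850; users only need 1850 - x.
non_users = mean_true_rating_above(threshold)
users = mean_true_rating_above(threshold - boost)
print(round(non_users), round(users))
```

Under these assumptions the users who clear the threshold are, on average, weaker in true rating than the non-users who clear it, so the two win rates are not comparing like with like.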

So tl;dr yeah, the number might be positive, or it might be negative, and it will contain some information, but it won't be the information you want (i.e. the win rate of those with a true rating of 1850+ who use the pokemon, vs those who don't).

I've spoken to Antar about this in the past for similar reasons, and the answer is generally that he won't give away the raw stats (at least without a very good reason), but if you have something you think might work well, you can code it up and add it to this codebase, in order for it to be run alongside the rest of the stats generation.
 
Okay, I just found something in here and here.
Code:
 | 45   | Swinub             |  3.48688% | 42     |  0.315% | 31     |  0.319% | 
 | 46   | Minior             |  3.23044% | 609    |  4.561% | 420    |  4.324% | 

 +----------------------------------------+ 
 | Swinub                                 | 
 +----------------------------------------+ 
 | Raw count: 47                          | 
 | Avg. weight: 0.650092130445            | 
 | Viability Ceiling: 85                  | 
 +----------------------------------------+
Is it true that "raw count" means the number of times the Pokemon was used? If yes, how the FAQ did Swinub get 3.5% weighted usage from only 47 (or 42, the 2 files don't even agree with each other) uses? For reference, the Pokemon ranked immediately below Swinub was used 609 times.
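For anyone wanting to compare the two rows directly, here is a small sketch that parses them. Only the weighted usage and raw count columns are identified in the post; the other column names (`raw_pct`, `real`, `real_pct`) are my assumptions about the layout.

```python
ROWS = """
 | 45   | Swinub             |  3.48688% | 42     |  0.315% | 31     |  0.319% |
 | 46   | Minior             |  3.23044% | 609    |  4.561% | 420    |  4.324% |
"""

def parse_usage_row(line):
    """Split one '|'-delimited stats row into named fields.

    Field names other than the weighted usage and raw count are assumptions.
    """
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    rank, name, usage, raw, raw_pct, real, real_pct = cells
    return {
        "rank": int(rank),
        "pokemon": name,
        "usage_pct": float(usage.rstrip("%")),
        "raw": int(raw),
        "raw_pct": float(raw_pct.rstrip("%")),
        "real": int(real),
        "real_pct": float(real_pct.rstrip("%")),
    }

rows = [parse_usage_row(line) for line in ROWS.splitlines() if line.strip()]
for row in rows:
    # Weighted usage earned per raw appearance.
    print(row["pokemon"], row["usage_pct"] / row["raw"])
```

The per-appearance figure for Swinub comes out more than ten times Minior's, which is the discrepancy the question is about; the snippet only measures it and does not explain it.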
 

Marty

Always more to find
Site Content Manager, Battle Simulator Administrator, Programmer, Member of Senior Staff, Community Contributor, Top Researcher, Top Tiering Contributor
Research Leader
Why is Antar not posting the stats this month?
Antar retired; I'll be posting stats for the foreseeable future.
Check out Antar's thread from the end of Gen 5 for a general gist of what his script looks for, and the post he links from there for Gen 6 updates.
https://www.smogon.com/forums/threads/metagame-analyses-gen-vi-changes.3470954/
The script itself probably has more info than is written anywhere else. Should be straightforward enough that anyone can look through it and understand what's going on, even with no knowledge of Python.
 

Katy

Banned deucer.
Can any of the proposed tiering policy changes, such as equal weighting across all 3 months or requiring a 3.41% average usage over 6 months, possibly happen anytime in the next few months? If no, why not?
Hello,

I don't know if this is helpful for you, but this was already discussed in the Tiering Policy thread that dodmen brought up: https://www.smogon.com/forums/threads/lower-tier-shifts.3638656/ Maybe you'll find more insight there about why or why not those changes will be implemented.
 
Did you look in there for reasons they weren't implemented? Did you find anything? Because I already looked, and if I found my answer then I wouldn't be asking about it here.
 
This is probably a very stupid question, but I couldn't find the answer anywhere. Each file has three columns: the name of the metagame, the date it was uploaded, and a number 4 or 5 digits long. Is that number the amount of games taken into account? If so, why don't the numbers get smaller as the required Elo goes up?

The reason why I'm asking is because I am fed up with OU's bulky protect and regenerator heavy meta, and I was looking for the next most played format. I'm not sure if that's Ubers, UU, or Monotype. If that number is the total number of games played, then there have been more Ubers battles than OU, which I find astonishing.

Edit: I think it might just be the number of characters in the file, which would make sense. Still, I'd love to see documentation of the total number of battles held in each tier that month.
 
You can find the total number of battles at the top of each file in this folder: https://www.smogon.com/stats/2019-02/.
For example, last February, UU had 215062 battles, Ubers 301532 and Monotype 284225.

This number doesn't change depending on Elo because the Elo cutoff only changes the weighting of the games when calculating the usage stats. It does not remove low-ladder games from the high-ladder stats; they simply matter less.

Hope this helped.
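To illustrate the weighting described above: a minimal sketch in which every team carries a weight and a pokemon's weighted usage is its share of the total weight. The weights, the team lists and the pokemon names are invented, and this is not claimed to be the formula the actual stats script uses.

```python
def weighted_usage(teams):
    """Return {pokemon: share of total team weight}.

    teams: list of (weight, set_of_pokemon) pairs (a hypothetical format).
    Every team is counted regardless of weight; low weights just matter less.
    """
    total_weight = sum(weight for weight, _ in teams)
    totals = {}
    for weight, members in teams:
        for mon in members:
            totals[mon] = totals.get(mon, 0.0) + weight
    return {mon: value / total_weight for mon, value in totals.items()}

teams = (
    [(0.65, {"RareMon"})] * 40        # few teams, used by highly rated players
    + [(0.05, {"CommonMon"})] * 400   # many teams, used by low-rated players
    + [(0.05, {"Filler"})] * 560
)
usage = weighted_usage(teams)
print(usage["RareMon"], usage["CommonMon"])
```

In this toy data the battle count is the same 1000 teams whatever the cutoff, yet the pokemon with a tenth of the raw appearances ends up with the higher weighted usage because its teams carry far more weight.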
 
I was working with the chaos file and everything went OK when comparing my calcs to the .txt, except for the calcs for teammates %.

So I was wondering, from the chaos folder, how can we get to the % shown in the .txt file?
Taking this month's vgc19UltraSeries-1630 Garchomp as an example, the raw JSON tells us that the most used teammate for Garchomp is Togekiss with 33.205806% (weighted), but the .txt gives us 37.063%.

I was able to figure out how to adjust those values for the other variables (Abilities, Moves, Items...), but for teammates I couldn't find a way at all.
 

Merritt

no comment
Tournament Director, Site Content Manager, Member of Senior Staff, Community Contributor, Contributor to Smogon, Top Dedicated Tournament Host
Head TD
https://www.smogon.com/stats/2019-05/gen7nu-1630.txt
Code:
| 31   | Whimsicott         |  6.99538% | 16485  |  8.061% | 13074  |  7.956% |
| 32   | Camerupt-Mega      |  6.87299% | 6350   |  3.105% | 5550   |  3.377% |
| 33   | Medicham           |  6.82039% | 12141  |  5.937% | 9393   |  5.716% |
Wasn't Camerupt in NUBL? What happened here?
https://www.smogon.com/forums/threads/np-nu-stage-15-lava-camerupt-mega-retest.3650028/

Mega Camerupt was tested during May, and suspect tests are now generally done on the standard ladder.
 