Data Official Smogon University Simulator Statistics — January 2014

Status
Not open for further replies.

Jimera0

You don't understand, Edgar is the one in the hole!
is a Smogon Social Media Contributor Alumnus
#26
While I'm sure you've probably posted explanations before, Antar, I feel that the OP of the topic could use an explanation of what the "real" vs. "raw" percentages mean and why they're included.

I know this has almost certainly been addressed before, but it's somewhat difficult to track down exactly where this was explained. Hell, if you can find a post where it was explained satisfactorily you could just link to it in the OP and call it a day. I feel this would help a lot of people (including myself :P) to better understand what all the numbers mean and how to interpret them, and also stop some people from thinking that the fact that the columns don't all match up means there are errors in the data collection.
 

Antar

is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
Official Data Miner
#27
If something tries to switch out, but is KOed by Pursuit, does that count as KOed or as Switched Out under checks and counters?
Counted as a KO.

Jimera0, yeah--I should really include just a bit of text at the top of each Standard Stats post...
  • Usage % : Weighted
  • Raw: Unweighted
  • "Real": Only counts the Pokemon which actually appear in battle (Doubles not supported)
The reason for the name "real" is historic--back when I first took over the stats and then the running of PO, only the Pokemon that appeared in battle were recorded in the logs, so there was no way to actually *get* the full team stats. When I modified PO to generate logs with full team info in them, we were left with a decision regarding which stats to use, and the argument was that counting only Pokemon appearing in battle was somewhat more legit, because that corresponded to actual, or "real" usage (that argument lost out in the end).
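To make the three columns concrete, here is a toy sketch. The team records, the weights, and the "appeared in battle" sets are all made up for illustration. The real weights come from the rating-based weighting function, which is not reproduced here. Treating "real" as unweighted and dividing everything by the number of teams are also assumptions of this sketch.

```python
# Toy illustration of the three usage columns. Every value below is hypothetical.
teams = [
    # (team members, members that actually appeared in battle, weight)
    ({"Rotom-Wash", "Scizor", "Latios"}, {"Rotom-Wash", "Scizor"}, 1.0),
    ({"Rotom-Wash", "Heatran", "Latios"}, {"Heatran", "Latios"}, 0.5),
    ({"Scizor", "Heatran", "Pinsir"}, {"Scizor", "Heatran", "Pinsir"}, 0.1),
    ({"Rotom-Wash", "Pinsir", "Latios"}, {"Rotom-Wash"}, 0.4),
]

def usage_columns(teams, name):
    """Return (weighted usage, raw usage, "real" usage) as fractions of teams."""
    total_weight = sum(w for _, _, w in teams)
    # Weighted: each team counts in proportion to its weight.
    weighted = sum(w for members, _, w in teams if name in members) / total_weight
    # Raw: each team counts once, regardless of weight.
    raw = sum(1 for members, _, _ in teams if name in members) / len(teams)
    # "Real": only count teams where the Pokemon actually hit the field.
    real = sum(1 for _, appeared, _ in teams if name in appeared) / len(teams)
    return weighted, raw, real

weighted, raw, real = usage_columns(teams, "Rotom-Wash")
print(f"Usage {weighted:.1%}  Raw {raw:.1%}  Real {real:.1%}")
```

The three numbers differ for the same Pokemon on the same data, which is why the columns in the stats are not expected to match each other.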
 
#28
Filtering out troll alts, etc. is done at the normal weighting level, FYI (keep in mind--we're using Glicko score, not Elo, where your rating *can* drop below the starting rating).

I'll decide whether to do 1 std. dev or two some time before March 1, once I look at the distribution of ratings on PS.
Thank you for the weighting link; I had not bothered to read it before, and it was rather straightforward. That explains so much about the weighting of a Pokemon's usage in tiering, but it seems to depend on what the "average player" is, and trolls who perform worse in their battles relative to the average player are given less weight than more skilled players.

A standard score of 1 is a standard score of 1. But what I meant was that anything based on one's relative ranking depends on the composition of people in that sample. The presence of trolls certainly does affect the mean skill of the ladder by depressing it (and skill is not something with a hard quantity that possesses a true zero and can be easily measured, like height, but something that can be normalized). For the SAT, there are fewer "trolls" who take it just to score in the 200s on the subtests, for obvious reasons, so it has a natural mechanism to exclude "trolls" without any statistical filters. Pokemon battling has no strong disincentive against trolling, and trolls can collectively influence the definition of "average".

How does the system prevent trolls from influencing the definition of the average player? Although I doubt trolls have much to do with the highly kurtotic distributions at the high end on the old system.

Edit: I do not think this post belongs here since it does not concern metagame trends or usage. But I still believe it is a legitimate question.
 

Antar

is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
Official Data Miner
#29
Edit: I do not think this post belongs here since it does not concern metagame trends or usage. But I still believe it is a legitimate question.
Eh, it's fine.

Regarding trolls, it's true that much of what we've built is based on the assumption that players like to win, but I haven't noticed the rating distribution being at all lopsided. I've actually been pleasantly surprised by how Gaussian everything looks.



This is the distribution of Elo* ratings for the OU ladder for most of January, IIRC. It's borked around 1000 because most alts on PS are only associated with 1 or 2 games (I think 80% of all alts only play one game). But other than that, it's pretty good. The right tail is a little heavier, which makes sense, since players are more likely to "reset" a low alt than a high one, but even that's not totally skewed.

So bottom line: I get that trolls *could* be a problem, but given that we don't see any peaks on this distribution around extremely low ratings, I think we're okay.

*This is not the Elo rating currently deployed on PS. This is Elo calculated strictly using the standard Elo formula, with a K factor of 50, no modifications, no hacks, no nothing.
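For reference, the standard Elo update mentioned in the footnote can be sketched as follows. The function names are mine, and the starting rating of 1000 in the usage line is an assumption based on the spike around 1000 described above. Only K = 50 comes from the post.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=50):
    """Return both players' new ratings.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Standard Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta

# Two fresh alts at the (assumed) starting rating: the winner gains K/2.
print(elo_update(1000, 1000, 1))  # (1025.0, 975.0)
```

With so many alts playing only one game, a large share of the ladder ends up exactly one such step away from the starting rating, which would fit the distortion near 1000.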
 
#31
MoxieInfinite, a U-Turn, Volt Switch or Baton Pass counts as a switch-out unless it delivers a KO, in which case it counts as a "U-Turn KO" and isn't counted towards Checks & Counters.
Does this explain the counterintuitive data that Genesect doesn't "check" or "counter" anything in OU, except Pinsir? Obviously Latios seems to be checked by it, since (Scarf) Genesect has two moves it can choose from to KO it.
 
#34
I was wondering, how is the data for spreads (nature and EVs) stored in the raw data? I ask because for quite a few Pokemon, they have a diversity of spreads, so listing only the most common ones winds up with 50% or more of the spreads listed under "other". Obviously you can't display every spread, but would it be possible to display, say, the top 2-4 most common natures (with percentages) separately from the EVs, so that you can at least get an idea of which natures are most popular (and by what margin)?
 
#35
I was wondering, how is the data for spreads (nature and EVs) stored in the raw data? I ask because for quite a few Pokemon, they have a diversity of spreads, so listing only the most common ones winds up with 50% or more of the spreads listed under "other". Obviously you can't display every spread, but would it be possible to display, say, the top 2-4 most common natures (with percentages) separately from the EVs, so that you can at least get an idea of which natures are most popular (and by what margin)?
Well, I think the stat people are most interested in is the amount of Speed investment a Pokemon has received.

It would be helpful if there was some data that shows how a given Pokemon's speed is distributed among the players. People may use spreads to speed creep other Pokemon that try to speed creep it.

One, for instance, may run a Landorus-T that creeps 44 Speed Rotom-W (just 8 EVs needed), and 44 Speed Rotom-W doesn't show up in the stats. And some Rotom-W might try to creep that before it U-Turns out by adding 8 extra EVs. I do not think spreads of this type would show up in the usage statistics.
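The arithmetic behind that kind of creeping can be checked with the in-game stat formula. This is a minimal sketch assuming level 100, 31 IVs, and neutral natures. The base Speed values (86 for Rotom-W, 91 for Landorus-T) are the game's values, not something taken from the usage data, and the function name is just for illustration.

```python
def speed_stat(base, evs, nature_pct=100, iv=31, level=100):
    """Non-HP stat formula from the games, using integer arithmetic throughout.

    nature_pct is 110 for a boosting nature, 100 for neutral, 90 for hindering.
    """
    inner = (2 * base + iv + evs // 4) * level // 100 + 5
    return inner * nature_pct // 100

ROTOM_W_BASE_SPEED = 86     # assumed from the games, not from the stats files
LANDORUS_T_BASE_SPEED = 91  # assumed from the games, not from the stats files

print(speed_stat(ROTOM_W_BASE_SPEED, 44))     # 219: the "44 Speed Rotom-W"
print(speed_stat(LANDORUS_T_BASE_SPEED, 8))   # 220: Landorus-T creeping it with 8 EVs
print(speed_stat(ROTOM_W_BASE_SPEED, 52))     # 221: Rotom-W creeping back with 8 more
```

Since every 4 EVs is worth one stat point at level 100, each round of creeping costs 4 to 8 EVs, which is why these spreads scatter across many low-count entries.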

It also makes you wonder how many, let's say, (Mega) Scizor are trying to outspeed minimum-Speed Heatran (and/or Rotom-W before it is burned) to hit it with Superpower. It seems to be a worthy investment if one wants to get through Heatran (and/or Rotom-W) instead of losing momentum by manually switching out. This doesn't seem to show up in the statistics either.
 
#36
Well, I think the most important stat people are interested in are the amount of speed investment a Pokemon has received.

It would be helpful if there was some data that shows how a given Pokemon's speed is distributed among the players. People may use spreads to speed creep other Pokemon that try to speed creep it.

One, for instance, may run a Landorus-T that creeps 44 Speed Rotom (just 8 EVs needed) (and that doesn't show up in the stats). And some Rotom-W might try to creep that before it U-Turns out by putting 8 extra EVs. I do not think this type of spreads would show it in the usage statistics.
You are correct that it wouldn't show Speed investment; I will admit I wasn't thinking of that at the time. My ulterior motive is that I breed Pokemon in-game, so it helps to know which natures to give them (since natures can't be adjusted after hatching, while EV spreads can be set later). Obviously the situation is different for people using simulators. I'm not sure how you would want to set up the display of spreads if you wanted to focus on Speed, especially since EVs are much more finely adjustable. Trying to ensure you displayed the majority of levels of investment seems like it might require a lot more slots.
 

migetno1

bRMT Developer
is a Programmer Alumnus
#37
I was wondering, how is the data for spreads (nature and EVs) stored in the raw data? I ask because for quite a few Pokemon, they have a diversity of spreads, so listing only the most common ones winds up with 50% or more of the spreads listed under "other". Obviously you can't display every spread, but would it be possible to display, say, the top 2-4 most common natures (with percentages) separately from the EVs, so that you can at least get an idea of which natures are most popular (and by what margin)?
As far as I know, the JSON data has ALL the spreads used if the player meets the 1500 cutoff. This leads to common Pokemon like Rotom-Wash having about 10000 different spreads listed, many of them with a count under 5. If you wanted to, you could parse this data to get a percentage of the natures used.
 

Antar

is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
Official Data Miner
#38
ArcFurnace -- yes, the data for spreads is collected by counting the occurrence of each and every individual spread. migetno1, the 1500 cutoff is not a "hard" cutoff. See my Weighting FAQ for more details.

One note: if the Pokemon's spread contains useless EVs (255 EVs in one stat, improperly optimized LC spreads), my scripts round that down and bin it with the equivalent spread that contains no useless EVs.
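As a rough illustration of that kind of binning (not the actual script), the sketch below rounds every EV down to a multiple of 4 and merges the counts. It assumes spread keys look like `Nature:HP/Atk/Def/SpA/SpD/Spe`, and it only covers level 100, where 4 EVs equal one stat point. The LC case needs different logic. The function names and example counts are hypothetical.

```python
from collections import defaultdict

def canonical_spread(spread):
    """Round each EV down to a multiple of 4 (only valid at level 100)."""
    nature, evs = spread.split(":")
    rounded = [int(ev) // 4 * 4 for ev in evs.split("/")]
    return nature + ":" + "/".join(str(ev) for ev in rounded)

def bin_spreads(spreads):
    """Merge the counts of spreads that are equivalent after rounding."""
    binned = defaultdict(float)
    for spread, count in spreads.items():
        binned[canonical_spread(spread)] += count
    return dict(binned)

# Hypothetical weighted counts, purely for illustration.
example = {
    "Timid:0/0/0/255/0/255": 1.5,
    "Timid:0/0/0/252/4/252": 2.0,
    "Timid:0/0/0/252/6/252": 0.25,
    "Bold:252/0/252/0/4/0": 3.0,
}
print(bin_spreads(example))
```

Here the 252/6/252 entry folds into 252/4/252, while 255/0/255 becomes 252/0/252 and stays a separate bin.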

Calm_Mind_Latias -- it's at the top of my "to-do" list to start generating "speed tier" info from usage data (throwing in Choice Scarf and speed-boosting moves as well). But I just haven't had the time recently.
 
#41
Working on analyzing the raw data myself to get the information I want. It's actually going pretty well (hooray for Python), but I have a question about the data format: in the raw data, each unique spread for a Pokemon is paired with a number. What exactly does that number represent? I was assuming it was something along the lines of "number of times this spread appeared", but there has to be something else adjusting it, since it's not necessarily an integer, and if you add them all up the total doesn't match the 'Raw count' variable for that Pokemon. Is it being adjusted by the weighting function intended to reduce the impact of bad players on the stats?
 
#44
It's working. Excellent.



Code:
import json

# Used for analyzing data posted by Antar from Smogon simulators
# Server address: http://sim.smogon.com:8080/Stats/
filename = input('Which file do you want to analyze?\n')
# The with-block closes the file automatically once the JSON is loaded
with open(filename) as f:
    stats = json.load(f)

# Data becomes a nested dictionary
# First layer keys are 'info', 'data'
data = stats['data']

# Second layer keys are Pokemon names (capitalized)
name = input('Which Pokemon do you want to analyze?\n')

# Third layer keys are 'Abilities', 'Checks and Counters', 'Items', 'Moves',
# 'Raw count', 'Spreads', 'Teammates'
pkmn = data[name]
spreads = pkmn['Spreads']
# In the Spreads dictionary for a specific Pokemon, every single unique spread
# is a key, and the weighted count is the number associated with it.

naturestats = dict(Adamant=0, Bashful=0, Bold=0, Brave=0, Calm=0, Careful=0,
Docile=0, Gentle=0, Hardy=0, Hasty=0, Impish=0, Jolly=0, Lax=0, Lonely=0,
Mild=0, Modest=0, Naive=0, Naughty=0, Quiet=0, Quirky=0, Rash=0, Relaxed=0,
Sassy=0, Serious=0, Timid=0)
total = 0

# Add each spread's weighted count to the nature named in its key
for spread, count in spreads.items():
    total += count
    for nature in naturestats:
        if nature in spread:
            naturestats[nature] += count

# Convert weighted counts to fractions of the total
for nature in naturestats:
    naturestats[nature] /= total

print('Nature usage (5% or greater):')

for nature, percent in naturestats.items():
    if percent >= 0.05:
        print(nature, '({0:.1f}%)'.format(percent*100))
You'll need Python installed (this was created in Python 3.3). Save the code as a .py file, put it in a folder with the .json file you want to analyze, and run it from a command line window. No error handling, though, so make sure you spell things right.
 
#48
So I'm starting to look a bit deeper into the Checks and Counters data courtesy of the JSON files, and I borrowed some of Antar's source code. The first thing my limited Python knowledge managed to produce is what I'm calling "Average Opposing Success Rate". To oversimplify a bit, it is the chance that something good happens for the opponent when we have Pokemon X out. It's probably easier to explain with actual numbers, so I'll put those right below. These are taken from the top 100 most used Pokemon in OU last month.

  1. Pinsir: 32.6606536624097
  2. Mawile: 32.6844481742917
  3. Manaphy: 34.3630632235175
  4. Heracross: 34.4424965021286
  5. Charizard: 35.0592024325037
  6. Medicham: 35.3023614950572
  7. Volcarona: 35.4067400910503
  8. Lucario: 35.4455291037521
  9. Gyarados: 35.5116863785033
  10. Conkeldurr: 36.516907127894
  11. Dragonite: 36.5545320428452
  12. Aegislash: 36.5681975927158
  13. Kyurem-Black: 37.8309717680315
  14. Bisharp: 38.3649220193829
  15. Kingdra: 38.531349203473
  16. Venusaur: 38.5553733987965
  17. Clefable: 38.8988327548322
  18. Garchomp: 39.1048053132335
  19. Breloom: 39.2098243163858
  20. Cloyster: 39.3413409294272
  21. Talonflame: 39.3922228568415
  22. Gardevoir: 39.6711615064359
  23. Alakazam: 39.9964141451897
  24. Gengar: 40.1997603794041
  25. Reuniclus: 40.2864221128222
  26. Crawdaunt: 40.6154542915553
  27. Salamence: 40.6454368190726
  28. Azumarill: 40.6689531224642
  29. Haxorus: 40.7513280330215
  30. Scizor: 41.2739257364425
  31. Keldeo: 41.4610637679135
  32. Greninja: 41.7516802360431
  33. Weavile: 42.5671333255041
  34. Landorus: 42.8043943463543
  35. Slowbro: 42.8043943463543
  36. Togekiss: 43.5658724202397
  37. Porygon2: 43.9430431099078
  38. Arcanine: 44.1548236761597
  39. Sableye: 44.3135961436258
  40. Gliscor: 44.3913692085438
  41. Metagross: 44.450457754148
  42. Terrakion: 44.4600453571302
  43. Blastoise: 44.5511315724735
  44. Latios: 44.7306280229763
  45. Mamoswine: 44.930485065953
  46. Hydreigon: 44.9459549417092
  47. Chandelure: 45.0101588925653
  48. Nidoking: 45.0129339855979
  49. Ferrothorn: 45.4970226366418
  50. Genesect: 45.6367708095554
  51. Sylveon: 45.7302719438663
  52. Infernape: 45.8951290756273
  53. Diggersby: 46.3248129428848
  54. Absol: 46.3320133131536
  55. Umbreon: 46.3743233324173
  56. Excadrill: 46.3759120080044
  57. Darmanitan: 46.4111232917662
  58. Heatran: 46.5177171463897
  59. Goodra: 46.6600627629875
  60. Zapdos: 46.6995220785531
  61. Thundurus: 46.9785487566663
  62. Thundurus-Therian: 47.4919887437474
  63. Trevenant: 47.498523525822
  64. Florges: 47.9143122421163
  65. Noivern: 47.9300544465411
  66. Quagsire: 48.0237927022329
  67. Whimsicott: 48.0444210975334
  68. Starmie: 48.1152167905302
  69. Aggron: 48.1960393308203
  70. Ditto: 48.3003550165925
  71. Latias: 48.3643931242995
  72. Tyranitar: 48.366004291485
  73. Manectric: 48.3852945812972
  74. Jirachi: 48.6828542642543
  75. Vaporeon: 49.0304566322915
  76. Klefki: 49.2993117538896
  77. Espeon: 49.3014704588863
  78. Mandibuzz: 49.3193762553036
  79. Gastrodon: 49.6502828137244
  80. Jellicent: 49.7605682005731
  81. Ambipom: 49.7643480320047
  82. Chansey: 50.6178222346396
  83. Skarmory: 50.9443778369248
  84. Magnezone: 51.5004764249148
  85. Celebi: 51.807078744494
  86. Ninetales: 52.0344370841584
  87. Blissey: 52.1084252145068
  88. Jolteon: 52.1130142076991
  89. Crobat: 53.1164262429457
  90. Donphan: 53.3124666216968
  91. Landorus-Therian: 53.3148566332438
  92. Tentacruel: 54.225819632441
  93. Rotom-Wash: 54.3354877933639
  94. Politoed: 55.0057811770138
  95. Deoxys-Speed: 55.543091182007
  96. Deoxys-Defense: 57.0743333619611
  97. Galvantula: 57.5611122966849
  98. Scolipede: 58.5615741449645
  99. Forretress: 61.3480515649239
  100. Smeargle: 63.4775826385627


Let's start at the top of the list with Pinsir, with an AOSR of about 32.66. That means that once he got out on the field, the opponent only gained an advantage (Pinsir switched or was KOed) 32.66% of the time. Compare that with Smeargle, who gave the opponent an advantage nearly twice as often.

As for my first impressions, there are a lot of Megas high on the list. Half of the Pokemon with scores below 40 have a Mega Evolution available to them. If Game Freak wanted these guys to be the powerhouses of their teams, they certainly succeeded. You may notice Rotom-W and Landorus-T near the bottom of this list, which is strange for the super-standard bulky momentum core they are. My theory is that this is due to Antar's list counting U-Turn and Volt Switch as a switch-out. Still, in theory this would only affect the number of positive outcomes for the U-Turner/Volt Switcher, since the opponent likely checks them if they stay in to tank the moves anyway.

Some caveats/other observations about this data:
  • These numbers were taken from the Checks and Counters data, which coincidentally looked at what happened when a Pokemon was switched out or KOed. Thus, Pokemon who are often used as suicide leads, like Smeargle, Galvantula, and the Deoxys formes, will have innately worse scores than the rest, regardless of their ability to support the team. If you don't understand anything I'm saying, peruse through here. Antar explains the situations that make up this data quite well.
  • Generally, stallier Pokemon have worse scores. I would guess it is due to the offensive nature of the meta putting heavy pressure on stall teams, although Venusaur, who seems to be the best wall right now, has a good score for a defensive poke.
  • Chansey is currently performing about 2% better than Blissey.
  • I'm only calling it AOSR because it was the first thing that came to mind and I didn't want to keep writing "Average Opposing Success Rate" a bunch of times, so if anyone can think of something shorter/leads to a better acronym, I'll implement that as well.
  • I'm eventually trying to build up the Python knowledge to have a weighted average success rate for the Pokemon itself rather than its counters. You can approximate the unweighted version by subtracting these numbers from 100, but it will actually be slightly less due to double downs/double switches making up a small amount of these scores.
  • Most importantly, and perhaps the biggest problem: these stats aren't weighted by usage. A matchup just needs to pass the minimum number of encounters to be counted, and every counted matchup counts equally. I chose this because CRE has generally favored Pokemon lower in usage, simply because the higher deviation of their counters means the CREs of their counters will be lower. In addition, while the crap pokes lose more due to high deviation, they don't actually weigh much at all. If anything, they'll contribute more to pokes who aren't matched up a lot. I'll apply the weighting when/if I get good enough at Python to do so.
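The difference between the unweighted average used above and a weighted one can be sketched like this. The input layout (opponent name mapped to an encounter count and an opposing success percentage), the minimum of 20 encounters, and the toy numbers are all assumptions for illustration and may not match the layout of the real Checks and Counters data. Weighting by encounter count is only one possible stand-in for weighting by usage.

```python
def aosr(matchups, min_encounters=20, weight_by_encounters=False):
    """Average opposing success rate, in percent.

    matchups maps opponent name -> (encounters, percent of encounters in which
    the opponent KOed or forced out the Pokemon in question).
    """
    # Drop matchups that were seen too rarely to be meaningful.
    kept = {opp: (n, pct) for opp, (n, pct) in matchups.items()
            if n >= min_encounters}
    if not kept:
        raise ValueError("no opponent meets the minimum encounter count")
    if weight_by_encounters:
        total = sum(n for n, _ in kept.values())
        return sum(n * pct for n, pct in kept.values()) / total
    # Unweighted: every opponent that passes the cutoff counts equally.
    return sum(pct for _, pct in kept.values()) / len(kept)

# Hypothetical numbers, not taken from the January data.
toy = {"Heatran": (400, 60.0), "Talonflame": (100, 40.0), "Luvdisc": (5, 5.0)}
print(aosr(toy))                             # 50.0
print(aosr(toy, weight_by_encounters=True))  # 56.0
```

In the toy data the rarely seen opponent is dropped by the cutoff, and the common matchup pulls the weighted figure toward its own rate.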
 

Antar

is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
Official Data Miner
#49
  • These numbers were taken from the Checks and Counters data, which coincidentally looked at what happened when a Pokemon was switched out or KOed. Thus, Pokemon who are often used as suicide leads, like Smeargle, Galvantula, and the Deoxys formes, will have innately worse scores than the rest, regardless of their ability to support the team. If you don't understand anything I'm saying, peruse through here. Antar explains the situations that make up this data quite well.
I can give you the "Encounter Matrix" if you want it. That's the comprehensive table of what happens when X faces off with Y and should yield better results.
 