Official Smogon University Simulator Statistics — January 2014

Jimera0 · Feb 3, 2014

While I'm sure you've probably posted explanations before, Antar, I feel that the OP of the topic could use an explanation of what the "real" vs. "raw" percentages and the like mean, and of why they're included.

I know this has almost certainly been addressed before, but it's somewhat difficult to track down exactly where this was explained. Hell, if you can find a post where it was explained satisfactorily you could just link to it in the OP and call it a day. I feel this would help a lot of people (including myself :P) to better understand what all the numbers mean and how to interpret them, and also stop some people from thinking that the fact that the columns don't all match up means there are errors in the data collection.

Antar · Feb 3, 2014

Sergeant Spooky said:
If something tries to switch out, but is KOed by Pursuit, does that count as KOed or as Switched Out under checks and counters?

Counted as a KO.

Jimera0, yeah--I should really include just a bit of text at the top of each Standard Stats post...

Usage % : Weighted
Raw: Unweighted
"Real": Only counts the Pokemon which actually appear in battle (Doubles not supported)
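To make the three columns concrete, here is a rough Python sketch of how weighted, raw, and "real" usage could differ on the same logs. The battle records, weights, and the choice of "number of teams" as the denominator for all three columns are assumptions for illustration, not Antar's actual scripts.

```python
from collections import Counter

# Hypothetical battle records: (team weight, full team, Pokemon that actually
# appeared in battle). All names and weights are made up for illustration.
battles = [
    (0.9, ["Landorus-Therian", "Rotom-Wash", "Heatran"], ["Landorus-Therian", "Rotom-Wash"]),
    (0.2, ["Landorus-Therian", "Pinsir", "Heatran"], ["Pinsir"]),
    (0.5, ["Rotom-Wash", "Pinsir", "Heatran"], ["Rotom-Wash", "Heatran", "Pinsir"]),
]


def usage(records, weighted=True, real=False):
    """Percent of teams that include (or, if real=True, actually used) each Pokemon.

    weighted=True  -> each team counts with its weight ("Usage %")
    weighted=False -> each team counts once ("Raw")
    real=True      -> only Pokemon that appeared in battle are counted ("Real")
    """
    counts = Counter()
    total = 0.0
    for weight, team, appeared in records:
        w = weight if weighted else 1.0
        for mon in (appeared if real else team):
            counts[mon] += w
        total += w  # assumed denominator: the (weighted) number of teams
    return {mon: 100.0 * c / total for mon, c in counts.items()}


print(usage(battles)["Pinsir"])                  # weighted
print(usage(battles, weighted=False)["Pinsir"])  # raw
```

With these made-up numbers Pinsir comes out at 43.75% weighted but about 66.7% raw, because the low-weight team carrying it counts for less. That is the same kind of gap that makes the columns not "match up".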

The reason for the name "real" is historic--back when I first took over the stats and then the running of PO, only the Pokemon that appeared in battle were recorded in the logs, so there was no way to actually *get* the full team stats. When I modified PO to generate logs with full team info in them, we were left with a decision regarding which stats to use, and the argument was that counting only Pokemon appearing in battle was somewhat more legit, because that corresponded to actual, or "real" usage (that argument lost out in the end).

Calm_Mind_Latias · Feb 3, 2014

Antar said:
Filtering out troll alts, etc. is done at the normal weighting level, FYI (keep in mind--we're using Glicko score, not Elo, where your rating *can* drop below the starting rating).

I'll decide whether to do 1 std. dev or two some time before March 1, once I look at the distribution of ratings on PS.

Thank you for the weighting link; I hadn't bothered to read it before, and it was rather straightforward. That explains a lot about the weighting of a Pokemon's usage in tiering, but it seems to depend on what counts as the "average player", with trolls who perform worse in their battles than the average player being given less weight than more skilled players.

A standard score of 1 is a standard score of 1. But what I meant was that anything based on one's relative ranking depends on the composition of the people in that sample. The presence of trolls certainly does affect the mean skill of the ladder by depressing it (and skill is not a hard quantity with a true zero that can be easily measured, like height, but something that has to be normalized). The SAT has few "trolls" who take it just to score in the 200s on the subtests, for obvious reasons, so it has a natural mechanism to exclude them without any statistical filters. Pokemon battling has no strong disincentive against trolling, so trolls can collectively influence the definition of "average".
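A quick Python illustration of the point above: the same rating gets a different standard score depending on who else is in the sample. The ratings below are made-up numbers, with the "trolls" simply parked at 1000.

```python
from statistics import mean, pstdev


def z_score(x, sample):
    """Standard score of rating x relative to a sample of ratings."""
    return (x - mean(sample)) / pstdev(sample)


# Hypothetical ladder ratings, for illustration only.
regulars = [1400, 1450, 1500, 1550, 1600]
trolls = [1000, 1000, 1000]

z_without = z_score(1550, regulars)
z_with = z_score(1550, regulars + trolls)
print(round(z_without, 3), round(z_with, 3))
```

Adding the three low ratings drags the mean down, so the 1550 player looks noticeably "better than average" (about 0.71 vs. about 0.96) without having played any differently.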

How does the system prevent trolls from influencing the definition of the average player? That said, I doubt trolls have much to do with the highly kurtotic distributions at the high end under the old system.

Edit: I do not think this post belongs here since it does not concern metagame trends or usage. But I still believe it is a legitimate question.

Antar · Feb 3, 2014

Calm_Mind_Latias said:
Edit: I do not think this post belongs here since it does not concern metagame trends or usage. But I still believe it is a legitimate question.

Eh, it's fine.

Regarding trolls, it's true that much of what we've built is based on the assumption that players like to win, but I haven't noticed the rating distribution being at all lopsided. I've actually been pleasantly surprised by how Gaussian everything looks.

This is the distribution of Elo* ratings for the OU ladder for most of January, IIRC. It's borked around 1000 because most alts on PS are only associated with 1 or 2 games (I think 80% of all alts only play one game). But other than that, it's pretty good. The right tail is a little heavier, which makes sense, since players are more likely to "reset" a low alt than a high one, but even that's not totally skewed.

So bottom line: I get that trolls *could* be a problem, but given that we don't see any peaks on this distribution around extremely low ratings, I think we're okay.

*This is not the Elo rating currently deployed on PS. This is Elo calculated strictly using the standard Elo formula, with a K factor of 50, no modifications, no hacks, no nothing.
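For anyone curious what "standard Elo with K = 50" means in practice, here's a minimal Python sketch of the textbook formula. The starting rating of 1000 is an assumption based on the "borked around 1000" remark above.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=50):
    """Player A's new rating after one game (score_a: 1 win, 0.5 draw, 0 loss)."""
    return r_a + k * (score_a - expected_score(r_a, r_b))


# A fresh alt (assumed to start at 1000) playing one game against another 1000:
print(elo_update(1000, 1000, 1))  # after a win
print(elo_update(1000, 1000, 0))  # after a loss
```

This prints 1025.0 and 975.0, which is why a huge number of one-game alts pile up in a narrow band right around the starting rating.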

Shinji Mimura · Feb 4, 2014

So what is the list of tier changes? I think I only heard Jirachi, Landorus, and Terrakion to UU, and I think Kangaskhan.

Anyone else?

Calm_Mind_Latias · Feb 5, 2014

Antar said:
MoxieInfinite, a U-Turn, Volt Switch or Baton Pass counts as a switch-out unless it delivers a KO, in which case it counts as a "U-Turn KO" and isn't counted towards Checks & Counters.

Does this explain the counterintuitive data that Genesect doesn't "check" or "counter" anything in OU except Pinsir? Latios, for one, should obviously be checked by it, since (Scarf) Genesect has two moves it can choose from to KO it.
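Here's a rough Python sketch of the classification rules Antar described (Pursuit KO counts as a KO; a U-Turn, Volt Switch or Baton Pass counts as a switch-out unless it delivers a KO, in which case it's excluded). The function names and arguments are hypothetical, not taken from the actual stats scripts.

```python
from enum import Enum


class Outcome(Enum):
    KO = "KOed"
    SWITCH = "switched out"
    UTURN_KO = "U-Turn KO"


# Moves that, per the rule quoted above, count as a switch-out unless they KO.
PIVOT_MOVES = {"U-Turn", "Volt Switch", "Baton Pass"}


def classify(fainted, pivot_move=None, pivot_ko=False):
    """Classify how a matchup ended for the Pokemon that left the field.

    fainted    -- it was KOed (this includes being caught by Pursuit mid-switch)
    pivot_move -- the move it left the field with, if any
    pivot_ko   -- whether that pivot move KOed the opposing Pokemon
    """
    if fainted:
        return Outcome.KO
    if pivot_move in PIVOT_MOVES and pivot_ko:
        return Outcome.UTURN_KO
    return Outcome.SWITCH  # manual switch, or a pivot move that didn't KO


def counts_toward_checks_and_counters(outcome):
    return outcome is not Outcome.UTURN_KO
```

Under these rules, every time Scarf Genesect U-Turns through a Latios for the KO, that matchup simply drops out of Checks & Counters, which would fit the odd Genesect numbers.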

Antar · Feb 5, 2014

Calm_Mind_Latias, that sounds about right.

migetno1 · Feb 13, 2014

I've formatted the OU moveset statistics into a more accessible format at http://sweepercalc.com/stats/

I'll add in Ubers / UU / VGC when I get some free time.

ArcFurnace · Feb 13, 2014

I was wondering, how is the data for spreads (nature and EVs) stored in the raw data? I ask because for quite a few Pokemon, they have a diversity of spreads, so listing only the most common ones winds up with 50% or more of the spreads listed under "other". Obviously you can't display every spread, but would it be possible to display, say, the top 2-4 most common natures (with percentages) separately from the EVs, so that you can at least get an idea of which natures are most popular (and by what margin)?

Calm_Mind_Latias · Feb 13, 2014

ArcFurnace said:
I was wondering, how is the data for spreads (nature and EVs) stored in the raw data? I ask because for quite a few Pokemon, they have a diversity of spreads, so listing only the most common ones winds up with 50% or more of the spreads listed under "other". Obviously you can't display every spread, but would it be possible to display, say, the top 2-4 most common natures (with percentages) separately from the EVs, so that you can at least get an idea of which natures are most popular (and by what margin)?

Well, I think the most important stat people are interested in is the amount of Speed investment a Pokemon has received.

It would be helpful if there was some data that shows how a given Pokemon's speed is distributed among the players. People may use spreads to speed creep other Pokemon that try to speed creep it.

For instance, one may run a Landorus-T that creeps 44 Speed Rotom-W (just 8 EVs needed), and 44 Speed Rotom-W doesn't show up in the stats. And some Rotom-W might try to creep that Landorus-T by putting in 8 extra EVs so it can U-Turn out first. I do not think these kinds of spreads would show up in the usage statistics.
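The creep math above can be checked with the standard Gen III+ stat formula. This Python sketch assumes level 100, 31 IVs, and neutral natures unless stated; the helper name is just for illustration.

```python
def stat_at(base, ev, iv=31, level=100, nature="neutral"):
    """Non-HP stat (Gen III+ formula); nature is 'plus', 'minus' or 'neutral'."""
    raw = (2 * base + iv + ev // 4) * level // 100 + 5
    if nature == "plus":
        return raw * 11 // 10  # floor(raw * 1.1), done in integers
    if nature == "minus":
        return raw * 9 // 10   # floor(raw * 0.9)
    return raw


ROTOM_W_BASE_SPE = 86
LANDO_T_BASE_SPE = 91

print(stat_at(ROTOM_W_BASE_SPE, 44))  # 44 Speed EV Rotom-W -> 219
print(stat_at(LANDO_T_BASE_SPE, 8))   # Landorus-T with 8 Speed EVs -> 220
print(stat_at(ROTOM_W_BASE_SPE, 52))  # Rotom-W with 8 more EVs -> 221
```

Uninvested Landorus-T sits at 218 and 4 EVs only tie at 219, so 8 EVs really is the minimum creep, and another 8 EVs on Rotom-W (221) creeps it right back.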

It also makes you wonder how many (Mega) Scizor, let's say, are trying to outspeed minimum-Speed Heatran (and/or Rotom-W before it is burned) to hit it with Superpower. It seems like a worthwhile investment if one wants to get through Heatran (and/or Rotom-W) rather than lose momentum by manually switching out. This doesn't seem to show up in the statistics either.

ArcFurnace · Feb 13, 2014

Calm_Mind_Latias said:
Well, I think the most important stat people are interested in is the amount of speed investment a Pokemon has received.

It would be helpful if there was some data that shows how a given Pokemon's speed is distributed among the players. People may use spreads to speed creep other Pokemon that try to speed creep it.

One, for instance, may run a Landorus-T that creeps 44 Speed Rotom (just 8 EVs needed) (and that doesn't show up in the stats). And some Rotom-W might try to creep that before it U-Turns out by putting 8 extra EVs. I do not think these kinds of spreads would show up in the usage statistics.

You are correct that it wouldn't show Speed investment; I will admit I wasn't thinking of that at the time. My ulterior motive is that I breed Pokemon in-game, so it helps to know which natures to give them (since natures can't be adjusted after hatching, while EV spreads can be set later). Obviously the situation is different for people using simulators. I'm not sure how you would want to set up the display of spreads if you wanted to focus on Speed, especially since EVs are much more finely adjustable. Trying to ensure you displayed the majority of levels of investment seems like it might require a lot more slots.

migetno1 · Feb 13, 2014

ArcFurnace said:
I was wondering, how is the data for spreads (nature and EVs) stored in the raw data? I ask because for quite a few Pokemon, they have a diversity of spreads, so listing only the most common ones winds up with 50% or more of the spreads listed under "other". Obviously you can't display every spread, but would it be possible to display, say, the top 2-4 most common natures (with percentages) separately from the EVs, so that you can at least get an idea of which natures are most popular (and by what margin)?

As far as I know, the json data has ALL the spreads used if the player meets the 1500 cutoff. This leads to common Pokemon like Rotom-Wash having about 10000 different spreads listed, many of them with a count under 5. If you wanted to, you could parse this data to get a percentage of the natures used.

Antar · Feb 14, 2014

ArcFurnace -- yes, the data for spreads is collected by counting the occurrence of each and every individual spread. migetno1, the 1500 cutoff is not a "hard" cutoff. See my Weighting FAQ for more details.
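One way a "soft" cutoff can work is to give each team a weight equal to the probability that the player's true rating is above the cutoff, using the Glicko rating and its deviation. The Python sketch below does exactly that, but treat it as an assumption about the approach; the Weighting FAQ is the authoritative description.

```python
import math


def team_weight(rating, deviation, cutoff=1500.0):
    """Probability that a player's true rating is above `cutoff`, treating skill
    as normally distributed with mean `rating` and std. dev. `deviation`."""
    z = (rating - cutoff) / deviation
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


for r, rd in [(1300, 100), (1500, 100), (1550, 30), (1550, 150), (1700, 100)]:
    print(r, rd, round(team_weight(r, rd), 3))
```

Nobody is cut off outright: a 1300 player still contributes a sliver, and a 1550 player with a small deviation counts almost fully while the same rating with a large deviation counts much less. Fractional weights like these would also explain why the spread counts aren't integers.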

One note: if the Pokemon's spread contains useless EVs (255 EVs in one stat, improperly optimized LC spreads), my scripts round that down and bin it with the equivalent spread that contains no useless EVs.
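A minimal Python sketch of that binning for the level-100 case: since only floor(EV/4) affects the stat there, each EV can be rounded down to a multiple of 4 and equivalent spreads merged. The "Nature:hp/atk/def/spa/spd/spe" key format is assumed, and LC (level 5) spreads, which need different rounding, are not handled here.

```python
from collections import defaultdict


def normalize_spread(key):
    """'Nature:hp/atk/def/spa/spd/spe' -> same spread with every EV rounded down
    to a multiple of 4 (at level 100, only floor(EV/4) matters)."""
    nature, evs = key.split(":")
    rounded = [int(ev) // 4 * 4 for ev in evs.split("/")]
    return nature + ":" + "/".join(str(ev) for ev in rounded)


def bin_spreads(spreads):
    """Merge the weighted counts of spreads that are equivalent after rounding."""
    binned = defaultdict(float)
    for key, count in spreads.items():
        binned[normalize_spread(key)] += count
    return dict(binned)


print(bin_spreads({"Jolly:0/255/0/0/4/252": 1.5, "Jolly:0/252/0/0/4/252": 2.0}))
```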

Calm_Mind_Latias -- it's at the top of my "to-do" list to start generating "speed tier" info from usage data (throwing in Choice Scarf and speed-boosting moves as well). But I just haven't had the time recently.

Leer · Feb 14, 2014

Quick question, not sure if it's been answered before, but are unrated battles (challenges) counted?

Antar · Feb 14, 2014

Lexical Analysis said:
Quick question, not sure if it's been answered before, but are unrated battles (challenges) counted?

Unrated battles aren't even logged, so no.

ArcFurnace · Feb 14, 2014

Working on analyzing the raw data myself to get the information I want. It's actually going pretty well (hooray for Python), but I have a question about the data format: in the raw data, each unique spread for a Pokemon is paired with a number. What exactly does that number represent? I was assuming it was something along the lines of "number of times this spread appeared", but there has to be something else adjusting it, since it's not necessarily an integer, and if you add them all up they don't sum to the 'Raw count' variable for that Pokemon. Is it being adjusted by the weighting function intended to reduce the impact of bad players on the stats?

Leer · Feb 15, 2014

Antar said:
Unrated battles aren't even logged, so no.

Oh duh, should have known '~'

Also this has been a good reminder of why I really need to start learning Python...

Antar · Feb 15, 2014

ArcFurnace: All collected stats are weighted.

ArcFurnace · Feb 15, 2014

It's working. Excellent.

Code:

import json
# Used for analyzing data posted by Antar from Smogon simulators
# Server address: http://sim.smogon.com:8080/Stats/
filename = input('Which file do you want to analyze?\n')
with open(filename) as f:
    a = json.load(f)  # the file is closed automatically after loading

# Data becomes a nested dictionary
# First layer keys are 'info', 'data'
b = a['data']

# Second layer keys are Pokemon names (capitalized)
name = input('Which Pokemon do you want to analyze?\n')

# Third layer keys are 'Abilities', 'Checks and Counters', 'Items', 'Moves',
# 'Raw count', 'Spreads', 'Teammates'

pkmn = b[name]
temp = pkmn['Spreads']
# In the Spreads dictionary for a specific Pokemon, every single unique spread
# is a key, and the weighted count is the number associated with it.

naturestats = dict(Adamant=0, Bashful=0, Bold=0, Brave=0, Calm=0, Careful=0,
Docile=0, Gentle=0, Hardy=0, Hasty=0, Impish=0, Jolly=0, Lax=0, Lonely=0,
Mild=0, Modest=0, Naive=0, Naughty=0, Quiet=0, Quirky=0, Rash=0, Relaxed=0,
Sassy=0, Serious=0, Timid=0)
total = 0

for spread, count in temp.items():
    total += count
    for nature in naturestats:
        # The nature name is part of the spread key, so a substring test works
        if nature in spread:
            naturestats[nature] += count

# Convert weighted counts into fractions of the total
for nature in naturestats:
    naturestats[nature] /= total

print('Nature usage (5% or greater):')

# Most popular natures first; >= so that exactly 5% is included as promised
for nature, percent in sorted(naturestats.items(), key=lambda kv: kv[1], reverse=True):
    if percent >= 0.05:
        print(nature, '({0:.1f}%)'.format(percent*100))

You'll need Python installed (this was created in Python 3.3). Save the code as a .py file, put it in a folder with the .json file you want to analyze, and run it from a command line window. No error handling, though, so make sure you spell things right.

Agent Gibbs · Feb 15, 2014

Hey Antar, I noticed that there are no usage stats for the Gen 3 OU ladder, which is the only old gen missing. Is there any way we could get those stats as well?

Antar · Feb 16, 2014

Agent Gibbs said:
Hey Antar, I noticed that there are no usage stats for the Gen 3 OU ladder, which is the only old gen missing. Is there any way we could get those stats as well?

Gen III didn't get a ladder on PS until Feb. 12...

Agent Gibbs · Feb 16, 2014

Antar said:
Gen III didn't get a ladder on PS until Feb. 12...

Oh, ok. I never thought to look for it until yesterday, so when I saw it, I just assumed it had been there for a while. My mistake!

MicfiJasan · Feb 22, 2014

So I'm starting to look a bit deeper into the Checks and Counters data courtesy of the json files, and I did borrow some of Antar's source code. The first thing my limited Python knowledge managed to find me was what I'm calling "Average Opposing Success Rate". To oversimplify a bit, it's the chance that something good happens to the opponent while we have Pokemon X out. It's probably easier to explain with actual numbers, so I'll put those right below. These are taken from the top 100 most used Pokemon in OU last month.

Pinsir: 32.6606536624097
Mawile: 32.6844481742917
Manaphy: 34.3630632235175
Heracross: 34.4424965021286
Charizard: 35.0592024325037
Medicham: 35.3023614950572
Volcarona: 35.4067400910503
Lucario: 35.4455291037521
Gyarados: 35.5116863785033
Conkeldurr: 36.516907127894
Dragonite: 36.5545320428452
Aegislash: 36.5681975927158
Kyurem-Black: 37.8309717680315
Bisharp: 38.3649220193829
Kingdra: 38.531349203473
Venusaur: 38.5553733987965
Clefable: 38.8988327548322
Garchomp: 39.1048053132335
Breloom: 39.2098243163858
Cloyster: 39.3413409294272
Talonflame: 39.3922228568415
Gardevoir: 39.6711615064359
Alakazam: 39.9964141451897
Gengar: 40.1997603794041
Reuniclus: 40.2864221128222
Crawdaunt: 40.6154542915553
Salamence: 40.6454368190726
Azumarill: 40.6689531224642
Haxorus: 40.7513280330215
Scizor: 41.2739257364425
Keldeo: 41.4610637679135
Greninja: 41.7516802360431
Weavile: 42.5671333255041
Landorus: 42.8043943463543
Slowbro: 42.8043943463543
Togekiss: 43.5658724202397
Porygon2: 43.9430431099078
Arcanine: 44.1548236761597
Sableye: 44.3135961436258
Gliscor: 44.3913692085438
Metagross: 44.450457754148
Terrakion: 44.4600453571302
Blastoise: 44.5511315724735
Latios: 44.7306280229763
Mamoswine: 44.930485065953
Hydreigon: 44.9459549417092
Chandelure: 45.0101588925653
Nidoking: 45.0129339855979
Ferrothorn: 45.4970226366418
Genesect: 45.6367708095554
Sylveon: 45.7302719438663
Infernape: 45.8951290756273
Diggersby: 46.3248129428848
Absol: 46.3320133131536
Umbreon: 46.3743233324173
Excadrill: 46.3759120080044
Darmanitan: 46.4111232917662
Heatran: 46.5177171463897
Goodra: 46.6600627629875
Zapdos: 46.6995220785531
Thundurus: 46.9785487566663
Thundurus-Therian: 47.4919887437474
Trevenant: 47.498523525822
Florges: 47.9143122421163
Noivern: 47.9300544465411
Quagsire: 48.0237927022329
Whimsicott: 48.0444210975334
Starmie: 48.1152167905302
Aggron: 48.1960393308203
Ditto: 48.3003550165925
Latias: 48.3643931242995
Tyranitar: 48.366004291485
Manectric: 48.3852945812972
Jirachi: 48.6828542642543
Vaporeon: 49.0304566322915
Klefki: 49.2993117538896
Espeon: 49.3014704588863
Mandibuzz: 49.3193762553036
Gastrodon: 49.6502828137244
Jellicent: 49.7605682005731
Ambipom: 49.7643480320047
Chansey: 50.6178222346396
Skarmory: 50.9443778369248
Magnezone: 51.5004764249148
Celebi: 51.807078744494
Ninetales: 52.0344370841584
Blissey: 52.1084252145068
Jolteon: 52.1130142076991
Crobat: 53.1164262429457
Donphan: 53.3124666216968
Landorus-Therian: 53.3148566332438
Tentacruel: 54.225819632441
Rotom-Wash: 54.3354877933639
Politoed: 55.0057811770138
Deoxys-Speed: 55.543091182007
Deoxys-Defense: 57.0743333619611
Galvantula: 57.5611122966849
Scolipede: 58.5615741449645
Forretress: 61.3480515649239
Smeargle: 63.4775826385627

Let's start at the top of the list with Pinsir, with an AOSR of about 32.66. That means that once he got out on the field, the opponent only gained an advantage (Pinsir switched or was KOed) 32.66% of the time. Compare that with Smeargle, who gave the opponent an advantage nearly twice as often.
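For reference, here's a minimal Python sketch of the AOSR idea: average, over every opponent with enough encounters, the chance that the opponent came out ahead (X was KOed or switched out). It assumes the Checks and Counters entries have already been reduced to (encounters, probability) pairs; the minimum of 20 encounters and the optional usage weighting are hypothetical parameters, not MicfiJasan's exact settings.

```python
def aosr(checks, min_encounters=20, usage=None):
    """checks: opponent -> (encounters, probability the opponent came out ahead).

    Plain mean over qualifying opponents by default; if `usage`
    (opponent -> weight) is given, a usage-weighted mean instead."""
    rows = [(opp, p) for opp, (n, p) in checks.items() if n >= min_encounters]
    if usage is None:
        return 100.0 * sum(p for _, p in rows) / len(rows)
    total = sum(usage.get(opp, 0.0) for opp, _ in rows)
    return 100.0 * sum(usage.get(opp, 0.0) * p for opp, p in rows) / total


# Made-up entries; Pinsir has too few encounters to count here.
checks = {"Heatran": (50, 0.5), "Rotom-Wash": (40, 0.3), "Pinsir": (5, 0.9)}
print(aosr(checks))                                             # unweighted
print(aosr(checks, usage={"Heatran": 3.0, "Rotom-Wash": 1.0}))  # usage-weighted
```

The second call shows how weighting by usage can move the number: the common opponent (Heatran here) pulls the average toward its own success rate.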

As for my first impressions, there are a lot of Megas high on the list: half of the Pokemon with scores below 40 had a Mega Evolution available to them. If Game Freak wanted these guys to be the powerhouses of their teams, they certainly succeeded. You may notice Rotom-W and Landorus-T near the bottom of this list, which is strange for the super-standard bulky momentum core they are. My theory is that this is due to Antar's list counting U-Turn and Volt Switch as a switch-out. Still, in theory this would only affect the number of positive outcomes for the U-Turner/Volt Switcher, since the opponent likely checks them if they stay in to tank the moves anyway.

Some caveats/other observations about this data:

These numbers were taken from the Checks and Counters data, which coincidentally looked at what happened when a Pokemon was switched out or KOed. Thus, Pokemon who are often used as suicide leads, like Smeargle, Galvantula, and the Deoxys formes, will have innately worse scores than the rest, regardless of their ability to support the team. If you don't understand anything I'm saying, peruse through here. Antar explains the situations that make up this data quite well.
Generally, stallier Pokemon have worse scores. I would guess it is due to the offensive nature of the meta putting heavy pressure on stall teams, although Venusaur, who seems to be the best wall right now, has a good score for a defensive poke.
Chansey is currently performing about 2% better than Blissey.
I'm only calling it AOSR because it was the first thing that came to mind and I didn't want to keep writing "Average Opposing Success Rate" a bunch of times, so if anyone can think of something shorter/leads to a better acronym, I'll implement that as well.
I'm eventually trying to build up the Python knowledge to have a weighted average success rate for the Pokemon itself rather than its counters. You can approximate the unweighted version by subtracting these numbers from 100, but it will actually be slightly less due to double downs/double switches making up a small amount of these scores.
Most importantly, and perhaps the biggest problem: these stats aren't weighted by usage; every Pokemon that passes the minimum number of encounters is counted equally. I chose this because CRE has generally favored Pokemon lower in usage, simply because the higher deviation of their counters means the CREs of their counters will be lower. In addition, while the crap pokes lose more due to high deviation, they don't actually carry much weight at all. If anything, they'll attribute more to pokes who aren't matched up a lot. I'll apply the weighting when/if I get good enough at Python to do so.
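To see why a higher deviation drags a score down, here's a Python sketch of a conservative score of the form p minus k standard errors, using the binomial standard error. The multiplier k = 4 and this exact formula are assumptions for illustration; the thread doesn't spell out how CRE is computed.

```python
import math


def conservative_score(p, n, k=4.0):
    """p minus k binomial standard errors, for p estimated from n encounters."""
    sigma = math.sqrt(p * (1.0 - p) / n)
    return p - k * sigma


# Same observed success rate, different sample sizes:
print(conservative_score(0.6, 400))  # well-sampled matchup
print(conservative_score(0.6, 25))   # rarely seen matchup
```

Both matchups show a 60% success rate, but the rarely seen one scores far lower (about 0.21 vs. about 0.50) purely because of its larger uncertainty.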

Antar · Feb 22, 2014

MicfiJasan said:
These numbers were taken from the Checks and Counters data, which coincidentally looked at what happened when a Pokemon was switched out or KOed. Thus, Pokemon who are often used as suicide leads, like Smeargle, Galvantula, and the Deoxys formes, will have innately worse scores than the rest, regardless of their ability to support the team. If you don't understand anything I'm saying, peruse through here. Antar explains the situations that make up this data quite well.

I can give you the "Encounter Matrix" if you want it. That's the comprehensive table of what happens when X faces off with Y and should yield better results.
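Conceptually, an encounter matrix is just a nested tally keyed by (X, Y, outcome). The Python sketch below shows that shape; the outcome labels and the (x, y, outcome, weight) input format are hypothetical, not the format of the file Antar is offering.

```python
from collections import defaultdict


def build_encounter_matrix(encounters):
    """encounters: iterable of (pokemon_x, pokemon_y, outcome, weight) tuples.

    Returns matrix[x][y][outcome] -> summed weight."""
    matrix = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for x, y, outcome, weight in encounters:
        matrix[x][y][outcome] += weight
    return matrix


# Hypothetical weighted encounters, for illustration only.
log = [
    ("Pinsir", "Heatran", "Pinsir switched out", 1.0),
    ("Pinsir", "Heatran", "Heatran KOed", 0.5),
    ("Pinsir", "Heatran", "Pinsir switched out", 0.25),
]
m = build_encounter_matrix(log)
print(dict(m["Pinsir"]["Heatran"]))
```

From a table like this, success rates for either side (and AOSR-style averages) can be computed directly instead of being reconstructed from the Checks and Counters summary.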

asbdsp · Feb 23, 2014

Swagplay ought to be counted as a thing in the metagame analysis section. Just saying.
