
Data Official Smogon University Usage Statistics Discussion Thread, mk.2

Discussion in 'Smogon Metagames' started by Antar, Jun 4, 2014.

Thread Status:
Not open for further replies.
  1. Grizzle

    Grizzle

    Joined:
    Jul 7, 2016
    Messages:
    2
    Oh interesting, so by that logic, we can't really pull a list of "Pokemon A's top teammates", only the mons it has the most influence over?

    Sorry, I'm not sure why, but making sense of this one part of the data set is giving me a lot of trouble. No matter how many calculations I try, normalizing against various factors, I can't get a list of "Teammates" that comes out the way I would expect.

    That said, if I'm searching for the wrong result, that would explain my frustration thus far.

    For example (referring to the 2016-06 VGC2016 data set):

    Code:
    Groudon-Primal 'usage': 0.655
    Groudon-Primal Sum-of-Abilities: 4297.835
    
    Xerneas 'usage': 0.534
    Xerneas Sum-of-Abilities: 3507.517
    
    Xerneas' entry in Groudon-Primal 'Teammates': 23.385
    
    Kangaskhan-Mega 'usage': 0.603
    Kangaskhan-Mega Sum-of-Abilities: 3962.796
    
    Kangaskhan-Mega's entry in Groudon-Primal 'Teammates': 419.126
    Is there a way to take this data and generate some sort of normalized value that lets me rank Xerneas against Kangaskhan-Mega as the "better" Groudon-Primal teammate?
    Last edited: Jul 8, 2016
  2. Antar

    Antar
    Battle Server Administrator, Programmer, Super Moderator, Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    Grizzle, by "top teammates" you mean you want to know which Pokemon appear most with that mon, regardless of prior distribution (that is, you don't care about the Pokemon's usage by itself)?

    My code isn't very well documented, I'll admit, but here's what's going on and how you can undo it:

    1. I have a "teammateMatrix" in a binary file that you don't have access to. This file stores the raw counts for "Pokemon X appeared with Pokemon Y." This would be the number you want.
    2. On line 147, I transform that value into one that only accounts for the boost / drop of Pokemon Y's appearance on a team based on the fact that Pokemon X appeared on that team by subtracting the expected count, if teams were generated randomly based on just the usage numbers. So if Pokemon X's count is "count" and Pokemon Y's usage percentage is "usage[y]" then you'd expect the count for "Pokemon X appeared with Pokemon Y" to be "count * usage[y]".
    3. Ergo: if what you want is to discount the prior distribution and just ask, "what are the most common Pokemon to appear on a team with Pokemon X?", then you just need to add the prior back in.
    The question is how you get "usage[y]", because we're talking percentages, not counts, and there's no "total count" in that file (wtf, Antar?*). You could always parse the usage stats table...

    Edit: and you can't just sum abilities throughout, because I only do the top 200 mons.

    Edit 2: *"tf" is that this report was intended just to be a more detailed version of the moveset stats, that is, each Pokemon was supposed to be considered independently. This was pretty shortsighted of me, and it'll be rectified as I rewrite this package.
    Last edited: Jul 9, 2016
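
The three steps above can be sketched in Python. This is a minimal sketch, not code from the stats package: the function names are made up for illustration, and it assumes the 'usage' value quoted in Grizzle's post is the same fraction (0 to 1) that was multiplied by the count before the subtraction. The inputs are the 2016-06 VGC2016 figures quoted earlier in the thread.

```python
def raw_cooccurrence(teammate_entry, count_x, usage_y):
    """Undo the prior subtraction, assuming stored = raw - count_x * usage_y."""
    return teammate_entry + count_x * usage_y


def share_of_teams(teammate_entry, count_x, usage_y):
    """Fraction of X's (weighted) appearances that also include Y."""
    return raw_cooccurrence(teammate_entry, count_x, usage_y) / count_x


# Figures quoted for Groudon-Primal (2016-06 VGC2016).
groudon_count = 4297.835  # sum of abilities, used as the count for X
candidates = {
    "Xerneas": {"usage": 0.534, "entry": 23.385},
    "Kangaskhan-Mega": {"usage": 0.603, "entry": 419.126},
}

# Rank candidate teammates by how often they share a team with X.
ranking = sorted(
    ((name, share_of_teams(c["entry"], groudon_count, c["usage"]))
     for name, c in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, share in ranking:
    print(f"{name}: {share:.1%} of Groudon-Primal teams")
```

Under those assumptions Kangaskhan-Mega comes out ahead of Xerneas. Because only the top 200 mons are covered and the file has no total count, the result is worth checking against the published reports before relying on it.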
  3. pokeprogrammer

    pokeprogrammer

    Joined:
    Jul 12, 2014
    Messages:
    1
    I'm working on a problem similar to Grizzle's, and I've been going through the json file. In your code, why is count, the sum of the instances of a Pokemon's abilities, not the same as the Raw count? Every Pokemon has exactly one ability, so they should sum to the same number. Are the ability numbers generated from the same samples as usage?
    Last edited: Jul 16, 2016
  4. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    pokeprogrammer, this is covered by the third item in the FAQ:

  5. HeadsILoseTailsYouWin

    HeadsILoseTailsYouWin

    Joined:
    Dec 18, 2013
    Messages:
    719
    Antar If we're going to count the number of teams that use obscure playstyles such as Magic Room and Gravity, I think that Terrains should be added to the list as well.

    Electric Terrain team: At least one Pokemon has the move Electric Terrain without the move Nature Power, two Pokemon have the move Electric Terrain, or one Pokemon has whatever Tapu Koko's ability is.
    Misty Terrain team: At least one Pokemon has the move Misty Terrain without the move Nature Power, or two Pokemon have the move Misty Terrain.
    Grassy Terrain team: At least one Pokemon has the move Grassy Terrain without the move Nature Power or the ability Grass Pelt, or two Pokemon have the move Grassy Terrain.

    Requiring Pokemon to not have certain moves helps differentiate stand-alone Terrain sweepers from Terrain-using support Pokemon. Stand-alone Terrain sweepers are a thing in Other Metagames: in Balanced Hackmons, a Mega Ampharos can run Prankster Electric Terrain and Nature Power to get priority Thunderbolts. Misty Terrain can also be used in VGC Doubles or Battle Spot Triples to block Dark Void (although it is generally outclassed by Safeguard). If you see, say, a Xerneas using Misty Terrain in VGC, it will likely be to support the team by blocking status and not to power up itself.

    tl;dr Terrains are niche, but deserve to be recorded under team type.
    nv and Antar like this.
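
The proposed team-type rules could be checked along these lines. This is only a sketch of the rules as worded in the post above, not code from the stats package: the function name, the team representation (a list of dicts with "ability" and "moves"), and both sample teams are hypothetical. The post does not name Tapu Koko's ability; Electric Surge is filled in here as an assumption.

```python
def is_terrain_team(team, terrain_move, setter_abilities=(), excluded_abilities=()):
    """Apply the proposed rule for one terrain.

    team: list of dicts with "ability" (str) and "moves" (set of str).
    """
    users = [p for p in team if terrain_move in p["moves"]]
    # Two or more users of the move always qualify.
    if len(users) >= 2:
        return True
    # An ability that sets the terrain automatically qualifies on its own.
    if any(p["ability"] in setter_abilities for p in team):
        return True
    # A single user counts only if it does not look like a stand-alone sweeper.
    return any(
        "Nature Power" not in p["moves"] and p["ability"] not in excluded_abilities
        for p in users
    )


# Hypothetical Balanced Hackmons sweeper: Electric Terrain plus Nature Power.
sweeper_team = [
    {"ability": "Prankster", "moves": {"Electric Terrain", "Nature Power"}},
    {"ability": "Intimidate", "moves": {"Earthquake"}},
]
# Hypothetical VGC support set: Misty Terrain without Nature Power.
support_team = [
    {"ability": "Fairy Aura", "moves": {"Misty Terrain", "Geomancy"}},
]

# "Electric Surge" is an assumption; the post only says "Tapu Koko's ability".
electric = is_terrain_team(sweeper_team, "Electric Terrain",
                           setter_abilities=("Electric Surge",))
misty = is_terrain_team(support_team, "Misty Terrain")
print(electric, misty)
```

For Grassy Terrain the same function would be called with excluded_abilities=("Grass Pelt",), so a lone Grass Pelt user does not count on its own.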
  6. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    SparksBlade likes this.
  7. Sleepless

    Sleepless

    Joined:
    Sep 30, 2012
    Messages:
    179
    Excuse me if I'm asking out of turn, but when will the Aug stats be up?
    xRoyUltra likes this.
  8. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    Sleepless and Ununhexium like this.
  9. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
  10. MSC Knightmare

    MSC Knightmare

    Joined:
    Jun 29, 2016
    Messages:
    63
    Hey guys, my apologies if this doesn't belong here, but as someone who really enjoys statistics as a whole, could someone explain this to me?

    AndrewB73
    Joined: Jun 27, 2016

    Ratings
    Official ladder    Elo   GXE                  Glicko-1    W   L
    nu                 1103  (more games needed)              3   1
    ou                 1286  64.7%                1617 ± 77   17  7
    pu                 1132  56.3%                1549 ± 79   11  8
    ru                 1089  54.5%                1535 ± 76   11  8
    ubers              1472  73.9%                1697 ± 41   64  44
    uu                 1206  64.5%                1616 ± 87   11  4

    Unofficial ladder  Elo   GXE                  Glicko-1    W   L
    nususpecttest      1154  (more games needed)              5   0
    pususpecttest      1000  (more games needed)              0   1
    rususpecttest      1022  (more games needed)              1   2
    uususpecttest      1074  42.3%                1440 ± 85   4   7

    These are my stats on Showdown's ladder. What do things like Elo, GXE, and Glicko-1 mean?
  11. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
  12. Bad Ass

    Bad Ass Serious as a heart attack!!
    Tiering Contributor Alumnus, Past SPL Champion, 2nd Grand Slam Winner, defending World Cup of Pokemon champion

    Joined:
    Jan 5, 2009
    Messages:
    2,696
    Howdy. I'm trying to build something for RBY that takes in the first n revealed Pokemon and spits out the best guess at what the remaining 6-n Pokemon are. I'm pretty confused about how exactly the teammate stats are calculated. I tried correlating the points (from here http://www.smogon.com/stats/2016-08/chaos/gen1ou-1760.json) with the % increase (from here http://www.smogon.com/stats/2016-08/moveset/gen1ou-1760.txt). Interestingly, the ratio was the same across all of an individual Pokemon's teammates, but it differed from one Pokemon to the next (e.g. for Golem it's about .16% change per point in the json file, but for Alakazam it's about .08%).

    How can I use these data to make a meaningful metric like "47% of teams that have a Lapras also have an Alakazam"? Or alternatively, "a team that has Lapras is 4.5% more likely to have an Alakazam than the average team"?

    Antar

    thanks in advance for any time taken to help me out
  13. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    Bad Ass, teammate stats are (% of teams with A that also have B) minus (% of all teams that have B).
  14. Bad Ass

    Bad Ass Serious as a heart attack!!

    Joined:
    Jan 5, 2009
    Messages:
    2,696
    So in the json file in my last post, Starmie's teammate number for Chansey is -182.30732025389. Clearly this is some kind of weight/modifier, not a raw "one percentage minus another", and so a larger negative number corresponds to "teams with Starmie are moderately less likely [probably around 8%] to also have a Chansey". What I meant to ask was: what's the schema for changing the -182 -> "moderately less likely" to -182 -> "x.x% less likely"?
  15. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    Bad Ass, IIRC, divide by count (sum of abilities).

    Check your numbers against what's in the reports.

    I'm sorry this is so obscure and poorly documented. Things will be a lot better post-rewrite.
    Bad Ass likes this.
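
As a rough illustration of the divide-by-count step, the sketch below turns a teammate entry into a percentage-point change and into a relative "x% more likely" figure. The function names are hypothetical, usage is assumed to be a fraction (0 to 1), and the inputs are the Groudon-Primal and Kangaskhan-Mega figures quoted earlier in the thread, since Starmie's count is not given here.

```python
def point_change(teammate_entry, count_a):
    """Percentage-point change in B's usage when A is on the team."""
    return 100.0 * teammate_entry / count_a


def relative_lift(teammate_entry, count_a, usage_b):
    """Relative change versus the average team (0.10 means 10% more likely)."""
    return (teammate_entry / count_a) / usage_b


count_a = 4297.835  # Groudon-Primal sum of abilities
entry = 419.126     # Kangaskhan-Mega's entry in Groudon-Primal 'Teammates'
usage_b = 0.603     # Kangaskhan-Mega 'usage'

delta = point_change(entry, count_a)
lift = relative_lift(entry, count_a, usage_b)
print(f"{delta:+.2f} percentage points, {lift:+.1%} relative to the average team")
```

The two figures answer different questions: the first is the difference between two percentages, the second compares that difference to B's baseline usage. A negative entry, such as Starmie's value for Chansey, gives negative results from both functions.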
  16. Heroes and Cons

    Heroes and Cons

    Joined:
    Jan 21, 2012
    Messages:
    307
    When will September's usage stats be up?
  17. HeadsILoseTailsYouWin

    HeadsILoseTailsYouWin

    Joined:
    Dec 18, 2013
    Messages:
    719
    They are up already.
    Aaronboyer, thesecondbest and Antar like this.
  18. Heroes and Cons

    Heroes and Cons

    Joined:
    Jan 21, 2012
    Messages:
    307
    Where? Antar hasn't posted a link in this thread as far as I can see :/

    EDIT: Never mind, I found the full statistics link in the OP, never looked there before lol.
  19. wishes

    wishes best beast
    Site Staff member, Forum Moderator, Smogon Social Media Contributor Alumnus, Community Contributor Alumnus, Contributor Alumnus, Smogon Media Contributor Alumnus
    Moderator

    Joined:
    Aug 30, 2013
    Messages:
    7,296
    [​IMG]

    STABmons Usage graph since the usage stats go back until now. Pretty funky but also really interesting in my opinion.
    nv and Antar like this.
  20. Heroes and Cons

    Heroes and Cons

    Joined:
    Jan 21, 2012
    Messages:
    307
    There's gonna be a tier update in the next couple of days, right?
  21. Kyuzeth

    Kyuzeth

    Joined:
    Sep 26, 2010
    Messages:
    922
    Yes, but please give Antar his time to work on compiling the stats. Your patience will eventually be rewarded.
  22. Heroes and Cons

    Heroes and Cons

    Joined:
    Jan 21, 2012
    Messages:
    307
    I'm just curious if it's happening coz I don't know how it exactly works, or if there was even gonna be one this close to SM release.
    Aaronboyer likes this.
  23. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    Stats for the month will be up in the next few days. I need to add the new CAP data and rerun that tier. May do a partial upload before that.
  24. Drk Pwnr

    Drk Pwnr

    Joined:
    Sep 27, 2006
    Messages:
    7
    Hello, I have some questions

    1. What's the difference between "Raw" in a usage file (ex. 1070733 for Landorus-Therian in http://www.smogon.com/stats/2016-09/ou-1695.txt) and "Raw count" in a moveset file (ex. 1178902 for Landorus-Therian in the same month)? My best guess is that the raw count in the moveset file is including too-short battles, and the raw count in the usage file is excluding them?

    2. In the moveset file, in the Checks and Counters section, what do the three numbers on a Pokemon's first line represent? I mean the 79.978, 83.14, and 0.79 in a line like "Manaphy 79.978 (83.14±0.79)". I tried figuring it out from the code, but failed pretty miserably.

    Thanks for any answers you can give, and thanks for maintaining this great resource for so long!
  25. Antar

    Antar
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,885
    Stats for the month (sans CAP) are now up. Working on a CAP rerun now. Drk Pwnr, please read the FAQ in the second post for the answer to your first question. I'll give you an answer to #2 in a bit. Feel free to ping me again if you don't get a response in a day or two.