
Smogon University PO Statistics — June 2011

Discussion in 'BW Competitive Discussion' started by Rising_Dusk, Jul 12, 2011.

Thread Status:
Not open for further replies.
  1. UltiMario

    UltiMario

    Joined:
    Aug 11, 2009
    Messages:
    1,190
    Whimsicott is getting Electivire syndrome, when it doesn't deserve it.

    Whimsicott was overhyped at the start of the Gen, so everyone is saying that it's ass now to counteract that hype. But in reality, while Whimsicott isn't amazing, it's still certainly the best Momentum stealer in the entire game. That in itself is a quality of an OU Pokemon, and actually takes insane prediction to use at Maximum potential, particularly the set Stun Spore/Tailwind/Encore/U-Turn, which can steal momentum in nearly any situation.
  2. lordkira

    lordkira

    Joined:
    Jul 8, 2009
    Messages:
    1,143
    I'm expecting to see politoed rise even more...as well as haxorus.
    I'm also surprised to see so many freaking SD Garchomps in Ubers.
    It is weird as fuck.
  3. GtM

    GtM

    Joined:
    Jun 3, 2009
    Messages:
    717
    Whimsicott has a 75% chance of stopping any possibility of an Excadrill sweep, regardless of how many attack boosts it has, so long as you have something that can OHKO it afterwards, thanks to Stun Spore. I wouldn't say that it's useless, and I believe it deserves to be OU. This thing can dismantle stall with SubSeed and cripple offense with Stun Spore, and it's a great big fluffball! What's not to like? :P
  4. Olijolly

    Olijolly

    Joined:
    Feb 2, 2010
    Messages:
    175
    politoed yeah haxorus no
    while it's true haxorus is obviously far better than what everyone said about him a few months ago after the hype died out, i see haxorus as a pokemon of usage rank of around 20-30
    sd chomp...yeah that shit fucking sucks imo
    scarfchomp>>sdchomp
  5. TickingSeedotBomb

    TickingSeedotBomb

    Joined:
    Sep 22, 2010
    Messages:
    236
    Whimsicott has big problems with Gyro Ball Ferrothorn, which is THE stall pokemon. And imo he's not all that amazing against offense because he cripples one pokemon, and if they have Lum Berry (Dnite) you're screwed. Imo Whimsicott is usually death fodder, though he's AWESOME with Memento and Tailwind! (Though Tailwind isn't as amazing as it is in doubles / triples.)
  6. UltiMario

    UltiMario

    Joined:
    Aug 11, 2009
    Messages:
    1,190
    Not to mention the chance of coming in on an SD and Encoring it, allowing another team member to take it out.
  7. skitz0phrenic

    skitz0phrenic

    Joined:
    Feb 23, 2011
    Messages:
    425
    I was hoping for these to be compiled sooner given the lack of stats from last month (I know the server was down a lot, but still).

    Given the fact this is 2011, the stats should be available for anyone to view at any time on a daily updated basis, because technology makes life easier and it exists, but I guess we're a bit primitive?

    Anyway, I don't want to come off as offensive. It just seems motivation to get things done around Smogon is certainly not a high priority (in all areas, not just this matter).
  8. DetroitLolcat

    DetroitLolcat Maize And Blue Badge Set :)
    is a Forum Moderator, is a CAP Contributor, is a Tiering Contributor
    Moderator

    Joined:
    Apr 11, 2010
    Messages:
    2,397
    Remember, it's only one person collecting the stats. And running constantly updated stats like PO does would make it harder for Rising_Dusk (the stats dude) to create more detailed statistics (like Doug did in Gen 4), which Rising_Dusk wants to do in the future IIRC.

    It seems pretty difficult for R_D to compile these stats that we have already, so taking it a step further is probably not going to happen soon.
  9. skitz0phrenic

    skitz0phrenic

    Joined:
    Feb 23, 2011
    Messages:
    425
    Yea, I completely understand how it can be hard for one guy to compile all the stats. I guess I just don't know why we have one guy doing it and not just software doing it with one guy maintaining it.

    This process could be made easier for those behind the curtains and be made more efficient for the community, and that's not happening. I just don't quite understand the lack of logic here.
  10. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,220
    Would it be possible to make the raw data public? I'm a grad student--distilling data sets is kinda what I do, and I'd be really interested in seeing if I could develop some tools to help streamline the process / make things easier for R_D.

    I assume because someone would have to write the software...
  11. Darkon

    Darkon

    Joined:
    Apr 4, 2009
    Messages:
    71
    Is there no way to do Smogon server statistics like this: http://stats.pokemon-online.eu/ ?

    The Pokémon Online server has its own statistics, updated every day (or maybe every week...).

    It's really bad that Smogon server statistics haven't been available since June. I understand that it's very hard to get the statistics and put the whole topic together, etc., but something should be done to get those statistics to the players faster x_x
  12. SHUCKLE MAN

    SHUCKLE MAN

    Joined:
    Apr 26, 2006
    Messages:
    1,187
    No. I think it's fine the way it is. As long as we get told if we're not going to get them, or as long as they tell us if they're having issues with getting them.

    It's only the 7th. In 4th gen we usually got them around the 10th I think, so this isn't a big deal.

    In terms of getting them to us faster, you could offer to help out somehow, if you have the time. They should be happy for people to help them. :)
  13. skitz0phrenic

    skitz0phrenic

    Joined:
    Feb 23, 2011
    Messages:
    425
    There are actually a lot of people that want to help and are totally competent and capable of doing so...but their offers for help don't reach anyone if they aren't part of the smogon cliques.

    Realistically though, the raw data should at least be available for download on the first of every month and the members that know how to use it should be free to do so.
  14. TickingSeedotBomb

    TickingSeedotBomb

    Joined:
    Sep 22, 2010
    Messages:
    236

    I used the same moveset recently with an Adamant nature and max Attack / HP EVs and... It's amazing! Even Reuniclus has trouble with confusion and he's quite bulky! I love Sub Machamp especially because he's "immune" to Ferrothorn and T-tar!
  15. DetroitLolcat

    DetroitLolcat Maize And Blue Badge Set :)
    is a Forum Moderator, is a CAP Contributor, is a Tiering Contributor
    Moderator

    Joined:
    Apr 11, 2010
    Messages:
    2,397
    I believe we don't do the stats like PO does because Rising_Dusk has expressed interest in more detailed statistics (teammates, move usage) which the Pokemon Online program is not capable of generating. Whether it's worth it is debatable, but it doesn't look like it's going to change...
  16. Destiny Warrior

    Destiny Warrior also known as Darkwing_Duck
    is a Smogon Media Contributor Alumnus

    Joined:
    Dec 30, 2009
    Messages:
    3,171
    I thought it was known that PO's usage statistics added rated and non-rated battles together with no differentiation for each in stats?

    This might be outdated, but I remember RD talking about that.
  17. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,220
    I downloaded the PO server package and started playing around with it a bit. It looks like that PO stats page is automatically generated, but, as people are saying, it's EXTREMELY lacking, not even differentiating between rated and unrated battles.

    What would be ideal, I guess, is if the server kept a full log of each and every battle. Then anyone with access to the data could literally do any kind of analysis he or she wanted. For example, I've always been curious about what would happen if usage stats were weighted by ranking. Or what if you could look at the "average kills per battle" for each pokemon? (Of course, there's the stuff people ACTUALLY care about, like lead rankings, teammate usage, etc. etc.)
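
    To make the "weighted by ranking" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the input format (one species/rating pair per pokemon that appeared), the function name, and especially the weighting function, which defaults to the raw rating only as a placeholder.

```python
from collections import defaultdict


def weighted_usage(appearances, weight=lambda rating: rating):
    """Return each species' share of the total weight, in percent.

    appearances: iterable of (species, player_rating) pairs, one per
        pokemon that appeared in a battle (hypothetical input format).
    weight: maps a rating to a non-negative weight. The identity
        function is only a placeholder choice.
    """
    totals = defaultdict(float)
    for species, rating in appearances:
        totals[species] += weight(rating)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {s: 100.0 * w / grand_total for s, w in totals.items()}


sample = [("Politoed", 1500), ("Politoed", 1000), ("Whimsicott", 500)]
for species, share in sorted(weighted_usage(sample).items()):
    print(f"{species}: {share:.1f}%")
```

    Passing weight=lambda rating: 1 gives every appearance the same weight, which reduces this to plain unweighted usage.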

    Of course, the flip side of this is that the dataset would be HUGE--if an average battle log is 10KB, and there are a million battles per month (there were 300K OU battles alone for June--not unreasonable), then that's 10 gigs of raw data per month to sift through. And, looking at a battle log file, it's not actually what you'd want--what you really want is a header that gives the full movesets of every pokemon on each team IN ADDITION to the log of the actual battle.
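
    As a quick check of that arithmetic, using the same rough guesses (10 KB per log, a million battles a month) rather than any measured values:

```python
# Back-of-the-envelope size of one month of battle logs.
# Both inputs are the rough guesses from the post, not measurements.
AVG_LOG_KB = 10
BATTLES_PER_MONTH = 1_000_000

total_kb = AVG_LOG_KB * BATTLES_PER_MONTH
total_gb = total_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 KB

print(f"{total_gb:.0f} GB of raw logs per month")
```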

    What's my point? It's that this is a huge freaking project. Clearly we want more than the very bare-bones usage stats that come built in to PO, but it's not clear to me, at least, what's actually OBTAINABLE given the current software. And the current software is kinda what we're stuck with. I don't think Smogon really has a dedicated enough community to do a complete (or even partial) rewrite of the PO code, nor do we really WANT to fork with PO and lead to the same compatibility problems that took out the server for most of July.

    And with that, I'm going to shut up about this and stop theorizing until someone provides more information about what the situation is actually like, in terms of the raw data the PO software generates.
  18. SHUCKLE MAN

    SHUCKLE MAN

    Joined:
    Apr 26, 2006
    Messages:
    1,187
    I know... I've been here for years, and seen that sort of thing time and time again. But it is better than it used to be, to be fair.

    But put yourself in the position of those in the cliques. As I keep saying, a lot of them probably have jobs, etc, and don't have time to devote 6 hours a day to Smogon or whatever. So if they let people on this website who DO have a lot of spare time to help out, they'll end up taking over, so to speak.

    It sounds a bit selfish and power-hungry, but at the same time, if I were in their situation, I'd probably be scared about letting other people help for the same reason, so I can't blame them.

    I think you'd be pleasantly surprised.
  19. kejective

    kejective

    Joined:
    May 23, 2010
    Messages:
    8
    Missingno at number 31 in ubers.
    Golf clap....
    800 battles, really?
  20. PK Gaming

    PK Gaming Pursuing My True Self
    is a Site Staff Alumnus, is a Forum Moderator Alumnus, is a Community Contributor Alumnus, is a Tiering Contributor Alumnus, is a Contributor Alumnus, is a Past SPL Winner

    Joined:
    Aug 18, 2009
    Messages:
    5,158
    We ARE working on getting usage stats, so all hope isn't lost. Don't take this as a guarantee, but we're doing everything in our power to find a solution.
  21. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,220
    I just started digging in the PO source code. There's a really interesting little file in there called usagestats.cpp:

    Code:
    #include <cstdio>
    #include "usagestats.h"
    #include "../Server/playerinterface.h"
    #include "../PokemonInfo/battlestructs.h"
    
    ServerPlugin * createPluginClass() {
        return new PokemonOnlineStatsPlugin();
    }
    
    PokemonOnlineStatsPlugin::PokemonOnlineStatsPlugin()
        : md5(QCryptographicHash::Md5)
    {
        QDir d;
        d.mkdir("usage_stats");
        d.mkdir("usage_stats/raw");
        d.mkdir("usage_stats/formatted");
    }
    
    QString PokemonOnlineStatsPlugin::pluginName() const
    {
        return "Usage Statistics";
    }
    
    inline int norm(int ev) {
        return (ev/4)*4;
    }
    
    /*
     * From here onward C functions are used in order to optimize
     * speed for that crucial problem, because we access files a lot.
     */
    
    QByteArray PokemonOnlineStatsPlugin::data(const PokeBattle &p) const {
        QByteArray ret;
        ret.resize(bufsize);
    
        /* Constructs 28 bytes of raw data representing the pokemon */
        qint32 *a = (qint32*)ret.data();
        
        a[0] = (p.num().toPokeRef());
        a[1] = (p.item());
        a[2] = (p.ability() << 16) + (p.gender() << 8) + p.level();
        a[3] = (p.nature() << 24) + (norm(p.evs()[0]) << 16) + (norm(p.evs()[1]) << 8) + norm(p.evs()[2]);
        a[4] = (norm(p.evs()[3]) << 16) + (norm(p.evs()[4]) << 8) + norm(p.evs()[5]);
        a[5] = (p.dvs()[0] << 25) + (p.dvs()[1] << 20) + (p.dvs()[2] << 15) + (p.dvs()[3] << 10) + (p.dvs()[4] << 5) + p.dvs()[5];
    
        /* Here the moves are sorted because we don't want to have different
           movesets when moves are in a different order */
        quint16 *moves = (quint16*) (a + 6);
        moves[0] = p.move(0).num();
        moves[1] = p.move(1).num();
        moves[2] = p.move(2).num();
        moves[3] = p.move(3).num();
        qSort(&moves[0], &moves[4]);
    
        return ret;
    }
    
    void PokemonOnlineStatsPlugin::battleStarting(PlayerInterface *p1, PlayerInterface *p2, int mode, unsigned int &clauses, bool)
    {
        /* We only keep track of battles between players of the same tier
           and not CC battles */
        if (clauses & ChallengeInfo::ChallengeCup)
            return;
    
        QString tier = p1->tier();
    
        if (p1->tier() != p2->tier()) {
            tier = QString("Mixed Tiers Gen %1").arg(p1->team().gen);
        }
    
        if (!existingDirs.contains(tier)) {
            QDir d;
            d.mkdir(QString("usage_stats/raw/%1").arg(tier));
            existingDirs[tier] = QString("usage_stats/raw/%1/").arg(tier);
        }
    
        PlayerInterface *players[2] = {p1, p2};
    
        for (int i = 0; i < 2; i++) {
            const TeamBattle &team = players[i]->team();
    
            for (int j = 0; j < 6; j++) {
                bool lead = false;
    
                if (mode == ChallengeInfo::Singles) {
                    lead = j == 0;
                } else if (mode == ChallengeInfo::Doubles) {
                    lead = j <= 1;
                } else if (mode == ChallengeInfo::Triples) {
                    lead = j <= 2;
                }
    
                savePokemon(team.poke(j), lead, existingDirs[tier]);
            }
        }
    }
    
    /* Basically, we take the first 2 letters of the hash of the pokemon's raw data,
       and we open that file. We then put the pokemon if it isn't already in, and
       we also open another file with those two letters + _count, in which we write
       two numbers: the usage and the lead usage of the set.
    
       The reason for using low level functions is because this is a pretty critical
       section in my opinion for big servers, and C++ file management systems are
       pretty slow.
    */
    void PokemonOnlineStatsPlugin::savePokemon(const PokeBattle &p, bool lead, const QString &d)
    {
        QByteArray data = this->data(p);
    
        QByteArray file = (d + QCryptographicHash::hash(data, QCryptographicHash::Md5).toHex().left(3)).toUtf8();
    
        FILE *raw_f = fopen(file.data(), "r+b");
    
        if (!raw_f) {
            raw_f = fopen(file.data(), "w+b");
        }
    
        char buffer[bufsize];
    
        /* We look for the pokemon in the file. Read 28 bytes, compare, skip 8 bytes, read 28 bytes, ... */
        while (!feof(raw_f) && fread(buffer, sizeof(char), bufsize/sizeof(char), raw_f) == signed(bufsize) ) {
            if (memcmp(data.data(), buffer, bufsize) == 0) {
                break;
            }
            /* Not being interested by the count, so we seek forward */
            fseek(raw_f, 2*sizeof(qint32), SEEK_CUR);
        }
    
        qint32 usage(0), leadusage(0);
    
        /* The pokemon was never used before? */
        if (feof(raw_f)) {
            fseek(raw_f, 0, SEEK_END);
            fwrite(data.data(), sizeof(char), bufsize/sizeof(char), raw_f);
        } else {
            /* The pokemon was used before so there's already a count,
                so we read the count and then move back */
            fread(&usage, sizeof(qint32), 1, raw_f);
            fread(&leadusage, sizeof(qint32), 1, raw_f);
            /* Seek back to the place where the count is... */
            fseek(raw_f, -2*sizeof(qint32), SEEK_CUR);
        }
    
        usage += 1;
        leadusage += int(lead);
    
        /* Write the final counts */
        fwrite(&usage, sizeof(qint32), 1, raw_f);
        fwrite(&leadusage, sizeof(qint32), 1, raw_f);
    
        fclose(raw_f);
    }
    
    bool PokemonOnlineStatsPlugin::hasConfigurationWidget() const {
        return false;
    }
    Basically, what it does is, every time a pokemon is used in battle, a file is opened on the server in that tier's directory (the file name comes from the first few characters of an MD5 hash of the pokemon's set data). In that file, the program looks for the pokemon's exact set. If the set already exists, the program ++'s its usage count and (if relevant) its lead usage count. If not, it creates an entry for that set and starts the counts from there.
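
    For anyone who finds the C file handling hard to follow, here is a rough Python sketch of the find-or-append-then-increment logic in savePokemon. It is not the PO code: the 32-byte record size (six 32-bit fields plus four 16-bit move IDs, as data() writes them, even though the source comment says 28), the little-endian byte order, and the name bump() are all my assumptions.

```python
import io
import struct

RECORD_SIZE = 32               # assumed size of one packed set
COUNTS = struct.Struct("<ii")  # usage, lead usage (byte order assumed)


def bump(f, record, lead):
    """Increment the counts stored after `record` in binary file `f`,
    appending a fresh entry if this exact set is not in the file yet."""
    if len(record) != RECORD_SIZE:
        raise ValueError("record has the wrong size")
    f.seek(0)
    usage = lead_usage = 0
    while True:
        chunk = f.read(RECORD_SIZE)
        if len(chunk) < RECORD_SIZE:
            # Reached the end without a match: append the new set.
            f.seek(0, io.SEEK_END)
            f.write(record)
            break
        if chunk == record:
            # Known set: read its counts, then step back over them.
            usage, lead_usage = COUNTS.unpack(f.read(COUNTS.size))
            f.seek(-COUNTS.size, io.SEEK_CUR)
            break
        # Different set: skip its two counters and keep looking.
        f.seek(COUNTS.size, io.SEEK_CUR)
    f.write(COUNTS.pack(usage + 1, lead_usage + int(lead)))


# Tiny in-memory demo with two made-up "sets".
demo = io.BytesIO()
set_a, set_b = bytes(range(32)), bytes(32)
bump(demo, set_a, lead=True)
bump(demo, set_b, lead=False)
bump(demo, set_a, lead=False)
print(len(demo.getvalue()), "bytes stored for 2 distinct sets")
```

    Every lookup is a linear scan of the file, which is presumably why the plugin spreads the sets across many small files keyed by hash prefix.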

    The end result, it looks like, is a raw file and a formatted web page like this.

    This seems to me to be all that we would want and need. The question is whether this plugin (it looks like it's a plugin) is implemented on the Smogon PO server. From the comments in the source code, it looks like there could be good reason NOT to use it--it might be fairly taxing on the server, which might not be up to the task.

    Anyway, it's interesting.
  22. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,220
    Sorry to double-post.

    If all we have for raw data is the actual battle logs, it'll still be relatively easy to write a script to produce basic usage and lead usage stats. However, these stats won't give movesets or spreads, and won't factor in pokemon that were never used in battle.

    Basically, for each battle, the following will need to be done:

    1. Identify the tier and whether the battle was rated.
    2. Find all lines beginning with <div class="SendOut">
    3. Identify the name of the trainer and the species of the pokemon sent out (THANK GOD we play with Species clause). This is a bit tricky because the string is different depending on whether the pokemon was nicknamed or not.
    4. Remove redundant entries (to account for switching)
    5. Write the species of all pokemon used in the battle to a file (write the species name twice if both trainers used it, obviously).
    6. Make another script. This one will take that giant file and simply tally each pokemon's usage (doing this step separately, rather than keeping a running tally, prevents race conditions if you're parallelizing the workload).
    7. Sort the usage stats.
    8. PROFIT!!!

    This is an afternoon project, and only because I haven't done much with regular expressions before.
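
    Steps 2 through 4 could be sketched with a regular expression along these lines. This is Python 3, the function names are made up, and the two line shapes in the comment are assumptions about what the logs look like, so treat it as a starting point rather than a tested parser.

```python
import re

# Assumed line shapes (not verified against real logs):
#   <div class="SendOut">Trainer sent out Species!</div>
#   <div class="SendOut">Trainer sent out Nickname! (Species)</div>
SEND_OUT = re.compile(r'^<div class="SendOut">(.+?) sent out (.+)</div>\s*$')


def parse_send_out(line):
    """Return (trainer, species) for a send-out line, else None."""
    match = SEND_OUT.match(line)
    if match is None:
        return None
    trainer, rest = match.groups()
    if rest.endswith(")"):
        # Nicknamed: the species is inside the last pair of parentheses.
        species = rest[rest.rfind("(") + 1:-1]
    else:
        # Not nicknamed: drop the trailing exclamation mark.
        species = rest.rstrip("!")
    return trainer, species


def pokemon_used(log_lines):
    """List each (trainer, species) pair once, in order of appearance."""
    seen = []
    for line in log_lines:
        entry = parse_send_out(line)
        if entry is not None and entry not in seen:
            seen.append(entry)
    return seen


sample_log = [
    '<div class="SendOut">Red sent out Politoed!</div>\n',
    '<div class="SendOut">Blue sent out Fluffy! (Whimsicott)</div>\n',
    '<div class="SendOut">Red sent out Politoed!</div>\n',
]
print(pokemon_used(sample_log))
```

    Keeping the trainer in each pair is what lets the same species count twice when both players bring it.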
  23. Princess Bubblegum

    Princess Bubblegum

    Joined:
    Mar 2, 2011
    Messages:
    2,983
    I could actually see us doing statistics on site like PO does if we ever get this sorted out. Heck, we could probably mix it in with the analyses in another tab.
  24. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,220
    Steps 1-5 done.

    Ran 'em against a sample dataset (just one day's battles). They're looking good.

    Update 5:15PM EDT...

    And I am DONE!

    Two python scripts:

    Steps 1-5:

    Code:
    import string
    import sys
    filename = str(sys.argv[1])
    file = open(filename)
    log = file.readlines()
    
    if (len(log) < 15):
    	sys.exit()
    #determine tier
    if log[2][0:25] != '<div class="TierSection">':
    	sys.exit()
    tier = log[2][string.find(log[2],"</b>")+4:len(log[2])-7]
    if log[3][0:19] == '<div class="Rated">':
    	rated = log[3][string.find(log[3],"</b>")+4:len(log[3])-7]
    else:
    	if log[5][0:19] == '<div class="Rated">':
    		rated = log[5][string.find(log[5],"</b>")+4:len(log[5])-7]
    	else:
    		print "Can't find the rating"
    		for line in range(0,15):
    			print log[line]
    		sys.exit()
    
    #make sure the battle lasted at least six turns (to discard early forfeits)
    longEnough = False
    for line in log:
    	if line == '<div class="BeginTurn"><b><span style=\'color:#0000ff\'>Start of turn 6</span></b></div>\n':
    		longEnough = True
    		break
    if longEnough == False:
    	sys.exit()
    
    trainer = []
    species = []
    #find all "sent out" messages
    for line in range(6,len(log)):
    	if log[line][0:21] == '<div class="SendOut">':
    		ttemp = log[line][21:string.find(log[line],' sent out ')]
    
    		#determine whether the pokemon is nicknamed or not
    		if log[line][len(log[line])-8] == ')':
    			stemp = log[line][string.rfind(log[line],'(')+1:len(log[line])-8]
    		else:
    			stemp = log[line][string.rfind(log[line],'sent out ')+9:len(log[line])-8]
    
    		#determine whether this entry is already in the list
    		match = 0
    		for i in range(0,len(species)):
    			if (trainer[i] == ttemp) & (species[i] == stemp):
    				match = 1
    				break
    		if match == 0:
    			trainer.append(ttemp)
    			species.append(stemp)
    
    outname = "Raw/"+tier+" "+rated+".txt"
    outfile=open(outname,'a')
    for i in range(0,len(species)):
    	outfile.write(str(species[i]))
    	outfile.write("\n")
    outfile.write("---\n")
    outfile.close()
    
    Step 6:
    Code:
    import string
    import sys
    
    file = open("pokemons.txt")
    pokelist = file.readlines()
    file.close()
    
    lsnum = []
    lsname = []
    for line in range(0,len(pokelist)):
    	lsnum.append(pokelist[line][0:str.find(pokelist[line],':')])
    	lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])
    filename = str(sys.argv[1])
    file = open(filename)
    species = file.readlines()
    battleCount = 0
    counter = [0 for i in range(len(lsnum))]
    for entry in range(0,len(species)):
    	if species[entry] == "---\n":
    		battleCount=battleCount+1
    	else:
    		for i in range(0,len(lsnum)):
    			if species[entry] == lsname[i]:
    				counter[i]=counter[i]+1
    				break
    total = sum(counter)
    
    #for appearance-only form variations, we gotta manually correct (blegh)
    counter[172] = counter[172] + counter[173] #spiky pichu
    for i in range(507,534):
    	counter[202] = counter[202]+counter[i] #unown
    counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
    counter[413] = counter[413] + counter[551] + counter[552] #burmy
    counter[422] = counter[422] + counter[556]  #cherrim
    counter[423] = counter[423] + counter[557] #shellos
    counter[424] = counter[424] + counter[558] #gastrodon
    counter[615] = counter[615] + counter[616] #basculin
    counter[621] = counter[621] + counter[622] #darmanitan
    counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
    counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
    counter[721] = counter[721] + counter[722] #meloetta
    for i in range(507,534):
    	counter[i] = 0
    counter[173] = counter[553] = counter[554] = counter[555] = counter[551] = counter[552] = counter[556] = counter[557] = counter[558] = counter[616] = counter[622] = counter[653] = counter[654] = counter[655] = counter[657] = counter[658] = counter[659] = counter[722] = 0
    
    print "Total battles: "+str(battleCount)
    print "Total pokemon: "+str(total)
    for i in range(len(lsnum)):
    	if (counter[i] > 0):
    		print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"
    
    The above code refers to a file "pokemons.txt" which can be found in the db/Pokes folder of your PO download (might only be in the server binaries--not sure). In any case, you can download it here as well. When troubleshooting the line numbers for the multiple formes corrections, keep in mind that arrays start at zero in Python.
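
    For comparison, the tallying in step 6 can be written much more compactly with collections.Counter. This is only a sketch in Python 3 (the scripts above are Python 2); tally() and usage_percent() are hypothetical names, and it skips both the pokemons.txt lookup and the forme merging.

```python
from collections import Counter


def tally(lines):
    """Count species names and battles in the step 1-5 output format:
    one species per line, with a '---' line closing each battle."""
    counts = Counter()
    battles = 0
    for line in lines:
        name = line.strip()
        if name == "---":
            battles += 1
        elif name:
            counts[name] += 1
    return counts, battles


def usage_percent(count, battles):
    """Percentage of teams using a pokemon (two teams per battle)."""
    return 100.0 * count / (2 * battles)


sample = ["Politoed\n", "Ferrothorn\n", "Politoed\n", "---\n",
          "Politoed\n", "---\n"]
counts, battles = tally(sample)
for name, count in counts.most_common():
    print(f"{name},{count},{usage_percent(count, battles):.5f}%")
```

    Counter.most_common() also takes care of step 7, since it returns the entries sorted by count.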

    Edit 5:42PM EDT...

    There's a minor flaw in the first script. Looks like it's missing some challenge cup logs that were formatted weird. I need to make sure that no LEGITIMATE matches are getting mis-filed.

    Edit 6:33PM EDT...
    Looks like all those mis-filed battles were custom-rule battles. Not gonna bother having my program count them to any kind of usage stats.

    Edit 7:15PM EDT...
    A thought occurs to me: sweeps and forfeits are going to screw up the stats, since unused pokemon don't show up. Leads are going to be very over-represented. We should probably implement some kind of cap where battles shorter than, say, five turns don't get counted (note that the only way a battle can be this short is if there's a forfeit). What do people think?
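
    A cutoff like that could be sketched as below. This is Python 3 rather than the Python 2 of the scripts above, the function names are made up, and it only assumes that each turn is announced by a line containing "Start of turn N", as in the logs the first script reads.

```python
import re

TURN = re.compile(r"Start of turn (\d+)")


def turns_played(log_lines):
    """Return the highest turn number announced in the log (0 if none)."""
    turns = [int(m.group(1))
             for line in log_lines
             for m in TURN.finditer(line)]
    return max(turns, default=0)


def long_enough(log_lines, min_turns=5):
    """True if the battle reached at least `min_turns` turns."""
    return turns_played(log_lines) >= min_turns


short_battle = ['<div class="BeginTurn"><b>Start of turn %d</b></div>\n' % n
                for n in range(1, 4)]
print(long_enough(short_battle))
```

    The threshold is a parameter so it is easy to see how much the stats move as the cap changes.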
  25. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,220
    Just for lulz,

    BW OU Rated Stats for 7/31/11

    [Deleted because I forgot to divide by two]

    Two things to note: average of 9.8 pokemon appearing per battle. That's actually pretty good. Secondly, if the OU/UU cutoff was based on this alone (rather than usage over the past three months), Victini stays UU while Tangrowth and Mew move up to UU. Of course, there's nothing to guarantee that these stats are typical. I was mainly just posting them to show what I could do with the scripts.