Smogon University PO Statistics — June 2011

Status
Not open for further replies.
Whimsicott is getting Electivire syndrome, when it doesn't deserve it.

Whimsicott was overhyped at the start of the gen, so everyone is saying that it's ass now to counteract that hype. But in reality, while Whimsicott isn't amazing, it's still certainly the best momentum stealer in the entire game. That in itself is a quality of an OU Pokemon, and it actually takes insane prediction to use at maximum potential, particularly the Stun Spore/Tailwind/Encore/U-turn set, which can steal momentum in nearly any situation.
 
I'm expecting to see politoed rise even more...as well as haxorus.
I'm also surprised to see so many freaking SD Garchomps in Ubers.
It is weird as fuck.
 
Whimsicott has a 75% chance of stopping any possibility of an Excadrill sweep, regardless of how many Attack boosts it has, so long as you have something that can OHKO it afterwards, thanks to Stun Spore. I wouldn't say that it's useless, and I believe it deserves to be OU. This thing can dismantle stall with SubSeed and cripple offense with Stun Spore, and it's a great big fluffball! What's not to like? :P
 
I'm expecting to see politoed rise even more...as well as haxorus.
I'm also surprised to see so many freaking SD Garchomps in Ubers.
It is weird as fuck.

politoed yeah haxorus no
while it's true haxorus is obviously far better than what everyone said about him a few months ago after the hype died out, i see haxorus as a pokemon of usage rank of around 20-30
sd chomp...yeah that shit fucking sucks imo
scarfchomp>>sdchomp
 
Whimsicott has big problems with Gyro Ball Ferrothorn, which is THE stall pokemon. And imo he's not all that amazing against offense, because he only cripples one pokemon, and if they have Lum Berry (Dnite) you're screwed. Imo Whimsicott is usually death fodder, though he's AWESOME with Memento and Tailwind! (Though Tailwind isn't as amazing as it is in doubles / triples.)
 
Whimsicott has a 75% chance of stopping any possibility of an Excadrill sweep, regardless of how many Attack boosts it has, so long as you have something that can OHKO it afterwards, thanks to Stun Spore.

Not to mention the chance of coming in on an SD and Encoring it, allowing another team member to take it out.
 
I was hoping for these to be compiled sooner given the lack of stats from last month (I know the server was down a lot, but still).

Given the fact this is 2011, the stats should be available for anyone to view at any time on a daily updated basis, because technology makes life easier and it exists, but I guess we're a bit primitive?

Anyway, I don't want to come off as offensive. It just seems motivation to get things done around Smogon is certainly not a high priority (in all areas, not just this matter).
 
Remember, it's only one person collecting the stats. And running constantly updated stats like PO does would make it harder for Rising_Dusk (the stats dude) to create more detailed statistics (like Doug did in Gen 4), which Rising_Dusk wants to do in the future IIRC.

It seems pretty difficult for R_D to compile these stats that we have already, so taking it a step further is probably not going to happen soon.
 
Yea, I completely understand how it can be hard for one guy to compile all the stats. I guess I just don't know why we have one guy doing it and not just software doing it with one guy maintaining it.

This process could be made easier for those behind the curtains and be made more efficient for the community, and that's not happening. I just don't quite understand the lack of logic here.
 
Would it be possible to make the raw data public? I'm a grad student--distilling data sets is kinda what I do, and I'd be really interested in seeing if I could develop some tools to help streamline the process / make things easier for R_D.

I guess I just don't know why we have one guy doing it and not just software doing it with one guy maintaining it.

I assume because someone would have to write the software...
 
Is there no way to do Smogon server statistics like this: http://stats.pokemon-online.eu/ ?

The Pokémon Online server has its own statistics, updated every day (or maybe every week...)

It's really bad that Smogon server statistics haven't been available since June. I understand that it's very hard to gather the statistics and put the whole topic together, etc., but something should be done to get those statistics to the players faster x_x
 
No. I think it's fine the way it is. As long as we get told if we're not going to get them, or as long as they tell us if they're having issues with getting them.

It's only the 7th. In 4th gen we usually got them around the 10th I think, so this isn't a big deal.

In terms of getting them to us faster, you could offer to help out somehow, if you have the time. They should be happy for people to help them. :)
 
There are actually a lot of people that want to help and are totally competent and capable of doing so...but their offers for help don't reach anyone if they aren't part of the smogon cliques.

Realistically though, the raw data should at least be available for download on the first of every month and the members that know how to use it should be free to do so.
 
I use Machamp on my team. The Substitute set is one of the most threatening responses to Tyranitar in the game, and with Payback + Ice Punch it's only really dealt with by stuff like Reuniclus and Volcarona, and it can deal significant damage to both if it gets a free hit. It's not as good a tank as Conkeldurr, but it can get around Gliscor and Dragonite, and it gives the opponent a 50% chance of being screwed up in most circumstances.


I used the same moveset recently with an Adamant nature and max Attack / HP EVs and... it's amazing! Even Reuniclus has trouble with confusion, and he's quite bulky! I love Sub Machamp especially because he's "immune" to Ferrothorn and T-tar!
 
Is there no way to do Smogon server statistics like this: http://stats.pokemon-online.eu/ ?

The Pokémon Online server has its own statistics, updated every day (or maybe every week...)

It's really bad that Smogon server statistics haven't been available since June. I understand that it's very hard to gather the statistics and put the whole topic together, etc., but something should be done to get those statistics to the players faster x_x

I believe we don't do the stats like PO does because Rising_Dusk has expressed interest in more detailed statistics (teammates, move usage), which the Pokemon Online program is not capable of generating. Whether it's worth it is debatable, but it doesn't look like it's going to change...
 
I thought it was known that PO's usage statistics added rated and non-rated battles together with no differentiation for each in stats?

This might be outdated, but I remember RD talking about that.
 
I downloaded the PO server package and started playing around with it a bit. It looks like that PO stats page is automatically generated, but, as people are saying, it's EXTREMELY lacking, not even differentiating between rated and unrated battles.

What would be ideal, I guess, is if the server kept a full log of each and every battle. Then anyone with access to the data could literally do any kind of analysis he or she wanted. For example, I've always been curious about what would happen if usage stats were weighted by ranking. Or what if you could look at the "average kills per battle" for each pokemon? (Of course, there's the stuff people ACTUALLY care about, like lead rankings, teammate usage, etc. etc.)
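As a rough illustration of the "weighted by ranking" idea, here is one possible weighting, sketched in Python. Everything in it is an assumption: it supposes each team comes with its player's rating, it weights each team by that rating directly, and `weighted_usage` and the sample data are made up for the example.

```python
from collections import defaultdict

def weighted_usage(teams):
    """teams: iterable of (rating, [species, ...]).

    Returns species -> share of the total rating weight carried by
    teams that included that species.
    """
    weight = defaultdict(float)
    total = 0.0
    for rating, species_list in teams:
        total += rating
        # set() so a species counts once per team
        for s in set(species_list):
            weight[s] += rating
    if total == 0:
        return {}
    return {s: w / total for s, w in weight.items()}

# Invented sample: three teams with made-up ratings
teams = [
    (1500, ["Whimsicott", "Excadrill"]),
    (1000, ["Excadrill", "Politoed"]),
    (500, ["Politoed"]),
]
usage = weighted_usage(teams)
print(usage["Whimsicott"])  # 0.5
print(usage["Politoed"])    # 0.5
```

In this sample Whimsicott and Politoed come out equal even though Politoed is on two teams and Whimsicott on one, because the one Whimsicott team carries as much rating as both Politoed teams combined.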

Of course, the flip side of this is that the dataset would be HUGE--if an average battle log is 10KB, and there are a million battles per month (there were 300K OU battles alone for June--not unreasonable), then that's 10 gigs of raw data per month to sift through. And, looking at a battle log file, it's not actually what you'd want--what you really want is a header that gives the full movesets of every pokemon on each team IN ADDITION to the log of the actual battle.
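The back-of-envelope estimate can be checked in a couple of lines. This is a minimal sketch: the 10 KB average log size and the one million battles per month are the post's own guesses, and `monthly_log_size_gb` is a hypothetical helper name.

```python
def monthly_log_size_gb(avg_log_kb, battles_per_month):
    """Estimate raw log volume per month in GB (decimal: 1 GB = 1,000,000 KB)."""
    return avg_log_kb * battles_per_month / 1_000_000

# 10 KB per log and 1,000,000 battles per month, as guessed in the post
print(monthly_log_size_gb(10, 1_000_000))  # 10.0
# The 300K OU battles reported for June, at the same guessed log size
print(monthly_log_size_gb(10, 300_000))    # 3.0
```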

What's my point? It's that this is a huge freaking project. Clearly we want more than the very bare-bones usage stats that come built in to PO, but it's not clear to me, at least, what's actually OBTAINABLE given the current software. And the current software is kinda what we're stuck with. I don't think Smogon really has a dedicated enough community to do a complete (or even partial) rewrite of the PO code, nor do we really WANT to fork from PO and run into the same compatibility problems that took out the server for most of July.

And with that, I'm going to shut up about this and stop theorizing until someone provides more information about what the situation is actually like, in terms of the raw data the PO software generates.
 
There are actually a lot of people that want to help and are totally competent and capable of doing so...but their offers for help don't reach anyone if they aren't part of the smogon cliques.

I know... I've been here for years, and seen that sort of thing time and time again. But it is better than it used to be, to be fair.

But put yourself in the position of those in the cliques. As I keep saying, a lot of them probably have jobs, etc, and don't have time to devote 6 hours a day to Smogon or whatever. So if they let people on this website who DO have a lot of spare time to help out, they'll end up taking over, so to speak.

It sounds a bit selfish and power-hungry, but at the same time, if I were in their situation, I'd probably be scared about letting other people help for the same reason, so I can't blame them.

I don't think Smogon really has a dedicated enough community to do a complete (or even partial) rewrite of the PO code.

I think you'd be pleasantly surprised.
 
We ARE working on getting usage stats, so all hope isn't lost. Don't take this as a guarantee, but we're doing everything in our power to find a solution.
 
I just started digging in the PO source code. There's a really interesting little file in there called usagestats.cpp:

Code:
#include <cstdio>
#include "usagestats.h"
#include "../Server/playerinterface.h"
#include "../PokemonInfo/battlestructs.h"

ServerPlugin * createPluginClass() {
    return new PokemonOnlineStatsPlugin();
}

PokemonOnlineStatsPlugin::PokemonOnlineStatsPlugin()
    : md5(QCryptographicHash::Md5)
{
    QDir d;
    d.mkdir("usage_stats");
    d.mkdir("usage_stats/raw");
    d.mkdir("usage_stats/formatted");
}

QString PokemonOnlineStatsPlugin::pluginName() const
{
    return "Usage Statistics";
}

inline int norm(int ev) {
    return (ev/4)*4;
}

/*
 * From here onward C functions are used in order to optimize
 * speed for that crucial problem, because we access files a lot.
 */

QByteArray PokemonOnlineStatsPlugin::data(const PokeBattle &p) const {
    QByteArray ret;
    ret.resize(bufsize);

    /* Constructs 28 bytes of raw data representing the pokemon */
    qint32 *a = (qint32*)ret.data();
    
    a[0] = (p.num().toPokeRef());
    a[1] = (p.item());
    a[2] = (p.ability() << 16) + (p.gender() << 8) + p.level();
    a[3] = (p.nature() << 24) + (norm(p.evs()[0]) << 16) + (norm(p.evs()[1]) << 8) + norm(p.evs()[2]);
    a[4] = (norm(p.evs()[3]) << 16) + (norm(p.evs()[4]) << 8) + norm(p.evs()[5]);
    a[5] = (p.dvs()[0] << 25) + (p.dvs()[1] << 20) + (p.dvs()[2] << 15) + (p.dvs()[3] << 10) + (p.dvs()[4] << 5) + p.dvs()[5];

    /* Here the moves are sorted because we don't want to have different
       movesets when moves are in a different order */
    quint16 *moves = (quint16*) (a + 6);
    moves[0] = p.move(0).num();
    moves[1] = p.move(1).num();
    moves[2] = p.move(2).num();
    moves[3] = p.move(3).num();
    qSort(&moves[0], &moves[4]);

    return ret;
}

void PokemonOnlineStatsPlugin::battleStarting(PlayerInterface *p1, PlayerInterface *p2, int mode, unsigned int &clauses, bool)
{
    /* We only keep track of battles between players of the same tier
       and not CC battles */
    if (clauses & ChallengeInfo::ChallengeCup)
        return;

    QString tier = p1->tier();

    if (p1->tier() != p2->tier()) {
        tier = QString("Mixed Tiers Gen %1").arg(p1->team().gen);
    }

    if (!existingDirs.contains(tier)) {
        QDir d;
        d.mkdir(QString("usage_stats/raw/%1").arg(tier));
        existingDirs[tier] = QString("usage_stats/raw/%1/").arg(tier);
    }

    PlayerInterface *players[2] = {p1, p2};

    for (int i = 0; i < 2; i++) {
        const TeamBattle &team = players[i]->team();

        for (int j = 0; j < 6; j++) {
            bool lead = false;

            if (mode == ChallengeInfo::Singles) {
                lead = j == 0;
            } else if (mode == ChallengeInfo::Doubles) {
                lead = j <= 1;
            } else if (mode == ChallengeInfo::Triples) {
                lead = j <= 2;
            }

            savePokemon(team.poke(j), lead, existingDirs[tier]);
        }
    }
}

/* Basically, we take the first 2 letters of the hash of the pokemon's raw data,
   and we open that file. We then put the pokemon if it isn't already in, and
   we also open another file with those two letters + _count, in which we write
   two numbers: the usage and the lead usage of the set.

   The reason for using low level functions is because this is a pretty critical
   section in my opinion for big servers, and C++ file management systems are
   pretty slow.
*/
void PokemonOnlineStatsPlugin::savePokemon(const PokeBattle &p, bool lead, const QString &d)
{
    QByteArray data = this->data(p);

    QByteArray file = (d + QCryptographicHash::hash(data, QCryptographicHash::Md5).toHex().left(3)).toUtf8();

    FILE *raw_f = fopen(file.data(), "r+b");

    if (!raw_f) {
        raw_f = fopen(file.data(), "w+b");
    }

    char buffer[bufsize];

    /* We look for the pokemon in the file. Read 28 bytes, compare, skip 8 bytes, read 28 bytes, ... */
    while (!feof(raw_f) && fread(buffer, sizeof(char), bufsize/sizeof(char), raw_f) == signed(bufsize) ) {
        if (memcmp(data.data(), buffer, bufsize) == 0) {
            break;
        }
        /* Not being interested by the count, so we seek forward */
        fseek(raw_f, 2*sizeof(qint32), SEEK_CUR);
    }

    qint32 usage(0), leadusage(0);

    /* The pokemon was never used before? */
    if (feof(raw_f)) {
        fseek(raw_f, 0, SEEK_END);
        fwrite(data.data(), sizeof(char), bufsize/sizeof(char), raw_f);
    } else {
        /* The pokemon was used before so there's already a count,
            so we read the count and then move back */
        fread(&usage, sizeof(qint32), 1, raw_f);
        fread(&leadusage, sizeof(qint32), 1, raw_f);
        /* Seek back to the place where the count is... */
        fseek(raw_f, -2*sizeof(qint32), SEEK_CUR);
    }

    usage += 1;
    leadusage += int(lead);

    /* Write the final counts */
    fwrite(&usage, sizeof(qint32), 1, raw_f);
    fwrite(&leadusage, sizeof(qint32), 1, raw_f);

    fclose(raw_f);
}

bool PokemonOnlineStatsPlugin::hasConfigurationWidget() const {
    return false;
}

Basically, what it does is, every time a battle starts, each pokemon on both teams gets recorded. A file is opened on the server in that tier's folder (the file name comes from the first few characters of a hash of the pokemon's full set). In that file, the program looks for that exact set (species, item, ability, nature, EVs, IVs, moves and so on). If the set isn't there yet, it creates an entry for it. Either way, the program then ++'s the set's usage count and (if relevant) its lead usage count.
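To make the record layout concrete, here is a rough Python sketch of what `data()` and the file-name step in `savePokemon()` appear to do with one set. It is an illustration under assumptions, not the PO implementation: it assumes little-endian 32-bit and 16-bit integers, it packs only the fields shown in the C++ above, and `norm_ev`, `pack_set` and `bucket_name` are hypothetical names. The numeric IDs in the usage lines are made up.

```python
import hashlib
import struct

def norm_ev(ev):
    """Round an EV down to a multiple of 4, like norm() in the plugin."""
    return (ev // 4) * 4

def pack_set(num, item, ability, gender, level, nature, evs, dvs, moves):
    """Pack one set as six 32-bit ints followed by four sorted 16-bit move ids."""
    e = [norm_ev(x) for x in evs]
    words = [
        num,
        item,
        (ability << 16) + (gender << 8) + level,
        (nature << 24) + (e[0] << 16) + (e[1] << 8) + e[2],
        (e[3] << 16) + (e[4] << 8) + e[5],
        (dvs[0] << 25) + (dvs[1] << 20) + (dvs[2] << 15)
        + (dvs[3] << 10) + (dvs[4] << 5) + dvs[5],
    ]
    # Moves are sorted so that move order does not create a "new" set
    return struct.pack("<6i4H", *words, *sorted(moves))

def bucket_name(record):
    """First three hex characters of the record's MD5, used as the file name."""
    return hashlib.md5(record).hexdigest()[:3]

# Made-up IDs: same set, different move order and 2 "wasted" EVs
a = pack_set(1, 2, 3, 0, 100, 4, [252, 6, 0, 0, 0, 252], [31] * 6, [10, 20, 30, 40])
b = pack_set(1, 2, 3, 0, 100, 4, [252, 4, 0, 0, 0, 252], [31] * 6, [40, 30, 20, 10])
print(len(a))                            # 32
print(a == b)                            # True
print(bucket_name(a) == bucket_name(b))  # True
```

Six 32-bit words plus four 16-bit moves come to 32 bytes here, whereas the comments in the C++ mention 28. The real `bufsize` is defined in `usagestats.h`, which isn't shown, so treat the size in this sketch as a guess.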

The end result, it looks like, is a raw file and a formatted web page like this.

This seems to me to be all that we would want and need. The question is whether this plugin (it looks like it's a plugin) is implemented on the Smogon PO server. From the comments in the source code, it looks like there could be good reason NOT to use it--it might be fairly taxing on the server, which might not be up to the task.

Anyway, it's interesting.
 
Sorry to double-post.

If all we have for raw data is the actual battle logs, it'll still be relatively easy to write a script to produce basic usage and lead usage stats. However, these stats won't give movesets or spreads, and they won't factor in pokemon that were never sent out in battle.

Basically, for each battle, the following will need to be done:

  1. Identify the tier and whether the battle was rated.
  2. Find all lines beginning with <div class="SendOut">
  3. Identify the name of the trainer and the species of the pokemon sent out (THANK GOD we play with Species clause). This is a bit tricky because the string is different depending on whether the pokemon was nicknamed or not.
  4. Remove redundant entries (to account for switching)
  5. Write the species of all pokemon used in the battle to a file (write the species name twice if both trainers used it, obviously).
  6. Make another script. This one will take that giant file and simply tally each pokemon's usage (doing this step separately, rather than keeping a running tally, prevents race conditions if you're parallelizing the workload).
  7. Sort the usage stats.
  8. PROFIT!!!

This is an afternoon project, and only because I haven't done much with regular expressions before.
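Steps 2 to 4 could be sketched with a regular expression along these lines. This is only a sketch under assumptions: it supposes a nicknamed pokemon is logged as `Trainer sent out Nick! (Species)` and an un-nicknamed one as `Trainer sent out Species!`, each wrapped in the `SendOut` div. The names `SEND_OUT` and `pokemon_used` are hypothetical, and the sample lines are invented.

```python
import re

# Trainer name, then either "Nick! (Species)" or "Species!"
SEND_OUT = re.compile(
    r'^<div class="SendOut">(?P<trainer>.*?) sent out '
    r'(?:.*\((?P<species>[^()]+)\)|(?P<plain>.*)!)</div>$'
)

def pokemon_used(lines):
    """Return the unique (trainer, species) pairs sent out, in order of first appearance."""
    seen = []
    for line in lines:
        m = SEND_OUT.match(line.strip())
        if m:
            # Exactly one of the two species groups is set on a match
            pair = (m.group("trainer"), m.group("species") or m.group("plain"))
            if pair not in seen:  # step 4: ignore repeat switch-ins
                seen.append(pair)
    return seen

sample = [
    '<div class="SendOut">Red sent out Fluffy! (Whimsicott)</div>\n',
    '<div class="SendOut">Blue sent out Excadrill!</div>\n',
    '<div class="SendOut">Red sent out Fluffy! (Whimsicott)</div>\n',
]
print(pokemon_used(sample))
# [('Red', 'Whimsicott'), ('Blue', 'Excadrill')]
```

Keying on the (trainer, species) pair rather than the species alone is what lets both trainers use the same pokemon and still be counted once each.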
 
I could actually see us doing statistics on site like PO does if we ever get this sorted out. Heck, we could probably mix it in with the analyses in another tab.
 
Steps 1-5 done.

Ran 'em against a sample dataset (just one day's battles). They're looking good.

Update 5:15PM EDT...

And I am DONE!

Two python scripts:

Steps 1-5:

Code:
import sys

filename = sys.argv[1]
with open(filename) as f:
	log = f.readlines()

#logs this short can't contain a real battle
if len(log) < 15:
	sys.exit()

#determine tier
if not log[2].startswith('<div class="TierSection">'):
	sys.exit()
tier = log[2][log[2].find("</b>")+4:-7]

#the "Rated" line shows up on either line 3 or line 5 of the log
if log[3].startswith('<div class="Rated">'):
	rated = log[3][log[3].find("</b>")+4:-7]
elif log[5].startswith('<div class="Rated">'):
	rated = log[5][log[5].find("</b>")+4:-7]
else:
	print("Can't find the rating")
	#dump the start of the log for troubleshooting
	for line in log[:15]:
		print(line.rstrip('\n'))
	sys.exit()

#make sure the battle reached turn 6 (to discard early forfeits)
longEnough = False
for line in log:
	if line == '<div class="BeginTurn"><b><span style=\'color:#0000ff\'>Start of turn 6</span></b></div>\n':
		longEnough = True
		break
if not longEnough:
	sys.exit()

seen = set() #(trainer, species) pairs already recorded
species = []
#find all "sent out" messages
for line in log[6:]:
	if line.startswith('<div class="SendOut">'):
		ttemp = line[21:line.find(' sent out ')]

		#determine whether the pokemon is nicknamed or not
		if line[-8] == ')':
			stemp = line[line.rfind('(')+1:-8]
		else:
			stemp = line[line.rfind('sent out ')+9:-8]

		#only record each trainer's pokemon once (to account for switching)
		if (ttemp, stemp) not in seen:
			seen.add((ttemp, stemp))
			species.append(stemp)

outname = "Raw/"+tier+" "+rated+".txt"
with open(outname,'a') as outfile:
	for s in species:
		outfile.write(s+"\n")
	outfile.write("---\n")

Step 6:
Code:
import sys

#in each line of pokemons.txt, the number comes before the ':' and the name after the first space
with open("pokemons.txt") as f:
	pokelist = f.readlines()

lsnum = []
lsname = []
for line in pokelist:
	lsnum.append(line[:line.find(':')])
	lsname.append(line[line.find(' ')+1:].rstrip('\r\n'))

#map each name to its first line index so lookups don't need an inner loop
index = {}
for i, name in enumerate(lsname):
	index.setdefault(name, i)

with open(sys.argv[1]) as f:
	species = f.readlines()

battleCount = 0
counter = [0]*len(lsnum)
for entry in species:
	entry = entry.rstrip('\r\n')
	if entry == "---":
		#"---" marks the end of one battle
		battleCount += 1
	elif entry in index:
		counter[index[entry]] += 1
total = sum(counter)

#for appearance-only form variations, we gotta manually correct (blegh)
counter[172] = counter[172] + counter[173] #spiky pichu
for i in range(507,534):
	counter[202] = counter[202]+counter[i] #unown
counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
counter[413] = counter[413] + counter[551] + counter[552] #burmy
counter[422] = counter[422] + counter[556]  #cherrim
counter[423] = counter[423] + counter[557] #shellos
counter[424] = counter[424] + counter[558] #gastrodon
counter[615] = counter[615] + counter[616] #basculin
counter[621] = counter[621] + counter[622] #darmanitan
counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
counter[721] = counter[721] + counter[722] #meloetta
#zero out the forms that were just folded into their base entries
for i in range(507,534):
	counter[i] = 0
for i in [173,553,554,555,551,552,556,557,558,616,622,653,654,655,657,658,659,722]:
	counter[i] = 0

print("Total battles: "+str(battleCount))
print("Total pokemon: "+str(total))
#usage is per team, and there are two teams per battle, hence the /2
for i in range(len(lsnum)):
	if counter[i] > 0:
		print(lsnum[i]+","+lsname[i]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%")

The above code refers to a file "pokemons.txt" which can be found in the db/Pokes folder of your PO download (might only be in the server binaries--not sure). In any case, you can download it here as well. When troubleshooting the line numbers for the multiple formes corrections, keep in mind that arrays start at zero in Python.

Edit 5:42PM EDT...

There's a minor flaw in the first script. Looks like it's missing some challenge cup logs that were formatted weird. I need to make sure that no LEGITIMATE matches are getting mis-filed.

Edit 6:33PM EDT...
Looks like all those mis-filed battles were custom-rule battles. Not gonna bother having my program count them toward any kind of usage stats.

Edit 7:15PM EDT...
A thought occurs to me: sweeps and forfeits are going to screw up the stats, since unused pokemon don't show up. Leads are going to be very over-represented. We should probably implement some kind of cap where battles shorter than, say, five turns don't get counted (note that the only way a battle can be this short is if there's a forfeit). What do people think?
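One way to sketch the proposed cap: scan the log for the highest `Start of turn N` marker and skip the battle if it falls below a threshold. This assumes each turn is announced by a line containing `Start of turn N`, like the `BeginTurn` line checked in the first script. `turns_reached`, `long_enough` and `MIN_TURNS` are hypothetical names, and the value 5 is just the figure floated above, not a tested cutoff.

```python
import re

TURN = re.compile(r"Start of turn (\d+)")
MIN_TURNS = 5  # hypothetical cutoff, taken from the suggestion above

def turns_reached(lines):
    """Return the highest turn number announced in the log (0 if none)."""
    turns = [int(m.group(1)) for line in lines for m in TURN.finditer(line)]
    return max(turns, default=0)

def long_enough(lines, min_turns=MIN_TURNS):
    """True if the battle got at least as far as the start of turn min_turns."""
    return turns_reached(lines) >= min_turns

# Invented log fragments: a two-turn forfeit and a twenty-turn battle
short = ["<b>Start of turn 1</b>", "<b>Start of turn 2</b>"]
full = ["<b>Start of turn %d</b>" % n for n in range(1, 21)]
print(long_enough(short))  # False
print(long_enough(full))   # True
```

Keeping the cutoff as a parameter makes it easy to rerun the stats with a few different values and see how much the lead over-representation actually moves.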
 
Just for lulz,

BW OU Rated Stats for 7/31/11

[Deleted because I forgot to divide by two]

Two things to note: first, an average of 9.8 pokemon appearing per battle. That's actually pretty good. Secondly, if the OU/UU cutoff was based on this alone (rather than usage over the past three months), Victini stays UU while Tangrowth and Mew move up to OU. Of course, there's nothing to guarantee that these stats are typical. I was mainly just posting them to show what I could do with the scripts.
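The divide-by-two matters because every battle contains two teams, so a pokemon's usage is measured against teams rather than battles. A minimal worked sketch with invented counts follows; `usage_percent` is a hypothetical helper.

```python
def usage_percent(appearances, battles):
    """Percentage of teams that used the pokemon (two teams per battle)."""
    return 100.0 * appearances / (2 * battles)

# Invented numbers: a pokemon appearing 300 times across 1,000 battles
print(usage_percent(300, 1000))  # 15.0
# Without the division by two it would read as 30%
print(100.0 * 300 / 1000)        # 30.0
```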
 