As I'm sure many of you have seen, I recently took a crack at generating usage stats.
Unlike my predecessors, however, the only raw data I was able to access were the battle logs stored on the server. These logs, which are pretty much identical to the ones that get generated client-side, leave a lot to be desired--they only show the pokemon that appeared in the battle itself, they don't contain natures/items/EV spreads/movesets, and they don't tell the players' current ranking.
Also, they're in HTML. Great for turning into warstories, pretty annoying for trying to cull data from.
But, nonetheless, I managed to write a few python scripts which turn these battle logs into usage stats (what we're going to end up DOING with these stats is a question for another thread), and I'm posting them here. Feel free to make suggestions as to how to modify them or improve them--I'll need all the help I can get.
To turn battle logs into usage stats, here's what needs to be done:
1. Identify the tier and whether the battle was rated.
2. Make sure the battle meets whatever criteria we decide upon ("longer than 5 turns," "player has rating above 1000," "loser said gg after the battle"...).
3. Find all lines beginning with <div class="SendOut">
4. Identify the name of the trainer and the species of the pokemon sent out (THANK GOD we play with Species clause). This is a bit tricky because the string is different depending on whether the pokemon was nicknamed or not.
5. Remove redundant entries (to account for switching).
6. Write the species of all pokemon used in the battle to a file (write the species name twice if both trainers used it, obviously).
7. Make another script. This one will take that giant file and simply tally each pokemon's usage (doing this step separately, rather than keeping a running tally, prevents race conditions if you're parallelizing the workload).
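The "sent out" steps above can be sketched roughly as follows. This is a Python 3 illustration, not the script itself (the scripts in this post are Python 2). The two line shapes in the comment are my reading of the slicing in LogReader.py, and the names parse_send_out and teams_from_log are made up for the example.

```python
import re

# Assumed line shapes, inferred from the slicing in LogReader.py:
#   <div class="SendOut">Trainer sent out Species!</div>
#   <div class="SendOut">Trainer sent out Nick! (Species)</div>
SEND_OUT = re.compile(
    r'^<div class="SendOut">(?P<trainer>.*?) sent out (?P<rest>.*)</div>\s*$'
)


def parse_send_out(line):
    """Return (trainer, species) for a SendOut line, or None for anything else."""
    m = SEND_OUT.match(line)
    if m is None:
        return None
    rest = m.group("rest")
    if rest.endswith(")"):
        # nicknamed: the species sits inside the last pair of parentheses
        species = rest[rest.rfind("(") + 1:-1]
    else:
        # not nicknamed: drop the single trailing punctuation character
        species = rest[:-1]
    return m.group("trainer"), species


def teams_from_log(lines):
    """Collect each trainer's species once, in order of first appearance."""
    teams = {}
    for line in lines:
        parsed = parse_send_out(line)
        if parsed is None:
            continue
        trainer, species = parsed
        members = teams.setdefault(trainer, [])
        if species not in members:  # repeat switch-ins are redundant
            members.append(species)
    return teams
```

A regex is a swap-in for the fixed-offset slicing used in the real script; it should behave the same on well-formed lines but I haven't run it against actual server logs.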
This script will take a battle log (server version 1.0.23) and write the names of all pokemon used in the battle to a file corresponding to the battle's tier.
Usage:
Code:
python LogReader.py "name-of-log-file.html"
Source:
Code:
import string
import sys
filename = str(sys.argv[1])
file = open(filename)
log = file.readlines()
if (len(log) < 15):
    sys.exit()
#determine tier
if log[2][0:25] != '<div class="TierSection">':
    sys.exit()
tier = log[2][string.find(log[2],"</b>")+4:len(log[2])-7]
if log[3][0:19] == '<div class="Rated">':
    rated = log[3][string.find(log[3],"</b>")+4:len(log[3])-7]
else:
    if log[5][0:19] == '<div class="Rated">':
        rated = log[5][string.find(log[5],"</b>")+4:len(log[5])-7]
    else:
        print "Can't find the rating"
        for line in range(0,15):
            print log[line] #dump the top of the log for debugging
        sys.exit()
#make sure the battle lasted at least six turns (to discard early forfeits)
longEnough = False
for line in log:
    if line == '<div class="BeginTurn"><b><span style=\'color:#0000ff\'>Start of turn 6</span></b></div>\n':
        longEnough = True
        break
if longEnough == False:
    sys.exit()
#trainer = []
#species = []
ts = [] #handle in one array to allow for sorting
#find all "sent out" messages
for line in range(6,len(log)):
    if log[line][0:21] == '<div class="SendOut">':
        ttemp = log[line][21:string.find(log[line],' sent out ')]
        #determine whether the pokemon is nicknamed or not
        if log[line][len(log[line])-8] == ')':
            stemp = log[line][string.rfind(log[line],'(')+1:len(log[line])-8]
        else:
            stemp = log[line][string.rfind(log[line],'sent out ')+9:len(log[line])-8]
        #determine whether this entry is already in the list
        match = 0
        for i in range(0,len(ts)):
            if (ts[i][0] == ttemp) and (ts[i][1] == stemp):
                match = 1
                break
        if match == 0:
            ts.append([ttemp,stemp])
ts=sorted(ts, key=lambda ts:ts[0])
outname = "Raw/"+tier+" "+rated+".txt"
outfile=open(outname,'a')
outfile.write(str(ts[0][0]))
outfile.write("\n")
i=0
#the bounds check keeps this from running off the end if only one trainer shows up
while (i < len(ts)) and (ts[i][0] == ts[0][0]):
    outfile.write(str(ts[i][1]))
    outfile.write("\n")
    i = i + 1
outfile.write("***\n")
outfile.write(str(ts[len(ts)-1][0]))
outfile.write("\n")
for j in range(i,len(ts)):
    outfile.write(str(ts[j][1]))
    outfile.write("\n")
outfile.write("---\n")
outfile.close()
Here's a new version that does quite a bit more--this one identifies not only usage but culls data for other "pokemetrics." It does this by keeping track of all matchups in a battle and the outcome of that matchup.
LogReaderOnCrack.py
Code:
import string
import sys
filename = str(sys.argv[1])
file = open(filename)
log = file.readlines()
if (len(log) < 15):
sys.exit()
oldWay = 1
#determine tier
if log[2][0:25] == '<div class="TierSection">':
tier = log[2][string.find(log[2],"</b>")+4:len(log[2])-7]
if log[3][0:19] == '<div class="Rated">':
rated = log[3][string.find(log[3],"</b>")+4:len(log[3])-7]
else:
if log[5][0:19] == '<div class="Rated">':
rated = log[5][string.find(log[5],"</b>")+4:len(log[5])-7]
else:
print "Can't find the rating for "+filename
for line in range(0,15):
print log[line]
sys.exit()
else:
if log[5][0:25] != '<div class="TierSection">':
print "Can't find the tier for "+filename
sys.exit()
tier = log[5][string.find(log[5],"</b>")+4:len(log[5])-7]
if log[6][0:19] == '<div class="Rated">':
rated = log[6][string.find(log[6],"</b>")+4:len(log[6])-7]
else:
if log[8][0:19] == '<div class="Rated">':
rated = log[8][string.find(log[8],"</b>")+4:len(log[8])-7]
else:
print "Can't find the rating for "+filename
for line in range(0,15):
print log[line]
sys.exit()
#make sure the battle lasted at least six turns (to discard early forfeits)
longEnough = False
for line in log:
if line == '<div class="BeginTurn"><b><span style=\'color:#0000ff\'>Start of turn 6</span></b></div>\n':
longEnough = True
break
if longEnough == False:
sys.exit()
#get info on the trainers & pokes involved
ts = []
skip = 0
if oldWay == 0:
for line in range(1,len(log)):
if log[line][0:19] == '<div class="Teams">':
for x in range(0,2):
trainer = log[line+x][50:string.rfind(log[line+x],"'s team:")]
if string.find(trainer,"send out") > -1:
print trainer+" is a dick."
sys.exit()
stemp = ""
for i in range(string.rfind(log[line+x],"</span></b>")+11,len(log[line+x])):
if log[line+x][i:i+3] == ' / ':
ts.append([trainer,stemp])
stemp=""
skip = 3
if log[line+x][i] == '<':
break
if skip > 0:
skip=skip-1
else:
stemp = stemp+log[line+x][i]
ts.append([trainer,stemp])
break
if (line == len(log)) or oldWay == 1: #it's an old log, so find pokes the old way
#find all "sent out" messages
for line in range(5,len(log)):
if log[line][0:21] == '<div class="SendOut">':
ttemp = log[line][21:string.find(log[line],' sent out ')]
#determine whether the pokemon is nicknamed or not
if log[line][len(log[line])-8] == ')':
stemp = log[line][string.rfind(log[line],'(')+1:len(log[line])-8]
else:
stemp = log[line][string.rfind(log[line],'sent out ')+9:len(log[line])-8]
#determine whether this entry is already in the list
match = 0
for i in range(0,len(ts)):
if (ts[i][0] == ttemp) & (ts[i][1] == stemp):
match = 1
break
if match == 0:
ts.append([ttemp,stemp])
ts=sorted(ts, key=lambda ts:ts[0])
#gotta fill in the gaps
i=0
while (ts[i][0] == ts[0][0]):
i=i+1
if i<6:
for j in range(i,6):
ts.append([ts[0][0],"???"])
ts=sorted(ts, key=lambda ts:ts[0])
if len(ts)<12:
i=len(ts)
for j in range(i,12):
ts.append([ts[6][0],"???"])
#find where battle starts
active = [-1,-1]
t=0
for line in range(1,len(log)):
if log[line][0:21] == '<div class="SendOut">':
for x in range(0,2):
#ID trainer
trainer = log[line+x][21:string.find(log[line+x],' sent out ')]
if trainer == ts[0][0]:
t=0
else:
t=1
#it matters whether the poke is nicknamed or not
if log[line+x][len(log[line+x])-8] == ')':
species = log[line+x][string.rfind(log[line+x],'(')+1:len(log[line+x])-8]
else:
species = log[line+x][string.rfind(log[line+x],'sent out ')+9:len(log[line+x])-8]
for i in range(0,6):
if species == ts[6*t+i][1]:
active[t] = i
break
break
start = line +2
#metrics get declared here
turnsOut = [] #turns out on the field (a measure of stall)
matchups = [] #poke1, poke2, what happened
for i in range(0,12):
turnsOut.append(0)
#parse the damn log
#flags
roar = 0
uturn = 0
ko = 0
switch = 0
doubleSwitch = -1
uturnko = 0
ignore = 0
for line in range(start,len(log)):
#identify what kind of message is on this line
linetype = log[line][12:string.find(log[line],'">')]
if linetype == "BeginTurn":
#reset for start of turn
roar = uturn = switch = ko = uturnko = 0
doubleSwitch = -1
#Mark each poke as having been out for an additional turn
turnsOut[active[0]]=turnsOut[active[0]]+1
turnsOut[active[1]+6]=turnsOut[active[1]+6]+1
if linetype == "UseAttack": #check for Roar, etc.; U-Turn, etc.
#identify move
move = log[line][string.rfind(log[line],"'>")+2:len(log[line])-19]
if move in ["Roar","Whirlwind","Circle Throw","Dragon Tail"]:
roar = 1
elif move in ["U-Turn","Volt Switch","Baton Pass"]:
if line+3 < len(log):
if log[line][12:string.find(log[line],'">')] == "SendBack":
uturn = 1
elif linetype == "ItemMessage": #check for Red Card, Eject Button
#search for relevant items
if string.rfind(log[line],"Red Card") > -1:
roar = 1
elif string.rfind(log[line],"Eject Button") > -1:
uturn = 1
elif linetype == "Ko": #KO
ko = ko+1
#make sure it's not the end of the battle
o = p = 0
if line+2 < len(log):
o = 1
if line+1 < len(log):
p = 1
if log[line+2*o][12:string.find(log[line+2*o],'">')] == "BattleEnd":
pokes = [ts[active[0]][1],ts[active[1]+6][1]]
matchup=pokes[0]+' vs. '+pokes[1]+': '
if ko == 1:
matchup = matchup + ts[active[t]+6*t][1] + " was KOed"
elif ko == 2:
matchup = matchup + "double down"
else:
matchup = matchup + "no clue what happened"
matchups.append(matchup)
elif log[line+p][12:string.find(log[line+p],'">')] == "SendBack":
uturnko=1
elif linetype == "SendBack": #switch out
switch = 1
elif linetype == "SendOut":
#ID trainer
trainer = log[line][21:string.find(log[line],' sent out ')]
if trainer == ts[0][0]:
t=0
else:
t=1
#make sure it's not a double-switch
o = 0
if line+2 < len(log):
o = 1
if ignore == 1:
ignore = 0
elif (o == 1) and (log[line+2*o][12:string.find(log[line+2*o],'">')] == "SendBack"):
doubleSwitch = active[t]+t*6
else:
#close out old matchup
if doubleSwitch > -1:
pokes = [ts[active[0]][1],ts[doubleSwitch][1]]
else:
pokes = [ts[active[0]][1],ts[active[1]+6][1]]
pokes=sorted(pokes, key=lambda pokes:pokes)
matchup=pokes[0]+' vs. '+pokes[1]+': '
if doubleSwitch > -1:
matchup = matchup + "double switch"
elif (uturnko == 1):
matchup = matchup + ts[active[(t+1)%2]+((t+1)%2)*6][1] + " was u-turn KOed"
ignore = 1
elif ko == 1:
matchup = matchup + ts[active[t]+6*t][1] + " was KOed"
elif ko == 2:
matchup = matchup + "double down"
ignore = 1
elif roar == 1:
matchup = matchup + ts[active[t]+6*t][1] + " was forced out"
elif (uturn == 1) or (switch == 1):
matchup = matchup + ts[active[t]+6*t][1] + " was switched out"
else:
matchup = matchup + "no clue what happened"
matchups.append(matchup)
#new matchup!
uturn = roar = 0
#it matters whether the poke is nicknamed or not
if log[line][len(log[line])-8] == ')':
species = log[line][string.rfind(log[line],'(')+1:len(log[line])-8]
else:
species = log[line][string.rfind(log[line],'sent out ')+9:len(log[line])-8]
for i in range(0,6):
if species == ts[6*t+i][1]:
active[t] = i
break
outname = "Raw/"+tier+" "+rated+".txt"
outfile=open(outname,'a')
outfile.write(str(ts[0][0]))
outfile.write("\n")
i=0
while (ts[i][0] == ts[0][0]):
outfile.write(ts[i][1]+" ("+str(turnsOut[i])+")\n")
i = i + 1
outfile.write("***\n")
outfile.write(str(ts[len(ts)-1][0]))
outfile.write("\n")
for j in range(i,len(ts)):
outfile.write(ts[j][1]+" ("+str(turnsOut[j])+")\n")
outfile.write("@@@\n")
for line in matchups:
outfile.write(line+"\n")
outfile.write("---\n")
outfile.close()
Change Log
2011/09/14 -- Instead of just writing the species names, this version gives the trainers' names, too, and divides up the pokemon used in the battle by their teams.
Once the LogReader has been run over the set of battle logs, you're left with a bunch of pokemon names and not much else. StatCounter.py tallies these lists and turns them into usage stats.
I'm planning to modify this script soon so that the end result comes out in a forum-friendly format, rather than the Excel-friendly csv it currently produces.
Usage:
Code:
python StatCounter.py "Raw/[Tier].txt"
where [Tier] is the tier you want to generate the stats for, e.g. "Raw/Standard OU Rated.txt"
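For anyone who wants the gist without reading the whole source, here's a rough Python 3 sketch of the tallying step (the script itself is Python 2). It assumes the Raw file layout that LogReader.py writes: trainer name, that trainer's species one per line, "***", second trainer name, species, "---". The name tally_usage is mine and isn't part of StatCounter.py, and a Counter keyed by species name stands in for the pokemons.txt index lookup.

```python
from collections import Counter


def tally_usage(lines):
    """Count species over teams of exactly six; return (usage, teams, battles)."""
    usage = Counter()
    teams = 0
    battles = 0
    team = None  # None means the next line is a trainer name
    for raw in lines:
        line = raw.rstrip("\n")
        if team is None:
            team = []  # trainer name itself is not needed for the tally
        elif line in ("***", "---"):
            if len(team) == 6:  # only count teams with all six pokemon
                usage.update(team)
                teams += 1
            if line == "---":  # "---" also closes out the battle
                battles += 1
            team = None
        else:
            team.append(line)
    return usage, teams, battles
```

Usage percent for a species would then be 100.0 * usage[name] / teams, which matches the division by teamCount in the table output.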
Source:
Code:
import string
import sys
file = open("pokemons.txt")
pokelist = file.readlines()
file.close()
lsnum = []
lsname = []
for line in range(0,len(pokelist)):
    lsnum.append(pokelist[line][0:str.find(pokelist[line],':')])
    lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
teamCount = 0
counter = [0 for i in range(len(lsnum))]
trainerNextLine=True
for entry in range(0,len(species)):
    found = False
    if trainerNextLine:
        trainer = species[entry]
        trainerNextLine = False
        ctemp = []
    else:
        if species[entry] == "***\n" or species[entry] == "---\n":
            trainerNextLine = True
            #decide whether to count the team or not
            #if you were going to compare the trainer name against a database,
            #you'd do it here.
            if len(ctemp) == 6: #only count teams with all six pokemon
                for i in ctemp:
                    counter[i] = counter[i]+1.0 #rather than weighting equally, we
                    #could use the trainer ratings db to weight these...
                teamCount = teamCount+1
            if species[entry] == "---\n":
                battleCount=battleCount+1
        else:
            for i in range(0,len(lsnum)):
                if species[entry] == lsname[i]:
                    ctemp.append(i)
                    found = True
                    break
            if not found:
                print species[entry]+" not found!"
                sys.exit()
total = sum(counter)
#for appearance-only form variations, we gotta manually correct (blegh)
counter[172] = counter[172] + counter[173] #spiky pichu
for i in range(507,534):
    counter[202] = counter[202]+counter[i] #unown
counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
counter[413] = counter[413] + counter[551] + counter[552] #burmy
counter[422] = counter[422] + counter[556] #cherrim
counter[423] = counter[423] + counter[557] #shellos
counter[424] = counter[424] + counter[558] #gastrodon
counter[615] = counter[615] + counter[616] #basculin
counter[621] = counter[621] + counter[622] #darmanitan
counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
counter[721] = counter[721] + counter[722] #meloetta
for i in range(507,534):
    counter[i] = 0
counter[173] = counter[553] = counter[554] = counter[555] = counter[551] = counter[552] = counter[556] = counter[557] = counter[558] = counter[616] = counter[622] = counter[653] = counter[654] = counter[655] = counter[657] = counter[658] = counter[659] = counter[722] = 0
#sort by usage
pokes = []
for i in range(0,len(lsname)):
pokes.append([lsname[i][0:len(lsname[i])-1],counter[i]])
pokes=sorted(pokes, key=lambda pokes:-pokes[1])
print " Total battles: "+str(battleCount)
print " Total teams: "+str(teamCount)
print " Total pokemon: "+str(total)
print " + ---- + --------------- + ------ + ------- + "
print " | Rank | Pokemon | Usage | Percent | "
print " + ---- + --------------- + ------ + ------- + "
for i in range(0,len(pokes)):
    if pokes[i][1] == 0:
        break
    print ' | %-4d | %-15s | %-6d | %6.3f%% |' % (i+1,pokes[i][0],pokes[i][1],100.0*pokes[i][1]/teamCount)
#csv output
#for i in range(len(lsnum)):
# if (counter[i] > 0):
# print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"
Change log
2011/09/14 -- Updated for compatibility with new LogReader.py (and to make use of the new data). Also, this implementation only counts teams with six pokemon (but you can comment that part out quite easily).
Old version that writes as csv
Code:
import string
import sys
file = open("pokemons.txt")
pokelist = file.readlines()
file.close()
lsnum = []
lsname = []
for line in range(0,len(pokelist)):
    lsnum.append(pokelist[line][0:str.find(pokelist[line],':')])
    lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
counter = [0 for i in range(len(lsnum))]
for entry in range(0,len(species)):
    if species[entry] == "---\n":
        battleCount=battleCount+1
    else:
        for i in range(0,len(lsnum)):
            if species[entry] == lsname[i]:
                counter[i]=counter[i]+1
                break
total = sum(counter)
#for appearance-only form variations, we gotta manually correct (blegh)
counter[172] = counter[172] + counter[173] #spiky pichu
for i in range(507,534):
    counter[202] = counter[202]+counter[i] #unown
counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
counter[413] = counter[413] + counter[551] + counter[552] #burmy
counter[422] = counter[422] + counter[556] #cherrim
counter[423] = counter[423] + counter[557] #shellos
counter[424] = counter[424] + counter[558] #gastrodon
counter[615] = counter[615] + counter[616] #basculin
counter[621] = counter[621] + counter[622] #darmanitan
counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
counter[721] = counter[721] + counter[722] #meloetta
for i in range(507,534):
    counter[i] = 0
counter[173] = counter[553] = counter[554] = counter[555] = counter[551] = counter[552] = counter[556] = counter[557] = counter[558] = counter[616] = counter[622] = counter[653] = counter[654] = counter[655] = counter[657] = counter[658] = counter[659] = counter[722] = 0
print "Total battles: "+str(battleCount)
print "Total pokemon: "+str(total)
for i in range(len(lsnum)):
    if (counter[i] > 0):
        print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"
StatCounter1337.py
This version only counts teams where the user had a rating greater than or equal to 1337 at the time I pulled the player rankings. An example "rankings.txt" is included below.
Usage:
Code:
python StatCounter1337.py Raw/[Tier].txt
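The rating filter itself boils down to something like the sketch below. It's Python 3 (the script is Python 2) and assumes each line of rankings.txt is rank, name and rating separated by tabs, which is what the slicing in the script implies. The function name elite_trainers is just for illustration.

```python
def elite_trainers(ranking_lines, cutoff=1337):
    """Return the set of trainer names whose rating is at least `cutoff`."""
    elite = set()
    for line in ranking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # name is everything between the first and the last tab
        name = "\t".join(fields[1:-1])
        rating = int(fields[-1])
        if rating >= cutoff:
            elite.add(name)
    return elite
```

Unlike the script, this doesn't stop at the first rating below the cutoff, so it doesn't depend on the file being sorted by rating. A set also makes the "trainer in elite" check cheap when the rankings list is long.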
Sample "rankings.txt" (all this info is publicly available on the Smogon server, so I don't feel bad about posting it):
import string
import sys
file = open("pokemons.txt")
pokelist = file.readlines()
file.close()
lsnum = []
lsname = []
for line in range(0,len(pokelist)):
    lsnum.append(pokelist[line][0:str.find(pokelist[line],':')])
    lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])
file = open("rankings.txt")
ratings = file.readlines()
file.close()
elite = []
for line in ratings:
    if int(line[str.rfind(line,'\t')+1:len(line)-1]) < 1337:
        break
    elite.append(line[str.find(line,'\t')+1:str.rfind(line,'\t')])
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
teamCount = 0
counter = [0 for i in range(len(lsnum))]
trainerNextLine=True
for entry in range(0,len(species)):
    found = False
    if trainerNextLine:
        trainer = species[entry][0:len(species[entry])-1]
        trainerNextLine = False
        ctemp = []
    else:
        if species[entry] == "***\n" or species[entry] == "---\n":
            trainerNextLine = True
            #decide whether to count the team or not
            #if you were going to compare the trainer name against a database,
            #you'd do it here.
            if trainer in elite:
                #if len(ctemp) == 6: #only count teams with all six pokemon
                for i in ctemp:
                    counter[i] = counter[i]+1.0 #rather than weighting equally, we
                    #could use the trainer ratings db to weight these...
                teamCount = teamCount+1
            if species[entry] == "---\n":
                battleCount=battleCount+1
        else:
            for i in range(0,len(lsnum)):
                if species[entry] == lsname[i]:
                    ctemp.append(i)
                    found = True
                    break
            if not found:
                print species[entry]+" not found!"
                sys.exit()
total = sum(counter)
#for appearance-only form variations, we gotta manually correct (blegh)
counter[172] = counter[172] + counter[173] #spiky pichu
for i in range(507,534):
    counter[202] = counter[202]+counter[i] #unown
counter[352] = counter[352] + counter[553] + counter[554] + counter[555] #castform--if this is an issue, I will be EXTREMELY surprised
counter[413] = counter[413] + counter[551] + counter[552] #burmy
counter[422] = counter[422] + counter[556] #cherrim
counter[423] = counter[423] + counter[557] #shellos
counter[424] = counter[424] + counter[558] #gastrodon
counter[615] = counter[615] + counter[616] #basculin
counter[621] = counter[621] + counter[622] #darmanitan
counter[652] = counter[652] + counter[653] + counter[654] + counter[655] #deerling
counter[656] = counter[656] + counter[657] + counter[658] + counter[659] #sawsbuck
counter[721] = counter[721] + counter[722] #meloetta
for i in range(507,534):
    counter[i] = 0
counter[173] = counter[553] = counter[554] = counter[555] = counter[551] = counter[552] = counter[556] = counter[557] = counter[558] = counter[616] = counter[622] = counter[653] = counter[654] = counter[655] = counter[657] = counter[658] = counter[659] = counter[722] = 0
#sort by usage
pokes = []
for i in range(0,len(lsname)):
pokes.append([lsname[i][0:len(lsname[i])-1],counter[i]])
pokes=sorted(pokes, key=lambda pokes:-pokes[1])
print " Total battles: "+str(battleCount)
print " Total teams: "+str(teamCount)
print " Total pokemon: "+str(int(total))
print " + ---- + --------------- + ------ + ------- + "
print " | Rank | Pokemon | Usage | Percent | "
print " + ---- + --------------- + ------ + ------- + "
for i in range(0,len(pokes)):
    if pokes[i][1] == 0:
        break
    print ' | %-4d | %-15s | %-6d | %6.3f%% |' % (i+1,pokes[i][0],pokes[i][1],100.0*pokes[i][1]/total*6.0)
#csv output
#for i in range(len(lsnum)):
# if (counter[i] > 0):
# print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"
StatCounterOnCrack.py
This version is designed to function with "LogReaderOnCrack.py" and generates a table containing not only usage stats, but two relevant "pokemetrics." It also creates an "encounter matrix" that keeps track of what happens when two pokemon go head-to-head, but I'm not sure how to process it yet (hence, pickle it for later).
Usage:
Code:
python StatCounterOnCrack.py "Raw/[Tier].txt" matrix.p
where matrix.p is the name of the file where you're going to dump your matrix.
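Since the matrix is just nested lists, it may help to spell out what the nine slots mean. The sketch below is Python 3 and the slot labels are my reading of the index arithmetic in the source (the script doesn't name them anywhere); new_matrix, record, kos_by and roundtrip are hypothetical helpers, not functions from the script.

```python
import pickle

# Slot meanings, always from the point of view of the first index
# (the "row" pokemon). Inferred from the e/f arithmetic in the script.
SLOTS = [
    "row was KOed", "column was KOed", "double down",
    "row switched out", "column switched out", "double switch",
    "row was forced out", "column was forced out", "no clue what happened",
]


def new_matrix(n):
    """n x n x 9 matrix of zero counts."""
    return [[[0] * len(SLOTS) for _ in range(n)] for _ in range(n)]


def record(matrix, i, j, slot):
    """Record one matchup outcome, given from pokemon i's point of view."""
    group, kind = divmod(slot, 3)
    # symmetric outcomes (kind 2) mirror to themselves; otherwise swap 0 and 1
    mirrored = slot if kind == 2 else 3 * group + (1 - kind)
    matrix[i][j][slot] += 1
    matrix[j][i][mirrored] += 1


def kos_by(matrix, i):
    """Numerator of the KOs/b column: slots 1 and 2 summed over all opponents."""
    return sum(cell[1] + cell[2] for cell in matrix[i])


def roundtrip(matrix):
    """Pickle and unpickle in memory; on disk, Python 3 needs 'wb'/'rb' modes."""
    return pickle.loads(pickle.dumps(matrix))
```

Keeping both [i][j] and [j][i] updated costs double the memory but means you can read off any pokemon's record against any other without worrying about which one was listed first.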
Source:
Code:
import string
import sys
import cPickle as pickle
file = open("pokemons.txt")
pokelist = file.readlines()
file.close()
lsnum = []
lsname = []
for line in range(0,len(pokelist)):
lsnum.append(pokelist[line][0:str.find(pokelist[line],':')])
lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])])
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
battleCount = 0
teamCount = 0
counter = [0 for i in range(len(lsnum))]
realCounter = [0 for i in range(len(lsnum))]
turnCounter = [0 for i in range(len(lsnum))]
encounterMatrix = [[[0 for k in range(9)] for j in range(len(lsnum))] for i in range(len(lsnum))]
trainerNextLine=True
eventNextLine=False
for entry in range(0,len(species)):
found = False
if trainerNextLine:
trainer = species[entry]
trainerNextLine = False
ctemp = []
turnt = []
elif eventNextLine:
if species[entry] == "---\n":
eventNextLine = False
trainerNextLine = True
battleCount=battleCount+1
else:
poke1 = species[entry][0:string.find(species[entry]," vs.")]
poke2 = species[entry][string.find(species[entry]," vs.")+5:string.find(species[entry],":")]
event = species[entry][string.find(species[entry],":")+2:len(species[entry])-1]
#ID pokemon involved
for i in range(0,len(lsnum)):
if poke1+"\n" == lsname[i]:
break
if i == len(lsnum):
print poke1+" not found!"
sys.exit()
for j in range(0,len(lsnum)):
if poke2+"\n" == lsname[j]:
break
if j == len(lsnum):
print poke2+" not found!"
sys.exit()
#ID event type
e = f = -1
if (event == "double down"):
e = f = 2
elif (event == "double switch"):
e = f = 5
elif (event == "no clue what happened"):
e = f = 8
else:
poke = event[0:string.find(event," was")]
event2 = event[len(poke)+5:len(event)]
p = 1
if poke1 == poke:
p = 0
elif poke2 != poke:
print "Houston, we have a problem."
print entry
sys.exit()
if (event2 == "KOed") or (event2 == "u-turn KOed"):
e = p
f = (p+1)%2
elif (event2 == "switched out"):
e = p+3
f = (p+1)%2+3
elif (event2 == "forced out"):
e = p+6
f = (p+1)%2+6
else:
print "Houston, we have a problem."
print entry
sys.exit()
encounterMatrix[i][j][e] = encounterMatrix[i][j][e]+1
encounterMatrix[j][i][f] = encounterMatrix[j][i][f]+1
elif species[entry] == "***\n" or species[entry] == "@@@\n":
if species[entry] == "***\n":
trainerNextLine = True
else:
eventNextLine = True
#decide whether to count the team or not
#if you were going to compare the trainer name against a database,
#you'd do it here.
#if len(ctemp) == 6: #only count teams with all six pokemon
for i in range(len(ctemp)):
counter[ctemp[i]] = counter[ctemp[i]]+1.0 #rather than weighting equally, we
turnCounter[ctemp[i]] = turnCounter[ctemp[i]]+turnt[i]
if turnt[i] > 0:
realCounter[ctemp[i]] = realCounter[ctemp[i]]+1.0
#could use the trainer ratings db to weight these...
teamCount = teamCount+1
else:
stemp = species[entry][0:string.rfind(species[entry]," (")]+"\n"
turns = eval(species[entry][string.rfind(species[entry]," (")+2:string.rfind(species[entry],")")])
if stemp != "???\n":
for i in range(0,len(lsnum)):
if stemp == lsname[i]:
ctemp.append(i)
turnt.append(turns)
found = True
break
if not found:
print stemp+" not found!"
sys.exit()
total = sum(counter)
for i in range(len(lsnum)):
if realCounter[i] > 0:
turnCounter[i] = turnCounter[i]/realCounter[i]
pickle.dump(encounterMatrix,open(sys.argv[2],"w"))
pokes = []
for i in range(0,len(lsname)):
pokes.append([lsname[i][0:len(lsname[i])-1],counter[i],realCounter[i],0,turnCounter[i]])
for j in range(0,len(lsname)):
pokes[i][3] = pokes[i][3] + encounterMatrix[i][j][1]+encounterMatrix[i][j][2]
if pokes[i][2] > 0:
pokes[i][3] = pokes[i][3]/pokes[i][2]
#for appearance-only form variations, we gotta manually correct (blegh)
for j in range(1,5):
pokes[172][j] = pokes[172][j] + pokes[173][j] #spiky pichu
for i in range(507,534):
pokes[202][j] = pokes[202][j]+pokes[i][j] #unown
pokes[352][j] = pokes[352][j] + pokes[553][j] + pokes[554][j] + pokes[555][j] #castform--if this is an issue, I will be EXTREMELY surprised
pokes[413][j] = pokes[413][j] + pokes[551][j] + pokes[552][j] #burmy
pokes[422][j] = pokes[422][j] + pokes[556][j] #cherrim
pokes[423][j] = pokes[423][j] + pokes[557][j] #shellos
pokes[424][j] = pokes[424][j] + pokes[558][j] #gastrodon
pokes[615][j] = pokes[615][j] + pokes[616][j] #basculin
pokes[621][j] = pokes[621][j] + pokes[622][j] #darmanitan
pokes[652][j] = pokes[652][j] + pokes[653][j] + pokes[654][j] + pokes[655][j] #deerling
pokes[656][j] = pokes[656][j] + pokes[657][j] + pokes[658][j] + pokes[659][j] #sawsbuck
pokes[721][j] = pokes[721][j] + pokes[722][j] #meloetta
for i in range(507,534):
pokes[i][j] = 0
pokes[173][j] = pokes[553][j] = pokes[554][j] = pokes[555][j] = pokes[551][j] = pokes[552][j] = pokes[556][j] = pokes[557][j] = pokes[558][j] = pokes[616][j] = pokes[622][j] = pokes[653][j] = pokes[654][j] = pokes[655][j] = pokes[657][j] = pokes[658][j] = pokes[659][j] = pokes[722][j] = 0
#sort by usage
pokes=sorted(pokes, key=lambda pokes:-pokes[1])
l=1
print " Total battles: "+str(battleCount)
print " Total teams: "+str(teamCount)
print " Total pokemon: "+str(int(total))
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print " | Rank | Pokemon | Usage | KOs/b | Turns/b| Percent | "
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
for i in range(0,len(pokes)):
if pokes[i][1] == 0:
break
print ' | %-4d | %-15s | %-6d | %6.3f | %6.3f | %6.3f%% |' % (l,pokes[i][0],pokes[i][1],pokes[i][3],pokes[i][4],100.0*pokes[i][1]/total*6.0)
l=l+1
l=1
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print
print "Sorted by KOs/battle"
pokes=sorted(pokes, key=lambda pokes:-pokes[3])
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print " | Rank | Pokemon | Usage | KOs/b | Turns/b| Percent | "
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
for i in range(0,len(pokes)):
if pokes[i][1] == 0:
break
if (pokes[i][1] > 100) or (100.0*pokes[i][1]/total*6.0 > 1.0): #otherwise you get all sorts of silliness
print ' | %-4d | %-15s | %-6d | %6.3f | %6.3f | %6.3f%% |' % (l,pokes[i][0],pokes[i][1],pokes[i][3],pokes[i][4],100.0*pokes[i][1]/total*6.0)
l=l+1
l=1
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print
print "Sorted by Turns in/battle"
pokes=sorted(pokes, key=lambda pokes:-pokes[4])
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
print " | Rank | Pokemon | Usage | KOs/b | Turns/b| Percent | "
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
for i in range(0,len(pokes)):
if pokes[i][1] == 0:
break
if (pokes[i][1] > 100) or (100.0*pokes[i][1]/total*6.0 > 1.0): #otherwise you get all sorts of silliness
print ' | %-4d | %-15s | %-6d | %6.3f | %6.3f | %6.3f%% |' % (l,pokes[i][0],pokes[i][1],pokes[i][3],pokes[i][4],100.0*pokes[i][1]/total*6.0)
l=l+1
print " + ---- + --------------- + ------ + ------ + ------ + ------- + "
#csv output
#for i in range(len(lsnum)):
# if (counter[i] > 0):
# print lsnum[i]+","+lsname[i][0:len(lsname[i])-1]+","+str(counter[i])+","+str(round(100.0*counter[i]/battleCount/2,5))+"%"
Putting it all together, I wrote a bash script to compile stats for the entire month on my Linux computer.
The computer has multiple processor cores, so I did some parallelizing to make use of them.
File Structure:
RunMe.sh sits in a folder with my two python scripts.
The month's battle logs are all in a folder called "2011-08".
In that folder are sub-folders for each day's logs (example: "2011-08-05").
Back in the main folder where the scripts sit, there are two empty folders, called "Raw" and "Stats". "Raw" will contain the lists of pokemon, while "Stats" will contain the stats.
Usage:
Code:
$./RunMe.sh
Source:
Code:
rm Raw/*
rm Stats/*
maxjobs=6 #set to number of multiprocessors
for i in 2011-08/*
do
    for j in "$i"/*
    do
        jobcnt=(`jobs -p`)
        while [ ${#jobcnt[@]} -ge $maxjobs ]
        do
            jobcnt=(`jobs -p`)
        done
        echo Processing $j
        python LogReader.py "$j" &
    done
    #serial version:
    # for j in "$i"/*
    # do
    # echo Processing $j
    # python LogReader.py "$j"
    # done
done
wait
#stupid tier name changes--gotta consolidate...
cat "Raw/BW LC Rated.txt" >> "Raw/Standard LC Rated.txt"
cat "Raw/BW LC Unrated.txt" >> "Raw/Standard LC Unrated.txt"
cat "Raw/BW OU Rated.txt" >> "Raw/Standard OU Rated.txt"
cat "Raw/BW OU Unrated.txt" >> "Raw/Standard OU Unrated.txt"
cat "Raw/BW UU Rated.txt" >> "Raw/Standard UU Rated.txt"
cat "Raw/BW UU Unrated.txt" >> "Raw/Standard UU Unrated.txt"
cat "Raw/BW RU Rated.txt" >> "Raw/Standard RU Rated.txt"
cat "Raw/BW RU Unrated.txt" >> "Raw/Standard RU Unrated.txt"
cat "Raw/BW Uber Rated.txt" >> "Raw/Standard Ubers Rated.txt"
cat "Raw/BW Uber Unrated.txt" >> "Raw/Standard Ubers Unrated.txt"
rm Raw/BW*.txt #glob must be outside the quotes or it won't expand
echo Compiling Stats...
for i in Raw/*; do python StatCounter.py "$i" > "Stats/${i/Raw}" ; done
For a file in the "Raw" folder (list of pokemon used in battle), this script will generate a list of the number of pokemon used in each battle. This list can then be imported into an analysis program for easy binning and plotting.
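If you'd rather bin the numbers in Python than in a separate analysis program, something like this would do. It's a Python 3 sketch with made-up function names. Like PPB.py, it counts every line that isn't "---", so with the newer Raw format (trainer names and the "***" separator) each battle's number includes those three extra lines.

```python
from collections import Counter


def lines_per_battle(lines):
    """One count per battle: lines seen before each '---' terminator."""
    counts = []
    n = 0
    for raw in lines:
        if raw.rstrip("\n") == "---":
            counts.append(n)
            n = 0
        else:
            n += 1
    return counts


def histogram(counts):
    """Sorted (value, how many battles had that value) pairs."""
    return sorted(Counter(counts).items())
```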
Usage:
Code:
python PPB.py "Raw/[Tier].txt" > [output.dat]
Example plot:
Source:
Code:
import string
import sys
filename = str(sys.argv[1])
file = open(filename)
species = file.readlines()
ppb=0
for entry in range(0,len(species)):
    if species[entry] == "---\n":
        print ppb
        ppb = 0
    else:
        ppb = ppb+1
TableReader.py
Reads in a standard Smogon usage table and turns it into a csv sorted by species. Useful for comparing pokemon usage from month to month, or for doing statistics on multiple months.
Usage:
Code:
python TableReader.py file.txt
Source:
Code:
import sys

#load the species list: the name is everything after the first space on each line
file = open("pokemons.txt")
pokelist = file.readlines()
file.close()
lsname = []
for line in range(0,len(pokelist)):
    lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])-1])

filename = str(sys.argv[1])
file = open(filename)
table=file.readlines()
file.close()

counter = [0 for i in range(len(lsname))]
for i in range(5,len(table)): #data rows start at line index 5
    #the name starts at column 10; scan back from column 26 to find where it ends
    j=26
    found = False
    while found == False:
        j=j-1
        if table[i][j] != ' ':
            found = True
    name = table[i][10:j+1]
    found = False
    for j in range(0,len(lsname)):
        if name == lsname[j]:
            #usage percentage: the number that ends at column 43
            counter[j]=float(table[i][table[i].rfind(' ',0,40)+1:43])
            found = True
            break
    if found == False:
        print(name+" not found...")
        sys.exit()

#one "species,percent" row per pokemon, in pokemons.txt order
for i in range(len(lsname)):
    print(lsname[i]+","+str(counter[i]))
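As an example of the month-to-month comparison these CSVs make possible, here is a minimal sketch in Python 3. It assumes two files in the "species,percent" format that TableReader.py prints; the function names `read_usage` and `usage_change` and the sample numbers are invented for illustration.

```python
import csv
import io


def read_usage(csv_file):
    """Read 'species,percent' rows (the TableReader.py output format) into a dict."""
    return {row[0]: float(row[1]) for row in csv.reader(csv_file) if row}


def usage_change(old, new):
    """Return (species, new - old) pairs, largest increase first.

    Species missing from one month are treated as 0% usage in that month.
    """
    species = set(old) | set(new)
    changes = [(name, new.get(name, 0.0) - old.get(name, 0.0)) for name in species]
    return sorted(changes, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # made-up numbers standing in for two monthly CSVs
    last_month = read_usage(io.StringIO("Bulbasaur,1.5\nIvysaur,0\n"))
    this_month = read_usage(io.StringIO("Bulbasaur,1.0\nIvysaur,2.0\n"))
    for name, delta in usage_change(last_month, this_month):
        print("%-15s %+.3f" % (name, delta))
```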
RemoveRedundancy.py
I modified the PO source code in a really simple way to get it to write player rankings to the console (which can, of course, be redirected to file) whenever I view them on PO (getting around the "can't copy/paste" issue). Unfortunately, you still need to navigate through each and every page of the rankings in order to get all the stats, and rapid-paging is flagged by the PO server as being "overactive," and it has a tendency to kick you. This means you have to log back in, RELOAD the page, and find where you left off. If you do this, you end up with a LOT of redundant entries. This script will remove the redundant entries.
Usage:
Code:
python RemoveRedundancy.py filename.txt
Source:
Code:
import sys

filename = str(sys.argv[1])
file = open(filename)
ranking=file.readlines()
file.close()
i=0
while i < len(ranking):
    #the rank is everything before the first tab
    rank = int(ranking[i][0:str.find(ranking[i],'\t')])
    if rank < i+1:
        #already seen this rank: drop the repeated line
        del ranking[i]
    else:
        if rank > i+1:
            #a rank is missing, so a page got skipped
            print("You screwed up.")
            sys.exit()
        else:
            i=i+1
for line in ranking:
    print(line[0:len(line)-1])
ThreeMonth.py
This script combines data from the previous three months, weighted 20:3:1 from the most recent month to the oldest. It needs the CSVs generated by TableReader.py as its inputs (oldest month first on the command line), although if I were less lazy, I could rewrite it to work with the standard usage tables.
import csv
import sys

#read the three monthly CSVs (species,percent), oldest month first
may = []
jun = []
aug = []
maycsv = csv.reader(open(sys.argv[1]), delimiter=',')
for line in maycsv:
    may.append([line[0],float(line[1])])
juncsv = csv.reader(open(sys.argv[2]), delimiter=',')
for line in juncsv:
    jun.append([line[0],float(line[1])])
augcsv = csv.reader(open(sys.argv[3]), delimiter=',')
for line in augcsv:
    aug.append([line[0],float(line[1])])

#weighted average: the newest month counts 20, the one before 3, the oldest 1
counter = [0 for i in range(len(aug))]
for i in range(0,len(aug)):
    counter[i] = (1.0*may[i][1] + 3.0*jun[i][1] + 20.0*aug[i][1])/24.0
    #counter[i] = (3.0*jun[i][1] + 20.0*aug[i][1])/23.0
pokes = []
for i in range(0,len(aug)):
    pokes.append([aug[i][0],counter[i]])
#sort by combined usage, highest first
pokes=sorted(pokes, key=lambda poke:-poke[1])
print('')
print('')
print('')
print(" + ---- + --------------- + ------ + ------- + ")
print(" | Rank | Pokemon | Usage | Percent | ")
print(" + ---- + --------------- + ------ + ------- + ")
for i in range(0,len(pokes)):
    if pokes[i][1] == 0:
        break
    print(' | %-4d | %-15s | %-6d | %6.3f%% |' % (i+1,pokes[i][0],0,pokes[i][1]))
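To make the 20-3-1 weighting concrete: a pokemon with (made-up) usage of 2.0% in the oldest month, 4.0% in the middle month and 6.0% in the newest month comes out at (1×2.0 + 3×4.0 + 20×6.0)/24 = 134/24, or about 5.583%. The same arithmetic as a small sketch, where the function name `three_month_usage` is invented for illustration:

```python
def three_month_usage(oldest, middle, newest):
    """Combine three monthly usage percentages with 1:3:20 weights (oldest to newest)."""
    return (1.0 * oldest + 3.0 * middle + 20.0 * newest) / 24.0


if __name__ == "__main__":
    # made-up percentages, oldest month first
    print("%.3f%%" % three_month_usage(2.0, 4.0, 6.0))
```

Because the weights sum to 24, a pokemon with the same usage in all three months keeps exactly that usage.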
PullOU.py
This code takes the usage tables and pulls out the list of pokemon that have a usage greater than 3.41%.
Usage:
Code:
python PullOU.py filename.txt
Source:
Code:
import sys

#load the species list: the name is everything after the first space on each line
file = open("pokemons.txt")
pokelist = file.readlines()
file.close()
lsname = []
for line in range(0,len(pokelist)):
    lsname.append(pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])-1])

filename = str(sys.argv[1])
file = open(filename)
table=file.readlines()
file.close()

counter = [0 for i in range(len(lsname))]
for i in range(6,len(table)): #data rows start at line index 6
    #the name starts at column 10; scan back from column 26 to find where it ends
    j=26
    found = False
    while found == False:
        j=j-1
        if table[i][j] != ' ':
            found = True
    name = table[i][10:j+1]
    found = False
    for j in range(0,len(lsname)):
        if name == lsname[j]:
            #usage percentage: the number that ends at column 43
            counter[j]=float(table[i][table[i].rfind(' ',0,40)+1:43])
            found = True
            break
    if found == False:
        print(name+" not found...")
        sys.exit()

#keep only the pokemon above the 3.41% cutoff
outstring = ''
for i in range(0,len(lsname)):
    if counter[i] > 3.41:
        outstring = outstring+"\n"+lsname[i]
print(outstring)
Tiers.py
If you take the lists of pokemon in each tier from PullOU.py and put them in one file, you don't *quite* have a tier list yet, since pokemon that moved up a tier will be shown twice, and pokemon that moved down a tier will disappear altogether. So this program takes the previous tier list and the current "tier list" (as generated through PullOU.py and some fancy concatenation--you'll need to put in Ubers and BL yourself) and generates a NEW tier list, perfect for posting on forums. Note that the old and new tier list files that the program takes as inputs are NOT in the same format. I've included two sample files for reference.
Source (since I've got UBB stuff in here, you're going to want to look at the source--hit the "quote" button):
Code:
import string
import sys
#read in files
file = open("pokemons.txt")
pokelist = file.readlines()
file.close()
file=open(str(sys.argv[1]))
curList = file.readlines() #current lists
file.close()
file=open(str(sys.argv[2]))
oldList = file.readlines() #previous cycle's tiers
file.close()
#parse files into tier lists (tiers are separated by blank lines)
curUber = []
curOU = []
curBL = []
curUU = []
curRU = []
tn = 0
for line in curList:
    if line == '\n':
        tn = tn+1
    elif tn == 0:
        curUber.append(line)
    elif tn == 1:
        curOU.append(line)
    elif tn == 2:
        curBL.append(line)
    elif tn == 3:
        curUU.append(line)
    elif tn == 4:
        curRU.append(line)
    else:
        print("You screwed up, bub.")
        sys.exit()
oldUber = []
oldOU = []
oldBL = []
oldUU = []
tn = 0
for line in oldList:
    if line == '\n':
        tn = tn+1
    elif tn == 0:
        oldUber.append(line)
    elif tn == 1:
        oldOU.append(line)
    elif tn == 2:
        oldBL.append(line)
    elif tn == 3:
        oldUU.append(line)
    else:
        print("You screwed up, bub.")
        sys.exit()
tiers = []
for line in range(0,len(pokelist)):
    tn = 5
    name = pokelist[line][str.find(pokelist[line],' ')+1:len(pokelist[line])-1]
    #identify current tier (the highest tier that lists the poke wins)
    for i in range(0,len(curUber)):
        if name == curUber[i][0:len(curUber[i])-1]:
            tn = 0
            break
    if tn == 5:
        for i in range(0,len(curOU)):
            if name == curOU[i][0:len(curOU[i])-1]:
                tn = 1
                break
    if tn == 5:
        for i in range(0,len(curBL)):
            if name == curBL[i][0:len(curBL[i])-1]:
                tn = 2
                break
    if tn == 5:
        for i in range(0,len(curUU)):
            if name == curUU[i][0:len(curUU[i])-1]:
                tn = 3
                break
    if tn == 5:
        for i in range(0,len(curRU)):
            if name == curRU[i][0:len(curRU[i])-1]:
                tn = 4
                break
    #make sure the poke isn't "NU" because it fell down a tier
    if tn == 5:
        otn = 5
        for i in range(0,len(oldUber)):
            if name == oldUber[i][0:len(oldUber[i])-1]:
                otn = 0
                break
        if otn == 5:
            for i in range(0,len(oldOU)):
                if name == oldOU[i][0:len(oldOU[i])-1]:
                    otn = 1
                    break
        if otn == 5:
            for i in range(0,len(oldBL)):
                if name == oldBL[i][0:len(oldBL[i])-1]:
                    otn = 2
                    break
        if otn == 5:
            for i in range(0,len(oldUU)):
                if name == oldUU[i][0:len(oldUU[i])-1]:
                    otn = 3
                    break
        #no need to search RU
        if otn == 0:
            tn = 1 #not that I think anyone is coming off the Ubers list
        elif otn == 1:
            tn = 3 #OU to UU (we don't want 'em going straight to BL)
        elif otn == 2:
            tn = 3 #coming off the BL list. Put 'em back in UU
        elif otn == 3:
            tn = 4 #UU to RU
    #get name of tier
    if tn == 0:
        tier = 'Uber'
    elif tn == 1:
        tier = 'OU'
    elif tn == 2:
        tier = 'BL'
    elif tn == 3:
        tier = 'UU'
    elif tn == 4:
        tier = 'RU'
    elif tn == 5:
        tier = 'NU'
    tiers.append([tn,name,tier])
tiers=sorted(tiers, key=lambda tiers:tiers[0])
print 'Uber\n
Code:
'
print tiers[0][1]
for i in range(1,len(tiers)):
if tiers[i][0] == 5:
break
if tiers[i][0] > tiers[i-1][0]:
print '
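The tier-assignment rules in Tiers.py can be condensed into a couple of dictionary lookups. The following is a minimal sketch in Python 3, not a replacement for the script: it skips the file parsing and the forum formatting, and the names `TIER_ORDER`, `DROP_TO` and `new_tier`, as well as the placeholder species, are invented for illustration.

```python
TIER_ORDER = ["Uber", "OU", "BL", "UU", "RU"]
# where a pokemon lands when it drops out of its previous tier
DROP_TO = {"Uber": "OU", "OU": "UU", "BL": "UU", "UU": "RU"}


def new_tier(name, current, previous):
    """Return the new tier for `name`.

    `current` and `previous` map tier names to sets of species names.
    """
    # A pokemon that moved up appears in two current lists; the highest one wins.
    for tier in TIER_ORDER:
        if name in current.get(tier, ()):
            return tier
    # Not in any current list: it drops from its previous tier, if it had one.
    for tier in TIER_ORDER:
        if name in previous.get(tier, ()):
            return DROP_TO.get(tier, "NU")
    return "NU"


if __name__ == "__main__":
    # placeholder species names, not real tier data
    current = {"OU": {"SpeciesA"}, "UU": {"SpeciesA", "SpeciesB"}}
    previous = {"OU": {"SpeciesC"}, "UU": {"SpeciesB", "SpeciesD"}}
    for name in ["SpeciesA", "SpeciesB", "SpeciesC", "SpeciesD", "SpeciesE"]:
        print(name, new_tier(name, current, previous))
```

Checking the current lists from the top tier down is what handles the "shown twice" case, and the `DROP_TO` table is what keeps a pokemon that fell out of its tier from disappearing.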
Smogon isn't really friendly to developers, is it?
Anyway, do you have any idea what to do with these stats? I have a similar problem: I made a script that converts PO binary usage stats into a MySQL db, which allows generating any kind of statistics, yet I can't think of anything useful...
Last edited by whitefag; Sep 12th, 2011 at 8:41:05 AM.
i made a script that converts PO binary usage stats...
do you have any specific knowledge of whether the Smogon server is still generating this data? Because if it is, with your help, I'll be able to parse it, and all the problems described in the thread above will vanish.
I'm pretty sure it is; it's done by a server plugin, and since Smogon provides limited usage stats each month, I assume they collect it. You'll need to ask the new server administrator about that, though.
I used Beta's stats since they are always available.
As for the script, here's the package (never mind the Russian, just press the big black button).
It's my second python script (after Hello, world!), so it might be coded pretty poorly.
The idea is pretty simple: It converts PO's binary files directly into MyISAM files (this is the fastest way) and adds necessary files (db structure and Index file) from templates so it can be used by MySQL.
The data there is exactly as it is present in PO's files; here's the code that generates it.
Last edited by whitefag; Sep 12th, 2011 at 11:46:42 AM.
Reason: whoops, pyc
While this is obviously pretty cool, I have one question: does this only take into account battles where 'Save Log' is on? Cause then the stats would be kinda off...
I do not believe so. It's already been shown that the server and the client software produce slightly different battle logs (client version 1.0.30 gives the full teams, while no version of the server does so currently), so I really doubt the server is querying whether the users have opted to save their battle logs.
But the only way to be 100% sure would be to dig around in the PO source code, and--I'll be honest--I'm not going to be doing that.
I've uploaded my current scripts to a shiny new github repo. If you have the desire to contribute / modify any of my code, feel free to contribute through there.