Data Data-Driven Team Building & Theorymon

elrod · Jul 1, 2014

EDIT:
I'll rework this opening post in a little while.
For now, see here to look at an introduction to what the code is currently doing.
My first attempt was simply writing a script to prove to myself I could do things with the data, and get something out here so I could start getting feedback.
Ideas, things to add, etc.
Thanks for the help so far!

Working on another update.

I've deleted the content of this post, as I've completely changed how my script functions making the old descriptions potentially misleading.

Red Cat · Jul 1, 2014

It seems like most of your teams have multiple Pokemon meant to mega evolve, but of course you can only mega evolve one Pokemon per game, so I think your algorithm needs to be adjusted so that you can't have both Charizard and Mawile on the same team for example because no good team will ever have both.

Edit: Don't automatically rule out multiple potential megas since TTar, Gyarados, and Scizor have viable base forms.

Sanger Zonvolt · Jul 1, 2014

I'm not sure if your algorithm can account for this, but I feel I should mention that your top coverage teams and your hardest to counter teams both generate teams with at least 3 Pokemon weak to stealth rock with Dragonite being the only possible defogger in most of them.

Boltaway · Jul 1, 2014

Unfortunately, few of these teams look viable and that's largely due to synergy -- as two people already said, both with megastones and type matchups (especially stealth rock). In your scripts, you should also have access to the Teammates part of the moveset statistics, correct? Doing that, you could get a good sense of what pokemon synergize best with one another.

Another problem is that you're talking about the hardest to counter teams and conflating that with a team made up of the hardest to counter pokemon. A further problem, previously alluded to, is that counters don't exist just in other Pokemon, but also in field mechanics like hazards, which I don't think the moveset statistics can currently account for.

elrod · Jul 2, 2014

Thanks for the comments!
It's a work in progress, in need of your help to get it somewhere.

Mega problem
What is the best way to fix the mega-problem?
I redid it, this time counting the total percentage of "ite" items pokemon are holding. If over 180% for a team, that team gets thrown out. I thought 180% sounded like a decent cutoff; for example if a team has two pokemon that are mega 70% of the time, it isn't unreasonable to have one of them running some other set.
Perhaps there'd be a better way to do this?
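
A minimal sketch of the 180% cutoff described above. The mega stone percentages are made-up placeholders, and the names `MEGA_STONE_PCT`, `total_mega_pct`, and `passes_mega_cutoff` are hypothetical; this is not the actual script.

```python
# Hypothetical share of usage (in %) in which each Pokemon holds its mega stone.
# Placeholder numbers for illustration only, not real usage stats.
MEGA_STONE_PCT = {
    "Charizard": 95.0,
    "Mawile": 90.0,
    "Gyarados": 30.0,
    "Aegislash": 0.0,
    "Azumarill": 0.0,
}


def total_mega_pct(team, mega_pct=MEGA_STONE_PCT):
    """Sum the mega stone percentages of every Pokemon on the team."""
    return sum(mega_pct.get(name, 0.0) for name in team)


def passes_mega_cutoff(team, cutoff=180.0, mega_pct=MEGA_STONE_PCT):
    """Keep a team only if its summed mega stone percentage is at most the cutoff."""
    return total_mega_pct(team, mega_pct) <= cutoff


if __name__ == "__main__":
    for team in [("Charizard", "Mawile", "Aegislash"),
                 ("Mawile", "Gyarados", "Aegislash")]:
        print(team, total_mega_pct(team), passes_mega_cutoff(team))
```

One caveat if the percentages come from matching item names on the "ite" suffix: Eviolite would be counted too, so an explicit list of mega stones may be safer.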

I'll post the updated outcomes when done.

Stealth rock
Hmm, I could add a dictionary of pokemon and types, as well as another of resistances and weaknesses, so that it can calculate how many pokemon are weak to stealth rock on a team.
However, to actually use this information...
-Are there any good heuristics I can use to easily throw out teams that are too vulnerable?
-Or is it a lot more complicated than that, in which case I can consider moves (throw out teams not using pokemon with moderate %rapid spin/defog use)?
-Or come up with some other way to punish teams for high stealth rock weakness?
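
A rough sketch of the two dictionaries mentioned above (pokemon to types, and Rock effectiveness by type), used to count how many team members are weak to stealth rock. Only base forms are listed, the helper names are hypothetical, and what to do with the count (hard cutoff or penalty) is left open.

```python
# Rock-type damage multiplier against each defending type (unlisted types take 1x).
ROCK_VS = {
    "Fire": 2.0, "Ice": 2.0, "Flying": 2.0, "Bug": 2.0,
    "Fighting": 0.5, "Ground": 0.5, "Steel": 0.5,
}

# Base-form typings only; a mega that changes type would need its own entry.
TYPES = {
    "Charizard": ("Fire", "Flying"),
    "Talonflame": ("Fire", "Flying"),
    "Dragonite": ("Dragon", "Flying"),
    "Gyarados": ("Water", "Flying"),
    "Aegislash": ("Steel", "Ghost"),
    "Azumarill": ("Water", "Fairy"),
    "Mawile": ("Steel", "Fairy"),
}


def rock_multiplier(name):
    """Multiply the Rock effectiveness of each of the Pokemon's types."""
    mult = 1.0
    for poke_type in TYPES[name]:
        mult *= ROCK_VS.get(poke_type, 1.0)
    return mult


def stealth_rock_damage(name):
    """Fraction of max HP lost on switch-in: 1/8 scaled by Rock effectiveness."""
    return 0.125 * rock_multiplier(name)


def count_weak(team):
    """Number of team members that take super effective damage from stealth rock."""
    return sum(1 for name in team if rock_multiplier(name) > 1.0)


if __name__ == "__main__":
    team = ("Aegislash", "Talonflame", "Azumarill", "Dragonite", "Gyarados", "Mawile")
    print(count_weak(team), [stealth_rock_damage(name) for name in team])
```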

Access to team mates
Yes, I have access to all of the data Antar has made available.
Right now, I'm thinking I'll work on creating likelihoods to pick between teams, as these would be much easier to adjust. Eg, for % team mates we'd just then have to come up with corresponding probabilities of pokemon that don't normally go together making a poor team and pokemon that are typically found together making a strong one.

This would make it easier to incorporate all sorts of data.

If anyone has ideas on how to quantify if something is or isn't a good idea, that'd be really helpful.
Even if that is "automatically throw out teams that don't have >1 pokemon that regularly uses stealth rock", etc.

Results eliminating extra mega-pokemon:
Top Coverage Teams:
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Bisharp', 'Gyarados', 'Mawile') Score: -74.2569996406
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Gyarados', 'Mawile', 'Mamoswine') Score: -74.4007861395
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Gyarados', 'Mawile') Score: -74.4564655793
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Conkeldurr', 'Gyarados', 'Mawile') Score: -74.4900417561
Team: ('Greninja', 'Aegislash', 'Azumarill', 'Dragonite', 'Gyarados', 'Mawile') Score: -74.61094256
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Breloom', 'Gyarados', 'Mawile') Score: -74.6491935389
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Bisharp', 'Gyarados', 'Mawile') Score: -74.7108587937
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Excadrill', 'Gyarados', 'Mawile') Score: -74.7227088438
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Gyarados', 'Mawile', 'Thundurus') Score: -74.7798683197
Team: ('Azumarill', 'Dragonite', 'Bisharp', 'Gyarados', 'Mawile', 'Mamoswine') Score: -74.7976794484

Hardest to Counter Teams:
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Gyarados', 'Mawile') Score: 9.18601577177
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Bisharp', 'Gyarados', 'Mawile') Score: 9.22551304811
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Conkeldurr', 'Gyarados', 'Mawile') Score: 9.24010970015
Team: ('Greninja', 'Talonflame', 'Azumarill', 'Dragonite', 'Gyarados', 'Mawile') Score: 9.26790979701
Team: ('Aegislash', 'Talonflame', 'Dragonite', 'Bisharp', 'Gyarados', 'Mawile') Score: 9.28462196122
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Breloom', 'Gyarados', 'Mawile') Score: 9.28511007305
Team: ('Aegislash', 'Talonflame', 'Dragonite', 'Conkeldurr', 'Gyarados', 'Mawile') Score: 9.29921861326
Team: ('Charizard', 'Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Gyarados') Score: 9.30114881163
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Bisharp', 'Gyarados', 'Mawile') Score: 9.32034910874
Team: ('Greninja', 'Aegislash', 'Talonflame', 'Dragonite', 'Gyarados', 'Mawile') Score: 9.32701871012

[There was a mistake in my calculation of total scores, so I've taken it out.]

TheTabKey · Jul 2, 2014

elrod said:
Mega problem
What is the best way to fix the mega-problem?
I redid it, this time counting the total percentage of "ite" items pokemon are holding. If over 180% for a team, that team gets thrown out. I thought 180% sounded like a decent cutoff; for example if a team has two pokemon that are mega 70% of the time, it isn't unreasonable to have one of them running some other set.
Perhaps there'd be a better way to do this?

I'll post the updated outcomes when done.

First off, I gotta say this is impressive.

But, consider each mega its own Pokemon (so there would be two instances of Mega Charizard running through the scans, not just one, and each mega would be considered separately from its normal form) and check if the word "-Mega" appears more than once per team, and if it does then toss the team?

Idk. I'm not very knowledgeable on the whole Smogon Data thing, this idea may be complete trash lol.

JebusChrist · Jul 2, 2014

Looks like you still have gyarados and mawile together on every team, you might need to be stricter on the mega requirements (I don't think regular gyarados is that good)

~~Also just add Aegislash to every team and only change the other 5 slots~~

Valzy · Jul 2, 2014

Can you do this with only 5 Pokemon instead of 6 pokemon?

elrod · Jul 2, 2014

I completely redid the code. You can look at it here.
The script is now using three files - the list of 1825 rankings, and the unweighted and 1825-weighted .json files.

It is also now utilizing likelihood functions, which:
a) Makes it easy to add multiple different types of evidence, updating the probability assigned to each team combo (think Bayes theorem).
b) Makes the output criteria less arbitrary.

I plan to add a lot more (please provide recommendations). What is included:
a) likelihood of team sweeping the defending pokemon
b) likelihood of the team resisting being swept by opposing pokemon
c) likelihood of the team having only 0 or 1 mega-pokemon
d) likelihood of the team having > 1 pokemon that uses stealth rock.

"a)" and "b)" each count as 6 pieces of evidence. This was somewhat arbitrarily chosen, as I set it equal to the number of pokemon on a team. It was originally number of pokemon on the team * number of pokemon I was comparing them with, which largely drowned out "c)" and "d)" - which both count as 1 piece of evidence, each.

Deciding just how important each factor is in influencing team performance is a big part in optimizing the team-creation process!
So - thoughts on significance?
Note: using only these four means I am saying everything else is meaningless...so, what else can I include?
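
One way to read "counts as N pieces of evidence" is as an exponent on the likelihood, which in log space is just a multiplier. The sketch below assumes that reading; the likelihood values are invented and the key names are hypothetical labels for "a)" through "d)".

```python
import math

# Evidence weights: "a)" and "b)" count 6 times each, "c)" and "d)" once each.
WEIGHTS = {"sweep": 6, "resist": 6, "mega": 1, "rocks": 1}


def combined_log_score(log_likelihoods, weights=WEIGHTS):
    """Weighted sum of log-likelihoods; a weight of n acts like seeing the evidence n times."""
    return sum(weights[key] * log_likelihoods[key] for key in weights)


if __name__ == "__main__":
    # Invented likelihoods for a single team, for illustration only.
    team_likelihoods = {"sweep": 0.8, "resist": 0.7, "mega": 0.5, "rocks": 0.9}
    logs = {key: math.log(value) for key, value in team_likelihoods.items()}
    print(combined_log_score(logs))
```

Raising or lowering a weight is then the knob for how much one factor should matter compared with the others.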

Here are the top 10 using the first three:
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Bisharp', 'Mawile') ; Relative Prob: 1.0
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Bisharp', 'Gyarados') ; Relative Prob: 0.969177104779
Team: ('Aegislash', 'Talonflame', 'Garchomp', 'Azumarill', 'Dragonite', 'Mawile') ; Relative Prob: 0.932609468025
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Gengar', 'Mawile') ; Relative Prob: 0.918101690112
Team: ('Aegislash', 'Talonflame', 'Garchomp', 'Azumarill', 'Dragonite', 'Gyarados') ; Relative Prob: 0.917761364716
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Mawile', 'Mamoswine') ; Relative Prob: 0.907914903625
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Breloom', 'Mawile') ; Relative Prob: 0.901367146208
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Excadrill', 'Mawile') ; Relative Prob: 0.894146887271
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Bisharp', 'Mawile') ; Relative Prob: 0.891824057313
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Gengar', 'Gyarados') ; Relative Prob: 0.88980746823

"Relative prob" gives the likelihood relative to the highest probability team. Ie, the program thinks the first team is 1 / 0.88980746823 = 1.12 times more likely to be best than the 10th team.

Here are the top 10 using all four criteria:
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Mawile', 'Mamoswine') ; Relative Prob: 1.0
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Gyarados', 'Mamoswine') ; Relative Prob: 0.983416019567
Team: ('Talonflame', 'Garchomp', 'Azumarill', 'Dragonite', 'Mawile', 'Mamoswine') ; Relative Prob: 0.945794630088
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Heatran', 'Mawile') ; Relative Prob: 0.943664774345
Team: ('Talonflame', 'Garchomp', 'Azumarill', 'Dragonite', 'Gyarados', 'Mamoswine') ; Relative Prob: 0.930118751397
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Heatran', 'Mawile') ; Relative Prob: 0.926222623957
Team: ('Garchomp', 'Azumarill', 'Dragonite', 'Bisharp', 'Mawile', 'Mamoswine') ; Relative Prob: 0.920339475432
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Heatran', 'Gyarados') ; Relative Prob: 0.914264801667
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Heatran', 'Gyarados') ; Relative Prob: 0.911309262293
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Heatran', 'Bisharp', 'Mawile') ; Relative Prob: 0.910602752733

The algorithm really likes certain pokemon. The likes of Aegislash make sense, but some of the others?
Gyarados has a low enough "ite" percentage that it isn't punished too heavily for being included in teams with Mawile, but this method all but rules out a mawile + charizard pairing.

Being able to make use of conditional probabilities would of course help substantially. On a team with few pokemon that are likely to use stealth rock, of course the one that can is likely to use it.
Mega-gyarados might cover mega-mawile's counters particularly well, but that isn't useful. (Hypothetically; I haven't looked into this pairing at all).
Etc.

Valzy,
No problem; the new code is substantially faster so running different variations takes hardly any time.
Using "a)", "b)", and "c)" only:
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Mawile') ; Relative Prob: 1.0
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Gyarados') ; Relative Prob: 0.969176304163
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Bisharp', 'Mawile') ; Relative Prob: 0.956267427996
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Bisharp', 'Gyarados') ; Relative Prob: 0.926782082882
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Bisharp', 'Mawile') ; Relative Prob: 0.904432598327
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Mawile') ; Relative Prob: 0.891824057313
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Gengar', 'Mawile') ; Relative Prob: 0.877950741842
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Gyarados') ; Relative Prob: 0.87761558759
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Bisharp', 'Gyarados') ; Relative Prob: 0.876554283739
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Mawile', 'Mamoswine') ; Relative Prob: 0.868209449728

Using all four:
Team: ('Garchomp', 'Azumarill', 'Dragonite', 'Mawile', 'Mamoswine') ; Relative Prob: 1.0
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Heatran', 'Mawile') ; Relative Prob: 0.997748077991
Team: ('Garchomp', 'Azumarill', 'Dragonite', 'Gyarados', 'Mamoswine') ; Relative Prob: 0.983414822559
Team: ('Aegislash', 'Azumarill', 'Dragonite', 'Heatran', 'Gyarados') ; Relative Prob: 0.966652272053
Team: ('Aegislash', 'Garchomp', 'Dragonite', 'Mawile', 'Mamoswine') ; Relative Prob: 0.950442356279
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Heatran', 'Mawile') ; Relative Prob: 0.943664774345
Team: ('Aegislash', 'Garchomp', 'Dragonite', 'Gyarados', 'Mamoswine') ; Relative Prob: 0.934679261821
Team: ('Garchomp', 'Azumarill', 'Dragonite', 'Heatran', 'Mawile') ; Relative Prob: 0.926222623957
Team: ('Talonflame', 'Azumarill', 'Dragonite', 'Heatran', 'Gyarados') ; Relative Prob: 0.914263671686
Team: ('Garchomp', 'Azumarill', 'Dragonite', 'Heatran', 'Gyarados') ; Relative Prob: 0.911308153053

Pyritie · Jul 2, 2014

TheTabKey said:
Idk. I'm not very knowledgeable on the whole Smogon Data thing, this idea may be complete trash lol.

The problem is that the stats only count what item the pokemon was holding and not whether it was a mega or not. People could do things like make a pokemon hold its mega stone but never mega evolve it, or make a team with several mega stone holders and so only be able to mega one of them, and other stuff.

And you can't split the stats by item, so you can't see what percentage of charizard holding charizardite X know earthquake, for example, only what percentage of all charizards know earthquake

Outrageous Fortune · Jul 2, 2014

Another problem with this is that it's based purely on pokemon that check/counter other pokemon or pokemon that have good coverage in the meta (at least this is how I understood it, correct me if I'm wrong). Consider that in none of the outputs did deo-d ever come up, despite it being an S-tier threat. Clearly this is because it doesn't check or counter anything in particular, and it rarely does damage. I'm not sure how you could fix this though, sorry about being that guy who finds a problem and doesn't have the answer.

Karpi · Jul 2, 2014

What I wonder about is, for example, when Tyranitar shows up. The things it counters and is countered by according to the stats are a combination of all of its wildly divergent sets (Assault Vest, Mega, SR setter), and if I'm correct, the way your script interprets Tyranitar is as a pokemon who is able to fulfill all of those roles simultaneously, ie counter most special attackers and then set up and sweep. This obviously isn't a problem for pokemon like Mawile who is essentially always the same.

I think in order to solve that you would need to have more detailed counter data, such as saying X type of Tyranitar counters Charizard-Y but is countered by Breloom, and Y type of Tyranitar etc etc.

If you've somehow accounted for the fact that some pokemon can have multiple roles represented in the counter stats, ignore all of this.

Valzy · Jul 2, 2014

I actually tried out a team based off your data and so far it's done fairly well

http://replay.pokemonshowdown.com/ou-137284607
http://replay.pokemonshowdown.com/ou-137287006
http://replay.pokemonshowdown.com/ou-137287534
http://replay.pokemonshowdown.com/ou-137288501
http://replay.pokemonshowdown.com/ou-137290668

elrod · Jul 3, 2014

TheTabKey, Pyritie, and Karpi:
You are all correct. Ideally, I would count each move set as its own pokemon. However, the data is averaged by pokemon so I can't even tell the difference between Charizard-X and -Y, although I can see what the % figures are for both.
The problem with averaging is that it has a tendency to produce unholy monstrosities not found in nature.

So the result is, the algorithm doesn't see Charizard-X thinks Mega-Manectric is set up fodder but gets crushed by the same Quagsire Charizard-Y knocks silly.
What it sees and thinks is that Charizard is an okay counter for Manectric and Quagsire, but also at some risk of getting countered by the latter and walled by Chansey.

None of that is true. That pokemon does not exist, or, if it does, it is not named "Charizard".
Perfectly true with the Tyranitar example as well.

There is of course also a bias in the data, in that Charizard-X is much more likely to be switched into a Mega-Manectric than Charizard-Y is, while an opposing Chansey is much more likely to try its luck with the -Y than the -X.
What I mean is, the data is likely to reflect some point between the total averaging effect I describe above, and the "effective/ineffective against everything" sort of thing mentioned by Karpi due to the actual humans using the pokemon biasing their use towards what is actually effective.

Outrageous Fortune:
Yes. This reveals another problem with what my script was doing.
It looked at overall average effectiveness in killing/chasing, not the actual amount of work a pokemon does for the team. Deoxys-D can be a real workhorse, but isn't likely to directly kill or check anything himself.
Hell, even when it does the work, it gets no credit: if the stealth rock it set kills their Dragonite on switch-in while your Derpmon is out, Derpmon gets the credit for the kill.

I was also looking at overall effectiveness of everything vs everything. Not reflecting how people actually play this game.
What does this mean?
If something is somewhat "effective" (ie, kills/chases, which does not necessarily equal actual performance in the game) against a huge sample, but not particularly effective against anything, the algorithm will love it.

But in reality, you want a team to be good at something. You want a team that does work.
If the opponent has mega-scizor, you don't want 6 pokemon who each have a 45% chance of killing/checking it (likelihood my algorithm would calculate for you killing/chasing the scizor: 97%).
You would much rather have two pokemon with a 70% chance and the other four at only 20% each (likelihood: 96%).
Even though the latter team is more likely to get swept, why would you prefer Team Diversity (Team V below) over Team Homogeneous (Team H)?
Because you won't throw pokemon at their scizor at random.
Likelihood of the teams losing the following number of pokemon or less to the scizor, before chasing/killing it:
0: Team V 70%, Team H 45%
1: Team V 91%, Team H 70%
2: Team V 93%, Team H 83%
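
The table above can be reproduced with a few lines, assuming you always send in your best remaining answer first and that each attempt either handles the threat or costs you that pokemon. The function name `prob_lose_at_most` is hypothetical.

```python
def prob_lose_at_most(chances, k):
    """Probability of handling the threat while losing at most k pokemon.

    chances: each team member's chance of killing/chasing the threat.
    Members are tried from best to worst, so losing at most k means that
    at least one of the best k + 1 members succeeds.
    """
    ordered = sorted(chances, reverse=True)
    all_fail = 1.0
    for chance in ordered[:k + 1]:
        all_fail *= 1.0 - chance
    return 1.0 - all_fail


if __name__ == "__main__":
    team_v = [0.70, 0.70, 0.20, 0.20, 0.20, 0.20]
    team_h = [0.45] * 6
    for k in range(3):
        print(k, round(prob_lose_at_most(team_v, k), 2), round(prob_lose_at_most(team_h, k), 2))
```

Setting `k` to 5 gives the "anyone on the team eventually handles it" number that the first version of the script was effectively ranking by.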

My algorithm ranked Team V as worse, but these numbers look way better!!!
Because, in reality, you want to expend as little as possible in dealing with any given threat. Otherwise, the fact that the opponent gets to switch pokemon will be your downfall.

It is easy to see that the teams the script spit out have a preponderance of revenge killers for this very reason. Every single pokemon on the "best" team not considering stealthrock use was a priority user:
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Dragonite', 'Bisharp', 'Mawile') ; Relative Prob: 1.0
While, when considering stealthrock...
Team: ('Aegislash', 'Garchomp', 'Azumarill', 'Dragonite', 'Mawile', 'Mamoswine') ; Relative Prob: 1.0
That was the case for all but Garchomp, whose most commonly held item is choice scarf.

So, my fix:
I will only consider the three best pokemon on the team for answering each threat.
This will also superficially address the problem of treating all variants of each pokemon as the same, by making it less likely to include something if it isn't exceptional at any particular roles. A slight bias against pokemon the script has trouble understanding may be best way to treat them, for now.

I still haven't thought of a good way for the script to consider pokemon like Deoxys-D.

PS.
I've switched to using June's stats. I'll work on making this modification, and then post the results.

I AM THE JAMAICAN BOBSLED TEAM · Jul 3, 2014

maybe you can add a script that finds how often items are used and what moves are used with said item?

elrod · Jul 3, 2014

Smog Frog
I'm getting my data on move and item usage from here. Unfortunately, it doesn't list info in a way that lets me see the conditional probabilities of moves given items, or items given moves.
What I can use is dependent on what Antar's data scraping can produce.

I looked a bit through some of his threads, and IIRC there were some sort of limitations preventing him from producing that sort of data.

Tsaeb XIII · Jul 3, 2014

Love this concept - just thought I'd share some thoughts on metrics you could try using. I haven't examined your code, so apologies if I've misunderstood your explanations and you're already doing some/all of this.

1) Instead of just filtering the top three team members, why not take the square root of the sum of squares (or the n'th root of the sum of n'th powers, depending on how sharply you want the values to drop off)? This way a team that has six 100% counters against a Pokemon would still score better against it than a team with 3 100% counters, but a team with 1 100% counter will outperform a team with a number of less effective checks/counters (using square/square root, 1 100% counter is equivalent to 4 50%'s, which doesn't feel too bad intuitively).

2) Similarly, when adding the performance of the team against all Pokemon, you should apply a similar system of weighting. The only catch is that you need to include an additional weighting based on how common a particular Pokemon is, so that a team is penalised less harshly for having little response to an uncommon threat than a common threat.

3a) To account for the effectiveness of Deo-D, perhaps you can give each Pokemon a 'teammate grade' based on the effectiveness of its teammates. Pokemon X's teammate grade might be the average of the effectiveness of its top 5 most common teammates weighted by how commonly they appear together. Applying an appropriate scaling function (possibly exponential, so that Pokemon with average teammate grades don't get a substantial boost but any particularly outstanding support 'mon gets boosted), you could then include the teammate grade as a factor in optimising the team.
3b) As a modification of the above, you could assume that each team will have, say, 1 "glue" Pokemon. After the algorithm selects the first 5 team members, it would then look at the teammate statistics for the 5 Pokemon already chosen and select the most probable teammate that is not already included. If the optimised team consisted of members who typically benefit from reliable hazards, it is highly likely that Deo-D would then be selected as the team's glue.

Antar · Jul 4, 2014

This is really neat stuff, and exactly the sort of application I imagined for my detailed usage stats. I would love to see your source code--you should set up a github repo, if you haven't already done so.

Here's one way I think we can evaluate how successful your work is: we should take a bunch of real teams from the ladder and evaluate their "answering ability" and "vulnerability" scores and see how these metrics numbers correlate to, say, Elo/Glicko rating. Basically, we would expect that the better a team's metrics, the better the team performs. If that's not the case, well then, that gives us something to work on! Due to privacy concerns, I can't give you access to PS teams (without the players' consent, and I think this is something best run in bulk over thousands, if not tens of thousands, of teams), but if you give me access to your code (or at least an executable and an API), I can run it on my end.

Again, amazing work. Keep it up.

Valzy · Jul 4, 2014

Antar said:
This is really neat stuff, and exactly the sort of application I imagined for my detailed usage stats. I would love to see your source code--you should set up a github repo, if you haven't already done so.

Here's one way I think we can evaluate how successful your work is: we should take a bunch of real teams from the ladder and evaluate their "answering ability" and "vulnerability" scores and see how these metrics numbers correlate to, say, Elo/Glicko rating. Basically, we would expect that the better a team's metrics, the better the team performs. If that's not the case, well then, that gives us something to work on! Due to privacy concerns, I can't give you access to PS teams (without the players' consent, and I think this is something best run in bulk over thousands, if not tens of thousands, of teams), but if you give me access to your code (or at least an executable and an API), I can run it on my end.

Again, amazing work. Keep it up.

He did paste the code here
http://pastebin.com/45QwBqz3

Stryke · Jul 6, 2014

elrod and Antar,

A while back I worked on something similar: a tool to give naive counters to your team as well as suggestions for additions/replacements. What was really helpful was when Antar gave me access to encounterMatrices files that contained a huge matrix of data with elements of the form

encounterMatrix[poke1][poke2][#]

0: poke1 was KOed
1: poke2 was KOed
2: double down
3: poke1 was switched out
4: poke2 was switched out
5: double switch
6: poke1 was forced out
7: poke2 was forced out
8: poke1 was u-turn KOed
9: poke2 was u-turn KOed
10: poke1 was foddered
11: poke2 was foddered
12: no clue what happened

I found this much easier and faster to use than the 'Checks and Counters' section of the json data, since that data cuts off after (#1+#4+#9+#11)/(sum(encounterMatrix[poke1][poke2])) reaches 60% which can mean that certain pokemon appear to be countered by Venusaur as well as Unown.

To counter this issue, I found InteractionWinPercentage for each of the team pokemon and all pokemon, weighted by the opposing pokemon's usage. I then took the highest value on the team as the TeamWinPercentage against that pokemon, and tried to find which team addition would increase TeamWinPercentage the most.

I've attached my two old scripts, and the encounterMatrices from December are located here http://sim.smogon.com:8080/Stats/2013-12/encounterMatrices/ (presuming Antar is still okay with them being public). It would be nice to have newer data, though.

Information about my two old scripts

WeightCounterFile.py: parses the encounterMatrix pickle file into a more useful and faster-reading form for both sweepage and countering
CounterValues.py: Adds values in a manner similar to your script, suggesting the pokemon that increase the sweep coverage of your team the most and the pokemon that increases the counter coverage of your team the most.

I hope this helps.

P.S. I ended up stopping the project because I couldn't think of a solution to the fact that the data is somewhat biased against pokemon with multiple commonly-used movesets: while one moveset may counter a certain pokemon very well, others do not, and as a result, it appears that the pokemon 'wins' interactions 55% of the time.

elrod · Jul 7, 2014

I played 15 games with a team generated from my most recent script. My record is 14 - 1, but a) I played far from optimally and b) my opponents probably did too, because creating a new account meant their ratings were all rather low.
My name is "data scientist".

The learning curve for a computer-generated team is going to be steeper. When creating a team, you do so with an idea of how it is supposed to play. Here, I was clueless (but general unfamiliarity with Gen VI contributed).

I changed how the script functions from the original posting. I am no longer using scores, but a single likelihood value.
What this means is that it is easy to update the probability a team is optimal using Bayes' theorem:
probability a team is optimal given evidence is proportional to e^(ln(prior probability the team is optimal) + ln(how likely the evidence is to show up, given the team))
(derived from p(A|B) = p(A) * p(B|A)/p(B); for our purposes here p(B) is the same for every team, so it can be dropped when comparing teams)

This means, what I want help with here is:
-suggestions on what kinds of evidence I can pull from, among the available data
-help defining how likely that evidence is

To avoid underflow, I have been using natural log probabilities, rather than the more typical simple probabilities ranging from 0 to 1.
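
To make the log-space bookkeeping concrete, here is a minimal sketch with invented numbers. The function names are hypothetical; the "relative probability" is each team's posterior divided by the best team's, computed by subtracting log values before exponentiating.

```python
import math


def update(log_posterior, log_likelihood):
    """Bayes update in log space: add the log-likelihood of the new evidence."""
    return {team: log_posterior[team] + log_likelihood[team] for team in log_posterior}


def relative_probs(log_posterior):
    """Posterior of each team relative to the most probable team."""
    best = max(log_posterior.values())
    return {team: math.exp(lp - best) for team, lp in log_posterior.items()}


def normalized_probs(log_posterior):
    """Posterior probabilities that sum to 1, computed without underflow."""
    relative = relative_probs(log_posterior)
    total = sum(relative.values())
    return {team: value / total for team, value in relative.items()}


if __name__ == "__main__":
    teams = ["team_a", "team_b", "team_c"]
    log_post = {team: math.log(1.0 / len(teams)) for team in teams}  # equal priors
    # Invented log-likelihoods, small enough that exp() alone would underflow.
    evidence = {"team_a": -1000.0, "team_b": -1000.1, "team_c": -1005.0}
    log_post = update(log_post, evidence)
    print(math.exp(-1000.0))          # underflows to 0.0 as a plain probability
    print(relative_probs(log_post))   # still comparable in log space
```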

We initially assign equal probabilities to all teams. Currently, my script then performs the following updates:
1) For each of the 50 most used pokemon, it updates based on the probability the team can check/counter it without losing a single pokemon. An imaginary ideal team wouldn't lose pokemon, so the likelihood of losing 0 is natural to use.
2) For each of the 50 most used pokemon, it updates based on the probability the team can check/counter it without losing more than 1 pokemon. An imaginary ideal team wouldn't lose many pokemon, so the likelihood of losing <= 1 is natural to use.
3) For each of the 50 most used pokemon, it updates based on the probability the team can check/counter it without losing more than 2 pokemon. An imaginary ideal team wouldn't lose many pokemon, so the likelihood of losing <= 2 is natural to use.
4) For each of the 50 most used pokemon, it updates based on the probability that the most vulnerable pokemon on our team can avoid being checked/countered by it. An imaginary ideal team wouldn't too easily let its pokemon get checked/countered, so likelihood that 0 do is natural to use.
5) For each of the 50 most used pokemon, it updates based on the probability that the two most vulnerable pokemon on our team can avoid both being checked/countered by it. An imaginary ideal team wouldn't too easily let its pokemon get checked/countered, so likelihood that <= 1 do is natural to use.
6) For each of the 50 most used pokemon, it updates based on the probability that the three most vulnerable pokemon on our team can avoid all being checked/countered by it. An imaginary ideal team wouldn't too easily let its pokemon get checked/countered, so likelihood that <= 2 do is natural to use.
7) Good teams don't have >1 megas, so the likelihood that either 0 or 1 of the pokemon on the team are mega is natural to use.
8) Good teams have stealth rock, so the likelihood that at least 1 pokemon on the team uses it is natural to use.

(Deciding how much weight to give each update is also important. I weigh each opponent's pokemon based on the relative frequency with which they're encountered at higher levels of play, and also give more weight to being able to deal with opposing pokemon sooner than later; I will likely update to stretch through the entire team, declining exponentially in weight for each extra pokemon like Tsaeb XIII suggested.)

Pokemon don't have to be mega, and lots of pokemon that can learn stealth rock don't necessarily want to. "7)" and "8)" thus naturally punish getting pokemon to run sub-optimal sets; eg a team of charizard and mawile is very unlikely to have 1 or less mega pokemon.

It is extremely easy for me to add extra evidence to update these hypotheses. The only requirement is that it is in the form of probabilities.
I am not sure how to do this with a team-mate grade. An example of something that would work: if we can define a role based on certain parameters, we can then assign probabilities to pokemon for their ability to successfully fit that role.

On this front, I first performed a KMeans cluster analysis on each of the top 50 pokemon simply using:
-Mean of (interactions between that pokemon and pokemon_other / (use of pokemon * use of pokemon_other) )
-Standard deviation of the above
-Mean interaction win
-Standard deviation of the above

Calling for 5 groups, I get:
Group 1 : ['Greninja', 'Aegislash', 'Rotom-Wash', 'Ferrothorn', 'Scizor', 'Tyranitar', 'Breloom', 'Skarmory', 'Bisharp', 'Latios', 'Thundurus', 'Gardevoir', 'Mamoswine', 'Infernape', 'Mandibuzz', 'Volcarona', 'Landorus', 'Goodra']
Group 2 : ['Talonflame', 'Azumarill', 'Excadrill', 'Espeon', 'Landorus-Therian', 'Clefable', 'Alakazam', 'Medicham', 'Cloyster', 'Deoxys-Speed', 'Vaporeon']
Group 3 : ['Scolipede', 'Smeargle', 'Deoxys-Defense']
Group 4 : ['Gliscor', 'Venusaur', 'Heatran', 'Conkeldurr', 'Sylveon', 'Keldeo', 'Manectric', 'Chansey', 'Sableye', 'Blissey']
Group 5 : ['Charizard', 'Garchomp', 'Dragonite', 'Gengar', 'Gyarados', 'Togekiss', 'Mawile', 'Pinsir']
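For anyone wanting to reproduce the inputs to the clustering, the four features could be built roughly like this. This is a sketch under assumptions: the data layout, the function name, the reading of "mean interaction win" as an average win share, and the use of the population standard deviation are all guesses, and the numbers are made up.

```python
from statistics import mean, pstdev

def cluster_features(name, usage, interactions, win_rate):
    """Return (mean, sd of normalised interactions, mean, sd of win share).

    usage:        {pokemon: number of uses}
    interactions: {(a, b): number of recorded interactions between a and b}
    win_rate:     {(a, b): share of those interactions that a wins}
    """
    others = [p for p in usage if p != name]
    # Interactions scaled by how often both pokemon are used.
    norm = [interactions[(name, o)] / (usage[name] * usage[o]) for o in others]
    wins = [win_rate[(name, o)] for o in others]
    return (mean(norm), pstdev(norm), mean(wins), pstdev(wins))

# Made-up data for three pokemon.
usage = {"Heatran": 10, "Venusaur": 20, "Mamoswine": 5}
interactions = {("Heatran", "Venusaur"): 4, ("Heatran", "Mamoswine"): 1}
win_rate = {("Heatran", "Venusaur"): 0.5, ("Heatran", "Mamoswine"): 0.7}

features = cluster_features("Heatran", usage, interactions, win_rate)
```

One such row per pokemon can then be handed to any k-means routine, for example scikit-learn's `KMeans(n_clusters=5)`.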

We can vastly expand and alter the types of data we use for this, but...

...a discriminant function analysis would be far more useful.
In a discriminant function analysis (DFA) you give the data, and group-membership to the program.
As in, with a list of 50 pokemon and the groups they belong to, I can feed it that and a fat pile of data tied to the pokemon.
It can then be fed data on a pokemon, and spit out the probabilities that it belongs to each of the groups.
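To show the shape of that workflow, here is a deliberately simplified stand-in for a DFA: a Gaussian classifier with one pooled variance per feature (no covariances between features). The group labels, feature values and function names are hypothetical; a full linear discriminant analysis, such as scikit-learn's `LinearDiscriminantAnalysis` with `predict_proba`, would use the whole pooled covariance matrix.

```python
import math
from collections import defaultdict

def fit_discriminant(rows, labels):
    """Fit group means, a pooled per-feature variance, and group priors."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    n_features = len(rows[0])
    means = {g: [sum(r[j] for r in rs) / len(rs) for j in range(n_features)]
             for g, rs in groups.items()}
    # Pooled within-group sum of squares, one value per feature.
    pooled = [0.0] * n_features
    for g, rs in groups.items():
        for r in rs:
            for j in range(n_features):
                pooled[j] += (r[j] - means[g][j]) ** 2
    dof = max(len(rows) - len(groups), 1)
    variances = [max(v / dof, 1e-9) for v in pooled]
    priors = {g: len(rs) / len(rows) for g, rs in groups.items()}
    return means, variances, priors

def group_probabilities(model, row):
    """Posterior probability that `row` belongs to each group."""
    means, variances, priors = model
    scores = {}
    for g in means:
        dist = sum((x - m) ** 2 / v for x, m, v in zip(row, means[g], variances))
        scores[g] = math.log(priors[g]) - 0.5 * dist
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Made-up training data: two features per pokemon, hand-assigned groups.
rows = [[1.0, 0.2], [1.2, 0.1], [0.1, 0.9], [0.2, 1.1]]
labels = ["sweeper", "sweeper", "wall", "wall"]
model = fit_discriminant(rows, labels)
probs = group_probabilities(model, [1.1, 0.15])
```

`probs` maps each group to a probability, and the values sum to 1, which is the form the likelihood updates need.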

Of course, this leads to the question - how do we apply probabilities of group membership to team performance?
Are good teams likely to have certain patterns? If so, how do we define that?

We could do this for any sort of groups you define. Glue vs nonglue, sweeper vs tank vs wall vs supporter vs revenge killer, etc.
I think it is in this way that we may be able to give Deo-D some credit.
Hmmm.
OU Teambuilding
OU Pokemon Categorization Thread

Antar,
Thanks.

Here's one way I think we can evaluate how successful your work is: we should take a bunch of real teams from the ladder and evaluate their "answering ability" and "vulnerability" scores and see how these metrics correlate to, say, Elo/Glicko rating.

The function "likelihood" from my script will produce a likelihood rating for any given team.
It must simply be given the following argument
argument = [list_of_team_members, dictionary_of_pokemon_usages, unweighted_ou_pokemon_data, weighed_by_1825_ou_pokemon_data, total_uses_of_top_50_pokemon]

and it will return a likelihood.

Any two likelihoods can easily be compared; the higher the likelihood, the more likely the team is better (or so the script predicts).
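e^(likelihood_of_team2 - likelihood_of_team1) = how many times more likely team2 is than team1 to be the better team. This is a ratio of likelihoods rather than a probability, so it is above 1 whenever team2 scores higher.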
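A short sketch of that comparison, assuming the values returned by "likelihood" are log-likelihoods (the exponentiated difference suggests so). The helper name and the numbers are invented for the example.

```python
import math

def relative_probs(log_likelihoods):
    """Scale every team's likelihood relative to the best team (best = 1.0)."""
    best = max(log_likelihoods.values())
    return {team: math.exp(ll - best) for team, ll in log_likelihoods.items()}

# Made-up log-likelihoods for two teams.
scores = {"team1": -10.0, "team2": -10.5}
rel = relative_probs(scores)
print(rel["team1"])  # 1.0
print(rel["team2"])  # e^(-0.5), about 0.61
```

Subtracting before exponentiating keeps the numbers manageable even when the raw log-likelihoods are very negative.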

I would be very interested in knowing how this correlates with actual Elo or Glicko ratings.
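If someone has ladder teams with ratings to hand, the check could look something like this. It is only a sketch: the `pearson` helper is written out here, and the scores and ratings are placeholders, not real ladder data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder data: one likelihood score and one rating per ladder team.
team_scores = [-12.1, -11.4, -10.9, -10.2]
team_ratings = [1520.0, 1610.0, 1590.0, 1700.0]
r = pearson(team_scores, team_ratings)
```

A value of `r` near 1 would mean higher-scoring teams tend to have higher ratings; a value near 0 would mean the score says little about rating.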

EDIT: Stryke, encounter matrices would of course be interesting.
Overall however, your method of looking at the pokemon appears quite different from mine. How effective did you find it at producing suggestions/making improvements?

EDIT:
I attached the most recent update to my script. Here are the top few results it generated:
Team: ('Aegislash', 'Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 1.0
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Bisharp', 'Mamoswine') ; Relative Prob: 0.983756510516
Team: ('Aegislash', 'Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Mamoswine') ; Relative Prob: 0.966421036151
Team: ('Aegislash', 'Venusaur', 'Heatran', 'Conkeldurr', 'Latios', 'Mamoswine') ; Relative Prob: 0.956584751979
Team: ('Aegislash', 'Dragonite', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.955332494453
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Bisharp', 'Mamoswine') ; Relative Prob: 0.949232968402
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Sylveon', 'Mamoswine') ; Relative Prob: 0.94527844914
Team: ('Talonflame', 'Excadrill', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.940396927782
Team: ('Aegislash', 'Venusaur', 'Heatran', 'Conkeldurr', 'Thundurus', 'Mamoswine') ; Relative Prob: 0.936177628366
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.935832019792
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Sylveon', 'Mamoswine') ; Relative Prob: 0.935421186685
Team: ('Greninja', 'Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.933568583059
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Thundurus', 'Mamoswine') ; Relative Prob: 0.926194616035
Team: ('Aegislash', 'Talonflame', 'Gliscor', 'Venusaur', 'Heatran', 'Mamoswine') ; Relative Prob: 0.925466288184
Team: ('Talonflame', 'Dragonite', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.921427082901
Team: ('Talonflame', 'Gliscor', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.920732565763
Team: ('Aegislash', 'Venusaur', 'Heatran', 'Breloom', 'Latios', 'Mamoswine') ; Relative Prob: 0.920452416057
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Gyarados', 'Mamoswine') ; Relative Prob: 0.919947410468
Team: ('Aegislash', 'Dragonite', 'Venusaur', 'Heatran', 'Breloom', 'Mamoswine') ; Relative Prob: 0.919696042633
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.916807857623
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Venusaur', 'Heatran', 'Mamoswine') ; Relative Prob: 0.916292115793
Team: ('Talonflame', 'Heatran', 'Breloom', 'Conkeldurr', 'Mawile', 'Mamoswine') ; Relative Prob: 0.915735603015
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Sylveon', 'Mamoswine') ; Relative Prob: 0.915730367723
Team: ('Venusaur', 'Heatran', 'Conkeldurr', 'Bisharp', 'Latios', 'Mamoswine') ; Relative Prob: 0.913421932261
Team: ('Talonflame', 'Azumarill', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.912294518987
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Latios', 'Mamoswine') ; Relative Prob: 0.91217057109
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Heatran', 'Breloom', 'Mamoswine') ; Relative Prob: 0.909314541675
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Gardevoir', 'Mamoswine') ; Relative Prob: 0.909300729789

Note: these pokemon on the teams are simply in order of usage; if you're going to try a team, pick the most effective lead out of the six to fill that role.

Stryke · Jul 8, 2014

elrod

As for my scripts, I found it to be quite effective, not because of the algorithms used to judge the team (in that regard yours seems much more developed), but because of the sheer amount of data in the encounterMatrices (and the speed of access). If Antar decides to make them public, you will be able to vastly improve your script results.

I remember being quite impressed by the success of my thrown-together team at the time. In fact, many of the "top" teams included Mega Lucario/Genesect, or some combination, predicting the need for a ban.

Again, there is the problem of certain types of pokemon being favored by this script. For example, you may notice that in none of your top teams do any major wall pokemon appear, such as Chansey or Skarmory. There are several possible explanations for this, but the one that makes most sense to me is that offense pokemon appear more threatening, resulting in a higher probability of KO or switch out, and thus a higher value on the counter list.

I'm afraid that this kind of script will only be able to make anti-meta offensive teams (which is by no means a bad idea). What are your thoughts on this?

Side Note: The Heatran/Venusaur defensive core was highly favored by my script, and I can see that in the past 6 months it is still going strong. I also recognize many of the other pokemon listed on your teams as ones that were highly favored by my script.

toshimelonhead · Jul 8, 2014

For the Deoxys-D underweighting issue, you could add Spikes the same way you do for Stealth Rock. Would also help for stall, too, since stall relies on indirect forms of damage (hazards, status) that are difficult to pick up through just looking at Checks and Counters.

There is another way to approach data-driven team building. Since Antar lists stats for multiple ability levels, if you could take the difference between using a pokemon at a higher level and a lower level and scripting checks and counters from there, you would find Deoxys-D showing up in nearly every top team. This could also be another condition (good teams use pokemon that are more common at the 1825 level than at the 1695 level, for instance).
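A minimal sketch of that second idea, with made-up usage percentages standing in for the 1825 and 1695 stats (the function name and the data layout are assumptions):

```python
def usage_shift(high, low):
    """Usage at the higher rating cutoff minus usage at the lower one."""
    return {p: high.get(p, 0.0) - low.get(p, 0.0) for p in set(high) | set(low)}

# Made-up usage percentages at two rating cutoffs.
usage_1825 = {"Deoxys-Defense": 9.0, "Talonflame": 12.0, "Greninja": 10.0}
usage_1695 = {"Deoxys-Defense": 5.0, "Talonflame": 15.0, "Greninja": 10.5}

shift = usage_shift(usage_1825, usage_1695)
# Pokemon favoured by stronger players come first.
ranked = sorted(shift, key=shift.get, reverse=True)
```

To fit the rest of the script, the shift would still have to be turned into a probability before it could be used as another update.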

Arca10 · Jul 11, 2014

This is a very interesting project, elrod, and I've been thinking recently about trying something similar myself. I haven't had much time to look at your script in depth yet, but I will once I get home tonight. It certainly seems like it will favor balanced teams, which seem to be less popular lately. Whether that's because they (balanced teams) are less viable than hyper offense or stall, I'm not sure. If that is the case, then there obviously need to be some changes.

Also, I'm attaching an updated version of your script that works with Python 3 and up, in case anyone wants it.

Kiga · Jul 11, 2014

I... don't suppose that Python 3 happens to be compatible with CodeBlocks or C++, does it? Probably not, but C++ and MatLab are the only programming languages that I (somewhat) understand, so I'll probably have to find somewhere to download it. This is an interesting concept.
