Programming a neural network to predict battle results (in progress)

This is still a work in progress. (edit: Jan. 19th 2019)

As of Jan. 19th 2019, it only runs on Python + TensorFlow, with few features and some limitations.
I'm going to make a standalone executable or something, but I don't know when it will be done.
This thread might be updated anytime xD

BAD news:
My accuracy calculation was totally messed up before. I had noticed that the numbers looked weird, but I didn't think about it in this specific way...
tl;dr: training acc got mixed up with validation acc. The model was actually totally overfitted and did nothing more than random guessing.
Things went wrong with tf.metrics.accuracy. It keeps running totals over every batch it has been fed, so it does NOT calculate accuracy over only the current batch.
Thus, my training accuracy and validation accuracy were all averaged together.
I fixed that bug and found the model was doing nothing more than random guessing.
I didn't think of that even though the numbers felt really weird. I double-checked all my code to make sure I hadn't contaminated the validation set, found nothing, and then thought everything was fine...
OMG.
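
To make the bug concrete, here is a minimal sketch in plain Python (not TensorFlow). The class and function names are made up for illustration; the class only imitates the accumulate-over-all-calls behaviour described above, and the predictions are toy numbers, not results from this project.

```python
class StreamingAccuracy:
    """Imitates a running metric: accuracy over everything seen since the last reset."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, predictions, labels):
        # Accumulate counts instead of looking only at the current batch.
        self.correct += sum(p == y for p, y in zip(predictions, labels))
        self.total += len(labels)
        return self.correct / self.total

    def reset(self):
        self.correct = 0
        self.total = 0


def batch_accuracy(predictions, labels):
    """Accuracy over exactly one batch, with no memory of earlier batches."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy data: the training batch is memorised, the validation batch is a coin flip.
train_preds, train_labels = [1, 0, 1, 1], [1, 0, 1, 1]
val_preds, val_labels = [1, 1, 0, 0], [0, 1, 1, 0]

metric = StreamingAccuracy()
mixed_train = metric.update(train_preds, train_labels)  # 4/4
mixed_val = metric.update(val_preds, val_labels)        # (4 + 2) / 8, looks better than it is

metric.reset()                                          # start validation from zero
clean_val = metric.update(val_preds, val_labels)        # 2/4, the honest number
```

Either resetting the running counters before each validation pass or computing the accuracy per batch by hand avoids mixing the two sets.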

Current progress:
I created a 2nd version of the model, using (one-hot) embedded vectors instead of name strings.
I also made the model permutation invariant using the method presented in [1].
The best accuracy so far is 61.9% on gen7ou (this should not have the above problem... it should not).
I'd be very glad to receive any advice from you xD (maybe I'll post on r/ml ... idk)

I'll upload some graphs later, probably.

Structure:
  1. one-hot vectors of embedded pokemon data as input "x"
  2. get a latent vector for each pokemon through a network "P" (currently -128-dt-)
  3. combine the 6 pokemon vectors into 1 single vector using sum or maxpool to get the team vector "t"
  4. concatenate the 2 team vectors and feed them into a classifier network "B" (p1 or p2 wins) (currently -512-512-2-)
Notes:
  • The network overfits a lot.
  • Maxpooling outperforms summation (avg-pooling). Probably because this is a classification task? Inspired by [2].
  • L1 or L2 regularization might not work ... ?
  • Best accuracy was achieved with 0.7 - 0.5 dropout on network B (last 2 layers) with maxpooling.
  • Currently the structure varies a lot and lacks experiments, so the above is just for reference.
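
The four steps can be sketched roughly as below. This is only an illustration in plain Python, not the code from the repository: the layer sizes are tiny, the weights are random instead of trained, dropout is left out, and every name (N_SPECIES, W_P, team_vector, predict, ...) is an assumption made up for the sketch.

```python
import math
import random

random.seed(0)

N_SPECIES = 10  # hypothetical vocabulary size (the real one is much larger)
LATENT = 4      # hypothetical latent size (the post mentions 128)


def dense(n_in, n_out):
    """Random n_in x n_out weight matrix; stands in for trained weights."""
    return [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]


def matvec(x, w):
    """Multiply a vector of length n_in by an n_in x n_out matrix."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def relu(v):
    return [max(0.0, a) for a in v]


W_P = dense(N_SPECIES, LATENT)  # network "P", shared by all 12 pokemon
W_B1 = dense(2 * LATENT, 8)     # network "B", hidden layer
W_B2 = dense(8, 2)              # network "B", output logits (p1 wins, p2 wins)


def one_hot(index):
    return [1.0 if i == index else 0.0 for i in range(N_SPECIES)]


def team_vector(team):
    """Steps 1-3: embed each pokemon with P, then max-pool over the team."""
    latents = [relu(matvec(one_hot(i), W_P)) for i in team]
    return [max(column) for column in zip(*latents)]


def predict(team1, team2):
    """Step 4: concatenate both team vectors and classify with B (softmax output)."""
    hidden = relu(matvec(team_vector(team1) + team_vector(team2), W_B1))
    logits = matvec(hidden, W_B2)
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    return [e / sum(exps) for e in exps]


team_a = [0, 1, 2, 3, 4, 5]
team_b = [4, 5, 6, 7, 8, 9]
probs = predict(team_a, team_b)
shuffled = predict(list(reversed(team_a)), team_b)  # same team, different order
```

Because the pooling step ignores order, `probs` and `shuffled` come out the same. Swapping team_a and team_b as a whole is a different input, though, and is not covered by this invariance.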

[1]. Zaheer, Manzil, et al. "Deep sets." Advances in Neural Information Processing Systems. 2017.
[2]. Qi, C. R., Su, H., Mo, K., et al. "PointNet: Deep learning on point sets for 3D classification and segmentation." Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.

Plans:
  • Increase data-set size & quality (probably movesets, items, orders etc.?)
  • Any trick that works.
    • I probably want to mask out some pokemon (replace them with empty) while training...? idk if this helps
    • any regularization (l1 & l2 don't seem to work, I'll try orthogonal / spectral etc. but idk if it helps ...)
  • Balance the data-set? (appearance / usage freq.)
    • Currently I have no idea how to do this.
  • Other structures
    • fc layers don't make me feel good though ...
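
The masking idea from the plans could look something like this. It is a sketch only: the EMPTY index, the function name and the drop probability are all assumptions, and whether this kind of augmentation actually helps is exactly the open question above.

```python
import random

EMPTY = 0  # hypothetical index reserved for "no pokemon in this slot"


def mask_team(team, drop_prob, rng):
    """Replace each slot with EMPTY independently with probability drop_prob."""
    return [EMPTY if rng.random() < drop_prob else mon for mon in team]


rng = random.Random(0)
team = [17, 42, 5, 133, 8, 250]       # made-up pokemon indices
augmented = mask_team(team, 0.2, rng)  # applied to training batches only
```

The validation set would be left unmasked so that the reported accuracy still refers to full teams.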
-----------------------------------------------------------------------

I've trained a neural network that predicts battle results; you can think of it as an AI predicting who wins.
It learned to predict results from (currently) thousands of battles in a specific tier.
The input is simple: strings containing pokemon names (it might be impossible to get moves / spreads / items from replays?), and the output is a win rate.

For details please see:
https://github.com/linkzeldagg/Pokemon-Showdown-Win-Rate-Prediction

sry for my poor English.
Thank you for reading this nonsense project > <
 
If i understand it correctly, you're training the net to make predictions on likelihood of winning solely based on the 6 pokemon on each team? sounds pretty cool :o. A few questions come to mind... (forgive me if some are silly, it's late at night but this was cool enough that i wanted to look now :P)

a) what kind of model are u using? A quick look at the github seemed to indicate ur using multiple recurrent nets for different sections of data?
b) do you account for potential mega evolutions in team preview (i.e. if a team with both gyarados and charizard is used, would you be able to incorporate that into the data so that the net could learn to distinguish between which one is the more likely mega and improve its prediction accordingly)
c) for VGC, since the actual matches are 4v4, how do you account for this in training and then mapping it over to testing? perhaps this accounts for the lower accuracy in VGC?
d) Eo Ut Mortus has a tool that would allow you to get revealed moves from replays (but not items, abilities, etc.). Do you think it would be possible to incorporate this into your model somehow? A lot of move data would be incomplete (not all 4 attacks revealed). As a side project that could maybe be useful in a more complex model, maybe you could use GANs to fill in the gaps of incomplete movesets with reasonable predictions?
 
Thanks! This net does just what you said.

a) I used a sort of hybrid model: some RNNs to deal with the raw string input, followed by some MLPs at different levels of the hierarchy, such as pokemon-level, team-level & battle-level.

b) I have no idea about that. Currently, since I got the data from showdown replays, the data does not show which pokemon is going to mega evolve (pokemon names do not contain "-Mega"). The net might know who's going to mega evolve, but it is a black box, which is hard to open to see what happens inside. I'm also interested in that and want to figure it out lol

c) Yes, this could also be a reason for the lower accuracy on VGC battles. Currently in my net, VGC battles have the same input (6 pokemon per team) as OU, and the net does not know that we select 4 out of 6. It is just treated as the same problem: 6x2 strings -> likelihood.

d) Wow, thank you for telling me about this tool! I did not know about it before. I will check it out.
 
for b), the net probably inherently sees lopunny and mega lopunny, for example, as the same, in the sense that it doesn't know the difference. the only issue arises when some teams have multiple mega evo candidates, which I'm guessing would lead to the net predicting higher odds of winning than it should. Maybe some sort of pre-processing of data to make megas 'explicit' might be able to make a difference.

for c), i think it would be really interesting to see if we can modify it to understand the pick 4 best of 3 format. Perhaps you could train one model on 4v4 results and another one which determines a probability distribution of which 4 get picked from 6 v 6, and then combine those to make predictions when fed a 6 v 6 lineup, while accounting for bo3. (or bo1 if ur basing it on showdown data, not real life vgc matches, due to the existence of data sets)
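
A rough sketch of how the two models from c) could be combined for a single game: sum over every possible pick of 4 on each side, weighting the 4v4 win prediction by how likely those picks are. Both models are stand-ins here (uniform_pick and coin_flip are made-up placeholders, not trained networks), the two players' picks are assumed independent once both teams are shown at preview, and bo3 is not handled.

```python
from itertools import combinations


def win_probability(team1, team2, pick_prob, win_prob_4v4):
    """P(player 1 wins) for one game, marginalised over both players' picks of 4."""
    total = 0.0
    for four1 in combinations(team1, 4):
        for four2 in combinations(team2, 4):
            # Probability that exactly these two selections are brought.
            weight = pick_prob(four1, team1, team2) * pick_prob(four2, team2, team1)
            total += weight * win_prob_4v4(four1, four2)
    return total


def uniform_pick(four, own_team, opp_team):
    """Placeholder picker: all 15 ways to choose 4 of 6 are equally likely."""
    return 1.0 / 15


def coin_flip(four1, four2):
    """Placeholder 4v4 model that knows nothing about the matchup."""
    return 0.5


team1 = ["A", "B", "C", "D", "E", "F"]  # made-up team sheets
team2 = ["U", "V", "W", "X", "Y", "Z"]
p = win_probability(team1, team2, uniform_pick, coin_flip)
```

With 15 x 15 = 225 pick combinations per battle the exact sum is cheap, so sampling is only needed if the pick model gets more complicated.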

link to the tool in case you didn't find it: http://replaystats-eo.herokuapp.com/scouter/. Also, what datasets are you using? Is it all based on showdown ladder? are you filtering for player ELO?
 
My datasets are all based on showdown replays, regardless of player elo. To find out which pokemon is going to mega evolve, or which pokemon get selected for the battle (for vgc formats etc.), the replay log needs to be analyzed in detail. That's kinda tricky and takes time to do xD

I have no idea how to collect thousands of data points from vgc matches. All by hand? sounds impossible to me lol

I'm going to try my best to open the black box a little bit and see what it has learned so far. I don't know if the net is able to predict the mega pokemon based on the entire team (although I guess it cannot). Also, for a given pokemon, which pokemon does the net think are similar to it? Does it think like us?

There's a lot of work left to improve this project, so it is still in progress, and I'll update it if I get anything interesting done.
Any questions and suggestions are very welcome xD
 
That is an amazing tool. I wonder if I can use it as some kind of REST API? ( well I will ask the author in that thread lol )
thxxx
 

Mr. Uncompetitive

ahhh i hate getting reminded that I need to do more personal project work (I've been planning out something in a similar vein to this, but I need to actually make time to sit down and learn tensorflow/basic NN use)

anyways, this is a really cool idea. I've scraped for replay data in the past, and I can affirm that you need to do a LOT of parsing to grab moves and items, and you'd have to parse the actual turn-by-turn events and tie stuff into a calc if you wanted to estimate EVs/IVs/Nature. And even after all of that, you can't find out everything from the replay. I imagine Antar, the guy who does the usage stats, has some method for keeping track of the teams that enter each battle but if it exists it's not publicly available as far as I know (would love to get my hands on it though...)

I'm not as comfortable with Neural Nets as I'd like to be, but one thing that does seem a bit odd from checking your Github, is every character of a Pokemon's name being inputted as a node? I'd imagine that might be trouble if the NN is considering each individual character that goes into it rather than a whole Pokemon string.
 
lol if you want we can do it together xD

Yes, there must be something that is not publicly available, and the replay data only contains a limited subset of the actual team data. Calculating EVs from replay data sounds ... like a horrible amount of work. Pokemon moves might be okay, but abilities (since you would have to check for lots of different variant ability effects), EVs etc. make me feel tired, and actually I don't want to do that lol

------

To feed a string into a neural network, that is, a variable-length sequence of characters, something like an RNN (a 1D CNN or other structures would also be okay, but I didn't choose them) has to be used, or the network just cannot accept variable-length strings. Also, for simplicity, I used pokemon names directly instead of something like pokedex IDs, since it's difficult (for me, I'm lazy (x)) to obtain pokedex IDs from names (with many "-Therian", "-Wash", "-East" etc.).
 

Kalalokki

I've had Marty extract full team data by just linking him a replay a couple of years ago, used it to get the info for an article I was writing, so either it's possible to extract everything or that he could find where that data was located by just knowing the replay number. Either way, I guess you could ask him how it's done, but I'm guessing it's admin/sysop access that's needed to do it anyway.
 
finding EVs seems not worthwhile. finding items sounds like some extra work for sure, but possibly useful if it is being combined with moveset data and another network is filling in unrevealed moves with reasonable predictions to make full sets for pokemon.

When I say the net is probably inherently valuing pokemon as its mega evolution, I say this because in most cases, a pokemon that can mega evolve does (ttar & gyarados are notable exceptions where both mega and non-mega are used for different things, as well as stuff like alakazam + another potential mega, where the zam may be LO even though mega is generally superior to allow for the other mega). Thus, when it's training, that pokemon will bring the value of its mega form to the result, and so the net will never see 'lopunny' as PU level pokemon or w/e, it will always see its value as that of mega lopunny, although it doesn't know the difference. The net will just think lopunny IS mega lopunny.

That's my hunch, at least. Opening the black box can definitely lead to some better insight on what's going on. betairya any thoughts on changing the training process for VGC matches?
 
Yeah, I think the net does treat "lopunny" as "mega-lopunny". However, this net was trained on a specific tier (say, OU), so I think if it were trained on PU or sth then it would see lopunny as normal lopunny. Well, megas and non-megas should both be considered though xD

I still don't have a clear idea of how to bring the 4-of-6 rule (VGC) and "what is doubles and singles" to the network. I may get player selections from replays, have another network predict which pokemon will be selected in order to predict the final result, and maybe combine that with some monte-carlo methods to improve the accuracy (since it may not predict the right pokemon to be selected)? I'm still not very clear about that, but I will try my best xD

What I want, but what may never become real, is a kind of "super network" that doesn't know the rules (4o6 / Doubles / etc.) as a prior, but can learn the rules from raw data. This just sounds impossible, I know lol

Sorry for the late reply, I am still a student and I am working on this in my free time (that makes me so happy lol), so sometimes I may not be able to reply to you all for a day or so xD
 
Is he or she named Marty? Could you post something like a userid or username to search with? Thanks very much xD
 

Kalalokki

Their username is Marty. I just tagged them now so they will see this the next time they're online and can answer you themselves.
 

Marty

I don't really remember the situation Kalalokki is referring to; getting entire team info from a replay has only been possible for maybe 2-3 months now, and it's really only designed for global admins to use in case of bugs because it allows access to sensitive information. Maybe the battle hadn't expired yet and I got to it live on the server?

Anyway, I'm not entirely sure what you need for this project but Antar has a "Data Grant" program he may or may not still be running if a large amount of battle logs is what you're after. Feel free to follow his instructions here to participate: https://www.smogon.com/forums/posts/6317488/
 

toshimelonhead

I was planning on doing something similar, but using simpler models. It would be interesting to see the results from a Naive Bayes or Random Forest model, for example. I downloaded your OU file and I'll see what I can do with it.
 
Good work, I just have a couple of suggestions:

1.) You want to use a permutation invariant structure for your network, since the order of the pokemon doesn't matter.

eg. a teamsheet of the following:
Code:
1. Abra
2. Beldum
3. Chansey
4. Drapion
5. Exeggutor
6. Finneon
is the same as
Code:
1. Finneon
2. Beldum
3. Drapion
4. Chansey
5. Exeggutor
6. Abra
But your network will consider those cases completely separately, which is hugely inefficient. A quick hack would be to first order the teamsheet alphabetically, but this doesn't really completely solve the issue. Unfortunately there's not much on the web about permutation-invariant networks, but essentially if you just apply an MLP to each input (with shared weights), pool all the results using mean pooling, and then stick a final MLP on the output, that's usually enough. If you want more details, let me know and I can try and explain it a bit better (or you can try and decipher this paper).
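
A small sketch of the alphabetical quick hack and of one way it falls short (the helper name canonical and the Zubat swap are made up for illustration): two orderings of the same team collapse to one input, but swapping a single team member can shift every other pokemon to a different slot, so a position-sensitive network still sees a very different input.

```python
def canonical(team):
    """Quick hack: sort the teamsheet so every ordering maps to the same tuple."""
    return tuple(sorted(team))


team_a = ["Abra", "Beldum", "Chansey", "Drapion", "Exeggutor", "Finneon"]
team_b = ["Finneon", "Beldum", "Drapion", "Chansey", "Exeggutor", "Abra"]
same_input = canonical(team_a) == canonical(team_b)

# Replace Abra with Zubat: the five shared pokemon all move up one slot.
team_c = ["Zubat", "Beldum", "Chansey", "Drapion", "Exeggutor", "Finneon"]
unchanged_slots = sum(
    x == y for x, y in zip(canonical(team_a), canonical(team_c))
)
```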

2.) You really want to use a one-hot encoding, or a learned embedding (if you use a deep neural network, they end up doing the same thing) for representing each pokemon. 'Vulpix' and 'Vullaby' might start with the same three letters, but they're completely separate pokemon. All the RNN ends up doing is making your network horrendously more complicated than it needs to be.
 
Thanks a lot for that information. Actually I have already read that paper (although after I posted this thread), along with many papers about permutation-invariant & equivariant networks (that is also related to my research at my college lol), so I was going to use something like sum or maxpool or avgpool to achieve such permutation invariance. For (2), it just sounded too complicated to hard-code many names (including Rotom-XXX, Rotom-YYY, Rotom-ZZZ etc.) into one-hot embeddings, and I didn't want to do that at the time I posted this xD. Although I have now figured out that I could get a list from the smogon usage rankings and use that to do a simple embedding. Thanks again for your advice!

I do apologize that there were no updates for a long time > < ( and for my poor english (
 
What project are you working on? I have been doing a lot with permutation-invariant networks myself.

The embedding shouldn't be too hard to do. Just trawl through your dataset, and for each 'Mon, add it to a dictionary. Then you can just assign a cardinal number to each Mon based on its place in the dictionary. There's in-built.
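
The dictionary approach could be sketched like this, assuming the dataset is a list of (team1, team2) pairs of name strings. That layout and the function name are assumptions for the example, not the repository's actual format.

```python
def build_vocab(battles):
    """Assign each distinct pokemon name the next free integer, in order of first appearance."""
    vocab = {}
    for team1, team2 in battles:
        for mon in team1 + team2:
            vocab.setdefault(mon, len(vocab))
    return vocab


# Toy dataset with made-up teams (shortened to 2 pokemon per side).
battles = [
    (["Rotom-Wash", "Landorus-Therian"], ["Rotom-Heat", "Greninja"]),
    (["Greninja", "Rotom-Wash"], ["Toxapex", "Landorus-Therian"]),
]
vocab = build_vocab(battles)
indices = [vocab[mon] for mon in battles[1][0]]  # team as a list of integers
```

Formes such as Rotom-Wash and Rotom-Heat simply become separate entries, so nothing has to be hard-coded.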

I might give this a go myself later.
 
Basically, predicting simple motion dynamics among particles, so the particles there are permutation invariant.
I'll try embedding this week xD
 
Actually I am still confused about embedding. I think it's difficult to convert pokemon names to one-hot vectors, they are just so big (in dimensions). Although it might be okay to do so, it sounds so bad. Imagine an 800-d one-hot vector. It's not necessary to include every pokemon in that count, but it still... sounds bad. If they could be converted to smaller vectors (maybe around 64-d or so) that would be great. However, I cannot figure out a way to do so.
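
For what it's worth, the 800-d one-hot and the 64-d vector are two views of the same thing: multiplying a one-hot vector by an 800 x 64 weight matrix just picks out one row, so the big vector never has to be built. A sketch with the sizes from the post (the table here is random; in a real model its rows would be trained along with the rest of the network, and the function names are made up):

```python
import random

random.seed(0)

N_SPECIES, EMBED_DIM = 800, 64

# One 64-d row per pokemon index; random stand-in for learned weights.
table = [[random.gauss(0, 0.1) for _ in range(EMBED_DIM)] for _ in range(N_SPECIES)]


def embed_by_lookup(index):
    """Cheap way: read the row for this pokemon directly."""
    return table[index]


def embed_by_one_hot(index):
    """Expensive way: build the 800-d one-hot and multiply it by the table."""
    one_hot = [0.0] * N_SPECIES
    one_hot[index] = 1.0
    return [
        sum(one_hot[i] * table[i][j] for i in range(N_SPECIES))
        for j in range(EMBED_DIM)
    ]


via_lookup = embed_by_lookup(123)
via_one_hot = embed_by_one_hot(123)
```

Both calls give the same 64 numbers, which is why a first dense layer on one-hot input and an embedding table end up doing the same job.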
 
I've done the embedding and it works now. Actually my previous version had a serious bug (well it is a feature ]=) so things are starting over from the very beginning.
I believe embedding made the network better than random guessing. (I have updated my post at #1)
 
