Programming Pokemon Battle Predictor: A Working Machine Learning Browser Extension

Hello,

Being stuck inside had me bored, so back in April I restarted a project I dabbled in last August that tried to use machine learning to predict who will win a Pokemon battle. Over time I realized you could do more and more with machine learning, so eventually the project expanded to predict what players will do. And after a couple of months, I ended up with a few really good working models that I'm releasing (hopefully) today in a Chrome extension known as Pokemon Battle Predictor!

EDIT: The extension is live!! You can install it from the extension store for the following browsers:

[Promo image: Promo_title_Large.png]

On the surface, Pokemon Battle Predictor uses 4 TensorFlow.js models trained on 10,000+ battles to tell you the current probability of:
  • Who will win the battle
  • Your opponent switching out or choosing a move
  • Which move they will use if they stay in
  • Which Pokemon they will switch to if they switch
Here is a sample of what it looks like while using the extension:
[Screenshot: Predictor2.png]


The player's chance of winning is listed in the battle log after every turn. The key word here is chance, as there is a difference between trying to predict what will happen next and estimating the chance of something happening. The former is judged by the accuracy of each individual prediction, while the latter is judged by whether outcomes the model gives a specific chance actually happen about that fraction of the time. I went for predicting chance, as that is way more useful for any kind of game, and this one in particular is way too random to find anything but chance.
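That second notion is what's usually called calibration, and checking it is mechanical: bucket the model's outputs and compare each bucket's average predicted chance against how often the event actually happened. Here is a minimal sketch in plain JavaScript with made-up numbers (this is not the extension's actual code):

```javascript
// Calibration check: group predictions into 10% buckets and compare each
// bucket's average predicted chance with the observed win rate.
function calibration(predictions, outcomes, buckets = 10) {
  const sums = Array.from({ length: buckets }, () => ({ p: 0, won: 0, n: 0 }));
  predictions.forEach((p, i) => {
    const b = Math.min(buckets - 1, Math.floor(p * buckets));
    sums[b].p += p;
    sums[b].won += outcomes[i]; // 1 if player 1 won, else 0
    sums[b].n += 1;
  });
  return sums
    .filter(s => s.n > 0)
    .map(s => ({ predicted: s.p / s.n, observed: s.won / s.n, count: s.n }));
}

// Example: three hypothetical predictions around 0.7, two of which came true.
const report = calibration([0.68, 0.72, 0.71], [1, 1, 0]);
console.log(report);
```

A well-calibrated model has `predicted` close to `observed` in every bucket, regardless of how accurate any single prediction is.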

How does it work?

All the machine learning models start with the same data: battle replays. I downloaded all the replays for gen8ou battles from June 3rd to June 13th and kept the ones where both players have a rating of 1250 or higher. The dates were chosen to reflect the most recent meta-game (more on that later) and the rating was kinda arbitrary as I wanted to make sure the battlers knew what they were doing but also wanted to keep as much data as possible. All the models also have the same inputs which are taken for each turn of each battle:
  • Each Pokemon's current HP
  • Each Pokemon's statuses
  • Which Pokemon are in on either side
  • Stat boosts on either side
  • Last used move on either side
  • The volatile and side effects on either side
  • Weather and pseudo-weather active
  • The "Switch Coefficient" for the Pokemon who are in
    • How often one Pokemon switches out when the opposing Pokemon is also in
In total, that leads to 6,815 attributes. Yes, seemingly necessary attributes like items and types are not included, but that's because that information is mostly consistent for a Pokemon species within a meta-game, so just knowing a specific Pokemon is present does a good job of encapsulating those ideas. The outputs for each model are where things diverge. For predicting the chance to win and whether a move or switch will occur, the outputs were trained on an equal number of their 2 possible outcomes (player 1 or player 2 winning, and switching or moving, respectively), so the model returns a single number representing the chance player 1 will win or the chance they will switch. The biggest difference in how they are trained is that the chance-to-win model only looks at turns that are more than 20% of the way through the battle.
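The "Switch Coefficient" bullet above can be sketched as a simple frequency count over replay turns. The `turns` record shape here is a made-up illustration, not the extension's actual data format:

```javascript
// "Switch Coefficient" sketch: for a matchup (mine vs. theirs), the fraction
// of observed turns with that matchup in which `mine` switched out.
function switchCoefficient(turns, mine, theirs) {
  let seen = 0, switched = 0;
  for (const t of turns) {
    if (t.active === mine && t.opposing === theirs) {
      seen += 1;
      if (t.action === "switch") switched += 1;
    }
  }
  return seen === 0 ? 0 : switched / seen;
}

// Hypothetical replay turns: Toxapex switched out on Dracovish once in two turns.
const turns = [
  { active: "Toxapex", opposing: "Dracovish", action: "switch" },
  { active: "Toxapex", opposing: "Dracovish", action: "move" },
  { active: "Toxapex", opposing: "Clefable", action: "move" },
];
console.log(switchCoefficient(turns, "Toxapex", "Dracovish")); // 0.5
```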

Predicting who will be switched in requires first training a model where the training output is a list of all Pokemon where the Pokemon that switched in is marked as the correct answer. After that is trained, a layer is added on the end of that with the same set of outputs so the model can learn which Pokemon are brought in under similar situations. For example, the first layer may only give Seismitoad a large chance to be switched in, but the second layer has learned a high chance for Seismitoad should also mean a high chance for Gastrodon and Quagsire as well.
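One way to picture that extra layer is as a matrix that lets probability mass on one Pokemon raise the scores of Pokemon used in similar roles. This toy sketch uses a hand-written symmetric matrix purely for illustration; in the real model those weights would be learned during training:

```javascript
// Sketch of the "similar switch-ins" layer: a square matrix W maps the first
// stage's switch-in probabilities to adjusted ones, so weight on one Pokemon
// can spill over to role-alikes. Matrix values are illustrative, not learned.
function applySimilarity(probs, W) {
  const raw = W.map(row => row.reduce((s, w, j) => s + w * probs[j], 0));
  const total = raw.reduce((s, x) => s + x, 0); // renormalize to sum to 1
  return raw.map(x => x / total);
}

// Index 0 = Seismitoad, 1 = Gastrodon, 2 = Quagsire (hypothetical roster).
const W = [
  [1.0, 0.4, 0.4],
  [0.4, 1.0, 0.4],
  [0.4, 0.4, 1.0],
];
const adjusted = applySimilarity([0.9, 0.05, 0.05], W);
console.log(adjusted); // Gastrodon and Quagsire gain mass from Seismitoad
```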

For predicting what move they will use, the model was trained to predict the chance of every move being used; then, based on which Pokemon is in, those chances are multiplied by 1 if the move has already been used by that Pokemon in this battle, or otherwise by the usage percent for that move from the most recent moveset usage stats. This both teaches the model what moves a Pokemon can use and the likelihood of the Pokemon actually carrying a learnable move.
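That reweighting step can be sketched in plain JavaScript. The move names and usage numbers below are hypothetical, not the extension's actual data:

```javascript
// Sketch of the move reweighting: multiply each predicted move chance by 1 if
// the move was already seen from this Pokemon in this battle, otherwise by its
// rate from the moveset usage stats, then renormalize.
function reweightMoves(predicted, usageRates, seenMoves) {
  const raw = {};
  for (const [move, p] of Object.entries(predicted)) {
    raw[move] = p * (seenMoves.has(move) ? 1 : (usageRates[move] ?? 0));
  }
  const total = Object.values(raw).reduce((s, x) => s + x, 0);
  for (const move of Object.keys(raw)) raw[move] = total ? raw[move] / total : 0;
  return raw;
}

// Hypothetical numbers: Scald already seen this battle, Toxic at 60% usage,
// Surf not learnable by this set (0% usage).
const out = reweightMoves(
  { Scald: 0.5, Toxic: 0.3, Surf: 0.2 },
  { Scald: 0.9, Toxic: 0.6, Surf: 0 },
  new Set(["Scald"])
);
console.log(out); // Surf drops to 0; Scald and Toxic are renormalized
```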

I know that wasn't the clearest explanation, so if you want to learn more, I can answer any questions you have. I just wanted to get this out there now, so clarity wasn't top of mind.

How well does it work?

The models were tested on a separate set of data to see both how often results of a specific chance are correct and the overall accuracy of each model.

Each model has a 95% confidence interval of 8%, which means the chance it gives for something to happen is within 8% on either side of the real chance. For a game and players as random as what it's trying to predict, without full information on either team, those are very good results, and it means the chances the model returns are accurate. I tried a whole bunch of different methods (which, to spare this post from becoming an essay, I won't talk about now), and getting that confidence interval any tighter will be an ordeal.

And even though I only cared about finding the chance, the models are pretty accurate too. For any given turn in a battle, the chance to win is 67% accurate (with that number increasing the further into the battle you go). All the other models have an accuracy of ~65%. However, from running the Chrome extension on battles over the last couple of days, it feels like the prediction of who will switch in is either always right or always wrong for a given battle, but that's just anecdotal evidence.

The biggest caveat to all of this is how it reflects the current meta-game. First off, it was made with how people play in OU in mind and nothing else, so it shouldn't be used on other tiers. It might work fine in UU and decently in RU, but anything else would just be luck. It would be very easy for me to make models for the other tiers, as all I'd need to do is download the replays, but because of the royal pain that is getting replays more than a day old, I'm not going to do that soon. The reason I'm waiting is the obvious other downfall of this: DLC is about to change everything. That means the extension as it is now will not work once all the DLC is added, and it may take a bit before the meta-game is stable enough to predict again. That's why I'm launching my extension now (or as soon as the Chrome store is done reviewing it), so people can use it and see what they think before I have to wait a month to update it.

What comes next?

So obviously my next big objective is making it work post DLC, but in the meantime I'll probably get it to work on the National Dex meta-game. Beyond that, my next goals are:
  • Get it to work with double battles (probably for VGC rules)
  • Add other meta-games
  • Make it agnostic to a meta-game / specific Pokemon
That last task is a doozy: instead of having all the Pokemon names as attributes, the attributes would be 6 sets of stats, types, abilities, etc. This would allow training on battles from all the different meta-games and hopefully create a battle predictor that can work on any battle. The only reason I didn't do this in the first place is my computer would explode with the amount of data that would be used for training. Another thing I'd want to do is add the chance-to-win model to Showdown itself, because running on a server that knows everything about each team would make it way more accurate, but that's only if the people running the website would like it.

I first found this forum a few days ago while looking for a way to tell people about this, and only while writing this did I notice a few people have tried this before. If I'd seen that any earlier, I would've helped them, but alas. I'm open to talking to people about helping on this project.

Once it's available on the Chrome Store, I'll post here again. But if you just look up "Pokemon Battle Predictor" in the Chrome Store in a couple of hours, it should be there.

And one more thing: You might think to yourself "if you can find the chance to win for any turn and predict your opponent's next move, couldn't you also use this to make a good Battle AI?". Yes, yes you could, and I only know that because I did, but I'll talk more about that later.
 

This is super neat. I haven't yet read into the details of your work, but I had a few questions right off the bat.

1. Will the extension work on replays, and have you tried evaluating your results on tournament replays? I think it would be quite interesting to see how your model analyzes tournament matches, compared to the general opinion of the smogon community, and the actual outcome. It could be nice for a practical proof-of-concept. For example, I would love to see the results on my OST matches.
2. Is it possible to get the extension on other browsers? If this is not too hard to do, I'd love to have a firefox version. If not I don't mind using chrome a little to mess with this. :)
3. Is your code open source? I would love to dig deeper into how exactly you constructed and trained your models.
 
This is super neat. I haven't yet read into the details of your work, but I had a few questions right off the bat.

1. Will the extension work on replays, and have you tried evaluating your results on tournament replays? I think it would be quite interesting to see how your model analyzes tournament matches, compared to the general opinion of the smogon community, and the actual outcome. It could be nice for a practical proof-of-concept. For example, I would love to see the results on my OST matches.
2. Is it possible to get the extension on other browsers? If this is not too hard to do, I'd love to have a firefox version. If not I don't mind using chrome a little to mess with this. :)
3. Is your code open source? I would love to dig deeper into how exactly you constructed and trained your models.
Thanks for the support!

1. It does work on replays, but only the chance to win shows up. The way I have it working, it calculates the chance to win at the end of every turn, while the other 3 calculations are done whenever there's a pause in the battle. This means the win percentage will show up in the log as if the battle were live. I also couldn't figure out where I'd want to put the info about the opponent's next move on the replay page, but there's nothing fundamentally stopping all the data from being calculated for replays.

As for whether it works on tournaments: I don't see why not. I do know I didn't train it on any tournament battles, as I only wanted battles where the replay file could confirm the players' ratings, but as long as tournaments have identical rules and generally the same sets as regular play, it should work all the same. (I haven't participated in tournaments before, so I don't know much about them.)

2. This is the first browser extension I've made, so I have no idea what making one for Firefox is like. However, nothing in the extension itself requires Chrome, so if it's easy to make a Firefox version, I might as well.

3. Right now, no. I've been going back and forth on whether I want to make my repo public. But while I figure that out, I'd be happy to answer any questions you have, and once the extension is done being reviewed, the code in there gives a very good idea of how the model works.
 
2. Is it possible to get the extension on other browsers? If this is not too hard to do, I'd love to have a firefox version. If not I don't mind using chrome a little to mess with this. :)
Actually, I just looked into it and I don't have to change a thing to make it work with Firefox; I have it running on my computer right now! I'll submit it to their add-on store too.
 

marilli

Good work so far! I tried playing some games with the add-on active. Some minor issues it seems but can't be perfect I suppose.

I've been getting poor win % calculations in 9/10 of the games I've played so far. Granted, they can simply be considered "anecdotal evidence", but most of them seemed like pretty trivial errors, where I'm clearly ahead and about to win, but the predictor gives my opponent a 90%+ chance of winning instead, or something random.


Surprised to find that it still hasn't picked up on the last Pokemon literally being unable to switch out, but I guess forfeits happen in OU and the last Pokemon isn't active for too many turns, so getting them wrong doesn't punish your model too hard. Hopefully this behavior is trained out for the VGC model, where a single KO means the option to switch both Pokemon out is removed.


Cheers,

edit: just had a game where it had predictions flipped, I disconnected, and when I reconnected the predictions flipped back. I don't know much about extensions and Firefox plug-ins, but it seems like it's either an issue there or an issue on my end.
 
Good work so far! I tried playing some games with the add-on active. Some minor issues it seems but can't be perfect I suppose.

I've been getting poor win % calculations in 9/10 of the games I've played so far. Granted, they can simply be considered "anecdotal evidence", but most of them seemed like pretty trivial errors, where I'm clearly ahead and about to win, but the predictor gives my opponent a 90%+ chance of winning instead, or something random.


Surprised to find that it still hasn't picked up on the last Pokemon literally being unable to switch out, but I guess forfeits happen in OU and the last Pokemon isn't active for too many turns, so getting them wrong doesn't punish your model too hard. Hopefully this behavior is trained out for the VGC model, where a single KO means the option to switch both Pokemon out is removed.


Cheers,
Thanks for the feedback! Yeah, the model favors lower percent chances of winning, as doing so makes the chance more accurate overall. That does mean it sometimes gives unreasonably low values for clear victories, but that's the trade-off I'm currently taking. The model does factor in how many mons are left, so that shouldn't happen too often.

As for the flipped results, a few people have mentioned that happens occasionally and I think it is a problem with how I coded the extension itself. I'm looking into fixing it!
 
I was also noticing some very odd behavior with the win% chance (this was the only aspect I was looking at, since I was using it on replays).

For what it's worth, I quickly checked just now and the win % predictions I got today are different from the results I was getting yesterday. This seems like a red flag, unless you made some changes to that calculation during that time.

Apart from that anomaly, I've noticed strange behaviors in general. Sometimes I would land a crucial KO, the opponent would bring in their best response to my current pokemon, and my win % chance would go down. This behavior seemed to happen throughout most of the matches I looked at. Another issue is sometimes one person's win % will randomly start climbing, and then get stuck at >99% for the rest of the match. This often happened in slower matches where my opponent was using stally pokemon like gastrodon/toxapex. Other times I would be outright losing, but my win% would stay quite high even after I keep sending in pokemon to be sacked off to end the game. I was trying to figure out how your model was coming to its predictions, but it was quite hard to interpret.
 
I was also noticing some very odd behavior with the win% chance (this was the only aspect I was looking at, since I was using it on replays).

For what it's worth, I quickly checked just now and the win % predictions I got today are different from the results I was getting yesterday. This seems like a red flag, unless you made some changes to that calculation during that time.

Apart from that anomaly, I've noticed strange behaviors in general. Sometimes I would land a crucial KO, the opponent would bring in their best response to my current pokemon, and my win % chance would go down. This behavior seemed to happen throughout most of the matches I looked at. Another issue is sometimes one person's win % will randomly start climbing, and then get stuck at >99% for the rest of the match. This often happened in slower matches where my opponent was using stally pokemon like gastrodon/toxapex. Other times I would be outright losing, but my win% would stay quite high even after I keep sending in pokemon to be sacked off to end the game. I was trying to figure out how your model was coming to its predictions, but it was quite hard to interpret.
There's a glitch currently with longer battles increasing the chance to win drastically, which I've made a fix for and an update is being pushed out now. Getting different win values on different tries... I'll have to look more into that, but thanks for letting me know!
 
Updated Version Released:
I just pushed an update to the store with the following:
  • Fixed player win chance being assigned to the wrong player
  • Fixed win percentage increasing too much for longer battles
Just go to your browser's extension settings, click on the gear, then choose "Check for Updates" to get it!
[Screenshot: 1592346977798.png]
 
New bug I've noticed: if I switch sides in a replay, it will change the player it assigns the current win % to and maintain that 'swapping' of the players.
 
Big Update:

I'm happy to announce it's finally available to download for Google Chrome, and you can find it here! That version and the Firefox one now have the following features:
  • Support for predicting gens 1-7 formats added
    • Predicting moves and switches is limited to gen 7 formats
  • Options to show/hide specific predictions
  • Fixed win percentage not adjusting when switching sides
  • General bug fixes (mostly consisting of those people talked about above)
Also, I now have a website, https://www.pokemonbattlepredictor.com/, that goes into more detail about how the project works, what features are in the works, and more. I will keep it up to date with the current extension version, and once you download the extension, it will notify you whenever a new feature is added as well.

I did remove support for gen 8 battles for now, as DLC made the models even less accurate than I anticipated. I'll add support back in a few weeks when the meta-game isn't changing as much. Along those lines, the battle AI using these models has been tested and works on the older gens, but I decided to wait until the new gen 8 models are made before I fully announce it, let other users battle against it, etc.
 

GMars

Have you considered weighting your training based on player ranking through elo or gxe to allow you to alter chances based on the elo/gxe of the opponent you're currently facing, rather than lumping all replays above 1250 elo together? I'd be very interested to be able to have a switch to look at different elo brackets (1250, 1500, 1750 for example) and see how that changes move chance based off of assigning more weight to the replays in the respective band of elo.
 
Have you considered weighting your training based on player ranking through elo or gxe to allow you to alter chances based on the elo/gxe of the opponent you're currently facing, rather than lumping all replays above 1250 elo together? I'd be very interested to be able to have a switch to look at different elo brackets (1250, 1500, 1750 for example) and see how that changes move chance based off of assigning more weight to the replays in the respective band of elo.
Yes, I have thought about it and really want to do it! You're right on the money that the biggest issue is not having enough battles in the various elo brackets to make different models for them. I'm not sure who I'd reach out to about this, but I'd love to talk to whoever runs the replay database so I can access battles directly and hopefully get enough battles to do just that.
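For illustration, one simple version of the rating-band weighting could assign each replay a training weight from a kernel centered on the target elo bracket. The bandwidth and data shape here are my own assumptions, not anything from the project:

```javascript
// Sketch of rating-band weighting: give each replay a training weight based on
// how close its players' elo is to the bracket being modeled, using a simple
// Gaussian kernel. Bandwidth of 150 elo is an illustrative choice.
function replayWeight(replayElo, targetElo, bandwidth = 150) {
  const z = (replayElo - targetElo) / bandwidth;
  return Math.exp(-0.5 * z * z);
}

// Hypothetical replays from three brackets, weighted toward the 1500 bracket.
const replays = [{ elo: 1260 }, { elo: 1510 }, { elo: 1740 }];
const weightsFor1500 = replays.map(r => replayWeight(r.elo, 1500));
console.log(weightsFor1500); // the ~1500 replay gets by far the largest weight
```

The same pool of replays could then be reused for a 1250 or 1750 model just by moving the kernel's center, instead of training on disjoint slices of an already-small dataset.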
 
Update: Gen 8 OU Predictions

The meta-game still might be all over the place, but I decided to bite the bullet and create models for gen8ou battles! I was able to collect the same number of battles as I had for the original model and got results I was fine with releasing. Predicting the next switch-in, well, it exists... but the other 3 models work as well as the best of them. The update is already available on Firefox and will be out in the next few hours on Chrome.

I know I said last time I'd talk about the battle AI once the gen 8 models are out, which I guess would be right now. But after testing it on the gen7ou ladder for the past few days, I realized 1) my team-building skills leave much to be desired and 2) models based on more varied data are needed for it to play smarter. Its current play is almost indistinguishable from a human player, but a mediocre human player at best. The good news is that not only will the data problem be solved soon, but I also came up with a new way to make the models which, in theory, will increase all of their accuracy several times over.

So, I'm saving the release for next time (about a week or two from now) when Future Sight AI is ready for battle.
 
Big Update: Massive Accuracy Bump + 20 more formats

Whatever you thought about how well the predictor worked before, forget it: a lot has changed, and all for the better. The inputs to the models have been completely reworked to factor in more complex relationships between the elements of a battle, reducing each model's size by 50%. But the biggest impact is on accuracy. All 84 models in this update have an accuracy of at least 75%, with most of them hitting 85% and some even reaching 95%! This is far better than what was achieved before, with the chance-to-win models seeing the smallest bump (~10% greater average accuracy) and the move-choice models almost doubling performance (~40% greater average accuracy).

The idea that this means a computer program can correctly tell you what your opponent is going to do ~85% of the time is blowing my mind on its own, but considering that even when it's wrong, the chances it assigned to your opponent's move choices are reasonable, has me in disbelief. I don't know how I'd find the upper limit of accurately guessing what your opponent will do next in a game as random and varied as this, but I have to imagine this is approaching it.

The other big part of the update is those 84 models I mentioned. Pokemon Battle Predictor now has added support for all of the following formats:
  • Gen 8 OU (worked before, but Isle of Armor changes are now well accounted for)
  • Gen 8 UU
  • Gen 8 RU
  • Gen 8 NU
  • Gen 8 PU
  • Gen 8 Ubers
  • Gen 8 Little Cup
  • Gen 8 National Dex
  • Gen 8 Monotype
  • Gen 7 OU
  • Gen 7 UU
  • Gen 7 Ubers
  • Gen 7 Monotype
  • Gen 6 OU
  • Gen 5 OU
  • Gen 4 OU
  • Gen 4 UU
  • Gen 3 OU
  • Gen 2 OU
  • Gen 2 Ubers
  • Gen 1 OU
This means all 4 of the prediction models now work completely on each of the formats listed above. There is one known exception: Zarude causes the move prediction to glitch out a bit, but I can get that fixed when the next set of usage stats comes out. That's not to say there are no other glitches or that it's never blatantly wrong (not factoring in Choice items, messing up when Transform is used, treating Z-Moves and Max Moves as always usable again, etc.), so if you find a glitch, hit me up. An absolutely huge shout-out to pre for helping me with the data needed to add this many formats; neither this nor the accuracy increases would've happened otherwise!

I didn't think it would ever reach a point where I could say this wholeheartedly, but after this update, I think the Battle Predictor could be considered a legitimate, reliable tool.
 
The program doesn't seem to consider an existing Reflect very well. Even after Reflect is set, it still says the foe's Espeon is most likely to use Reflect next turn.
 
Hi, this seems like a very interesting project!

What model did you use exactly? It doesn't seem very clear to me, although I guess as you use TensorFlow it's some kind of deep model?

How do you deal with the input essentially being part of the output? Eg. the switching to a teammate depends on the team. You mention that the output for switching predicts all possible Pokémon and I guess is filtered on which ones actually exist? Obviously, if you determine the move/switch percentages beforehand this is not really accurate - whether you will stay in or switch also depends on the team.

PS depending on the effectiveness of this program, it could be considered cheating in a way for tournament games, perhaps a way to notify that it is being used should be implemented?
 
The program doesn't seem to consider an existing Reflect very well. Even after Reflect is set, it still says the foe's Espeon is most likely to use Reflect next turn.
Yep, this does happen. Similar to how I mentioned it thinks a Z-Move or Max Move can be used again, it doesn't seem to entirely understand when a repeated use of certain kinds of moves doesn't make any sense. This occurs almost exclusively with status moves (although some, like Stealth Rock and Heal Bell, work pretty well), because the model no longer considers effects as a binary present-or-not-present, and changing it back to that would cause some other issues. I could hard-code it to not predict moves that reapply effects that are already there, but I'd like to avoid rules like that as much as possible, since the whole point of this is to experiment with how well a machine learning model can figure out the game on its own.
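For reference, the hard-coded fallback mentioned above, were it implemented, could look something like this: a deliberately simple rule, with an illustrative and very incomplete move-to-effect table:

```javascript
// Hypothetical hard-coded rule (the kind of rule the post prefers to avoid):
// zero out predicted moves whose one-time effect is already active, then
// renormalize. The move-to-effect mapping is illustrative, not exhaustive.
const EFFECT_OF_MOVE = {
  Reflect: "reflect",
  "Light Screen": "lightscreen",
  "Stealth Rock": "stealthrock",
};

function dropRedundantMoves(predicted, activeEffects) {
  const raw = {};
  for (const [move, p] of Object.entries(predicted)) {
    const effect = EFFECT_OF_MOVE[move];
    raw[move] = effect && activeEffects.has(effect) ? 0 : p;
  }
  const total = Object.values(raw).reduce((s, x) => s + x, 0);
  for (const m of Object.keys(raw)) raw[m] = total ? raw[m] / total : 0;
  return raw;
}

const out = dropRedundantMoves(
  { Reflect: 0.5, Psychic: 0.3, "Morning Sun": 0.2 },
  new Set(["reflect"]) // Reflect is already up
);
console.log(out); // Reflect drops to 0; the other moves are renormalized
```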

I have a fix in mind, so hopefully that works and the bug can be solved.
 
Hi, this seems like a very interesting project!

What model did you use exactly? It doesn't seem very clear to me, although I guess as you use TensorFlow it's some kind of deep model?

How do you deal with the input essentially being part of the output? Eg. the switching to a teammate depends on the team. You mention that the output for switching predicts all possible Pokémon and I guess is filtered on which ones actually exist? Obviously, if you determine the move/switch percentages beforehand this is not really accurate - whether you will stay in or switch also depends on the team.

PS depending on the effectiveness of this program, it could be considered cheating in a way for tournament games, perhaps a way to notify that it is being used should be implemented?
All of your guesses are mostly correct! All of the models are neural networks; it does filter out Pokemon that can't be switched to after predicting over all possible ones; and the move/switch chance is calculated separately, considering only the available options on the opponent's team.
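That filter-and-renormalize step for switch predictions can be sketched as follows (team members and probabilities are hypothetical):

```javascript
// Sketch of filtering switch-in predictions to Pokemon actually available:
// zero out fainted/active/unrevealed slots, then renormalize what's left.
function maskSwitchProbs(probs, available) {
  const masked = Object.fromEntries(
    Object.entries(probs).map(([mon, p]) => [mon, available.has(mon) ? p : 0])
  );
  const total = Object.values(masked).reduce((s, x) => s + x, 0);
  for (const mon of Object.keys(masked)) masked[mon] = total ? masked[mon] / total : 0;
  return masked;
}

const out = maskSwitchProbs(
  { Ferrothorn: 0.4, Dragapult: 0.4, Clefable: 0.2 },
  new Set(["Ferrothorn", "Clefable"]) // Dragapult fainted (hypothetical)
);
console.log(out); // Dragapult 0; Ferrothorn and Clefable renormalized
```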

And yes, the cheating consideration was the first thing I thought of when I released it, and it has been talked about a bit. Whether it's considered cheating is up in the air, but I really like the idea of having it tell the opponent you're using this in a tournament setting. I'll look into doing that!
 
Gen 8 RU; to be honest, it's quite common to see this glitch.
Okay, I fixed it. It was a problem with mons that changed tiers between this month and last. The extension will have the updated changes later today.

Edit: The update to fix the bug was released
 
Okay, I fixed it. It was a problem with mons that changed tiers between this month and last. The extension will have the updated changes later today.

Edit: The update to fix the bug was released
[Screenshot: QQ截图20200828105637.png]
Are you sure? I think you forgot to release the fix or something...
 
