(Re-)Introducing Foul Play: A Competitive Pokemon Battle Bot

Hello once again Smogon,

It has been ... 6 years(!?) since I first posted about a Pokemon Showdown battle-bot that I've been working on. I'm here to share an update as I believe I've made some good progress.

The unnamed project has been rebranded as foul-play. It is still a singles-focused battle bot that only plays formats with Species Clause. It also does not yet support formats with Mega Evolution, Z-Moves, or Dynamax. It continues to rely on a search-based engine that looks into the future and uses a static, hand-crafted evaluation function to guide the search, though I have completely rewritten the Python battle engine in Rust as poke-engine. One of the most impactful changes is that minimax has been replaced by Monte Carlo search. This was a game changer for Pokemon, as Monte Carlo search deals better with simultaneous-move games.
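To illustrate why Monte Carlo search fits simultaneous-move games: one common approach is "decoupled" selection, where each player picks a move from its own UCB1 statistics without seeing the other side's choice. The sketch below reduces that idea to a single decision node with a made-up payoff matrix. The names `ucb1` and `decoupled_ucb`, the exploration constant, and the payoff values are illustrative assumptions; this is not poke-engine's actual implementation.

```python
import math
import random


def ucb1(total, visits, parent_visits, c=1.4):
    """UCB1 score: average reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always try an untested move first
    return total / visits + c * math.sqrt(math.log(parent_visits) / visits)


def decoupled_ucb(payoff, iterations=2000, seed=0):
    """Run decoupled UCB1 selection on a one-shot simultaneous-move game.

    payoff[i][j] is the value to player 1 (in [0, 1]) when player 1 plays i
    and player 2 plays j; player 2 receives 1 - payoff[i][j].
    Returns the visit counts for each player's moves.
    """
    rng = random.Random(seed)
    n1, n2 = len(payoff), len(payoff[0])
    visits1, total1 = [0] * n1, [0.0] * n1
    visits2, total2 = [0] * n2, [0.0] * n2

    for t in range(1, iterations + 1):
        # Each player selects independently, using only its own statistics.
        # The random second element breaks ties between equal scores.
        i = max(range(n1), key=lambda a: (ucb1(total1[a], visits1[a], t), rng.random()))
        j = max(range(n2), key=lambda a: (ucb1(total2[a], visits2[a], t), rng.random()))

        # In a real search this value would come from simulating deeper
        # turns and calling an evaluation function; here it is a lookup.
        value = payoff[i][j]

        visits1[i] += 1
        total1[i] += value
        visits2[j] += 1
        total2[j] += 1.0 - value

    return visits1, visits2


# Hypothetical payoffs: row 0 is best for player 1, column 1 for player 2.
EXAMPLE_PAYOFF = [[0.8, 0.5], [0.4, 0.1]]
```

Calling `decoupled_ucb(EXAMPLE_PAYOFF)` returns two lists of visit counts, and the most-visited move on each side is the one the search would recommend. Minimax, by contrast, assumes one side moves after seeing the other's choice, which is not how a Pokemon turn works.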

How does it do? In my opinion it is an above-average player when it is able to predict the opponent's unknowns well. It is not able to consistently dominate the top of the ladder, but it can get placements that are definitely impressive. Furthermore, as the engine doesn't understand battle mechanics with 100% accuracy, it is certainly exploitable if the opponent knows they are playing a bot.

Here are some results I've achieved with the bot. Note that (at least, imo) peak Elo is not always indicative of skill. Rank #4 in gen3ou, for example, was due to a lucky run. GXE values shown are all after Glicko deviation had dropped below 50.
[Attached image: 1751723458751.png]


If you would like to read a bit more about some of the techniques used by foul-play and poke-engine to achieve these rankings, as well as see some replays of the bot battling, check out: https://pmariglia.github.io/posts/foul-play
 
Sorry if I'm dumb and misunderstood it, but would it be possible to challenge it personally (send it a battle request) and use it as a training bot?
 
Is there a way I can play against it via a challenge? I'd love to try it out if possible.
You'd have to build and run it yourself locally. I'd love to provide something that people can challenge but running Foul Play, especially at its strongest settings, takes a fair bit of resources.
 
This is fascinating! Do you have any of the replays associated with the bot's run to #4 on the gen 3 OU ladder?

There are a few sample replays towards the end of this page for a few different formats, including gen3ou: https://pmariglia.github.io/posts/foul-play

You can see a broader set of replays by going to https://replay.pokemonshowdown.com and searching for the two accounts I commonly tested on: Accelerock Ttar and Playing Foul. Unfortunately, these will be biased towards losses, as I configured Foul Play to save replays only on a loss unless I was actively observing and found the battle interesting for some other reason.

Here are two gen3ou wins though:
- 1600 elo win
- 1800 elo win
 
I have been trying to challenge myself for a minute but I can't seem to get it to send me a challenge request or even accept a challenge. Can you help with that?
 
I kid you not, I literally started working on a project like this for gen 3 specifically last month. I have a bot that uses extremely similar techniques to what you outlined, and I've been testing it on a local Showdown server. It even beat me one time! (Attached replay) (I got too cocky)
The funny thing is I was using poke-engine for my simulator too, so I really should have noticed the recent updates and realized you were already working on this. Oh well, at least I can rest easy knowing that somebody who actually knows how to program has done better than I ever could.

Are you going to continue working on the bot to improve it, or are you satisfied with it as is? My lofty goal was to make something that was genuinely superhuman at playing pokemon (in my case just gen3ou), and I think that's definitely possible based on what you've managed here.

I have a few technical questions as well since I'm curious:

1) Are you predicting unrevealed Pokemon on the opponent's team, or just narrowing down and predicting sets of the revealed Pokemon? If not, I've had moderate success doing so (utilizing teammate data for the revealed Pokemon from Smogon usage stats) and I think it could potentially lead to a significant improvement.

2) Have you looked into using sampling algorithms other than MCTS? I came across this paper which presents a new sampling algorithm and compares it to different variants of MCTS. It looks like you're using something similar to their MCTS-UCT algorithm, so potentially room for improvement there?

3) Have you tested the bot with different team archetypes or mostly the same teams? From the teams in the winning replays it looks like a lot of offense, so I was curious if it still performs well with more defensive/stall structures. I noticed with my bot that it often makes poor long-term decisions or underestimates moves like Toxic, which have a high but delayed impact.

Finally it goes without saying but this is a really awesome and exciting project so thanks for doing it!
 

If you have a problem installing or running it, I'd suggest opening a GitHub issue.


1) Yes. For random battles the unrevealed Pokemon are sampled from the pool of Pokemon PS would put on the team. For formats like Gen3OU I do a very non-scientific sampling of the most likely Pokemon. This does not intelligently try to infer how a team is composed, but it uses Smogon's available usage statistics to do its best. I'm sure there's room for improvement here.

2) I have not. These seem interesting, I will definitely give them a read.

3) You're right, it's mostly offense/balanced offense that the bot is good with. Long-term decisions are hard to reason about with a search-based engine that can only see perhaps 10 turns ahead in the best case. There have to be clear HP gains made in that time for the engine to see the value.
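The usage-based sampling mentioned in answer 1 could look roughly like the sketch below: fill the opponent's unrevealed slots by drawing species in proportion to their usage, never repeating a species (Species Clause). The function name `sample_unrevealed` and the usage numbers are hypothetical, chosen only for illustration; Foul Play's real sampling may differ, and this version ignores teammate correlations entirely.

```python
import random


def sample_unrevealed(usage, revealed, team_size=6, rng=None):
    """Guess a full opposing team from the revealed Pokemon plus usage weights.

    usage:    dict mapping species name -> usage weight (any positive scale)
    revealed: list of species already seen on the opponent's team
    Returns the revealed species followed by sampled guesses.
    """
    rng = rng or random.Random()
    team = list(revealed)

    # Only species not already on the team are eligible (Species Clause).
    candidates = {name: w for name, w in usage.items() if name not in team and w > 0}

    while len(team) < team_size and candidates:
        names = list(candidates)
        weights = [candidates[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        team.append(pick)
        del candidates[pick]  # sample without replacement

    return team


# Hypothetical usage weights, NOT real Smogon statistics.
HYPOTHETICAL_USAGE = {
    "Tyranitar": 45.0,
    "Skarmory": 38.0,
    "Blissey": 35.0,
    "Swampert": 33.0,
    "Metagross": 30.0,
    "Gengar": 28.0,
    "Salamence": 22.0,
    "Celebi": 20.0,
}
```

For example, `sample_unrevealed(HYPOTHETICAL_USAGE, ["Tyranitar", "Skarmory"])` returns those two species plus four distinct guesses. Running a search over many such sampled teams is one way to average over what the opponent might still have in the back; conditioning the weights on revealed teammates would be a natural refinement.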
 