Hey Smogon,
We’ve been working on something wild over the last year: a fully open-source AI benchmark built around competitive Pokémon battles.
It’s called the PokéAgent Challenge, and it’s going to be hosted at NeurIPS 2025 in San Diego this December.
Our goal is simple: use Pokémon as a real testbed for AI reasoning and learning.
There are two main agents powering the challenge, both putting up crazy performance on Gens 1, 2, 3, 4, and 9 OU (with VGC in the works!):
PokéChamp:
Large Language Model-based agents.
We use LLMs like ChatGPT, Claude, and Gemini to plan ahead, model opponents, and pick actions. The agent can even explain its choices like a human player would.
→ GitHub: github.com/sethkarten/pokechamp
Metamon:
Reinforcement Learning agents that learn from experience: no scripts, no hand-coded rules.
→ GitHub: github.com/UT-Austin-RPL/metamon
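If you want to tinker with either repo, both build on top of the open-source poke-env interface to Pokémon Showdown (as far as I'm aware), so spinning up your own scripted baseline only takes a few lines. Here's a rough sketch of a toy poke-env player; the battle format, opponent, and battle count are placeholders, and the real agents swap the hand-written choose_move for LLM planning (PokéChamp) or a learned policy (Metamon):

```python
# Minimal sketch, assuming poke-env (which, to my knowledge, both repos build on).
# Requires a local Pokémon Showdown server running on poke-env's default port.
import asyncio

from poke_env.player import Player, RandomPlayer


class MaxDamagePlayer(Player):
    """Toy baseline: always click the available move with the highest base power."""

    def choose_move(self, battle):
        if battle.available_moves:
            # Ignores typing, boosts, etc. -- purely illustrative
            best_move = max(battle.available_moves, key=lambda m: m.base_power)
            return self.create_order(best_move)
        # Forced to switch (or otherwise no moves): pick a random legal option
        return self.choose_random_move(battle)


async def main():
    # Battle format and opponent are placeholders
    opponent = RandomPlayer(battle_format="gen9randombattle")
    player = MaxDamagePlayer(battle_format="gen9randombattle")

    await player.battle_against(opponent, n_battles=10)
    print(f"MaxDamagePlayer won {player.n_won_battles}/10 battles")


if __name__ == "__main__":
    asyncio.run(main())
```

That's the whole loop: PokéChamp and Metamon just make choose_move a lot smarter.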
Data
Did I mention we have the largest Pokémon battle dataset? Through a combination of human replays and bot ladder battles, we have almost 10M replays (and growing).
If you’re into bot development, battle analysis, or just want to see how close AI can get to real human play, check out:
https://pokeagent.github.io
