Someone Simulated 87,000 Pokemon Red NPC Trainer vs. NPC Trainer Battles to Determine a Trainer ELO Ranking

littlechandler · Jun 15, 2020

As the title says, someone simulated 87,000 Pokemon Red NPC trainer vs. NPC trainer battles, then built an Elo ranking and tier lists based on how well each trainer did against the other NPC trainers.
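For anyone unfamiliar with how an Elo ranking comes out of a pile of battle results, here is a minimal sketch of the standard Elo update. The thread never says which Elo variant or settings the video used, so the K-factor of 32, the starting rating of 1500 and scoring a draw as half a win are all assumptions, and `expected_score` and `update_elo` are hypothetical names.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B on the standard 400-point Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one battle.

    score_a is 1.0 for an A win, 0.5 for a draw (assumed scoring), 0.0 for a loss.
    k=32 is an assumed K-factor, not a setting confirmed by the video.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two trainers at an assumed starting rating of 1500; trainer A wins.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

Because each update only depends on the two ratings involved, the order in which battles are processed can shift the final numbers somewhat, which is one reason match-up order could matter.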

The YouTuber's process was roughly:

Select two trainers to perform a battle (initially at random, later sequentially): Trainers 1 & 2.
Load a save state in an emulator against Trainer 2, with Trainer 1's Pokemon in the player's position. Load another save state with the positions of the two trainers reversed.
Whenever the player needs to make a decision, load up the second emulator and check which option the AI would choose there. Input that decision into the first emulator.
Make some adjustments (e.g. disable exp gain) so the trainer in the player's position behaves more like an NPC trainer.
Repeat until someone wins, no damage is dealt for 75 turns, or the battle still hasn't ended after 1,000 turns.
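The stopping rules in that last step can be sketched like this. It is only an illustration: `TurnRecord`, `battle_outcome` and the trainer labels are hypothetical names, and it assumes "no damage for 75 turns" means 75 consecutive turns in which neither side loses HP, which the thread doesn't spell out.

```python
from dataclasses import dataclass
from typing import Optional

NO_DAMAGE_LIMIT = 75  # consecutive damage-free turns before a draw (assumed reading)
TURN_LIMIT = 1_000    # hard cap on battle length, per the thread


@dataclass(frozen=True)
class TurnRecord:
    """Hypothetical per-turn summary produced by the simulation loop."""
    damage_dealt: int             # total HP lost by both sides this turn
    winner: Optional[str] = None  # set once one side has no usable Pokemon left


def battle_outcome(turns: list[TurnRecord]) -> Optional[str]:
    """Return the winner's label or "draw" once a stopping rule fires, else None."""
    damage_free_streak = 0
    for turn_number, turn in enumerate(turns, start=1):
        if turn.winner is not None:
            return turn.winner
        # The streak resets whenever anyone takes damage; healing doesn't count.
        damage_free_streak = damage_free_streak + 1 if turn.damage_dealt == 0 else 0
        if damage_free_streak >= NO_DAMAGE_LIMIT or turn_number >= TURN_LIMIT:
            return "draw"
    return None  # no stopping rule has fired yet


stall = [TurnRecord(damage_dealt=0)] * 75       # e.g. both sides spamming Rest/Growth
slugfest = [TurnRecord(damage_dealt=5)] * 1_000  # damage keeps landing, nobody faints
print(battle_outcome(stall), battle_outcome(slugfest))  # draw draw
```

The two rules catch different kinds of stalemate: a pure stall ends early on the 75-turn rule, while a battle where chip damage keeps landing but gets healed back (presumably what happened in the Oddish vs. Ivysaur match) only ends at the 1,000-turn cap.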

Summary + some of my impressions from watching the video:

RBY AI is terrible as expected, but surprisingly some Cool Trainers had a decision tree for switching out Pokemon.
4:24 basically confirms the AI cheats: if the player moves first, the AI doesn't determine its move until after the player has chosen.
Based on the Elo rankings of the different versions of Gary (one for each starter), Bulbasaur is the best starter until very late in the game.
Lorelei (Pokemon levels 53-56) is only 2 Elo points above Gary's rival battle in the Pokemon Tower (Pokemon levels 20-25).
Sabrina is OP for a 5th gym (13th-highest Elo overall): she is the highest of all the Gym Leaders and even ranks above Bruno.
In general the Gym Leaders and rival fights are stronger than their surrounding trainers.
There is a Victory Road trainer (Juggler, lvl 48 Mr. Mime) with a lower ranking than SS Anne Gary (lvl 16-20). There is a Scientist with a level 33 Electrode with a lower Elo than a Bug Catcher with a level 14 Caterpie and Weedle. There is a Hiker with a level 25 Geodude with a lower Elo than a Lass with two level 10 basic Pokemon.
The D-tier Lass with a level 11 Oddish was robbed. She could have beaten the B- tier Beauty's level 29 Ivysaur if the YouTuber hadn't imposed a 1,000-turn limit before calling a match a draw.
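To put the Lorelei vs. Pokemon Tower Gary gap in perspective: assuming the video used the usual 400-point logistic Elo scale (the thread doesn't confirm this), a 2-point gap means the higher-rated trainer is expected to score only about 50.3% against the other. `win_expectancy` is a hypothetical helper name.

```python
def win_expectancy(rating_diff: float) -> float:
    """Expected score for the side that is rating_diff points higher,
    on the standard 400-point Elo logistic curve (assumed scale)."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


print(f"{win_expectancy(2):.3f}")    # 0.503  (Lorelei's 2-point edge)
print(f"{win_expectancy(100):.3f}")  # 0.640  (a 100-point gap, for comparison)
```

In other words, by these numbers an E4 member with a level-50s team is rated almost exactly as a coin flip against a mid-game rival.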

RocketSurgery · Jun 15, 2020

This is super-cool. I would love to see more stuff like this.

When you mentioned that the creator swapped the positions of the opponents, I wondered whether it was exactly because of the AI's cheating tendencies (which came up in another YouTube video about beating Red/Blue without taking damage, where they exploited the cheat to hax Gary's Exeggutor).

pimanrules · Jun 16, 2020

littlechandler said:
The D-tier Lass with a level 11 Oddish was robbed. She could have beaten B- tier Beauty's Level 29 Ivysaur if the youtuber hadn't implemented a 1,000-turn limit before calling a match a draw.

I was actually reading the Smogon forums to decide what the turn limit should be. I saw people saying that some Gen I and II strats lead to 3,000-turn battles, but since I was saving videos of each battle, I decided to keep it shorter.

R_N · Jun 16, 2020

1000 turns is probably more than sufficient in 99% of cases, but I'd still give Lass the win by decision for being in the bizarre situation where the battle could go 1000 turns without ending in a genuine draw, since she was the only one fully capable of bringing down her opponent without being in any danger.

Congrats Lass, you're kind of the Glass Joe of Pokemon Red.

Anyway, this was a fun watch. I'm obviously curious how Gen 2 would pan out, but taking it further, I'd be fascinated to see how something like XY pans out, since there's such a hard cap on the number of Pokemon trainers have and the Pokemon are all over the place.

Ironmage · Jun 16, 2020

I wonder if Venusaur Champion's loss was due more to matchups than to being a worse team: with only Grass moves, it would be screwed (locked into Growth?) against the other Champions' Pidgeots and Exeggutors, and all of Oak's Exeggutors and Arcanines. Charizard Blue runs Slash to get around this against Rhydon and Gyarados.

Edit: Additionally, all Champion sets run Alakazam, which beats Venusaur but doesn't beat the replacement Grass-type, Exeggutor. And the non-Blastoise Water-type for both Blue and Oak is Gyarados, which isn't weak to Grass (and Charizard double-resists it, but I'm not sure that would come up). The deck was really stacked against them here.

Tmi489 · Jun 17, 2020

Are lower-tier Grass (Poison) types better because of better matchups against Poison itself? Or because of the abundance of Rock types, I guess. Then they fall off near the end due to Champion-style teams.

littlechandler said:
Lorelei (pokemon levels 53-56) is only 2 ELO points above Gary during the rival battle in the Pokemon Tower (pokemon levels 20-25).

I wonder what Lorelei's ranking would be if Yellow's AI were used (i.e., don't use Rest).

Millky95 · Jun 17, 2020

Tmi489 said:
i wonder what would lorelei's ranking be if yellow's ai was used (aka: dont use rest)

I am also curious to see how much these would change with different match-up orders. It would be interesting to see whether much changes, particularly for those who faced the Oak fights early compared to those who got him later.

I really liked the video and wish I had the coding skills/time to do all the games with better AI. Imagine Gen 2's Red fight.

R_N said:
1000 turns is probably more than sufficient in 99% of cases but also I say give Lass the win by decision just for seemingly being one the bizarre situation where the battle could go 1000 turns but not end in an actual distinct draw since she was the only one 100% capable of bringing down her opponent without being in any danger.

Congrats Lass, you're kind of the Glass Joe of pokemon red.

Anyway this was a fun watch. Obviously curious about how gen 2 would pan out but taking it further I would be fascinated to see how something like XY pans out just because there's such a hard cap on the number of pokemon trainers have and the pokemon are all over the place.

You wouldn't need the 1000-turn limit in Gen 2 because the NPCs have PP (iirc), so you'd get a lot more results once everything starts to Struggle. Though he'd want to make sure the turn counter goes up in Gen 2, because that can cause some issues.

orangeoctober · Jun 17, 2020

That was really interesting, thanks for posting! I hope they continue this series and do the same for the other Generations. I'd love to see how the trainers and gym leaders in the other games stack up as well.

littlechandler · Jun 17, 2020

pimanrules said:
I was actually reading the Smogon forums to decide what the turn limit should be. I saw people saying that some Gen I and II strats lead to 3,000 turn battles, but since I was saving videos of each battle I decided to keep it shorter.

Haha yeah I was joking on that. It makes sense to be consistent with your methodology, I just wanted D-tier Lass to get the moral victory.

Mace · Jun 17, 2020

If this doesn’t deserve a research badge, we should just remove the badge altogether.

cherryb0ng · Jun 18, 2020

This stuff is amazing.

plat27 · Jul 6, 2020

Crazy that Sabrina is so high; I guess it just goes to show how powerful Psychic types were in the original games.

GG Unit · Jul 20, 2020

Lazy_bread27 said:
Crazy that Sabrina is so high, guess it just goes to show how powerful psychic types were in the original games

That plus Sabrina, Koga, and Blaine are designed to be taken on in any order, both in terms of the party levels being similar and in the game not gating any one of them off if you don’t have the others’ badges.

Probably the main Psychic advantage in these simulations is having moves that hit Poison types super effectively (I don’t think there’s even a trainer in the game with Earthquake on anything). That matters because it’s definitely possible for the AI to take a really bad loss against an underleveled Pokemon/team if it happens to randomly choose enough items or non-attacking moves while getting hit once by the right status-inducing move.

That’s probably also why the rival with the Bulbasaur line is ranked higher than the other rivals for most of the game: it’s immune to poison (and to Leech Seed, which is nearly a signature move in Gen I) the entire game, and it can use Leech Seed and/or Poisonpowder in every rival battle except the first and last (which happen to be the only times it does worse than the Charizard/Blastoise rivals). Both the Bulbasaur-line rival and Erika have more draws than their Elo neighbors, and even the Champion’s Venusaur team seems to differ mainly in having at least one loss to Lorelei, who is about 500 Elo points ‘worse’ than the other Champions’ worst loss.

Looking more closely at Lorelei, she was just a weird high-variance outlier: Dewgong gave her the possibility of drawing against much worse trainers if it picked Rest too much (she had fewer losses than anyone other than the Champions and Prof. Oak). Venusaur Blue was likely at the unfortunate nexus of trainers who were both strong enough to beat Dewgong reliably enough to reach the game state where only a win or loss was possible (and she was definitely E4-worthy in terms of win-to-loss ratio) and had an ace Pokémon that was weak to Ice and slower than the Jynx. Or it could’ve just been a one-time fluke in a small sample where everything was getting frozen and choosing dumb moves.

Millky95 · Oct 20, 2021

He made a follow-up video with some bug fixes and with every Pokemon level-scaled to lvl 50. Really good to watch.

Millky95 · Jul 23, 2023

Well, he has gone and done it again, but with Pokemon Crystal. No draws this time, better AI, and nearly 300,000 battles. Worth the wait, imo.
