Applying Game Theory to Pokemon (tl;dr)

McGrrr · Jun 8, 2007

Introduction

Game Theory is the study of decision making by individual agents (players) in response to others in any given scenario (game). It is applicable to all walks of life, so why not Pokemon? The aim of this thread is to give you a new perspective on battling, with the long term goal of educating the masses.

Assumptions

1. Players are rational.
This must be true if you expect to predict accurately.

2. Players aim to maximize their chances of winning.
Otherwise, there is nothing to it.

The Importance of Information

As anyone who has scouted a tournament opponent (or been the victim of scouting) will testify, knowing your opponent's team gives you an undeniable edge. However, scouting a battle can tell you so much more; notably how risk loving or risk averse your opponent is (how often does he switch out of a neutral match up? does he use fire blast over flamethrower? etc). All these tidbits cumulate to provide you with a profile to work with later.

Before a battle, you need to know just how good your opponent is, so you can associate him with a profile; for example weaker players will switch out of immediate dangers, stronger players will account for secondary dangers (your likeliest predictions) and the strongest players will be familiar with all sorts of mind games (I know that you know that I know that you know that I know that you know... etc).

During a battle, you need to figure out as soon as possible whether your profiling is accurate. This applies for even familiar opponents, because nobody battles the same on different days, for any number of exogenous reasons. Further, given that players tend to be more cautious early on (perhaps because they are doing the same thing!), this is more difficult than touted.

Catalog all information that occurs during the battle, and in particular note the turns where you outplay your opponent and vice verse. After a while, the strongest players will develop a sense of what is going on, and press home their advantage. Essentially, a battle amongst the best players is a race to pigeon hole how well their opponent is playing on that particular day (all else held equal), and predict accordingly.

The strongest players are able to adjust their style if they realize that they are being outplayed.

The Choice Band Dilemma

This is a play on The Prisoner's Dilemma, but it is slightly more complicated. Consider the following (not unlikely) scenario:

~Bug catcher Aeolus leads with choice band Heracross with megahorn/close combat/stone edge/pursuit
~Schoolboy Batpig leads with Swampert with earthquake/stone edge/ice beam/stealth rock
~Bug catcher Aeolus is a good battler, schoolboy Batpig is top of his class.
~Neither player knows anything about the other's team.

Clearly the matchup favours Heracross and with a lack of information, we can safely assume Aeolus will not switch. The question is what attack he will use. If Swampert stays, it will certainly use stealth rock. Given this, Aeolus does best with close combat or megahorn. If Swampert switches, neither of these rate to be optimal, as the new pokemon will likely resist one or both. The risk of predicting no switch and Batpig changing to a ghost means megahorn is a better option than close combat.

Given a switch, pursuit will not damage Swampert significantly and more or less forces Aeolus to switch on turn two. Therefore, if Batpig switches, Aeolus does best with stone edge. We have thus concluded that given this scenario, Heracross will only ever use megahorn or stone edge (through similar analysis, you can be reasonably sure that a choice band Tyranitar will use crunch as its first attack etc).

Since Batpig is top of his class, he understands this mechanism and assigns a probability for which attack Aeolus will choose. He knows that if he leaves Swampert in to megahorn, the price could be defeat. On the other hand, if Heracross uses stone edge and Swampert stays in, not only has he laid stealth rock first turn, but Heracross must switch second turn.

Since Aeolus is merely a “good” battler, Batpig decides that Heracross will stone edge 70% of the time (arbitrary example), expecting a switch. If neither player began with a team edge over the other, Batpig needs this play to win 30% more battles than he would otherwise, to break even in the long run and justify staying in.

Decision Tree

In the above scenario, I used words to describe something that is easier to communicate visually. Observe the generalized decision tree of one turn in Pokemon battling (N is a third agent “nature” also known as “luck” i.e. hit/miss, CH? and RNG):

It is important to note that A and B act simultaneously with the same information set (the tree suggests B acts after A). The payoffs are expectations based upon the probability of that outcome. More precisely, there should be 4 options rather than “attack” and 5 options over “switch”, but that would over complicate things.

Independence

Subsequent decisions are more interesting. The outcome of a repeated matchup will not only depend on the situation of the battle, but some causation can be attributed to what has happened before. The good battler tries to be unpredictable by not adhering to patterns, but the best battlers always consider each situation on its merits.

Undoubtedly, you will have gathered more information about your opponent's team, thus changing the probabilities and expected payoffs of every possibility. Mind games really kick in now, and your profiling better be accurate.

Commitment

If you want someone to do something, commit to an action that obliges them to do just that. For example in real life, if you want your girlfriend to go to a football match with you when she would rather watch a teen flick, book the tickets in advance. This incurs an additional cost to not going to the football, so she is more likely to err in your favour.

Similarly, commitment occurs in pokemon with choice items. Once you lock yourself into an attack, you can only continue to use that attack, or switch. Since your attacking option is known, your opponent can only react optimally to that attack. For example, by committing to ice beam for a kill, your opponent is unable to switch to a Salamence who could otherwise dragon dance and sweep.

Comment

These methods may seem mechanical and even long winded, but after accumulating experience, you will find that they become almost second nature. Other battlers may well recognize in themselves everything that I have said, but probably have never thought about battling so explicitly. This is kind of short (honest) because I got fed up with it halfway through. I might add stuff later depending on interest.

Firestorm · Jun 8, 2007

Pokemon, as a turn based strategy game, really does adhere to this more than any other game I play. I have no comments on it, well-written. So, uh, good job? =P Hope people do read all of it, especially newer battlers. Unlike other games, Pokemon does allow you more time to make decisions so people should take advantage of that!

Jumpman16 · Jun 8, 2007

i will read this (everyone else should too!), remind me on irc mcgraw

Treedweller · Jun 8, 2007

Hmm, I don't really have anything to add either, besides the fact that I had never thought of pokemon this way. This certainly sounds like something that will significantly improve my prediction and battling skills, and thanks for writing it.

EFF · Jun 8, 2007

Good read. And yeah, Pokemon makes pretty good use of game theory. Especially now that it's got a worldwide multi-player system built in, it's one of the best strategy multi-player games out there. For anyone who can't convince their parent to get them the game (I'm so glad I outgrew that problem), just tell them this: It's like chess, except you can choose which pieces you want to use, and how they each move. So, it's like super-chess. I've sold it to parents of kids by telling them it's almost educational. It doesn't teach facts, really, but it encourages logical thinking, strategy, planning, and other higher brain functions.

Of course, the flip side is that, especially if you're doing a random battle in PBR, or going up against the computer, you can't always assume the player is rational. You may be playing against an elementary schooler who will keep his Pikachu in against a Steelix because he likes his Pikachu, and besides, Ash's Pikachu beat Brock's Onix. (Also it may have surf.)

Discord · Jun 8, 2007

Absolutely wonderful read. I've thought of such things like this before, but not to the degree in which you explained it, so great job. I'm also in agreeance with EFF. In the case of random battling, the opponent is almost always NOT rational. It will probably be some little kid who got some ideas from the anime. But to EFF, always expect the worst, because if you fight a string of non-rational opponents, and then underestimate your next match, and it so happens that it's McGraw or something, you are in a lot of trouble. Always think negatively and think the oppnent is rational.

Synre · Jun 8, 2007

This wasn't nearly as long as I had been hoping, but that made it a lot less painful to read so I guess the trade-off is acceptable. I'm going to digress from any sort of main point I'm trying to make here a lot, so try and bare with me.

I've definitely thought about all of this before, although I really enjoyed this topic because I've tried to explain this to a couple of my friends who are thinking of picking up Pokemon and I didn't do nearly as good of a job at articulating it as you did here.

More people really need to exploit this line of thought more, I think. I had a 'best of 3' with Skarm, and consecutive best of 3s with TTS, in a tournament a little over a month ago, which made prediction really interesting since for the most part we knew each others teams and how we intended to battle them after the first match. As a whole I did better against both of them (won game 1-2 with skarm, and the second two games of the last best of three against TTS after making a fairly large mistake in the second game of the first best of three where I had an assured win due to 100% accuracy, speed and minimum damage) in the later games of the series, partially because I was intentionally adding variables by changing my lead and the items my Pokemon were holding, but you definitely learn a battler quickly like that. In a normal single battle situation however, thus far in DP it's seemed to matter a bit less than in Advance for me since most of the people I've fought have been running much more offensive teams which limits the amount of turns in a battle, and subsequently how long you get to learn your opponent and react accordingly.

To some extent I think the vein of this logic(especially the decision tree) is severed a bit by what I implied above, though. In addition to teams being more offensive, the offensive Pokemon themselves have become a lot more potent. I can't even count how many people were running fairly defensive teams in Advance OU where they could take very minimal amounts of damage consistently switching to the Pokemon that would take the least amount of damage regardless of the move used, and even being outpredicted with switches nearly every turn, could last a very long time taking virtually no risk. There are a lot of Pokemon in DP(Rhyperior is a good example) that simply don't allow this; even a Pokemon that 'walls' Rhyperior typically can't take consistent damage from it, particularly if it doesn't predict correctly. This, obviously, reduces the payoff for frequent switching a lot(Stealth Rock helps a lot here too). I've been really surprised at how the risk/reward of a lot of decisions has changed in DP, and in general I think it's been for the better.

Matt · Jun 8, 2007

http://en.wikipedia.org/wiki/Minimax

Pokemon isn't "perfect information" until late in the game, though. Still, it should be possible to write a decent AI, as long as you have knowledge of standards/likely movesets. I hope something like that be possible with Ruby / Competitor.

sarunibi · Jun 8, 2007

A really good read! there are a couple of other things that I thought of that could be added, but since this is your thread, I won't steal your fire...

This information is also useful for who to choose for a lead. Good job!

Surgo · Jun 9, 2007

How the hell did this get kicked off the front page, it's better than any thread there.

re tenchi: The problem of course is coming up with a good heuristic. I'm still at a loss as to how to make one to cover everything important in pokemon.

X-Act · Jun 9, 2007

You know, I was thinking around these lines when I was considering creating an intelligent bot that plays Pokemon.

YJiang · Jun 9, 2007

Eh...I think most top battlers do this subconsciously. It's certainly a well thought out article. I enjoyed it very much, actually.

When I battle, if I'm faced with a dilemma such as choosing attacks, I just flip a coin. It's unpredictable and no one can ever pre-gauge me much based on patterns.

Sir Spanky · Jun 9, 2007

This topic is one of those wonderful examples of why Pokemon/a decision making sport, is so much deeper than one would immediately presume. Excellent stuff.

With something to add, I think a small, but often significant factor is how 'sure of themselves' the player may or may not be. Using myself as an example, I often instantly decided what move to make (going along the lines of YJiang's suggestion of subconscious action), and then while I wait for them to make a move, I start to question my actions, click undo, and switch to a different move/Pokemon. It more often than not turns out that my first action would have been by far the more useful one, and I should have just stuck with my gut instinct.

The indecisiveness of a player may therefore affect performance.

EDIT:

YJiang said:
When I battle, if I'm faced with a dilemma such as choosing attacks, I just flip a coin. It's unpredictable and no one can ever pre-gauge me much based on patterns.

But doesn't that go against McGraw's first assumption? Factoring the coin-flip luck factor effectively removes the rational thought process from the decision. You could argue you immediately ruled out Close Combat/Pursuit (to use MG's example) and then used the 50/50, but there is still an instance of irrational outcome here - making the example void :*

pooch · Jun 9, 2007

SirSpanky said:
EDIT:
But doesn't that go against McGraw's first assumption? Factoring the coin-flip luck factor effectively removes the rational thought process from the decision. You could argue you immediately ruled out Close Combat/Pursuit (to use MG's example) and then used the 50/50, but there is still an instance of irrational outcome here - making the example void :*

But this is precisely why he does it - so that his opponent cannot properly predict his actions

Sir Spanky · Jun 9, 2007

I understand why he does it, that is obvious. I am just saying it goes against one of the assumptions McGraw stated...

pooch · Jun 9, 2007

Unless I misunderstand, YJiang's strategy counts as an effective application of game theory as stated because while assuming the rationality of his opponent, he acts in such a way as to render himself (borderline) irrational to his opponent. The assumptions McGraw states are the assumptions necessary for Game Theory to have legitimate application. In this case, the criteria are met.

Dragontamer · Jun 9, 2007

While I don't see how Choice Band relates to the Prisoner's Dilemma at all... it was an interesting read to try and apply this to Game theory :-p

The prisoner's dilemma has a Nash Equilibrium for the worst case for both players. A Nash Equilibrium occurs when both players play the best move possible and can stop "predicting" each other, for both players.

A far more closer example to the Prisoner's dilemma is the use of Double Team + Baton Pass in battles (or using Ubers in standard, or whatever). Both players can certainly use Double Team, but both players arguably will have less fun. However, if the other player is using Double-Team + Baton Pass, then your best choice is to adopt a similar strategy. (And erm... pretend that Aerial Ace and other Double-Team counters don't exist :-p It screws up the explanation)

(assuming Double-team has no counters. Just pretend that it doesn't for now) However, if both players do not use double team, a stronger metagame can occur and both players benefit from a more fun metagame.

I know this debate isn't settled yet either, but it is the closest thing to the Prisoner's Dilemma. I'm not sure if we can apply the Prisoner's Dilemma inside the game. But it certainly is not like Choice Band Dilemma presented here :-p

Nash supposedly proved that Nash Equilibriums exist in all finite games. Pokemon is certainly finite (finite number of PPs, and struggle kills yourself now, so all games will end eventually). I cannot think of a Nash Equilibrium however right now :-p But I love the concept. A position where neither player benefits from changing his strategy.

And just a little note :-p A payoff matrix is more appropriate for Pokemon. Pokemon is a simultaneous game (neither player knows what the other's strategy is), while decision trees are more appropriate for describing sequential games. (Like chess, go. etc. etc. You know the opponent's move before you move)

I think a game theory analysis would be great for Pokemon btw. I just wonder how far it would go. I guess you can say the Normal Form matrix thing would be very very long, not only will it contain every possible team and team arrangement (the team leader _does_ count, but the other 5 can be in any order), it would contain all the possible strategies for every match possible. IE: switching, and so on, so forth.

Far too complex to actually write down, but if we were to limit ourselves to say, 2 pokemon per team, 2 moves per pokemon and other limitations to constrain this massive matrix, we might be able to perform a much easier analysis. :-p

-------------

As for programming, I've always loved the idea of having a genetic algorithm search for a good team. The search space is huge, and if we include EVs and so forth, then we're talking about a number larger than a gogool number of teams available. (6 pokemon out of 493, maybe 200 if you limit to full evolutions. Team leader matters. Items mater. Each pokemon has 4 moves out of maybe 100 or so it can choose from ...). However, the number of pokemon teams IS finite, and genetic algorithms have been applied to extremely large search trees before.

Furthermore, all a genetic algorithm has to do is find a way to evaluate whether one team is better than another. Not even how much better, we just need a "compare" function to say which team is better than another. It doesn't even have to be perfect (and IIRC, Genetic Algorithms degenerate into a hill-climbing algorithm very easily if you have a perfect compare function anyway. Most people throw in an element of randomness... supposedly it is better) And this is easy... except for the fact that we need a good AI to play the games against each other :-p

Or maybe a "simple" min/max tree down to the bottom to see which team _surely_ beats the other team... hmm... maybe I ought to try this :-p

McGrrr · Jun 9, 2007

Yeah, I realize that the choice band example is not like the Prisoner's Dilemma; I was writing something else involving a payoff matrix at first but ended up contradicting myself at some point, so naturally I dropped it, wrote something completely different but kept the original claim!

Part of the reason why I got fed up with this, was because I realized pretty quickly just how ridiculously large the normal form matrix would have to be. Like I mentioned in the original post, the analysis is somewhat short... and perhaps somewhat of a cop-out too! But I think I got the general message across. Thank you for your input, Dragontamer.

chaos · Jun 9, 2007

this thread is great. it shows what smogon is supposed to be about :)

leafgreen386 · Jun 9, 2007

Excellent thread. I think I do a lot of that subconsciously, already, although it was some nice insight. I usually play very mechanically in the beginning of a battle, knowing my opponent will often do the same until we know each other's teams, and after that I focus more on predicting my foe's actions. I'm a little worried about how that's gonna work in dp, as from the looks of things I won't have as long to scope out my foe, but the same applies to my opponent, so I guess it works out.

Akayfortyseven · Jun 9, 2007

I showed this to my parents and friends who think pokemon is childish... It has a whole psychological and mathamatical theory behind it... awesome.

TempestJG · Jun 9, 2007

Discord said:
Absolutely wonderful read. I've thought of such things like this before, but not to the degree in which you explained it, so great job. I'm also in agreeance with EFF. In the case of random battling, the opponent is almost always NOT rational. It will probably be some little kid who got some ideas from the anime. But to EFF, always expect the worst, because if you fight a string of non-rational opponents, and then underestimate your next match, and it so happens that it's McGraw or something, you are in a lot of trouble. Always think negatively and think the oppnent is rational.

you might be able to tell if he's a rational player just by his starting pokemon alone, i.e. he starts with Charizard cuz it "looks cool".

chaos · Jun 9, 2007

If you're playing in a tournament, your opponent is going to be rational regardless of the starting pokemon. Experienced players have used weird pokemon for the surprise factor before

Sir Spanky · Jun 9, 2007

Man that reminds me of Bugsy's Parasect team lol

EnoOn · Jun 9, 2007

tl;dr part 2

A great start, McGraw. It turns out I was having an interesting discussion about this with my g/f today. ;-)

Before I get too far into talking about this, an article I want to throw in that fits into the idea of "profiling your opponent" is from Sirlin (again): http://www.sirlin.net/archive/yomi-layer-3-knowing-mind-of-the-opponent/

I want to expand on the idea of a Nash Equilibrium some more. Dragontamer started to talk about it, but the idea is that, in a finite n-player game, there exists a strategy such that it is optimal. Even better, since the game is finite, and everyone can (theoretically) deduce the same strategy, you can TELL the other person what your strategy is, and, if it is a Nash equilibrium, it doesn't matter. The strategy, however, may be a mixed strategy. This means that you do certain actions with a certain probability.

Consider Rock/Paper/Scissors for a moment. If you win, you get 1 point and if you lose you lose 1 point. There exists a Nash Equilibrium for this game: Throw Paper with probability 1/3 (or: P(paper) = 1/3), P(rock) = 1/3, P(scissors) = 1/3. If I tell you in advance that I'm going to do this, it doesn't matter what you throw. The outcome over infinite trials will be the same: We both score 0.

The score at a Nash Equilibrium does not have to be 0. Consider a game called two-finger morra. In this game, like RPS, player 1 "throws" 1 or 2 fingers out, and so does player 2. If the total is odd, player 1 wins, and gets the total of all the fingers. If the total is even, then player 2 wins, and gets the total of all the fingers. So, the payouts look like this (for player 1)

Code:

[font=Courier New]
                  P1
             1        2
P2   1      -2        3
     2       3       -4
[/font]

Although it looks equal, it turns out that P1 can say "Look P2, I'm gonna throw 1 finger 7/12s of the time, and 2 fingers 5/12s of the time. Do whatever you want." P2 can do whatever he wants, and he will lose an average of 1/12th per throw. (http://www.cs.wisc.edu/~jerryzhu/cs540/gametheory.pdf has more information on this stuff, if you're interested)

"Ok EnoOn, I'm falling asleep, and I want to play Pokemon, not two-finger morra, especially if you're going to roll dice to get random numbers, come on"

okokok

First, Nash tells us that, since there are a finite number of choices in Pokemon, there is an optimal strategy. This applies to team building AND move choice.

Fortunately for our sanity, there are too many choices for us to actually calculate them all, so we still have a game.

We can much more easily talk about choosing moves in an actual game. Let's take an example. You and I are starting a match. You lead with

Salamence @ Choice Specs
Draco Meteor
Dragon Pulse
Flamethrower
Hydro Pump

I lead with (because I suck at this game)
Blissey @ Leftovers
Ice Beam
Seismic Toss
Thunder Wave
Softboiled

Let's also say that we've played a lot, and we both completely know these movesets (to simplify my calculations, thanks~)

Now, let's say you have a Ludicolo you could switch in which is an excellent counter for my Blissey, and I have a Heracross that is an excellent counter for your Ludicolo, and (to simplify things a little more) we're not likely to do anything else besides use a move and switch to these.

Let's say that you are player 1, and I am player 2, and let's also say we can assign "scores" to each of these moves. Positive is better for P1, Negative is better for P2.

Code:

                        Blissey           
                 IB      ST       TW       SB    Hera
          DM    -30     -15      -12      -10     50
          DP    -35     -10      -10        0     25
Salamence FT    -40     -10      -10        0     50
          HP    -40     -10      -10        0     30
          Ludi    5       8        5       20    -10

Now, I'm really not trying to have a fight about what these numbers should be, they're completely arbitrary (and, actually, the problem of coming up with these numbers accurately is exactly what Surgo is talking about about the difficulty of coming up with a good heuristic... I'll talk a little more about this in a minute). What I'm trying to say is that, you can represent how good a choice is, against another person's choice.

It turns out that this game can be reduced some due to some strictly dominated decisions. It turns out that Blissey should really only Ice Beam or switch to Heracross (and the other three choices are strictly worse than one of those), and Salamence should only Draco Meteor (to catch Hera on the Switch) or switch to Ludicolo (to catch Blissey).

If you solve this system (and if you really want to learn how to do this, do some research on the web, although http://people.hofstra.edu/Stefan_Waner/gametheory/games.html has a nice utility which I used), it turns out I should IB about 63% of the time, and switch about 37% of the time. Meanwhile, you should DM about 16%, and switch to Ludicolo about 85%.

Now, where the profiling gets really interesting is, if you KNOW I am going to IB every time, of course you should switch to Ludicolo. But it turns out (if those values are right) that if you know I will IB ANY MORE than 63% of the time, you should ALSO always switch. Why?

Consider the rock paper scissors again. Let's say I tell you "You know, I know about that optimal strategy. But I really like rock. So I'm going to throw rock 50%, and paper and scissors 25% of the time." Your best strategy? Always throw paper. If you throw paper any less than 100% of the time, you don't fully take advantage of the way I'm playing sub-optimally.

So, if you know that your opponent is going to play suboptimally, you can always take advantage. Of course, if they notice, they can take advantage of your now sub-optimal play, and... so on.

Of course, all of this depends on knowing what the optimal play is! And it's really too complex to know. But, knowing how the math of all of this works can help you towards what optimal is.

One final point, to Surgo: I'm working on such an AI now. If you're interested in a possible heuristic method (or if any of you are really into game AI) check out UCT and Monte Carlo based planning. (http://zaphod.aml.sztaki.hu/papers/ecml06.pdf is a good academic paper on it) People are using it in Go right now. The high-level idea is that, if after a set of moves, you can see how good the position is by randomly playing a number of games, and you can do even better if you keep some information about which moves are better later in the game tree. I'm not going to go into it here, it's a bit beyond the scope of this discussion.

Whew! I'm gonna go level my Cresselia.

Applying Game Theory to Pokemon (tl;dr)

McGrrr

Facetious

Firestorm

I did my best, I have no regrets!

Jumpman16

np: Michael Jackson - "Mon in the Mirror" (DW mix)

Treedweller

EFF

Discord

Synre

Matt

Maybe monads might not matter

sarunibi

Surgo

goes to eleven

X-Act

np: Biffy Clyro - Shock Shock

YJiang

Sir Spanky

pooch

Sir Spanky

pooch

Dragontamer

McGrrr

Facetious

chaos

Tournament Banned

leafgreen386

Akayfortyseven

TempestJG

chaos

Tournament Banned

Sir Spanky

EnoOn