Applying Game Theory to Pokemon (tl;dr)

McGrrr

Facetious
is a Contributor Alumnus
Excellent reply, I enjoyed reading that. I merely hinted at mixed strategies, but you explained them explicitly (something I was loathe and too lazy to do, so kudos). I did not expect much genuine/useful feedback from this thread, but I am pleasantly surprised.

Somehow I lured out two random economists with 3 posts each, from NOWHERE.
 

Hipmonlee

Have a nice day
is a Super Moderator Alumnusis a Live Chat Contributor Alumnusis a Tiering Contributor Alumnusis a Top Contributor Alumnusis a Battle Simulator Moderator Alumnusis a Past WCoP Champion
I would like to add that if mcgraw leads with CB heracross, and you lead with Swampert, you probably want to switch in Lucario.

Have a nice day.
 
tl;dr part 3

LOL at luring economists out of nowhere. ;-)

I'm just a long-time lurker. I don't know much about actual competitive pokemon, but I know a little bit about this, and Smogon is amazing for battling information.

One thing I noticed in a second-readover of the thread, probably not interesting to non-programmers, sorry:

As for programming, I've always loved the idea of having a genetic algorithm search for a good team. The search space is huge, and if we include EVs and so forth, then we're talking about a number larger than a gogool number of teams available. (6 pokemon out of 493, maybe 200 if you limit to full evolutions. Team leader matters. Items mater. Each pokemon has 4 moves out of maybe 100 or so it can choose from ...). However, the number of pokemon teams IS finite, and genetic algorithms have been applied to extremely large search trees before.
Genetic algorithm is definitely the way to go on this, IMO, and it's what I plan to do. The problem is, genetic algorithms also have a tendency to fall into local maxima if you don't very carefully control how mutation and generational crossover work.

My plan is to have two different GAs running, one for 1 vs. 1, and one for 6 vs. 6, and have them feed each other, as well as for creating "counter teams" (which should also be "good"). We'll see how it goes.

Furthermore, all a genetic algorithm has to do is find a way to evaluate whether one team is better than another. Not even how much better, we just need a "compare" function to say which team is better than another. It doesn't even have to be perfect (and IIRC, Genetic Algorithms degenerate into a hill-climbing algorithm very easily if you have a perfect compare function anyway. Most people throw in an element of randomness... supposedly it is better) And this is easy... except for the fact that we need a good AI to play the games against each other :-p
This is not exactly right, because of the problem of local maxima, again. Genetic algorithms work well because of the ability to bump out of these local maxima.

Or maybe a "simple" min/max tree down to the bottom to see which team _surely_ beats the other team... hmm... maybe I ought to try this :-p
Unfortunately, because of the randomness aspect, the min/max tree is too large, even with various pruning methods.

Consider Salamence DM vs. Blissey IB. Neither are (without CH) OHKOs. IIRC, the damage formula has a 237+Rand( 0-18 )8)/255 at the end of it. Also, IB has 2 other possibilities (freeze or not). Also, either one could CH. So you have 19 (DM) * 19 (IB) * 2 (freeze) * 2 (DM CH) * 2 (IB CH) * 2 (does DM hit?) = 5876 possibilities for just one move. And if you want to create an accurate min-max tree, you have to account for all of the possibilities, because even small changes can impact how later parts of the tree look. And there are many other possibilities (quick claw? what if both moves have chance factors like DM did?)

Then, FORGET about all of that, and let's assume the game is completely perfect information, and we know in advance how much moves will hit for, etc. Each player has at least 4 choices every move (not thinking about running out of PP). This could be more like 9 (switching to one of 5 other pokes) or 14 (+ baton pass). But even if it's only 4, and an average match goes 20 turns, the tree has something like 1.2 * 10^24 nodes, which by itself is way too many.

Nope... you're stuck with either heuristic evaluation at nodes, or some other method of heuristically evaluating moves.

FFS, I need to stop typing. ;-)
 
Somehow I lured out two random economists with 3 posts each, from NOWHERE.
Ahem. Computer Scientist. :-p Economy? Say what? Lol.

Genetic algorithm is definitely the way to go on this, IMO, and it's what I plan to do. The problem is, genetic algorithms also have a tendency to fall into local maxima if you don't very carefully control how mutation and generational crossover work.
Exactly. As stated before, you don't want GAs to run as a hill-climber. I know there are hill-climbing algorithms combined with GAs. (usually, when a global optimum is "found", a specialized hill climber is used to reach the top of the local maxima). But I don't know that much.

My plan is to have two different GAs running, one for 1 vs. 1, and one for 6 vs. 6, and have them feed each other, as well as for creating "counter teams" (which should also be "good"). We'll see how it goes.

This is not exactly right, because of the problem of local maxima, again. Genetic algorithms work well because of the ability to bump out of these local maxima.

Unfortunately, because of the randomness aspect, the min/max tree is too large, even with various pruning methods.

Consider Salamence DM vs. Blissey IB. Neither are (without CH) OHKOs. IIRC, the damage formula has a 237+Rand( 0-18 )8)/255 at the end of it. Also, IB has 2 other possibilities (freeze or not). Also, either one could CH. So you have 19 (DM) * 19 (IB) * 2 (freeze) * 2 (DM CH) * 2 (IB CH) * 2 (does DM hit?) = 5876 possibilities for just one move. And if you want to create an accurate min-max tree, you have to account for all of the possibilities, because even small changes can impact how later parts of the tree look. And there are many other possibilities (quick claw? what if both moves have chance factors like DM did?)

Then, FORGET about all of that, and let's assume the game is completely perfect information, and we know in advance how much moves will hit for, etc. Each player has at least 4 choices every move (not thinking about running out of PP). This could be more like 9 (switching to one of 5 other pokes) or 14 (+ baton pass). But even if it's only 4, and an average match goes 20 turns, the tree has something like 1.2 * 10^24 nodes, which by itself is way too many.

Nope... you're stuck with either heuristic evaluation at nodes, or some other method of heuristically evaluating moves.

FFS, I need to stop typing. ;-)
A heuristical search will greatly improve this however. Choosing the most damaging move at a given time first for example, should greatly reduce the number of nodes in the end-game. And while this is certainly a large search tree, there are much larger trees where computers do better than humans do. Chess has a search tree that is some 80 magnitudes larger than the Pokemon search tree for example, and Chess machines do much better than humans.

Reversi is on the order of 10^28 (4 magnitudes higher than your estimation). And no human can ever beat an expert computer at Reversi. So right there, we know that while this search tree is massive, it is certainly possible to create a decent AI.

Furthermore, Genetic Algorithms don't need a perfect AI during evaluation stages. With imperfect AIs, there is a good deal of "noise" that knocks the GAs out of local maximums.

Did you read Blondie24: Playing at the Edge of AI? They used genetic algorithms to search for a checkers ai. Not for a team, but for the weights to a Neural Network checker's AI evaluator.

Frankly, I think that a search tree on 6 pokemon is much easier than that. Yes, we are still talking about maybe 10^50+ pokemon teams avaliable (pulled that number from my ass btw). But searching for all the weights of a neural network is certainly a much higher search space, with a huge order of dimensions, with even more local maximums.

EDIT: And I do believe there are significantly more heuristics to Pokemon than in Chess, Checkers and other more complicated games that Comp Scientists have already made ultra-AIs for. Swords dance is equal to attacking if you are going to attack once. Dragon Dance is equal to attacking if you are going to attack 2 times afterward. (course, +speed helps you get that 2rd attack in)

I'm almost certain that a Pokemon AI with arbitrary teams is feasible, just as Chess AIs with arbitrary board positions are certainly possible while they beat all but the most expert of experts of humans.
 
There is no reason not to lever examples of human expertise to build an AI. For instance, when competitor is out and after it's been used for a couple months, there should be heaps of examples of teams along with their effectiveness in battle, the exact moves used along with their context, etc. From that information, it should be possible to train a fairly good predictor (using standard neural nets, multiclass svms, etc). Given a context (your pokemon, the opposing pokemon, the rest of your roster, what you know of the opponent's roster, those pokemon's health, exact or inferred ranges, the opponent's identity, etc.), it would yield a probability distribution over all possible actions.

When you have that distribution, there's a simple yet efficient heuristic to choose your next action: from every possible action you can do, sample an action from your opponent (using the distribution given by your predictor) and calculate the outcome. From that outcome and every possible action you can take, sample the opponent's, and so on until a given depth. For each possible action that will get you an estimate of the reward, but that reward is probabilistic. Therefore, you resample over and over again. The more you sample, the more precise the average reward and variance for that reward will be. Hopefully we don't need to sample a million times... anyway, once you obtain a good estimate of the reward for each action you can take, you can normalize them to obtain probabilities and sample your final action from the resulting distribution (or you can take the max, but it's good to explore the battle space). To evaluate the reward, you can train yet another predictor that tries to predict how probable it is that you will win if the battle is in a certain state, again using a large enough database of battles. Or a handmade heuristic.

The probiem is thus to build a good predictor (and adjust it well during battle). With a sufficient amount of data, I think it would be feasible (could competitor eventually get us a million battles to train with?). Of course, the result would be an AI specifically tailored to beat human experts but that would already be quite a feat.

I'm not sure how genetic algorithms would fare. I already used them to train a Tetris AI with moderate success (of course, I had to invest a lot of thought on what representation to use, which was weights for a cost function). But anyway, I suspect it would be quite long to train. It would help a lot to initialize it with good teams and see if it can improve on them, though that wouldn't leave much room for innovation (not for a while).
 
There is no reason not to lever examples of human expertise to build an AI. For instance, when competitor is out and after it's been used for a couple months, there should be heaps of examples of teams along with their effectiveness in battle, the exact moves used along with their context, etc. From that information, it should be possible to train a fairly good predictor (using standard neural nets, multiclass svms, etc). Given a context (your pokemon, the opposing pokemon, the rest of your roster, what you know of the opponent's roster, those pokemon's health, exact or inferred ranges, the opponent's identity, etc.), it would yield a probability distribution over all possible actions.
Well... competitor is not out yet and there is no database of human expertise that I know of for D/P :-p Although, I'm curious as to how you'd represent this in Neural Networks. I can see how to do pokemon type... but then making an input neuron for every attack that the opponent has or could have doesn't seem like it would work.

The probiem is thus to build a good predictor (and adjust it well during battle). With a sufficient amount of data, I think it would be feasible (could competitor eventually get us a million battles to train with?). Of course, the result would be an AI specifically tailored to beat human experts but that would already be quite a feat.
I would say that finding the optimum probability distribution and then making a random choice based on that (basically, learn and generate a strategy on the fly, perhaps based on some initial heuristics) would be good enough.

I'm not sure how genetic algorithms would fare. I already used them to train a Tetris AI with moderate success (of course, I had to invest a lot of thought on what representation to use, which was weights for a cost function). But anyway, I suspect it would be quite long to train. It would help a lot to initialize it with good teams and see if it can improve on them, though that wouldn't leave much room for innovation (not for a while).
I'm not sure if GAs would work for creating an AI, but it certainly works for creating a team, assuming we have a good game-based AI already. For the AI, standard prediction methods (Q-Learning, statistics, etc. etc.) would probably learn the human's pattern much faster than GAs or Neural Nets. Q-Learning especially applies to this thread, although I admit to not knowing too much about it myself.

Hell, maybe just simple min/max would work out well. Pokemon is zero-sum afterall. I do believe min/max has been applied to backgammon... I'm not sure how it works out when actions have probabilities assigned to them but someone did manage to do it.
 
Well... competitor is not out yet and there is no database of human expertise that I know of for D/P :-p
Something that can simulate battles, such as competitor, is needed anyway.

Although, I'm curious as to how you'd represent this in Neural Networks. I can see how to do pokemon type... but then making an input neuron for every attack that the opponent has or could have doesn't seem like it would work.
As input, you would have the relevant information your opponent has about you: pokemon species, hp left, status condition, four move slots, six evs, one nature and potentially the same information for each of your five other pokemon. Then you have information like the presence or absence of spikes, leech seed, mean look, etc. In total, we're looking at about 200 or 300 raw inputs. This said, the species of pokemon should not be encoded as an integer or real but as a one-hot vector, so that one input alone would use 500 input neurons - same for moves. That's still pretty large, but there exist several techniques that can reduce the dimension of the input (PCA's worth a shot). You can also train an auto-encoder and use the weights to initialize the first layer of a multilayer network (an auto-encoder tries to reproduce its own input through a hidden layer that is usually smaller than the input, effectively extracting a high-level representation of the useful information).

As output, you would have each possible action: switch to any of the 493 pokemon or use any of the 400 or 450ish moves. So the output layer would use a softmax over about 1000 neurons.

I don't know how many hidden layers to use and how large they would be.

I'm not saying it would necessarily work, but it's worth a try.

I would say that finding the optimum probability distribution and then making a random choice based on that (basically, learn and generate a strategy on the fly, perhaps based on some initial heuristics) would be good enough.
It's easier to find the optimum distribution if you have a good one to start with.

I'm not sure if GAs would work for creating an AI, but it certainly works for creating a team, assuming we have a good game-based AI already.
It's worth a try. Still long.

For the AI, standard prediction methods (Q-Learning, statistics, etc. etc.) would probably learn the human's pattern much faster than GAs or Neural Nets. Q-Learning especially applies to this thread, although I admit to not knowing too much about it myself.
I don't know that much about Q-Learning, but first if you have to play against humans that becomes a bottleneck in itself and second at first sight it seems like Q-Learning could suffer from the size of the state space and the action space. I don't think it can generalize very well to situations it has never seen, hence the need to reduce the state space, but doing so helps pretty much all algorithms too...

Hell, maybe just simple min/max would work out well. Pokemon is zero-sum afterall. I do believe min/max has been applied to backgammon... I'm not sure how it works out when actions have probabilities assigned to them but someone did manage to do it.
It's worth a try. Well, many things are worth a try. It'd be good if someone tried.
 
Sorry to butt in, but EnoOn, do you think my "strategy" of facing a dilemma has merit? If I'm not sure weather to say, attack or switch out, I'll just randomly (flip a coin) decide, thus removing any advantage my opponent has of assuming I'm a rational thinker. Any analysis he may be using of my sub-optimal playing is negated by chance.
 
Just want to say, very nice read. I know a little bit about Game Theory, but it was nice seeing someone apply it to Pokemon like this.
 

Aeolus

Bag
is a Top Tutor Alumnusis a Tournament Director Alumnusis a Site Staff Alumnusis a Battle Simulator Admin Alumnusis a Live Chat Contributor Alumnusis a Tiering Contributor Alumnusis a Top Contributor Alumnusis an Administrator Alumnus
Since Aeolus is merely a “good” battler
pish posh... this thread is clearly bullshit.

Just kidding, nice read even if I might differ on a few of the more minor points.

edit: now that I get a chance to read the whole thread... I see that you didn´t really mean to equate it to prisoner´s dilemma. No complaints anymore!

Really enjoyed the imput of the others who are familiar with this stuff. Nice job guys.
 
Reversi is on the order of 10^28 (4 magnitudes higher than your estimation). And no human can ever beat an expert computer at Reversi. So right there, we know that while this search tree is massive, it is certainly possible to create a decent AI.
There's a really big difference. First, Reversi (and checkers/chess/etc) are games of perfect information. Pokemon is not. As such, you have to deal with far more than the 4/9/14 moves that are possible each term. This doesn't even begin to get into chance nodes, etc., which is another difficulty.

Having said this, I think that a heuristic approach is possible. The biggest problem is dealing with situations that you have to be able to look really far ahead to be able to deal with. It's not all about who does the most damage. It has to be able to see far ahead to really see the effect of, for example, status effects, or boost moves.

All I was really saying was "Getting to the bottom of the tree and doing pure minimax is not a realistic expectation"

I'm also really skeptical about just looking at damaging moves, etc.

Furthermore, Genetic Algorithms don't need a perfect AI during evaluation stages. With imperfect AIs, there is a good deal of "noise" that knocks the GAs out of local maximums.
Yes. But of course, the better your fitness function is, the better the GA performs.

Did you read Blondie24: Playing at the Edge of AI? They used genetic algorithms to search for a checkers ai. Not for a team, but for the weights to a Neural Network checker's AI evaluator.
No, I hadn't. I do get the idea of using a neural network of choosing moves. I'm hesitant... I'm not saying this is also not possible, because I think it is. It just won't be my first approach. =)

I did read One Jump Ahead, which is by the author of Chinook (who is basically the world champion checkers program) They use variations of alpha-beta pruning, and an endgame database (endgame databases don't really come over well to pokemon. ^_^;)

Well, in any case, I'm not saying it's not possible. I believe it is in some way, or I wouldn't be working on it. I just don't think it's going to be as easy as "plug in basic min-max".

A couple other just quick notes:

I don't know anything about Q-learning either. I'll be reading up on it.

YJiang: Using randomness is great, I think, because it means you're not exploitable. I don't think all situations are 50-50 situations, though, so don't limit yourself there. Take the example I had with Blissey/Salamence. Flipping the coin with Blissey is clearly the wrong answer, because it's so far from optimal.

Overall, I think it would be great if 3 good bots were ready when competitor came out. ^_^ Let's do it.
 
Having said this, I think that a heuristic approach is possible. The biggest problem is dealing with situations that you have to be able to look really far ahead to be able to deal with. It's not all about who does the most damage. It has to be able to see far ahead to really see the effect of, for example, status effects, or boost moves.
That's why I think it'd be cool to learn or calculate a reward or penalty for stat boosts, status effects, etc. from examples of real battles. For example, you can calculate a pokemon's effectiveness by the amount of damage it dealt and/or the amount of pokemon it killed and/or how many turns it lasted. Comparing a paralyzed pokemon's effectiveness to a healthy one's should give you a measure of how bad it is for that pokemon to be paralyzed. That way, you get a zero-depth estimate of how good or bad your situation is and you can avoid to make the search tree too deep.

All I was really saying was "Getting to the bottom of the tree and doing pure minimax is not a realistic expectation"

I'm also really skeptical about just looking at damaging moves, etc.
I agree on both counts.

No, I hadn't. I do get the idea of using a neural network of choosing moves. I'm hesitant... I'm not saying this is also not possible, because I think it is. It just won't be my first approach. =)
I'm in a machine learning lab so it would be my first approach ;)

Overall, I think it would be great if 3 good bots were ready when competitor came out. ^_^ Let's do it.
I don't see how these could be done without competitor to test them.
 
Man that reminds me of Bugsy's Parasect team lol
or some of your UU teams :P

But actually, when it comes down to crunch time, i actually tend to take my time and write out models like the one posted. Of course, i was never the best RSE battler, simply because i never really got around to knowing Damage Calcs and best movesets, speed, HP, and such off of the top of my head :S

But it was allways fun to see BBM go "wtf VD predicted"
 
There's a really big difference. First, Reversi (and checkers/chess/etc) are games of perfect information. Pokemon is not. As such, you have to deal with far more than the 4/9/14 moves that are possible each term. This doesn't even begin to get into chance nodes, etc., which is another difficulty.
Yes. I looked it up actually, and min-max has NOT been successfully applied to Backgammon for the reasons you stated (contrary to my previous assertion), even though it has a smaller search space. Pokemon indeed is a very complex game.

Having said this, I think that a heuristic approach is possible. The biggest problem is dealing with situations that you have to be able to look really far ahead to be able to deal with. It's not all about who does the most damage. It has to be able to see far ahead to really see the effect of, for example, status effects, or boost moves.
Yes, true. But chess takes massive heuristics that have nothing to do with the "real" game. Many successful chess AIs simply count the number of pieces on the board for example. (another one is counting how many squares of control you control).

The heuristic of "doing the most damage" is going to be the most important one to a Pokemon AI. It wipes out choices like Tackle vs Body Slam for example. Furthermore, a Pokemon AI can calculate Ice Beam vs Blizzard on the fly and appropriately choose which is better. I'm not talking about damage "averages" calculations we do on the boards, I'm talking like the probability of winning a battle. 70% chance of hitting once implies a 91% chance of hitting within 2 turns. If Blizzard OHKOs, but Ice Beam takes 2 hits, 70% of the time, Blizzard would be better, 21% of the time they would be equal, and 9% of the time Ice Beam would be better. The choice is obvious in this limited setting. And a computer can actually run these calculations on the fly. (even with more complicated probabilities. Like: if Blizzard has 50% chance of OHKO for example)

Yes. But of course, the better your fitness function is, the better the GA performs.
I'm not 100% sure on that claim. Certainly, the better the GA segregates good from bad. Anyway, anything GAs can do a simple random hill-climber can do probably better. So GAs are really just used to get out of local maximums and search for the global maximum. When global maximums are found, a hill-climber probably can take over to optimize.

Most likely, it would appear that the most ambitious thing to do is to somehow make a GA to learn a team generator, instead of just a normal team. Even if it were as simple as assigning probabilities to specific teams (choose "Standard team 1" 20% of the time, choose "Standard team 2" 30% of the time, etc. etc.) that would be a very ambitious project.

I did read One Jump Ahead, which is by the author of Chinook (who is basically the world champion checkers program) They use variations of alpha-beta pruning, and an endgame database (endgame databases don't really come over well to pokemon. ^_^
Really amazing. If you get down to 5 or so checkers... the Checkers program plays perfectly. I do think however, that this applies quite well to Pokemon.

In the endgame pokemon, you know the opponent's move sets as well as their last pokemon, at least, some of the time. If you know a certain endgame condition is not going to go favorably, the AI may opt to try and explore the unknown instead of facing certain doom.

Well, in any case, I'm not saying it's not possible. I believe it is in some way, or I wouldn't be working on it. I just don't think it's going to be as easy as "plug in basic min-max".
Well, it is always a good idea to think about the most simple solution first, before throwing things like Genetic Algorithms evolving a set of weights for a Neural Network onto the table :-p

EDIT: Furthermore, a min-max expansion would be necessary in a good bot anyway, to look ahead and see "into the future". Things like truant are great not because of the damage they do, but because they reduce the amount of damage that the opponent can deal to you. Stuff like that would be picked up in the min-max tree, even if it only expanded ahead to 3 or 4 moves.

Overall, I think it would be great if 3 good bots were ready when competitor came out. ^_^ Let's do it.
Hell, just one bot would be godly. Forget 3 bots, a single good expert-level bot would be great. Just as long as we aren't too successful, we should be fine. :-) Perfect bots made chess much less appealing to the public so... yarr...
 

X-Act

np: Biffy Clyro - Shock Shock
is a Site Staff Alumnusis a Programmer Alumnusis a Live Chat Contributor Alumnusis a Top Researcher Alumnusis a Top CAP Contributor Alumnusis a Tiering Contributor Alumnusis a Top Contributor Alumnusis a Smogon Media Contributor Alumnusis an Administrator Alumnus
What about making an AI bot modify its team as well? That would be much more complicated, don't you think?

I was considering making an AI bot for Pokemon, as I said in my previous thread. However, after pondering about it, my way of doing it was going to be really different from Nash's methods: I was going to try to make the computer learn from its mistakes. The computer would start as a complete idiot, and slowly learn from its mistakes until it improves its game.
 
I don't see how these could be done without competitor to test them.
Well, I think that, realistically, you're going to have to have some function that knows the rules of pokemon anyway. I suppose it's possible to do it completely through learning, and observing others games, but I know for the method I'm using I'm going to have to write most of a pokemon rules simulator anyway. So I don't really need competitor to start... competitor will be great for finding where I have problems when the AI gets better than me (which isn't that hard, I'm only average at best)

I'm not 100% sure on that claim. Certainly, the better the GA segregates good from bad. Anyway, anything GAs can do a simple random hill-climber can do probably better. So GAs are really just used to get out of local maximums and search for the global maximum. When global maximums are found, a hill-climber probably can take over to optimize.
The problem is, in a state-space like Pokemon, there is not a way to ensure that what you're climbing into is a global maximum, without also searching the entire state-space.

Hill climbing, by itself, will always head straight for whatever maximum is nearest its starting point, whether it is global or local. People get around this with restarts, but hill climbing by itself does not (and cannot) differentiate between different types of maxima.

Additionally, talking about the "goodness" of an evaluation function, both hill climbing and GA depend on the fitness function being accurate. If the fitness function is not accurate, a lot of stuff that is actually not good will be considered good. Consider just some 1D hill climbing application, trying to find a maximum of some function f(x). If the hill climbing application can't accurately determine what f'(x) is, it will climb inaccurately, and may not even eventually settle at any maximum.

Hell, just one bot would be godly. Forget 3 bots, a single good expert-level bot would be great. Just as long as we aren't too successful, we should be fine. :-) Perfect bots made chess much less appealing to the public so... yarr...
I'm not too worried about this yet. I know the problem you're talking about, though...

In chess, there was an idea of people playing together with computers, to create matches and games that were better than the humans or the computers could do alone. I, for one, would love to see, eventually, an AI bot that would help create teams, or be able to show why certain moves are not as good as others... and both of these are a much harder problem than getting the computer to do well in the first place.

But, baby steps first. ^_^ I think it's going to be hard to make an expert level bot in the first place, much less one that blows everyone out of the water.
 
What about making an AI bot modify its team as well? That would be much more complicated, don't you think?

I was considering making an AI bot for Pokemon, as I said in my previous thread. However, after pondering about it, my way of doing it was going to be really different from Nash's methods: I was going to try to make the computer learn from its mistakes. The computer would start as a complete idiot, and slowly learn from its mistakes until it improves its game.
This is more like Q-learning/reinforcement learning (and I think you can also do things like this with normal neural nets, maybe Brain will have more to say on this?)

As far as modifying its team, I think there are a number of interesting ways to do it. The way I'm working on, every time the genetic algorithm finishes a generation, the best team for that generation will become its new team, so it changes that way. You could also consider the team that just beat you, or consider counters to the thing that just beat you. There are a number of possible ways (and they all can be somewhat complicated. ^_^)
 
What about making an AI bot modify its team as well? That would be much more complicated, don't you think?

I was considering making an AI bot for Pokemon, as I said in my previous thread. However, after pondering about it, my way of doing it was going to be really different from Nash's methods: I was going to try to make the computer learn from its mistakes. The computer would start as a complete idiot, and slowly learn from its mistakes until it improves its game.
I think that might be very complicated because then it would be constantly changing, unless it's made to do that only when it loses by a greater margin; a smaller margin could probably make it to just store that information and call it up so it doesn't repeat that same small mistake...

Also, I now think that the bot would choose a move by calling up all of the possible moves and choosing the most statistically safe option and if it does not work, then it would eliminate that option.
 
Well, I think that, realistically, you're going to have to have some function that knows the rules of pokemon anyway. I suppose it's possible to do it completely through learning, and observing others games, but I know for the method I'm using I'm going to have to write most of a pokemon rules simulator anyway. So I don't really need competitor to start... competitor will be great for finding where I have problems when the AI gets better than me (which isn't that hard, I'm only average at best)
Ideally, competitor should have an API suitable to build AIs. Like taking a snapshot of the current game, simulating a few turns and reverting back. If there's no such feature yet I guess I'll have to pressure.

The problem is, in a state-space like Pokemon, there is not a way to ensure that what you're climbing into is a global maximum, without also searching the entire state-space.

Hill climbing, by itself, will always head straight for whatever maximum is nearest its starting point, whether it is global or local. People get around this with restarts, but hill climbing by itself does not (and cannot) differentiate between different types of maxima.
Take a look at simulated annealing. It's guaranteed to fall into a global maximum at the limit of infinite time. It can be slow, though.

Additionally, talking about the "goodness" of an evaluation function, both hill climbing and GA depend on the fitness function being accurate. If the fitness function is not accurate, a lot of stuff that is actually not good will be considered good. Consider just some 1D hill climbing application, trying to find a maximum of some function f(x). If the hill climbing application can't accurately determine what f'(x) is, it will climb inaccurately, and may not even eventually settle at any maximum.
There can be advantages to use "inaccurate" fitness functions. To take a dumb but obvious example, if you use a convex function which has only one maximum and that it's near the real function's global maximum, you'll find it a lot quicker. Some algorithms, such as EM (Expectation Maximization) use a cost function that's not exactly the same as the real function because it's easier to optimize (though in that case I think there's a guarantee about where the maximum will be so it converges to the right thing). But basically it can be a good idea to have a simpler, less accurate fitness function if it is smoother than the real one. In any case, if your function has a shitton of local maxima, it'll be much faster to get near the interesting regions if you use a more stable function at first.
 
As far as modifying its team, I think there are a number of interesting ways to do it. The way I'm working on, every time the genetic algorithm finishes a generation, the best team for that generation will become its new team, so it changes that way. You could also consider the team that just beat you, or consider counters to the thing that just beat you. There are a number of possible ways (and they all can be somewhat complicated. ^_^)
Ah, the elitist GA strategy, right? IIRC, this GA strategy has problems with local maximums/minimums. Which is why people prefer to add noise into the equation, even if it means losing the current best team in the short term. Which goes into why I'm saying inaccuracy in the evaluation formula can actually work to the GA's advantage.

There can be advantages to use "inaccurate" fitness functions. To take a dumb but obvious example, if you use a convex function which has only one maximum and that it's near the real function's global maximum, you'll find it a lot quicker. Some algorithms, such as EM (Expectation Maximization) use a cost function that's not exactly the same as the real function because it's easier to optimize (though in that case I think there's a guarantee about where the maximum will be so it converges to the right thing). But basically it can be a good idea to have a simpler, less accurate fitness function if it is smoother than the real one. In any case, if your function has a shitton of local maxima, it'll be much faster to get near the interesting regions if you use a more stable function at first.
If only we had that formula for pokemon teams :-p I certainly agree with your point, I'd just like to expand this assertion even further.

Lets assume we had a perfect team evaluation function, that can generate the exact ranking of all teams. (And if you are smart enough to think that there is no "best" team because of the inherent nature of rock-paper-scissors in this game, then pretend that I'm saying "best team generation strategy" instead. "Best team" is just much easier to say and conceptualize). It goes without saying that there are local maximums in pokemon team building strategies. Theme based teams ARE by definition local maximums. Whether they are the global maximum or not is another difficult question :-p

But either way, it is obvious that:
1. The "perfect" evaluation function of Pokemon teams have a "shit-ton" of local maximums
2. As we still don't know what the best team is, the inaccurate "convex" that Brain theorized about, while a good example for his argument, is too difficult to find in Pokemon anyway.
3. This "perfect" pokemon team evaluator probably exists mathematically, but is beyond the skill of today's programmers and computers. But it exists.

So, lets first assume we have this perfect team evaluator. If we run a hill-climber on it, it will certainly reach one of these local maximums, be it a baton-pass team, or a rain dance team, or even the "standard" local maximum. (T-Tar, Skarm/Bliss, whatever)

To "knock it out" of this local maximum, you have to be non-elitist. You must be willing to sacrifice the current known maximum and start the search over again elsewhere. Because of this, we have probabilistic methods of finding children in GAs. But lets say we are just using a simple hill-climber :-p

Anyway, so we run the hill-climber, reach a local maximum, and how do we try to get out of this local maximum? The easiest way that I've found is to simply introduce "noise" into the evaluator. That is, just add a random number to the evaluation function. Make it imperfect. So lets say this perfect evaluator is a function that converts "pokemon teams" to "integers". All you would do to add noise is simply add a random integer to the evaluation function.

So if PE == perfect evaluator. And PE(team) == the team's "global ranking", to solve this problem of getting out of a local maximum, all you have to do is do PE(team) + random_int. The larger the random integer, the easier you get out of local maximums... but then again, it takes longer to find these maximums as well :-p

Furthermore, when a hill climber runs on a PE + noise, it has a much higher chance of staying at global maximums than at local maximums. Entropy works with the hill climber, not against it.

Now lets take a step back to reality. Where does this put us? It simply states that if we are going to use a hill-climbing procedure (which MOST of these search methods really are, a GA is really just a massively parallel hill-climber), then a perfect evaluation function is not what we want. We want something close to the perfect evaluation function with a good bit of noise so that we can get knocked out of local maximums and find the true global maximum.

------------------

Okay, so now that we know that the evaluation function does not have to be perfect (it simply must be "close enough"), here is my plan if I were to create a GA team creator.

1. First, I program a very basic AI. A little more advanced than just randomly picking any move, but one that can see the benefits of rain-dance, spikes, and so forth. It doesn't have to look ahead further than 5 or so moves, and certainly doesn't have to beat a human.

2. Now, I create a set of random (legal) teams. To evaluate the teams, I sort them using insertion sort or something. The "compare" function is going to be an actual simulation of the game. AI vs AI using the two different teams. The winner is judged as "better". And of course, after insertion sort is run (or maybe shell-sort or something). Or maybe even just a tourney is run... the general thing happens. Cross teams, mutate teams, etc. etc.

3. Iterate for hundreds of thousands of loops.

4. Write a better AI, and write a simple hill climber based on random mutations. No crossover or anything, just change 1 to 4 aspects of the team, compare old vs new. Keep the better one. Iterate for thousands of generations till the team cannot be improved. The goal at this point is to find the local maximum. Hopefully, the GA found a solution close to the global maximum, and a specialized hill-climber will find the local maximum faster.

5. Profit!

KOs mean more than the percentage of damage dealt. But anyway, the problem is how far ahead it can see. Sometimes payoffs are distant, such as when you use Heal Bell/Aromatherapy or spiking moves. If status effects, spikes, etc are associated with a penalty, part of that foresight is immediately taken into account and the bot's play should improve.
Certainly. However, those issues can be simplified down to damage dealt later on in the min/max expansion. I'm probably beginning to stretch the heuristic here too far, but please correct me if you see things otherwise.

Heal Bell/Aromatherapy can be seen as preventing damage from the opponent. This would be caught in the "min" part of the min/max tree. Minimizing the % damage the opponent would do to you in a given turn in the future.

Spikes, inflicting status effects, and so forth can be seen as Maximizing the total damage dealt by us, or minimizing the total damage dealt by the opponent. As a stupid example, if we look ahead 5 turns, and we predict that the enemy will switch 4 times with our given plan, EVEN if spikes are on the field, then that can be seen as increasing our % damage 5 turns later. If the opponent has a rapid spinner, he'd want to MINIMIZE the % damage dealt to him by switching his rapid spinner first, and then switching 3 times. Or at least, that would be his plan.

Again, this is a heuristic. Just as chess computers heavily use a bad heuristic to play chess well, I'm sure we can use a bad heuristic to make Pokemon AIs play pokemon well. As stated before, many chess and checkers AIs simply count the number of pieces left on the board, and check for the possibility of forced checkmate. Similarly, all a Pokemon AI would have to do would count the total % damage dealt to the opponent, and check for the possibility of a perfect sweep.
 
2. Now, I create a set of random (legal) teams. To evaluate the teams, I sort them using insertion sort or something. The "compare" function is going to be an actual simulation of the game. AI vs AI using the two different teams. The winner is judged as "better". And of course, after insertion sort is run (or maybe shell-sort or something). Or maybe even just a tourney is run... the general thing happens. Cross teams, mutate teams, etc. etc.
Alright, I don't have a huge understanding of programming here, but certainly you aren't just planning on running a single battle between two teams, right? Even if there is zero variability in the AI's choices, random events of the actual battle (ie. accuracy, turns asleep/frozen, possible status from moves like flamethrower, etc) would change the way the battle progresses, and thus the AI will end up making different choices than it would in a perfect battle scenario.
 

Users Who Are Viewing This Thread (Users: 1, Guests: 0)

Top