# Applying Game Theory to Pokemon (tl;dr)

#### Brain

> Ah, the elitist GA strategy, right? IIRC, this GA strategy has problems with local maximums/minimums, which is why people prefer to add noise into the equation, even if it means losing the current best team in the short term. Which goes into why I'm saying inaccuracy in the evaluation formula can actually work to the GA's advantage.
I don't think an elitist strategy would fare well either. The way I see it, one would run the GA as follows: create N initial teams (I'd go on the order of 100 or 1000). Evaluate how well they perform. Kill off the worst M teams and replace them with random mutations and/or crossbreeds of the remaining best teams, which you also mutate. Maybe make a couple of completely random teams or revive forgotten teams to inject variety. Repeat. N and M are hyper-parameters of the system that must be adjusted for best results. Mutation and crossbreed must also be properly defined, and the teams they produce have to stay valid. For example: move EVs from one stat to another, change the nature, or replace one pokemon in the team with a new, random one. For crossbreeds, pick 6 pokemon out of the parents' 12. I'm worried about the time it would take before it yields interesting results, though.
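That generational loop can be sketched in a few lines of Python. To be clear about what is illustration: `random_team`, `mutate`, `crossbreed`, and `fitness` are placeholder names for the team representation and the battle-based evaluation, and the 50/50 mutate-vs-crossbreed split is an arbitrary choice, not a recommendation.

```python
import random

# Skeleton of the generational loop described above. random_team, mutate,
# crossbreed and fitness are stand-ins: in the real system they would
# build legal teams and score them with battle simulations.
def evolve(random_team, mutate, crossbreed, fitness,
           n=100, m=30, generations=50, seed=0):
    rng = random.Random(seed)
    population = [random_team(rng) for _ in range(n)]
    for _ in range(generations):
        # Rank by fitness and kill off the worst m teams.
        population.sort(key=fitness, reverse=True)
        survivors = population[:n - m]
        children = []
        for _ in range(m):
            if rng.random() < 0.5:            # mutate one survivor...
                children.append(mutate(rng.choice(survivors), rng))
            else:                             # ...or crossbreed two, then mutate.
                a, b = rng.sample(survivors, 2)
                children.append(mutate(crossbreed(a, b, rng), rng))
        population = survivors + children
    return max(population, key=fitness)
```

Plugging in a trivial toy representation (teams as lists of numbers, fitness as their sum) is enough to check that the loop ratchets fitness upward.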

> To "knock it out" of this local maximum, you have to be non-elitist. You must be willing to sacrifice the current known maximum and start the search over again elsewhere. Because of this, we have probabilistic methods of finding children in GAs. But let's say we are just using a simple hill-climber :-p
>
> Anyway, so we run the hill-climber, reach a local maximum, and how do we try to get out of this local maximum? The easiest way that I've found is to simply introduce "noise" into the evaluator. That is, just add a random number to the evaluation function. Make it imperfect. So let's say this perfect evaluator is a function that converts "pokemon teams" to "integers". All you would do to add noise is simply add a random integer to the evaluation function.
There are several techniques to avoid falling into local maxima. You listed one. GAs are another. There is also a process called simulated annealing. Simulated annealing basically goes this way: first you make a random team and you set the temperature T at 1 or higher. Then, you make a new team by changing the team you have a bit. If it's better, you switch. If not, I don't remember the exact equation and I don't want to look it up, but you calculate a quantity that looks like e^((new - original)/T). That gives you a number between 0 and 1 (because the original was better, the exponent is negative). You draw a random number and if it's smaller you do the switch. Then, you reduce T.

The smaller T is, the smaller the transition probability will be. What this means is that when T = 0, we don't move. Furthermore, the higher T is, the more probable we are to move to a new state even if it is much worse than the previous one. Therefore, at first, we'll look everywhere, hopefully finding a good zone. As we do more iterations and T decreases, we'll be less and less willing to move. This is like shaking the surface violently at the beginning to go for the most promising zone and then shaking more and more gently to find the deepest hole and hopefully stay in it. It is guaranteed to converge in the limit of infinite time, but we don't have to go that far to get good results.
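Here is a minimal sketch of that procedure on a toy one-dimensional landscape. Everything here is illustrative: `score` stands in for the team evaluator (cos(3x) gives many local maxima, and the -0.1x² term makes the bump at x = 0 the global one), and the cooling rate and neighbour move are arbitrary choices, not a tuned recipe.

```python
import math
import random

# Toy landscape: many local maxima, global maximum at x = 0.
def score(x):
    return math.cos(3 * x) - 0.1 * x * x

def anneal(steps=30000, seed=0):
    rng = random.Random(seed)
    x = 3.0          # start away from the global maximum
    t = 1.0          # temperature
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)
        delta = score(candidate) - score(x)
        # Always accept improvements; accept a worse move with
        # probability e^(delta/T), which shrinks as T cools.
        if delta > 0 or rng.random() < math.exp(delta / t):
            x = candidate
        t = max(1e-3, t * 0.9995)   # geometric cooling, floored
    return x
```

Early on (T near 1) the walk crosses the valleys between bumps freely; as T shrinks, it settles onto whichever high bump it currently sits on. In practice one usually does several restarts and keeps the best.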

> Now let's take a step back to reality. Where does this put us? It simply states that if we are going to use a hill-climbing procedure (which MOST of these search methods really are; a GA is really just a massively parallel hill-climber), then a perfect evaluation function is not what we want. We want something close to the perfect evaluation function with a good bit of noise so that we can get knocked out of local maximums and find the true global maximum.
The noise you are talking about is usually part of those algorithms, not of the evaluation function. And even if they are all hill climbers in a way, each algorithm climbs differently. If you use a GA with breeding, the teams you'll obtain at each generation from the previous ones won't resemble the teams you would obtain using other algorithms. That's important, because it means different algorithms hill-climb in different spaces: for a GA, the child of two parents is a neighbor of both of them, but not for simulated annealing. Another example: if your function is f = cos(x) - x^2 and your hill-climber only explores values on multiples of 2*pi, you're hill-climbing in a subspace where f is concave, so there is no local maximum other than the global one. That is a contrived example that doesn't happen in practice, but my point is that adding noise is important (well, it's good in practice), but the way you hill-climb is too.

> 2. Now, I create a set of random (legal) teams. To evaluate the teams, I sort them using insertion sort or something. The "compare" function is going to be an actual simulation of the game: AI vs AI using the two different teams. The winner is judged as "better". And after insertion sort is run (or maybe shell-sort, or maybe even just a tourney), the usual GA steps happen: cross teams, mutate teams, etc. etc.
Ideally, teams should all play the same number of matches. That won't necessarily happen if you compare them as you sort. I think you should first make them play a fixed number of matches against other teams; their fitness would be the number of wins, and you sort using that.
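For instance, with a single round-robin every team plays the same number of matches and fitness is just the win count. In this sketch `play(a, b)` is a stand-in for the battle simulator and is assumed to return the winning team (teams assumed distinct).

```python
import itertools

# Rank teams by wins over a fixed schedule (one round-robin), so every
# team plays the same number of matches. play(a, b) is a stand-in for
# the battle simulator; it is assumed to return the winning team.
def rank_by_wins(teams, play):
    wins = [0] * len(teams)
    for (i, a), (j, b) in itertools.combinations(enumerate(teams), 2):
        if play(a, b) == a:
            wins[i] += 1
        else:
            wins[j] += 1
    # Sort team indices by win count, best first.
    order = sorted(range(len(teams)), key=lambda k: wins[k], reverse=True)
    return [teams[k] for k in order]
```

With a toy `play` that always picks the "stronger" team, the ranking comes out in strength order, which is the sanity check you'd want before wiring in a real simulator.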

> 4. Write a better AI, and write a simple hill climber based on random mutations. No crossover or anything, just change 1 to 4 aspects of the team, compare old vs new. Keep the better one. Iterate for thousands of generations till the team cannot be improved. The goal at this point is to find the local maximum. Hopefully, the GA found a solution close to the global maximum, and a specialized hill-climber will find the local maximum faster.
That's an interesting approach.

> 5. Profit!

> Certainly. However, those issues can be simplified down to damage dealt later on in the min/max expansion. I'm probably beginning to stretch the heuristic here too far, but please correct me if you see things otherwise.
>
> Heal Bell/Aromatherapy can be seen as preventing damage from the opponent. This would be caught in the "min" part of the min/max tree. Minimizing the % damage the opponent would do to you in a given turn in the future.
>
> Spikes, inflicting status effects, and so forth can be seen as Maximizing the total damage dealt by us, or minimizing the total damage dealt by the opponent. As a stupid example, if we look ahead 5 turns, and we predict that the enemy will switch 4 times with our given plan, EVEN if spikes are on the field, then that can be seen as increasing our % damage 5 turns later. If the opponent has a rapid spinner, he'd want to MINIMIZE the % damage dealt to him by switching his rapid spinner first, and then switching 3 times. Or at least, that would be his plan.
In theory, you're right that the min/max expansion would see those things, but only if you have enough folds. I doubt looking only 4 or 5 turns in advance would correctly capture that complexity. And even if it did, a more clever evaluation of the situation might allow us to get similar results with fewer folds and thus less computation.

> Again, this is a heuristic. Just as chess computers heavily use a bad heuristic to play chess well, I'm sure we can use a bad heuristic to make Pokemon AIs play pokemon well. As stated before, many chess and checkers AIs simply count the number of pieces left on the board, and check for the possibility of forced checkmate. Similarly, all a Pokemon AI would have to do would count the total % damage dealt to the opponent, and check for the possibility of a perfect sweep.
You might be right about this. I'm still a bit skeptical, because pokemon offers more ways to set an opponent up. If you play checkers and you're not taking pieces, what are you doing? Counting pieces in checkers tells you more than counting damage in pokemon.

#### Dragontamer

> Alright, I don't have a huge understanding of programming here, but certainly you aren't just planning on running a single battle between two teams, right? Even if there is zero variability in the AI's choices, random events of the actual battle (i.e. accuracy, turns asleep/frozen, possible status from moves like Flamethrower, etc.) would change the way the battle progresses, and thus the AI will end up making different choices than it would in a perfect battle scenario.
Actually, this madness is what I'm talking about with the "inaccurate" evaluation function. I'm saying such a strategy is possible because, while there is a chance that the "better" team would lose in a single battle, on average, the better teams would win more.

---------

Perhaps some concrete background on hill-climbing would help?

A hill-climbing algorithm pretty much makes a small change to something, compares the two, and then picks the better one. For example, let's say I'm making a team of 1 pokemon, and I have:

Infernape
---
Close Combat
Nasty Plot <---- here on purpose
Flare Blitz
Aerial Ace

Anyway, a hill-climber makes a random change. Let's say that the random change is to change "Nasty Plot" to "Swords Dance". Then, it runs the evaluator. It compares Infernape A vs Infernape B. I'd assume the Swords Dance variation would be better, so it would stick with the Swords Dance variant. The next iteration may randomly change Flare Blitz to Flame Wheel. It realizes that Flare Blitz is better, and then sticks with the old one.
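A sketch of that loop, with a toy additive evaluator standing in for the battle-based comparison. The move pool and `TOY_SCORES` values are made up purely for illustration; the real `evaluate` would be a simulation.

```python
import random

# Toy hill-climber over a 4-move set. TOY_SCORES stands in for the real
# evaluator (a battle simulation); the values are arbitrary.
MOVE_POOL = ["Close Combat", "Swords Dance", "Nasty Plot", "Flare Blitz",
             "Flame Wheel", "Aerial Ace", "Grass Knot", "Hidden Power Ice"]
TOY_SCORES = {move: i for i, move in enumerate(MOVE_POOL)}

def evaluate(moveset):
    return sum(TOY_SCORES[m] for m in moveset)

def hill_climb(moveset, iters=2000, seed=0):
    rng = random.Random(seed)
    best = list(moveset)
    for _ in range(iters):
        candidate = list(best)
        slot = rng.randrange(len(candidate))
        # Swap one slot for a random move not already in the set.
        candidate[slot] = rng.choice([m for m in MOVE_POOL if m not in candidate])
        if evaluate(candidate) > evaluate(best):   # keep the better variant
            best = candidate
    return best
```

With an additive toy evaluator there are no local maxima to get stuck in, so this always finds the best four moves; the trouble described next starts exactly when move values interact (Nasty Plot is only worth anything alongside special attacks).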

Etc. etc. etc. Just random changes like that occur over and over again. A "local maximum" would be whatever is best. I'll assume Surgo's physical attacker from his analysis is the best:

Infernape
--------
Swords Dance
Flare Blitz
Close Combat
Thunderpunch / Stone Edge

This is a local Maximum because small changes to anything here will not result in a better pokemon. Changing Flare Blitz to any other fire attack, or even any attack in general won't make it better.

So we are "stuck" at this local maximum. This happens to be a very very good local maximum, but now we NEVER get a chance to compare this moveset with Surgo's Special Sweeper set:

Infernape
move 1: Nasty Plot
move 2: Flamethrower / Fire Blast
move 3: Grass Knot
move 4: Hidden Power Ice

So as you can see, all 4 moves are changed. There is no "path" to get from the physical moveset to this moveset.

The problem is even more complex than this. Local maximums may or may not be good at all: they can exist anywhere, from pathetic movesets to excellent movesets.

The above Crobat discussion thread is a possible "local maximum" that I'm talking about. It is a bad set; however, changing any one aspect of this Crobat makes him even worse. EDIT: Well... now that I think of it, it is a bad example. A better example would be Standard Tyranitar vs Tyraniboah, but that is again excellent vs excellent. But let's assume it is a local maximum for the sake of this argument :-p

There may not be any way to "hill climb" to say, the sweeper Crobat. The hill-climber may find a very very good "annoyer" Crobat, but in the end, an "annoyer" just isn't as useful as Crobat can be.

Which is why you have to sometimes "jiggle" the pot a little. Re-randomize the movesets, restart the process. This option requires human interaction, yet preferably we want the AI to be intelligent enough to get out of local maximums like these pathetic movesets by itself.

Probably a bad example, someone might find a hill-climbing path from an annoying Crobat to something useful... but either way, I'm sure you can see that local maximums do exist and do pose a significant threat to AI team builders.

And having a non-perfect evaluation function (such as only battling the teams once or twice) can actually help in those regards. By stroke of luck, a local maximum may lose to another team which later becomes a better local maximum. Of course, we need to still make better teams have a better chance of winning, but making a perfect evaluation function for hill-climbers will make them get stuck in local maximums.
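The "imperfect evaluation" idea is almost a one-liner: wrap whatever evaluator you have and add a random offset, so a comparison occasionally goes the "wrong" way and can bump the climber off a local maximum. Here `evaluate` is a placeholder for the real scoring function.

```python
import random

# Wrap a (hypothetical) evaluator with additive uniform noise. A
# hill-climber using the noisy version will occasionally prefer the
# "worse" option -- the knock-out-of-the-local-maximum effect above.
def with_noise(evaluate, scale, seed=0):
    rng = random.Random(seed)
    return lambda team: evaluate(team) + rng.uniform(-scale, scale)
```

The `scale` parameter plays much the same role as temperature in simulated annealing: large enough to escape bad local maxima, small enough that genuinely better teams still usually win the comparison.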

#### Dragontamer

> You might be right about this. I'm still a bit skeptical, because pokemon offers more ways to set an opponent up. If you play checkers and you're not taking pieces, what are you doing? Counting pieces in checkers tells you more than counting damage in pokemon.
Well, if you're playing pokemon and you're not trying to set up more damage against the opponent, what are you doing? :-p

You're either finding a good counter against the current pokemon (to prevent him from doing damage, and so that you can do more damage), or statting up (so that you can do more damage because of higher speed, etc. etc.) or spiking up (so that damage is dealt automatically) or heal belling (so that you lose paralysis or burn so that you are faster/do more damage).

At the end of the day, the guy who deals 600% damage (more specifically 100% damage to 6 pokemon) wins.

EDIT: Anyway, as a counter-statement to your claim, there are advanced strategies involved in Chess. Not too sure about Checkers, but in Chess you can:
* Set up a pawn wall
* Castle
* Set up skewers, pins, and forks.
* Develop pieces (that is, move them into a more offensive position; a rook on an open or half-open file or rank, for instance)

And so on, so forth. In the end, a min/max tree expansion would see that these all eventually either minimize the damage the opponent would do to you or maximize the damage you would do to the opponent. So a min/max tree would certainly "discover" these strategies.

> 4. Write a better AI, and write a simple hill climber based on random mutations. No crossover or anything, just change 1 to 4 aspects of the team, compare old vs new. Keep the better one. Iterate for thousands of generations till the team cannot be improved. The goal at this point is to find the local maximum. Hopefully, the GA found a solution close to the global maximum, and a specialized hill-climber will find the local maximum faster.
>
> That's an interesting approach.
If there were a way to combine simulated annealing with GAs, it would be great. Actually, I think I have heard of simulated annealing combined with GAs before: the temperature describes how elitist the GA should be (1 means very non-elitist, 0 means a 100% chance to choose the best, or something like that). I don't know the results of that research, but I think it has happened before. If not, it is a good thing to look into :-p

Either that, or it was Simulated annealing with Neural Networks. Which seems to make more sense...

#### Brain

> EDIT: Anyway, as a counter-statement to your claim, there are advanced strategies involved in Chess. Not too sure about Checkers, but in Chess you can:
> * Set up a pawn wall
> * Castle
> * Set up skewers, pins, and forks.
> * Develop pieces (that is, move them into a more offensive position; a rook on an open or half-open file or rank, for instance)
>
> And so on, so forth. In the end, a min/max tree expansion would see that these all eventually either minimize the damage the opponent would do to you or maximize the damage you would do to the opponent. So a min/max tree would certainly "discover" these strategies.
That's more in line with what I meant than what you said before the edit. My point was that it might require too much depth. Pokemon is a game where the state is less local than it is in a lot of other games. At a very basic level, abilities like Slow Start require at least 5 turns of foresight to be exploited (if they can even be exploited). The algorithm will never see any point to perish trapping if it can't see 4 turns in advance, and it will never be able to set up a Baton Pass chain if it can't see the payoff. Future Sight is useless if the algorithm can't see that it'll hit later. Now, 5-6 folds isn't too bad. But then you have situations like: do I put pokemon X to sleep or not? If you predict the opponent will switch it out once it's sleeping, will you have the foresight that it'll eventually come back into the game crippled? Probably not.

Anyway, it could still work pretty well, probably well enough to help a GA make decent teams. I just don't think it'll have very elaborate behavior. It should be relatively easy to abuse.

Or you could go very deep, but since the number of possibilities is always extremely large, you'll have to use heuristics, prune the tree, etc., and the accuracy of your expansion will degrade. But hey, it doesn't cost anything to try.

> If there were a way to combine simulated annealing with GAs, it would be great. Actually, I think I have heard of simulated annealing combined with GAs before: the temperature describes how elitist the GA should be (1 means very non-elitist, 0 means a 100% chance to choose the best, or something like that). I don't know the results of that research, but I think it has happened before. If not, it is a good thing to look into :-p
>
> Either that, or it was Simulated annealing with Neural Networks. Which seems to make more sense...
SA will always go to a better solution and sometimes it will err to worse solutions. The temperature controls how much worse those solutions can be.

I don't see any theoretical problems in combining SA and GAs and quick googling yields me this: http://citeseer.ist.psu.edu/111048.html (PDF link at the top of the page). I didn't read it through.

Simulated annealing + Neural networks = Boltzmann machine:
http://en.wikipedia.org/wiki/Boltzmann_machine

They're not very practical to train, though. For that, one can use Restricted Boltzmann Machines (RBMs) that limit the amount of connectivity.

#### Dragontamer

> That's more in line with what I meant than what you said before the edit. My point was that it might require too much depth. Pokemon is a game where the state is less local than it is in a lot of other games. At a very basic level, abilities like Slow Start require at least 5 turns of foresight to be exploited (if they can even be exploited). The algorithm will never see any point to perish trapping if it can't see 4 turns in advance, and it will never be able to set up a Baton Pass chain if it can't see the payoff. Future Sight is useless if the algorithm can't see that it'll hit later. Now, 5-6 folds isn't too bad. But then you have situations like: do I put pokemon X to sleep or not? If you predict the opponent will switch it out once it's sleeping, will you have the foresight that it'll eventually come back into the game crippled? Probably not.
>
> Anyway, it could still work pretty well, probably well enough to help a GA make decent teams. I just don't think it'll have very elaborate behavior. It should be relatively easy to abuse.
>
> Or you could go very deep, but since the number of possibilities is always extremely large, you'll have to use heuristics, prune the tree, etc., and the accuracy of your expansion will degrade. But hey, it doesn't cost anything to try.
Oh yeah, I wasn't saying that Damage was the only way, I was just pointing out that it probably would be the most important one to take into consideration.

BTW: something was said earlier that I forgot about, but now suddenly feel is important. The fact is that there is a Nash Equilibrium for Rock Paper Scissors because a "mixed" strategy exists. And I assume that there is some kind of algorithm that can find this mixed strategy. (I'm not _that_ good with Game theory).

Because the Nash Equilibrium is the best strategy you can do assuming the opponent is doing his best strategy (or something like that), then why not solve for the Nash Equilibrium at every point?

I know the general concepts of Game Theory here, perhaps someone else who knows more about it can nail down the details (or tell me if it is possible first of all). With only 9 possible actions per player per turn (4 attacks, 5 pokemon to switch to), the matrix size should be only 81 max. Certainly small enough for a Computer to figure out some stuff.

If a heuristic is required, min/max could be used for that possibly. But solving for the best (mixed) strategy seems like the best thing to me. Q-learning would not apply here... the states change far too quickly for it to keep up. So it would be some kind of hard-coded solver...

#### EnoOn

Wow, it's really just amazing to have this level of discussion about pokemon AI here. ^_^

> Ideally, Competitor should have an API suitable to build AIs. Like taking a snapshot of the current game, simulating a few turns and reverting back. If there's no such feature yet I guess I'll have to pressure.
While I agree ideally, I don't really think that's their focus. As well it shouldn't be, they're trying to get Competitor out to the masses, not to the few of us who want to make AI for it. ;-)

Having said that, I'm not developing for it, so I don't know what's in there.

> Let's assume we had a perfect team evaluation function, one that can generate the exact ranking of all teams. (And if you are smart enough to think that there is no "best" team because of the inherent rock-paper-scissors nature of this game, then pretend that I'm saying "best team generation strategy" instead. "Best team" is just much easier to say and conceptualize.)
This certainly holds true for 1 vs. 1. I believe it also holds true for 6 vs. 6, but I'm not certain of it... it could be that there is enough space in a 6 vs. 6 team to create what is effectively a Nash Equilibrium for the whole space. Or there might not be. I don't think there is, but I can't rule it out right now.

> 2. As we still don't know what the best team is, the inaccurate "convex" that Brain theorized about, while a good example for his argument, is too difficult to find in Pokemon anyway.
All he was saying was that there exist reasons not to use a perfect fitness function. He was not saying this is the model for team searching.

> (snip) Elitist GA strategy
All that I was saying to this ("What about making an AI bot modify its team as well? That would be much more complicated, don't you think?") was that, for the GA example, the #1 fitness team is the "best" team the GA has currently found (or, if you want to get really technical, you could say that one of the previous generation's bests is as well, but that this is the best the GA currently has).

I'm envisioning a system, when Competitor comes out, where, in addition to the teams the GA is creating, the GA is also constantly updated with good teams from opponents (as yet another way to jolt out of local maxima). So yes, it would pick, as its current team, one of the best teams from the last generation. This doesn't mean that it doesn't keep running new generations!

> The noise you are talking about is usually part of those algorithms, not of the evaluation function.
Thank you, Brain, that's a great way of saying it.

> Ideally, teams should all play the same number of matches. That won't necessarily happen if you compare them as you sort. I think you should first make them play a fixed number of matches against other teams; their fitness would be the number of wins, and you sort using that.
I think playing some sort of internal round-robin tournament has been the method of choice in the past. I'm considering doing something a little faster, but I need to crunch numbers on it, to see how accurate I think it is (things like swiss systems, etc.)

One of my big concerns is trying to get the GA to create reasonable teams in a ... not-long... amount of time. I can't imagine an internal tournament game taking less than a few seconds (with a reasonable AI method), but if I can restrict a game to 10 seconds, and I have 1024 teams in there playing a round robin... well, let's just say a generation every 120 days is not my idea of a fun time.

With something like a swiss system, it can be 10-12 rounds x 1024 games, which gets me a generation a day. If I can somehow make the evaluation function go faster and still have reasonable games, I can do better. All of this is stuff that I'll just have to play with. Do I need 1024 teams? Does 128 or so save enough variation of "genetic material"? Does having a faster evaluation significantly change how fit the teams are seen to be? Can I run a game in 2 seconds instead of 10? This is all research.
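For concreteness, here is the arithmetic behind those figures, under the assumptions already stated (1024 teams, 10 seconds per game). The round-robin number matches the "120 days" estimate if every ordered pair plays once, and the swiss number assumes 512 pairings per round.

```python
# Back-of-envelope cost of one generation at an assumed 10 s per game.
teams = 1024
secs_per_game = 10

# Double round-robin: every ordered pair of teams plays once.
rr_games = teams * (teams - 1)                     # 1,047,552 games
rr_days = rr_games * secs_per_game / 86400         # ~121 days of serial play

# Swiss: 12 rounds, teams/2 games per round.
swiss_games = 12 * (teams // 2)                    # 6,144 games
swiss_hours = swiss_games * secs_per_game / 3600   # ~17 hours: a generation a day
```

The same arithmetic makes the other research questions concrete: halving the game time or dropping to 128 teams each cut these numbers proportionally, so the knobs multiply.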

> BTW: something was said earlier that I forgot about, but now suddenly feel is important. The fact is that there is a Nash Equilibrium for Rock Paper Scissors because a "mixed" strategy exists. And I assume that there is some kind of algorithm that can find this mixed strategy. (I'm not _that_ good with Game theory).
>
> Because the Nash Equilibrium is the best strategy you can do assuming the opponent is doing his best strategy (or something like that), then why not solve for the Nash Equilibrium at every point?
>
> I know the general concepts of Game Theory here, perhaps someone else who knows more about it can nail down the details (or tell me if it is possible first of all). With only 9 possible actions per player per turn (4 attacks, 5 pokemon to switch to), the matrix size should be only 81 max. Certainly small enough for a Computer to figure out some stuff.
>
> If a heuristic is required, min/max could be used for that possibly. But solving for the best (mixed) strategy seems like the best thing to me. Q-learning would not apply here... the states change far too quickly for it to keep up. So it would be some kind of hard-coded solver...
Yes, this is true. There exists a way to find the NE for a system...if you can accurately describe the payoffs. It's basically a linear algebra problem.

The problem is, describing these payoffs is exactly the heuristic problem. In order to solve and find a mixed strategy for this 9x9 (or really 14x14 or 19x19) matrix, you have to know what the payoffs are for each square of the matrix.
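For a zero-sum payoff matrix there are several ways to compute the mixed equilibrium; the linear-programming route mentioned above is the classic one. A dependency-free alternative, shown here as a sketch, is regret matching in self-play, whose *average* strategies converge to a Nash equilibrium in two-player zero-sum games. On plain Rock-Paper-Scissors the known answer is (1/3, 1/3, 1/3), which makes it a good sanity check.

```python
# Regret matching in self-play on Rock-Paper-Scissors (+1 win, -1 loss,
# 0 tie). In a two-player zero-sum game the average strategies converge
# to a Nash equilibrium; for RPS that is (1/3, 1/3, 1/3).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]   # row player's payoff

def positive_part_strategy(regrets):
    # Play in proportion to positive cumulative regret; uniform if none.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def average_strategy(iters=100000):
    n = 3
    reg1, reg2 = [0.0] * n, [0.0] * n
    sum1 = [0.0] * n
    s1, s2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]   # deliberately bad start
    for _ in range(iters):
        # Expected payoff of each pure action against the opponent's mix.
        u1 = [sum(A[i][j] * s2[j] for j in range(n)) for i in range(n)]
        u2 = [sum(-A[i][j] * s1[i] for i in range(n)) for j in range(n)]
        v1 = sum(s1[i] * u1[i] for i in range(n))
        v2 = sum(s2[j] * u2[j] for j in range(n))
        for k in range(n):
            reg1[k] += u1[k] - v1
            reg2[k] += u2[k] - v2
            sum1[k] += s1[k]
        s1 = positive_part_strategy(reg1)
        s2 = positive_part_strategy(reg2)
    total = sum(sum1)
    return [x / total for x in sum1]
```

This sidesteps the LP machinery, but the hard part flagged here remains untouched: every entry of `A` still has to come from somewhere, and that is exactly the payoff-description problem.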

Really, coming up with a heuristic for pokemon, at the end of the day, isn't what I think is "hard" about using the game tree method. What I think is really hard about it is something that I pointed out at the beginning and made the mistake of saying to ignore:

> Consider Salamence DM vs. Blissey IB. Neither are (without CH) OHKOs. IIRC, the damage formula has a (incorrect damage formula with 19 choices ;-) ) at the end of it. Also, IB has 2 other possibilities (freeze or not). Also, either one could CH. So you have 19 (DM) * 19 (IB) * 2 (freeze) * 2 (DM CH) * 2 (IB CH) * 2 (does DM hit?) = 5776 possibilities for just one move. And if you want to create an accurate min-max tree, you have to account for all of the possibilities, because even small changes can impact how later parts of the tree look. And there are many other possibilities (quick claw? what if both moves have chance factors like DM did?)
The problem is that, for each move, you're facing thousands of chance nodes. And if you want to do "accurate" min-max, you have to consider them all, and the probabilities for them all. It may be possible to prune them out somewhat, but small amounts of damage matter from turn to turn, and small percentages become important when applied over many turns.

If, to make a contrived example, it's Blissey with Ice Beam/Seismic Toss vs. Milotic with Rest, and these are your last two pokemon, how does Blissey win? Can't Seismic Toss! Milotic can just rest it off, and you'll run out of PP. You have to try to freeze Milotic first... and writing the min-max AI to realise this, when you have 10% chance to freeze each turn... difficult, even BEFORE choice nodes.

And these edge cases are all over.

Heuristic evaluation may be the way to go. As you've noted, many games use it. But the evaluation function is going to be INTENSELY complicated, and you cannot shy away from that. It's not as simple as damage. Start considering things like power up moves, type advantages of remaining pokemon, how many PP are left, what items you've seen, weather types and whether it matters any more, changes when you're down to 1 pokemon, bonuses for speed, etc etc etc etc

There's a lot to it. And it may be the only answer (I don't agree right now, but...), if it is, it will be intensely hard.

Also, one problem with having a simple heuristic for something like a GA team finder is that it won't be able to properly evaluate teams that may be good, but that are too "forward thinking" for the evaluator. Brain brought up baton-passing teams, and those are a perfect example.

As a sidenote, I'm really glad that there are people who are interested in having this kind of discussion, talking like this really clarifies a number of things I'm working on in this. Thank you all.

#### Brain

> I'm envisioning a system, when Competitor comes out, where, in addition to the teams the GA is creating, the GA is also constantly updated with good teams from opponents (as yet another way to jolt out of local maxima). So yes, it would pick, as its current team, one of the best teams from the last generation. This doesn't mean that it doesn't keep running new generations!
But it should keep several teams for the next generation, I think that was the origin of the confusion. Your idea to update with good opponent teams is interesting, I think it would give a boost to learning.

> One of my big concerns is trying to get the GA to create reasonable teams in a ... not-long... amount of time. I can't imagine an internal tournament game taking less than a few seconds (with a reasonable AI method), but if I can restrict a game to 10 seconds, and I have 1024 teams in there playing a round robin... well, let's just say a generation every 120 days is not my idea of a fun time.
It's easy to parallelize. Of course, not everyone has access to large clusters. I could probably get away with using a dozen CPUs at uni since they're usually inactive.

> With something like a swiss system, it can be 10-12 rounds x 1024 games, which gets me a generation a day. If I can somehow make the evaluation function go faster and still have reasonable games, I can do better. All of this is stuff that I'll just have to play with. Do I need 1024 teams? Does 128 or so save enough variation of "genetic material"? Does having a faster evaluation significantly change how fit the teams are seen to be? Can I run a game in 2 seconds instead of 10? This is all research.
If a game is lopsided you don't really have to wait until it ends. A problem is that if you start with bad teams, the battles might take longer. It also won't take just a few generations to get interesting teams...

> There exists a way to find the NE for a system...if you can accurately describe the payoffs. It's basically a linear algebra problem.
>
> The problem is, describing these payoffs is exactly the heuristic problem. In order to solve and find a mixed strategy for this 9x9 (or really 14x14 or 19x19) matrix, you have to know what the payoffs are for each square of the matrix.
If you don't know your opponent's moves or the rest of its team, it's pretty much impossible to know the payoffs because you don't know what actions they represent. The correct grid would consider switching to each pokemon that exists as a possible action, so you jump from 9 to 497. Same for moves. In the end you're looking at a 9x~550 matrix, and each species is still categorized very roughly, regardless of its moveset and stats. Furthermore, the options don't all have the same prior.

> If, to make a contrived example, it's Blissey with Ice Beam/Seismic Toss vs. Milotic with Rest, and these are your last two pokemon, how does Blissey win? Can't Seismic Toss! Milotic can just rest it off, and you'll run out of PP. You have to try to freeze Milotic first... and writing the min-max AI to realise this, when you have 10% chance to freeze each turn... difficult, even BEFORE choice nodes.
>
> And these edge cases are all over.
That's a good example.

> Heuristic evaluation may be the way to go. As you've noted, many games use it. But the evaluation function is going to be INTENSELY complicated, and you cannot shy away from that. It's not as simple as damage. Start considering things like power up moves, type advantages of remaining pokemon, how many PP are left, what items you've seen, weather types and whether it matters any more, changes when you're down to 1 pokemon, bonuses for speed, etc etc etc etc
I know we don't have a bank of battle examples yet, but when we do, we can try to learn correlations between those things and the battle's outcome. There's still a lot of work to do before that can be done, but it'd help.

> As a sidenote, I'm really glad that there are people who are interested in having this kind of discussion, talking like this really clarifies a number of things I'm working on in this. Thank you all.
I love discussing those things so you're welcome :)

#### Dragontamer

> If, to make a contrived example, it's Blissey with Ice Beam/Seismic Toss vs. Milotic with Rest, and these are your last two pokemon, how does Blissey win? Can't Seismic Toss! Milotic can just rest it off, and you'll run out of PP. You have to try to freeze Milotic first... and writing the min-max AI to realise this, when you have 10% chance to freeze each turn... difficult, even BEFORE choice nodes.
There are only 4 choices each turn, because all the other pokemon have fainted. And that is actually an example that is solved using a directed depth-first search down a min/max tree with the damage heuristic. Looking forward only 3 turns, it can be shown that Milotic walls Blissey's Seismic Toss; that is, Blissey makes no net damage progress after 3 turns. Now granted, unless we are using some more complicated method, the AI will have to "rediscover" this every time it goes down a search tree.

Basically, I wouldn't count min-max out on that example alone. I'd like to see a test first :-p

Anyway, if we are going to be starting with pure random teams first, then a very basic and FAST evaluator is the priority. Not necessarily a very accurate one either. Just one that approximates a good battle. As long as the dataset isn't overtrained on this AI, then all's well.

A full-blown AI capable of taking on humans would be much tougher, and is a totally different project. It would be useful to have a more accurate evaluator in the Team Creator GA, but I don't think it is a major priority of a Team Creator GA.

Hell, the GAs can evolve simply based on 1v1 matchups between all pokemon. We don't necessarily even have to run a full match between the two teams to evaluate them.

For example, 6 vs 6 pokemon, 1 at a time. That's 36 combinations. So in each combination, check who wins on the switch-in, on a revenge switch-in, and under other similar conditions. This unfortunately cuts out baton passing teams, or rain dance teams, however :-( But then, if we run random matchups of 2v2 or 3v3, it would be a much faster game. 3v3 is simply computationally less expensive than 6v6: there are far fewer choices and far fewer possibilities. So I'd expect that running 10 games of 3v3 is going to be faster than one game of 6v6. And possibly, it would be more accurate than just a single 6v6 game.

Course, benchmarking and testing should come into play here. Yes, another research question. And of course the obvious one is: "Is it more effective?"

No seriously, I should investigate, and this would be my Ph.D. dissertation or something. What kind of approximations can a GA take to improve its efficiency with respect to game AIs? What kind of sub-games can be played... ohhh... damn. That's actually applicable, and Pokemon can serve as a case study...

Ahem. Anyway, yeah. Just pointing out that there are other ways to expand upon approximations.