Technical Machine: A Pokemon AI

Basically, one of the big flaws of any AI program is its predictability.
Addressing this is trivial: add a small amount of randomness. In situations where the AI deems two moves to have only a slight difference in outcome, don't always choose the stronger one.

You could add more randomness if you wanted to weaken the AI, but I don't think that's the best way. I reckon an AI on weaker settings ought to play like a weaker human. So do things like discard sets or even whole Pokemon from its database. You could bias it towards not considering recently-developed sets, to simulate a human player who's reasonably good but not 'current'. Or you could give it knowledge of the 'wrong' tier, to be like someone who's good in OU but a novice in Ubers. At a very weak level it might even make mistakes with the type chart. If the AI teambuilds, then at a weak level it builds teams with little synergy and overall strategy, teams that are just six strong Pokemon thrown together.

Of course, obi needs to develop the AI first. Then I'd expect his main focus will be on strengthening it, but it's worth taking a look at weakening it. Perhaps one could do a variant on a Turing test - can a battler distinguish a human player from an AI 'set' at the same rating?
 

obi

formerly david stone
I'm not interested in creating watered down versions of the AI.

To address the issue of replaying a computer to try and "cheat" it: the ideal strategy for the AI is not "in this situation, use this move", due to the simultaneous execution of moves. The actual perfect strategy is what's known as a mixed Nash equilibrium. What this means is that it plays each move with a certain probability. In other words, 10% of the time use move 1, 60% of the time use move 2, 30% of the time switch, and use all other options with 0% probability. Of course, it will be a long time before I add that level of sophistication to the program, but I mention it to show your problem is not insurmountable and does not require having the AI play at a weaker level.
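
To make that concrete, here's a rough sketch of approximating such a mixed strategy with fictitious play instead of an exact linear-programming solver; the payoff matrix is entirely invented, not anything from my program:

```cpp
#include <array>
#include <cstdio>

// Hypothetical 3x3 zero-sum game for one turn: rows are our options
// (use move 1, use move 2, switch), columns are the foe's options.
// Entries are our expected payoff; every value here is invented.
constexpr int kOptions = 3;
constexpr double kPayoff[kOptions][kOptions] = {
    { 0.0,  0.6, -0.4},
    {-0.6,  0.0,  0.5},
    { 0.4, -0.5,  0.0},
};

int main() {
    std::array<long, kOptions> our_counts = {1, 0, 0};
    std::array<long, kOptions> foe_counts = {1, 0, 0};

    // Fictitious play: each side repeatedly best-responds to the other's
    // empirical mixture; the play frequencies converge toward a mixed
    // Nash equilibrium in zero-sum games.
    for (long iter = 0; iter < 200000; ++iter) {
        int our_best = 0, foe_best = 0;
        double best_us = -1e18, best_foe = -1e18;
        for (int i = 0; i < kOptions; ++i) {
            double value_us = 0.0, value_foe = 0.0;
            for (int j = 0; j < kOptions; ++j) {
                value_us += kPayoff[i][j] * foe_counts[j];
                value_foe -= kPayoff[j][i] * our_counts[j];  // foe minimizes our payoff
            }
            if (value_us > best_us) { best_us = value_us; our_best = i; }
            if (value_foe > best_foe) { best_foe = value_foe; foe_best = i; }
        }
        ++our_counts[our_best];
        ++foe_counts[foe_best];
    }

    const long total = our_counts[0] + our_counts[1] + our_counts[2];
    for (int i = 0; i < kOptions; ++i)
        std::printf("option %d: play with probability %.3f\n", i,
                    static_cast<double>(our_counts[i]) / total);
}
```

The exact equilibrium would come from solving the corresponding linear program, but the iterative version is short enough to show the idea.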

I've solved the problem of a lot of battle effects. Rather than trying to give a boost to a team that uses Rain Dance, for instance, I simply have multiple spots in which I score things. The Pokemon themselves get scores for various aspects like how much % HP they have remaining, but each Pokemon's score is the sum of the general Pokemon stuff along with the score of each of their moves. In other words, I have a move evaluation function. Fire moves get a higher score when the sun is shining, but a lower score when it's raining. Physical moves are worth less while Reflect is active. Similarly, each team's score is the sum of the scores of all the Pokemon, plus some general team stuff (such as the total number of Pokemon alive). The final score is my team's score - foe's team score.

Using this approach of multiple levels of evaluation makes field effects much simpler to evaluate. I'm thinking maybe Stealth Rock itself will have no intrinsic value (outside of PP concerns), but while it's down it lowers the score of all Pokemon on the side that it's down on.
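
Here's a sketch of the shape of that layered evaluation; every weight here is a placeholder guess, not a value from my program:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// All weights are placeholders.
constexpr double kScorePerPercentHp  = 1.0;
constexpr double kScorePerLiveMember = 30.0;
constexpr double kStealthRockPenalty = 8.0;  // per Pokemon on the affected side

struct Move {
    std::string name;
    double score;  // already adjusted for weather, screens, Taunt, ...
};

struct Pokemon {
    double percent_hp = 100.0;
    std::vector<Move> moves;

    // General Pokemon stuff plus the sum of its move scores.
    double score() const {
        double total = percent_hp * kScorePerPercentHp;
        for (const Move& move : moves) total += move.score;
        return total;
    }
};

struct Team {
    std::vector<Pokemon> members;
    bool stealth_rock_down = false;  // hazard on *this* side

    double score() const {
        double total = members.size() * kScorePerLiveMember;  // general team stuff
        for (const Pokemon& pokemon : members) {
            total += pokemon.score();
            // Stealth Rock has no intrinsic value; while it's down it just
            // drags down the score of every Pokemon on this side.
            if (stealth_rock_down) total -= kStealthRockPenalty;
        }
        return total;
    }
};

// Final score: my team's score minus the foe's team score.
double evaluate(const Team& mine, const Team& foe) {
    return mine.score() - foe.score();
}

int main() {
    Pokemon heatran;
    heatran.moves = {{"Fire Blast", 12.0}, {"Explosion", 8.0}};
    Team mine;
    mine.members.push_back(heatran);

    Pokemon suicune;
    suicune.percent_hp = 65.0;
    suicune.moves = {{"Surf", 10.0}};
    Team foe;
    foe.members.push_back(suicune);
    foe.stealth_rock_down = true;

    std::printf("evaluation: %+.1f\n", evaluate(mine, foe));
}
```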

The reason that I can't simply find the expected damage over time of Stealth Rock is that it's possible to do 600% damage over time with Stealth Rock and leave the opposing team at 90% health on 6 Pokemon. The reason is that they can use Leftovers and healing moves to counteract slow damage. A move that does 100% damage to one Pokemon in one turn is usually more valuable than one that does 300% damage total to all 6 Pokemon over many turns, based solely on that concern. Of course, Stealth Rock is a "fire and forget" kind of move, meaning I can be using moves that do 87.5% damage in one turn after using Stealth Rock, and suddenly it's like I'm doing 100% damage per turn.
 

FlareBlitz

Relaxed nature. Loves to eat.
If you're already adjusting scores for things like how much HP each Pokemon has remaining, why not just count hazards by automatically "discounting" the hazard damage from a Pokemon's HP and reflecting that in its score? i.e., if Rocks are down and you have a Salamence, just count its score the same as it would be if Rocks weren't down but it were at 75%. Or maybe that's already what you're doing and I just didn't interpret your post correctly.
 
I am working on an AI to play Pokemon (I'll make a more in-depth, general thread on it later), and one of the things I need is an evaluation function. I am using an expectiminimax algorithm with alpha-beta pruning, which performs best when moves are already ordered from best to worst. This means I need a relatively simple function that can try to "guess" at how good a particular position is. Therefore, I need to assign values to various things so that my program can properly weight them.

Unfortunately, Pokemon has elements of luck in it. This is why I have to use an expectiminimax tree instead of a minimax tree (I'm simulating the game as being a three-player game: AI, foe, and God. God moves whenever there are elements of luck). This means that I can't just get the order of things correct, I have to get their magnitude correct.
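
As a sketch, the core expectiminimax recursion looks something like this (a toy node type stands in for a real battle state; all values invented):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Toy game-tree node; a real implementation would generate children
// lazily from a battle state rather than storing them.
struct Node {
    enum Kind { AI, Foe, Chance, Leaf } kind = Leaf;
    double value = 0.0;                 // evaluation score, used at leaves
    std::vector<Node> children;
    std::vector<double> probabilities;  // parallel to children at Chance nodes
};

double expectiminimax(const Node& node) {
    switch (node.kind) {
        case Node::Leaf:
            return node.value;
        case Node::AI: {  // our turn: maximize
            double best = -1e18;
            for (const Node& child : node.children)
                best = std::max(best, expectiminimax(child));
            return best;
        }
        case Node::Foe: {  // foe's turn: minimize
            double best = 1e18;
            for (const Node& child : node.children)
                best = std::min(best, expectiminimax(child));
            return best;
        }
        case Node::Chance: {  // "God" moves: weight children by probability
            double expected = 0.0;
            for (std::size_t i = 0; i != node.children.size(); ++i)
                expected += node.probabilities[i] * expectiminimax(node.children[i]);
            return expected;
        }
    }
    return 0.0;
}

int main() {
    // A 70%-accurate KO move: hit (+1.0) with p = 0.7, miss (-0.2) with p = 0.3.
    Node risky{Node::Chance, 0.0,
               {{Node::Leaf, 1.0, {}, {}}, {Node::Leaf, -0.2, {}, {}}},
               {0.7, 0.3}};
    // A safe move worth a guaranteed 0.5.
    Node safe{Node::Leaf, 0.5, {}, {}};
    Node root{Node::AI, 0.0, {risky, safe}, {}};
    std::printf("root value: %.2f\n", expectiminimax(root));  // 0.7*1.0 - 0.3*0.2 = 0.64
}
```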
There is a huge, huge risk with using expectiminimax with a game that contains prediction, such as Pokémon: the AI would be open to getting outpredicted really, really badly. (I've had some thoughts along these lines myself.) A contrived illustrative example: suppose your Starmie, on relatively low HP, has a choice of using Thunderbolt or Surf (let's say its other attacks are out of PP...) and your opponent has only a slower Vaporeon (currently out) and a faster Jolteon left, both on relatively low HP. You can KO either of them if you predict which one will be out at the time, but each will need a different move. Let's further suppose that the Jolteon's only attacking move is Thunder (maybe it was on a Baton Pass team), the Vaporeon will try to use HP Grass if it stays in (as the best attacking move available), and the Vaporeon can't survive switching out and back in due to Stealth Rock. Now, expectiminimax will think "if I use Thunderbolt, he'll switch to Jolteon and use Thunder to kill me next turn; if I use Surf, he'll stay in, use HP Grass and kill me this turn. But Thunder might miss, and HP Grass won't; so I should use Thunderbolt this turn, as it gives me a better chance of winning." There's one obvious failure in the reasoning here (the reasoning assumes that the opponent will outpredict; expectiminimax does that). There's a subtler failure, in that a human opponent can go through the same reasoning and decide to switch.

As you mention later in the thread, finding a mixed equilibrium avoids this problem (and shouldn't actually be too difficult; you can do 81 separate, relatively shallow analyses, one for each pairing of the 9 moves you could use with the 9 moves the opponent could use, then use the standard formulae to calculate the perfect mixed strategy). The usual caveats about mixed equilibria apply here, though; they make you impossible to predict, but also make the prediction level of the opponent irrelevant, in that any possible prediction by the opponent, so long as it's plausible (as in, counters something you have a chance of doing), leads to the same outcome on average. I wonder if trying to use psychology on prediction would work better than the purely random method?

*Brain explodes*
Interesting but complex. I have a couple thoughts. How will the AI make its team? Will we make it for it, then let it play? I would find it interesting to know the "best" team possible, according to the AI.
I've had some thoughts along these lines myself; I was using the switching stats in order to generate a team that had a good switch-in to any OU that the opponent might come up with, and left the opponent with no switch-ins that devastated the team in general. (How good one Pokémon was as a switch-in for another was estimated in terms of how often people actually made the switch in question, with adjustments for the relative usages.) I produced an RMT calculator from the results, which was discussed in this C&C thread; but one obvious thing to do with such a calculator is to try all possible OU teams in it, and see which is considered the best; doing this over the course of days gave me Blissey, Gliscor, Vaporeon, (Zapdos|Rotom-H), Forretress, Weavile as the resulting team. (There was only a very marginal difference between using Zapdos and using Rotom-H.) Of course, there are all sorts of problems with this method; the two biggest problems are that it doesn't take sets into account (because the statistics it was based on don't contain sets), and that it can't allow for a Pokémon being "just better overall" than another Pokémon (i.e., it calculates what Pokémon are good at, but not how good they are relative to other Pokémon). Some day I'd like to build that team and see how well it actually does; it would likely be the only team ever built either entirely on theorymon, or not at all on theorymon, depending on your point of view.
 

obi

formerly david stone
On the issue of teams: A team building AI would essentially be a completely different program. I'm not making any assumptions about what type of team the AI is using. That part is interchangeable.
 
On Stealth Rock: when there are unknown Pokemon, it may help to use some data on the usages of Pokemon and Leftovers to determine the amount it will lower the opposing Pokemon's score by (this score will then be moved up or down depending on what is revealed). The amount that it lowers the score by will be affected by things like typing, Leftovers, and recovery moves, right?

Also, another problematic move may be Taunt. It requires values for problematic battle effects, and it would really help to know the common sets of opposing Pokemon (and to distinguish between, say, common lead SR sets and other movesets that would be much more common elsewhere in the team). It will also have to weigh the chance that it makes the opposing move do nothing versus them using an attacking move, but that can just be done with probabilities.

Have you considered the possibility of using killer moves to reduce the number of nodes that have to be traversed? I feel that this could be especially useful in the middle of the game, when a lot about your opponent's team is known but there are too many possibilities, so it can't just be brute-forced.

Finally, can you post the exact point values of some different types of moves/pokemon and some modifiers that you have so far (or just snippets of code)?
 
I once wrote a few basic minimax algorithms for simple games, but I'm real rusty and never did ones involving chance.

Anyway, I was thinking that the complexity of the weighting system could be reduced by making most conditions penalties or bonuses for Pokemon rather than having intrinsic value.
In other words, the value of a Pokemon is reduced in proportion to the damage it takes from Stealth Rock, or a Swift Swim Kingdra gets a bonus from rain.
So the AI might choose to use Rain Dance on turn one because all its Swift Swimmers get a boost that maximizes the next state.
Or it might choose to use SR because the advantage given to every Pokemon on its team (maybe even a higher bonus for phazers) outweighs the numerical benefit of fainting the opposing lead.

One question this approach brings to mind is how the AI would evaluate the opponent's Rain while knowing only one Pokemon, the lead.
It could do so by assuming there are a certain number of rain abusers and boosting the value of the opponent's unknown Pokemon accordingly.
As a result, the AI might take the opponent's turn-one Rain Dance more seriously than its use later by Uxie, when it knows Uxie is the only thing left.

One issue I'm having is the consequences of your apparent decision not to make the AI 'cheat' by having perfect knowledge of the opposing team.
In that case, for the AI to play strongly, it might have to make initial assumptions about movesets and the probability of certain Pokemon being present.
For example, it would have to recognize Stealth Rock as a very likely move for a lead Aerodactyl, one that results in a very poor state for it in the presence of Charizard, Salamence, and Gyarados on its team; so, possessing a Scarf Froslass, it would choose to Taunt.
It could probably play quite satisfactorily just knowing learnsets, but if it knew Shoddy moveset statistics it might be even more convincing.

Anyway, I'm kind of just brainstorming here and I don't know if I'm being particularly useful but you have motivated me to think about or try some approaches.
 
About entry hazards: for an offensive team, how much they're worth depends on how many of the AI's team members gain nHKOs against how much of 'the metagame' with the hazard.
Well this depends on whether you want the AI to play to the metagame the way we do or to do things differently. I personally think a 'metagame-aware' AI with some kind of access to Smogon's statistics and analyses would be at the very least beneficial to the bot's success rate.

Honestly the hardest part of the AI is going to be to get it to play the metagame as well as a human. People know the ins and outs of their teams. I for instance know that certain threats maim me if I lose my counters/checks, and as such try to play carefully until such a time I know I can handle it. The classic example would be Magnezone and Scizor I guess.

As such, while you are working on the AI it might be worthwhile to cheat a bit and use, say, a Yawn Swampert lead to help scout and ease the imperfect knowledge problem.

Almost everything useful a Pokemon can do relates to damage. Attacks, poison and burn, entry hazards, and phazing with entry hazards cause it. Stat boosting increases the ability to cause it. Healing moves remove it from yourself. All status except poison (including Taunt) inhibits the opponent from causing it to you or healing it from them.
You could insert a large 'gap' between 1 and 0 HP, to encourage KOing a weak Pokemon rather than denting a strong one. You might also add a small weighting towards reducing uncertainty, so the AI would 'scout' if it's running a phazing move.
Except in the real game a player will often forgo dealing damage in order to set up, which is to say potentially increase/decrease future damage against the opponent/themselves respectively. The thing is, this is just potential. Swords Dance on Lucario deals no damage, yet you could sweep the opponent, or a Gliscor could come in and ruin your fun.

In Pokemon there are many instances where if you are losing, you have to risk throwing the match to make a comeback. There is no point min-maxing if your last Pokemon can only take out 2 of their 3 Pokemon, whereas the risky move may end the game prematurely, but is the only chance of winning.

The aim of the game is to win the battle, so the AI should assess the situation (i.e. Lucario at 97% vs. Suicune at 88% and two known others that can be OHKOed after an SD) and see that the only way to win is to SD. It's better to have an AI that actually wins 30% of the time than one that nearly wins 80% of the time.

There is also the concept of overkill: why go to +3 if +2 lets you sweep? Similarly, if a move does 55%, why go to +1 when you still only 2HKO?

However, an actual player would, upon seeing Bronzong switch in, notice that it has taken damage from Spikes, and would therefore simply use the Ground attack, which is the optimal choice. I feel that in order for an AI to be successful, it certainly needs some rudimentary prediction and extrapolation skills, and I'm curious as to how you will program this in.
One of the easiest parts of the AI is gathering data; it's using it to win that's hard. The AI should basically start out with a blank sheet each game and add every bit of info it can glean as the game unfolds. For example, on the first turn it adds the lead's species, level, gender and any other information that may be apparent, such as an ability. Then, using a set of rules, it would gather more and more info about the opposing team; if you Surf a Scizor and it takes 40%, you can learn something about how bulky it is, as your Surf's damage is known give or take 15%. The AI should attempt to keep track of movesets, PP, approximated stats and so forth. Then this information is used in the evaluation function to help pick an optimal move. Of course you've already done some of this, but just clarifying for those who asked.
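
As a sketch of that Surf-on-Scizor idea (simplified Gen 4 damage formula, no STAB/items/abilities/crits, invented numbers): since the damage roll is 85-100%, each observed hit brackets the defender's stat within a range:

```cpp
#include <cstdio>

// Simplified damage formula:
// damage = ((2 * level / 5 + 2) * power * attack / defense / 50 + 2) * r,
// where the random roll r is uniform in [0.85, 1.00].
double damage(int level, int power, int attack, int defense, double r) {
    return ((2.0 * level / 5.0 + 2.0) * power * attack / defense / 50.0 + 2.0) * r;
}

int main() {
    const int level = 100, power = 95;   // Surf
    const int our_special_attack = 300;  // known exactly: it's our own stat
    const double observed = 120.0;       // HP of damage the Scizor just took

    // For each candidate Special Defense stat, check whether some roll in
    // [0.85, 1.00] explains the observed damage; the consistent stats form
    // a range, and every further observed hit narrows it.
    int low = 0, high = 0;
    for (int def = 100; def <= 500; ++def) {
        const double min_roll = damage(level, power, our_special_attack, def, 0.85);
        const double max_roll = damage(level, power, our_special_attack, def, 1.00);
        if (observed >= min_roll && observed <= max_roll) {
            if (low == 0) low = def;
            high = def;
        }
    }
    std::printf("defender's Special Defense is roughly %d to %d\n", low, high);
}
```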



Now the big question of what values to assign to what, this needs to be derived from the team the AI is being given and will vary throughout the game as more data is attained.


One thing that is going to be very hard to model is the concept of fear. Generally speaking, in Pokemon the closer something gets to being guaranteed, the more likely players are to make risky moves. For instance, Gengar vs. Scizor: the logical move is to Bullet Punch, as it's a KO, yet because it's so logical you'd expect them to switch out, so you might try Pursuit/U-turn; however, the opponent thinks the same, stays in, and Gengar kills you. This kind of cat-and-mouse prediction in AI is going to be tough; I guess minimax will just do its thing here.


I'm just spewing thoughts here, as I can't even begin to imagine how complex the process is that our brain naturally handles when we play.
 

obi

formerly david stone
This thread has been mostly a "general AI questions thread", so I'm modifying the title to say so.

On Stealth Rock: when there are unknown Pokemon, it may help to use some data on the usages of Pokemon and Leftovers to determine the amount it will lower the opposing Pokemon's score by (this score will then be moved up or down depending on what is revealed). The amount that it lowers the score by will be affected by things like typing, Leftovers, and recovery moves, right?
That's somewhat what I'm doing right now. I have an incomplete evaluation function (as soon as I get enough of my code working to run a few tests, I'll post what my incredibly arbitrary and definitely wrong values are). The way I have it, Pokemon have scores and moves have scores (and teams have scores). For example, if a Pokemon is hit by Taunt, the score of their moves that are blocked by Taunt drop down low (not to 0, because it's not a permanent state, but very close).

Also, another problematic move may be Taunt. It requires values for problematic battle effects, and it would really help to know the common sets of opposing Pokemon (and to distinguish between, say, common lead SR sets and other movesets that would be much more common elsewhere in the team). It will also have to weigh the chance that it makes the opposing move do nothing versus them using an attacking move, but that can just be done with probabilities.
I'm thinking of doing the following for the first draft of the AI as a coding simplification: Assume the most likely members of their team are on their team, and assume the most common move set is their move set. For instance, the battle starts and I have Hippowdon and they have Heatran. OK, then I'll assume their team is Heatran, Scizor, Latias, Salamence, Tyranitar, Jirachi (5 most common Pokemon on the Heatran teammate statistics) and their move set is Earth Power, Explosion, Fire Blast, Dragon Pulse @ Choice Scarf. As I get more information, I'll refine this data. At some point, I'd probably make my algorithm a little more sophisticated, for instance, if the top moves for Heatran were Fire Blast, Flamethrower, blah, blah, I would not assume it has both Fire Blast and Flamethrower.
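
In code, that first draft might look something like this sketch; the teammate percentages here are invented placeholders, not the real statistics:

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Invented teammate statistics for a revealed Heatran: how often each
// Pokemon appears alongside it. Real numbers would come from the
// published usage statistics.
std::vector<std::pair<std::string, double>> TeammatesOfHeatran() {
    return {{"Scizor", 0.24}, {"Latias", 0.19}, {"Salamence", 0.17},
            {"Tyranitar", 0.15}, {"Jirachi", 0.14}, {"Gyarados", 0.11}};
}

// Fill the unknown slots of the foe's team with the most common teammates
// of what has been revealed so far; refine as more is revealed.
std::vector<std::string> AssumeTeam(std::vector<std::string> revealed) {
    std::vector<std::pair<std::string, double>> candidates = TeammatesOfHeatran();
    std::sort(candidates.begin(), candidates.end(),
              [](const std::pair<std::string, double>& a,
                 const std::pair<std::string, double>& b) {
                  return a.second > b.second;
              });
    for (const auto& candidate : candidates) {
        if (revealed.size() == 6) break;
        if (std::find(revealed.begin(), revealed.end(), candidate.first) ==
            revealed.end())
            revealed.push_back(candidate.first);
    }
    return revealed;
}

int main() {
    for (const std::string& name : AssumeTeam({"Heatran"}))
        std::printf("%s\n", name.c_str());
}
```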

Eventually, I might do something smarter than that, but this seems like a good start that will tend to be mostly accurate.

Have you considered the possibility of using killer moves to reduce the number of nodes that have to be traversed? I feel that this could be especially useful in the middle of the game, when a lot about your opponent's team is known but there are too many possibilities, so it can't just be brute-forced.
Yeah, I'm looking into alpha-beta pruning for this, and the killer heuristic just might be the best. I haven't really looked too much into the advanced heuristics yet, just because I'm trying to work on a sensible representation of the game tree with alternating player nodes and chance nodes where the order of the player nodes isn't fixed.
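
For reference, a sketch of how the killer heuristic slots into alpha-beta (toy state and moves, not my actual representation):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Toy alpha-beta with a killer heuristic. "Moves" are just indices here;
// a real version would order actual battle actions.
constexpr int kMaxDepth = 8;
int killer[kMaxDepth];  // best refutation seen so far at each depth

struct State {
    // ... battle state; stubbed out for the sketch
    std::vector<int> legal_moves() const { return {0, 1, 2, 3}; }
    State play(int /*move*/) const { return *this; }
    bool terminal() const { return false; }
    double evaluate() const { return 0.0; }
};

double alpha_beta(const State& state, int depth, double alpha, double beta,
                  bool maximizing) {
    if (depth == 0 || state.terminal()) return state.evaluate();

    std::vector<int> moves = state.legal_moves();
    // Killer heuristic: try the move that caused a cutoff at this depth
    // in a sibling subtree first, hoping it cuts off here too.
    auto killer_pos = std::find(moves.begin(), moves.end(), killer[depth]);
    if (killer_pos != moves.end()) std::iter_swap(moves.begin(), killer_pos);

    for (int move : moves) {
        const double score =
            alpha_beta(state.play(move), depth - 1, alpha, beta, !maximizing);
        if (maximizing) alpha = std::max(alpha, score);
        else beta = std::min(beta, score);
        if (beta <= alpha) {      // cutoff: remember the refuting move
            killer[depth] = move;
            break;
        }
    }
    return maximizing ? alpha : beta;
}

int main() {
    State root;
    std::printf("value: %.1f\n", alpha_beta(root, 4, -1e18, 1e18, true));
}
```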

Finally, can you post the exact point values of some different types of moves/pokemon and some modifiers that you have so far (or just snippets of code)?
As soon as I get it working a bit so I can test them. Right now everything is a total guess. With the methodology I used, my numbers would be useless. They're probably within an order of magnitude of being correct, which is absolutely horrible. :toast:

One issue I'm having is the consequences of your apparent decision not to make the AI 'cheat' by having perfect knowledge of the opposing team.
I answered most of your post in my response to the previous post, so I'm just going to focus on the differences.

It's not that I made the decision not to have the AI cheat, as it's not possible to have it cheat unless it were playing server-side, in which case I might as well cheat by giving it all critical hits or something like that. It will use as much information as is available to it, which is to say: everything available to any player.

In Pokemon there are many instances where if you are losing, you have to risk throwing the match to make a comeback. There is no point min-maxing if your last Pokemon can only take out 2 of their 3 Pokemon, whereas the risky move may end the game prematurely, but is the only chance of winning.
That's actually an example in which minimax would be better than your average amateur, as long as it has enough depth to see this. If a path has a 0% chance to win if you search deep enough, that path should never be taken (unless all paths lead to a sure loss). A deep search doesn't care about the score of the game at intermediate nodes, only at the nodes at the end. In other words, the AI is perfectly happy to sacrifice 5 Pokemon along its search tree if, at the end, it wins. The only time that such a situation comes up is if the AI can only search 3 turns ahead, but the win will take 4 turns, for instance. Then it will stop right before it can win and evaluate the state and think that this path leads to a loss.
 
I've thought about programming a Pokémon AI before, too. I think that's currently beyond my capability, though (and would take me forever to finish).

Anyway, I was considering allowing it to learn over several battles, rather than just use logic independently in each one. For example, it would keep track of switch-ins to each of its team members; if it found that a common switch-in to, say, Magmortar was Blissey, it would either give Magmortar a move to counter it (Cross Chop) or switch out accordingly to a Blissey counter. Obviously, this wouldn't work without major modifications (if it just switched to a counter for common switch-ins every time, it wouldn't get anything done), and it would require a lot of battles for it to learn what to do. However, this leads on to my next point. Each time it faced a player, it would record their team and general strategy (stall, offensive etc) based on the game. Much like a human player might, it could then recall the information next time it faces the player (e.g. if they don't normally use Blissey, there's no point switching in a Blissey counter). I'm sure many people are roughly familiar with the playing styles of more prominent battlers, so why couldn't an AI do the same?

Finally, it would be great if the AI's settings could be tweaked (assuming it gets a public release). For example, you could set it to a defensive style, or make it take more risks. If the AI is decently fast, you could even get it to tweak its team if something is doing badly, and run it against itself (maybe with different settings) many times; this could lead to some interesting decisions (and new sets). The only drawback is that it would become optimised to play against itself, and may end up getting worse against human players...
 

obi

formerly david stone
The self-play strategy was used to great success in TD-Gammon, a Backgammon playing program that uses neural nets to modify its own behavior. That is far beyond my ability to program, however.

My vision has always been to have it learn from battles. Ideally, it would use the general stats for some information, and then slowly modify that based on its own (more detailed) information.
 
That bit was more directed at the AI's team. Looking back, I just saw that you're not going to implement any team building, so I guess you can skip over that bit.
Even if you're not implementing complete team-building, maybe you could still allow it to make small changes to its current team? It might help people with team building, as they could assign their team to the AI, then have it simulate many battles and see if anything could be tweaked. That might not be feasible time-wise, though, as you'd probably need on the order of hundreds or thousands of games.

On that topic, it would be interesting to have access to everyone's Shoddy teams, so you could run a virtual ladder (assuming it's feasible). I'm not sure what information could be gained, though (maybe which teams are best, assuming all players have the same amount of skill).

EDIT: Also, are you going make the source available to the public? I'd certainly be interested in adding a better AI to my game project :P
 

obi

formerly david stone
I will most likely be releasing the source under the AGPLv3+ (Affero General Public License version 3+) some time in the next few months. This is the same license that Shoddy Battle is distributed under.

I do plan on having the AI be fully self-sufficient. That is to say, I will at some point have it make its own teams, but that's much farther down the list. My thinking is that team building is a more complicated thing to do than battling (as long as the team isn't just a derivative work of other teams), and you can battle without really knowing too much about team building, but you can't build a team without knowing how to battle.
 
Let me preface this post by saying that I am a game theorist, not a programmer, and I have never studied algorithms in the computer science context.

That said, it seems like ideally, with unlimited data and computing power, plus perfect information, the program would use an extensive-form game tree and simply choose the move with the largest number of remaining victory conditions. By taking this move each time, the AI would always win. The addition of a move by nature ("God" in your model) makes this a probabilistic game, but this is still the ideal strategy.

Of course, Pokemon has two obstacles to this method: too many variables and incomplete information. So if I am understanding your "score" for each node, it is basically supposed to be an estimation of the full extensive-form game. Using the information available and remaining within the bounds of reasonable calculation times, we are making our best guess as to which node has the most remaining subsidiary victory conditions. This node is the payoff-dominant move. Because it is a simultaneous-move, imperfect-information game, this is often, as you say, a mixed equilibrium, since there is some small incentive to deviate from a strict-move or strict-trigger strategy.

This view of the model produces some additional insights. For example, you can remove the evaluation of any variables which no longer impact the number of extant victory conditions. You have already suggested doing this by ceasing to evaluate entry hazards when only one Pokemon remains. This could apply in other contexts also.

For example, if your opponent's last remaining Pokemon is Scizor, and you have a Scarf Heatran on the field, you can ignore the Scizor's current HP for the calculation, since even a max/max Scizor with Occa Berry cannot survive even a min-damage Fire Blast. There are numerous other circumstances in which you might ignore variables in this way, including many scenarios involving Pursuit, for example.

This may also provide insight into the relevance of field effects such as weather, Gravity, and Trick Room. In general, visible information about the opposing team should provide at least some ability to guess at its capacity for abusing the effect. Normally, a rain team will want to keep Rain Dance up as much as possible, since even an opposing team with a random Kingdra will have far less ability to abuse the rain. Of course, the utility goes back down if the opponent has a Vaporeon. Usage statistics could provide a guess as to how much good rain can do you, weighted gradually less as additional information is revealed about the opposing team, until the calculation is entirely empirical.

For example, let us suppose that the AI's active Pokemon on turn t is Ludicolo, Modest @ Life Orb with Surf/Grass Knot/Ice Beam/Focus Punch and standard EVs, at 90% health, and the rain ends after it scores a kill. The AI's remaining team is standard SD Kabutops and Electrode @ Damp Rock, the only thing left with Rain Dance. The opponent has just switched in a Muk, at 60% with only Brick Break revealed in its set, and has a Magcargo and an unrevealed Pokemon.

The AI has two real choices here. It can use Surf on Muk, or it can switch to Electrode and go for Rain Dance. With its Focus Sash intact, Electrode can definitely get rain up. But, if the opponent's last unrevealed Pokemon is Kingdra, Electrode's ability to outspeed without rain and use Thunder may be essential to winning. Ludicolo's Surf is a 2HKO on any common spread for Muk, but nearly all Muks carry Brick Break so it is impossible to determine the set. If Muk is running attack EVs and Gunk Shot, as many do, it can OHKO Ludicolo. There is also a small percent chance that a Curser could KO with Poison Jab + Shadow Sneak.

Therefore, the AI should weigh: the probability, based on both usage and teammate statistics, that the opponent has Kingdra (this is really the probability that the unrevealed Pokemon can defeat both Ludicolo and Kabutops in the rain; it's just that Kingdra is the only common Pokemon able to do this); the probability, based on moveset statistics, that Muk has Gunk Shot; and the probability that Poison Jab + Shadow Sneak will kill Ludicolo. With ballpark statistics for these three things, it should be easy to hazard a very good guess as to which move is payoff-dominant in the extensive-form game.

If the last Pokemon is known, you simply drop the first calculation and replace it with the probability, if any, that the remaining Pokemon will benefit from the rain and that this potential benefit could be outcome-determinative.
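
To make the weighing concrete, here is a sketch with invented probabilities; the structure of the calculation is the point, not the numbers:

```cpp
#include <cstdio>

int main() {
    // Everything here is an invented placeholder for the scenario above.
    // Chance the unrevealed Pokemon is Kingdra (usage + teammate stats):
    const double p_kingdra = 0.10;
    // Chance Muk carries Gunk Shot and OHKOes Ludicolo (moveset stats):
    const double p_gunk_shot = 0.40;

    // Invented win probabilities conditional on each case.
    const double win_surf_if_safe   = 0.80;  // Muk lacks Gunk Shot
    const double win_surf_if_gunk   = 0.25;  // Ludicolo is OHKOed
    const double win_switch_kingdra = 0.55;  // Electrode kept for Thunder
    const double win_switch_other   = 0.60;

    // Expected payoff of each action; the larger one is payoff-dominant.
    const double ev_surf =
        (1 - p_gunk_shot) * win_surf_if_safe + p_gunk_shot * win_surf_if_gunk;
    const double ev_switch =
        p_kingdra * win_switch_kingdra + (1 - p_kingdra) * win_switch_other;

    std::printf("Surf: %.3f  Switch to Electrode: %.3f -> %s\n",
                ev_surf, ev_switch, ev_surf > ev_switch ? "Surf" : "Switch");
}
```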
 
Yeah, I'm looking into alpha-beta pruning for this, and the killer heuristic just might be the best. I haven't really looked too much into the advanced heuristics yet, just because I'm trying to work on a sensible representation of the game tree with alternating player nodes and chance nodes where the order of the player nodes isn't fixed.
I saw the alpha beta pruning earlier, which is why I recommended the killer heuristic: this is something that is normally layered with the alpha-beta pruning, and would be rather worthless in a simple minimax or negamax search.

As soon as I get it working a bit so I can test them. Right now everything is a total guess. With the methodology I used, my numbers would be useless. They're probably within an order of magnitude of being correct, which is absolutely horrible. :toast:
Ouch. Good luck getting them closer.

That's actually an example in which minimax would be better than your average amateur, as long as it has enough depth to see this. If a path has a 0% chance to win if you search deep enough, that path should never be taken (unless all paths lead to a sure loss). A deep search doesn't care about the score of the game at intermediate nodes, only at the nodes at the end. In other words, the AI is perfectly happy to sacrifice 5 Pokemon along its search tree if, at the end, it wins. The only time that such a situation comes up is if the AI can only search 3 turns ahead, but the win will take 4 turns, for instance. Then it will stop right before it can win and evaluate the state and think that this path leads to a loss.
I think you mentioned using a quiescence search earlier; this is a place where it could help quite a bit and increase the AI's advantage over a normal player.

Finally, if you have anything else that is somewhat presentable, could you post that (even if it is just a representation of what the game tree will look like)? Sorry to bug you about posting stuff, but it is sort of hard to comment in vague generalities :toast:.
 
I'm completely math/computer illiterate, but can the computer compensate for player styles? Some players, I find, use "prediction" more and enter into more high-risk/high-reward situations; could the AI punish them for that, or change its playing style to accommodate team matchup weaknesses: "wow, that Infernape looks dangerous, how can I best mitigate it as a threat?" I realise it is early days, but (a) would this be possible and (b) would this be something you would consider implementing? I often find this crucial to success, so it may be integral to this program.

On a different tack: I'm in absolute awe of this and eagerly await any and all outcomes of this undertaking.
 
The way I would make this AI is begin with the "hardest" setting possible, by allowing it to "cheat":

Give it a set of teams with immunities to as many types as possible, as well as some way to beat the game "easily", such as with Tank Offense, All-Out Offense, or Stall. ALLOW IT TO SEE WHAT CHOICE THE OPPONENT HAS MADE BEFORE IT MAKES A MOVE (this is the only "cheating" aspect), along with all the members of the opponent's team and how much damage it would take (or whether it would be critically hit) beforehand, and have it respond accordingly. If the opponent is gonna switch to a Steel-type, have it use a Fire move; if your opponent uses a Fighting attack, have it switch to Rotom, etc.

Start with this aspect of the AI first, since hard counters (if/then commands) will probably be the most basic thing to program. From this point, you can begin developing the AI into "easier" difficulties by having it not look at the opponent's moves every couple of turns. For example: at level 0 it will look at the opponent's move every turn, at level 1 there'll be a 99% chance that it'll see the opponent's move, at level 2 there'd be a 98% chance, etc., all the way up to 100, where it wouldn't see the opponent's move at all and would rely completely on the evaluation function.

From this point, you can develop the evaluation function for choosing the best move when it does not know what the opponent is going to do, as well as endgame scenarios.

This is my theory anyway, and how I would go about it if I were making a Pokemon AI. Starting with Hard Counters/"cheating" AI first as the most difficult setting would probably be the easiest way to get this AI project started.

With the large amount of if/then commands to come up with if you go along with the "Hard Counters" idea, make it so that it does a good job playing 1 team really well first, and then you use that experience to get a better idea of how to build the AI from there.

Essentially, begin with an all-knowing, omniscient AI that will always make the "perfect" move with a given team and, with your experiences, work your way down from there. I think this would be the best method of developing this AI quickly, since you're attempting to write an AI for a strategy game from essentially scratch, and that's very, very hard. Though it may not be as hard if you took a look at Nintendo's AI for their higher-tiered computer-controlled trainers, figured out how it works, and explained it to us.
 
I'm not sure cheating will teach you anything useful really.

Also, poking around Nintendo's code is a no-no - quite apart from being of dubious legality in itself, it opens you up to claims that your own AI code infringes copyright.
 
It would be pretty cool if Nintendo upped the sophistication of their AI though, however obi definitely deserves financial compensation for his work if they do include it :P.

I've done a bit of work before in AI, and I'm curious to know why you chose to write it in C++ rather than a functional language such as LISP, which is better suited to the task (At least in my own experience)?
 
Thoughts: have you considered using an evolutionary learning function?

The book Blondie24 does a good job describing an evolutionary learning function. I'd end up using that; for a team, at the beginning I'd assign every neural net a "random" team that was pregenerated. I'm sure there are more than enough pregenerated teams you could use.

The other option is to go with setting each status condition with positive and negative values. This, however, would require you to watch it play and to note "dumb" moves the neural net makes.
 
In regard to assigning values for the AI to use, maybe it would be best to have them all be variables and for the AI to alter them as it wins or loses. For instance, if you assign a value of 3 to Stealth Rock but it's not paying off that much, the AI would lower its value until its payoff is equal to the priority placed on using SR. That said, I can see these values probably best being a per-team thing, as obviously something like Toxic Spikes is useless to some teams and very useful to others.
That said, it seems like ideally, with unlimited data and computing power, plus perfect information, the program would use an extensive-form game tree and simply choose the move with the largest number of remaining victory conditions. By taking this move each time, the AI would always win. The addition of a move by nature ("God" in your model) makes this a probabilistic game, but this is still the ideal strategy.
Not even then: if the AI's team is badly matched, it can still lose even with no bad luck. There is always the chance you get OHKOed on a switch-in, as you can't predict which move the enemy will use. In Pokemon, playing the optimal move won't always lead to the best outcome.
 
Sure, but that doesn't change that it's the payoff-dominant strategy. No matter how you change the rules, your best strategy is always to choose the node with the most remaining victory conditions. The entire discipline of game theory revolves around simplifying and estimating this process.
 
I honestly know hardly anything about programming and I don't think I've ever fully understood one of your posts obi, but perhaps you could put your work in progress AI against some very skilled competitive people, and have it attempt to mimic them? It's just a thought
 
I am 'somewhat' knowledgeable in the fields of both programming and the logic being used here (meaning that I'm better than everyone I know, but I don't really know anyone who's any good), but I have noticed a few things about the problem that bear thinking about. First off, I agree with what Res Ipsa Loquitur said about the problem with basing things off HP (and sorry if you addressed this already; I must have missed it), and after thinking about it awhile have decided that the best way to weight that sort of thing would be with a measure of 'KO-ability', given your current team (perhaps the weighted average of the number of moves each of your Pokemon needs to KO, with the weighting against those of your Pokemon unlikely to survive and in favor of those likely to be out against it. This needs to be thought out further, obviously). I also think that Pokemon should be evaluated in terms of a percentage rather than a number, because that lets you use infinity (for a Pokemon which guarantees victory, perhaps) rather than just really big numbers. I don't think that it actually matters though, except from a stylistic point of view.

Also, with regards to what people have been saying about prediction and base point values for moves (which would of course be modified by things like STAB, weather, etc.), I thought of something that I'm not sure would work (not knowing enough about the programming language you're using) but would be useful if it did. First, include some variable in all of the various estimates (how likely are they to switch out? How much intrinsic value does Ice Punch have?) so that instead of '30%' we had '30% + N'. Now, after every battle where that statistic comes into play, pick a random N between, say, -5 and 5. Over many battles, if substantially more wins occur with an N of 3-5 (in the relevant situations), then change the range of Ns to 0-10, and so on. You could probably calculate this continuously, perhaps decreasing the bounds of the range by .1% for each win with a low N and increasing by .1% for each win with a high N. Someone might have suggested something like this earlier, I'm not sure.
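
A sketch of that scheme, with an invented stand-in for actually playing out a battle:

```cpp
#include <cmath>
#include <cstdio>
#include <random>

// Stand-in for playing a full game with the jittered estimate plugged in.
// Invented ground truth: estimates near 35% win most often.
bool simulate_battle(double switch_probability_estimate) {
    static std::mt19937 rng(12345);
    const double error = std::fabs(switch_probability_estimate - 35.0);
    return std::uniform_real_distribution<>(0.0, 20.0)(rng) > error;
}

int main() {
    std::mt19937 rng(67890);
    double center = 30.0;      // current estimate of '30% + N', in percent
    const double half_range = 5.0;  // N is drawn from [-5, +5]
    const double step = 0.1;        // drift per battle, as proposed

    for (int battle = 0; battle < 10000; ++battle) {
        const double n =
            std::uniform_real_distribution<>(-half_range, half_range)(rng);
        const bool won = simulate_battle(center + n);
        // Wins with high N pull the center up; wins with low N pull it down.
        if (won) center += (n > 0 ? step : -step);
    }
    std::printf("estimate converged near %.1f%%\n", center);
}
```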

Another note on prediction: something interesting to see would be if different people with similar skill predict in different ways, either based on their personality or with simple(?) randomness. Give two people a copy of the program, and check their 'N' values after a few hundred battles, and see whether and how they are different. This could also show whether different instances of essentially the same scenario (I need to guess whether he switches in fear of a S.E. hit or stays in in fear of a different S.E. hit) are predicted the same, or if some variations exist which change the optimal percentages (the only thing I can think of is maybe with different S.E. types in this case; maybe Ice is always better than ground?).

I think that using some variant of the above (that's two paragraphs ago) strategy would be necessary, or at least more helpful than recorded statistics. I mean, when we get the data from Shoddy it just says absolute usages; for all we know, a ton of the Starmie usage was on one day, and most of the time it would have rated 19. And as previous statistics have made obvious, it is next to impossible to predict anything for the future based on these statistics until it's already happened. The numbers jump around from month to month, and I wouldn't be at all surprised if they jumped even more from day to day. Normally there would be no way to get around that, but if this AI were widely used, it would be gathering huge amounts of data constantly on its own, not enough to predict the future, but probably (hopefully) enough to get a handle on the present.

Of course, that's just my opinion.

I also have a suggestion in regard to how Pokemon are rated. I agree that the sum of the scores of the moves should play a part, but I think that, in keeping with the idea I presented earlier, any further alterations (due to sandstorm, type advantage, STAB, etc) should be multiplicative, probably removing a percentage of the difference between the Pokemon's other value and 1 (a Kingdra which already has two dragon dances would not gain as much from rain as would one with no other boosts).

Finally, I disagree somewhat with the analysis of taunt you present. You suggest that it heavily de-value the non-attacking moves of the target, not dropping them to zero because the status isn't permanent. I say that it should drop them to zero, but only on the turns it is in effect. I may be misunderstanding what you are saying, but you seem to have forgotten that you have intermediate nodes to work with, not just a static score (in this case, by my reasoning, the non-attacking moves would be absolutely de-valued for every turn until it switches out, curing the status and fully restoring the moves' scores. With nodes perhaps looking like: 100 (not taunted), 10 (taunted), 10 (stayed in), 100 (switched out))


I stand by each thing I said above, though I would guess at least three quarters of it is probably worthless. I hope some of the good stuff helps, and more importantly that none of the stupidity hurts. Good Luck
 
Obi, it would be incredible if you could build such a program. 'Twould make pokemon somewhat like chess in that regard, but humans would still be necessary to build the teams. However, I would suggest that you consider a watered-down version of the program first. For example, reverse engineering the EVs might be somewhat challenging, as you have to consider the opponent's pokemon's held item and nature and the randomness of the game. Perhaps getting a working framework and then improving that would be a better strategy.
 
