Technical Machine: A Pokemon AI

Discussion in 'Stark Mountain' started by david stone, Mar 29, 2010.

  1. callforjudgement

    callforjudgement

    Joined:
    Jul 8, 2009
    Messages:
    301
There are really two problems involved here (not counting teambuilding, which is a third problem). Deducing the opponent's set based on the damage your moves do to them, and vice versa, is not an AI problem at all, but one of calculation; and programs to do that sort of thing exist already (DougJustDoug's damage calculator, for instance, already deduces EVs from damage; all that would be needed to adapt it to work with an AI would be to allow for the probabilistic factor, and change the interface from a human-usable one to a bot-usable one; both pretty large tasks, but no problem in theory). Arguably, an automatic EV/item-deducer would be useful even in human vs. human play; it could make for a fun Shoddy plugin, for instance. (I wonder if people using it would be accused of cheating?)
  2. wavedash

    wavedash

    Joined:
    Mar 7, 2009
    Messages:
    337
    I'm sure many people would be accused of cheating if they "predicted" that a Tyranitar is Expert Belted and not Scarfed.
  3. YJiang

    YJiang

    Joined:
    Nov 18, 2006
    Messages:
    175
This sort of thing happens all the time, though. An example is my Scarf Jirachi lead against an opposing Azelf; depending on how much the U-Turn (into Scarf Heatran roflroflroflrofl) does against his Azelf, I'm able to more or less guess what the Azelf is running EV-wise.
  4. david stone

    david stone Fast-moving, smart, sexy and alarming.
    is a Site Staff Alumnus, a Smogon IRC AOp Alumnus, a Programmer Alumnus, a Super Moderator Alumnus, a Researcher Alumnus, a Contributor Alumnus, and a Battle Server Moderator Alumnus

    Joined:
    Aug 3, 2005
    Messages:
    5,150
    I've been doing more work on this since I posted this thread, and I have made good progress. I have most simple moves, items, and abilities implemented, as well as some of the more complex ones (such as Hidden Power and Choice items). Most importantly, I have a very rudimentary AI that can read Shoddy Battle 2 teams. For instance, in a battle between an AI Salamence with Dragon Claw and Dragon Dance @ No Item vs. a foe Suicune with Calm Mind, Surf, Rest @ Leftovers, and using just a simple 1000 through -1000 scale, based entirely on HP of each Pokemon (50% HP vs. 100% HP is 500), this is sample output, with the number representing the depth of the search:

    And this is how much the Leftovers can change things (Salamence gets Leftovers instead of no item):

    And then Life Orb:

    One of the problems I've encountered is that I spend a lot of time fixing bugs, only to discover that my program is smarter than I am, and there really wasn't a bug. For instance, one non-bug I tried to solve ended up being that my program plays like a jerk. I spent the longest time trying to figure out why my program recommended that Latias use Surf against Suicune when it has Thunderbolt at its disposal, and I finally realized why. The issue was that at a certain depth, Latias would have enough Special Attack to force Suicune to Rest, and then use Surf, Thunderbolt, and then Draco Meteor for the kill (I don't have accuracy added yet, sadly, so it doesn't recognize that Draco Meteor could miss). Thunderbolt followed by Draco Meteor would be enough to kill Suicune easily, but since I don't yet have iterative deepening, it dutifully performed a depth-first search, and the Surf line just happened to be the first winning result it found. It believed this to be a guaranteed win against Suicune, and one optimization I have programmed in is to stop searching a branch when it finds a guaranteed win or a guaranteed loss, as there is no way for any other move to out-compete that. I thought there was a bug, and an opponent would think my AI was toying with them, but the reality was just that my too-simple implementation had the program arrive at a correct but surprising conclusion.

    Whenever there are healing moves involved, there are surprising results at low-depth searches. Since I don't assign any value to status, depth=1 searches generally lead to Suicune using Rest. This is the output for the same Salamence, with Roost added in.

    I have no idea why adding Roost suddenly makes first-move Dragon Claw the best bet all the way through. However, removing Dragon Dance from the move pool entirely causes every depth except 2 to report a score of 0, all the way out to 8, so the program clearly recognizes it needs Dragon Dance to win. Whatever the issue is, it's an artifact of the order of evaluation and the way I programmed it (only replace the "best move" if it's better than the previous move, not just as good). As evidence, here is the output for "no item" with Roost at a different location in the move set (at the beginning instead of the end).

    This re-ordering should never change the minimum depth needed to find the win, however, as a depth-first search is guaranteed to find a solution, as long as one exists and the search doesn't get stuck in an infinite branch.


    I've done away with my silly ideas of actually creating nodes for the game tree, and I just pass a copy of the important data to a function. This is such a dramatic reduction in complexity that I don't know what I was thinking before.





    My ultimate goal is to have the score correspond to the probability of winning. For instance, if the range of scores is -50 through 50, then each point increase is equivalent to saying you have a 1% greater chance to win. A score of 50 means a 100% chance for the AI to win, and a score of -50 means a 100% chance for the foe to win.

    However, I have no idea how to really estimate that yet, so I'm using the value of a % of HP as my reference. It's easier to adjust the value of points when you have something to compare them to, and if I were to say a single % of HP is worth 1 point, for instance, then giving something a score of 10 is like saying "I would be willing to give up less than 10% HP on a random Pokemon in exchange for this, and feel like I came out ahead". This is not as clean or useful as having an actual probability to win, but still nice. HP was chosen as a reference point because it applies to pretty much all Pokemon, so it's universal, which helps when thinking about the value of things in all situations.

    Having my max score be a real number (as opposed to an infinity) is good because then I can do calculations like the following:

    Suppose I have two options: I can use a move that gives me a 70% chance to win that turn, or I can use a move that causes the game to branch. I have a 50% chance to win if it goes down one path, which will occur at 50% probability, and a 100% chance to win if it goes down the other, which will happen the remaining 50% of the time. With real numbers, I can just multiply the score (50 for a win in my example above) by the probability. Assuming that anything that doesn't lead to my win leads to my loss, option 1 works out to

    .7 * 50 + .3 * -50 = 20

    and option 2 is

    .5 * (.5 * 50 + .5 * -50) + .5 * 50 = 25

    So I should take path 2, as it improves my chance to win. Now, in that example you could have just as easily multiplied the probabilities through, because we were just dealing with wins and losses, to see that option 2 is a 75% chance to win, but the real strength comes when you have stuff like "This path has a 30% chance to win and a 70% chance to not end the game, but put it in a state which has a value of 34.".
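A quick sketch of that arithmetic (Python here just for illustration; the project itself is in C++), using the 50/-50 scale and the two options from the post:

```python
# Expected value of a chance node: sum over (probability, score) pairs.
# Scores use the symmetric -50..50 scale from the post.

WIN, LOSS = 50, -50

def expected_score(outcomes):
    return sum(p * s for p, s in outcomes)

# Option 1: 70% chance to win this turn, otherwise lose.
option1 = expected_score([(0.7, WIN), (0.3, LOSS)])    # 20

# Option 2: 50% chance of a 50/50 branch, 50% chance of a guaranteed win.
branch = expected_score([(0.5, WIN), (0.5, LOSS)])     # 0
option2 = expected_score([(0.5, branch), (0.5, WIN)])  # 25

# On this scale, a score maps back to a win probability:
def win_probability(score):
    return (score + WIN) / (2 * WIN)                   # 25 -> 0.75
```

This is exactly the benefit of a bounded real-valued score: chance nodes and terminal states mix in one arithmetic.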

    This seems like a simple version of evolutionary algorithms.

    But if you bring the value of a Taunted move down to 0, then you're saying it's just as bad to have a Taunted move as to not have the move at all, which is clearly false. As far as I can see, the main point of the evaluation function is to try to limit the depth I need to search to get an accurate picture. If I could search to an infinite depth, the only thing I would need to evaluate is "Have I won yet?". I cannot do an infinite search, so the evaluation function needs to look at other things, and then try to use those things to give information about how the game will play out in the future. On that turn, a Taunted move might be as bad as no move at all, but it won't be that way forever. Reflecting that non-zero value in the evaluation function is basically a way to save me some depth in my search, as if to say "I know you can't search infinitely far ahead, but if you could, this is my best quick guess of what it would look like". We still want to penalize Taunted moves (even though Taunt will wear off some time in the future) because while that move is Taunted I'm open to other things, of course.
  5. Aura Guardian

    Aura Guardian

    Joined:
    Dec 25, 2009
    Messages:
    2,318
    Do you have any intentions of releasing this on Shoddy with a bunch of pre-set teams possible after you finish it?

    Also, for the Surf, Thunderbolt, Draco Meteor Latias thing vs Suicune, only searching until it finds a win... at what point would forcing it to keep searching become feasible? As long as it (to save time) knows that X should never be switched into Y or W should always be switched out of Z or U easily beats V, I don't see much of a limit until it always finds a good situation.
  6. FlareBlitz

    FlareBlitz This was never a story that would have a happy end
    is a Tiering Contributor Alumnus, a Contributor Alumnus, and a Past SPL Winner

    Joined:
    Sep 19, 2009
    Messages:
    2,039
    Regarding the Latias thing, I bet if you gave weight to the PP of moves, it would prevent the program from wanting to use Surf a bunch of times, as Thunderbolt -> Draco Meteor takes far less PP. Giving weight to PP would also be important for stall wars; you don't really want it to spam Toxic against something with Rest (not sure if it does this currently?) when it can just use an attack to burn Rest PP and then Toxic.

    Regarding calculating win chances, you might be able to pull off something like chess, where point values for remaining pieces are taken into account (which would be like the HP in Pokemon) but position and initiative are also given weight. I'm not sure how it would be possible to analyze "position" in Pokemon without letting the computer analyze the whole game tree (i.e. foe uses this move, I use this move, foe uses that move... until the game ends), which would take up quite a lot of resources, particularly since Pokemon has the God agent to introduce luck and has more variations on possible moves in general.

    I suppose it would be best to look at a simple theoretical situation (i.e. a full-health CroCune versus a 15% Vaporeon and a 19% Toxicroak) and attempt to isolate common factors that can be used to determine positional advantages, and you could work up to slightly more complex situations (like a full-health ScarfGon against a 50% LO Heatran and a 26% Gyarados) until you have a solid framework for what constitutes a "winning" or "losing" position despite total HP left. Testing that framework in complex battle conditions would likely reveal more factors you could account for or adjust.

    At any rate, this is really interesting. Thank you for the update.
  7. Invariance

    Invariance

    Joined:
    Jul 19, 2010
    Messages:
    45
    I think that in the situation as it has been described, there is no reason to keep searching. As far as the program cares, it has found a win that succeeds with probability 1 (or, more precisely here, that lowers the opponent's HP to 0). But that sort of situation ultimately seems rare, especially with moves with imperfect accuracy scurrying about. This sort of thing should evaporate naturally once accuracy is implemented (it shouldn't recognize the line as a win with probability 1, so it should keep searching in case there is one). FlareBlitz also mentions PP, which would also resolve this issue if PP were too limited to pull this strategy off (though he mentions it in the context of evaluating the value of a state, which is a harder problem).

    But it does bring up a good point, in asking how states are valued further down in the tree. After all, Pokemon is a game of imperfect information. Unknown information could make a huge difference, though this may vary the further down in the game tree we go. So to what extent should how much information we have affect how certain we are of a given series of events occurring (especially from the opponent, but also from the randomness inherent in the game)? If all information in the game is 'known' to a high degree of certainty, then we should expect this modification to be small, while if the game is still 6 on 6 and each person has only revealed 1-2 Pokemon, this modification should be larger.

    Also, while the AI does not have full information in the game, neither does the opponent, so to what extent will the AI try to simulate what the opponent 'knows' (or perhaps, reasonably suspects)? I'm very curious how you plan to use a scheme usually adapted for games of perfect information for a game of imperfect information.
  8. david stone

    david stone Fast-moving, smart, sexy and alarming.
    is a Site Staff Alumnus, a Smogon IRC AOp Alumnus, a Programmer Alumnus, a Super Moderator Alumnus, a Researcher Alumnus, a Contributor Alumnus, and a Battle Server Moderator Alumnus

    Joined:
    Aug 3, 2005
    Messages:
    5,150
    Most of the program is simulator-neutral. Right now I have a module that can read Shoddy Battle 2 teams, and I plan on adding an interface to let it battle on Shoddy Battle 2 (and possibly Pokemon Online), and I'll be constantly giving it different teams.

    The main problem with forcing a search to continue is that each time I increase the depth, the search takes much longer. With maximum optimizations (g++ options -O3 -march=native), the Salamence vs. Suicune example:

    depth = 5 is about .5 seconds
    depth = 6 is about 1.5 seconds
    depth = 7 is about 6 seconds
    depth = 8 is about 22 seconds

    That doesn't seem too bad, but Salamence wins at a depth of 5, so some of my optimizations speed it up dramatically. If we were to remove Dragon Dance from Salamence's set and replace it with Splash, then suddenly the search times are this long:

    depth = 5 is about .5 seconds
    depth = 6 is about 4.5 seconds
    depth = 7 is about 40 seconds
    depth = 8 is about 367 seconds

    And keep in mind, this is without a lot of branching coded in yet. For instance, my program always assumes moves hit and doesn't alter any of the random numbers.

    Well, it would only care about the PP of the move for positions that don't end in a win. As far as the program is concerned, a game that ends in the AI having one Pokemon remaining with 0 PP on all its moves and the Pokemon is at 1 HP and burned is just as good as a 6-0.

    End-game calculations are interesting. If you have Toxic against a Pokemon with Rest, you need to get the Rest Pokemon to use Rest more often somehow, ideally at 100% HP. On the one hand, end-game positions are good for the AI because there are fewer things to calculate, so it can look farther ahead. However, it's also better for humans, because we can make use of our own mental simplifications to see farther than that, at some cost in accuracy. It's easy for a human to see that if they can get rid of all the Rest PP and still have 1 Toxic PP left, then as long as neither Pokemon is an offensive threat to the other, they win. It's a bit harder for a computer to see that, because it requires looking ahead dozens of turns.

    I'll probably have to create my own heuristics for the end-game that accept a minor loss of certainty in exchange for greater look-ahead. In other words, have it try to find some condition that must be true for either side to get an advantage. In this case, basically have it notice that the only way Toxic is useful is if Rest were gone, and then have it work toward that goal.



    I had a question sent to me in PM about how I construct the decision tree to actually do a minimax calculation. The basic structure is this:

    Structure of my expectiminimax code

    I currently have 3 "tree" functions. Each function represents a point where I need to branch. For instance, tree3 is where the end of the turn branches, handling stuff like Shed Skin's chance to activate and Yawn's variable-length sleep. This is just to help me make sure I "undo" each move / end-of-turn effect, because each time I call the function it copies all of the important data, so I can back it up.



    One thing I'm thinking of that could give me huge savings is a way to identify transpositions. In chess openings, a transposition is a set of moves that arrives at the same position as a different set of moves. For instance, it doesn't matter if I use Dragon Claw and get hit by Surf, then use Earthquake and get hit by Thunderbolt, compared to using Earthquake as I'm hit by Surf and then using Dragon Claw as I'm hit by Thunderbolt, or any other permutation, at least not in most cases. So if I could find some way to identify all of those different branches to the game tree and collapse them down to one, that would give me big advantages.
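A toy illustration of why those permutations collapse (Python just for illustration; the HP and damage numbers are made up): if positions are keyed by the resulting state rather than by the move sequence, both orders produce the same key.

```python
# Two move orders that deal the same total damage to each side reach the
# same position, so a table keyed on the state collapses both branches.

def apply_turn(state, my_damage_taken, foe_damage_taken):
    my_hp, foe_hp = state
    return (my_hp - my_damage_taken, foe_hp - foe_damage_taken)

start = (100, 100)
# Dragon Claw while taking Surf, then Earthquake while taking Thunderbolt...
path_a = apply_turn(apply_turn(start, 30, 40), 25, 35)
# ...versus the same two turns in the opposite order.
path_b = apply_turn(apply_turn(start, 25, 35), 30, 40)

table = {path_a: "subtree evaluated once"}
already_searched = path_b in table  # the second branch need not be searched
```

In a real battle state there is far more than HP to fold into the key (status, stat boosts, PP, field effects), but the principle is the same.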
  9. DayKeem

    DayKeem

    Joined:
    Jan 11, 2008
    Messages:
    3
    Obi, as someone with very little skill in the areas where you excel, I'd like to just say that it is fascinating to read through your posts as you work through problems related to this project. Sure, half the time I have no idea what you're saying, but Wikipedia can usually fix that in a few minutes, and the lengths you go to in order to nail down the exact responses you want is amazing.

    I look forward to continued updates and, hopefully sometime in the near future, a finished product. The idea is a very cool one and, if you weren't doing this, I don't know who else would be able to. Rock on.
  10. Acritter

    Acritter

    Joined:
    Dec 6, 2009
    Messages:
    454
    About implementing move accuracy, wouldn't a simple way to do that be multiplying the accuracy by the move's point value? It wouldn't get perfect results under some circumstances (using an OHKO move against a Pokemon you can't KO any other way would always be the best option, but might not be recognized as such), but it might help. Sorry if you've already thought this through.
  11. CRASHiC

    CRASHiC

    Joined:
    May 14, 2010
    Messages:
    287
    About "things not starting out at 0": are you going to allow the CPU to instantly know what your team is, while you are unaware of theirs? I think that to get the best experience, the CPU should register each Pokemon when you reveal it, and assign numbers based on its team synergy. When it sees a lead, it judges it based on its lead and its team and makes a decision. This would probably work best in a metagame that's already been worked to its maximum and has little experimentation. RBY or GSC would probably work best.
  12. -ixil-

    -ixil-

    Joined:
    Nov 20, 2007
    Messages:
    23
    Have you tried a transposition table?

    Quick summary for anyone who wants one:
    Create a hash lookup table, to be used for storing game states and evaluations of those states.
    Whenever the function completes a move, look up the current game state in the hash table.
    If it is there, apply that value to this state (in the case of a depth-first search, it might make sense not to bother expanding that node, and move on to the next move).
    If not, evaluate the game state and add the state and its evaluation to the table.
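The steps above, as a memoized depth-limited negamax sketch (Python just for illustration; `children` and `evaluate` are hypothetical stand-ins for move generation and the static evaluation function):

```python
# Depth-limited negamax with a transposition table keyed on
# (state, remaining depth). On a hit, the stored value is reused and the
# subtree is not expanded again.

def search(state, depth, children, evaluate, table):
    key = (state, depth)
    if key in table:
        return table[key]
    moves = children(state)
    if depth == 0 or not moves:
        value = evaluate(state)
    else:
        # Negamax convention: a child's value is negated for the parent.
        value = max(-search(c, depth - 1, children, evaluate, table)
                    for c in moves)
    table[key] = value
    return value

# Tiny linear game for illustration: states 0 -> 1 -> 2, leaf value = state.
table = {}
result = search(0, 5,
                children=lambda s: [s + 1] if s < 2 else [],
                evaluate=lambda s: s,
                table=table)
```

Combined with the break-on-guaranteed-win optimization mentioned earlier in the thread, a table hit means a whole identical subtree is skipped.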


    This should make sense if the % HP Salamence lost to Surf is less than the % HP Suicune lost to Dragon Claw.

    Do the different runs with roost in a different place give the same result if the "break when a winning result is found" is removed?

    Does this mean you are going to implement iterative deepening at some point?

    Finally, very impressive work so far.
  13. ungulateman

    ungulateman

    Joined:
    Apr 11, 2009
    Messages:
    1,121
    >_> I really wish I could use Shoddy right about now. It's hard to believe that there are amateur programmers out here who can program AI which is significantly smarter than the normal game's "smart" AI.
  14. Wichu

    Wichu ACUPRESSURE
    is a Pokemon Researcher

    Joined:
    May 30, 2007
    Messages:
    1,540
    The problem with implementing this in a 'normal game' is that the DS likely doesn't have enough power to run it fast enough. If you've ever played one of the Yu-Gi-Oh! games on DS, you'll know how annoying the smart AI can be (it takes about 20-30 seconds for it to take its turn, even if it ends up not doing much). Obi's AI can take several seconds on a PC (depending on depth); on a DS, it would take a lot longer.

    Oh, and Obi, you said it can be adapted for any simulator. What about adapting it for a different metagame, e.g. CAP? I assume it would simply need the differences (stats, new moves etc) added; are you planning on making this simple to do?
  15. david stone

    david stone Fast-moving, smart, sexy and alarming.
    is a Site Staff Alumnus, a Smogon IRC AOp Alumnus, a Programmer Alumnus, a Super Moderator Alumnus, a Researcher Alumnus, a Contributor Alumnus, and a Battle Server Moderator Alumnus

    Joined:
    Aug 3, 2005
    Messages:
    5,150
    Part of my program is to try and reverse calculate various things about the foe.

    I'm taking a break from adding new features right now to optimize that very thing, actually. I ran a quick test and assumed that the slowest point in my program was the damage calculator, because it sits in the deepest inner loop of my reverse damage calculator. I changed my damage calculator by splitting it into two functions: one before the random number, and one after. That way, I could cache all of the data that doesn't change, and then just calculate the random part.

    This sped up my program nearly 100% (the reverse damage calculator part, not the AI part). Running gprof to profile my program shows the random damage calculator taking about 70% of my program's time, not including any functions inside it (when you include that, it's up to about 76%). I'm looking at other ways I can speed this up. A little more caching just gave me a reduction in time of about 15%, with the random portion of the damage formula dropping to 62% of my time.

    Adding up all of my optimizations, I was able to take this part of my reverse damage calculator from about 15-20 seconds of run time down to about 2. The two biggest optimizations were pre-calculating as much of the damage formula as possible and moving it to a shallower loop, and using a binary search tree for the random part, rather than a naive linear search. I'm considering looking into whether I can use a mathematical approach, rather than what amounts to sophisticated brute force. However, this is obviously only helpful if it can find the result faster. Right now, the random section takes at most 5 runs of the full damage calculator.

    The standard binary search algorithm has me select 92 as the first test value, which gives an average case of 3.38 runs, and a worst case of 5 runs only when r == 100. Selecting 93 as the first test value gives an average case of 3.36 runs, but the worst case of 5 runs falls on r == 92 (twice as likely as 100), so that's thrown out. Selecting 91 as the first test value gives an average case of 3.26, however, with a worst case of 5 runs on r == 95 (three times as likely as 100). I'm considering trading a worse worst case for better average times. I'll profile this to see what the results are. I suspect it's a winner, because I'm looking at a wide enough data set that my average results should win out, except toward the end of my calculations where few data points remain, but then my times will be small all around anyway.
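For what it's worth, weighting each r equally reproduces the 3.38 average and the r == 100 worst case for a first pivot of 92 (Python sketch; the 3.36 and 3.26 figures in the post presumably weight each r by how often it produces a distinct damage value, which isn't modeled here):

```python
# Count how many damage-calculator runs a binary search needs to pin down
# the random damage multiplier r (85..100), given a chosen first pivot.

LO, HI = 85, 100

def runs(target, first_pivot):
    lo, hi, pivot, count = LO, HI, first_pivot, 0
    while True:
        count += 1
        if pivot == target:
            return count
        if pivot < target:
            lo = pivot + 1
        else:
            hi = pivot - 1
        pivot = (lo + hi) // 2  # standard midpoint after the first probe

# With first pivot 92 and all 16 values of r equally likely:
average_92 = sum(runs(r, 92) for r in range(LO, HI + 1)) / 16  # 3.375
worst_92 = max(range(LO, HI + 1), key=lambda r: runs(r, 92))   # r == 100
```

With 93 as the first pivot, the 5-run worst case lands on r == 92 instead, matching the post.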

    A sample test of 39 runs gave an average run time of 2.253 seconds for r = 92 for the first test, and an average time of 2.192 for r = 91. This is about 97% of the run time, which is the savings I would expect to see based on the theoretical speed up, so the optimization seems sound. I didn't perform any statistical analysis on it beyond getting the mean. The data range for r == 92 is 2.14 through 2.60 with a median of 2.23. The data range for r == 91 is 2.08 through 2.52 with a median of 2.18. All times were generated with gprof.

    All that being said, I recently had an idea for how to use all of this data. It is currently beyond my ability to parse all of that, and my program is 100% predictable at the moment, neither of which is a good thing for expert level playing. My idea is to look at my list of all possible things the opponent can have, and at each calculation, assume that one of the possible sets is the real one. I'll probably assume the same random set for each calculation within one turn, to make comparisons valid.

    Currently, my plans for improving the program are as follows:

    I am currently at stage 1: dumb expectiminimax. Expectiminimax takes the traditional minimax algorithm (the best move is the one in which your opponent's best response is the worst) and adds in elements of chance by assuming you'll get the average score out of all options. The only smart optimization I really have is that I stop searching down a branch once it finds a guaranteed win or loss, so nothing too special.

    Stage 2: Expectiminimax with Alpha-Beta pruning. Alpha-beta pruning is a powerful optimization that stops searching a branch when it's found to be worse than a previously searched branch. To use an example, let's say I have a depth = 1 search. I evaluate Fire Punch and see that it gives me an expected score of 12 after looking at all of my opponent's responses. Then I start evaluating Belly Drum. Their first move is Thunderbolt, which kills my Charizard after it uses Belly Drum, giving me an expected score of -33. I don't have to bother inspecting any other of my opponent's responses, because I already know that Belly Drum is worse than Fire Punch, according to this algorithm.

    This is increasingly powerful as the depth increases, because after I fully evaluate the first option of the first turn, if I ever find a response to my second move that's better, I don't have to look at the rest. In a depth = 8 search, for instance, then let's say I finish evaluating Dragon Dance and see that my expected score is 10. If I then evaluate Dragon Claw all the way out 8 turns ahead, and on the very first response I evaluate at that 8th turn out my opponent has a move which will lead to the score being -10, then I don't have to look at anything else in the Dragon Claw turn 1 branch, which means that I've reduced my total time by nearly 1/4, just in that one optimization.
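The Fire Punch / Belly Drum cutoff can be shown on a tiny explicit tree (Python sketch; the 12 and -33 scores are from the example above, while the 15 and 50 leaves are invented just to give each move a second opposing response):

```python
# Alpha-beta minimax over an explicit tree: leaves are scores, inner
# nodes are lists of children. `visited` records which leaves were
# actually evaluated, so the pruned leaf is visible.

def alphabeta(node, alpha, beta, maximizing, visited):
    if isinstance(node, (int, float)):      # leaf: static score
        visited.append(node)
        return node
    value = float('-inf') if maximizing else float('inf')
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if alpha >= beta:                   # cutoff: stop this branch
            break
    return value

# Fire Punch's responses score 12 and 15; Belly Drum's first response
# (Thunderbolt) scores -33, so its remaining response is never examined.
tree = [[12, 15], [-33, 50]]
visited = []
best = alphabeta(tree, float('-inf'), float('inf'), True, visited)
```

Note this sketch is plain alpha-beta; folding in the chance nodes of expectiminimax takes more care, since an averaged branch can only be cut off with bounds on the possible scores.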

    Stage 3: I'm considering MTD(f) here, which is a refinement of alpha-beta pruning. It relies on me being able to sort moves properly most of the time, and then it just proves that the first move is the best. If it's wrong, it basically reverts to normal alpha-beta at that point, but if it's right, it's much faster. This relies on both a transposition table (to beat out algorithms such as Negascout) as well as iterative deepening to re-order the search nodes. It's very similar to Negascout, however, and Negascout seems easier to implement (more chess engines use Negascout than MTD(f), from what I've read).

    I haven't actually tested this, but based on my understanding of the program, the answer is no. The cause is that if the AI can win in 5 turns but you let it look 6 turns ahead, then even if it evaluates a node that essentially wastes a turn, it can still follow the winning sequence of 5 moves afterward and win. If it sees this position first (due to the order the moves are listed), it will list this as the winning sequence.

    Yes.

    Right now, there is very little metagame-specific info, so it would be easy to change at this point. I may later add optimizations specifically for Gen 4 OU, but this shouldn't alter the core of the code, letting me extend it to other generations and tiers at some point. The ideal solution, of course, would be to give it only a source of all legal information and have it go from there, letting me optimize for OU (giving it all the usage stats) but still have it be able to play anything.
  16. david stone

    david stone Fast-moving, smart, sexy and alarming.
    is a Site Staff Alumnusis a Smogon IRC AOp Alumnusis a Programmer Alumnusis a Super Moderator Alumnusis a Researcher Alumnusis a Contributor Alumnusis a Battle Server Moderator Alumnus

    Joined:
    Aug 3, 2005
    Messages:
    5,150
    I only have three parts of the program to write before it can go live: a team predictor, an interface with a simulator, and an awareness of time, so it knows to what depth to search positions.

    My program is now significantly smarter since my last post. I have alpha-beta pruning added in. Because I was programming with it in mind the entire time, when I actually thought about it, it ended up being a single line of code (for a great speed-up). My next big optimization will be to improve move ordering using some sort of sorting heuristic. For that to be truly effective, it implies iterative deepening. Iterative deepening will also help with the time issue, because it means the program can just keep searching at essentially infinite depth, if I wanted, and return the best result so far after a certain amount of time.

    As a recap, this is the importance of iterative deepening:

    There are two simple search methods: depth-first and breadth-first (there are also things like best-first, but that's a little more complicated).

    Breadth-first looks at using each move one turn ahead, and then two turns, and then three turns, etc., and it has the advantage of always finding the shortest solution. In other words, if there is a path to victory in 5 turns, breadth-first will find that. However, the cost of this is that you have to store in memory your results of each level, which means you need a lot of memory.

    Depth-first uses significantly less memory, but has a couple of drawbacks. It searches down one path entirely, until it comes to the end. If you don't limit the depth to which you search, however, it could get caught in an infinite path (for instance, both people switching back and forth with no entry hazards would lead to a depth-first algorithm never finding a result). It could also return a win that takes 9 turns to complete when there is one that takes 5 turns.

    What iterative deepening does is first perform a depth-first search with a max depth of 1 and return the result. Then it does it again, but with a max depth of 2. In other words, it acts like a breadth-first search (so it finds the fastest result and doesn't get caught in infinite loops), but it doesn't need to store everything in memory. What I can do with this is return the score of each move at a depth of 1. Then I can search these moves in order of score, because searching the best move first leads to faster results, and the score at a lower depth is a good estimate of the score at a higher depth. I can then use the scores at depth = 2 to order the depth = 3 search, and so on. In other words, I use the information gained at a lower depth to speed up my searches at a greater depth, which is an overall time saver.
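As a sketch of that move-reordering idea (Python just for illustration; `score_at_depth` is a hypothetical depth-limited search, here replaced by a fixed table for demonstration):

```python
# Iterative deepening: search to depth 1, 2, 3, ... and sort the moves by
# the scores from each pass, so the next, deeper pass tries the probable
# best move first (which is what makes alpha-beta cutoffs cheap).

def iterative_deepening(moves, score_at_depth, max_depth):
    ordering = list(moves)
    best_scores = {}
    for depth in range(1, max_depth + 1):
        best_scores = {m: score_at_depth(m, depth) for m in ordering}
        ordering.sort(key=lambda m: best_scores[m], reverse=True)
    return ordering[0], best_scores[ordering[0]]

# Hypothetical depth-independent scores for two moves:
scores = {"Dragon Claw": 5, "Dragon Dance": 10}
best_move, best_score = iterative_deepening(
    ["Dragon Claw", "Dragon Dance"],
    lambda move, depth: scores[move],
    max_depth=3)
```

In a real engine, `score_at_depth` would itself be the alpha-beta search, and the time-limit behavior falls out naturally: stop the loop whenever the clock runs out and return the best completed pass.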

    My program understands most moves, has full support for chance-based side-effects (like Thunderbolt's paralysis happening 10% of the time), and knows how to switch. Thanks to optimizations I've made, when depth <= 6, my run time is about the same (on my current test-case) as I previously reported, which is pretty awesome when you consider how much stuff I've added, which increases the branching factor. When the depth goes much more than 6, however, the program really slows down as exponential growth in time starts to hurt me. Smart move-reordering will be a huge time saver here. I also still need to add a transposition table (so that I don't need to re-search identical positions that I arrive at by different means).

    I plan on writing an interface for Poke Lab and Pokemon Online. I also plan on releasing all of the source under the AGPLv3 license (the same as Shoddy Battle / Poke Lab). In other words, everyone is free to modify, review, and distribute the source code and the program, as long as they do so under the same terms. Anyone who modifies the program and provides access to it over a server (for instance, by modifying the AI and running it on their own server) also has to make their changes publicly available under the same conditions.

    The current problem I'm working on is a team predictor algorithm. These are my thoughts:

    I plan on using the usage statistics to estimate what my opponent has. My original thought was to use the teammate statistics only, because I figured that I will always know at least one of my opponent's Pokemon (the lead), and the teammate statistics of that one Pokemon will be more accurate than the overall usage statistics. However, I got stuck when trying to combine multiple teammate statistics. Now my plan is to see how much the teammate statistics change the overall statistics and multiply all of those changes through ("the multiplier method"). In other words, Machamp makes up 12.98% of all Pokemon usage. If I see a Celebi, Machamp is used 16.9% of the time, or 1.3× as much. If I then see their next Pokemon is Gengar, Machamp is only used with it 11.95% of the time, or 0.92× as much. Using the multiplier method, this gives Machamp an estimated usage of 15.54%. If I were to simply average Gengar's and Celebi's teammate statistics, that would give Machamp an estimated usage of 14.43% (so the two are not the same). I'm not sure the multiplier method is the most accurate way to estimate these probabilities, but it's the most accurate way I've thought of so far, and I'd be happy to get any ideas. If a Pokemon is not on the teammate statistics list, I'll just assume the multiplier is 1.
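    The multiplier method with the Machamp numbers from the post looks like this as a sketch (the exact result comes out around 15.56% rather than 15.54%, since the post rounds the multipliers to 1.3 and 0.92 before multiplying):

```python
def multiplier_estimate(base_usage, teammate_usages):
    """Scale a Pokemon's overall usage by the ratio each revealed
    teammate's pairing stat bears to that overall usage."""
    estimate = base_usage
    for usage_with_teammate in teammate_usages:
        estimate *= usage_with_teammate / base_usage
    return estimate

# Machamp: 12.98% overall, 16.9% alongside Celebi, 11.95% alongside Gengar.
machamp = multiplier_estimate(12.98, [16.9, 11.95])  # ~15.56
```

    Note that the base usage cancels out of all but one factor, so the estimate is just the product of the pairing stats divided by the base usage raised to one less than the number of teammates seen.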

    One other thing I'm thinking of doing is using the lead statistics to reduce the probability of certain Pokemon being on the team. In other words, if a Pokemon appears as a lead in 50% of its usages, and a different Pokemon is leading, I can reduce the odds of that Pokemon being on my opponent's team by 50%. Again, any suggestions would be helpful.
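    That discount is essentially a one-liner; a sketch with made-up numbers:

```python
def discount_for_not_leading(team_probability, lead_fraction):
    """If a Pokemon leads in lead_fraction of its appearances and
    someone else is leading this battle, scale its probability down."""
    return team_probability * (1.0 - lead_fraction)

# A Pokemon that leads 50% of the time, with a 10% base probability,
# drops to a 5% chance of being in the back.
discounted = discount_for_not_leading(0.10, 0.5)
```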

    I'm also considering how to use the detailed Pokemon statistics to determine what moves the foe has. The main thing to do there is try to find a way to reduce the probability of Heatran having, say, Flamethrower, once I see it has Fire Blast.

    I mostly need ideas / proof of correctness for this method of estimating the probability of the foe having a particular Pokemon, and ideas for estimating the foe's moves, given this information.
  17. Umbreon Dan

    Umbreon Dan 〉λ=
    is a CAP Contributor Alumnus

    Joined:
    Oct 19, 2008
    Messages:
    3,199
    if a pokemon is not on the teammate list, doesn't that mean that it is a rare partner? i would think that you could assume the multiplier is some arbitrary number like .9 to reflect that.

    something else came to mind when i read this. usually if i'm battling and i see a pokemon like forretress, i go ahead and assume that my opponent doesn't have a similar pokemon like tentacruel. how can you make the ai pick up on that? i guess you could have it reduce the chance of seeing a pokemon with similar or identical typing, which would be useful for something like vaporeon/suicune/milotic. you don't want to reduce the chance of seeing, say, a swampert on a team with vaporeon, though, so maybe it would make more sense to go by sets of weaknesses rather than typing. instead of "it is unlikely that my opponent's team will have three water-types," it would be "it is unlikely that my opponent's team will have three weaknesses to electric."

    i'm still stumped on the forretress/tentacruel thing. we know that these two are almost never found on the same team, but how can you make the ai realize that? i guess you could go by move usage, either for all moves or for a list of moves that includes toxic spikes. shrug.

    couldn't you simply subtract the lead usage counts from the total usage counts? instead of referring to a list of pokemon by usage, it would be a list of "usage as non-leads". obviously stuff like azelf and aerodactyl will be far lower on this list.
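    that subtraction is straightforward; a sketch with hypothetical count dictionaries:

```python
def non_lead_usage(total_uses, lead_uses):
    """usage counts excluding the games where the pokemon led."""
    return {p: total_uses[p] - lead_uses.get(p, 0) for p in total_uses}

# azelf leads in most of its games, so it falls far down the non-lead list
non_lead = non_lead_usage({"Azelf": 1000, "Machamp": 800},
                          {"Azelf": 900})
```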

    you might be able to grab sets directly from the smogon site. for example, as soon as you saw that heatran has fire blast, you could check the heatran analysis and assume that heatran is running one of the sets with that move. because there is no set that has both fire blast and flamethrower, the bot would then assume that the heatran doesn't have flamethrower.

    this would be even better if you could combine it with items... for example, if the heatran that used fire blast also outsped your starmie, then you could make the ai search the analysis for sets that have both fire blast and a choice scarf. in this case, it would be narrowed down to one set, and the ai could then assume that the heatran is timid, and has earth power, explosion, and hidden power/dragon pulse.

    this is a cool project; i'm really looking forward to seeing the results.
  18. Wichu

    Wichu ACUPRESSURE
    is a Pokemon Researcher

    Joined:
    May 30, 2007
    Messages:
    1,540
    Isn't this what the teammate statistics are for? They should tell you that Forretress is rarely teamed with Tentacruel.

    I'm pretty sure that's what obi is suggesting already; I don't think there's really any other way to do it.

    If you collect stats at least as detailed as PO does at the moment, you could get a table similar to the teammates table which tells you what each Pokémon's moves are commonly paired with. The problem here is that it's really specific, so the table would be pretty huge. I would use a similar strategy to this, but rather than pulling sets from Smogon, I'd use the usage stats. That way, you also get the probability the opponent is using a certain set, and you can also guess recent sets that Smogon may not have updated with yet, as well as obscure sets that are rarely used and not on the analysis. For example, if you see that the opponent's lead Heatran smacks you with a Life Orb Earth Power, you might assume it uses the Life Orb set (Fire Blast/Earth Power/Hidden Power/Taunt); however, PO's stats will tell you it's actually most likely using Earth Power/Overheat/Hidden Power/Stealth Rock, meaning your counter, which might just be able to survive Fire Blast, would eat an Overheat or let Heatran set up SR.

    Anyway, I'm glad to see that the AI is going well :) I'm looking forward to incorporating it in my game engine (if I decide to use PokéLab or PO's battle system code, it shouldn't be too hard, right?).
  20. Umbreon Dan

    Umbreon Dan 〉λ=
    is a CAP Contributor Alumnus

    Joined:
    Oct 19, 2008
    Messages:
    3,199
    as far as i know, the teammate stats only show the top 20 pokemon used as a teammate. below that, there's no data. this is what i mean.
  21. Wichu

    Wichu ACUPRESSURE
    is a Pokemon Researcher

    Joined:
    May 30, 2007
    Messages:
    1,540
    Oh. Would it not be possible to gather teammate statistics beyond the top 20, perhaps in a separate file? Let's see what Doug has to say on this...
  22. david stone

    david stone Fast-moving, smart, sexy and alarming.
    is a Site Staff Alumnusis a Smogon IRC AOp Alumnusis a Programmer Alumnusis a Super Moderator Alumnusis a Researcher Alumnusis a Contributor Alumnusis a Battle Server Moderator Alumnus

    Joined:
    Aug 3, 2005
    Messages:
    5,150
    I've been thinking about ways to improve the team mate stats, and I've come up with a few ideas. You've also given me a few.

    What I was considering is looking at all the usages that the stats do give me (top 20 used with it, top 10 by change up, top 10 by change down, so 20 to 40 Pokemon), and then considering every other Pokemon as equally likely to appear. In other words, if 70% of all team mates can be accounted for by the listed Pokemon, then weight the remaining Pokemon as equally sharing the other 30%. I have to consider carefully how to weight that exactly, though.
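    One way to spread the leftover probability mass evenly over the unlisted Pokemon, as a sketch (the data shapes here are my own invention):

```python
def fill_unlisted(listed, all_pokemon):
    """listed maps Pokemon -> teammate probability and covers, say, 70%
    of the mass; spread the remaining 30% evenly over everything else."""
    remaining = 1.0 - sum(listed.values())
    unlisted = [p for p in all_pokemon if p not in listed]
    share = remaining / len(unlisted) if unlisted else 0.0
    return {p: listed.get(p, share) for p in all_pokemon}
```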

    I had that problem for a little while, because Tentacruel doesn't appear on the usage stats of Forretress, even though Forretress appears on the usage stats of Tentacruel. After a lot of trying to fix a bug that didn't exist while implementing this, I realized that the 'bug' (the multiplier for Forretress being on a team with Tentacruel coming out the same as the multiplier for Tentacruel being on a team with Forretress) was actually not a bug. If Tentacruel makes Forretress about 40% as likely, Forretress makes Tentacruel about 40% as likely, too. It took me a while to realize this, but because I'm using a multiplier method, it works out in both directions. This allowed me to take the team mate stats, which normally give only somewhere between 20 and 40 data points per Pokemon for 100 Pokemon (a total of at most 4000 data points), and end up with 5168 data points (unique data points, so it doesn't have Abomasnow paired with Walrein on there twice like the stats on the site do). If I wanted, I could 'compress' this to half that by not storing Walrein with Abomasnow, but it's easier to parse this way. Now if it sees either Tentacruel or Forretress, the other one is 40% as likely as it was before it saw that Pokemon.
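    The symmetry falls out of the math: the multiplier is P(A|B)/P(A) = P(A and B)/(P(A)P(B)), which doesn't change when A and B swap. A quick check with made-up probabilities (the real stats would supply these numbers):

```python
# Hypothetical: P(Forretress) = 0.10, P(Tentacruel) = 0.05,
# P(both on the same team) = 0.002.
p_forry, p_tenta, p_both = 0.10, 0.05, 0.002

mult_forry_given_tenta = (p_both / p_tenta) / p_forry  # P(F|T) / P(F)
mult_tenta_given_forry = (p_both / p_forry) / p_tenta  # P(T|F) / P(T)
# Both come out to 0.4: seeing either one makes the other 40% as likely.
```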

    Unfortunately, that does not work, because the team mate stats don't discriminate between lead teammates and non-lead teammates. Fortunately, the real solution is almost as easy: after seeing the first Pokemon, I simply apply a non-lead multiplier to my table of probabilities. In other words, if Azelf appears as the lead 40% of the time, I just multiply the probability of Azelf appearing by 0.6 (which, because everything I do is multiplication, essentially propagates through all calculations).

    That seems like it would be a good idea. I guess I'd have to learn how to do this first, though.

    For the most part, I don't think it really matters too much what battle system code you use. Right now the only simulator-specific code I have is a function that parses Poke Lab teams. However, I've separated that from the rest of my code, so I just need to duplicate that functionality for Pokemon Online teams, which is basically just a matter of looking at the team format and sitting down for a few minutes to write it. I plan on doing the same thing for the battle interaction: have the program output "use move slot 0, 1, 2, or 3" or "switch to slot 0, 1, 2, 3, 4, or 5" (or something along those lines), and then have a function for each supported simulator that converts that to what it needs. That should make things easy to port (I hope, anyway; I haven't looked into this at all, but letting it log on to a simulator is the next step after team prediction, which is cruising along).
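    A sketch of that split, with a generic action type and one hypothetical per-simulator adapter (the command strings here are invented, not the real Pokemon Online protocol):

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str  # "move" or "switch"
    slot: int  # which move slot or team slot

def to_pokemon_online(action: Action) -> str:
    """Hypothetical adapter turning a generic Action into a
    simulator-specific command string."""
    if action.kind == "move":
        return f"use move {action.slot}"
    return f"switch to {action.slot}"
```

    The AI core only ever produces Action values; supporting a new simulator then means writing one team parser and one adapter like this, with no changes to the search or prediction code.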

    Just to give an idea of how accurate this is: I was testing it out in #dreamworld over the past two days, and I had two algorithms: cautious and aggressive. The aggressive algorithm seemed to have about 50% accuracy in guessing the team (it would generally get half of the unknown team members correct). However, on a good portion of teams it would only predict one member correctly, or sometimes none. The cautious algorithm almost always gets exactly one correct. However, I made a few updates to it last night that appear to have brought the two algorithms closer together, so hopefully each one will now have more of the strengths of the other. Here's sample output (bold == correct predictions):

    However, the order in which the Pokemon are seen is important. If it sees Hippowdon then Spiritomb, Aggressive gets all but Celebi correct (it predicts Rotom-H), and Cautious gets all but Celebi correct (it predicts Forretress).

    One thing I've noticed about the Aggressive algorithm is that it always predicts a solid team. In other words, I've gone through the RMT forum, and where its predictions differ from the posted team, I generally prefer the AI's predicted team to the posted team.

    Here are a few example teams where the bot only knows the lead Pokemon (bold Pokemon are the differences between the lists):

    After my most recent update of including team mate stats that go both ways, Cautious and Aggressive have gotten a lot more similar than they were before.

    Here's another example team that the AI can predict perfectly, my BP team. (Ninjask, Scizor, Vaporeon, Mr. Mime, Smeargle, Lucario):

    And then any other Pokemon gives it the whole team correct with both algorithms. However, Scizor is the most common Pokemon on the team, and thus gives it the least information. The program seems to perform best when it's seen the rarest Pokemon. This is how it compares with any other member of the team (leaving off first step where I tell it Ninjask):

    For the Cautious algorithm, the earlier a Pokemon is listed, the more likely it is to be on the team. For the Aggressive algorithm, I believe the same is true. At the very least, earlier Pokemon are closer to 'reality', whereas later Pokemon are more 'predicted', because they're several degrees of statistics removed. The first predicted Pokemon for both algorithms will always be the same. Among revealed Pokemon, order doesn't matter, but which Pokemon on the team have been revealed is very important.

    One more team prediction, on IRC, to show the weakness of the algorithm:

    In other words, final team is Gliscor, Kingdra, Jirachi, Infernape, Gengar, Starmie
  23. Wichu

    Wichu ACUPRESSURE
    is a Pokemon Researcher

    Joined:
    May 30, 2007
    Messages:
    1,540
    One more thing: will it work (or be adaptable to work) in battles with more than one Pokémon (doubles, triples, rotation, multi)? I'd imagine it would take a lot longer to search to the same depth (as there are more actions that can be performed per turn by both players), but this is somewhat offset by the more aggressive playstyle in these metagames (stall isn't that good due to being able to choose your target and having more coverage between two Pokémon; in VGC at least, battles rarely last more than a few turns).
  24. eric the espeon

    eric the espeon maybe I just misunderstood
    is a Forum Moderator Alumnusis a CAP Contributor Alumnusis a Researcher Alumnusis a Tiering Contributor Alumnusis a Contributor Alumnus

    Joined:
    Aug 7, 2007
    Messages:
    3,698
    PO stats have exact moveset/item combinations to go with probabilities, and you can download the raw data from their site for non-current stats (since the stats are updated several times a month).
  25. david stone

    david stone Fast-moving, smart, sexy and alarming.
    is a Site Staff Alumnusis a Smogon IRC AOp Alumnusis a Programmer Alumnusis a Super Moderator Alumnusis a Researcher Alumnusis a Contributor Alumnusis a Battle Server Moderator Alumnus

    Joined:
    Aug 3, 2005
    Messages:
    5,150
    I'm trying to set up a Mercurial repository on my web site so that I can release this code and get some suggestions for improvement. However, it turns out this isn't as simple as I had hoped, thanks to my simple free hosting plan and lack of knowledge. If someone either has a suggestion for a free host that might make this easier than freehostia (I'm looking for one that I can use with my own domain name, not blahblahblah.bitbucket.com), or is willing to help me set up Mercurial, I would very much appreciate it.
