View Single Post
Mar 29th, 2010, 3:57:08 PM   #5
david stone
Fast-moving, smart, sexy and alarming.

Join Date: Aug 2005
Posts: 5,152

How alpha-beta pruning works is it assumes that the best move it's seen so far is the best move the player can use. It then tries to find out whether this is correct. If the first move is the best, all it has to do is verify that this is the case. If the first move is not the best, however, it has to find the actual value of the best move, which is significantly slower. Here is the reason why:

Let's say all moves are sorted in order of best move to worst move.

I use some move and look at the opponent's responses (a 2-ply search, ignoring "God"). I see that after each of their moves, the game state can be a value of any of the following: -5, 2, 3, 7, 10. I'm going to assume the opponent will use the best move, which means the score will end up at -5 if I use that move. Then let's look at my next move and the score of the foe's responses: -10, 7, 20, 50, 100. In there, I have to check all 5 moves using a naive search. However, because everything is ordered properly and I'm using alpha-beta pruning, I only have to search one node for that second move! The reason is that as soon as I see that it starts with -10, it doesn't matter what the other moves are. My opponent already has a move which gives him 5 more points (-10 compared to -5), which means that if my opponent plays perfectly, I'm better off using the first move than the second. Therefore, I don't have to search the other moves at all.

If I don't have perfect move order, I might have to search more than just the first move to find the opponent's "better" response. I say better because it doesn't have to find the best. All it has to do is show the foe has at least one response that is better than their best response to the best move I've already checked. This is why it's faster to verify something is better than to find the best. To find the best, it will have to search all responses and pick the one that is best. With alpha-beta pruning, it stops searching when it shows the opponent can make the game state better.

The evaluation function doesn't need to be perfect, however. Anything in the evaluation function that involves looking ahead and calculating future outcomes is completely out of the question. That will be handled by my expectiminimax tree. I plan on doing something similar to quiescence search in chess. Quiescence search is when the program notices that the state of the game is likely to change a lot in the next couple of moves, it searches deeper than it normally would until the game becomes more stable. This is related to another thing I plan on doing, which is similar to an end-game tablebase in chess. When the 'material' gets low enough, I'll head more toward brute force and deep searches to try and find the win.

I need to call the evaluation function for every single outcome of a turn. This means that it can be called several thousand times per turn. Obviously, the more accurate my function is, the better, but if I get something that selects the best move 5% of the time more often, but takes twice as long to run, that will usually give a net loss in playing power. This is because I'm spending so much more time evaluating each position that I can search interesting positions as deeply. Deeper searches tend to give stronger play more than anything else. This is why the evaluation function should not have stuff like "If this Pokemon has a greater than 50% chance to win in a 1v1 battle with each side using optimum moves, give a bonus of 60 points.".

And just to explain a little deeper, the reason a good evaluation function is related to move ordering is that I plan on traversing the tree using a process called iterative deepening. What that means is that first I look one "ply" (a move, basically) ahead. I evaluate all the positions, and order them from best to worst. Then I repeat the search with one more ply, and re-order based on the values at the end of the tree (at the leaf node). This might seem wasteful to have to continually re-evaluate positions, but due to the large branching factor in Pokemon, almost all of the time will be spent at the deepest parts of the tree, meaning that re-ordering the tree early and often will give massive speed ups much greater than the cost of re-evaluating positions.

Quote:
 Originally Posted by Fat cantab When thinking about speed, bear in mind you have nearly 5 minutes. In singles you have at most 9 options for your move (not including resignation), far fewer than the number in a 'typical' chess position. I don't think you'll have much trouble with calculation speed provided you use a fast language (ideally something compiled).
I'm writing this in C++.

There are really more than 9 options per turn. The best case is a move like Dragon Dance. It has one outcome. The worst case is a move like Fire Fang. It causes the tree to branch in an additional (16 values of r * 2 values of CH * 2 values of causing burn * 2 values for flinch =) 128 paths. Fortunately, with smart move ordering I can simply not evaluate many of those paths. For instance, if I can show that CH flinch burn Fire Fang is still not as good as the worst case of another move (let's say, Earthquake), then there is no need to evaluate any of those 127 other branches. In most cases it won't be on either extreme, but the branching factor is much more than 9 per move. The elements of luck is actually why I've been thinking that computers would play better stall Pokemon than offense, as long as they can properly evaluate entry hazards. Non-attacking moves tend to have only one outcome.

Even if we assume 9 options per move, a turn consists of both players moving, which means a turn has 81 outcomes. The average branching factor in a game of chess is about 20-30, which is smaller because of no simultaneous move execution. It's also simpler to evaluate attacks, because taking a piece is "Can I move to this square?", compared to Pokemon where you need to run a fairly long calculation.

Just to give you an idea of how much luck can cause the game to branch, if both sides are down to their last Pokemon and have four attacking moves each, with all of them having one side effect (Thunderbolt, Flamethrower, Ice Beam, Psychic, for instance), then each turn has 65536 possible outcomes. Of course, if they can switch the game only gets an additional 2585 outcomes, bringing the total to 68121, which is not quite the worst case (moves like Fire Fang make it worse), but it's much worse than average.
__________________
Previously obi.
Technical Machine, a Pokemon AI.
"Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat." - Sun Tzu