Data An OU Bot

Point taken. In that case, might I suggest that you let it "watch" some tournament games? I'm no programmer, so I'm not sure how exactly it learns, but feeding it high quality games and tournament replays is the best way to learn how top players react in specific situations.
I was kind of alluding to this when I compared it to an amiibo. Granted, this is a bit different, but basically I meant that it is able to battle and improve based on learning experiences from battling. (I didn't understand that that's what it does when I wrote it initially.) Also, there has been mention of having it battle high-caliber players to basically use as a role model.
 
Maybe instead of letting it loose on the ladder, let it first observe battles of high elo players to mine a bunch of data and make the bot learn from that?

Edit: about the JavaScript thing, you don't need to run the bot from a browser. Showdown allows you to directly communicate to the server through a websocket, which basically allows you to exchange strings with the server. The showdown client uses this, it basically parses the strings from the server and displays the battles, chat rooms, etc. I don't know the specifics but you can look at the source code of the client, or look at the code of boTTT (I believe it's open source).

Sorry to quote an older post, but do you know where the code lives? I'd love to check it out.
 
Posting here because I want to see this thing succeed. And also, some replays! I don't care how garbage they are, I'm sure most of us would like to see some. :)
 
Since I'm taking a Comp Sci course right now (though regretting it heavily), I'm intrigued by the idea of managing to program something this complex. I'd certainly be interested in seeing some replays, whether they're good or bad.
 
@ Simperheve: https://github.com/Zarel/Pokemon-Showdown-Client

The action values are essentially running means of results. After each game, they are updated according to the following rule (a small code sketch of this update follows):
if the team won:
new action value = old action value + alpha * (1 - old action value)
if the team lost:
new action value = old action value - alpha * old action value
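In code, that update is just a nudge toward 1 on a win and toward 0 on a loss (a sketch of the idea, not the actual script; alpha is the learning rate):

def update_action_value(old_value, won, alpha=0.1):
    # Move the value toward 1 after a win, toward 0 after a loss;
    # over many games this behaves like a decaying average of results.
    target = 1.0 if won else 0.0
    return old_value + alpha * (target - old_value)

# Example: a team at 0.50 that wins one game, then loses one
v = update_action_value(0.50, won=True)   # 0.55
v = update_action_value(v, won=False)     # 0.495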
You should weight this according to opponent's rating. Just recording win/loss has no benefits, considering you're always matched with someone of similar skill. No matter how good or bad your bot is, the win chance is about 50%. And obviously, beating a good player is different to beating a bad one.

Why not just give it equally good teams? Seems a bit pointless not doing so, and writing a "pick the best team" algorithm just seems like extra work to me. Teams tend to be matchup reliant anyway, so even a completely random choice would do fine, I reckon.

Prior probabilities come from Smogon's published stats. It is simple to get prior probabilities for items, abilities, and attacks (a quick sketch of turning the published counts into priors follows the spread list below).
Stat distributions are a little more complicated. Looking at the 1825 moveset file, we see the following list of spreads for Landorus-Therian, the most used Pokemon by the highest ranked players:
Spreads
Jolly:0/252/0/0/4/252 11.981%
Jolly:0/252/4/0/0/252 8.519%
Impish:252/4/252/0/0/0 5.306%
Jolly:0/252/24/0/0/232 5.005%
Impish:252/0/240/0/8/8 4.813%
Jolly:32/252/4/0/0/220 4.533%
Other 59.842%
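For the item/ability/attack priors, something like this works against the chaos JSON (a rough sketch, not the bot's code; it assumes the usual chaos layout with a top-level "data" dict of weighted counts per species, so double-check against the file you actually download):

import json

def load_priors(path, species):
    with open(path) as f:
        stats = json.load(f)["data"][species]
    priors = {}
    for field in ("Moves", "Items", "Abilities"):
        counts = stats.get(field, {})
        total = float(sum(counts.values())) or 1.0
        priors[field] = {k: v / total for k, v in counts.items()}
    return priors

# priors = load_priors("ou-1825.json", "Landorus-Therian")
# priors["Items"] would then map each item to its prior probability.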
I just want to add that once you land a hit on the enemy, you can work out their physical/special bulk much more accurately. Rearrange the damage formula and you roughly get this:
HP*Def = (Level*0.4 + 2)/50 * Atk * BP * [Modifiers] * Roll / Damage

Because physical bulk is the product of 2 stats (HP & Defense), you're better off estimating their "physical tankiness" (HP * Def) rather than HP and Defense separately. By doing this, your only uncertainty comes from the damage roll, which can be narrowed down further by hitting them again. You can also determine the maximum possible bulk they can achieve from EVs; if they exceed that, they're using a boosting item (e.g. AV, Eviolite). The same concept applies to receiving attacks - you can approximate their physical/special attack by getting hit once. A Choice Band shows up when they exceed their max attack, and Life Orb is obvious due to recoil. The only tricky ones are Expert Belt and type-boosting items, as the damage roll uncertainty makes it harder to pinpoint them.
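As a back-of-the-envelope version of that rearrangement (just a sketch of the idea, ignoring the small "+2" term in the formula), the observed damage as a fraction of the target's HP bounds their HP*Def product between the lowest and highest rolls:

def bulk_range(level, atk, base_power, modifiers, damage_fraction):
    # damage_fraction: damage dealt as a fraction of the target's max HP.
    core = (0.4 * level + 2) / 50.0 * atk * base_power * modifiers
    lo = core * 0.85 / damage_fraction   # lowest roll
    hi = core * 1.00 / damage_fraction   # highest roll
    return lo, hi

# e.g. a level 100 attacker with 300 Atk using a 90 BP move (no modifiers)
# that dealt 45% of the opponent's HP:
# print(bulk_range(100, 300, 90, 1.0, 0.45))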

I can intervene and forbid actions I think are almost always bad. (A rough sketch of what such a filter looks like follows the list.)
So far, I have forbidden:
-Using "Rain Dance" while it is already raining, if the opponent can't change the weather.
-Using Stealth Rock, if the opponent's side already has rocks and the opposing Pokemon that is out can't learn Defog or Rapid Spin.
-Continuing to use stat increasing moves after that stat already reached 6.
-Heal bell, when no one has a status condition.
-Defog or rapid spin if the opponent doesn't know a hazard move, and the bot's side doesn't currently have hazards. Rapid spin is allowed if the Pokemon is afflicted by leech seed.
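For clarity, such a filter is just a pass over the options before the net scores them - a minimal sketch (the names here are made up for illustration, not my actual code):

def filter_actions(actions, state):
    # Drop options the rules forbid; the neural net only scores what survives.
    allowed = []
    for a in actions:
        if a.move == "Rain Dance" and state.weather == "rain" \
                and not state.opponent_can_change_weather:
            continue
        if a.move == "Stealth Rock" and state.opponent_has_rocks \
                and not state.opponent_can_remove_hazards:
            continue
        allowed.append(a)
    return allowed or actions  # never leave the bot with nothing to pick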
IMO don't do that. It's rather clumsy and doesn't achieve much. Bots programmed like this tend to be weak and predictable. Finish the prediction algorithm first, and these "rules" will naturally fall into place. You should only resort to hardcoding it like this if the bot repeatedly makes the same mistake and there's no elegant way to fix it. Besides, pp stalling is a thing and you'd actually want your +6 Clefable to keep Calm Minding as your opponent waits for you to run out of Moonblasts.

3) Deciding on actions
To be succinct: an artificial neural network (NN) uses information on the game to compare the value of decisions and choose one.
I tried coding a simple neural network before, but it didn't work and I wasn't bothered to fix it (aka I suck!) but I'm going to say this anyway:

Don't treat ANNs like some sort of magic that will eventually get you some good AI. Somehow I don't believe competitive pokemon can be solved by just plugging values into a neural network. If you think I'm wrong then that's good news, because I really want to be proven wrong. Good luck with your project. =]
 
The bot has been playing under the name "SirSwitchalot". Because it switches WAY too often.
I figured opponents would either get a laugh, or predict switches and rightfully punish it for switching when it shouldn't (ie, practically every time it switches). I have since modified the script to explicitly punish switching (this excludes volt-turn/when a pokemon gets KOed, although that is also currently being punished).

The bot was trained for a week against itself before playing online, during which two bugs that went unnoticed hampered its ability to learn:
a) Megas were registered as being incapable of both dealing and receiving damage. This noise makes it difficult for the net to see logical connections between information and outcome - if a pokemon that is immune sometimes takes damage anyway, why not try using Earthquake on their Gliscor?
b) I made a mistake with the "restriction list", which caused the databases to incorrectly record the action the net actually chose. Sometimes it thought it had used a different action from the one it actually used: of course, if taking action B sometimes gets recorded as action Y, how are you supposed to learn which one to actually take? Specifically, the error caused high-numbered actions (4-9 == switches, 0-3 == attacks) to be recorded as low-numbered ones. This might help explain why it likes to switch so much.

Something else going on: notice that action values are like estimates of expected reward, and the bot picks with a bias towards the highest. When training vs itself, half the games in the dataset were wins, making the best action's expected reward for an even game state close to 0 (win == +1, loss == -1), while slightly worse actions won't be far behind. If the bot suddenly gets thrown into an environment where its win rate drops to 5%, what happens? Well, the expected rewards for the actions it actually picked would on average tend to drop, making other actions start to take the top rank and become most probable. A big problem! If it keeps losing for a long time, a lot of weird actions could surface as the best. This reminds me of learned helplessness. A bot that loses all the time can become depressed. :(
However, I initialized team values above realistic expected win rates to encourage exploration: the bot tries everything a few times as the values drop toward their true levels. You could imagine someone with high expectations and an optimistic bias: they'll keep looking, imagining that something better is out there (until they've searched everything). Doing the opposite could leave it stuck in a cruddy situation, never looking elsewhere because it pessimistically imagines that this is the best there is.
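Concretely, the optimistic-initialization trick looks something like this (an illustration, not the bot's code; the team names and the 0.9 starting value are made up, chosen to sit well above any realistic win rate):

teams = ["rain offense", "bulky balance", "hyper offense"]  # hypothetical names
team_values = {name: 0.9 for name in teams}  # realistic win rates sit near 0.5

def pick_team():
    # Greedy pick: untried teams keep their inflated value, so each gets used
    # until losses pull it down below the alternatives.
    return max(team_values, key=team_values.get)

def record_result(name, won, alpha=0.1):
    target = 1.0 if won else 0.0
    team_values[name] += alpha * (target - team_values[name])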

I have taken the bot offline, and it is training again locally against itself. Good news: the medium-sized neural net overtook the small one in performance. Smaller networks learn faster, but have less potential to capture complex relationships. I dropped the small network when the medium-sized network's win rate against it reached around 70%. When you see the replays, you'll learn that this means it is also extremely terrible, but that is a step forward. I am currently training a medium and a rather large network.
I am still considering the idea of assembling a dataset that will feed a lot more information.

Searching for "SirSwitchalot", these replays have been saved:
http://replay.pokemonshowdown.com/oususpecttest-210270381
http://replay.pokemonshowdown.com/oususpecttest-210529551
http://replay.pokemonshowdown.com/oususpecttest-210533832
http://replay.pokemonshowdown.com/oususpecttest-210540578
http://replay.pokemonshowdown.com/oususpecttest-210540578 - asks if SirSwitchalot is a troll. :(
http://replay.pokemonshowdown.com/oususpecttest-210651705
http://replay.pokemonshowdown.com/oususpecttest-210677132
http://replay.pokemonshowdown.com/oususpecttest-210793458 - SirSwitchalot wins, via manaphy sweep
http://replay.pokemonshowdown.com/oususpecttest-210796423 "lol switching too much isn't good dude learnt the hard way too" -opponent
http://replay.pokemonshowdown.com/oususpecttest-210805414
http://replay.pokemonshowdown.com/oususpecttest-210808009
http://replay.pokemonshowdown.com/oususpecttest-210810642

QxC4eva, thank you for your criticism. It is much appreciated.
You should weight this according to opponent's rating. Just recording win/loss has no benefits, considering you're always matched with someone of similar skill. No matter how good or bad your bot is, the win chance is about 50%. And obviously, beating a good player is different to beating a bad one.
I could multiply alpha by opponent's GXE/own GXE given a win, and own GXE/opponent's GXE given a loss.
However, as all opponents it is currently playing are relatively similar in skill, how necessary is this?
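If I do add it, it would be roughly a one-liner (how I read the suggestion; untested):

def effective_alpha(alpha, own_gxe, opp_gxe, won):
    # Beating a stronger opponent updates the value more; losing to a weaker
    # opponent hurts more. Equal ratings leave alpha unchanged.
    return alpha * (opp_gxe / own_gxe if won else own_gxe / opp_gxe)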

Why not just give it equally good teams? Seems a bit pointless not doing so, and writing a "pick the best team" algorithm just seems like extra work to me. Teams tend to be matchup reliant anyway, so even a completely random choice would do fine, I reckon.
I do not know which team is the best. How can we find that out, without such a record keeping algorithm?
I picked a bunch of teams that either a) were used successfully by other players (and posted on the Smogon RMT board) or b) supposedly had good coverage according to a team building program. "b)" teams are both particularly unknown, and ones I'm particularly curious about.
If, hypothetically, all the teams were actually exactly equally good, their action values would end up roughly identical, and so would the odds of picking each of them -> effectively a completely random choice.
It took far less time to implement this highly empirical method than any theoretical method of team evaluation and construction would.

I just want to add that once you land a hit on the enemy, you can work out their physical/special bulk much more accurately. Rearrange the damage formula and you roughly get this:
HP*Def = (Level*0.4 + 2)/50 * Atk * BP * [Modifiers] * Roll / Damage

Because physical bulk is the product of 2 stats (HP & Defense), you're better off estimating their "physical tankiness" (HP * Def) rather than HP and Defense separately. By doing this, your only uncertainty comes from the damage roll, which can be narrowed down further by hitting them again. You can also determine the maximum possible bulk they can achieve from EVs; if they exceed that, they're using a boosting item (e.g. AV, Eviolite). The same concept applies to receiving attacks - you can approximate their physical/special attack by getting hit once. A Choice Band shows up when they exceed their max attack, and Life Orb is obvious due to recoil. The only tricky ones are Expert Belt and type-boosting items, as the damage roll uncertainty makes it harder to pinpoint them.
The bot predicts the % HP damage every attack would deal to every % HP + (Special) Defense + item + ability combination the opponent is likely to have, in order to decide on attacks. This information is saved, and then simply recalled every time it does an attack to save on the number of calculations the bot actually has to do.
When an attack lands, it simply looks at the table of possible stat combinations, items, and abilities, and sees those that match. For those off by only a little, it uses kernel density estimates to look at how much pokemon with those spreads are likely to vary - sometimes stats may shift by a small handful. This is used as a likelihood.
The bot does use, like you suggested, a combined HP * (special) defense kernel density estimate: no need to muck around with two when one suffices.
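The likelihood step can be as simple as a one-dimensional KDE over the HP*(Sp.)Def products implied by the published spreads, evaluated at the bulk the landed hit implies (a sketch of the idea using scipy, not the exact code):

from scipy.stats import gaussian_kde

def bulk_likelihood(observed_bulk, candidate_bulks):
    # candidate_bulks: HP*Def products computed from the published spreads.
    # The KDE smooths over small EV shifts, so near-misses still get weight.
    kde = gaussian_kde(candidate_bulks)
    return kde.evaluate([observed_bulk])[0]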

IMO don't do that. It's rather clumsy and doesn't achieve much. Bots programmed like this tend to be weak and predictable. Finish the prediction algorithm first, and these "rules" will naturally fall into place. You should only resort to hardcoding it like this if the bot repeatedly makes the same mistake and there's no elegant way to fix it. Besides, pp stalling is a thing and you'd actually want your +6 Clefable to keep Calm Minding as your opponent waits for you to run out of Moonblasts.
First of all, these hard codings are intended to be minimally invasive: the net almost always makes the decisions, but occasionally a few options that are almost always bad are removed. All the restrictions I imposed are also lifted as soon as one of its moves runs out of PP. Of course, that first move to run out could be Moonblast.
I do want the bot to know that there is a hard difference between being at +5/+5 and +6/+6.
-I realize now that I shouldn't implement this as a hard restriction, but instead implement it by no longer telling the bot that the move will increase special attack and special defense once it reaches +6/+6. Because it won't. Duh. *facepalm*
Sometimes it is hard to think of the simple/obvious things! Thanks.

I tried coding a simple neural network before, but it didn't work and I wasn't bothered to fix it (aka I suck!) but I'm going to say this anyway:

Don't treat ANNs like some sort of magic that will eventually get you some good AI. Somehow I don't believe competitive pokemon can be solved by just plugging values into a neural network. If you think I'm wrong then that's good news, because I really want to be proven wrong. Good luck with your project. =]
They certainly aren't magic, and my results thus far have been abysmal compared to the "50%" you suggested. They have however been successful in other games, like backgammon. I certainly believe it is possible, but not necessarily with the current version of my code.
It is late. More updates another day.
 

Thanks for posting these replays man, this is great. Do you know why it switches so much? Is it trying to predict the opponent's switch?
 
Has the bot definitely improved in your opinion? I at least know it's managed to learn the basics so far, but has it stopped switching quite as much, or started predicting moves at least sort of accurately? It will take a LONG time for this to get close to good, but I'm extremely interested in its progress.

Also, does it learn from only its own moves and how they affect the opponent, or does it look at the opponent's team to copy patterns? If the latter is the case, you might want to have it challenge some high-level opponents once in a while, or it might start Toxicing a Manaphy in Rain like your opponent did in that one replay.
 
At least your bot can beat low ladder players, watching that toxic/protect/bd/aqua jet Azu was painful.

Does your bot have common moves for each poke inputted, so it knows, for instance, that an opposing Keldeo will probably have Secret Sword and water STAB, and so will most likely beat Excadrill? Having something to identify the likely victor of a matchup might make its switches better. You probably have something like this already, but stuff such as bringing Latios into Sylveon makes me wonder if it knows Sylveon normally wins that matchup.

Its statusing ability is questionable. Taunting a +2 Heracross that will outspeed and KO next turn might not be as bad as its switching problem, but it made me wonder: does it try to burn fire types, and try to paralyze slow pokes like Ferro? Most of these teams don't seem to use status in the replays I've watched, but its use of status moves (esp. healing and inflicting status) needs improvement.

At least SirSwitchalot can win some battles, it's improving for sure.
 
I am working on a major revamp, to try and address a lot of the glaring issues. Once the revamp is done, I will reply to all the posts numbered 47+.
The bot will only train against itself for at least the duration of the month. It won't play online again until sometime in the first week of March at the earliest.

EDIT:
That should also give me enough time to add some documentation to my code, before posting it. It is written entirely in Python.
EDIT 2/21/15:
Bot version 1.2 is currently training.
 
I'm not sure how your bot calculates the best move, but when watching the replays, I saw it make many illogical moves, such as sending in Landorus-T on Azumarill, double switching your switch-in that you sent in right after a Pokemon faints, etc.
 

I'm not sure how your bot calculates the best move, but when watching the replays, I saw it make many illogical moves, such as sending in Landorus-T on Azumarill, double switching your switch-in that you sent in right after a Pokemon faints, etc.
Double switching in that scenario can be useful for luring something out or predicting them to switch out. For example, double switching out of Keldeo to T-Tar expecting a Lati to come in.
 
Hi everyone!
A few updates:
-With the release of February stats, a new and improved version will launch online tomorrow. I am expecting it to still be terrible.
-I am posting all the code. It is poorly documented at the moment, but I added some notes. Ask anything about the code, and I should respond within 24 hours.
-I want help on this. The core of the AI is based on temporal difference learning. Here is a chapter on its success with backgammon. Note, the earliest version - which was decent - had a training set of 300,000 games! Later versions incorporated methods that look ahead into future turns (like David Stone's Technical Machine), and had training sets of 1,500,000 games! So far, I barely have 700.
-I am at a crossroads. I am an unideal project manager for this: I did this as a trial of reinforcement learning concepts, not for the pokemon. I have since moved on to another textbook ("Probabilistic Reasoning in Intelligent Systems", if anyone is interested). I am also not an experienced programmer, and am at the moment uninterested in learning javascript, or that side of things.
-The method I have been using is far too slow at generating games. It would take 8 years to get to 300,000 games. By then, people will be playing an entirely different generation of pokemon. This means that if this is going to be relevant, it CANNOT keep running a browser to click buttons. It cannot simulate turns featuring attack animations.
-I do not want to do this. Statistics and machine learning are my interests, not app development or writing a whole new pokemon battle simulator for this to run on. There are many proper programmers who have loads of experience and can do the things I am uninterested in far more easily and quickly. I am hoping some of you are interested in Javascript, and want to see this succeed!
-If so, I apologize that my code probably looks like absolute rubbish.
-I am happy to continue running battles, as well as dedicating my processing power to training the network.
-I am happy to explain any of the theoretical stuff. I am happy to edit or improve anything related to the learning algorithms. As I learn more, if any of it seems particularly relevant, I may be eager to make changes that I think will improve learning performance. Eg, the current network architecture is almost certainly far from ideal.
Will you add extra tier and generation capabilities? I can see why you'd begin with ORAS OU, it's the flagship tier, but it'd be cool once it became competent at that to have it play OU every gen, and every main tier in gen 6. (and then maybe some other tiers in other gens, and some OMs way off into the future) :)
Mega evolutions make things a lot harder!
As long as there is data, gens 4 and 5 probably wouldn't be difficult.
Tiers other than ubers should also be simple. Ubers and anything goes, on the other hand, would take a lot of work to get going.

Ah clever, it's Python based. I'm not that clued up on Python's syntax, but if you need help with Selenium based stuff I should be able to help.
I need to move away from the entire current interface system.
This includes any sort of browser interaction.
But, for now, I occasionally get "Cannot send request" errors. I do not know why they would be caused by my own code, but they typically show up in the first few turns (1-4), or not at all. This is too specific for it to be unrelated to what my code does.

post
dat
source code
I've actually done a little bit of machine learning stuff professionally, I'd love to help with this.
Awesome! Code is coming up.
If you've done it professionally, odds are you could do things better yourself. Any thoughts, suggestions?

Point taken. In that case, might I suggest that you let it "watch" some tournament games? I'm no programmer, so I'm not sure how exactly it learns, but feeding it high quality games and tournament replays is the best way to learn how top players react in specific situations.
Some modifications that I haven't made yet are going to need to take place, but yes, that is possible.
If someone who has access to Showdown's logs/records is willing to take this bot and train it, or generate datasets and ship them to me, I would make this a priority:
The bot is going to need thousands upon thousands of games. Who actually played the games doesn't matter, but the better the players, the better.

The only thing lost:
The TD-gammon bot developed and improved upon the strategies pros were using at the time; pros have since adjusted their opening plays!
The potential for such creativity would be diminished. However, the alternative I see at the moment is the bot not having anywhere close to the number of games for it to get good enough to ever be worth imitating by anyone.
TD-gammon used 300,000+ games.

This huge number is why simply selecting tournament games is not feasible: we need far more games than tournaments alone can provide.

How cool would it be if the bot became number one on every ladder? It's possible!
I think such a thing is definitely achievable for a similar bot, given:
a) the success of AI in games like chess, backgammon, and poker
b) the fact the bot doesn't have to sleep, so it will be able to minimize its rating variance

Having the bot consider possible turn outcomes one or two turns ahead will likely lead to a HUGE almost immediate improvement. This will require a big change.
It will also likely need >100,000 games saved. It doesn't even have 1,000.
The bot doesn't have to actually have played this many. It just needs to be able to "watch" games and generate dataset files.
More changes, and an interest by people more involved with Showdown would be necessary.

about the JavaScript thing, you don't need to run the bot from a browser. Showdown allows you to directly communicate to the server through a websocket, which basically allows you to exchange strings with the server. The showdown client uses this, it basically parses the strings from the server and displays the battles, chat rooms, etc. I don't know the specifics but you can look at the source code of the client, or look at the code of boTTT (I believe it's open source).
I know writing a bunch of code might make me look like a programmer, but I never took a programming class or coded professionally, and hadn't even heard of "Python" until November of 2013. This is far from my strong suit!
I am interested in it because I find a lot of ideas within statistics and AI absolutely fascinating. There are a million things I don't know about these, so my focus is in trying to read about and gain a grasp of these, rather than learning about JavaScript, and what it would all take to get that to start.
Honestly, a big part of it is that I know absolutely nothing. Knowing nothing while trying to think about something is a REALLY uncomfortable feeling. I don't even know where to go to start getting a grasp of what is involved - are there some simple tutorials somewhere covering this sort of thing specifically?

Since I'm taking a Comp Sci course right now (though regretting it heavily), I'm intrigued by the idea of managing to program something this complex. I'd certainly be interested in seeing some replays, whether they're good or bad.
More replays once the caches for February's stats are generated.
The part that got me most was that, when starting, all I had was an idea of all the things I wanted to do.
I did not map out the most effective ways of doing them, and instead started by just picking a part of it, and then continuing to code until I had everything. Then modifying stuff when something broke.
The end result is that it is a giant mess, and looks nothing like the elegant code I see posted everywhere else.

I dug through PyBrain's code a lot to make changes, and can't even come close to touching that sort of design. It takes foresight and planning to create something beautiful.
Also, to create something stable. There are lots of bugs, and with lots of code repeated in random places with minor differences, this gets a LOT worse.

Thanks for posting these replays man, this is great. Do you know why it switches so much? Is it trying to predict the opponent's switch?
I tried to give some explanations earlier, but I honestly have no idea. Even when I punish switching heavily, the AI will sometimes switch so often over the course of a game that it would have earned more points by losing without switching than it could from even the most emphatic victory (minus the switch penalties).
Currently, the rewards are (spelled out in the sketch below):
-A switch is -0.075
-Every turn that passes is -0.01
-Winning gives at most +1, and losing a minimum of -1 (adjustments are made based on the number of pokemon KOed on both sides)
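As a single function (a sketch; the KO-based adjustments mentioned above are omitted):

def reward(won, turns, penalized_switches):
    # Win/loss dominates; each turn and each (non-exempt) switch chips away.
    r = 1.0 if won else -1.0
    r -= 0.01 * turns
    r -= 0.075 * penalized_switches
    return r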

Even an emphatic loss without switching would look better to the AI, in terms of reward, than some of the games it plays against itself where one of the copies wins. The other copy, of course, loses on top of racking up massive switch-caused penalties.

Given thousands of games, it should be unnecessary to punish switching at all. If switching under certain circumstances does not increase the probability of winning, it simply should not do it.
Currently, I am impatiently trying to speed up the process. In games where it plays itself, it is painful to see one AI get rewarded for a terrible attack choice because the other AI made a horrible switch choice. Ugh.

Has the bot definitely improved in your opinion? I at least know it's managed to learn the basics so far, but has it stopped switching quite as much, or started predicting moves at least sort of accurately? It will take a LONG time for this to get close to good, but I'm extremely interested in its progress.
When it is first created, it essentially makes attacks at random, so it improved a lot by the time of SirSwitchalot.
I am excited to see how our venerable knight performs in a few hours. :)

Also, does it learn from only its own moves and how they affect the opponent, or does it look at the opponent's team to copy patterns? If the latter is the case, you might want to have it challenge some high-level opponents once in a while, or it might start Toxicing a Manaphy in Rain like your opponent did in that one replay.
I'm not sure how your bot calculates the best move, but when watching the replays, I saw it make many illogical moves, such as sending in Landorus-T on Azumarill, double switching your switch-in that you sent in right after a Pokemon faints, etc.
yuruuu is quoted as well, since the following explanation of how the bot makes choices addresses both posts.

It does not copy the opponent's patterns.
It has an object that represents a bunch of information on the game. By "a bunch", I mean one thousand five hundred and forty-three values.
A neural network then takes 1552 inputs: those, plus one for each possible action it can take: 4 attacks and 5 choices of Pokemon to switch to.
For each action it is allowed to take (e.g., it is not allowed to switch if the opponent has a Gothitelle out), the neural network uses those inputs to estimate how many points it is going to get if it takes that action. KOing pokemon and winning gain it points; switching and losing lose it points.

It then picks the action that it thinks will get it the most points.

When the neural network learns, it looks back on all the games it has played, for each decision it made asks a few questions.
For example, if for decision 18 it took action 3:
-How many points did this turn get me?
-How many points could the best possible thing I could have done on turn 19 get me?

It then adds the answers to both of these questions together, and adjusts a bunch of weights so that the inputs of turn 18 combined with taking action 3 produce a number closer to that answer.
This way, it considers both the immediate reward and the change of situation: if the bot is suddenly in a great position on turn 19, that is extremely important to take into account. If the bot was dumb and threw the game with a dumb decision, that doesn't really matter: it considers the best thing (what it currently thinks it should have done) on turn 19, not what it actually did.
So, to answer a question you didn't ask:
If the bot makes bad decisions, it shouldn't learn to do them. Instead, it should learn they're bad. Once it has learned better, training on bad decisions should teach it "yeah, that is bad. Not many points."
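For anyone who wants it compressed into pseudo-code, the target it trains toward is essentially the familiar TD/Q-learning target (a sketch under my own naming, not the actual PyBrain code):

def td_target(reward_this_turn, next_state, legal_next_actions, net, gamma=1.0):
    # net(state, action) is the network's estimate of the points an action
    # will earn from that state; the target mixes the immediate reward with
    # the value of the best option available on the following turn.
    best_next = max(net(next_state, a) for a in legal_next_actions)
    return reward_this_turn + gamma * best_next

# Training then nudges net(state_t, action_taken) toward this target,
# e.g. by backpropagating the squared difference between the two.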

The opponent making bad decisions is actually worse. If the opponent does something really dumb, then maybe something else would actually have been much smarter to do. If the bot does something really dumb, AND the opponent does something really dumb, the bot can easily learn "what a great brilliant move I did".
The bot should be punished for stupidity, and rewarded for cleverness. And unfortunately, against bad opponents, it can be harder to tell which is which.

A good player is extremely unlikely to lose to a terrible player, so this isn't a long term problem. The AI should eventually learn that making the safe plays leads to a higher probability of winning.
But the thing about 1543 data points is, you can easily pick out correlations that don't exist.

We humans are masters of generalization. We look at something called "number of people who drowned by falling into a swimming pool correlates with number of films Nicolas Cage appeared in", and at most think "lol".
When we look at "amount of rain correlates with wetness of grass immediately afterwards", we think "duh".
The AI has NO mechanism of telling these apart. We learn nothing from either case, because we can use huge banks of knowledge and understanding of how things work, to ignore the first as an anomaly, and think the second is so obvious it goes without saying.
The only way the neural network learns is from relationships in the data. If two things seem equally strongly related, it cannot tell them apart. And with 1543 data inputs, there are LOADS of things that can be correlated purely by chance. There are also LOADS of possibilities for modifying, for making exceptions, for coming up with elaborate explanations of past experiences that are not actually true, and that will not accurately project points gained vs lost in future games it plays.

Some of you have suggested I let it look one or more turns ahead. I think this is a FANTASTIC idea that will greatly increase performance, because incorporating the clear understanding of cause and effect we already have - and which is written into the mechanics of the game - will make learning easier. It will shift the focus of learning onto the things we cannot pin down mathematically ourselves: how advantageous is paralysis in situation X? Stealth Rock in situation Y? +1 Attack and +1 Speed? Etc.
Possible outcomes of actions do not have to be learned, but it is trying to learn them now.
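One way to read the look-ahead idea, sketched in code (my own illustration and naming, not a design I've settled on): simulate each of our actions against a distribution over the opponent's likely responses, and let the network value the resulting states instead of the current one.

def one_ply_value(state, our_actions, opp_responses, simulate, value_net):
    # opp_responses: list of (opponent_action, probability) pairs.
    # simulate() applies the known game mechanics, so cause and effect no
    # longer has to be learned from scratch.
    best = float("-inf")
    for ours in our_actions:
        expected = 0.0
        for theirs, prob in opp_responses:
            next_state = simulate(state, ours, theirs)
            expected += prob * value_net(next_state)
        best = max(best, expected)
    return best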

At least your bot can beat low ladder players, watching that toxic/protect/bd/aqua jet Azu was painful.
I lost count of how many times the bot used belly drum, and then switched immediately afterwards.
While I was watching, it only belly drummed and swept once. That sweep, too, was cut short by a switch. :(

Why was switching so bad there? Because it had +6 at this one location that got lost?
Or because of this specific pattern of +1s and -1s it saw at this time?
We know it is the former. It can't tell, until it has seen loads of games with that sort of thing.

Does your bot have common moves for each poke inputted, so it knows, for instance, that an opposing Keldeo will probably have Secret Sword and water STAB, and so will most likely beat Excadrill? Having something to identify the likely victor of a matchup might make its switches better. You probably have something like this already, but stuff such as bringing Latios into Sylveon makes me wonder if it knows Sylveon normally wins that matchup.
The bot knows the top 6 most likely moves of each of the opponent's pokemon, as well as the amount of damage the opposing pokemon's moves would do to each of its own pokemon.
It also knows the checks and counters data of match ups between each of its own and each of its opponent's pokemon.

Its statusing ability is questionable. Taunting a +2 Heracross that will outspeed and KO next turn might not be as bad as its switching problem, but it made me wonder: does it try to burn fire types, and try to paralyze slow pokes like Ferro? Most of these teams don't seem to use status in the replays I've watched, but its use of status moves (esp. healing and inflicting status) needs improvement.
Opponents setting up stats is, in its experience, absolutely nothing to worry about: they almost always switch afterwards and lose the boosts. One of the problems of playing horrendous players (i.e., itself).
It hasn't learned that it could crush itself by dragon dancing and then NOT switching.

It knows that will-o-wisp fails on fire types, pokemon with magic bounce, and after I saw your comment, pokemon with guts. Oops, totally forgot to add that.
A problem with battling itself: using will-o-wisp on heatran is probably okay, because they'll probably switch it out and bring in garchomp.

Maybe I'm being hard on it. But it is frustrating to feel like Forrest Gump runs intellectual circles around your baby. =p

I will add more teams with status moves to the database!
It needs to actually have games with them to learn.

I am not sure about Ferrothorn and Thunder Wave. Paralyzing it doesn't explicitly do nothing, so I don't want to tell the bot that it does. But it should be able to (eventually) learn that it is far more valuable to paralyze sweepers.

It cannot, however, learn that paralyzing pokemon with Gyro Ball can backfire. I tell it how much damage Gyro Ball is likely to do, and that number will increase, but it gets no legitimate information that lets it pick out why that is.
This would, however, change if it simulated future turns: it could then see projected gyro ball damage increased. I'm increasingly liking the sound of that idea.

The question is if this is a testament to the bot's ability, or the lack of the low ladder's ability.

why not both
Most of the bot's wins have come against:
a) people with 1 pokemon teams. Duh.
b) Impatient people using pokemon ranked outside the top 160 in the usage statistics. Loads of people just forfeit rather than wait for the bot to generate caches at the bottom of the ladder. Or, you know, turn on the timer. =/

Double switching in that scenario can be useful for luring something out or predicting them to switch out. For example, double switching out of Keldeo to T-Tar expecting a Lati to come in.
Yeah. I think a feedback loop that will just take some time to get past is part of the cause:
-Playing against someone who switches a lot, trying to pull off a double switch is more likely to work (since they're switching)
-You're less likely to get punished by an attack
-You're unlikely to get punished by a stat up move, because a) they're poorly reinforced due to all the switching making it hard for the bot to realize they're actually valuable and b) even if the AI takes the free turn to stat up, they'll just switch afterwards and lose it.




I attached a bunch of scripts.
Before running any of them, they need:
Python 2.7.x.
sklearn
splinter
pybrain
statsmodels
numpy
scipy
pandas

To get it to run, you'll also need to download a few things from the most recent month: http://www.smogon.com/stats/
From the months page, save 'ou-1825.txt' and keep that name.
chaos: 1825, 1500, and 0 ratings. Name them as the names appear, eg 'ou-0.json', etcetera.
leads: ou-1825.txt as 'leads-ou-1825.txt'.
Create two new folders. One named datasets, and the other logs.

You'll also have to download the neural network (16 mb), and save it as "Medium_Net_Long.xml".


The scripts are:
Team_List3.py
These are the teams the AI chooses from. It will generate the team list file, as well as the action value files, and append to them as necessary in case you decide to add more teams.
If you do, some guidelines:
-It probably cannot learn to Protect on the turn a pokemon mega evolves. It is definitely incapable of delaying mega evolution to keep Sharpedo's Speed Boost going, so that is simply a pokemon it cannot use effectively.
-It cannot handle nick names, shiny pokemon, or genders.
-It cannot handle mega pokemon in the team text. Stick with the [non-mega pokemon] @ [mega stone] format, instead of [pokemon species]-Mega @ [mega stone] with Ability: [mega ability].
-It cannot handle you telling it what type of hidden power it has (although, that would be an easy fix). For now, just tell it the ivs, and edit the pokemon_class9.py file, roughly line 357, to make it correctly identify what hidden power it is using. This is already done for:
"Magnezone":fire,"Magneton":fire,"Mienshao":ice,"Raikou":ice, "Thundurus":ice,"Rotom-Wash":fire, "Keldeo":flying if the ivs are [31,31,31,31,30,30], else bug, "Serperior":fire, "Latios":fire, "Tornadus-Therian":ice, "Amoonguss":fire
As I was the only one using this, user friendliness never really crossed my mind. =/

fill_out_team_cache_dictionaries.py
This will create team caches. It'll create the folders on its own, because I don't feel like manually doing anything more than twice myself. You just have to delete or rename the old folder each time you download a new month's data, so that it doesn't see the old caches anymore.

interface_9.py
You need to go to lines 61 and 62 and replace the three usernames and passwords with real usernames and passwords that actually exist on Showdown. Usernames 1 and 2 cannot be the same. You must leave the \n at the end of the passwords, but this cannot be a part of the actual password ("\n" is a line break, like hitting enter. Basically, the bot immediately hits enter when it enters the password). "SirSwitchalot" is Username3 on my copy of the file.
You could comment a couple lines out of interface_9.py and delete a few hashtags and you can take actions for the neural network, and it will still create datasets for you. Look for line 3100 if you want to do this. To take actions, you still aren't allowed to click buttons on the browser: you'll have to enter numbers in the terminal. If you think you're likely to enter numbers wrong, you can make changes (or ask me to) and make it crash resistant.
If you want it to print out info you find interesting, let me know and I can do that too.

launcher.py
This launches games. The numbers in the file right now will have it launch a browser and play itself.
To have it play online, comment out all the lines in the function "main" involving MyPool and game.
Replace these with play_game(2, runs) following the 'print "Runs:", runs' line.

data_set_trainer_loop.py
This will train the network based on the datasets saved thus far. You'll have to tell it "1", as neither of the other two networks exists at the moment. One is retired, and the other is a hypothetical future network that will probably never exist.

RemoteException.py
Credit goes entirely to someone from StackOverflow. Debugging is a pain when errors from child processes don't show.

PyBrainRestriction_Real.py
Credit goes almost entirely to the folks at PyBrain. I just made a few minor edits to accommodate restriction lists, and added an option to submit actions to a network (rather than have it take them).
You still need the PyBrain module, but everything that was changed imports from this file.

team_reader4.py
This file handles your team and the opponent's team during the game, as well as the game object itself. The cache generator also uses this, and if you're running the cache generator while some caches already exist, you may want to go to lines 180-182 and temporarily delete the hashtags and save before starting it. You will have to add them back before any games can be played, so I'd add them back and re-save as soon as the cache generator has started.

pokemon_class9.py
Contains the pokemon objects and damage calculator functions. Ugly, ugly file. Truly.

I made a bunch of changes recently, so I may have to make bug-related edits.
If you try this and encounter any bugs, please let me know! :)
If I forgot to attach any files, let me know!
 


I need to move away from the entire current interface system.
This includes any sort of browser interaction.
But, for now, I occasionally get "Cannot send request" errors. I do not know why they would be caused by my own code, but they typically show up in the first few turns (1-4), or not at all. This is too specific for it to be unrelated to what my code does.
Not sure why specifically you'd be getting that error. It might be best to log the HTTP error code and some stacktrace information to help. However, one thing I did notice is that you have a lot of explicit sleeping going on, which doesn't sit well with Selenium. Where you instantiate your browser, you can use an implicit wait, which means Selenium will wait for the time specified or until it finds the specified element (whichever comes first). This may help your request problem, as it'll wait until elements are ready and available.

You can find more information about waits here: http://selenium-python.readthedocs.org/en/latest/waits.html
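For example, with the raw Selenium driver the implicit wait is a one-line setting (splinter wraps the same driver, so an equivalent option should exist there too):

from selenium import webdriver

driver = webdriver.Firefox()
driver.implicitly_wait(10)  # wait up to 10 seconds for elements to appear
# element lookups will now poll for the element instead of failing instantly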
 
1 - Concerning the stat-boost+switch problem, what if you set it up so that switching away with stat boosts gives a harsher penalty than just a normal switch? Like you said, it should learn this on its own, but a penalty like this might help it learn not to do this faster.
2 - With the Protect+Mega Evo problem, you could either set it up so that it is rewarded greatly for using Protect on the Mega Evolution turn (or perhaps have it look for the Mega Stone or the like), or just hardcode that in.
 
Currently, I am reading "Probabilistic Reasoning in Intelligent Systems" by Judea Pearl to study Bayesian networks.
There are a lot of theoretical advantages, and I expect it to outperform a neural network given realistic numbers of games-played.

Thanks Simperheve. Once I start simulating games again, I'll look for that.

If anyone knows javascript and python, and is willing to work on an interface with showdown's server that doesn't require opening a browser, please let me know.
 
Being an engineer myself I'm really interested in this project.

Do you think you could implement a feature like this? The bot goes on custom games vs a high elo player and after every turn, the human player numerically evaluates the bot's choice and suggests which were the best options.

I'll be glad to train your bot a little bit. If you're interested please PM me.

Best wishes.
 
Lol that thing is awesome. I think I fought against it while doin laddering achievements.

Just a suggestion: if that thing switches that much, why don't you try adding a stall team to the pool? I know it's a boring playstyle, but imagine people raging in the chat because the bot won using stall. Hilarious.

Great work and keep it up, I want to face it on the medium-high ladder :P
 
I don't agree. Stall requires coming up with a long-term strategy, and requires a deeper understanding of the game to figure out how to win. That seems much harder to code than an offensive playstyle.

Note that I'm not saying stall is harder to play than offense (at least at a high level). But offense is easier to pick up at a basic level because it's more intuitive.
 

I don't agree. Stall requires coming up with a long-term strategy, and requires a deeper understanding of the game to figure out how to win. That seems much harder to code than an offensive playstyle.

Note that I'm not saying stall is harder to play than offense (at least at a high level). But offense is easier to pick up at a basic level because it's more intuitive.
Eh, in some ways, and it depends on the generation. Stall might be more intuitive to a machine, since you can get reasonable results with a flowchart-type approach [oh, they've sent out such and such a threat, I had better switch into this and do this series of actions to prevent myself from losing]. Whilst it's perhaps not the same at the highest level, and at the lower level you have the issue that it can be destroyed by random crap on occasion, it might be easier for a computer to play.
 