Hi everyone!
A few updates:
-With the release of February stats, a new and improved version will launch online tomorrow. I am expecting it to still be terrible.
-I am posting all the code. It is poorly documented at the moment, but I added some notes. Ask anything about the code, and I should respond within 24 hours.
-I want help on this. The core of the AI is based on temporal difference learning.
Here is a chapter on its success with backgammon. Note that the earliest version - which was decent - had a training set of 300,000 games! Later versions incorporated methods that look ahead into future turns (like David Stone's Technical Machine), and had training sets of 1,500,000 games! So far, I barely have 700.
-I am at a crossroads. I am not the ideal project manager for this: I did this as a trial of reinforcement learning concepts, not for the pokemon. I have since moved on to another textbook ("Probabilistic Reasoning in Intelligent Systems", if anyone is interested). I am also not an experienced programmer, and am at the moment uninterested in learning JavaScript, or that side of things.
-The method I have been using is far too slow at generating games. It would take 8 years to get to 300,000 games. By then, people will be playing an entirely different generation of pokemon. This means that if this is going to be relevant, it CANNOT keep running a browser to click buttons, and it cannot simulate turns featuring attack animations.
-I do not want to do this. Statistics and machine learning are my interests, not app development or writing a whole new pokemon battle simulator for this to run on. There are many proper programmers with loads of experience who could do the things I am uninterested in far more easily and quickly. I am hoping some of you are interested in JavaScript, and want to see this succeed!
-If so, I apologize that my code probably looks like absolute rubbish.
-I am happy to continue running battles, as well as dedicating my processing power to training the network.
-I am happy to explain any of the theoretical stuff. I am happy to edit or improve anything related to the learning algorithms. As I learn more, if any of it seems particularly relevant, I may be eager to make changes that I think will improve learning performance. Eg, the current network architecture is almost certainly far from ideal.
Will you add extra tier and generation capabilities? I can see why you'd begin with ORAS OU - it's the flagship tier - but it'd be cool, once it became competent at that, to have it play OU in every gen, and every main tier in gen 6. (and then maybe some other tiers in other gens, and some OMs way off into the future) :)
Mega evolutions make things a lot harder!
As long as there is data, gens 4 and 5 probably wouldn't be difficult.
Tiers other than ubers should also be simple. Ubers and anything goes, on the other hand, would take a lot of work to get going.
Ah clever, it's Python based. I'm not that clued up on Python's syntax, but if you need help with Selenium-based stuff I should be able to help.
I need to move away from the entire current interface system.
This includes any sort of browser interaction.
But, for now, I occasionally get "Cannot send request" errors. I do not know why they would be caused by my own code, but they typically show up in the first few turns (1-4), or not at all. This is too specific for it to be unrelated to what my code does.
post dat source code
I've actually done a little bit of machine learning stuff professionally, I'd love to help with this.
Awesome! Code is coming up.
If you've done it professionally, odds are you could do things better yourself. Any thoughts, suggestions?
Point taken. In that case, might I suggest that you let it "watch" some tournament games? I'm no programmer, so I'm not sure how exactly it learns, but feeding it high quality games and tournament replays is the best way to learn how top players react in specific situations.
Some modifications that I haven't made yet are going to need to take place, but yes, that is possible.
If someone who has access to Showdown's logs/records is willing to take this bot and train it, or generate datasets and ship them to me, I would make this a priority:
The bot is going to need thousands upon thousands of games. Who actually played the games doesn't matter, but the better the players, the better.
The only thing lost:
The TD-gammon bot developed and improved upon the strategies pros were using at the time; pros have since adjusted their opening plays!
The potential for such creativity would be diminished. However, the alternative I see at the moment is the bot not having anywhere close to the number of games for it to get good enough to ever be worth imitating by anyone.
TD-gammon used 300,000+ games.
This huge number is why simply selecting tournament games is not feasible: we need far more games than that would ever provide.
How cool would it be if the bot became number one on every ladder? It's possible!
I think such a thing is definitely achievable for a similar bot:
a) success of AI in games like chess, backgammon, poker
b) the fact that the bot doesn't have to sleep, so it will be able to play enough games to minimize the variance in its scores
Having the bot consider possible turn outcomes one or two turns ahead will likely lead to a HUGE almost immediate improvement. This will require a big change.
It will also likely need >100,000 games saved. It doesn't even have 1,000.
The bot doesn't have to have actually played this many. It just needs to be able to "watch" games and generate dataset files.
More changes, and an interest by people more involved with Showdown would be necessary.
About the JavaScript thing: you don't need to run the bot from a browser. Showdown allows you to communicate directly with the server through a websocket, which basically lets you exchange strings with the server. The Showdown client uses this; it parses the strings from the server and displays the battles, chat rooms, etc. I don't know the specifics, but you can look at the source code of the client, or at the code of boTTT (I believe it's open source).
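For anyone more comfortable in Python than JavaScript, here is a minimal sketch of that websocket idea using the websocket-client package; the server address and the message format are assumptions based on how the official client behaves, so double-check them against the protocol before relying on this:

```python
# Minimal sketch of talking to a Showdown server over a raw websocket.
# Requires the websocket-client package; the URL below is an assumption,
# not something taken from the bot's code.
import websocket

ws = websocket.create_connection("ws://sim.smogon.com:8000/showdown/websocket")

# The server pushes plain text messages; just print the first few.
for _ in range(5):
    print(ws.recv())

# Outgoing messages are plain strings too, e.g. joining the lobby chat room
# (format assumed from watching the official client's traffic).
ws.send("|/join lobby")
ws.close()
```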
I know writing a bunch of code might make me look like a programmer, but I never took a programming class or coded professionally, and hadn't even heard of "Python" until November of 2013. This is far from my strong suit!
I am interested in it because I find a lot of ideas within statistics and AI absolutely fascinating. There are a million things I don't know about these, so my focus is on trying to read about and get a grasp of them, rather than on learning JavaScript and what it would take to get that side started.
Honestly, a big part of it is that I know absolutely nothing about that side. Knowing nothing while trying to think about something is a REALLY uncomfortable feeling. I don't even know where to go to start getting a grasp of what is involved - are there some simple tutorials somewhere covering this sort of thing specifically?
Since I'm taking a Comp Sci course right now (though regretting it heavily), I'm intrigued by the idea of managing to program something this complex. I'd certainly be interested in seeing some replays, whether they're good or bad.
More replays once the caches for February's stats are generated.
The part that got me most was that, when starting, all I had was an idea of all the things I wanted to do.
I did not map out the most effective ways of doing them, and instead started by just picking a part of it, and then continuing to code until I had everything. Then modifying stuff when something broke.
The end result is that it is a giant mess, and looks nothing like the elegant code I see posted everywhere else.
I dug through PyBrain's code a lot to make changes, and can't even come close to touching that sort of design. It takes foresight and planning to create something beautiful.
Also, to create something stable. There are lots of bugs, and with lots of code repeated in random places with minor differences, this gets a LOT worse.
Thanks for posting these replays man, this is great. Do you know why it switches so much? Is it trying to predict the opponent's switch?
I tried to give some explanations earlier, but I honestly have no idea. Even when I punish switching heavily, the AI will sometimes switch so often over the course of a game that it would have earned more points by losing without switching than it could with even the most emphatic victory once all the switch penalties are counted.
Currently, a switch is -0.075
Every turn that passes is -0.01
Winning at most gives +1, and losing a minimum of -1 (adjustments are made based on the number of pokemon KOed on both sides)
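In code, that scoring scheme amounts to something like the sketch below (the names are made up, and the KO-based adjustment is simplified to a flat per-KO term since the exact weighting isn't spelled out here):

```python
# Sketch of the reward scheme described above, not the bot's actual code.
def turn_reward(switched, game_over=False, won=False, kos_scored=0, kos_taken=0):
    reward = -0.01                    # every turn that passes costs a little
    if switched:
        reward -= 0.075               # switching is penalized
    if game_over:
        base = 1.0 if won else -1.0
        # adjust based on pokemon KOed on both sides (assumed flat weighting)
        base += 0.05 * (kos_scored - kos_taken)
        reward += max(-1.0, min(1.0, base))   # keep the end-of-game term in [-1, +1]
    return reward
```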
Even an emphatic loss without switching would look better to the AI, in terms of reward, than some of the games it plays where one of the copies wins. The other copy, of course, loses on top of the massive switch-caused penalties.
Given thousands of games, it should be unnecessary to punish switching at all. If switching under certain circumstances does not increase the probability of winning, it simply should not do it.
Currently, I am impatiently trying to speed up the process. In games where it plays itself, it is painful to see one AI get rewarded for a terrible attack choice because the other AI made a horrible switch choice. Ugh.
Has the bot definitely improved in your opinion? I at least know it's managed to learn the basics so far, but has it stopped switching quite as much, or started predicting moves at least sort of accurately? It will take a LONG time for this to get close to good, but I'm extremely interested in its progress.
When it is first created, it essentially makes attacks at random, so it improved a lot by the time of SirSwitchalot.
I am excited to see how our venerable knight performs in a few hours. :)
Also, does it learn from only its own moves and how they affect the opponent, or does it look at the opponent's team to copy patterns? If the latter is the case, you might want to have it challenge some high-level opponents once in a while, or it might start Toxicing a Manaphy in Rain like your opponent did in that one replay.
I'm not sure how your bot calculates the best move, but when watching the replays, I saw it make many illogical moves, such as sending in Landorus-T on Azumarill, double switching your switch-in that you sent in right after a Pokemon faints, etc.
yuruuu quoted, for the following explanation of how the bot makes choices.
It does not copy the opponent's patterns.
It has an object that represents a bunch of information about the game. By "a bunch", I mean one thousand five hundred and forty-three numbers.
A neural network then takes 1552 inputs: those 1543, plus one for each of the 9 possible actions it can take: 4 attacks and 5 pokemon to switch to.
For each action it is allowed to take (e.g., it is not allowed to switch if the opponent has a Gothitelle out), the neural network uses those inputs to estimate how many points it is going to get if it takes that action. KOing pokemon and winning gain it points; switching and losing lose it points.
It then picks the action that it thinks will get it the most points.
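As a rough sketch of that selection step (the names here are made up, and value_net stands in for the trained PyBrain network, so treat this as an illustration rather than the actual code):

```python
import numpy as np

# Sketch: 1543 state features plus a one-hot slot for each of the 9 actions
# (4 moves + 5 switches) go into the value network, and the allowed action
# with the highest predicted value is chosen.
def choose_action(state_features, allowed_actions, value_net):
    best_action, best_value = None, -np.inf
    for action in allowed_actions:                 # e.g. a subset of range(9)
        action_one_hot = np.zeros(9)
        action_one_hot[action] = 1.0
        inputs = np.concatenate([state_features, action_one_hot])  # 1552 values
        value = value_net(inputs)                  # predicted future points
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```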
When the neural network learns, it looks back on all the games it has played, and for each decision it made it asks a few questions.
For example, if on turn 18 it took action 3:
-How many points did this turn get me?
-How many points could the best possible thing I could have done on turn 19 get me?
It then adds the answers to both of these questions together, and adjusts a bunch of weights so that the inputs of turn 18 combined with taking action 3 produce a number closer to that sum.
This way, it thinks about both the immediate reward and the change of situation: if the bot is suddenly in a great position on turn 19, that is extremely important to take into account. If the bot was dumb and threw the game with a dumb decision, that doesn't really matter: it considers the best thing (what it currently thinks it should have done) on turn 19, not what it actually did.
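Spelled out as code, the target it adjusts toward looks roughly like this (same made-up names as the sketch above; how the weights actually get nudged is left to PyBrain's backprop, so only the target is shown):

```python
import numpy as np

def one_hot(index, size=9):
    v = np.zeros(size)
    v[index] = 1.0
    return v

# Sketch of the learning target described above, for one recorded decision:
# target = points received that turn
#        + best value achievable from the next turn's position.
def td_target(reward, next_state_features, next_allowed_actions, value_net):
    best_next = max(
        value_net(np.concatenate([next_state_features, one_hot(a)]))
        for a in next_allowed_actions
    )
    return reward + best_next

# The network's weights are then adjusted so that its output for
# (turn-18 inputs, action 3) moves closer to td_target(...).
```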
So, to answer a question you didn't ask:
If the bot makes bad decisions, it shouldn't learn to repeat them. Instead, it should learn they're bad. Once it has learned better, training on bad decisions should teach it "yeah, that is bad. Not many points."
The opponent making bad decisions is actually worse. If the opponent does something really dumb, then maybe something else would actually have been much smarter to do. If the bot does something really dumb AND the opponent does something really dumb, the bot can easily learn "what a brilliant move I made".
The bot should be punished for stupidity, and rewarded for cleverness. And unfortunately, against bad opponents, it can be harder to tell which is which.
A good player is extremely unlikely to lose to a terrible player, so this isn't a long term problem. The AI should eventually learn that making the safe plays leads to a higher probability of winning.
But the thing about 1543 data points is, you can easily pick out correlations that don't exist.
We humans are masters of generalization. We look at something called "Number of people who drowned by falling into a swimming pool correlates with number of films Nicolas Cage appeared in", and at most think "lol".
When we look at "amount of rain correlates with wetness of grass immediately afterwards", we think "duh".
The AI has NO mechanism of telling these apart. We learn nothing from either case, because we can use huge banks of knowledge and understanding of how things work, to ignore the first as an anomaly, and think the second is so obvious it goes without saying.
The only way the neural network learns is from relationships in the data. If two things seem equally strongly related, it cannot tell them apart. And with 1543 data inputs, there are LOADS of things that can be correlated purely by chance. There are also LOADS of possibilities for modifying, for making exceptions, for coming up with elaborate explanations of past experiences that are not actually true, and that will not accurately project points gained versus lost in future games it plays.
Some of you have suggested I let it look one or more turns ahead. I think this is a FANTASTIC idea that will greatly increase performance, because incorporating the clear understanding of cause and effect that we have - and that is written into the mechanics of the game - will make learning easier. It will shift the focus of learning onto the things we cannot pin down mathematically ourselves: how advantageous is paralysis in situation X? stealth rock in situation Y? +1 attack and +1 speed? etc.
Possible outcomes of actions would not have to be learned, whereas it is trying to learn them now.
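To make that lookahead idea concrete, here is a very rough one-turn-ahead sketch; simulate_turn and features are purely hypothetical stand-ins for a fast turn simulator and the feature extractor, neither of which exists in the current code:

```python
# Hypothetical sketch of one-ply lookahead: instead of asking the network to
# value an action directly, simulate the mechanical outcome of each action and
# value the resulting position, so the network only has to learn positions.
def lookahead_value(state, action, opponent_actions, value_net, simulate_turn, features):
    outcomes = []
    for opp_action in opponent_actions:
        # Game mechanics handled by simulation, not by learning.
        next_state = simulate_turn(state, action, opp_action)
        outcomes.append(value_net(features(next_state)))
    # Pessimistic choice: assume the opponent picks the reply worst for us.
    return min(outcomes)

# The bot would then pick the action maximizing lookahead_value(...),
# i.e. a one-turn minimax over the value network.
```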
At least your bot can beat low ladder players, watching that toxic/protect/bd/aqua jet Azu was painful.
I lost count of how many times the bot used belly drum, and then switched immediately afterwards.
While I was watching, it only belly drummed and swept once. That sweep, too, was cut short by a switch. :(
Why was switching so bad there? Because it had +6 at this one location that got lost?
Or because of this specific pattern of +1s and -1s it saw at this time?
We know it is the former. It can't know that until it has seen loads of games with that sort of thing.
Does your bot have common moves for each poke inputted, so it knows, for instance, that an opposing Keldeo will probably have Secret Sword and water STAB, and so will most likely beat Excadrill? Having something to know the likely victor of a matchup might make its switches better. You probably have something like this already, but stuff such as bringing Latios into Sylveon makes me wonder if it knows Sylveon normally wins that matchup.
The bot knows the top 6 most likely moves of each of the opponent's pokemon, as well as the amount of damage the opposing pokemon's moves would do to each of your own pokemon.
It also knows the checks and counters data of match ups between each of its own and each of its opponent's pokemon.
Its statusing ability is questionable; taunting a +2 Heracross that will outspeed and KO next turn might not be as bad as its switching problem, but it made me wonder: does it try to burn fire types, and try to paralyze slow pokes like Ferro? Most of these teams don't seem to use status in the replays I've watched, but its use of status moves (esp. healing and inflicting status) needs improvement.
The opponent setting up is absolutely nothing to worry about. In its experience, they almost always switch afterwards and lose the boosts. One of the problems of playing horrendous players (i.e., itself).
It hasn't learned that it could crush itself by dragon dancing and then NOT switching.
It knows that will-o-wisp fails on fire types, pokemon with magic bounce, and after I saw your comment, pokemon with guts. Oops, totally forgot to add that.
A problem with battling itself: using will-o-wisp on heatran is probably okay, because they'll probably switch it out and bring in garchomp.
Maybe I'm being hard on it. But it is frustrating to feel like Forrest Gump runs intellectual circles around your baby. =p
I will add more teams with status moves to the database!
It needs to actually have games with them to learn.
I am not sure about Ferrothorn and Thunder Wave. That isn't explicitly a wasted move, so I don't want to hard-code it as one. But it should be able to (eventually) learn that it is far more valuable to paralyze sweepers.
It cannot, however, learn that paralyzing pokemon with Gyro Ball can backfire. I express how much damage Gyro Ball is likely to do, and that number will increase, but there is no legitimate information it gets that would let it pick out why that is.
This would, however, change if it simulated future turns: it could then see projected gyro ball damage increased. I'm increasingly liking the sound of that idea.
The question is if this is a testament to the bot's ability, or the lack of the low ladder's ability.
why not both
Most of the bot's wins have come against:
a) people with 1 pokemon teams. Duh.
b) Impatient people using pokemon ranked outside of the top 160 in the usage statistics. Loads of people just forfeit rather than wait for it to generate caches at the bottom of the ladder. Or, you know, turn on the timer. =/
Double switching in that scenario can be useful for luring something out or predicting them to switch out. For example, double switching out of Keldeo to T-Tar expecting a Lati to come in.
Yeah. I think a feedback loop that will just take some time to get past is part of the cause:
-When playing against someone who switches a lot, trying to pull off a double switch is more likely to work (since they're switching)
-You're less likely to get punished by an attack
-You're unlikely to get punished by a stat up move, because a) they're poorly reinforced due to all the switching making it hard for the bot to realize they're actually valuable and b) even if the AI takes the free turn to stat up, they'll just switch afterwards and lose it.
I attached a bunch of scripts.
Before running any of them, they need:
Python 2.7.x.
sklearn
splinter
pybrain
statsmodels
numpy
scipy
pandas
To get it to run, you'll also need to download a few things from the most recent month:
http://www.smogon.com/stats/
From the months page, save 'ou-1825.txt' and keep that name.
chaos: the 1825, 1500, and 0 rating files. Name them as the names appear, e.g. 'ou-0.json', etcetera.
leads: ou-1825.txt, saved as 'leads-ou-1825.txt'.
Create two new folders. One named datasets, and the other logs.
You'll also have to download
the neural network (16 mb), and save it as "Medium_Net_Long.xml".
The scripts are:
Team_List3.py
These are the teams the AI chooses from. It will generate the team list file, as well as the action value files, and append to them as necessary in case you decide to add more teams.
If you do, some guidelines:
-It probably cannot learn to protect on the turn a pokemon mega evolves. It is definitely incapable of holding off on mega evolving in order to keep Sharpedo's Speed Boost, so that is simply a pokemon it cannot use effectively.
-It cannot handle nick names, shiny pokemon, or genders.
-It cannot handle mega pokemon. Stick with the [non-mega pokemon] @ [mega item] format, instead of [pokemon species]-Mega @ [mega item] with Ability: [mega ability].
-It cannot handle you telling it what type of hidden power it has (although, that would be an easy fix). For now, just tell it the ivs, and edit the pokemon_class9.py file, roughly line 357, to make it correctly identify what hidden power it is using. This is already done for:
"Magnezone":fire,"Magneton":fire,"Mienshao":ice,"Raikou":ice, "Thundurus":ice,"Rotom-Wash":fire, "Keldeo":flying if the ivs are [31,31,31,31,30,30], else bug, "Serperior":fire, "Latios":fire, "Tornadus-Therian":ice, "Amoonguss":fire
As I was the only one using this, user friendliness never really crossed my mind. =/
fill_out_team_cache_dictionaries.py
This will create the team caches. It'll create the folders on its own, because I don't feel like doing more than two of anything manually myself. You just have to delete or rename the old folder each time you download a new month's data so that it doesn't see the old caches anymore.
interface_9.py
You need to go to lines 61 and 62 and replace the three usernames and passwords with real usernames and passwords that actually exist on Showdown. Usernames 1 and 2 cannot be the same. You must leave the \n at the end of the passwords, but this cannot be a part of the actual password ("\n" is a line break, like hitting enter. Basically, the bot immediately hits enter when it enters the password). "SirSwitchalot" is Username3 on my copy of the file.
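To give an idea of what that edit looks like, here is a hypothetical reconstruction of those credential lines (the real variable names in interface_9.py may differ):

```python
# Hypothetical reconstruction of the credential lines around lines 61-62.
# Keep the trailing \n on each password, but remember it is not part of the
# actual password - the bot effectively just "presses enter" with it.
usernames = ["YourBotAccount1", "YourBotAccount2", "SirSwitchalot"]
passwords = ["password1\n", "password2\n", "password3\n"]
```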
You can comment a couple of lines out of interface_9.py and delete a few hashtags, and then you can take the actions for the neural network yourself while it still creates datasets for you. Look around line 3100 if you want to do this. To take actions, you still aren't allowed to click buttons in the browser: you'll have to enter numbers in the terminal. If you think you're likely to enter numbers wrong, you can make changes (or ask me to) to make it crash resistant.
If you want it to print out info you find interesting, let me know and I can do that too.
launcher.py
This launches games. The numbers in the file right now will have it launch a browser and play itself.
To have it play online, comment out all the lines in the function "main" involving MyPool and game.
Replace these with play_game(2, runs) following the 'print "Runs:", runs' line.
data_set_trainer_loop.py
This will train the network based on the datasets saved thus far. You'll have to tell it "1", as neither of the other two networks exists at the moment. One is retired, and the other is a hypothetical future network that will probably never exist.
RemoteException.py
Credit goes entirely to someone from StackOverflow. Debugging is a pain when errors from child processes don't show.
PyBrainRestriction_Real.py
Credit goes almost entirely to the folks at PyBrain. I just made a few minor edits to accommodate restriction lists, and added an option to submit actions to a network (rather than have it take them).
You still need the PyBrain module, but everything that was changed imports from this file.
team_reader4.py
This file handles your team and the opponent's team during the game, as well as the game object itself. The cache generator also uses this, and if you're running the cache generator while some caches already exist, you may want to go to lines 180-182 and temporarily delete the hashtags and save before starting it. You will have to add them back before any games can be played, so I'd add them back and re-save as soon as you've started the cache generator.
pokemon_class9.py
Contains the pokemon objects and damage calculator functions. Ugly, ugly file. Truly.
I made a bunch of changes recently, so I may have to make bug-related edits.
If you try this and encounter any bugs, please let me know! :)
If I forgot to attach any files, let me know!