Data An OU Bot

So what you're saying is, this bot LEARNS stuff? Maybe put it into the installed version of showdown as a team tester :O
From what the OP said, it needs a lot more work first xD, but when the bot can win decent-level games, it would be a great tester. Gimmick teams may be an exception, as they can often be hard for bots to use.
 

Kit Kasai

Love colored magic
Maybe instead of letting it loose on the ladder, let it first observe battles of high elo players to mine a bunch of data and make the bot learn from that?

Edit: about the JavaScript thing, you don't need to run the bot from a browser. Showdown allows you to directly communicate to the server through a websocket, which basically allows you to exchange strings with the server. The showdown client uses this, it basically parses the strings from the server and displays the battles, chat rooms, etc. I don't know the specifics but you can look at the source code of the client, or look at the code of boTTT (I believe it's open source).
 

blinkie

¯\_( ͡° ͜ʖ ͡°)_/¯ dank meme crew
Maybe also the bot could like do a calc of how much damage its most effective move would do to the target, and then the more damage it does the more likely it would "predict" a switch?
 
Maybe also the bot could like do a calc of how much damage its most effective move would do to the target, and then the more damage it does the more likely it would "predict" a switch?
You'd have to factor in ladder rank a little, low ladder players are less likely to predict than high ladder.

Simpler still, only put it in once you get to higher ladder.
 
I hate to be a party pooper, but if the bot gets used a lot, should the Showdown mods do anything about it?

In other words, would it be okay if 90% of showdown was bots battling bots?
 
I hate to be a party pooper, but if the bot gets used a lot, should the Showdown mods do anything about it?

In other words, would it be okay if 90% of showdown was bots battling bots?
Only about 10% of users can code a bot, don't worry. Probs less actually, if you exclude the bug fixers, site coders, etc. Heck, I'm learning coding at school and probably know a lot more about coding than many PS! users, and I certainly couldn't code a bot.
 
You don't need to be a good player to code a good bot. I am a terrible chess player, but I wrote a bot that looked at a good enough game tree to beat quite a lot of experienced chess players. The only requirement is coding skills.
 
i get the concern about botting the ladder, but as mentioned the moment bots git gud they'll be banned, no real concern there. the bigger benefit would be allowing offladder practice/testing, maybe even offline
 
What I think may improve the bot, without having to change the algorithm too much or too intelligently, is to change the way it uses teams. I think if it had a pool of Pokemon to build a team with, instead of a pool of teams to choose from, it would work better even with the same win+1/loss-1 decision making.
 
I noticed you said this
Anyone know anything about programming JavaScript? I know nothing, and my code has awkwardly been running a browser, clicking buttons, and reading text. It would be faster (and probably more stable) to instead communicate with the server directly. I would rather not run any more Firefox windows than necessary; I could not get any headless browsers to successfully run JavaScript.
What technology stack are you using to run the bot? Selenium?
 
Your source code may be awkward, but if you show it, people will be able to help you.

With how learning works, if the bot plays against bad players, it will learn how to beat bad players. That's not what we want. We want it to battle good players, or it'll never be good.
One problem will definitely be the number of matches required to build a large enough sample for the bot to do anything.

Which database is your bot using? (For computing possible moves, etc.) With a "local" database it can be faster (but it requires more resources on the client side).
Same with the damage calculator: how do you calculate?

One obvious thing the bot shouldn't do: don't try to status a Pokemon that is already statused.

PS: How do you plan to adapt to the player you're facing? I think a bit of dynamic programming is required, as if the player can deduce what the bot will do, it's over. How do you intend to implement risk?
 
Why is learning part of the bot's AI? Isn't there an "elegant solution" to pokemon battling that generally works from turn 1 and only learns items and moves as they are revealed?
 
This idea isn't too far-fetched from the Challenge Cup bot 'Muh Bans'. Even though that tier is something completely different skill-wise, I'd like to know how this beta bot plays out.
 
just to make the math easier, perhaps you could consider playing LC. The calcs will be easier and there will be less items and variety to deal with. IMO you should perfect the algorithm in LC before applying it to OU.
 
I'm working on thinning the herd of bugs. The bot rarely crashes now, but a lot of bugs screw with the data. For example, for a while opposing mega Pokemon were registered as being both impervious and incapable of dealing damage, while the bot didn't think its Mega Charizard X gained STAB from dragon moves, even though it registered the type change of every other mega.
This creates unnecessary noise, and may explain why it was willing to try using Earthquake against teams of six fliers: if Mega Tyranitar looks immune to Earthquake but dies anyway, why not Gliscor?

This is sick as fuck, but be sure you post it in Technical Projects as well.
Aye, I'll work on that.

You should check out the work Obi's done from way back when. Here are a couple links:

http://www.smogon.com/forums/threads/technical-machine-a-pokemon-bot.3455783/

http://doublewise.net/pokemon/
Thanks! Interesting. He is using an entirely different approach. His method has found success in chess, defeating even the best human players.
His method simulates all possible combinations of actions for the following three turns. With 9 choices for each player to consider (4 attacks + 5 switch choices), a single turn has 81 joint outcomes, so looking three turns ahead means comparing 81^3 = 531,441 different possibilities!
Of course, with the possibility of mega evolutions, different move sets and stat spreads, the number could be much higher. This is intense. Heuristics have to be used to weed out unlikely/bad possibilities. It takes a lot of brute force, and a means of actually saying which possibilities are better still needs to be put in place.
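As a quick sanity check on that 531,441 figure, counting both players' choices:

```python
# Each turn, both players simultaneously pick one of 9 options
# (4 moves + 5 switches), giving 9 * 9 = 81 joint outcomes per turn.
joint_outcomes_per_turn = 9 * 9
lookahead_turns = 3
print(joint_outcomes_per_turn ** lookahead_turns)  # 531441
```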
What is better, damage, burn, paralysis, increasing your own stats?
In the end, damage of course, but you can't look that far ahead.
Although, the same can be said about chess - what chess position is really better? The problems are obviously solvable: IBM beat the world champion in the '90s.

The method I am using achieved success in backgammon.
It is based on training a neural network (or, if there are a finite number of possibilities, like in blackjack, simply filling out values in a table) to predict how likely a certain action, given a Markov state, is to lead to victory.
Of course, like HNA suggested, there is the problem that the skill of your opponents comes into play. When I first started training the bot:
a) The value of tau was initialized at 2. This means that if the bot thought one move was guaranteed to win it the game (value = 1), and every other move was guaranteed to lose it the game (value = -1), it would have only about a 25% chance of picking the winning move: e^(1/2)/(e^(1/2)+8*e^(-1/2)) ≈ 0.254.
b) Of course, this was done for a reason: the bot has no idea how any move actually leads to rewards; its guesses were completely random at first, so why base a decision on them?
The problem with playing against someone who switches around like a moron is that you don't really bother to use attacks effective against the Pokemon that is actually out. The best action against a moron may differ from the best action against a competent person.
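For the curious, that ~25% figure is just softmax (Boltzmann) action selection with temperature tau = 2. A small illustrative sketch (the function name is mine, not the bot's actual code):

```python
import math

def softmax_probs(values, tau):
    """Boltzmann (softmax) selection: higher tau flattens the distribution."""
    exps = [math.exp(v / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# One action believed to be a sure win (+1), eight sure losses (-1), tau = 2:
probs = softmax_probs([1.0] + [-1.0] * 8, tau=2.0)
print(probs[0])  # ~0.254: the "winning" move is chosen only about a quarter of the time
```

Lowering tau over time makes the bot increasingly greedy about its best-valued action.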

The advantages are that, given a good sample (both big, and involving good players), it should learn how to compute optimal actions from the game state without having to grind through massive numbers of computations. The bot makes decisions extremely quickly - faster than people (I could slow it down by letting it consider less and less frequently used items and EV & nature spreads) - and this is independent of how much it learns. If it learns to play well, it won't make decisions any slower. This also appeals to me, because I don't want people to be frustrated playing against a bot that always takes 20 seconds to make decisions. That takes a lot of the fun out of the game, and might help turn public opinion against projects like this one.

While Obi's may be good out of the box, the way I'm doing it is fairly terrible without a huge sample. However, with a large sample, I believe it has the potential to learn to play extremely well and properly evaluate a lot of the questions that still have to be answered using the chess method: how likely is every state to lead to victory, how valuable are they?
Of course, the two methods could be combined. It probably wouldn't be too difficult to implement learning in David Stone's bot, so that it learns the optimal policies (ie, weights to place on things like stealth rock and damage) for decision making.

First of all, you shouldn't do the team choice step. It's totally useless. Just use one good team and focus on the in-game actions that depend on it.
a) A good team now may not be one in the future, so I would prefer it to not learn the specifics of a team too heavily.
b) I do not know what the best team is, and having the same player use multiple different teams to find out how they rank is an idea that intrigues me.

I think the real question here is: can the bot play Baton Pass? Because if it can, it's about as competent as 85% of the ladder, if not more so (including me).
I hope so. Unfortunately, it appears to have a hard time learning that u-turn is almost always strictly better than switching. It would take a very large sample of games before it would be able to make this sort of deduction. It would probably need someone to train it, being unlikely to ever find that out on its own.
As I will explain more in answer to later questions:
I modified a script to let a human replace the neural network in making decisions, so that basically the script just asks you for the move. This can be used to build a database of games to train with.

Right now, it is probably worse than >99% of the ladder.

This is sick; I feel like they should specifically make a ladder where you can use bots, lol, it does seem kinda cheap, but hey, if it's not banned, who cares?
On that note, it only takes a few minor tweaks to make the script constantly spit out all of its damage calculations based on the stat probabilities. Pretty sick to see the damage one attack does, and then make updated predictions based on how likely it is to outspeed you.
That, however, is probably well into cheating territory, so it is best to avoid that.

Also, would it be possible to make a bot with some sort of AI that works like an Amiibo? That could be way cool!
Hmm?

Skipping team making would really help, machines are rarely useful for that. Just put a few good teams in there and let it pick between them.
That is what it does. It has 43 teams to choose from, and keeps a value estimate for each of them. The better the team, the more likely it is to pick it.
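The bookkeeping for that can be sketched as a simple bandit-style value table. Everything below (names, temperature, step size) is illustrative, not the bot's actual code:

```python
import math
import random

class TeamPicker:
    """Running value estimate per team; stochastic pick weighted by value."""

    def __init__(self, n_teams, tau=1.0, step=0.05):
        self.values = [0.0] * n_teams   # one estimate per team, starts neutral
        self.tau = tau                  # temperature: higher = more exploration
        self.step = step                # learning rate for the running estimate

    def pick(self):
        # Softmax-weighted random choice: better-valued teams get picked more often.
        exps = [math.exp(v / self.tau) for v in self.values]
        r, acc = random.random() * sum(exps), 0.0
        for i, e in enumerate(exps):
            acc += e
            if r <= acc:
                return i
        return len(exps) - 1

    def update(self, team, won):
        reward = 1.0 if won else -1.0   # the win +1 / loss -1 signal
        self.values[team] += self.step * (reward - self.values[team])
```

With 43 teams and all estimates equal, each team starts at a 1/43 chance; as results come in, better teams drift toward +1 and get picked more often.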

this is actually p sick. do you by chance have the replay of the bot winning?
This looks amazing! I also really want to see replays of this, mainly to see how the bot works, what misplays it commits and how it can be improved.

No, but I can probably dig up the log the bot saved.
I took the bot offline to make changes, and start working on a human-supervisor-made data base. It'll be back up within a few days, hopefully better than before. I'll post replays.

Am I the only one not hyped about having bots on ladder? Hearthstone had a major epidemic a few months back, and playing against the things was hell (I heard one managed to reach Rank 1 Legend for a little while). While I doubt that this'll become as widespread as that, the whole ethics of the thing bothers me a bit. Most people play on Showdown to play other people, not bots (otherwise people would just play Battle Tower or something).

That said, I'm really impressed that you managed to put this together, so nice work.
If people actually made top-level bots then we would handle it appropriately (and thanks to obi's work, the issue has been raised and discussed before, so it's not desperately unfamiliar territory), but as it is I don't think anyone's made one anywhere near as good as obi's.
Hearthstone bots were slow to make decisions, but more importantly:
They were sold commercially, because having a bot grind points for you would help you gain access to in-game goodies. This is not an issue on Smogon.
This commercial aspect, vs a few members toying with it on the forum is a huge difference.

This bot is not close to Obi's/David Stone's. [Hopefully I can say, "yet".]


A few things I can think of to improve this :
- Preventing the bot from setting up beyond what is necessary to sweep the rest of the team.
- Making sure the bot tries to get SR up as soon as possible
- Using Substitute whenever the opposing Pokemon cannot break it
- Using recovery if the Pokemon is expected to be under 50% health by the end of the turn and if the opponent cannot 2HKO it.
- This is pretty vague, but the more useful or needed a Pokemon is to win, the more it should be played conservatively and kept healthy. If a Pokemon is too weak to be of use, it should not be saved.
I am hoping I do not have to do any manual interventions for 1), 2), and 5). Regarding 3) and 4), my interventions were to restrict actions.
Of course, restricting all actions but the one that I want is just an extension of this (and already done if a pokemon is locked into outrage), but I am hoping these are things to be learned.

The bot should learn the value of stealth rock on its own, figuring out how likely stealth rock vs no stealth rock is to lead to a game win vs loss.
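One crude way to extract such a value from saved game logs is a pair of conditional win rates. A toy sketch with made-up records (the real networks learn this jointly with everything else):

```python
def win_rate(games, had_rocks):
    """games: list of (stealth_rock_was_up, won) pairs pulled from saved logs."""
    outcomes = [won for rocks, won in games if rocks == had_rocks]
    return sum(outcomes) / len(outcomes) if outcomes else None

# Made-up records: (stealth rock up?, won?)
games = [(True, True), (True, True), (True, False),
         (False, False), (False, True), (False, False)]
print(win_rate(games, True), win_rate(games, False))
```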

So what you're saying is, this bot LEARNS stuff? Maybe put it into the installed version of showdown as a team tester :O
It has a lot to learn before it is useful for anything like that, but I'm open.

Maybe instead of letting it loose on the ladder, let it first observe battles of high elo players to mine a bunch of data and make the bot learn from that?
I am not a high-Elo player, but I am a start. If there are any high-level players willing to play at the mercy of a program and help with this, I can try using Nuitka to produce an executable for them.
Or, if they are willing to install a pile of dependencies, I can give them the Python code.

Edit: about the JavaScript thing, you don't need to run the bot from a browser. Showdown allows you to directly communicate to the server through a websocket, which basically allows you to exchange strings with the server. The showdown client uses this, it basically parses the strings from the server and displays the battles, chat rooms, etc. I don't know the specifics but you can look at the source code of the client, or look at the code of boTTT (I believe it's open source).
Aye. If I am the one to handle this, it is well down the line of priorities (but it is on the list).
Right now, using the browser works quickly enough and doesn't seem too buggy. My focus is on concerns about the bot's play skill.
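For anyone curious what those server strings look like: they are line-based and pipe-delimited, with an optional ">room" header naming the battle or chat room. A minimal parsing sketch (the frame format is inferred from the client behavior described above; `parse_frame` is a hypothetical helper, not Showdown code):

```python
def parse_frame(raw):
    """Split one raw server frame into (room, list of message field-lists)."""
    lines = raw.split("\n")
    room = "lobby"
    if lines and lines[0].startswith(">"):
        room = lines[0][1:]     # e.g. ">battle-ou-123" names the battle room
        lines = lines[1:]
    # Each message line looks like "|move|p1a: Ferrothorn|Leech Seed|p2a: Excadrill"
    events = [line.split("|")[1:] for line in lines if line.startswith("|")]
    return room, events

room, events = parse_frame(
    ">battle-ou-123\n|move|p1a: Ferrothorn|Leech Seed|p2a: Excadrill"
)
print(room, events)
```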

I am also strongly considering scrapping my old networks, and increasing the number of moves the opposing Pokemon know from 4 to 6, or maybe even more.
More data makes it harder to learn, but so does more noise. If a Pokemon often carries a devastating attack that isn't listed among its top four most-used moves, the bot won't know about it until the opponent actually uses it. And then it'd be too late.

Maybe also the bot could like do a calc of how much damage its most effective move would do to the target, and then the more damage it does the more likely it would "predict" a switch?
You'd have to factor in ladder rank a little, low ladder players are less likely to predict than high ladder.

Simpler still, only put it in once you get to higher ladder.
The bot has all the tools necessary to make such decisions.
It knows the damage, the probability of an OHKO, and of a 2HKO for every move (all 24) against all six of the opponent's Pokemon.
The damage calculations get updated whenever information changes: the opponent used Swords Dance? The probabilities of different EV spreads of the opposing Pokemon changed? Etc.
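Given a base damage figure, the OHKO probability is just counting damage rolls: the game applies one of 16 equally likely multipliers from 85% to 100% of the maximum damage. A sketch (ignoring the EV/nature uncertainty the bot also folds in):

```python
def ohko_chance(max_damage, defender_hp):
    """Fraction of the 16 damage rolls (85%..100% of max) that KO the defender."""
    rolls = [max_damage * (85 + i) / 100 for i in range(16)]
    return sum(r >= defender_hp for r in rolls) / 16

print(ohko_chance(300, 280))  # 0.4375 — 7 of the 16 rolls reach 280 HP
```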

When the bot just played itself, I had it train on the trailing 2/3rds of games. If it were to run for a very long period of time, I would change this to a fixed size so that the number simply trails (eg, "last 10,000 games"). Thus, if the bot were to improve -- and the caliber of players it played against -- it would adjust to match.


Anyway, as a general response to the idea of changing parameters: this is exactly what the neural network does.
How many of them?
I am training two networks.
The "Small" Network: 163,260 + 16,290 + 91 = 179,641
The "Medium" Network: 326,520 + 48,735 + 136 = 375,391

Those are huge numbers. It is going to take a VERY long time.

What I think may improve the bot, without having to change the algorithm too much or too intelligently, is to change the way it uses teams. I think if it had a pool of Pokemon to build a team with, instead of a pool of teams to choose from, it would work better even with the same win+1/loss-1 decision making.
This is a lot harder than stealing well built teams from the Smogon RMT section.

Thank you for creating skynet, asshole.
Legit question: does it learn?
Yes, and I hope to study AI some day, but this is a far cry from anything like that.

I noticed you said this

What technology stack are you using to run the bot? Selenium?
I found Splinter's API really convenient. Splinter uses Selenium.

Your source code may be awkward, but if you show it, people will be able to help you.

With how learning works, if the bot plays against bad players, it will learn how to beat bad players. That's not what we want. We want it to battle good players, or it'll never be good.
One problem will definitely be the number of matches required to build a large enough sample for the bot to do anything.
Yeah. It will take a huge number of games. Time, and as other members have suggested, writing something to let it learn from replays are probably the two best solutions.

Which database is your bot using? (For computing possible moves, etc.) With a "local" database it can be faster (but it requires more resources on the client side).
Local. I decided to use Veekun's database from GitHub, but now wish I had looked at SD's; that would probably have made things easier down the line.
From Veekun's, I made a lot of changes. The system is awkward, since at the start I did not have the foresight to know what I would need in the end. A sample entry:
flareblitz: {
    'ID': 394, 'type': 10, 'damage_class': 2, 'power': 120, 'accuracy': 1.0, 'pp': 24,
    'priority': 0, 'target_id': 10, 'primary_effect': 'recoil2', 'secondary_effect': 'burn',
    'effect_chance': 10, 'contact': 1, 'punch': 0, 'pulse': 0, 'crit': 0, 'biting': 0,
    'sound': False, 'bullet': False, 'multi_hit': False,
    'stat_change_list': array([0]*16, dtype=int32), 'stat_change_list2': array([0]*16, dtype=int32),
    'effect_array': array([-0.8] + [-1.]*20 + [1., 1., 0., 0.5]),
    'data_output': array([0.]*5 + [-0.8] + [-1.]*24 + [1., 1., 0., 0.5])}
Not all of these get used. "data_output" is what gets fed to the NN.
It includes: impact of your five stats (indexes 0 through 4), burn chance * 2 - 1, etcetera. The last four numbers are contact, accuracy, priority, and pp/16-1, although it gets updated every turn in battle. Others include whether the attack is leech seed, substitute, etc.
I should have added impact on the opponent's speed because rock tomb is a thing, but at one point in coding I became extremely aggressive towards all inputs I thought might not be essential. I started with an absurd 1,600 data inputs.

Same with the damage calculator: how do you calculate?
Client side as well.

One obvious thing the bot shouldn't do: don't try to status a Pokemon that is already statused.
Is it unreasonable to use a status move again if you expect a switch?
In a game I played to try and build a data set online, I was down to ferrothorn vs a choiced excadrill and azumarill. The excadrill was seeded and at about 20% health. I had not used protect in the previous turn. One layer of spikes was on my opponent's side of the field.
I figured the logical play was to leech seed again.
Either my opponent switches, thinking I would protect, in which case seeding a second time was clearly the right play.
Or they attack again, in which case leech seed fails and I just use protect on the turn excadrill gets sapped to death. They are unlikely to switch on that turn, as excadrill would die to spikes, making it nothing for me to worry about.

Anyway, they switched and ferrothorn succeeded in taking home the game.

PS: How do you plan to adapt to the player you're facing?
Modeling it as a Markov Decision Process, I would have to have some variables reflecting estimates of the personality of the player.
I currently have no plans for how to implement that, so the answer is: it won't.

I think a bit of dynamic programming is required, as if the player can deduce what the bot will do, it's over. How do you intend to implement risk?
What exactly do you mean by dynamic programming? The use of action values for determining good policies is an idea specifically taken from dynamic programming.
See here.
As far as deterministic concerns go: the bot estimates the action value of each of the nine choices (minus forbidden options), similar to dynamic programming.
The bot then picks from among these with probability based on their values: it will most likely choose the suspected best option, and next most likely the second best, etc.
Here is the wiki article on the softmax function.
If two options are believed to be equally good, the bot will pick either with 50/50 probability.

Why is learning part of the bot's AI? Isn't there an "elegant solution" to pokemon battling that generally works from turn 1 and only learns items and moves as they are revealed?
See the discussion of David Stone's methods vs mine.
Honest disclosure: it is my interest in statistics and machine learning that inspired me to start a project in the first place.
I wrote a research paper on food deserts, and after spending months on it I could not get my results published: they were boring, and it had been done before. Any good, honest analysis would not change that: the data simply did not contain anything interesting or novel. I am stubborn and often make bad decisions.
After that, I thought I had better start a project where the analysis itself is the criterion by which my work will be judged. If this works, the project is a success. If it doesn't, then it is not. Simple.

just to make the math easier, perhaps you could consider playing LC. The calcs will be easier and there will be less items and variety to deal with. IMO you should perfect the algorithm in LC before applying it to OU.
I know nothing about LC, so it may be harder on me. I used to play OU back in high school, gen IV.
I am not sure how much harder it really would be for a bot to learn OU vs LC.
Memorization is easy for computers and hard for people. What the computer struggles with is generalizing, with learning.
It recognizes every Pokemon that gets used a decent percentage of the time; it knows all the moves they might be using and their stat spreads. It knows how much damage all the attacks will do, bugs aside - and those are just as likely to come up in LC, except for bugs related to mega evolutions.
The things the bot finds hard aren't likely to change that much between OU and LC.

This idea isn't too far-fetched from the Challenge Cup bot 'Muh Bans'. Even though that tier is something completely different skill-wise, I'd like to know how this beta bot plays out.
Any info on that?
 

Disaster Area

formerly Piexplode
Will you add extra tier and generation capabilities? I can see why you'd begin with ORAS OU, it's the flagship tier, but it'd be cool once it became competent at that to have it play OU every gen, and every main tier in gen 6. (and then maybe some other tiers in other gens, and some OMs way off into the future) :)
 
post
dat
source code
I've actually done a little bit of machine learning stuff professionally, I'd love to help with this.
 
Hearthstone bots were slow to make decisions, but more importantly:
They were sold commercially, because having a bot grind points for you would help you gain access to in-game goodies. This is not an issue on Smogon.
This commercial aspect, vs a few members toying with it on the forum is a huge difference.
Point taken. In that case, might I suggest that you let it "watch" some tournament games? I'm no programmer, so I'm not sure how exactly it learns, but feeding it high quality games and tournament replays is the best way to learn how top players react in specific situations.
 