
Programming Future Sight AI

Hello again!

It's been a while, but after taking a break (or two, or seven), I finally got around to working on the battle AI based on the machine learning models I've been alluding to. And today I'm happy to say I've gotten to a point where I can share it with y'all.

What is Future Sight AI:
Future Sight AI is a computer program that chooses its next move by looking at the possible outcomes several turns after each candidate move and selecting the one that leads to the highest chance to win. Using the machine learning models I made for the Pokemon Battle Predictor, the AI can determine how likely it is to win at the end of a given turn and use that information to rank the paths a move choice can take it down. This method, looking turns in advance and choosing the ones most favorable to you, is the same one used by the best chess-playing AIs, and now those same techniques can be applied to Pokemon. However, unlike chess, Pokemon is a game that involves random chance, information unavailable to both players, and a highly variable set of "pieces" (aka Pokemon) and their potential moves. These additional factors make creating an AI for this game in this way significantly more difficult to complete and more interesting to solve. Despite those challenges, the result in its current form is an AI that matches the play style of a human player, can take any team and understand how its members work together, and can beat the average player (more on that last part later).

See it in Action:
If you want to see what the AI can do, I've set up a "live stream" on my website pokemonbattlepredictor.com/future-sight-ai where you can see it battling on the ladder along with a log of its inputs and outputs (you may need to refresh the page a few times to get it going). It will be running for the next day as of posting this. You may notice I've decided to run it on last gen's OU ladder, and there are two reasons for that. First, I wanted to test it across a meta that's stable so I know my differing results from day to day are because the AI code has changed rather than because Pokemon were added to or removed from a tier. And second, the machine learning models it uses are made for specific metas, and I wanted to run the AI on the meta it was best at predicting (outside of gen 8). The team it's battling with is nothing special and was made to see if it could understand how different Pokemon can work together (Pelipper activating rain for Swampert, Koko Volt Switching into Hawlucha for the speed boost, etc.), so if you want to see it battle with a different team, send it over (especially since I'm terrible at team building).

How does it work:
There are three main steps for getting an AI like this to work:
  1. Predicting an opponent's move and understanding your chance to win
  2. Guessing what you don't know about the battle
  3. Looking several turns beyond the current turn to see how the battle can play out
The first step was the hardest one, and it's also why making the battle predictor first was crucial to making this work. If you haven't seen my battle predictor post, I made 4 different machine learning models - predicting, at the end of a turn, your chance to win, your opponent switching out or choosing a move, what move they will use if they stay in, and which Pokemon they will switch to if they switch - that are used to generate a list of what your opponent will do next and the likelihoods of each. Thanks to having more and better data than my first time making the models, and to making some big changes to how they work, I was able to increase all of those models to above 80% accuracy (and yes, I'll update the browser extension with those models shortly). That list is important as it narrows down the different possibilities the AI must consider and gives weights to how much each action should be considered.
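As a rough sketch of how those model outputs might feed the search, here is one way the weighted action list could be assembled. The function name, data shapes, and numbers below are my own assumptions for illustration, not the actual Future Sight AI code:

```javascript
// Sketch: combine the predictor's outputs into a ranked list of likely
// opponent actions. All names and data shapes here are hypothetical.

function rankOpponentActions(pSwitch, moveProbs, switchProbs) {
  // pSwitch: probability (0..1) that the opponent switches this turn
  // moveProbs: { moveName: probability } given the opponent stays in
  // switchProbs: { pokemonName: probability } given the opponent switches
  const actions = [];
  for (const [move, p] of Object.entries(moveProbs)) {
    actions.push({ type: "move", choice: move, weight: (1 - pSwitch) * p });
  }
  for (const [mon, p] of Object.entries(switchProbs)) {
    actions.push({ type: "switch", choice: mon, weight: pSwitch * p });
  }
  // Highest-weight actions would be explored first (and deepest) by the search.
  return actions.sort((a, b) => b.weight - a.weight);
}
```

The weights double as the "how much should this be considered" factor the post mentions: a 5% action can be pruned early, while a 55% action gets simulated in depth.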

The second step is the easiest, and frankly the one I was the laziest about. This amounts to figuring out what moves, stats, abilities, items, etc. your opponent's Pokemon have so you can make your predictions accordingly. The way it works now is simply taking the moves, items, and abilities you know they have and matching them to the most common set that best fits those known attributes. I have everything planned out for guessing stats by looking at damage dealt or received, making a machine learning model to guess their moves and items based on their team composition, and a few other techniques to make it more accurate, but as it is now it's good enough, and I had to stop working on it at some point ¯\_(ツ)_/¯.
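The set-matching idea described here could look something like the following minimal sketch. The set database shape, the sorting assumption, and the fallback behavior are my own guesses, not the project's actual code:

```javascript
// Sketch: pick the most common known set consistent with what we've observed.
// `sets` is assumed to be an array of { moves, item, usage } objects.

function guessSet(knownMoves, knownItem, sets) {
  const candidates = sets.filter(s =>
    knownMoves.every(m => s.moves.includes(m)) &&
    (knownItem == null || s.item === knownItem)
  );
  // Nothing matches: fall back to the first set (assumed most common overall).
  if (candidates.length === 0) return sets[0] || null;
  // Among consistent sets, take the most used one.
  return candidates.reduce((best, s) => (s.usage > best.usage ? s : best));
}
```

Once revealed information rules out a set, every later prediction (stats, remaining moves) inherits the better guess for free.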

The third step is the one where those aforementioned chess algorithms come into play. Since the AI has a list of its and its opponent's next moves, it can run various turns to see what would happen after each combination of moves is made. This is done using a slightly modified version of the code that runs Pokemon Showdown, to make sure the results are identical to what would happen in the actual game. The reason the code is slightly modified is to deal with random chance: instead of choosing an outcome randomly, it will choose one outcome for that battle while creating a duplicate version of the battle that chooses the other outcome, so all possibilities can be considered. Once it runs through one turn, it will find its new possible move options for the next turn, predict its opponent's move options, and run through that next turn. As you can imagine, this can quickly cascade into exploring tens of thousands of battles after only looking a few turns deep, so doing all of this quickly is key. This continues until the 15 second timer runs out, at which point it collects the chance to win of all the different battles that stemmed from a move option and chooses the one with the collectively highest chance to win.
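The fork-instead-of-roll trick can be sketched like this. These are illustrative toy shapes only; the real code modifies Pokemon Showdown's battle engine rather than using plain objects:

```javascript
// Sketch: a chance event splits the battle into weighted branches instead of
// sampling one outcome. Uses structuredClone (Node 17+) as the "duplicate
// version of the battle" the post describes.

function forkOnChance(state, event) {
  // event: { p, onHit, onMiss } - e.g. a 90%-accurate move
  return [
    { state: event.onHit(structuredClone(state)), prob: event.p },
    { state: event.onMiss(structuredClone(state)), prob: 1 - event.p },
  ];
}

// Collapse a set of terminal branches into one expected win chance, which is
// the number used to compare the root move options.
function expectedWinChance(branches, winChanceOf) {
  return branches.reduce((sum, b) => sum + b.prob * winChanceOf(b.state), 0);
}
```

Because every branch carries its probability, a move that wins 100% of the time on a 90% hit can be compared fairly against a safer move that wins 80% of the time unconditionally.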

How well does it work:
I've tested it a fair bit on the ladder over the past few weeks using random usernames to start at 1000 each time, and I've seen it reach up to the 1300s before stopping it. Those results are with it running on my laptop using the CPU to run the machine learning models. It takes ~1 millisecond to run all the calculations for a turn, where ~.7 of those milliseconds are spent running the models, so those models are the speed bottleneck. This leads to it being able to look at most 3 turns deep in the 15 second time limit. That's probably as high as it can go given its current setup, but the current setup is very limiting. The program is multi-threaded and easily uses up all the resources on my computer, so just running it on a computer that has more cores and power than my laptop would do wonders. Also, running the models on a GPU rather than a CPU can yield a ton of speed benefits. And, if I were really feeling ambitious, rewriting the code so it wasn't all in JavaScript but in C++ or something would allow for so much better optimization (have you ever seen how wack multi-threading is in JS?). All of this is a long way of saying it may work fine now, but it has some serious potential to get better.
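To get a feel for why depth is so limited, a back-of-the-envelope node count helps. The branching numbers below are illustrative, not measured from the project:

```javascript
// Rough node-count estimate: with m considered options per side, each turn has
// ~m*m action combinations, each possibly forking further on chance events.
// The real search prunes low-probability branches, so this is a worst case.

function nodesAtDepth(movesPerSide, chanceForks, depth) {
  const perTurn = movesPerSide * movesPerSide * chanceForks;
  return perTurn ** depth;
}
```

With, say, 4 options per side and an average of 2 chance forks per turn, 3 turns deep is already 32^3 = 32,768 battles, which lines up with the "tens of thousands of battles" growth described above and why each saved millisecond matters.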

What comes next?
Honestly, outside of making slight improvements or implementing things I decided to table for later, I don't know. With an approach like this, there's really no upper limit to how well it can play outside of how you use the outputs of the highest-chance-to-win models. I will continue experimenting on that front and adjusting the various shortcuts in play to make it run faster, and exploring those ideas with people who know the area can help. I'd love to get people's feedback on what they think of its play or to try it with other teams, but for now what I really want is to see what this can do at its current full potential. Oh, and I guess getting it to work with double battles too; just imagine what could happen if this could take on VGC...
 

Geysers

Sheer Cold
This looks really awesome! Is there a possibility of adding support for guessing sets based off of usage stats, to allow it to play non-OU tiers? By non-OU tiers, I mean AG.
 
This looks really awesome! Is there a possibility of adding support for guessing sets based off of usage stats, to allow it to play non-OU tiers? By non-OU tiers, I mean AG.
Yes, that's what it's doing now. It can work on any gen or tier I have a trained model for (which I think is all the metas with active ladders), so it could play on a non-OU tier for sure.
 
Are you sure the AI is working...? "Future Sight AI is not running right now or has hit a bug. try coming back later" Is all what I see in that web...
 
Are you sure the AI is working...? "Future Sight AI is not running right now or has hit a bug. try coming back later" Is all what I see in that web...
It's not supposed to be. I said I was only running it for the day I first made the post. I will probably run it again when I make another update to it.
 
GIGANTIC UPDATE:

I’ve gone quiet on this and the battle predictor project for the past few months to dedicate my time to working on this AI, and oh my, has progress been made! Let’s start with the key facts:
  1. Future Sight AI has been reworked from the ground up to make all the predictions and win chance assessments better than before without using machine learning.
  2. FSAI can now battle in nearly any singles format with team preview and has learned how to team build its own teams for each of those formats.
  3. It can now find all its opponent's stats (EVs, IVs, and natures included) based on their damage dealt, damage taken, and the order of all in-game actions which factor in speed.
  4. FSAI’s current version on the gen8ou ladder was able to reach an average rating of 1547 and max rating of 1630.
  5. You can battle it right now by challenging user “Future Sight AI” and I’ve revamped the website for viewing its current battle to include its predictions on all the aspects of the battle. It can battle up to 15 people at once and is running on its least powerful setting (not looking any turns ahead).
That 4th one is still mind-boggling to me, as I'm sure anyone reading will know that's no small feat, especially for a computer playing the game as intended. In the 4 runs (a.k.a. letting it battle on the ladder for ~24 hours each) I've done with the current version, it has been able to get to and stay above the 1600s in 2 of them, so my feeling that it had to be an anomaly was mostly quashed. Also, I think it's worth mentioning that the last run I did, where it got to 1601, was during OLT. I don't know if that's a good or bad thing, but it's certainly a thing ¯\_(ツ)_/¯.
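The stat-finding idea from point 3 can be sketched by inverting a simplified damage formula: try candidate defense stats and keep the ones whose damage roll range contains the observed damage. This toy version ignores modifiers like STAB, items, and weather, and the function names are mine, not the AI's actual calculator:

```javascript
// Simplified Pokemon base damage (no modifiers), following the standard
// floor-heavy formula shape. Illustrative only.
function baseDamage(level, power, atk, def) {
  return Math.floor(
    Math.floor(Math.floor((2 * level) / 5 + 2) * power * atk / def) / 50
  ) + 2;
}

// Keep only the candidate defense stats whose [0.85x, 1.00x] damage roll
// range could have produced the observed damage.
function plausibleDefStats(observed, level, power, atk, defCandidates) {
  return defCandidates.filter(def => {
    const max = baseDamage(level, power, atk, def); // high roll (1.00)
    const min = Math.floor(max * 0.85);             // low roll (0.85)
    return observed >= min && observed <= max;
  });
}
```

Repeating this over several attacks shrinks the candidate set quickly, and the same elimination idea extends to EVs, IVs, and natures, plus speed brackets inferred from action order.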

I could keep going, but I’ve already made a whole video on my YouTube channel talking about how it works, how it did, and what its next plans are. If you’re not the video watching type, there’s also a written version with extra content on the AI’s website.

However, the purpose of this post is to talk about things I didn't mention anywhere else that I think you all will care about more. Because the goal of my channel is to introduce people to computer science topics, I didn’t spend much time on the Pokemon / Pokemon Showdown specific aspects of what I did. This project wouldn’t have been possible without this part of the community, so I wanted to make sure I gave those who are interested here the information that could best fit them.


How does it play / what is its play style?

Well, here’s the thing: a large chunk of why I didn't discuss at all what I thought about how it played is because I'm no good at this game, and FSAI's level has surpassed mine so much that I no longer think I'm qualified to make a valuable assessment. Even though I know seemingly everything about how this game works mechanically, I'm perpetually a max-1200s player in anything that's not ranbats. I do keep logs of every battle it has played for watching later, so if anybody who’s skilled at analyzing this game either sees a battle this week they find interesting or wants to go back in time and discuss the AI’s play in general, I’m all ears, as getting an assessment from someone who knows what they're doing is huge for taking this to the next level. Also, if I see any good assessments, I'll throw links to them right here.

There is one thing about FSAI’s play I can speak confidently on. Whenever I mentioned this project and how well it performed to others, they imagined it as something that would make super hard reads and just completely out-predict its opponent with ridiculous plays. If that’s what you were expecting, you're not going to find it here. FSAI, between its predictions and the worst-case scenarios of your move choices, is quite accurate at figuring out what you will do next, but I have found that making hard reads is not a sustainable way for it to win. That tends to be a very high-risk, high-reward playstyle, and FSAI prefers to make somewhat obvious but generally reliable plays. As it has climbed the ladder, this risk-averse playstyle has started to become a liability, so I do plan on doing some more experiments to figure out what a good balance is and against what kind of players each playstyle should be used. That last part is key, as it already plays differently against people with different ratings, so I must figure out when the best time is to start increasing its risk-taking factor.


Why are you using ELO instead of ___ to judge its quality?

The only two things I know of which could reasonably fill that blank are tournaments and GXE. For tournaments, there were just a lot of concerns that added up to me feeling that was not the best route. The number one reason is that a crucial part of getting an accurate assessment of how it plays against humans is its opponents playing the AI as if it were a human. My concern is that if I entered it in a tournament and they knew for sure it was a computer, they would play against it differently and ruin the validity of my test. The only real way around that was to not tell the opponents they were playing a computer rather than me, but in a tournament setting, that just reeks of cheating so much I wouldn't have even considered that option. Another, probably irrational, fear I had was that I would enter the AI in a tournament when it's not all that good, people would see that it's not that good, and then not want me to have it enter tournaments again even when it gets better. Also, factor in that I would have had it play on the ladder anyway to test its ability, as tournaments aren't going on 24/7, and it all added up to something I didn't think would be worth it. Now that I know the AI can actually play (and depending on how it does this week against non-random challengers), I'll gladly throw it in a tournament!

For GXE, well, that's one part me just messing up and one part it wouldn't have been accurate anyway. I should have done testing on a couple of alt accounts and saved the main one for real runs, but because I didn't, the main account’s battle record includes some absolute nonsense. Although, even if I had, there are two parts of my testing strategy which would have made it a moot number anyway. The first was that I liked to have the AI's rating tank by at least 100 points between runs so I could see if it climbed back up again, so I would just play games myself so it could lose and drop down a bit. Also, I wanted to gain a baseline for how well the AI could be expected to do by comparing it to what happens when you click random moves, so I had it randomly do battles on different parts of the ladder where it would not use its brain at all. Fun fact: of all those random-choice battles, it won none! Even though my pre-run tanking and occasional random-choice battles don't account for a lot of the battles on that account, the fact that I knew they were there at all made me think that number was never going to be the accurate assessment I wanted it to be.


How does it build teams?

It's a bit of a process which I plan on going into much more detail about soon, but here's the gist. It first looks at all the battles at its disposal, finds which Pokémon win more together and lose more together, and puts that information to use when deciding who makes a good team member. I say "puts to use" because it does look deeper than that when it comes time to pick who goes on what team. The team starts with a randomly selected, generally viable Pokémon. To add each subsequent Pokémon, it looks at how each potential team member does against the weighted average of the Pokémon in the current tier (so it's more important how it does against Landorus than against Meowth), compares those numbers to the same numbers for the team members already selected, and prioritizes adding mons who do better against the mons the selected team members do worse against. Looking at different sets is factored into the above steps, but most of the set editing happens once the team of six is created.
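The greedy selection loop described above might look roughly like this. The matrix shapes and the exact scoring rule are my own guesses at the approach, not the actual teambuilder code:

```javascript
// Sketch: pick the next team member by rewarding candidates that beat exactly
// the tier threats the current team handles worst, weighted by usage.

function pickNextMember(team, candidates, matchup, usage) {
  // matchup[a][b]: how well Pokemon a does against b (0..1, higher = better)
  // usage[b]: how common b is in the tier (weights the matchup average)
  const weakness = {};
  for (const foe of Object.keys(usage)) {
    const bestAnswer = Math.max(...team.map(m => matchup[m][foe]));
    weakness[foe] = usage[foe] * (1 - bestAnswer); // big when nobody answers foe
  }
  let bestPick = null, bestScore = -Infinity;
  for (const c of candidates) {
    const score = Object.keys(usage)
      .reduce((s, foe) => s + weakness[foe] * matchup[c][foe], 0);
    if (score > bestScore) { bestScore = score; bestPick = c; }
  }
  return bestPick;
}
```

Run five times from the random seed Pokémon, this naturally produces the "cover what the team struggles with" behavior the post describes, with set editing as a separate later pass.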

The sets start off as one of the legal sets that Smogon has up, and it changes them around to increase team “synergy” (i.e. making sure Hawlucha has the right seed for the terrain starter on the team). This part is where the team building is an absolute mess, as not only do I currently not edit the sets as much as I should, but there are also definitely edge cases I'm not checking for when it comes to making sure something does or does not happen. This is a little bit by choice and a little bit by negligence. I don't want to risk my inability to assess the teams it builds leading to incompatible teams I don’t even know are bad, and this side of the AI doesn't hold my interest like the others. This, among other things, is part of the reason I’m starting to look for people to work on this project too, but I want to see how this week goes before making any commitments on that front. I’m going to change it soon to really dive in and pick custom sets, as that code already exists for when the AI is in battle, but for now the teambuilder is still using an older part of the code I haven’t changed, since teambuilding is (wrongly?) not my priority. For now, I do get some pretty funny results, like its affinity for putting :charizard:, :venusaur:, :blastoise:, :espeon:, and :snorlax: on the same team without having a clue of that combination's significance, and an obsession with Diggersby in OU I don’t understand!


Why is its current rating on Showdown 1### if you said the average rating was 1547?

The average rating is based on all the ratings it had during its entire 24-hour run; the rating displayed is wherever it landed when I stopped it, which just so happened to be that number. I don't think I've ever stopped it within 50 points of its average rating before.


Didn’t pmariglia already do this?

Yes, but no. I must confess something though: up until literally yesterday, I didn’t know their project existed. When I saw their bot while scrolling through the forum to find this thread again, I felt retrospectively awful for making it seem in my video that mine was the first bot of its kind or the first to get as far as it did. Don’t even get me started on how close I was to removing the section about not sharing my code! But after a few hours of looking through their code and reading what others had to say about their results using it, I realized there are places where our AIs are fundamentally different and therefore not comparable in the same way:
  • How the teams used are handled. With my AI making its own teams and pmariglia’s using teams they think are worth using, the quality of the teams used for testing is very different, which as y’all know makes a world of difference. And, based on others’ experiences, pmariglia’s AI’s success seems to be quite team dependent, which suggests (not proves) that the team choice was carrying a lot of the weight up the ladder. Considering the teams my AI makes are just… to put it nicely… fine, I have good reason to believe the teams and AI are at least at the same level, if not the AI carrying the teams. I’ll be testing that theory next week.
  • How different battle formats are handled. Something which stood out to me in their code is how they’ve hardcoded how much having certain status or side conditions up affects a player’s score. This can be a highly effective way of making the AI good at assessing its position, since the programmer can tune these numbers exactly to fit the meta. But if you’re playing in a different generation - or even if the metagame shifts - those numbers can go way off base and throw off how it plays. For example, it currently has webs as by far the best entry hazard to have up, while the opposite is probably true. Because of that, my AI works differently by calculating those numbers on the fly; not just on a format basis, but on a turn-by-turn basis. You do lose the guarantee of accuracy but make great gains in versatility. I’m fairly sure that when pmariglia was running the AI, the constants they had were more useful than what my AI generates, as that’s where FSAI is at its weakest.
I hope that didn’t read as me trying to justify why my AI is better or something, as the choices they made which were different from mine are good ones. I just wanted to clarify that they are on two different playing fields; pmariglia’s AI has the advantage of using human inputs in crucial areas, whereas mine figures out those inputs on its own. And, if pmariglia ends up reading this and honestly disagrees with my assessment, I will gladly let the people know that I was second in line.
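To make the contrast concrete, here is a toy sketch of the two evaluation styles discussed above. All constants and the dynamic heuristic are invented for illustration; neither reflects either project's real numbers:

```javascript
// Hand-tuned style: fixed weights for side conditions, tuned to one meta.
const FIXED_WEIGHTS = { stealthrock: -2, spikes: -1.5, stickyweb: -1 };

function fixedHazardScore(sideConditions) {
  return sideConditions.reduce((s, c) => s + (FIXED_WEIGHTS[c] || 0), 0);
}

// On-the-fly style: derive the weight each turn from the actual battle state,
// e.g. Stealth Rock matters more when the remaining team takes heavy rock chip.
function dynamicHazardScore(sideConditions, oppTeam) {
  return sideConditions.reduce((score, c) => {
    if (c === "stealthrock") {
      // Penalty scales with the total rock chip the remaining team would take.
      const chip = oppTeam.reduce((s, mon) => s + mon.rockWeakness, 0);
      return score - chip;
    }
    return score - 1; // fallback weight for other hazards
  }, 0);
}
```

The trade-off is exactly the one described: the fixed table is precise in the meta it was tuned for but silently wrong elsewhere, while the dynamic version generalizes across formats at the cost of occasionally mis-weighting a condition.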

Why can’t it battle in gens 4 and below?

Well, all the code is there to handle team-preview-less generations, make teams for them, and handle their damage calculation, but I simply never had an incentive to test it. I could barely get battles when I tried to have it play in UU, so I wasn't even going to dream of taking on any ladder that wasn't in gen 8. It should work in theory, but for its first exhibition, I don't think this is the time to find out in practice. Also, team building doesn't take leads into account all that well, so it might try to do a lead Breloom (that’s a bad lead, right?) in gen 4 and be none the wiser.

Why is its account locked?

No clue? I would love to know why myself since all it does is battle and say "gg".
 
Cool! Just finished watching the video; the amount of work you put into the project is astounding (as is the video's quality)! It seems I will be watching your channel often in the future. Hope you do well!
 
I'd be more than happy to help test Gen 4. If you're having a hard time finding ladder matches, my suggestion would be to gather users in advance, or to partner with other PS AI devs for a project accumulating playtesters.
 
I'd be more than happy to help test Gen 4. If you're having a hard time finding ladder matches, my suggestion would be to gather users in advance, or to partner with other PS AI devs for a project accumulating playtesters.
Thanks for the offer, but it's going to be a minute before gen 4 and below is ready for testing. I have reached out to a few people I've seen working on other AI projects, so hopefully your suggestion comes to fruition!

Also, I made a couple of mistakes in my initial video, so I made some corrections in this one.
 
I just watched the video and thoroughly enjoyed it! As a computer engineering student with an interest in AI/ML stuff, it was very exciting seeing how you tackled some of the issues I found when thinking about a Pokemon-playing AI. I'm looking forward to seeing future developments and thrilled to challenge it when it (hopefully) comes to doubles!
 
I just watched the video and thoroughly enjoyed it! As a computer engineering student with an interest in AI/ML stuff, it was very exciting seeing how you tackled some of the issues I found when thinking about a Pokemon-playing AI. I'm looking forward to seeing future developments and thrilled to challenge it when it (hopefully) comes to doubles!
As a former computer engineering student with an interest in AI/ML stuff, I'm glad you liked it! Good news is doubles should be coming sooner than anticipated because I'm no longer the only person working on the project. Our goal is to have the next big step in the project done by BDSP's release, and work on doubles should start directly after that.
 
question- it doesn't seem like it is running atm (at least, the showdown account doesn't seem to be online)
what sort of times is it usually online?
also the analysis of ou player rating distribution, while reasonable, is somewhat flawed due to people having multiple accounts (with OLT in particular completely messing everything up)
 
question- it doesn't seem like it is running atm (at least, the showdown account doesn't seem to be online)
what sort of times is it usually online?
It's not running right now, and there really is no usual time it does. It's basically whenever I add/fix something and want to see how that affects its play. It's going to be a minute before there's a reason to run it again, since there won't be any changes to the code until we've finalized how we'll work on it as a team. I might have it run just for fun though, but that's TBD.

also the analysis of ou player rating distribution, while reasonable, is somewhat flawed due to people having multiple accounts (with OLT in particular completely messing everything up)
Yeah... I knew people having multiple accounts was something I couldn't account for, but in the end I figured it wouldn't make that much of a difference since the majority of players don't. There were a little over 25,000 unique accounts represented on that graph, and I'd be hard pressed to believe that the number of people playing on the OU ladder with multiple accounts in the same month would make a significant difference in how the final distribution played out. Personally, I think it'd be kind of cool if we could have sub-accounts on Showdown for those people with multiple alts, but since that's unlikely, those accounts will make any analysis like this inherently off.
 
