
Proposal Ladder Bots and Usage-Based Tiering

veti

Torkoal recently hit 4.54% usage on the OU ladder for November. 56% of these Torkoal were run on a single team spammed by a bot on the OU ladder. Excadrill hit 6.68% usage, with 38% of that usage coming from the same bot. If trends continue, two mons that would otherwise get about 2% and 4.1% usage respectively will end up rising to OU not off of real people, but off of a bot using a gimmick triple weather team.
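
For anyone who wants to check the "about 2% and 4.1%" figures, here is a minimal sketch of the arithmetic. It assumes the bot's games are a small slice of all ladder games, so the total number of teams is treated as unchanged when the bot's share is removed; the function name is just for illustration.

```python
def usage_without_bot(total_usage_pct: float, bot_share: float) -> float:
    """Usage percentage left after removing the bot's share of a Pokemon's appearances.

    Simplifying assumption: the denominator (all teams on the ladder) is
    treated as unchanged, which slightly understates the human-only figure.
    """
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0 and 1")
    return total_usage_pct * (1.0 - bot_share)


# Figures quoted in the post above.
torkoal = usage_without_bot(4.54, 0.56)
excadrill = usage_without_bot(6.68, 0.38)

print(f"Torkoal without the bot:   {torkoal:.2f}%")    # 2.00%
print(f"Excadrill without the bot: {excadrill:.2f}%")  # 4.14%
```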

From the Tiering FAQ:
Why is usage a good metric for tiering? Why not use something like winrate?

Generally, we assume that most people who play competitive Pokemon play to win, and when people play to win, people generally use what is best. That said, usage is the most objective form of tiering for Pokemon.

The teams bots use that have extremely large impacts on usage stats are not necessarily designed to win; there's a reason real people generally don't use Tyranitar + Pelipper + Torkoal on the same team. Rather than the consensus of thousands of ladder players, a single person who doesn't even play a single game on the ladder themselves can raise multiple Pokemon to OU.

Outside of going against the spirit of usage-based tiering, bots can easily be used to cause real damage to lower tiers. Excadrill was the #1 most used mon in UU in SCL, and will rise to OU due to a bot if nothing is done. If a malicious bot is made to destabilize lower tiers, it can easily cause far more damage than the existing ladder bots.

Banning bots from ladder entirely solves this issue while also preventing the use of bots for unfair ladder scouting and improving ladder quality by letting people play against real players. However, I'm unsure how feasible it is to actually enforce this. If this is enforceable, I believe this is by far the best option.

The other solutions I see being possibilities are somehow excluding bots from usage stats or allowing either councils or tiering admins to veto rises caused by bot activity.
 
Banning bots from ladder entirely solves this issue
This is really difficult to enforce for multiple reasons. Bots are hard to detect besides the few infamous ones, and it's impossible to outright prevent the use of a bot because their actions are all client-side.

Here is a thread from 2022 when bots like this were a little less sophisticated. I'd agree they are more of an issue now, both in terms of popularity and difficulty to detect, but the core issue remains in that they are impossible to prevent.

If a bot is negatively affecting human experience, then we will remove it from the ladder via a permaban.

This was the consensus given at the end of the thread, and although "negatively affecting human experience" is very vague and probably needs an update to be more objective, it seems as though you can just request these accounts be removed.
 
Probably hard to ban bots completely, but is there a way to exclude the bots from usage stats so they won't impact tiering? Might be a bit more work, but I do agree with the sentiment that it goes against the spirit of usage-based tiering and can completely shift lower tiers.
 
I'd agree they are more of an issue now, both in terms of popularity and difficulty to detect, but the core issue remains in that they are impossible to prevent.
Frankly, and I know this may sound crazy, but how can we even use usage-based tiering if this is the case? The system imo was already riddled with flaws that make it difficult to defend outside of it being "objective" due to people supposedly always playing to win (looks at Azumarill's UU usage and the fact that Braviary rose to NU at one point). But if this system can be and is being abused by a bot to make things rise, something needs to be done other than just saying "it's how the system is." I'd rather not have people just say "It's a feature, not a bug" and move on so this happens again, because if one person can easily make things rise for kicks, then even if it is a "feature," it's a god-awful one that makes lower tiers harder to want to play.

If banning bots isn't a feasible answer and we don't want to discuss the topic of upending the whole system (which I KNOW no one wants to get into even if I hate usage-based tiering), then maybe one of the other options mentioned like "vetoing rises done as a result of bots" could be done.
 
I think bots should be banned from the ladder

My main issue with bots is competitiveness. Being able to use them to scout the ladder to see what people are using is ridiculous and is super unfair to people working honestly to test their stuff on ladder.

Alongside that, bots in no way impact QOL positively for anyone except the people using the bots themselves. Games against bots aren't nearly as fun or engaging as having to play another player, and for stuff like tournaments it just doesn't make ladder a good place to test your stuff because you don't get games vs actual players.

Stopping every bot ever from laddering is probably not feasible and I don't think every bot needs to be dealt with anyways. The most practical thing is to only act on bots that are reported without a need to be witchhunting them really.

On other topics
excluding bots from usage stats
I don't really have an issue with bots impacting usage stats if they are high enough elo to do so. Whoever is using the bot could achieve the same results if they were playing themselves.
councils or tiering admins to veto rises caused by bot activity.
This sounds like a dangerous amount of power to give people.
 
Agreed with the OP, and I'd like to add a little more background information for those who aren't familiar. The bot mentioned in the post (ProsodiJ) has been floating around the low 1700s of the OU ladder for around a month now, loading a triple weather team. The code for this bot (and the many other bots on the OU ladder) is publicly available on GitHub. As OU ladder activity slows down with the end of the generation, it'll become easier and easier for any individual to use a bot to affect usage-based tiering. This is even bigger of an issue in lower tiers, where, with far fewer ladder games, bot activity could easily make up a significant portion of the overall ladder's activity.

Outside of just tier manipulation, the open-source code for the bot allows anyone to scrape replays from ladder, which makes it riskier to test tournament brings on ladder. As bot programming continues to improve, we'll have to worry about bots being used to cheat on suspect tests, which destroys the point of suspect tests entirely.

I'm not sure exactly how to enforce a bot ban either, but I think it's more than worth the effort.
 
but is there a way to exclude the bots from usage stats so it wont impact tiering?
Not a tech admin etc so I imagine they’ll comment in depth, but when the topic of changing how we do tiering has come up previously, Marty has mentioned that editing Antar’s scripts is off the table (which to be explicitly clear is more than valid; Marty appreciation post, he does a tonne of unseen work for this site and the tiering system as a whole). I’d imagine it’s possible to remove bot data if a list of identified bot accounts exists, but even that might not be possible as the stats we get don’t have data on which accounts contributed to a mon’s usage.
 
I'm not sure what Antar's script does, so forgive me for my ignorance. If you don't mind educating me, that'd be helpful. Overall this feels like a major issue though. If someone had malicious intent (which after ten years on this site would not surprise me even a little bit) they could seriously impact tiering just for their own amusement.
 
I don't bring solutions, but I want to warn about possible other problems.
In addition to triple weather, I believe many are familiar with bots using blim 6 sample team (Kyurem, Corviknight, Ting-Lu, Dondozo, Slowking-Galar, Cinderace).
Well, Dondozo usage for combined October and November is #39, 5.163%, very close to the cutoff of 4.52%. The topic is talking about 2% and 4.1% for Torkoal and Excadrill without interference from bots; it is very plausible to imagine that, without bots, Dondozo would have 0.643% less usage and would drop to UU in the next tier shift.
It's not just about removing Pokémon from low tiers; it also involves preventing low tiers from receiving Pokémon that they would otherwise receive from human play.
Lastly, we have the problem that bots are very close to being able to obtain requirements for suspects, for example OU with its current 1750 Elo + 80% GXE.
This can not only remove effort from the player but also facilitate fraud and manipulation of results.
Well, I just wanted to show that the hole is deeper than it looks, that's all.
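
To put a number on how thin that Dondozo margin is, here is a rough sketch using only the figures quoted above. It makes the same simplifying assumption as the rest of the thread (removing bot games does not change the total number of teams), and the helper names are made up for illustration.

```python
def margin_over_cutoff(usage_pct: float, cutoff_pct: float) -> float:
    """Percentage points a Pokemon sits above (positive) or below (negative) the cutoff."""
    return usage_pct - cutoff_pct


def bot_share_needed_to_flip(usage_pct: float, cutoff_pct: float) -> float:
    """Fraction of a Pokemon's usage that must be bot-driven for its
    human-only usage to land exactly on the cutoff."""
    if usage_pct <= cutoff_pct:
        return 0.0  # already at or below the cutoff without removing anything
    return (usage_pct - cutoff_pct) / usage_pct


dondozo_margin = margin_over_cutoff(5.163, 4.52)
dondozo_share = bot_share_needed_to_flip(5.163, 4.52)

print(f"Margin over cutoff: {dondozo_margin:.3f} points")   # 0.643 points
print(f"Bot share needed to flip: {dondozo_share:.1%}")     # 12.5%
```

In other words, under these assumptions, if roughly one in eight Dondozo on the ladder came from bots, the human-only figure would already be at the cutoff.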
 
Stepping in as the person who handles these matters of moderation and tech:
Banning bots from ladder entirely solves this issue while also preventing the use of bots for unfair ladder scouting and improves ladder quality by letting people play against real players, however I'm unsure how feasible it is to actually enforce this. If this is enforceable, I believe this is by far the best option.
This is not feasible to enforce by any means. Either we leave loopholes large enough to let many bots through, thus not solving the problem, or we close them tight enough that we block out innocent users, making things worse for everyone. Banning bots by hand one at a time as they crop up will also not fix the problem, since it can only ever be done reactively, not proactively.

The other solutions I see being possibilities are somehow excluding bots from usage stats
This is also not feasible to enforce. Identifying bot accounts, as above, is an imprecise science at best and would be guaranteed to lead to innocent real users being removed from usage stats unless done by hand, which is also not feasible for reasons of US team's sanity.

-

On a personal note, speaking not as SS but as myself, and as someone who keeps up with ML with respect to Pokemon:

This is not a problem unique to bots. Other games ban bots because they have distinct advantages, for example first person shooters. Bots have no such advantage in Pokemon, and will never have such an advantage due to the way the game fundamentally works. Manipulating ladder stats by burden of mass games played is also not unique to bots - humans can and have done so before (UU Ambipom, anyone?) This is a structural weakness in usage-based tiering, and at the end of the day whatever inevitably painful solution we could come up with for bots is merely a band-aid on the real problem. I would therefore suggest for future posts in this thread: do not focus merely on bots, or on fixing individual results, but instead keep the problem as a whole in mind.
 
we dont want to discuss the topic of upending the whole system (which I KNOW no one wants to get into even if I hate usage based tiering)
I actually would like to discuss the topic of upending the whole system lol.

Also I guess I should say that even though I'm Senior Staff I am not one of the tiering people or a PS! tech admin, so don't get it twisted, this is just my opinion/post and not representative of what the entire team thinks.

---

For those unaware, some older gens (I forget which besides RBY) use Viability-based tiering.

In short, qualified voters are selected and each make their own Viability Rankings/tier list. The lists are then compiled and an official Viability Rankings gets spit out. Voters (in RBY anyway) also select a cutoff for what "the line" between OU and UU should be. This year, it was decided that everything from rank C1 and up would be considered OU, everything C2 and below would be legal in UU.
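
As a very rough illustration of the "compile the lists, then apply a cutoff" idea, here is a sketch. Everything specific in it is an assumption made for the example: the rank scale, the averaging rule, and the placeholder ballots are not the actual RBY process (the linked thread covers the real intricacies). Only the C1 cutoff comes from the description above.

```python
from statistics import mean

# Hypothetical rank scale, best to worst. The real scale may differ.
RANKS = ["S", "A1", "A2", "B1", "B2", "C1", "C2", "D"]
CUTOFF = "C1"  # from the post: C1 and up is OU, C2 and below is UU-legal


def compile_rankings(ballots):
    """Combine voter ballots (dicts of Pokemon -> rank label) into one list
    by averaging each Pokemon's rank position. Averaging is an assumption."""
    positions = {}
    for ballot in ballots:
        for mon, rank in ballot.items():
            positions.setdefault(mon, []).append(RANKS.index(rank))
    return {mon: RANKS[round(mean(idx))] for mon, idx in positions.items()}


def assign_tier(rank):
    """OU if the compiled rank is at or above the cutoff, otherwise UU-legal."""
    return "OU" if RANKS.index(rank) <= RANKS.index(CUTOFF) else "UU-legal"


# Placeholder ballots from three hypothetical voters.
ballots = [
    {"MonA": "S", "MonB": "C1", "MonC": "B2"},
    {"MonA": "A1", "MonB": "C2", "MonC": "C1"},
    {"MonA": "A1", "MonB": "C2", "MonC": "C1"},
]
compiled = compile_rankings(ballots)
tiers = {mon: assign_tier(rank) for mon, rank in compiled.items()}
print(compiled)
print(tiers)
```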

In long, here's a thread that explains the intricacies of the system by the wonderful vapicuno

This has been proven to work for old gens. I think it'd be hella difficult to work for the current gen, and I don't have all the answers, that's for sure. But more on that in a bit.

---

I've always been skeptical of usage-based tiering. There have been instances over the years where one person or a group of people have influenced tiering. Off the top of my head, Metang in RU (this is ancient but Molk and friends did this), Ambipom in UU (I think a little less ancient but I cba to search for it), and "the Hitmontop issue," which more of you are familiar with, have all made me of the mindset that usage-based tiering is flawed. And I think the system is easier than ever to abuse now due to the prevalence of bots, and just how good they've gotten.

I think there's a post coming about how banning bots or removing them from the usage stats isn't feasible (oh look, it was posted while I was typing this), so I won't really get into that. But I have heard this argument before:
I don't really have an issue with bots impacting usage stats if they are high enough elo to do so. Whoever is using the bot could achieve the same results if they were playing themselves
And I really want to disagree with this. Whoever is using the bot couldn't achieve the same results as the bot, because the bot doesn't need to eat, sleep, go to work, keep their mental game up, etc. The bot just plays constantly in raw numbers that a human could never achieve. As the OP points out, more than half of Torkoal's uses were by one bot. This is something that one person cannot achieve. Maybe a large group of malicious actors that have a lot of time on their hands could do the same thing, but surely a bot is more of a problem that we're likely to run into. And the problem becomes worse near the end of a generation when ladder activity isn't as high as when things are fresh and new.

Basically, I think it's time to look at reworking the system, because as bots (likely) become better, and (likely) become more plentiful, the system is (likely) going to get abused more and more.

---

Back to Viability-based tiering. I think that this is much less abuseable, because a number of qualified voters are selected to give their thoughts on the metagame and determine what is viable and what isn't, and the banlist is based off of that. Now, I only have personal experience with this in RBY, but it works in other old gens as well. The problem is that current gen is obviously a completely different animal. I still think Viability-based tiering is worth a shot though, as long as we can answer some questions about how we would go about the process.

1. Who gets to vote on the Viability Rankings? How do they qualify?
-I feel like it's obvious that it can't be an open community vote, because we are likely to run into bots voting, a bunch of people memeing something into a higher tier, etc. I also don't think it should just be limited to that tier's council, as that puts a lot of power into the hands of a small number of people. Somewhere in the middle seems good, but I don't know exactly what that looks like, because the generation I'm most familiar with does yearly viability rankings, which wouldn't cut it for an ever-changing current gen metagame, DLC releases, etc. Which leads to my next question:

2. How often do these votes take place?
-We do usage stats every month, so in an ideal world I guess monthly? But at the same time, I also think it's a tall ask for someone to qualify to contribute to the VR every month. But if you don't have someone qualify every month, are they knowledgeable/invested enough in the metagame to have an informed opinion? Again, I don't know. Maybe bimonthly rankings in this case? It's tough, because I'm only familiar with the more stable yearly rankings that RBY does. But I wanted to pose the question.

3. There are probably other questions to be asked that I'm not thinking of since I'm not a current-gen player, so I'll leave that to the rest of you
-Yup

---

All in all, I'm a big supporter of changing the usage-based tiering system. It's always been flawed in my opinion (really, the Metang in RU shit was happening back in like 2013 IIRC) but now the flaws are getting exposed by bots, something that didn't exist back when usage-based tiering was first adopted, or at least not in the sophisticated level that we see today. I do think Viability-based tiering could work based off my experience, and would love to hear some discussion on the questions I posed above. Or other opinions/options to consider too. Really, my main point is that I think we should seriously consider changing how we tier from usage-based to something else. Maybe that's radical, idk, but I'm serious.
 
I am not pro-usage-based tiering or pro-viability-based tiering, but if Smogon ever decides collectively to end usage-based tiering, it would not be perfectly intuitive to continue its current naming scheme with OU, UU, RU, NU, etc. The reason is that a mon that's not often used but has a strong niche against a particular archetype, such as stall in the case of Hoopa-Unbound, could have a higher viability ranking than a different mon that's used more on the ladder and in tournaments, based mainly on its strength into that particular archetype. So if Smogon were to go ahead and commit to changing the way it does tiering, it should also do a wholesale revamp of the way it names tiers too.
 
I do not agree with this: these names have been around for years, and some don't even follow the naming scheme anyways (PU is just a pun and doesn't correlate to X-used, while Ubers doesn't use it either). Overhauling the names that have been cemented in this website's history and culture, for new ones that most people likely wouldn't even use, is just a waste of effort and time, and would confuse new players.
 
I firmly agree with the aforementioned points about the unhealthy effect of bots on usage, and also wanted to raise a point about the unhealthy impact they have on ladder: given that the vast majority use the same GitHub source, it seems very plausible to reverse engineer them and gain free Elo from the vast number of games in which you can predict every move your opponent will select. That said, would it at all be feasible to require a small captcha every so often when queueing? Not sure how effective this would be, but I for one would be perfectly willing to make that sacrifice for a better ladder experience.
 
Ladder is and has always been a godawful setting for it to be the base of tiering. It is far from representing any metagame with accuracy and other than a very small percentage of games which happen in high ladder, the level of play is quite mediocre. Tournaments are a much better source for this.

There are more than enough tournaments for each tier's circuit to get a decent sample size of usage data with a higher quality of play while removing or at least diminishing issues like the one presented here or people trying to game the system in other ways.
 
Bots may not be good enough at pokemon now to say that they have a clear advantage over humans or even that they play equally as well, but that day will likely come at some point. As soon as bots can play at least as well as humans, they will have the distinct advantage of presumably never losing to timer.

I don't think this is a problem to wait around on or even a debatable question. Bots should be banned from ladder and exist only as a challengeable option. Grandmasters at chess don't run into Stockfish or Mittens or whatever on the chess.com ladder. Chess.com aggressively polices using engines for game assistance with complex algorithms. I'm not suggesting PS needs to or even could police it at the same level, but if there are known bot accounts playing so many individual games that they are driving usage stats, figure out the process to make banning those accounts from ladder happen. I'm not Marty and I'm not an expert on what data is available from PS, but in an ideal world at least it really seems not that difficult to me to run some sort of script on a weekly/biweekly/monthly/whatever basis to identify accounts that have played an exorbitant number of matches for the current gen usage tiers and to IP ban them if they continue to make new alts to ladder as a bot.
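
Purely to make the "some sort of script" idea concrete, here is a minimal sketch. The log format, thresholds, and account names are all hypothetical; nothing here reflects what data PS actually exposes. It only flags high-volume accounts for a human to look at, since volume by itself says nothing about whether an account is a bot.

```python
from collections import Counter


def flag_high_volume(game_log, min_games=500, min_share=0.01):
    """Return accounts that played at least `min_games` games AND appeared in
    at least `min_share` of all games in the period.

    game_log: iterable of (player_a, player_b) tuples, one per ladder game
    (a made-up format for this sketch).
    """
    counts = Counter()
    total_games = 0
    for player_a, player_b in game_log:
        counts[player_a] += 1
        counts[player_b] += 1
        total_games += 1
    return sorted(
        name
        for name, played in counts.items()
        if played >= min_games and played / total_games >= min_share
    )


# Synthetic month of games: one account grinds 600 games, everyone else plays a few.
log = [("grinder", f"user{i}") for i in range(600)]
log += [(f"user{i}", f"user{i + 1}") for i in range(600, 1000)]

flagged = flag_high_volume(log)
print(flagged)  # ['grinder']
```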
 
Chess.com aggressively polices using engines for game assistance with complex algorithms.
Chess.com can afford to act reactively to these problems and just remove people from ladders after the fact, whereas our system of usage stats as it stands right now cannot without in essence having to make things up after the fact. We'd have to throw out bot results entirely, which means inferring human results by virtue of exclusion, which gets thoroughly into the realm of 'torturing the data to tell us what we want it to.'
I'm not an expert on what data is available from PS, but in an ideal world at least it really seems not that difficult to me to run some sort of script on a weekly/biweekly/monthly/whatever basis to identify accounts that have played an exorbitant number of matches for the current gen usage tiers and to IP ban them if they continue to make new alts to ladder as a bot.
As the expert, it is this difficult, unfortunately. In short: just playing a lot of games doesn't necessarily make you a bot. Shared accounts exist, people ladder a lot, etc etc. PS is a massive website with more dumb and weird edge cases than you would reasonably think of, to my eternal chagrin. IP bans won't cut it either for people who are dedicated, and usually end up requiring painful and potentially wide-ranging games of whack-a-mole. So either we set looser guard rails and let bots through (defeating the point and screwing over only innocent bot owners), or we set incredibly harsh guard rails that net real humans (no).

Banning bots is ultimately not feasible in any capacity that would solve this problem, and is missing the forest for the trees.
 
This is an extremely important topic; how tiers are formed fundamentally shapes how we play the game. I'm excited for bots to be more commonplace in the Smogon world as a resource for improvement and hopefully one day metagame development and prediction training. They still seem to be in quite early infancy, yet more than capable of skewing usage stats, especially in non-OU tiers.

Some good suggestions have been shared already, but I would like to add two more options as potential levers to be pulled:

1) Add a cap to the number of games a single user can contribute to tiering. This could either be done by weighting all games after the cap at 0, or by weighting all games proportionally less according to how far the user went over the cap (i.e. if 2000 games were played by a user and the cap is 1000, each game would be weighted 50% as much). The cap could either be a defined threshold, or it could be based on a percentage of the total number of games in a given time period. I know this doesn't stop bots or malicious users from potentially spamming games across multiple accounts to achieve the same effect, but it does add another barrier to entry which could slow it down somewhat. There are also potentially other ways of dealing with multiple accounts.

2) Increase the elo threshold for usage that tiering is based on. Currently, what I know to be the main bot is in the 1600s to low 1700s, which generally qualifies for meaningful weighting on the 1695 stats, but would be weighted much lower for 1825 stats. This has a couple obvious drawbacks, including the bots perhaps improving in the future, tiering being based on a much smaller (and therefore more volatile and abusable) pool of games and players, and many midladder players now having little to no impact on tiering. However, it still is a (much, much) bigger sample size than a tour or a council/vr vote. I also wonder if the bot, even if more skilled, would end up having fewer games at higher ladder due to increased wait times. Lastly, if a bot really is able to consistently reach high elo or even peak OU ladder, I think we have a much bigger reckoning to deal with, as mentioned by others in the thread.
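
Both levers can be sketched in a few lines. The two cap variants follow the description in option 1 directly. The rating weight for option 2 is an assumption on my part: it treats a player's rating as normally distributed and weights each game by the probability that the true rating is above the cutoff, and the deviation of 50 is made up. I am not claiming this is how the current usage scripts work.

```python
import math


def hard_cap_weight(game_number: int, cap: int) -> float:
    """Option 1a: a user's games beyond the cap count for nothing.
    game_number is 1-based (the user's Nth game in the period)."""
    return 1.0 if game_number <= cap else 0.0


def proportional_weight(games_played: int, cap: int) -> float:
    """Option 1b: scale every game so the user's total weight never exceeds the cap."""
    if games_played <= cap:
        return 1.0
    return cap / games_played


def rating_weight(rating: float, deviation: float, cutoff: float) -> float:
    """Option 2 (assumed form): probability that a normally distributed
    rating with this mean and deviation is above the cutoff."""
    return 0.5 * (1.0 + math.erf((rating - cutoff) / (deviation * math.sqrt(2.0))))


# The example from the post: 2000 games against a cap of 1000.
print(proportional_weight(2000, 1000))  # 0.5

# A hypothetical 1700-rated account with deviation 50 under each cutoff.
low_cutoff = rating_weight(1700, 50, 1695)
high_cutoff = rating_weight(1700, 50, 1825)
print(f"{low_cutoff:.3f} vs {high_cutoff:.3f}")
```

Under these assumptions, the same account keeps a bit over half weight on the 1695 stats and almost none on the 1825 stats, which is the effect described in option 2.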

As others have mentioned, trying to reliably identify bots is a difficult task, and one very wealthy and well-staffed companies have struggled with. In general, I don't know if it's possible to have both an unlimited entry system (i.e. infinite new accounts can be created) and a secure system. Therefore, I think other ways of approaching the problem make more sense here.
 
1) yes, it's reactive. So? Reactive is better than nothing. If you ran something periodically, say monthly, you'd limit the ability of a bot to influence a quarterly usage shift to just usage in 1 month.
2) edge cases: work through them. there likely are not THAT many accounts that play an exorbitant number of games and you should be able to pretty easily tell that an account playing a ton of games is or is not a bot in most of these cases. you can also just, you know, ask them or flag them for further more detailed tracking in the next month if you really can't tell if it's a bot. if I'm wrong on this, please show the data that demonstrates how many of these accounts we're really worried about and why it's hard to tell if it's a bot or not. you're saying this is an intractable problem. I don't believe you. I think it would be best to proactively ban all bots period, but that's not even what I'm asking for really - reactive, periodic banning of obvious bots that play so many ladder games as to influence usage stats just doesn't seem that hard to do process wise.
 

You might not believe her, but it is true. This is also missing a rather key factor: bot account owners who want to maliciously affect things are capable of reacting to any measures we put into place. For example, trying to hide the fact that a bot is playing the games, claiming that they are the ones playing the games instead of a bot, etc. And Cassiopeia is one of about three people currently on PS capable of responding to bot accounts; this is her wheelhouse, so to speak. There's negative desire to play whack-a-mole with bot accounts that increasingly try to hide as real users.

Keep the rest of the thread on suggestions to changing the way we tier things, it is a conversation worth having and one that we are reading and considering. Suggesting that PS admins find a way to identify and ban all bot accounts is not.
 
With regards to these
Some good suggestions have been shared already, but I would like to add two more options as potential levers to be pulled:

1) Add a cap to the number of games a single user can contribute to tiering. This could either be done by weighting all games after the cap at 0, or by weighting all games proportionally less according to how far the user went over the cap (i.e. if 2000 games were played by a user and the cap is 1000, each game would be weighted 50% as much). The cap could either be a defined threshold, or it could be based on a percentage of the total number of games in a given time period. I know this doesn't stop bots or malicious users from potentially spamming games across multiple accounts to achieve the same effect, but it does add another barrier to entry which could slow it down somewhat. There are also potentially other ways of dealing with multiple accounts.
I wrote in a previous thread on usage related problems:
"there's likely some other more exotic options that could involve normalizing usage in some way by user or user+team or user+pokemon so that individual accounts that play an inordinate number of games don't unduly influence stats, etc. Any change here though has to be considered in light of the availability of the actual people who work on usage stats. Simpler solutions like changing the tiering threshold cutoffs or the weighting are generally better here, even if they're more blunt instruments."

One outcome of that thread was indeed to change how the three months in a quarter are weighted together, along with the rules around rises and drops, because those were easy to implement in that case. Those changes don't really do anything to address the current problem, though. Actual changes to the usage calculation process itself have historically been off the table (see my assumed limitation 1 in my next post). I agree that being able to do something like capping the number of games per account, or normalizing it in some way, would be good. I've just never heard that this is actually an option.
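
For illustration, the proportional version of the cap described above could look roughly like this. It is a minimal sketch and not how the actual usage scripts work; `per_game_weight`, `capped_usage`, the log format, and the toy data are all made up for the example.

```python
from collections import Counter


def per_game_weight(games_played, cap):
    """Proportional down-weighting: a user at or under the cap counts fully;
    a user over the cap has every game scaled so their total equals the cap."""
    return 1.0 if games_played <= cap else cap / games_played


def capped_usage(team_log, cap):
    """team_log: list of (user, team) pairs, one per team brought to a game.
    Returns {pokemon: usage fraction} after applying the per-user cap."""
    games_per_user = Counter(user for user, _ in team_log)
    weighted_count = Counter()
    total_weight = 0.0
    for user, team in team_log:
        weight = per_game_weight(games_per_user[user], cap)
        total_weight += weight
        for mon in set(team):
            weighted_count[mon] += weight
    return {mon: count / total_weight for mon, count in weighted_count.items()}


# Toy data (hypothetical): one account plays 2000 games with the same team,
# while 1000 other accounts play one game each, 20 of them with Torkoal.
log = [("spam_account", ["Torkoal", "Excadrill"])] * 2000
log += [(f"user{i}", ["Torkoal"] if i < 20 else ["Kingambit"]) for i in range(1000)]

uncapped = capped_usage(log, cap=10**9)  # cap so high it never applies
capped = capped_usage(log, cap=1000)     # the 2000-game account counts at 50%
print(round(uncapped["Torkoal"], 4), round(capped["Torkoal"], 4))
```

Note that weighting games past the cap at 0 and weighting every game proportionally give the account the same total weight (the cap); they only differ in which of its games end up counting.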

2) Increase the elo threshold for usage that tiering is based on. Currently, what I know to be the main bot sits in the 1600s to low 1700s, which generally qualifies for meaningful weighting on the 1695 stats, but would be weighted much lower for 1825 stats. This has a couple of obvious drawbacks, including the bots perhaps improving in the future, tiering being based on a much smaller (and therefore more volatile and abusable) pool of games and players, and many midladder players now having little to no impact on tiering. However, it is still a (much, much) bigger sample size than a tour or a council/VR vote. I also wonder if the bot, even if more skilled, would end up having fewer games at higher ladder due to increased wait times. Lastly, if a bot really is able to consistently reach high elo or even peak OU ladder, I think we have a much bigger reckoning to deal with, as mentioned by others in the thread.
Now this one is touchy. We definitely could go back to the 1760 (1825 for OU) stats. It's been a long, long time since they were used I think.
Pretty sure this is one such set of applicable threads that explains some of the history here... read these OPs:
The initial implementation of weighting by player rating was mostly done as a reaction to the fact that low ladder players play a lot of games and may not actually really be trying to win in the true competitive sense, i.e. they may want to have fun using the mons and sets they want to use first and foremost, and winning with those is a secondary goal. using weighted stats with a quite high cutoff was a reaction to that.
Problems with this...
1) most tiers have a lot fewer games than OU. OU can use much higher limits without getting wonky. I'm not sure how much the lower tiers' limits could be raised without getting into problem areas.
2) In this historical example, raising it to a 1760 basis didn't actually perform better than a 1695 basis, and the "Candles" section of the 1695 thread partially explains why. I'm not sure if this is still the case with how newer users are now rated with deviation. In any case, the easiest problem to understand is that the higher you go, the more the stats will only reflect the very top of ladder, which might not be what you want to see. Even if you want to show more of the "better" teams, the top of the ladder doesn't necessarily represent the part of the metagame that is most successful in tours. There can be some notable effects of ladder play.
3) To prove that out a bit more, it doesn't necessarily have the impact you might think/want:

For example, looking at all the non-OU pokemon on the ProsodiJ team: Torkoal, Tyranitar, Excadrill, and Pelipper, these were their stats in the 1695 vs 1825 stats for November 2025:

1695:
| Rank | Pokemon | Usage % | Raw | % | Real | % |
| 25 | Tyranitar | 8.93882% | 115092 | 6.167% | 100002 | 6.762% |
| 33 | Excadrill | 6.68341% | 80754 | 4.327% | 64052 | 4.331% |
| 43 | Torkoal | 4.54413% | 74076 | 3.969% | 67751 | 4.581% |
| 46 | Pelipper | 4.10133% | 72977 | 3.910% | 67098 | 4.537% |

1825:
| Rank | Pokemon | Usage % | Raw | % | Real | % |
| 42 | Tyranitar | 4.19042% | 115092 | 6.167% | 100002 | 6.762% |
| 48 | Excadrill | 3.55856% | 80754 | 4.327% | 64052 | 4.331% |
| 64 | Torkoal | 1.51255% | 74076 | 3.969% | 67751 | 4.581% |
| 80 | Pelipper | 0.76498% | 72977 | 3.910% | 67098 | 4.537% |

So, yes, huge impact on these mons. We go from Ttar and Drill clearly rising and Torkoal (and to some extent Pelipper) on the bubble, to only Ttar kind of on the bubble. Pelipper pretty much vanishes entirely.
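
The drop can be quantified directly from the rows above. The following is a rough sketch only: the helper names are hypothetical, and `ASSUMED_OU_CUTOFF` is an assumption (the thread never states the cutoff, and real rises and drops use a weighted multi-month average rather than a single month).

```python
STATS_1695 = """
| 25 | Tyranitar | 8.93882% | 115092 | 6.167% | 100002 | 6.762% |
| 33 | Excadrill | 6.68341% | 80754 | 4.327% | 64052 | 4.331% |
| 43 | Torkoal | 4.54413% | 74076 | 3.969% | 67751 | 4.581% |
| 46 | Pelipper | 4.10133% | 72977 | 3.910% | 67098 | 4.537% |
"""

STATS_1825 = """
| 42 | Tyranitar | 4.19042% | 115092 | 6.167% | 100002 | 6.762% |
| 48 | Excadrill | 3.55856% | 80754 | 4.327% | 64052 | 4.331% |
| 64 | Torkoal | 1.51255% | 74076 | 3.969% | 67751 | 4.581% |
| 80 | Pelipper | 0.76498% | 72977 | 3.910% | 67098 | 4.537% |
"""

# Assumption: OU usage cutoff in percent. Not stated anywhere in this thread.
ASSUMED_OU_CUTOFF = 4.52


def parse_usage(block):
    """Map each Pokemon name to its weighted usage % (third column)."""
    usage = {}
    for line in block.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        usage[cells[1]] = float(cells[2].rstrip("%"))
    return usage


def compare(low, high, cutoff=ASSUMED_OU_CUTOFF):
    """For each Pokemon, report how much usage survives at the higher cutoff
    and whether this single month sits above the assumed cutoff."""
    return {
        mon: {
            "ratio": high[mon] / low[mon],
            "above_low": low[mon] >= cutoff,
            "above_high": high[mon] >= cutoff,
        }
        for mon in low
    }


comparison = compare(parse_usage(STATS_1695), parse_usage(STATS_1825))
for mon, row in comparison.items():
    print(f"{mon}: keeps {row['ratio']:.0%} of its 1695 usage at 1825")
```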

But what about the metagame as a whole?

Besides Ttar, Excadrill, and Torkoal being pushed above the OU cutoff in these 1695 stats for November, it also pushed Walking Wake (on this team too) and Garchomp into OU. 1825 stats would have them in UU.

On the flip side, 1825 stats pull Blissey and Toxapex into OU, while the 1695 stats have them in lower tiers, with Blissey even in RU and maybe soon to drop to NU, I hear. But stall farms ladder I guess (not surprised) and does really well in the 1825 stats. Blissey is at 9.5% usage in the November 1825 stats, and Toxapex is at 7.6%. Neither is anywhere close to the cutoff. By contrast, in the 1695 stats, Blissey is at 3.6% and Toxapex is at 3.1%.

Is stall actually that good and underrated by the 1695 ladder masses or is it just a case that stall is exceptionally good on ladder? Well, looking to this SCL, Blissey was used 4.9%, Toxapex 3.1%. That's much closer to the 1695 stats than the 1825 stats and is strongly suggestive that stall may just be exceptionally good on ladder while not considered by top players to be as good of a tour bring as the 1825 stats suggest it might be.
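
As a quick check of "much closer", here is the arithmetic on the percentages quoted above. It is only a two-Pokemon illustration, and `mean_abs_gap` is a made-up helper, not an established metric.

```python
# Usage percentages quoted in this post.
ladder_1695 = {"Blissey": 3.6, "Toxapex": 3.1}
ladder_1825 = {"Blissey": 9.5, "Toxapex": 7.6}
scl = {"Blissey": 4.9, "Toxapex": 3.1}


def mean_abs_gap(ladder, tour):
    """Average absolute difference in usage % between ladder and tour stats."""
    return sum(abs(ladder[mon] - tour[mon]) for mon in tour) / len(tour)


gap_1695 = mean_abs_gap(ladder_1695, scl)
gap_1825 = mean_abs_gap(ladder_1825, scl)
print(f"1695 vs SCL: {gap_1695:.2f} points, 1825 vs SCL: {gap_1825:.2f} points")
```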

This is a Sophie's choice, no? The OP says this manipulation is bad because it'll pull Excadrill out of UU into OU, but if we swap to 1825 stats to try to mitigate the impact of this one bot, we pull Toxapex and Blissey out of UU (and lower) into OU too, and it's just not that clear to me that that's actually correct. Using 1825 stats also only works for as long as the bots remain incapable of achieving a higher rating, and that's not guaranteed to last.
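
To make that last point concrete, here is a sketch of how quickly a higher cutoff stops helping once an account's rating climbs. It assumes the weight is the probability that a player's true rating is above the cutoff, given a normally distributed rating with some deviation; the thread does not spell out the formula the usage scripts use, so treat `rating_weight` and the ratings and deviation below as hypothetical.

```python
import math


def rating_weight(rating, deviation, cutoff):
    """Assumed model: probability that the true rating exceeds the cutoff,
    with the rating treated as normal with the given standard deviation."""
    z = (rating - cutoff) / (math.sqrt(2.0) * deviation)
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical ratings for a bot that slowly improves, with a deviation of 50.
for rating in (1650, 1700, 1750, 1800, 1850):
    w_1695 = rating_weight(rating, 50, 1695)
    w_1825 = rating_weight(rating, 50, 1825)
    print(f"rating {rating}: weight {w_1695:.3f} at 1695, {w_1825:.3f} at 1825")
```

Under this assumed model, an account rated around 1700 counts for roughly half a game at the 1695 cutoff and almost nothing at 1825, but the 1825 weight rises steeply as soon as the rating approaches the cutoff.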
 
Post part 2.
Could SS/US please illuminate what you believe the actual options are then.

Assumptions:
Limitation 1) It's been said many times over the years that alterations to the usage scripts are essentially impossible.
Limitation 2) You're taking any sort of disciplinary action on bots off the table.

Are then the only two options:
1) accept that bots already may be having an outsized effect on usage tiering and that this could very possibly grow over time. (Other users have also noted anti-competitive concerns with the replay scouting and possible inability to safely test tournament teams on ladder lest you run into the bot.)
or
2) abandon usage tiering in favor of a viability based tiering

There's not really anything to "discuss" regarding option 1. That's inaction. So are you just asking people to discuss option 2?

Frankly, I'm not willing to accept viability tiering as a desirable option. The idea that a few bot accounts spamming games could result in a fundamental change to how Smogon has tiered for 15+ years does not pass the smell test to me. But I don't think just accepting the problem(s) is really ok either. There has to be another way.

But what other daylight for a solution exists if those two assumed limitations are ironclad? I'm not seeing much of any. The only solution that fits within this current assumed framework posted by anyone in this thread I think has been Stads suggesting using 1825 stats, but as you can see in my previous post, that's not necessarily a cure-all either.
 
Would it be possible just to have bot owners register the accounts via some means on Showdown/Smogon and then have those accounts not count towards usage stats? I can only speak for Playing Foul since it’s the only bot I get matched with in the 1900s-2000s, but the owner to my knowledge is a perfectly nice guy who just wants to see how strong a bot he can create rather than manipulate anything. Maybe you could have some kind of incentive, like an interface that shows more advanced stats for bot owners to work with (e.g. what Pokemon the bot struggles to beat).

Obviously it would still be possible for a hypothetical malicious actor to operate a network of evil tiering bots off the grid to ruin tiering but idk why anyone would even want to do this and it would take a fair amount of resources to do so.
 
I don't think viability based tiering is a good idea. I think it is great for what RBY has been doing with their lower tiers, but I don't think it would transition to current gen very well. If that angle was ever pursued it would be subjective no matter what, and VRs themselves differ a lot from person to person. Some Pokemon work great for certain people because they fit their playstyle, others less so. There are also some questions that I don't really have a great answer for, including:
  • What is the cutoff for a mon's rank on the VR to be whatever tier?
  • Who should be able to contribute to the VR to impact tiering? Limiting it to a small number of people makes it really easy to manipulate, which is a big deal when it controls what tier each Pokemon is in. However, opening it up to the public doesn't feel like a good idea either, so everyone who contributes to it would have to be vetted. Also, getting qualified people to do grunt work like VRs is pretty commonly a challenge. I would imagine this is especially so if you have people making one every month.
  • Lower tiers are pretty frequently unstable. It isn't feasible to have a solid VR right after huge metagame decisions like banning a mon, etc
I think mixing tournaments in to play a part in tiering is an idea that has some potential though. No matter what, I think ladder should still play some part in it, because the tournament playerbase probably isn't even the majority of the playerbase. However, I don't really think there's an issue with weighting a mon's usage in tournaments more heavily than its usage on ladder. There are actually a lot of questions here; some I think have a pretty fine answer, others I am less sure about. The main problems I have in mind are:
  • How can this be worked so that the lower tiers don't end up stale? Are there always tournaments going on for each tier? Even if there are, circuit tournaments have a lot less motion now than they used to, with team tours taking the spotlight for every lower tier instead. The number of games deciding a mon's tier would also be significantly lower, so progress would naturally feel a lot slower as a result. Alongside that, you don't want the tiers to be too volatile either. If there are any good solutions for this I am interested though
  • What tournaments should be able to count towards usage for tiering? The main team tours in each section and all of their circuit tournaments are all easy inclusions. Should tournaments like BLT or country PL's count for tiering? I think they shouldn't, but they are points of discussion
This mainly makes me question if any change is worth it in the first place though. The current way Smogon tiering works is simple and easy to understand. Should a few Pokemon rising to OU be enough to justify revamping the entire system we have? I think the answer to that question is ultimately no, but ladder tiering sucks too, so if there are any good alternatives it would be awesome
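
For reference, the tournament-plus-ladder mix mentioned above boils down to a weighted average per Pokemon. This is a minimal sketch with a hypothetical `blended_usage` helper and an arbitrary 50/50 weight; the sample numbers are the November 1695 ladder and SCL usage figures quoted earlier in the thread.

```python
def blended_usage(ladder, tour, tour_weight):
    """Weighted average of ladder and tournament usage % per Pokemon.
    tour_weight is the share given to tournaments (0.0 to 1.0)."""
    if not 0.0 <= tour_weight <= 1.0:
        raise ValueError("tour_weight must be between 0 and 1")
    mons = set(ladder) | set(tour)
    return {
        mon: tour_weight * tour.get(mon, 0.0)
        + (1.0 - tour_weight) * ladder.get(mon, 0.0)
        for mon in mons
    }


ladder_1695 = {"Blissey": 3.6, "Toxapex": 3.1}  # November 1695 ladder usage %
scl = {"Blissey": 4.9, "Toxapex": 3.1}          # SCL usage %

blended = blended_usage(ladder_1695, scl, tour_weight=0.5)  # arbitrary weight
for mon in sorted(blended):
    print(f"{mon}: {blended[mon]:.2f}%")
```

The open questions in the post (which tournaments count, and how to keep tiers from going stale or volatile with far fewer games) all feed into how `tour_weight` and the tournament sample would be chosen.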
 
Post part 2.
Could SS/US please illuminate what you believe the actual options are then.

Assumptions:
Limitation 1) It's been said many times over the years that alterations to the usage scripts are essentially impossible.
Limitation 2) You're taking any sort of disciplinary action on bots off the table.

Are then the only two options:
1) accept that bots already may be having an outsized effect on usage tiering and that this could very possibly grow over time. (Other users have also noted anti-competitive concerns with the replay scouting and possible inability to safely test tournament teams on ladder lest you run into the bot.)
or
2) abandon usage tiering in favor of a viability based tiering

There's not really anything to "discuss" regarding option 1. That's inaction. So are you just asking people to discuss option 2?

Or come up with other options that don't compromise either of the two limitations you mentioned. I'm not tech-savvy enough to say how feasible the idea is, but an example would be Soulwind's proposal to shift the usage stats we use to tournaments instead of ladder, or to mix and match the two in some way via weighting. I'm not sure if this specific example butts heads with limitation one (it very well might), but more suggestions for actual solutions are welcome. Even if there is an element behind them that would make them unfeasible, there might well still be value gained just from the suggestion and from considering how to implement any proposal.
 