
Proposal Ladder Bots and Usage-Based Tiering

Viability based tiering should be a non-starter. Viability Rankings are incredibly raw and unofficial.

They do not fit the mold of our defined metagames (OverUsed, UnderUsed, etc., not OverlyViable, UnderlyViable, etc.), they would require guidelines on what defines viability that are impossible for everyone to agree upon, and would shift the deriving of metagames from the playerbase to a small group of people.
 
I do not like Tournaments becoming the source to derive usage stats too much either. You are decreasing the sample of games dramatically while just reframing the manipulation problem rather than solving it.

If a YouTuber/public figure wanted to get a few hundred people to join OST to use a specific Pokemon (or OU SSNL since the above post mentions circuit tours and they happen more often) or a group of participants wanted to game the system, it is now much easier to flood the box. Having (tens or hundreds of) thousands of games be the base of tiering leaves you much less susceptible to manipulation than having hundreds (or maybe a couple thousand during OST months).

You are just taking the issue we have with bots and opening it up to more accessible human manipulation instead. Imagine an "Ambipom to the top" campaign invading tournaments. Tiering is best when there is a larger pool of games to derive usage from because the more games there are, the less outliers (bots, spammers, etc.) can skew the overall numbers.
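To put rough numbers on the sample-size point, here is a minimal sketch of how many extra teams a campaign would need to enter to push a Pokemon over a usage cutoff. It assumes unweighted usage (every team counts once), the 4.52% cutoff cited elsewhere in this thread, and a starting usage of 2%. The pool sizes are hypothetical, and `teams_needed` is an illustrative name, not part of any real stats script.

```python
import math

CUTOFF = 0.0452  # the 4.52% usage cutoff cited elsewhere in this thread


def teams_needed(current_usage, total_teams, cutoff=CUTOFF):
    """Smallest number of extra teams, each carrying the target Pokemon,
    needed to lift its usage share to at least `cutoff`."""
    if current_usage >= cutoff:
        return 0
    # Solve (current_usage * total_teams + x) / (total_teams + x) >= cutoff for x.
    return math.ceil(total_teams * (cutoff - current_usage) / (1 - cutoff))


# Hypothetical pools: 100,000 ladder games vs 1,000 tournament games, two teams each.
ladder = teams_needed(0.02, 200_000)
tournament = teams_needed(0.02, 2_000)
print(ladder, tournament)
```

Under these assumptions the required flood scales linearly with the pool, so a sample 100 times smaller needs roughly 100 times fewer manipulated entries.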

The easy “counter” to this is discounting the first rounds of tournaments as this can serve as a soft barrier for these campaigns, which is something we already do with forcing replays to be posted due to the amount of activity calls in this round, but then you’re halving the total sample of games to derive usage from as well. The smaller the sample gets, the more likely outliers become and the larger the margin for error (including good Pokemon being discounted) gets. It becomes a double-edged sword. Apologies for double post.
 
It feels like there's a lot going on in this thread, but I do agree with Bughouse that the limitations put in place by those in charge of PS leave us with very few possible options. If we are unwilling to prevent bots in any form, then usage based tiering is essentially dead. That's not dramatic, the underlying theory behind usage based tiering just doesn't work if it's bots playing the game.

I also agree with Finchinator that, by and large, usage based tiering has served us relatively well. There's outliers sure, but I don't think there's been anything dramatic enough to justify a shift to VR tiering or mixing tournament stats in with ladder stats.

So still, we are left in a situation where we need to do something, otherwise our current tiering methods are going to break down sooner rather than later. The worst case scenario is that this is timed with SV ending, so we end up with an entire generation of tiers frozen with bad data and no one willing to play them enough to fix them once the new generation is out.

I'd like to suggest a few things:
1. A council veto for a mon rising when it's obviously due to malicious manipulation. This is effectively the same thing as the Kyurem suspect - a coordinated effort to circumvent regular tiering processes. A single user playing enough games in a low tier to raise a mon isn't malicious, that's an outlier. A group of users coordinating to raise a single mon is malicious. A single bot playing enough games to raise multiple mons to OU is malicious. This can be handled on a case by case basis and I believe it's fairly obvious what is and isn't malicious. Vetoing something because it's subjectively "bad" shouldn't be on the cards.
2. Reactively ban the bots which are currently in place, and allow people to report them. I see no reason why we continue to allow the triple weather bot (and others) on the ladder. It's uncompetitive when used for replay scouting, and also entirely unfun to play against. Both of these will lower ladder activity and make the bot problem worse. I totally understand that PS admins are busy and this adds extra work. So my suggestion here is to also ask for more help - there's an almost unlimited amount of people on this website willing to do grunt work in return for some shiny pixels. If you need manpower to play whack-a-mole with bots, there's a huge pool of it ready to go.
3. Find a way to adjust the scripts which produce the usage stats. Currently we're in a situation where potentially the most important part of this website is a) impossible to edit and b) entirely at the mercy of a single individual. If something happened to Marty (god forbid) then what? We all massively appreciate the work Marty does, but having such a core element of Smogon unable to be improved and at risk of accidental loss is a big risk. Again, I'm positive there's a huge amount of people willing to help with this. Of course, it's more skilled work, but this is a community full of nerds, there will be more than enough software engineers willing to try to tackle the problem. I'm one of them and I'll volunteer my time to help right now. But I've got no idea where to start so I'd need to be shown what the current situation is and pointed in the right direction. If we did this, we can start to understand the feasibility of things like removing certain accounts from the stats, rather than all suggestions being a blanket no.

If we did all these things we would at the very least be able to mitigate the impact bots have. It won't be perfect, but no solution here is. Outright refusal to make incremental improvements to our systems is problematic, especially when a lot of those objections can be solved by asking for help from the community.

Abandoning years of progress and knowledge around usage based tiering due to technical limitations seems like a mistake to me. Let's instead try to fix the underlying issues so we can run the process we want, not the only possible process we're left with.
 
The naivety in this thread is staggering.

Your technical experts who live and breathe the systems you rely upon are telling you in no uncertain terms that a thing is not possible and you believe it is a matter of scale? There is no solution to bot influenced ladders that can be solved by manpower, every argument of scaled capacity is optimized and overrun by any dedicated automated attack vector.

This is not a new issue. Usage based tiering manipulation has been occurring as far back as BW RU when it was individual users singlehandedly driving the rise of underused Pokemon. No policy solution was realized then, you've been fortunate it's taken this long for the problem to rear its head.

You have the following options:

1. Status quo: Accept that usage based ladder stats are inherently subject to manipulation by automated sources and take no action.

2. Replacement: Scrap usage based tiering for other forms of metrics. Tournaments and Viability Rankings are both feasible but flawed as stated elsewhere in the thread.

3. Curate: Accept that usage based ladder stats are risk vectors for manipulation and build in structural safeguards against manipulation.

4. Innovate: Design a new system of tiering entirely.


The third option holds promise to my eyes. The suggestion of having council members approve potential tiering changes seems imperfect, but on the right track. Some form of vetting of tiering changes would be an effective way to counteract potential data manipulation, particularly in the stated case of the OP where individual Mon usage can be tied to specific accounts. Whether that discretion is expanded to cases that are less clear is up for debate, but this is far more a structurally achievable solution.
 
The naivety in this thread is staggering.

Your technical experts who live and breathe the systems you rely upon are telling you in no uncertain terms that a thing is not possible and you believe it is a matter of scale? There is no solution to bot influenced ladders that can be solved by manpower, every argument of scaled capacity is optimized and overrun by any dedicated automated attack vector.

...

3. Curate: Accept that usage based ladder stats are risk vectors for manipulation and build in structural safeguards against manipulation.

The third option holds promise to my eyes. The suggestion of having council members approve potential tiering changes seems imperfect, but on the right track. Some form of vetting of tiering changes would be an effective way to counteract potential data manipulation, particularly in the stated case of the OP where individual Mon usage can be tied to specific accounts. Whether that discretion is expanded to cases that are less clear is up for debate, but this is far more a structurally achievable solution.

No one is suggesting that by adding enough scale we can remove the problem of bots entirely. But refusing to do anything about the problem (such as reporting and fixing obvious cases like the triple weather bot) on the grounds that it doesn't 100% solve the problem also isn't the right approach to take. What I'm suggesting is exactly what you're saying here: let's Curate and add some safeguards against manipulation, like reporting and reviewing obvious bad actors.
 
1. A council veto for a mon rising when it's obviously due to malicious manipulation.
I proposed a form of this internally earlier as well, as a way to jibe with the current system while combatting bot skewed usage. It was met with some (very fair) resistance and should be discussed in depth before being implemented, but imo it’s the least messy “fix”. I agree with this being an option to consider.

Basically if a tier leader or council can quickban things already, then this would be the inverse — preventing a usage rise. If X Pokemon was going to rise to OU from UU due to excessive botting, then UU tier leadership and OU tier leadership could have a mechanism to prevent this given the right data.

I would say this is more of a “break glass in case of emergency” mechanism that should never be mistaken for a tool of convenience.



Realistically I do not view bots as an existential threat to the existence of usage based tiering, but I do view them as an unfair way to deprive lower tiers of certain tools depending on the volume and usage of the bots. UU can still exist if 1-2 Pokemon get poached to OU due to a bot, for example, but is it entirely optimal? No.

I do not think an entire overhaul is necessary to combat this when every overhaul method is flawed (see my two posts above outlining the issues with viability based tiering and pivoting to small sample tournament usage). However, we can work within our means and common sense to provide patchwork within current infrastructure.

Last thing: we should stop discussing things that are impossible on the technical end. The goal of these threads is to find solutions and better our processes, not nullify the useful life of discussions with hypotheticals and impossibilities when the technical side already addressed this.
 
Assuming the possibility of a veto to prevent undue rises, could the reverse way be considered?
For example, a Pokémon that is already in a tier like OU, with a margin close to the 4.52% cutoff and would only remain due to usage of bots, would the OU Council have the power to drop it? And would the UU Council have the power to evaluate the Pokémon and place it in UUBL?

Dondozo would be the best example currently, 0.643% above the cutoff, and with a notorious use of bots.
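As a rough check on how thin that margin is, the sketch below asks what share of all teams would have to come from bots for their removal to put a Pokemon under the cutoff. It assumes unweighted usage and that every bot team carries the Pokemon. Both are simplifications, and the function names are illustrative only.

```python
CUTOFF = 0.0452               # 4.52% cutoff from the post above
DONDOZO = CUTOFF + 0.00643    # 0.643 percentage points above the cutoff


def usage_after_removing_bots(usage, bot_share):
    """Usage share once bot teams are dropped, assuming every bot team
    carried the Pokemon. `bot_share` is the bots' share of all teams."""
    return (usage - bot_share) / (1 - bot_share)


def bot_share_to_drop(usage, cutoff=CUTOFF):
    """Bot share of all teams at which removal lands exactly on the cutoff.
    Derived from (usage - b) / (1 - b) = cutoff."""
    return (usage - cutoff) / (1 - cutoff)


threshold = bot_share_to_drop(DONDOZO)
print(f"{threshold:.4%}")
```

Under these assumptions, bots fielding a bit under 0.7% of all teams would be enough to account for the whole margin.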
 
While usage-based tiering has been in effect pretty much forever, instances like this and the Hitmontop incident show how imperfect of a system it really is. I'm not gonna advocate in favor of usage-based tiering because I don't particularly like it. Rather I'm going to express why I am against pivoting to a viability-based tiering system.

The viability rankings that we currently use do not translate well into tiering mainly because it puts so much power into the hands of very few people. This is different from our tiering councils who take action on their own in only extreme circumstances where a quickban is warranted. The rest of cases come down to a suspect test where the community gets to vote on what they think is the proper action. This system is also not perfect (much of policy review is covering the issues with them so read that if you're curious what I'm talking about) but it's the fairest system that we have. It creates a bigger sample size where personal bias is not as prevalent.

All of that is to say this: viability rankings are extremely prone to inaccurate ranking, they are tedious to create, which will become much worse when they are required to be updated monthly in a timely fashion, and the community as a whole no longer has any direct way to impact tiering. Misrankings are almost always not malicious, rather people's playstyles drastically impact how they vote. I've seen this across many tiers where I have taken part in voting on a VR. There are a couple of infamously poor VRs which aren't too harmful as they tend to be a resource for people new to the tier who just need a general framework, but when it comes to tiering which impacts the highest level of play, that becomes a much more serious issue. On top of that, having to create a new VR every month is awful. Getting one together in 3 days is a miracle with the current VR council sizes that we have. If we were to implement a viability-based tiering system then we would need to increase those VR council sizes by a large amount, making the issue of timeliness even worse.

All that being said, I should put forward a solution instead of just ripping on one side of the argument. I think the path of least resistance is to use the current usage-based tiering system while implementing a veto system. I just think we need to be very cautious on what we veto and the reasonings behind it. It's one thing to have a mon in a tier while being unviable, it happens quite a lot and that's honestly fine in my opinion. I take issue when unusual circumstances lead to an outcome where the usage-based system fails miserably.

I am open to hearing how a viability-based framework could be applied but I haven't seen one that wouldn't be a massive downgrade from the status quo which I find unacceptable when we have ways to improve upon the status quo without it.
 
I suggest implementing a hybrid model that retains the objectivity of usage based tiering, integrates the subjective experience of tier experts, and mitigates the influence of Bot data.

1. Implement a two pronged approach to handling Bots. Create a reporting structure with PS to identify bot accounts. Define a policy that allows bot usage but specifically identifies that bot usage data may be removed from the tiering data set. Create a data script that identifies extreme outlier account data sets that indicate potential bot activity for review.

2. Retain usage based tiering, but change the approach to tier movements. Identify a threshold, higher than current, by which usage based tiering applies automatically, after controlling for outliers. Identify a threshold current or otherwise, which serves as the trigger point for potential tiering action. Example range 3-5% where 3% triggers a review, 5% triggers an automatic move.

3. Determine tiering action for Mons in the Review range based on subjective criteria including council expertise, tournament results, and consensus viability. Limit the scope of influence to borderline cases which may or may not be reflective of the competitive state of the tier. This mechanism can be existing tier councils or a new administrative structure.
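The review band in the proposal above can be sketched as a simple classifier. The 3% and 5% thresholds are the example range from the post. The `Action` enum and the `tiering_action` function are hypothetical names used for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    REVIEW = "council review"
    AUTO = "automatic move"


def tiering_action(usage, review_at=0.03, auto_at=0.05):
    """Classify a Pokemon's (outlier-controlled) usage share.

    At or above `auto_at` the move happens automatically; between
    `review_at` and `auto_at` the case goes to review; below that,
    nothing happens.
    """
    if usage >= auto_at:
        return Action.AUTO
    if usage >= review_at:
        return Action.REVIEW
    return Action.NONE


for share in (0.02, 0.04, 0.06):
    print(share, tiering_action(share).value)
```

Only the middle band involves any subjective judgment, which is what limits the council's influence to borderline cases.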


In principle, this approach should intertwine the value of usage based tiering in reflecting the state of the ladder and synthesizing large data set analysis with the value-add of merit-based opinion with controls in place to limit the influence of non-user introduced data.
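For the "data script that identifies extreme outlier account data sets" in point 1, one possible heuristic is to flag accounts whose game count dwarfs the typical account and hand them to a human for review. This is a sketch only: the multiplier, the floor, and the account names are made-up assumptions, and nothing here reflects how the real usage scripts work.

```python
from statistics import median


def flag_outlier_accounts(games_by_account, multiplier=20, min_games=200):
    """Return accounts whose game count exceeds both `min_games` and
    `multiplier` times the median account's count, sorted by name.

    Flagged accounts are candidates for manual review, not proof of botting.
    """
    if not games_by_account:
        return []
    typical = median(games_by_account.values())
    limit = max(min_games, multiplier * typical)
    return sorted(name for name, games in games_by_account.items() if games > limit)


# Hypothetical monthly game counts.
sample = {"alice": 30, "bob": 45, "carol": 60, "dave": 25, "weatherbot": 4000}
print(flag_outlier_accounts(sample))
```

A prolific human player could trip a rule like this too, which is why the proposal routes flagged accounts to review instead of removing them automatically.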

There is no benefit to relitigating tiering as a whole, nor to arguing over the minutiae of evidence backed performance if tournaments or viability were taken wholesale. Included as part of a package of controls, however, they can provide added benefit over and above the mitigation of outlier data sets.
 
Viability based tiering should be a non-starter. Viability Rankings are incredibly raw and unofficial.

They do not fit the mold of our defined metagames (OverUsed, UnderUsed, etc., not OverlyViable, UnderlyViable, etc.), they would require guidelines on what defines viability that are impossible for everyone to agree upon, and would shift the deriving of metagames from the playerbase to a small group of people.

To one of your minor points, if we would switch to viability-based tiering, we can still absolutely keep the OverUsed, UnderUsed, etc. names, even if we're not going off usage anymore. The names are historically entrenched in Smogon's identity and changing them would be purely cosmetic, unlike changing the tiering system, which is looking to actually fix issues with the system.

---

To the more major point, while Viability Rankings may seem "raw and unofficial," the beauty of the method that I mentioned earlier is that you get a pretty objective consensus among the voters, and a good picture of what is viable, even if defining viable is hard. There doesn't need to be definite definitions of viability that everyone agrees on. You just rank the Pokemon approximately in order, and then the script compiles the rankings and spits out groups of Pokemon that are similar to each other. These ranks are then defined as S, A+, B-, whatever, to fit what people are used to. In RBY, we use stuff like "S1," "B2," "C3," etc. instead of +, neutral, - ranks, because of a possibility of there being like, B1, B2, B3, B4 or something based off what the program spits out. But it's still easy to follow in my opinion.

To quote vapicuno:

"Q: This is a pain. Do I really need to rank every Pokemon precisely?
A:
No, the beauty of the method is that you don't need to precisely rank things you don't know! Just rank approximately. If you can't properly rank C-tier mons, but know where the cutoff is from D-tier mons, and everyone else kind of agrees with you where the cutoff is, the tiers will be formed correctly regardless of how you ranked it."

This means that if you have a selection of voters that are ranking, say, 100 Pokemon each, your voters do not need to stress about whether Great Tusk is specifically one spot "more viable" than Kingambit. As long as they are in the same general area (say, well above Pokemon like Chansey and Quagsire), and there's a general community consensus about that, then Great Tusk and Kingambit will be ranked in the same group. The data is "cleaned" to account for outliers too, so if one or two of your voters think that Kingambit is overrated and actually sucks, it's not going to affect the rankings all that much, especially if you have a decent number of rankers.
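To illustrate why approximate ballots and a stray outlier need not change the groupings, here is a toy aggregation. It is not vapicuno's script and does not claim to match its method: it takes the median rank per Pokemon (which a single extreme ballot barely moves) and starts a new group wherever consecutive medians are further apart than a hypothetical `gap`.

```python
from statistics import median


def consensus_groups(ballots, gap=2.0):
    """Group Pokemon by consensus rank.

    `ballots` is a list of dicts mapping Pokemon name -> rank (1 is best);
    every ballot is assumed to rank the same names.
    """
    if not ballots:
        return []
    names = list(ballots[0])
    med = {n: median(b[n] for b in ballots) for n in names}
    ordered = sorted(names, key=lambda n: (med[n], n))
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if med[cur] - med[prev] > gap:
            groups.append([cur])      # large jump in consensus: new group
        else:
            groups[-1].append(cur)    # close enough: same group
    return groups


# Made-up ballots; the last voter thinks Kingambit is overrated.
ballots = [
    {"Great Tusk": 1, "Kingambit": 2, "Chansey": 9, "Quagsire": 10},
    {"Great Tusk": 2, "Kingambit": 1, "Chansey": 10, "Quagsire": 9},
    {"Great Tusk": 1, "Kingambit": 2, "Chansey": 10, "Quagsire": 9},
    {"Great Tusk": 1, "Kingambit": 9, "Chansey": 8, "Quagsire": 10},
]
print(consensus_groups(ballots))
```

In this example the voters disagree on the exact order of Great Tusk and Kingambit, and one ballot is far off, yet the two still land in the same group, well apart from Chansey and Quagsire.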

About individual ranks within groups: it is ordered within the group at the end, so the results will tell you community consensus about which one specifically is "more viable" I guess, but at the end of the day every Pokemon in the group should be "relatively indistinguishable" from another Pokemon in the group. The individual ranks are just used to calculate rises and drops from the previous VR. I see that the current OU VR is not ordered within ranks, it's ordered alphabetically. If you really want to, I suppose that you could just remove the numbers and still keep it like that, since Great Tusk and Kingambit should be "relatively indistinguishable" from each other within their group, so it doesn't necessarily matter which is ahead of the other. But if you wanted to be more specific like RBY and say that yes, Exeggutor is one spot "more viable" than Starmie despite being in the same group, you're free to do that as well.

---

I'm not arguing that viability-based tiering is without its flaws. As you say, it does change who holds "the power" from the whole playerbase to much fewer people, ones that are "qualified" to be a part of the VR process. But then again, the whole purpose of this thread is that one person/bot (or a group of people/bots) is disproportionately influencing tiering anyway at the moment. It seems like a damned if you do, damned if you don't situation in that someone or some people are going to have more influence than others over the tiering process. In my opinion, I would actually rather have it be people that know and are passionate about the tier rather than a literal brainless bot. This is probably going to get called elitist or oligarchical, especially by people on the outside looking in, but that's how I feel.

As I said before though, I don't have all the answers. I don't know how to determine who would get to vote, or how often (because as stated before, monthly seems like a hell of a chore). But I did want to give a defense against the naming conventions argument, as well as the argument that defining viability is impossible. I understand the reticence towards overhauling the current system, but there's clearly stuff about it that isn't working as intended and is making people unhappy, so I'm just making a potential suggestion based off something that works well for older gens.

Regardless, I feel like if the bot issue really is unresolvable according to PS! tech people, something should change about the system. And if that means a complete overhaul, so be it.
 
I think there are larger problems with bots than just usage-based tiering that I haven’t heard much about.

Because of the sheer number of games 1 bot can accumulate, scouting becomes incredibly easy for anyone behind the bot, and it bypasses the “don’t allow spectators” that dealt with the bots behind the Double Kick Terrak incident. Testing teams on the ladders for tournaments is an incredible risk, which limits the amount of prep players have.

Then there are suspect tests. The bots are not only playing inhuman amounts of games a day, but have gotten good enough to reach high ELO. Anyone looking to get reqs who doesn’t wanna engage on the ladder can just have a bot get the reqs for them. This tarnishes the competitive nature of ladder and suspect tests in general. If we wanna dodge another Kyurem suspect situation, this is something we must address.

Usage-based tiering has improved over time even if it’s not perfect (Braviary in NU has more usage than good mons like Wo-Chien because of one player). I would rather address the real root of the problem than overhaul an entire system just to fix it. An overhaul wouldn’t fix the other issues ladder bots present.
 
To one of your minor points, if we would switch to viability-based tiering, we can still absolutely keep the OverUsed, UnderUsed, etc. names, even if we're not going off usage anymore. The names are historically entrenched in Smogon's identity and changing them would be purely cosmetic, unlike changing the tiering system, which is looking to actually fix issues with the system.
You cannot claim the names are “historically entrenched” in Smogon’s identity without also accepting that usage based tiering is also “historically entrenched” in Smogon’s identity. The tiering system is what prompted the names. This is drifting away from the larger points, but if we changed the system, we would reframe the metagames to reflect what they represent.
In RBY, we use stuff like "S1," "B2," "C3," etc. instead of +, neutral, - ranks, because of a possibility of there being like, B1, B2, B3, B4 or something based off what the program spits out. But it's still easy to follow in my opinion.

To quote vapicuno:

"Q: This is a pain. Do I really need to rank every Pokemon precisely?
A:
No, the beauty of the method is that you don't need to precisely rank things you don't know! Just rank approximately. If you can't properly rank C-tier mons, but know where the cutoff is from D-tier mons, and everyone else kind of agrees with you where the cutoff is, the tiers will be formed correctly regardless of how you ranked it."
This application works a lot easier in largely proven metagames with over a decade of set-in-stone history.

I frequent CG OU, voted in every CG OU VR slate besides one for the last ten years, and disagree with ultimate rankings every single time. I respect the other OU VR council members, but sometimes there is a huge range of opinions and sometimes there are variable levels of comprehension, experience, and justification. You are asking for an awful lot out of this imperfect and limited process.

This doesn’t even get into the biggest issue: tiering going from being sourced from the entire playerbase participating in our metagames to a small group of people handpicking outcomes with their votes. Even if you expand VR councils, which will be hard to do while maintaining quality, you’re going to have less than 1% of the prior contributors to usage tallies making the full decisions on where things go. This screams non-starter to me and that is coming from me…I am an insider who theoretically would gain a ton of influence over a shift like this. It does not feel representative at all.

Finally…You use Kingambit and Great Tusk as examples in your post. They will always be OU no matter what system you use, but there are less blatantly obvious Pokemon that can be thrown into disarray. We have shifts that some people experience during SPL or on ladder in January or February, but by the time others really experience them, it could already be March or April. There have been Pokemon that get ranked A by half the people and C+ or B- by others. This is commonplace — it pops up in many slates even later in generations. It isn’t a matter of right or wrong either, but rather one of trying to fit a square object into a circular hole when using viability to dictate tiering placements.
 
You cannot claim the names are “historically entrenched” in Smogon’s identity without also accepting that usage based tiering is also “historically entrenched” in Smogon’s identity.
I feel like this is incredibly pointless to argue for because PU doesn't even follow this, but I digress.

I really think just keeping the status quo is the absolute worst option to take as it just accepts that bots can influence tiers and renders low tiers to being subject to the whims of people with bots rather than the actual playerbase. Plenty of fixes have been suggested and while I do not agree with a lot of them I feel they are all vast improvements over the current system. However, I will discuss some that have popped out to me.

1) Letting a LT Council Veto Rises That are a Result of Bots
This one is pretty self explanatory, but would require there to be some sort of way to tell which account is a bot and which isn't. I do generally like the idea with this one but it'd require factors that may not be feasible to attain. If anyone has further comments that could explain how this would be done I would be enthusiastic to read them. For the record, I don't think this "giving power to a small group of people" is a real argument given how the current system is letting one person mess with things.

2) A mix of VR and Usage Based Tiering
I suggest implementing a hybrid model that retains the objectivity of usage based tiering, integrates the subjective experience of tier experts, and mitigates the influence of Bot data.

1. Implement a two pronged approach to handling Bots. Create a reporting structure with PS to identify bot accounts. Define a policy that allows bot usage but specifically identifies that bot usage data may be removed from the tiering data set. Create a data script that identifies extreme outlier account data sets that indicate potential bot activity for review.

2. Retain usage based tiering, but change the approach to tier movements. Identify a threshold, higher than current, by which usage based tiering applies automatically, after controlling for outliers. Identify a threshold current or otherwise, which serves as the trigger point for potential tiering action. Example range 3-5% where 3% triggers a review, 5% triggers an automatic move.

3. Determine tiering action for Mons in the Review range based on subjective criteria including council expertise, tournament results, and consensus viability. Limit the scope of influence to borderline cases which may or may not be reflective of the competitive state of the tier. This mechanism can be existing tier councils or a new administrative structure.


In principle, this approach should intertwine the value of usage based tiering in reflecting the state of the ladder and synthesizing large data set analysis with the value-add of merit-based opinion with controls in place to limit the influence of non-user introduced data.

There is no benefit to relitigating tiering as a whole, nor to arguing over the minutiae of evidence backed performance if tournaments or viability were taken wholesale. Included as part of a package of controls, however, they can provide added benefit over and above the mitigation of outlier data sets.
This post kinda interested me as the potential for mixing both systems could allow us to maintain usage based tiering while also being able to stop mons from getting stuck in tiers they absolutely do not belong in. The method of implementing this has many potential interpretations, but I think it's the best option no matter how it's interpreted.

To give a different interpretation, use the VR to determine benchmarks for rising. To give an example,
S to A-: Same usage benchmark
B+ to B-: Higher benchmark
C+ to C-: Even higher
D/Unranked: In my ideal world this just wouldn't be allowed to happen.

This may be a bit screwy and would make the system favor drops far more than rises but, to be honest, this feels like it would make it easier for lower tiers to function rather than constantly ripping each other apart off the whims of the low-activity ladder. It would also somewhat help to fend off the bot problem by making it more difficult for them to cause rises. This system would most likely need to be paired with one of the other solutions mentioned (allow for reporting bots, somehow find a way to remove bots from usage, etc), but I think combining Usage and VR for tiering with these other solutions would allow for the influence of bots to be mitigated quite a lot. Obviously, the rankings and benchmarks associated with them are loose and up to interpretation, but I feel more ideas for VR + Usage Tiering are nice to have, and again I think keeping the system how it is would be a ridiculous decision given how many issues it already has combined with the bot problem.
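A minimal sketch of the VR-scaled benchmarks described above. The post gives only the ordering (same, higher, even higher, not allowed), so the multipliers below are invented placeholders, and the base cutoff is the 4.52% figure cited earlier in the thread.

```python
BASE_CUTOFF = 0.0452  # 4.52% cutoff cited earlier in the thread

# Hypothetical multipliers per VR letter; D and unranked are deliberately absent.
MULTIPLIER = {"S": 1.0, "A": 1.0, "B": 1.5, "C": 2.0}


def rise_benchmark(vr_rank):
    """Usage share required to rise, or None if a rise is not allowed.

    Only the letter matters here, so "B+" and "B-" share a benchmark.
    """
    letter = vr_rank.strip().upper()[:1]
    factor = MULTIPLIER.get(letter)
    return None if factor is None else BASE_CUTOFF * factor


def may_rise(usage, vr_rank):
    """True if `usage` meets the benchmark for the Pokemon's VR rank."""
    benchmark = rise_benchmark(vr_rank)
    return benchmark is not None and usage >= benchmark


print(may_rise(0.05, "A-"), may_rise(0.05, "B+"), may_rise(0.20, "Unranked"))
```

With placeholders like these, a bot has to generate noticeably more usage to lift a poorly ranked Pokemon, and a D or unranked one cannot rise at all.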
 
Re: Preventing bots from affecting usage stats

As previously stated in this thread, if people want to run bots that play ladder games, they will. If we don't want bot games to be counted in stats from ladder games, the only viable way to do this (IMO) is to let them play in a way that won't affect the stats.

The simplest way to do this would be to ask people to register their bots as bots via some form somewhere, combined with manually marking those that weren't registered as bots but clearly are, and exclude these from usage stats.
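As a rough illustration of what excluding registered or manually marked bots could look like on the stats side, the sketch below tallies usage while skipping a set of account names. The data shape, the account names, and `usage_shares` are all assumptions made for the example. The real usage scripts may weight teams and are not shown in this thread.

```python
from collections import Counter


def usage_shares(teams, excluded_accounts=frozenset()):
    """Compute each Pokemon's share of counted teams.

    `teams` is an iterable of (account, roster) pairs. Teams belonging to
    `excluded_accounts` still get to play but are left out of the tally.
    """
    counts = Counter()
    counted = 0
    for account, roster in teams:
        if account in excluded_accounts:
            continue
        counted += 1
        counts.update(set(roster))  # count each Pokemon once per team
    if counted == 0:
        return {}
    return {mon: n / counted for mon, n in counts.items()}


# Made-up teams; "bot1" stands in for a registered bot account.
teams = [
    ("alice", ["Great Tusk", "Kingambit"]),
    ("bob", ["Kingambit", "Dondozo"]),
    ("bot1", ["Dondozo"]),
    ("bot1", ["Dondozo"]),
]
print(usage_shares(teams)["Dondozo"])
print(usage_shares(teams, excluded_accounts={"bot1"})["Dondozo"])
```

The bots keep playing either way. The only change is whether their games are counted, which matches the registration idea above.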

I think the optimal solution here would be what lichess.org has implemented, where anyone can play against any bot that a community member has set up (https://lichess.org/player/bots) - assuming that this would be enough to sway bot writers from putting them on the ladder. But this would also undoubtedly be a large amount of work for PS! devs, so I think it's unlikely to be the implemented solution to this problem.

Re: Tiering

During my (admittedly very outdated and indirect) experience as a member of the Monotype council during ORAS, tiering was a matter of which mons were too strong for our metagame, free from the constraints of which metas above ours were using which pokemon.

I've always enjoyed seeing how a mon viable in one meta sees distinct sets in weaker metas, such as Quagsire in ORAS OU through NU. I certainly understand why we tier based on usage, but it still seems like it would only detract from the lower metas if some big threat in OU led to the mon getting enough usage there to prevent its usage in lower tiers.

All this to say: Usage based tiering has its benefits, but my personal preference would be for tiering based on what would be too strong for the tier below. I'm aware that this would both solve and accentuate the problems of labels like "UUBL": pokemon clearly OU in both usage and viability might simultaneously be usable in UU, while pokemon not viable in OU might be considered OU. This would require major refactoring of how we label pokemon. But I think it gives the greatest ability for tiering councils, alongside suspect test voters, to create the most playable and enjoyable metas.
 
The bot problem extends beyond tiering - people use them to scrape replays to do scouting for tournaments beyond a level that is considered reasonable by most of our community, and some bots are basically just climbing ladder by taking forever every turn until opponents quit.

I get that we can’t implement a bot filter, but why not simply make it against the rules for users to use these bots? If a user is found to be using bots to flood the ladder, have some form of punishment for them. I think this would help to at least discourage things. We don’t need to play whack-a-mole against every ladder bot ever, but if you read this thread or any discussion about bots anywhere, it’s clear it’s just a few annoying people causing this problem.

Regarding tiering, there’s no perfect solution, but VR based tiering is nonsense. I think keeping the status quo but giving tier leaders an emergency option to block a rise is reasonable as long as it’s only done for super extreme circumstances.
 
I get that we can’t implement a bot filter, but why not simply make it against the rules for users to use these bots? If a user is found to be using bots to flood the ladder, have some form of punishment for them. I think this would help to at least discourage things. We don’t need to play whack-a-mole against every ladder bot ever, but if you read this thread or any discussion about bots anywhere, it’s clear it’s just a few annoying people causing this problem.
I was wanting to make a post and had thoughts similar to this because I agree it's a fairly good, low-stakes first step. People will naturally be turned away from using bots if we just ban them from the site entirely, so there will be fewer of them to deal with, and some of the very obvious ones that people are well aware of, such as the ones that have already been brought up a lot, can be manually removed. I understand that it's a bigger task than it may seem for people to properly identify bots and have them verified as bots, so it likely isn't realistic to actually ban every single bot, but I think having a rule in place can go a long way and can lower the required work. The bots' impact on tiering is definitely not ideal because in theory, especially with this thread existing and the problem having more eyes on it, someone could be running 100 scripts right now to change every tiering placement just to be malicious. Bots are also just annoying to play against. I don't ladder usage-based lower tiers too often, so I don't see their impact firsthand, but the last time I tried playing AG ladder a month or two ago I queued into the same bot in 3 of 5 games in a row since it was late hours with less activity, and I legit would rather just wait way longer for a human opponent, since the bot took the exact same line in every game and I had to do the exact same sequence of clicking for like 15 minutes total across 3 games. I don't think a single positive thing has been said about ladder bots in this thread or in any discussion on Discord, so I think literally anything that could lower their numbers or dissuade people from making them is a good step, and there's no real harm in just making a rule.

As for the actual problem of tier shifts, I haven't looked at the specific code, nor am I a good enough programmer to help out with any changes that might be made, so I'm not entirely sure what can be done compared to what would fix the issue. I think weighting a player's game count is the simplest way to go about it conceptually, though maybe not coding-wise. Not only would bots spamming games 24/7 for 3 months have less of an impact, but tiering also couldn't be changed as much by one person just getting 5000 games with the same team of their favourite Pokemon. If someone plays over a certain number of games, probably tier-dependent based on what would have a significant impact on tiering, then how much their games count for usage could drop. If the infrastructure isn't there for that to work with our current usage program setup, I think it might also be valid to change the cutoffs again. The usage cutoff to rise a tier could be increased, but the 3-month drop cutoff doesn't have to follow it. That way, if someone spams a mon, it'll be harder to get it to a new tier (the Torkoal example from the weather bot barely makes it with current cutoffs), but it wouldn't punish genuinely viable mons in a tier that get like 4.7% usage just because we raised the cutoff too high. The quickdrop cutoffs are already lower than the 3-month rise and drop numbers, so I don't think having them be different is that bad.
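
For anyone who wants to see the two ideas above in concrete terms, here is a rough Python sketch. None of this reflects the actual usage program: the function names, the input shape, the threshold, the reduced weight, and the cutoff values in the example are all made-up assumptions.

```python
from collections import defaultdict


def effective_games(n_games, threshold, reduced_weight):
    """Games up to `threshold` count fully; each game beyond it counts
    for only `reduced_weight` (a value between 0 and 1)."""
    full = min(n_games, threshold)
    extra = max(0, n_games - threshold)
    return full + reduced_weight * extra


def weighted_usage(per_player, threshold, reduced_weight):
    """per_player maps player -> (total_games, {pokemon: games_with_it}).

    Each player's games are scaled down uniformly so that the player
    contributes `effective_games(...)` games in total.
    """
    weighted_counts = defaultdict(float)
    weighted_total = 0.0
    for total_games, mon_games in per_player.values():
        if total_games == 0:
            continue
        scale = effective_games(total_games, threshold, reduced_weight) / total_games
        weighted_total += total_games * scale
        for mon, n in mon_games.items():
            weighted_counts[mon] += n * scale
    if weighted_total == 0:
        return {}
    return {mon: c / weighted_total for mon, c in weighted_counts.items()}


def in_tier_after_shift(currently_in_tier, usage, rise_cutoff, drop_cutoff):
    """Separate cutoffs: a Pokemon outside the tier needs `rise_cutoff`
    to rise, while one already in the tier stays unless its usage falls
    below the (lower) `drop_cutoff`."""
    if currently_in_tier:
        return usage >= drop_cutoff
    return usage >= rise_cutoff


# Hypothetical example: one account spams 5000 Torkoal games,
# one regular player plays 1000 games without it.
per_player = {
    "spammer": (5000, {"Torkoal": 5000}),
    "regular": (1000, {"Garchomp": 1000}),
}
usage = weighted_usage(per_player, threshold=1000, reduced_weight=0.0)
```

With a threshold of 1000 and a reduced weight of 0, the spammer counts for the same 1000 games as the regular player instead of five times as many. For the cutoffs, a mon at 4.7% usage with a hypothetical 6% rise cutoff and 4.5% drop cutoff would stay in its tier if already there but would not rise into it.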

I think VR based tiering shouldn't really be a discussion, but I do think there's some merit in putting tournament usage into the formula, just weighted slightly higher, since obviously like 500 tournament games from a circuit or whatever is nowhere close to the 100k or however many OU gets these days. It's probably not ideal, since in tournaments you're building for an opponent rather than a ladder setting, but teams also tend to be more optimized, so they give a better idea of what's good generally. It's probably the lowest priority option, I'd say, since I think there are issues with deeper roots than just us needing to add more high quality games. Given the current sentiment, something has to change with either the bot situation where feasible or with the tiering system.
 
With regards to Bots,

Banning bots is an easy win for Smogon and PS! communities, positively affecting everything from casual laddering to top-end tournament play. I don't think there are any positives to allowing bots that can mass-queue ladder games for scouting or other nefarious reasons like the subject usage rates impacting tiering.

--
With regards to Tiering,

The below is taken directly from the Tiering Policy Framework:

"We cater to both ladder players (the higher end of the ladder) and tournament players.
Tiering actions should not be taken based solely on one group’s experience (ladder or tournament) unless the issue is egregious and clear-cut."

I think it's worth noting that to qualify for suspect reqs you have to meet a minimum Elo/GXE/games played threshold established by the respective tier's council/leadership, but to affect tier shifts, which are arguably far more impactful, you just have to play a lot of games, manually or with a bot, doesn't matter which.

Regardless of what is done / technically feasible, something clearly ought to change with tier shifts given the likely rise in bots that will be impossible to identify with 100% certainty. I personally don't hate the idea of having councils more involved in approving/vetoing potential tier shifts but council members are also pretty bad sometimes which is why you get stuff like RE: Council Member Expectations & Accountability and general disdain from communities at how their respective councils just aren't in tune with the player base. Probably best to circle back to that at some point and see if both of these can be addressed in tandem.........
 
I run the SV OU VR, and I definitely don't think it's a good idea to use it as the tiering basis, even with a hybrid of tiering + VR. There's a lot of room for bias and just general mistakes. For example last slate we just forgot that Porygon-2 is being used in OU on TR and nobody nominated it from UR to ranked. This is not an issue of 'lack of knowledge' or a 'bad VR team' either, as this happened after we revamped the team, adding top-level players like separation 3d and several others. The point is these errors will happen regardless of the level of skill a team has because it's a small number of people. To catch these errors 'every time' and ensure an at least decent ranking, you'd need like 30+ people minimum per voting slate, and in addition the people on said VR council would have to be held to a higher bar due to it affecting the entire tier. Finding 30 top-level players (and this is like the minimum) to vote on 100+ Pokemon, and to do this routinely at that, is unrealistic, and that's ignoring the fact that 30 or even 40 is still a small sample size in the grand scheme of the website, on top of biases and malicious intent having a bigger influence with this smaller sample size. There's also the question of how replicable VR tiering is. OU could do it, barely, but what about smaller communities? Does NU or ZU have the resources to give VR's accurate enough for tiering? This is not a knock on smaller communities but rather a logistics issue: are there 30-40 month-to-month active top-level players in said tiers that would give the time to rank 100+ Pokemon? I do not know, but if hypothetically OU could barely manage that month to month, I can only imagine it would be more difficult to give the same level of accuracy with a smaller pool of resources. This is a problem because tiering is supposed to be uniform. With how usage works, even in less-played tiers things can be weighted accordingly, but with direct human involvement this can't exactly be curtailed.
Again, since this is tiering there also cannot be delays, and as we know these huge projects and resources and even tournaments can suffer from delays. It'd be a bad look if tiering is delayed a few days or a week because of management problems with the resource, and this is more likely than you'd think if we are going to put much, much more effort towards it for tiering. I don't think using the VR is a viable option for tiering; the best course of action is council vetoes unless someone comes up with a better idea.
 
Would it be possible just to have bot owners register the accounts via some means on Showdown/Smogon and then have those accounts not count towards usage stats? I can only speak for Playing Foul since it’s the only bot I get matched with in the 1900s-2000s, but the owner to my knowledge is a perfectly nice guy who just wants to see how strong a bot he can create rather than manipulate anything. Maybe you could have some kind of incentive, like an interface that shows more advanced stats for bot owners to work with (e.g. what pokemon the bot struggles to beat).

Obviously it would still be possible for a hypothetical malicious actor to operate a network of evil tiering bots off the grid to ruin tiering but idk why anyone would even want to do this and it would take a fair amount of resources to do so.
Seconding this. While the worries brought up in the thread make sense, I don't think we have much evidence of malicious actors trying to manipulate usage stats using bots on any large scale. While the possibility is obviously there, it's kind of always been there, only changing recently due to how much better the bots have gotten. Most of the people I've seen or talked to who make and use bots do it mostly just for the science of it. I'd love for them to give their input in this thread, and there's no reason to believe some agreement can't be found.

There is the more worrying possibility of bots being good enough to get suspect test reqs. I have actually seen evidence of this being possible, but it is for another thread I suppose.

As for unfair ladder scouting... ladder games are not really private. You can't ladder expecting the games you play to stay a secret. There's some nuance to this of course, and it's understandable that people are annoyed by this, but I don't think it is a huge problem either.
 
Politely asking bot owners to register their bot is a band-aid solution to a wound that needs stitches. What if a bot owner is largely off-site and isn't aware of the option to register their bot on smogon dot com? How will we even recognize lesser-known bot owners in the first place? Tech admins and whatnot have repeatedly already brought up that catching bots in a consistent and healthy manner is not possible. A suggestion was floated earlier of tapping into the wealth of smogon users who are all too willing to throw themselves into the grinder of unpaid labor, in order to populate a workforce capable of manually reviewing and handling bot reportings, but this suggestion itself comes with its own labor-intensive startup costs of taking that pool of badgehunters (who let's face it already skew unreliably), and vetting them to see who'd be both fit for PS! global staff positions, as well as those who could be further trusted to thoroughly do their job to analyze individual cases to make sure that it is indeed a bot they're looking at and not the many edge cases that pop up as a result of having a supermassive playerbase. My guess is that the number of fit individuals post-vetting would be quite small.

In regards to a bot report system, perhaps it might be good to establish that it is possible to report cases to PS! admins or whatnot to get them booted, but I'm not a fan of the idea of saving a single replay, saying "hey this guy is a bot" and expecting the admin to do the rest. In order to not overload those people with work, reports should probably be multiple instances compiled, in addition to probably some analysis done on the part of the ladderer to demonstrate why it's reasonable suspicion.

In regards to the likelihood of someone actually devoting the time and resources to maliciously creating a bot that would disrupt usage stats, it's not a problem of if but when. If a structural weakness remains, you'll eventually have someone with enough unhealthy obsession with this game that they'd go and do it. Better to address the issue now than hastily scramble to get a solution only after someone has abused the system. Furthermore, the far more likely and equally if not arguably more problematic outcome is that you have bots being used to cheat suspect tests. Hardstuck 1300s LadderScrub69 thinks that Mega Chunguzoid is/n't broken, so they hand a bot off to all their 1100s friends and unfairly sway the results of the vote in one way or another.
 
A suggestion was floated earlier of tapping into the wealth of smogon users who are all too willing to throw themselves into the grinder of unpaid labor, in order to populate a workforce capable of manually reviewing and handling bot reportings, but this suggestion itself comes with its own labor-intensive startup costs of taking that pool of badgehunters (who let's face it already skew unreliably), and vetting them to see who'd be both fit for PS! global staff positions, as well as those who could be further trusted to thoroughly do their job to analyze individual cases to make sure that it is indeed a bot they're looking at and not the many edge cases that pop up as a result of having a supermassive playerbase. My guess is that the number of fit individuals post-vetting would be quite small.
Since it's come up a couple of times, I may as well note this sums up accurately most of the issues with recruiting other users. With regards both to a reporting process in general and recruiting others to do it specifically, it should be noted that the access necessary to accurately verify a user is a bot is limited to the single most tightly controlled set of permissions on the site, with four people currently having such access (zarel, chaos, me, and marty).

In short: giving that out for this purpose is not an option.
 
While usage-based tiering has been in effect pretty much forever, instances like this and the Hitmontop incident show how imperfect of a system it really is. I'm not gonna advocate in favor of usage-based tiering because I don't particularly like it. Rather I'm going to express why I am against pivoting to a viability-based tiering system.
Wanted to touch on this.

I think that if we want to truly fix the problem brought up in the thread, we need to fix the root cause - the fact that a certain person can manipulate tiers simply by playing a ton of games at a high enough Elo. ZU just lost Braviary to NU simply because one ladder player was spamming it on the NU ladder, and wouldn't you know it, there went a great balance option. Incidents like these are simply unacceptable - one tier should never be harmed because of a single player or bot spamming their favorite Pokemon to infinity and managing to influence usage statistics. I understand these cases are few and far between, but there has to be a safeguard against situations like these as a whole to solve all the issues present in the tiering system currently.

To propose an actual (if radical) solution, why don't we only have the first 2-3 cycles of shifts after a new generation or DLC be based on usage stats? After that, I believe we should freeze rises completely and only allow suspect test bans (and quickbans) to be the way for something to rise a tier. This would require a major change to tiering, but it would nearly-completely eliminate all problems regarding essential yet unbroken Pokemon rising and destabilizing a metagame.
 
I'm retired, but thought I could add a positive suggestion to maybe help:

A) Was brought up by someone earlier, but you could cap the number of games (per day) that a user contributes to usage (like 100 per day, for example), with any amount over that ignored. This isn't meant to target bots; it's meant to make a statement that no single user, bot or not, should have more than 100 games of influence (or whatever the number is) on usage stats. It is "technically" consistent with usage based tiering to let a single user influence more than 5% of stats by themselves, but not really consistent with the spirit of usage based tiering for any single user to do that in such a lopsided manner. Does this solve the issue? No, of course not. People can make new alts, new bots etc. But please note that nothing solves the issue entirely, and this will at least help reduce a lot of the abuse from literally a thousand games per day from one bot/user. Which brings me to the next point:

B) Most bot owners are researchers/technology students. They're not malicious. A lot of the negative connotation in this thread about bots is overstated. We shouldn't be banning bots, they have an actual legitimate purpose for scientific and technological advancement of information systems. We should not be stifling technological advancement simply because Excadrill might move to a different two-letter Smogon Dot Com category. That doesn't mean we should do nothing to help preserve the tiering system (see point A above), but I think we need to relax a bit on the bot attacks.

C) You can make botting for malicious purposes a bannable offense, as people have earlier suggested. That does not mean you need to actively police this, it just means that, there's a rule that you can't be malicious with your bots. This is common sense and I can't imagine why there would be any pushback on this. It would almost never come into play anyways, it's more of a deterrent than anything else that is reserved for only a very obvious well-known malicious bot that may come up.

D) Re: tier shifts for ladder vs tour play: you could very easily do a mashup (honestly I'm surprised it's never been done) where you have, say, 80% of cumulative ladder stats count towards usage, and the other 20% is counted by cumulative tour stats for only tours of the highest level of play (SPL/SCL/WCOP for OU, the official PLs only for lower tiers e.g. UUPL NUPL). It takes an extra step to compile but usually usage stats are already compiled for these tours very quickly. So weighting them like this seems very feasible. Ladder play should always be part of the equation though.
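
To illustrate points A and D, here is a small Python sketch. It is only a sketch under assumptions: the `(user, day, team)` record shape, the function names, and the sample usage numbers are invented for the example, and only the 100-per-day cap and the 80/20 split come from the suggestions above.

```python
from collections import defaultdict


def capped_games(games, daily_cap=100):
    """Point A: keep at most `daily_cap` games per (user, day).

    `games` is an iterable of (user, day, team) records in play order;
    anything a user plays past the cap on a given day is ignored.
    """
    seen = defaultdict(int)
    kept = []
    for user, day, team in games:
        if seen[(user, day)] < daily_cap:
            seen[(user, day)] += 1
            kept.append((user, day, team))
    return kept


def blended_usage(ladder_usage, tour_usage, ladder_share=0.8):
    """Point D: blend ladder and top-level tour usage rates.

    Both inputs map pokemon -> usage rate (0..1). A Pokemon missing
    from one source is treated as 0 usage in that source.
    """
    mons = set(ladder_usage) | set(tour_usage)
    return {
        mon: ladder_share * ladder_usage.get(mon, 0.0)
        + (1 - ladder_share) * tour_usage.get(mon, 0.0)
        for mon in mons
    }


# One account plays 1000 games in a day, a regular user plays 3.
games = [("bot", "2025-01-01", ["Torkoal"])] * 1000
games += [("alice", "2025-01-01", ["Garchomp"])] * 3
kept = capped_games(games, daily_cap=100)

blended = blended_usage(
    {"Torkoal": 0.06, "Garchomp": 0.20},  # made-up ladder rates
    {"Garchomp": 0.40},                   # made-up tour rates
)
```

Here the heavy account is cut from 1000 counted games to 100 while the regular user is untouched, and a mon with ladder usage but no tour usage has its blended rate pulled down by the tour share. The cap works per account, so as noted in A it does nothing against new alts.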

That's all, hope everyone has a happy new year
 
Hey. I don't know shit but I am involved with some tiers that use viability based tiering so thought maybe to give my thoughts. I do generally think Viability Rankings end up being relatively accurate. I will say from my experience that stuff can get a bit more hazy when it comes to borderline Pokemon. I run ADV PU which goes by viability based rankings considering these lower tiers never had usage rates. A lot of middling Pokemon have stayed PU but at the same time, we don't update it as frequently and eventually some do trickle down to the tier below, but it is something to note.
For example last slate we just forgot that Porygon-2 is being used in OU on TR and nobody nominated it from UR to ranked.
I would like to ask something here; has there been anything that theoretically would affect tiering if we did go by viability based rankings? Something like Porygon-2 doesn't really matter at all since it would have to have a pretty significant ranking in OU to rise. If we did go by viability based rankings, I would imagine this would fall through the cracks less since the thread would have more people trying to raise their thoughts and add Pokemon they think are OU worthy to the rankings considering it would now affect tiering structure. Maybe that's also a downside though if people flood the thread with nominations.
Does NU or ZU have the resources to give VR's accurate enough for tiering?
I would also say yes, I do think they would. In fact, it would definitely give them a kick up the ass to get this stuff done since for lower tiers, Viability Rankings + other resources matter a lot since they are the premier way of making a tier more accessible to newer players by telling them what works and what doesn't. I think VR's never have to be perfect immediately. Even in like, new lower tier metas, people will be using a lot of different Pokemon, some of which may not be good. I think it's probably similar here where people just give their overall thoughts and then eventually it will even out a bit.
 
Politely asking bot owners to register their bot is a band-aid solution to a wound that needs stitches.
You're proposing stitches for a wound that doesn't exist yet. Again, the structural worries about what could happen due to malicious bot usage are somewhat justified and understandable, but they do not represent any actual, currently real problem. Addressing the structural issues can be done, but it shouldn't be blown out of proportion to what is truly happening.

The worries brought up in the OP with regards to usage stats seem mostly to be the accidental consequences of someone's for-fun-and-science project. We might not even need to make them register their bots, we could just ask them to voluntarily cap their number of ladder games per cycle to a certain amount, or just scale down their activities in the case of the most active bots.

With regards to addressing the structural issues, emergency powers in case of obvious malicious action seem like a good start. So does adding rules disallowing the malicious usage of bots, even without any active policing of these rules, just to catch extremely obvious cases (a user admitting to it or intending to do it, for example?).

With regards to the possibility of future suspect tests being cheated, I think OU leadership should talk with the bot makers to get a good understanding of exactly how realistic that possibility is at a given moment and what the limitations of the bots are. We can imagine a theoretical future where to prevent something like a bot version of the Kyurem suspect scandal the GXE requirements to vote on the suspect are raised based on this information. Hopefully it won't ever come to that though.
 