Partially Implemented SV OU Suspect Reform

i'm not particularly good at the game so i'm not sure if what i have to say here holds any weight, but i absolutely agree with lowering or getting rid of GXE requirements in favor of setting a high ELO requirement. raising the minimum GXE reqs does up the difficulty but not in a particularly desirable way in my opinion, and it honestly only exemplifies some of the issues i have with the current system; laddering only becomes even more of a slog and at the end of the day you're likely still gonna end up somewhere in the 1600s-1700s by the time you're done. by contrast, i find ELO reqs to be much more indicative of your knowledge and skill at the game without forcing you to farm low-ladder every time you suffer an early loss to someone else's alt. now of course i also think there should be additional requirements to ensure that you have some level of consistency, like maybe having a max number of losses or a lowered GXE threshold. i'm not quite sure which path should be taken here but nonetheless i am strongly in favor of ELO reqs being used in future suspects. thanks for listening to me yap, have a good friday y'all :)
 
I'm going to raise this question again, as it really hasn't been addressed: why do we need to raise the voting requirements?

If the answer is "For the sake of raising them" then that's just excluding people for the sake of excluding them, which is going to depress involvement in tiering decisions for no apparent benefit whatsoever. I've not seen anything to suggest that "weaker" players are more likely to be "wrong", even leaving aside the impossibility of defining what "wrong" means in community tiering.

From my perspective, it looks like pure elitism, top players wanting their votes to count more. That's fine if there's some justification behind it, but every post so far has just assumed it's beneficial without explaining why.
 
Back with some numbers thanks to Hecate
1727731002918.png

RD<70 is reached in fewer than 30 games, i would think it's about 20-25
w+l+t is just total games played (wins + losses + ties), here at least 50

Assuming that this is players' "natural" gxe and can be raised slightly by trying harder (though everyone will raise by a similar amount) here are some implications:
- The current voter pool represents a little over 1% of the player base
- Raising the gxe floor from 80 to 82 would cut at least half of the voter pool out

similarly we can look at lower thresholds
1727731419090.png

If the test were to switch to COIL, it's likely that some lower gxe players would be able to qualify with higher game counts, as COIL does not set a gxe floor at 50 games but at something very large like idk say 500. Therefore this can be an example of the number of players that might be able to vote with a slightly more relaxed requirement using COIL if they put in the time (about triple current counts at a maximum)

thoughts
- The suspect pool already only lets the top 1% vote easily. Assuming the interest in voting is about the same regardless of skill then this would mean for the 137 voters in the last suspect test there were a maximum of 13,700 people that could have been interested in voting if they had the skill. I would question any opinion that the voter pool is not elite enough. If it needs to go from top 1% to top 0.5% then what is really the difference between those 0.5% voting and the council members just making the decision?
- Though sustained effort can be counted on to raise your rating in a suspect, I wouldn't think it would result in a huge increase for most people, especially across 50 games needed to reach the floor. As shown, by the time you get to 50 games the higher gxe bands lose a lot of people (probably also corresponding to higher ladder climbs)
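The upper bound in the first point is plain arithmetic. A minimal sketch, assuming (as the post does) that interest in voting is spread evenly across skill levels; the function name is made up for illustration:

```python
def max_interested_players(voters: int, eligible_percent: int) -> int:
    """Upper bound on players who would have been interested in voting,
    if only the top `eligible_percent` percent can realistically qualify
    and interest does not depend on skill (the post's assumption)."""
    return voters * 100 // eligible_percent


# 137 voters in the last suspect, drawn from roughly the top 1% of the ladder
print(max_interested_players(137, 1))  # 13700
```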
 
why do we need to raise the voting requirements?
First thing is that current reqs aren’t deemed appropriate by a clear majority of people (see: this thread, virtually every discord discussion over the last week, the suspect thread itself, etc.), so looking under the hood and discussing change is welcome. Part of my job as leader is accepting when a status quo is no longer sufficient for shifting community needs and being open minded to changing it. That is how our community interacts with each other and ultimately improves itself.

It isn’t strictly about raising the requirements — it is partially about that for sure (and more on this later), but partially about just finding an appropriate mechanism to determine who qualifies to begin with! It is clear a lot of people dislike the current approach with GXE intervals per games played and it is clear the current threshold is not cutting it for many people either. If a lot of your community takes issue with a practice and provides sound reasoning, you have to reflect on that and be open-minded as a leader. Cannot stress enough how being resistant to change has been a problem on Smogon and how we are trying our best to avoid that moving forward.

Speaking on why slightly stricter reqs are at the forefront of discussions: To put it bluntly, there have been dozens of complaints raised to me firsthand about the quality of voter pool; I have counted 9 alone since the start of the Kyurem suspect before this thread. This has been a thing since the start of the generation, but grew exponentially over the last couple of suspects.

I am not going to throw specific people under the bus, but these complaints range from users pointing out certain voters spreading misinformation or very questionable opinions while qualifying for reqs to others taking a large amount of games against mid-ladder to barely qualify. I think these are fair perspectives, I think elitism is always going to be interchanged with quality control in these discussions, and I think it is impossible to find an ideal degree of inclusion without engaging in difficult discussions like this (which is why having this thread is so important).

A slight raise in quality in return for a slight decrease in turnout isn’t really a bad thing for me given this, but it may take some trial-and-error and it mandates discussions like these, so let’s remember this discussion is all being done in good faith, not as a power grab — implying that it is done as a power grab feels disingenuous.

I don’t agree with some extreme measures like going up 2 whole GXE or even mandating it to the top 20 of ladder (this is where 1900 ELO falls currently, for those proposing that), but I do think exploring other avenues will lead to finding a more appropriate pool of voters.
- The current voter pool represents a little over 1% of the player base
Finally, this “1%” stuff does not matter to me when you consider 90% of the playerbase doesn’t even know about Smogon/tiering well and 95%+ of the community doesn’t know better. This “1%” is catchy framing more than anything else, to put it bluntly. Just to be posting in this thread, you need to be in the top 1% of Smogon accounts, but when you realize >95% of accounts never come close to badges or hundreds of posts, it becomes more understandable.

I can play someone all the way up in the 15-1600s with a low 70s GXE and they will not even always know what is up for suspect or why it matters much. Your own post alludes to 75% GXE being the top 3% of the ladder…and I think that kind of proves how these general numbers just do not hold weight as nobody holds any weight to 75% GXE. If anything, this just proves the point that we need to move away from GXE as the sole decider of reqs.
 
A slight raise in quality in return for a slight decrease in turnout isn’t really a bad thing for me given this, but it may take some trial-and-error and it mandates discussions like these, so let’s remember this discussion is all being done in good faith, not as a power grab — implying that it is done as a power grab feels disingenuous.

I don’t agree with some extreme measures like going up 2 whole GXE or even mandating it to the top 20 of ladder (this is where 1900 ELO falls currently, for those proposing that), but I do think exploring other avenues will lead to finding a more appropriate pool of voters.

Those extreme measures are why this looks like an elitist "Make my vote count more" move - Pais first called out "1800-1900" and he's got upvotes from top players, so it's not an isolated view. Right now, 1900 would be good for 22nd on the ladder; even the low end of this range, 1800, would tie for 108th.

Pais isn't the only one to suggest this number, he's just the first, and I don't believe he checked the ladder before throwing out the number, but judging by the upvotes, "literally be a top 100 player to earn requirements" is a fairly popular stance. As an off-the-cuff eyeball for who is deserving to vote, that's exceptionally exclusive, and I don't think there's anything disingenuous about suggesting that elitism is a major driving factor here.

As a comparison point, top 200 would be 1756, and a nice even 1700 ELO would rank 405th.

Finally, this “1%” stuff does not matter to me when you consider 90% of the playerbase doesn’t even know about Smogon/tiering well and 95%+ of the community doesn’t know better. This “1%” is catchy framing more than anything else, to put it bluntly. Just to be posting in this thread, you need to be in the top 1% of Smogon accounts, but when you realize >95% of accounts never come close to badges or hundreds of posts, it becomes more understandable.

I can play someone all the way up in the 15-1600s with a low 70s GXE and they will not even always know what is up for suspect or why it matters much. Your own post alludes to 75% GXE being the top 3% of the ladder…and I think that kind of proves how these general numbers just do not hold weight as nobody holds any weight to 75% GXE. If anything, this just proves the point that we need to move away from GXE as the sole decider of reqs.

If the second NatDex Terastalization vote showed us anything, it's that non-Smogon showdown players do have their own communities, and that they aren't automatically worse players. They may not know about the suspect, but they're still good at mons, and accumulating a good ELO/GXE/COIL/etc. requires the Smogon user to beat them. If the goal is to test skill and understanding of the meta, and our subjective measure is based on proving yourself some degree better than the average, then it's the skill of those non-Smogon players that matter, not their (lack of) community involvement.
 
Those extreme measures are why this looks like an elitist "Make my vote count more" move - Pais first called out "1800-1900" and he's got upvotes from players like ABR and blunder, who are among the few who could reach the high end without issue. Right now, 1900 would be good for 22nd on the ladder; even the low end of this range, 1800, would tie for 108th.

Pais isn't the only one to suggest this number, he's just the first, and I don't believe he checked the ladder before throwing out the number, but judging by the upvotes, "literally be a top 100 player to earn requirements" is a fairly popular stance. I don't think there's anything disingenuous about suggesting that elitism is a major driving factor here.

As a comparison point, top 200 would be 1756, and a nice even 1700 ELO would rank 405th.
I agree 1900 is too high; I even cited it in my post:
I don’t agree with some extreme measures like going up 2 whole GXE or even mandating it to the top 20 of ladder (this is where 1900 ELO falls currently, for those proposing that), but I do think exploring other avenues will lead to finding a more appropriate pool of voters.

Your response is…holding specific other poster’s beliefs and the people who like their posts against me despite me not agreeing with them? I don’t give a shit what posts ABR likes — I promise this isn’t helping us make a determination lol

And you didn’t even acknowledge the core of my post about entertaining discussion. This feels like you should be dissecting their posts and logic rather than going after me…
If the second NatDex Terastalization vote showed us anything, it's that non-Smogon showdown players do have their own communities, and that they aren't automatically worse players.
Nobody is saying they are worse though!!! I agree and love that these communities exist. This whole bit is twisting and reaching when I never said anything on the contrary in my post. Come on now
 
A problem I've always had with suspect test laddering is that anyone can seemingly grab a strong Hyper Offensive team from the samples and grind the ladder super hard without any prior knowledge of the tier, get reqs, vote on a Pokemon, and then never play the tier again. Maybe people do this just to participate in a discussion, maybe they do it because they want certain badges, idk, but i know i've done this plenty in the past and know other people do it too. idk a specific way to fix this or if other people even have a problem with it.

maybe it's an idea for OU to also start hosting suspect tours, or give automatic requirements to people who have proved themselves recently with tournament results? like with post-gen votes on pokemon, like with black & white voting on cloyster recently. people may say that's unfair to ladder players but i'd argue that someone who has recently proved themselves in a high level tournament has more knowledge and proven skill than someone who just hops on ladder only when the suspect is up. just throwing out some ideas here cause i think this is an important discussion to be had and we should question the effectiveness of ladder suspect test reqs in its entirety cause there's obviously some greater flaws.
 
Speaking on why slightly stricter reqs are at the forefront of discussions: To put it bluntly, there have been dozens of complaints raised to me firsthand about the quality of voter pool; I have counted 9 alone since the start of the Kyurem suspect before this thread. This has been a thing since the start of the generation, but grew exponentially over the last couple of suspects.

I am not going to throw specific people under the bus, but these complaints range from users pointing out certain voters spreading misinformation or very questionable opinions while qualifying for reqs to others taking a large amount of games against mid-ladder to barely qualify. I think these are fair perspectives, I think elitism is always going to be interchanged with quality control in these discussions, and I think it is impossible to find an ideal degree of inclusion without engaging in difficult discussions like this (which is why having this thread is so important).

A slight raise in quality in return for a slight decrease in turnout isn’t really a bad thing for me given this, but it may take some trial-and-error and it mandates discussions like these, so let’s remember this discussion is all being done in good faith, not as a power grab — implying that it is done as a power grab feels disingenuous.

This is an excellent reason to consider GXE to be a poor proxy for knowledge, and a weak reason to raise the floor.

There is no way to directly measure a player's knowledge of the meta (and the one time it was tried led to the Shadow Tag suspect disaster), so skill is used as a proxy, which is reasonable; I certainly don't have any better options to suggest. The selection of GXE is fatally flawed for two reasons, though:

1) It rests on the assumption that you'll be up against players who are at their true level, which isn't the case with so many suspect alts on the ladder, and losing against another suspect account early on means starting over.
2) The fact that "Start over if you lose a game in your first 15/20" is standard practice means that the intended goal of GXE - measuring consistently good play - doesn't apply, even without the first problem.

COIL mitigates those problems by allowing an account to simply play more games, bringing consistency back in the mix; it doesn't much matter if you drop one game to another suspect account at 1100 ELO when you can play extra games when finding yourself just shy of reqs. It also allows slightly weaker players to make reqs if they're willing to put in the time, and since the entire system is meant to serve as a proxy for knowledge and playing more games will teach the player more, I regard that as a benefit.

ELO mitigates those problems because you don't get unduly punished for an early loss, and it requires every player to put in a significant number of games to climb the ladder, thus ensuring they are voting from current experience.

I didn't address this topic before because I was in a hurry, and I consider it a good change. Very marginal players riding a winning streak would need that winning streak to come toward the end of their run, and anyone who is riding a winning streak at 1600+ has proven they aren't actually a marginal player, so as long as we're using skill as a proxy for knowledge, COIL and ELO work to fix the problems where simply raising the floor fails.

When the way we are measuring skill is so fatally flawed, then the answer isn't to raise the minimum measurement, it's to replace it.

Your response is…holding specific other poster’s beliefs and the people who like their posts against me despite me not agreeing with them? I don’t give a shit what posts ABR likes — I promise this isn’t helping us make a determination lol

And you didn’t even acknowledge the core of my post about entertaining discussion. This feels like you should be dissecting their posts and logic rather than going after me…

Nobody is saying they are worse though!!! I agree and love that these communities exist. This whole bit is twisting and reaching when I never said anything on the contrary in my post. Come on now

You're right, I was in a rush and phrased my point poorly.

My point is that top players are reflexively tossing out numbers that are, in and of themselves, unreasonably high, and that such a view isn't a one-off. THAT is why I suggested that this topic seemed largely driven by elitism - it wasn't a criticism of you, it was explaining my view.

Also, I genuinely do not understand the point you were trying to make about why you don't regard "top 1%" numbers to be meaningful. 90% of Showdown players may not know or care about OU tiering policy, but...so what? I don't see where you're going with this, if not to dismiss those players entirely. Was the intent just to cast shade on using GXE?
 
1) It rests on the assumption that you'll be up against players who are at their true level, which isn't the case with so many suspect alts on the ladder, and losing against another suspect account early on means starting over.
just to clarify, this isn't as big of an issue with gxe, as it takes into account both your own and your opponent's deviation (i.e. how confident the system is that each rating is accurate) when calculating changes, so it does not actually rest on this assumption at all. And with the games requirement we can pretty much assume most players with 80 gxe at 50 games could maintain that past that point
 
In an effort to promote a more "balanced" suspect voting regimen in lower tiers, COIL was reintroduced, and seems to statistically cover a lot of the issues brought up in this post (in theory...). I like the premise of COIL fundamentally as a system that rewards high-level players with a lower minimum game count (if you're winning 25-30 straight you probably know a fair amount about mons/the meta in question) while still being somewhat inclusive of players who aren't quite that "good". People willing to play 50-100+ games on a new alt to make an impact in the community should have the right to do so, assuming they meet some level of qualifications, which a strictly-GXE based system fails to meet in some regard.

Below is my submission for a COIL-based suspect, with the left table representing a quick overview of the values chosen (COIL 3110 / b 3.2) and the right table demonstrating the games required by GXE (rounded) up to 85 games. This particular data set does make the requirements a bit more difficult as people, myself included, have wanted from suspect tests, but does not exclude others from actively participating if they can meet the qualifications. In the Spoiler below, I included some minor tweaks for reference.

1727970984487.png


1727972604214.png

3070/3.2 is slightly more forgiving over a higher quantity of games, for those who want to be a bit less "elitist", while still promoting basically a minimum game count of 25-30.

1727972374697.png

3070/4.5 is way more elitist up front, promoting a realistic minimum game count of probably 30-35 for the best of the best, while allowing the 50-75 game count crowd the opportunity to still get reqs in essentially the same fashion as before.

1727972232055.png

2950/6.7 is extremely brutal up front, making 35-40 game minimums realistic, while still promoting a conservative 78 GXE in 82 games "floor".
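For anyone who wants to play with these numbers themselves, here is a rough sketch of how a games-required-by-GXE table can be generated. It assumes the commonly cited COIL formula, COIL = 40 * GXE * 2^(-B/N), with N the number of games played; the thread itself does not state the formula, so treat that as an assumption. The function names are made up for illustration, and results may differ by a game or so from the posted tables depending on how GXE is rounded.

```python
import math


def coil(gxe: float, games: int, b: float) -> float:
    """COIL score under the ASSUMED formula 40 * GXE * 2^(-B/N)."""
    return 40.0 * gxe * 2.0 ** (-b / games)


def min_games(gxe: float, target: float, b: float):
    """Smallest game count at which a constant `gxe` reaches `target`,
    or None if that GXE can never get there (40 * gxe <= target)."""
    ceiling = 40.0 * gxe  # COIL approaches this value as games grow
    if ceiling <= target:
        return None
    games = max(1, math.ceil(b / math.log2(ceiling / target)))
    # Guard against floating-point rounding right at the boundary.
    while coil(gxe, games, b) < target:
        games += 1
    while games > 1 and coil(gxe, games - 1, b) >= target:
        games -= 1
    return games


# (target COIL, B) pairs quoted in the post above
for target, b in [(3110, 3.2), (3070, 3.2), (3070, 4.5), (2950, 6.7)]:
    row = {gxe: min_games(gxe, target, b) for gxe in (78, 80, 85, 90)}
    print(f"COIL {target} / b {b}: {row}")
```

With GXE held fixed, the score only grows as more games are played, which is the "put in the time" lever people have been describing; under this formula any GXE at or below target/40 can never qualify, no matter the game count.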

I don't think any particular system, using any arbitrary values decided by council members / staff / whomever, will be perfect. People who want to contribute will do so as meaningfully as they can, as we've seen plenty of PL tournament-level players not participate in suspects while new accounts will show up and drop 28-2 runs as their first message. It would be cool though, as somebody who has a somewhat vested interest in Smogon and the communities I participate in, to know that potential changes to the metagames I play are likely being determined by a group of similarly invested people who have proven their qualifications to some extent (an extent we'll never be able to determine definitively).

Hope you had a good birthday Finchinator, playing now btboy
 
Greetings to you all,

I have less experience than most of you when it comes to addressing this debate, but I'd still like to offer my modest point of view.

Each suspect test involves a small group of players, usually between 100 and 150, who decide the fate of a tier that is played by thousands of players every day. The current formula seems elitist enough to me, and far from being too easy (I understand that is what it's all about), so I don't think it's necessary to make it even more demanding.

For instance, I'm very active on French Facebook groups for competitive Pokémon. I know a lot of players from these groups, most of whom are also very active and concerned by Smogon, with a fair knowledge of the SV OU metagame and the game in general, and I can confirm that many of them have tried the suspect tests lately and failed to reach the required GXE. As such, maybe I'm wrong, but I don't feel that those who have succeeded in their suspect tests so far have anything else to prove.

In any case, if the formula were to change, I agree with some of the opinions I've read above that increasing the GXE requirement wouldn't increase the difficulty in a reasonable way. You all know that your GXE can be undermined by a game lost on a bit of ‘misfortune’, or more specifically in the case of SV OU, on a random Tera that changes the course of a game you had in hand. These are factors you can't always control, no matter how good your understanding of the game is. So, I think that, if you really want to change the formula (which, as I said, I don't believe is necessary), a suspect test based on Elo, possibly with a maximum number of games to be played, would be preferable to a suspect test based on a higher GXE.

Thanks for your attention and enjoy the game!
 
My personal goal is to resolve this thread during this week. It’s important we get this done right while not stifling the progression of the tier.

I would like to use the next suspect to test any and all suspect reforms and we just finished ironing out the details on the separate suspect thread. I do think another suspect will come in the near future, too, so this is becoming timely.

If anyone has specific proposals, speak soon.

For me, I hear a lot of people discussing ELO and I hear you. I think having the suspect itself include games against actually competent teams and players rather than potentially stopping at 16-1650 ELO is a good thing. If I’m barely facing the suspected Pokemon or barely engaging with high-level games, then the point of the suspect is partially defeated after all. There should be a certain degree of core competency tested in order to get reqs, even if it requires a few more games played along the way. Maybe there’s an avenue for pre-qualification for people already achieving high marks on the ladder, but I’ll defer more transformative takes on that for others if they please and focus on my own proposal for now.

I was thinking of something along the lines of an ELO+GXE component. This means you need to play enough to reach somewhat high on the ladder and need to be at least somewhat efficient in doing so — arguments for COIL resonate with me to an extent, too.

What comes to mind is needing 1750 ELO while getting 80 or 81 GXE (or higher of course). This is a bit more rigorous than the current reqs as it will require a few more games of good results against competent teams/players to hit the ELO goal while maintaining a respectable GXE number.

1800+ ELO as a first step is probably a non-starter as that’s only the top 100-120 of the ladder as of right now and with the sheer volume of suspect reqs ladder players, it’s going to cause more harm than good. I am not opposed to reevaluating with each step and keeping this thread open of course.

80 GXE is the current baseline and 81 GXE is a small step up. I think going any further when we are already pushing an influx higher on ladder would be too much for a first step and any lower would defeat the point.
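To make the combined rule concrete, here is a minimal sketch of the check. The thresholds are the ones floated in this proposal; the class and function names are hypothetical and are not how any actual reqs verification is implemented.

```python
from dataclasses import dataclass


@dataclass
class LadderRun:
    elo: float
    gxe: float  # percent, e.g. 80.4


def meets_proposed_reqs(run: LadderRun,
                        min_elo: float = 1750,
                        min_gxe: float = 80.0) -> bool:
    """Both conditions must hold: climb high enough AND do it efficiently."""
    return run.elo >= min_elo and run.gxe >= min_gxe


print(meets_proposed_reqs(LadderRun(elo=1760, gxe=81.2)))  # True
print(meets_proposed_reqs(LadderRun(elo=1650, gxe=85.0)))  # False: ELO too low
print(meets_proposed_reqs(LadderRun(elo=1800, gxe=78.0)))  # False: GXE too low
```

Passing `min_gxe=81.0` gives the slightly stricter variant.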

This proposal takes current reqs and forces people to play a little longer at a higher level to assure competency. It may cut down turnout slightly, but I am convinced most people good enough to get reqs now can improve to the point where they get these if they try hard enough — and with improvement comes better understanding, so I view this as a plus.

This is strictly my own proposal and not one representing the entire council.
 
1750 ELO while getting 80 or 81 GXE
We are going to try these minimum thresholds for the upcoming suspect in all likelihood — still discussing which GXE point internally.

The full plan will be this: we use next suspect as a trial to feel out the good, the bad, the unexpected, etc. and then, when we feel it is appropriate, we will re-open this thread for discussion.

This is a relatively conservative first step, but it leaves us room to keep going and potentially undergo a larger overhaul on an as-needed basis heading into next generation. I do think we are not incredibly far from a good solution, but there have been a lot of creative ideas that have been discussed here and those aren’t being ignored at all.
 
Last suspect we:
  • Made reqs slightly more rigorous by forcing people to engage more with higher ladder as opposed to farming lower only (ELO component)
  • Ensured the quality control concerns that previously existed and surfaced here stayed addressed (maintaining GXE component)
  • Maintained similar voter turnout despite these changes (126 for this Gliscor test while we had 137 for Kyurem and 110 for Gouging Fire prior)
  • Implemented a Qualified Discussion thread (over 70 posts from those capable of getting voting reqs)
  • Implemented an automated reqs confirmation and voting system (kills blind voting and by-hand verification in favor of an automated process that tabulates and stops the threat of voter fraud -- big thanks to the people running Smogon and PS for their technical wizardry)
There were a lot of changes that made a longstanding process more modern, safe, and reflective of growing community needs. While the technical side is undoubtedly an upgrade and I am relieved Smogon has such a great team to make this possible, there is definitely still room to debate and potentially improve on other fronts. The OU moderation team (and perhaps the OU council) will discuss the suspect thread split and the future of that, but this thread was used specifically to discuss SV OU voter reqs.

Here is some feedback I received throughout the last suspect, where we used the 1750+ ELO and 80%+ GXE for reqs:
  • Lot of people saying they faced the suspect or competitive teams more throughout the later portion of getting reqs
  • Lot of people complaining it took more battles to get reqs, thus taking up more of their time
    • This was part of the point of the change and forcing people to actually prove their competency naturally takes more time investment
  • Handful of people requesting reqs ditch the GXE component in favor of strict ELO
  • Handful of people (some overlap with the last point) requesting ELO be made higher than the 1750 baseline
  • Couple of people requesting:
    • We revert back to old reqs
    • Lower the GXE requirement
    • Up the GXE requirement
    • Lower the ELO requirement
    • Switch to COIL
    • Probably some other things I am forgetting
To me, the 1750 ELO + 80 GXE requirements were a conservative step away from the prior reqs that put ELO on the map. I am fine maintaining them, but also open to another small step in a number of possible directions.

Please use this thread to provide feedback on what direction you feel is best. Have a nice day!
 