Rejected Comprehensive Alternatives for Suspect Tests in Lower Tiers

Status
Not open for further replies.

FlamPoke

Hello everyone, this is my first post in the Policy Review forum and I've tried to do my due diligence regarding the topic I'm bringing up, so please bear with me.

Several threads have been made about suspect tests across the forum as a whole, with many pertaining to specific tiers, proposals, and ideologies. This topic has been gone over many times and was acknowledged at some point recently(?), with lower tiers like NU and ZU accepting revised suspect test requirements (80% winrate over a minimum of 32 games won), though UU, RU, and PU did not adopt this change in their suspect tests performed in May/June of this year.
This post serves to cover this new 80% winrate idea, address current issues with lower tier suspect test requirements, and hopefully standardize some approach across all lower tiers moving forward - with regard to the current gen, at least. Some conversations have tried to steer the requirements in the direction of accepting recent tournament play, including ladder tours, which seems practical but could also invite some unintended consequences I'll try to address below.

80% Winrate over a minimum of 32 games

The first thing I'll cover is something that appears to have been settled already but, to my knowledge, isn't really public information - the 80% winrate idea. It's pretty self-explanatory: you play a minimum of 32 games on the ladder using a freshly made suspect alt, and you qualify for the suspect test once your winrate is 80% or greater - 32-0, 32-8, 40-10, etc. In my opinion, this is a better alternative to the current GXE requirement because you can't really dictate who you play against in lower tiers and will likely get stuck playing the same people multiple times throughout your suspect run. On top of this, you will likely play people who are lower on the ladder with a lower GXE and ELO, meaning wins won't do much for your cause but losses will set you back significantly - see the attached screenshots from my recent RU suspect run, where I was 27-3 and still far below the 82 GXE requirement at 30 games. I had this kind of problem throughout my run - 26-3 was 80.1, 27-3 was... 80.2. Between games 19 and 20 I even won a game and saw no rise in GXE at all, which I didn't know was mathematically possible with so few games played, but I digress.

[Attached screenshots from the RU suspect run: 1718899501920.png -> 1718899527144.png -> 1718899538785.png]
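
For anyone who wants to sanity-check a record against the winrate requirement, here is a minimal sketch. It assumes the "minimum of 32 games, 80% or better" reading described above (the thread also phrases it as 32 games won, which would be stricter), and the function name is made up for illustration.

```python
def meets_winrate_reqs(wins: int, losses: int, min_games: int = 32) -> bool:
    """Return True if a record has enough games and at least an 80% winrate."""
    games = wins + losses
    # Integer comparison avoids floating-point rounding: wins/games >= 4/5.
    return games >= min_games and 5 * wins >= 4 * games


for wins, losses in [(32, 0), (32, 8), (40, 10), (27, 3), (30, 10)]:
    print(f"{wins}-{losses}: {meets_winrate_reqs(wins, losses)}")
```

Under these assumptions 32-0, 32-8, and 40-10 pass, while 27-3 fails on game count and 30-10 fails on winrate.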

All of this is to say that GXE, despite being the generally superior system for evaluating relative skill, is not without flaws in certain situations such as these lower ladders. To rectify this, a winrate requirement like the proposed (and tested) 80% seems to have some genuine merit. NU had 19 votes on their trial suspect test using this system, one shy of the total from the vote in October of last year and six more than the vote in July of last year. Similarly, ZU had 19 votes on their recent suspect, a whopping twelve more than the last suspect they ran in December of last year and five more than the suspect run in September of last year. These figures point to a pretty favorable result so far, allowing more users to get suspect requirements without, in my estimation, making the requirements too easy. I lack a quantitative way to assess the quality of players voting on these suspects, but winning 80% of your games across 32 or more feels like it requires real time commitment and skill.

The downside to this system is obvious - you could play 32 fresh, 1000 ELO players across your run who are using entirely unviable teams and have little understanding of the mechanics of Pokémon as a whole. This isn't a likely outcome, but the possibility of getting requirements off 32+ very low ranked players still exists and will, reasonably, be used as an argument against this idea. On lower ladders it is more likely that a handful of committed players will be active enough to force the person going for suspect reqs to play well enough to earn them.

Another argument against this system is simply the quantity of games needed - if you play well enough against highly rated opponents, you can get a GXE of 80-82 or greater in 25-30 games, which is what the current system often requires. However, the new system requires a minimum of 32 games, and losses are proportionately more impactful than wins, regardless of opponent. For instance: if you create a fresh alt for the suspect and win your first 20 games, you're off to a wicked start. If you then encounter higher-ladder players and go something like 10-10, you're now at a record of 30-10, meaning you're just shy of the requirement and most likely stuck in the awkward middle-ladder stage of playing against any sort of opponent. If you're 'lucky', you can farm a few wins against low ladder opponents. If you're 'unlucky', well, you might lose a few more against the top players on the ladder and be facing pretty unfavorable odds of getting that 80% on this account. The difference between 30-10 and 32-12 is enormous in a winrate-based system.
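
To put numbers on the gap between 30-10 and 32-12, here is a rough sketch that counts how many wins in a row (with no further losses) a record needs to reach 80%. The helper name and the 32-game minimum as a default are assumptions for illustration.

```python
def extra_wins_needed(wins: int, losses: int, min_games: int = 32) -> int:
    """Consecutive wins needed, with no further losses, to reach 80% over min_games."""
    # 80% means wins >= 4 * losses, so every loss must be backed by four wins.
    for_winrate = 4 * losses - wins
    for_game_count = min_games - (wins + losses)
    return max(0, for_winrate, for_game_count)


print(extra_wins_needed(30, 10))  # 10 -> finishes at 40-10
print(extra_wins_needed(32, 12))  # 16 -> finishes at 48-12
```

So two extra wins and two extra losses turn a 10-win streak into a 16-win streak, which is the kind of swing described above.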

Ultimately, I still favor this system over the current GXE requirement because it rewards people who are willing to play ladders that aren't always the friendliest when it comes to matchmaking, while still requiring a level of skill and time commitment that should mark you as 'qualified'.

Recent Tournament Success allowing Suspect Qualification

This kind of ties into two ideas - ladder tours and other tournaments as a whole. Ladder tours are fairly simple in practice: you create an alt and ladder with it during a timeframe where other players who wish to participate in the tournament are also doing so. At the end of a defined cycle (typically one-week increments), the top 'x' (6, 8, 12, etc.) players in ELO from that cycle are then put into a standard tournament bracket where they play similarly to how other tournaments operate. Ladder tours do not have any prerequisite for GXE (besides tiebreaks) or any level of skill-qualification beyond using ELO, which doesn't really quantify skill. There will be people who play significantly more games than others are able to and they will qualify because they happened to win enough games to get a high ELO. There will similarly be people whose GXEs establish them as high-quality players who do not qualify during any cycle because they can't/won't commit the same level of time to grind out the number of games needed. Giving people with the time to play more games a vote in suspect tests doesn't seem practical if the goal of the suspect tests is to filter out the less-qualified opinions.
My opinion on ladder tour results as a basis for suspect requirements is simple: they shouldn't be considered unless they run concurrently with the suspect test and the player qualifies for the suspect requirements. Ladder tours generally reward people who play a lot of games far more than they sift out the players whose opinions have merit (in the context of suspect voting), so allowing one alt to qualify for both is acceptable if done at the same time in my opinion.
One thing Feliburn brought up was using previous ladder tour accounts to qualify for current suspect tests, which I think is bad because, in regards to the specific test he was bringing up, the ladder tour cycles 1/2/3/4 all had totally different metas based on other tiering action that occurred at the time, and people who qualified in different cycles probably had different opinions based on what they felt at the time of their playing. Cycle 1 ended in mid-February but the vote for this suspect occurred in mid-March, for reference.

Tournament success, on the other hand, has legitimate merit for this conversation... but also some considerable downsides. For reference, the most recent tournament that will come to mind for most is Smogon's Grand Slam, which has separate tournaments for each of the official lower tiers like UU/RU/NU/PU that reward points based on placement and converge into a top-cut bracket similar to ladder tours. These tournaments are often home to many players of varying skill levels and familiarity with the tiers they're playing - 'tour mainers' will sign up for each tier and try to do well enough in any of them to qualify for the playoffs, usually relying on teams built by others they know or using samples, while 'tier mainers' will probably stick to the tiers they know and maybe dabble in some others for fun, probably using their own teams.
The top 24 or so players in each Open (a general figure based on a recent survey performed in RU where Top 24 in RU Open was considered qualified) will be great players with established knowledge of competitive Pokémon... but they aren't necessarily qualified to vote on specific tiering actions because many of them will likely be 'tour mainers' who are... just good at Pokémon. This standings sheet that Grand Slam uses shows what I'm talking about pretty well - look at the number of players at the top of the standings who have good records in all of the tiers. Again, people who are really good at Pokémon will do really well in these types of tournaments, but they likely aren't knowledgeable enough about the specifics of the individual tiers to vote in a way that actually promotes balancing a tier. Inviting more votes from these highly-skilled players seems reasonable, but unless they have spent time learning the tier well enough to have meaningful opinions, it probably doesn't promote healthy outcomes.


Closing Thoughts

My personal feeling regarding this entire topic is simple: how do we incentivize the people who want to contribute to a tier to commit to doing so while still promoting a level of inclusiveness that allows for the votes to matter? To me, winrate accomplishes this at least as well as GXE does. You still have to play 30 games minimum in most GXE-driven suspect tests, so the qualified player-base isn't really playing that many more games, which ultimately shouldn't skew the outcomes. Lower tiers have pretty unfriendly ladders at times and using GXE seems more exclusive than it does inclusive.

I remain open to other alternatives and am hoping that this post, along with the recent trial run in NU using winrate over GXE, spurs some kind of action in the community. I hope that we collectively can decide on something that promotes healthy tiering across the board and makes the game more enjoyable. I look forward to any responses from people who bothered to read this all and I thank you all for your time.
 
Winrate is far worse than GXE for reqs. In an ideal ladder system, you'll end up with a roughly 50% winrate as the game continuously tries to match you with evenly matched players, and GXE is trying to find ur winrate on a given ladder as a representative win chance regardless.

For the record, this has been a topic of conversation among tier leaders for several weeks (sparked by my personal outrage that someone had a GXE of 83.8 in the NU suspect and still had to play 60 games for reqs) and we are working on something that should be announced in the near future that will resolve the concerns of this OP. Just waiting on some final technical details before we do so. We do not currently have any desire to use tournament reqs as standard practice for official current gen tiers though.
 
I appreciate your response and understand that, in a vacuum, GXE is superior to winrate by a significant margin given the complicated formula involved that accounts for factors such as strength of opponents, quantity of games, etc. I'm also happy to hear that I was right in assuming that TLs were working on something behind the scenes.

However, given the practical limitations of GXE in ladders with lower activity levels, winrate feels generally better in establishing the player as qualified while promoting a far more inclusive process. In the NU suspect you referenced, there were 19 players who qualified. Below are the metrics for how they got there:

32 wins - 15 players, 5 of whom had GXEs greater than 82 (and, given most suspects have an 80 GXE at 40 games requirement, there were 10 who qualified in this regard as well)
33 wins - 2
35 wins - 1
48 wins - 1
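
As a quick tally of the breakdown listed above (a throwaway sketch; the variable names are mine, and the 60-game figure assumes the outlier finished at exactly 80%):

```python
# Qualifier counts by number of wins, copied from the list above.
qualifiers_by_wins = {32: 15, 33: 2, 35: 1, 48: 1}

total = sum(qualifiers_by_wins.values())
share_at_minimum = qualifiers_by_wins[32] / total

print(total)                       # 19
print(f"{share_at_minimum:.1%}")   # 78.9%

# At exactly an 80% winrate, 48 wins corresponds to 48 / 0.8 = 60 total games.
games_for_outlier = 48 * 5 // 4
print(games_for_outlier)           # 60
```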

So, as you mentioned, one player was an outlier and took a whopping 48 wins (60 total games) to get their reqs, despite displaying their competency and qualifications with an outstanding GXE given the tier they were playing. I'm not sure of their win:loss ratio at any point during the run, but I assume they were close to 80% winrate most of the way.
This is a fault of the winrate system and will inevitably happen, but my point still stands (and I think this helps support it in some capacity) that winrate is favorable over GXE given the number of people who qualified at 32 wins vs the one outlier who took 48 (whereas, in the GXE system, many people took 40-50-60+ games to reach the GXE goal).

Again, GXE is the definitive best system we have for establishing how good a player is, but the small sample size during suspect runs (usually 30-50 games) means it is more prone to error on ladders with less activity at the top. Winrate might allow people to get qualifications despite an 'easier' run, but GXE might prohibit certain people from ever getting reqs despite being competent enough to have their opinions matter.
In the run I referenced, I was consistently queuing into 1100-1300 ELO players while I was at ~1500 ELO, which in RU is pretty high ladder. I played during relatively peak hours (8am-12pm UTC-5, with roughly 80-100+ people in the RU room on PS!, which is pretty high relatively speaking). I won most of my games and saw minimal results from them, instead clawing my way back after every loss set me back significantly. I did not lose early, starting 16-0 and then getting to 24-1, and I maintained my high ladder status for the entire second half of my run. I should not have had to play more than 30 games to get that GXE given what I did, but I did, and it was unfortunate and demonstrates exactly the kind of misery that lower ladders can (and do) subject players to at times.
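
To illustrate why wins against much lower-rated opponents barely move a rating while losses hurt, here is a rough sketch using the textbook Elo expected-score formula. This is not Pokémon Showdown's actual ladder or GXE calculation; the function names and the K-factor of 32 are assumptions chosen purely for illustration.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))


def elo_change(rating: float, opponent: float, won: bool, k: float = 32) -> float:
    """Rating change after one game under a textbook Elo update (k is assumed)."""
    return k * ((1.0 if won else 0.0) - expected_score(rating, opponent))


# A ~1500 player queuing into the 1100-1300 range, plus an even match for contrast.
for opponent in (1100, 1200, 1300, 1500):
    gain = elo_change(1500, opponent, won=True)
    loss = elo_change(1500, opponent, won=False)
    print(f"vs {opponent}: win {gain:+.1f}, loss {loss:+.1f}")
```

Under these assumptions, a win over a 1100 opponent is worth under 3 points while a loss costs about 29, which matches the general shape of the experience described above even though the real numbers will differ.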
 
I would like to float the idea that meeting 2 qualifying requirements over a recent period of time should allow you to vote in a suspect test. To give an example of what I mean, UU recently ran suspects on Kommo-o and Pelipper at the same time as UULT. I personally believe that anyone who achieved reqs for the Pelipper suspect and qualified for UULT independently of that has demonstrated enough competence to vote in the Kommo-o test, and having multiple time-consuming ladder events close together can lead to people being less motivated to achieve reqs. My proposal would be that anyone who has met requirements (e.g. by qualifying for a ladder tour or getting suspect reqs) within the last month and met requirements at least one other time within the last 3 months would be eligible to vote in a suspect test.
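
A minimal sketch of how that eligibility rule could be checked, assuming "last month" means 30 days and "last 3 months" means 90 days. The function name and the example dates are hypothetical.

```python
from datetime import date, timedelta


def can_vote(qualified_on: list[date], today: date) -> bool:
    """True if one qualification falls in the last 30 days and a second in the last 90."""
    recent = [d for d in qualified_on if timedelta(0) <= today - d <= timedelta(days=90)]
    has_last_month = any(today - d <= timedelta(days=30) for d in recent)
    return has_last_month and len(recent) >= 2


today = date(2024, 6, 20)  # made-up reference date
print(can_vote([date(2024, 6, 1), date(2024, 4, 15)], today))  # True
print(can_vote([date(2024, 6, 1)], today))                     # False
print(can_vote([date(2024, 4, 1), date(2024, 4, 15)], today))  # False
```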
 
One problem I have with the suspect test format is that it seemingly discourages actually doing what a suspect test is supposed to accomplish, and none of the options here really fix it. The problem is that when you need to pass a ladder skill check in order to prove your value as a human being, are you going to try a bunch of options, experiment both with and against the mon in question and try to research the best counterplay options? No, you're going to use the best team you have and try to grind out 30-50 games with a heavily skewed win/loss ratio. The big problem is that learning often requires losing, and both the gxe requirement and the win rate requirement heavily punish losing of any kind. GXE will usually take 2 wins to offset any loss you take and the win% requirements are even more harsh. I don't think anybody comes out of a suspect ladder run with a better understanding of the tier, they just have the biases of whatever team they grinded with reinforced.

I don't have a solid answer in mind, but I feel like a better suspect test requirement, at least for lower tiers, is one that would emphasize a consistent win rate over more games. Perhaps the range of qualifying GXE could be expanded so that playing 60 games lets you qualify with an even lower GXE, or we could switch to an ELO qualification. Something like getting to ~1400 ELO in RU takes a lot of skill even if you do grind out 100 games like a degenerate, but ELO is a system where you can afford to lose a couple games with an experiment then quickly get your rank back with a few wins, while with GXE or an 80% win rate requirement you will need to win 4 to 8 games to make up for 2 losses.
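
The "8 wins for 2 losses" end of that range follows directly from the 80% threshold. A small sketch (the helper name is made up, and 50% is included only for contrast):

```python
from fractions import Fraction


def wins_per_loss(threshold: Fraction) -> Fraction:
    """Wins needed per loss to hold a winrate exactly at `threshold`."""
    # w / (w + l) = t  =>  w / l = t / (1 - t)
    return threshold / (1 - threshold)


for pct in (50, 80):
    ratio = wins_per_loss(Fraction(pct, 100))
    print(f"{pct}% winrate: {ratio} win(s) per loss, {2 * ratio} for 2 losses")
```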
 
I don't play any lower tiers, but this post resonated with me as I recalled the multiple times I tried (and failed) to get reqs for suspect tests for OU during Gen 6. I was a decently competent (though not amazing) player, but the number of games I had to play before eventually giving up was frankly kind of ridiculous because, of course, I'd lose sometimes.

I am not going to pretend to understand the underlying systems and math because they are far beyond my expertise, but would it be feasible to have the reqs be something like "have a 50%* winrate after [x] games" after hitting a certain ELO? Obviously we don't want to allow reqs to be too easy to achieve because we want some sort of filter, but we also don't want people grinding a dumb amount of games (or making alts because they got GXE screwed) and trying to keep the win rate high without tilting.

*Doesn't strictly have to be 50% fwiw, just threw a lower number out that wasn't a negative w/l ratio lol.
 