Hello everyone, this is my first post in the Policy Review forum and I've tried to do my due diligence regarding the topic I'm bringing up, so please bear with me.
Several threads across the forum have covered suspect tests, many of them tied to specific tiers, proposals, and ideologies. The topic has been gone over many times and was acknowledged at some point recently(?), with lower tiers like NU and ZU accepting revised suspect test requirements (80% winrate over a minimum of 32 games won), though UU, RU, and PU did not adopt this change in their suspect tests performed in May/June of this year.
This post covers the new 80% winrate idea, addresses current issues with lower tier suspect test requirements, and will hopefully help standardize an approach across all lower tiers moving forward - for the current gen, at least. Some conversations have tried to steer the requirements toward accepting recent tournament play, including ladder tours, which seems practical but could also invite some unintended consequences I'll try to address below.
80% Winrate over a minimum of 32 games
The first thing I'll address is something that appears to have been settled already but, to my knowledge, isn't really public information - the 80% winrate idea. It's pretty self-explanatory: you play a minimum of 32 games on the ladder using a freshly made suspect alt, and you qualify for the suspect test with a winrate of 80% or greater - 32-0, 32-8, 40-10, etc. In my opinion, this is a better alternative to the current GXE requirements system because you can't really dictate who you end up playing against in lower tiers and will likely get stuck playing the same people multiple times throughout your suspect run. On top of this, you will also likely play people who are lower on the ladder and have a lower GXE and ELO, meaning wins won't do much for your cause but losses will set you back significantly - see attached from my recent RU suspect run, where I was 27-3 and still far below the 82 GXE requirement at 30 games. In fact, I had this kind of problem all throughout my run - 26-3 was 80.1, 27-3 was... 80.2. I had a game between games 19 and 20 where I won and did not receive a rise in GXE at all, which I didn't know was mathematically possible at such a low quantity of games, but I digress.
[Attached screenshots from the RU suspect run]
All of this is to say that GXE, despite being the generally superior system for evaluating relative skill, is not without flaws in certain situations such as these lower ladders. To rectify this, a winrate requirement like the proposed (and tested) 80% seems to have some genuine merit. NU had 19 votes on their trial suspect test using this system, one shy of the vote that occurred in October of last year and six more than the vote in July of last year. Similarly, ZU had 19 votes on their recent suspect, a whopping twelve more than the last suspect they ran in December of last year and five more than the suspect run in September of last year. These figures point to a pretty favorable result so far, allowing more users to get suspect requirements without, in my estimation, making the requirements too easy. I lack a quantifiable way to assess the quality of players voting on these suspects, but winning 80% of your games across 32 or more feels like it requires a real level of time commitment and skill.
The downside to this system is obvious - you can play 32 fresh, 1000 ELO players across your run who are using entirely unviable teams and have little understanding of the mechanics of Pokémon as a whole. This definitely isn't a likely outcome, but the possibility of playing 32+ very low ranked players and getting requirements still exists and will be used as an argument against this idea, reasonably so. On lower ladders, though, it is more likely that a handful of committed players will be active enough to force the person going for suspect reqs to play well enough to earn them. Another argument against this system is simply the quantity of games needed - if you play well enough against highly rated opponents, you can get a GXE of 80-82 or greater in 25-30 games, which is what the current system often requires. However, the new system requires a minimum of 32 games, and losses are proportionately more impactful than wins, regardless of opponent. For instance: if you create a fresh alt for the suspect and win your first 20 games, you're off to a wicked start. If you then encounter higher-ladder players and go something like 10-10, you're now at a record of 30-10 (75%), meaning you're just shy of the requirement and most likely stuck in the awkward middle-ladder stage of playing against any sort of opponent. If you're 'lucky', you can farm a few wins against low ladder opponents. If you're 'unlucky', well, you might lose a few more against the top players on the ladder and be facing pretty unfavorable odds of getting that 80% on this account. The difference between 30-10 and 32-12 is enormous in a winrate-based system: 75% versus roughly 72.7%, or 10 straight wins to get back to 80% versus 16.
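To put rough numbers on how much each loss costs, here is a minimal Python sketch of the arithmetic. It assumes the requirement is an 80% winrate with at least 32 games played on the alt (I'm reading the "32" as games played, which may not match the exact wording the tiers used), and the function names are just ones I made up for illustration.

```python
def qualifies(wins: int, losses: int, min_games: int = 32) -> bool:
    """True if the record meets an 80% winrate over at least min_games.

    Assumption: min_games counts games played, not games won.
    """
    games = wins + losses
    # Integer form of wins / games >= 4 / 5, avoids float rounding.
    return games >= min_games and 5 * wins >= 4 * games


def straight_wins_needed(wins: int, losses: int, min_games: int = 32) -> int:
    """Fewest consecutive wins (with no further losses) needed to qualify."""
    # 80% winrate means wins must be at least 4x losses.
    to_reach_rate = 4 * losses - wins
    # The game-count minimum also has to be met.
    to_reach_games = min_games - (wins + losses)
    return max(0, to_reach_rate, to_reach_games)


if __name__ == "__main__":
    for wins, losses in [(32, 0), (32, 8), (40, 10), (30, 10), (32, 12)]:
        rate = 100 * wins / (wins + losses)
        print(
            f"{wins}-{losses}: {rate:.1f}% "
            f"qualifies={qualifies(wins, losses)} "
            f"straight wins needed={straight_wins_needed(wins, losses)}"
        )
```

Under those assumptions, every loss costs four wins to offset, which is why two extra losses on the way from 30-10 to 32-12 push the target from 40-10 out to 48-12.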
Ultimately, I still prefer this system over the current GXE requirement because it rewards people who are willing to play ladders that aren't always the friendliest when it comes to matchmaking, while still requiring a level of skill and time commitment that should mark you as 'qualified'.
Recent Tournament Success allowing Suspect Qualification
This kind of ties into two ideas - ladder tours and other tournaments as a whole. Ladder tours are fairly simple in practice: you create an alt and ladder with it during a timeframe where other players who wish to participate in the tournament are also doing so. At the end of a defined cycle (typically one-week increments), the top 'x' (6, 8, 12, etc.) players in ELO from that cycle are put into a standard tournament bracket where they play similarly to how other tournaments operate. Ladder tours do not have any prerequisite for GXE (besides tiebreaks) or any level of skill-qualification beyond using ELO, which doesn't really quantify skill. There will be people who play significantly more games than others are able to, and they will qualify because they happened to win enough games to get a high ELO. There will similarly be people whose GXEs establish them as high-quality players but who do not qualify during any cycle because they can't/won't commit the same level of time to grind out the number of games needed. Giving people a vote in suspect tests because they have the time to play more games doesn't seem practical if the goal of the suspect tests is to filter out the less-qualified opinions.
My opinion on ladder tour results as a basis for suspect requirements is simple: they shouldn't be considered unless they run concurrently with the suspect test and the player qualifies for the suspect requirements. Ladder tours generally reward people who play a lot of games far more than they sift out the players whose opinions have merit (in the context of suspect voting), so allowing one alt to qualify for both is acceptable if done at the same time, in my opinion.
One thing Feliburn brought up was using previous ladder tour accounts to qualify for current suspect tests, which I think is bad. For the specific test he was bringing up, ladder tour cycles 1/2/3/4 all had totally different metas because of other tiering action that occurred at the time, and people who qualified in different cycles probably formed different opinions based on the meta they were playing. Cycle 1 ended in mid-February but the vote for this suspect occurred in mid-March, for reference.
Tournament success, on the other hand, has legitimate merit for this conversation... but also some considerable downsides. For reference, the most recent tournament that will come to mind for most is Smogon's Grand Slam, which has separate tournaments for each of the official lower tiers like UU/RU/NU/PU that reward points based on placement and converge into a top-cut bracket similar to ladder tours. These tournaments are often home to many players of varying skill levels and familiarity with the tiers they're playing - 'tour mainers' will sign up for each tier and try to do well enough in any of them to qualify for the playoffs, usually relying on teams built by others they know or using samples, while 'tier mainers' will probably stick to the tiers they know and maybe dabble in some others for fun, probably using their own teams.
The top 24 or so players in each Open (a general figure based on a recent survey performed in RU where Top 24 in RU Open was considered qualified) will be great players with established knowledge of competitive Pokémon... but they aren't necessarily qualified to vote on specific tiering actions because many of them will likely be 'tour mainers' who are... just good at Pokémon. This standings sheet that Grand Slam uses shows what I'm talking about pretty well - look at the number of players at the top of the standings who have good records in all of the tiers. Again, people who are really good at Pokémon will do really well in these types of tournaments, but they likely aren't knowledgeable enough about the specifics of the individual tiers to vote in a way that actually promotes balancing a tier. Inviting more votes from these highly-skilled players seems reasonable, but unless they have spent time learning the tier well enough to have meaningful opinions, it probably doesn't promote healthy outcomes.
Closing Thoughts
My personal feeling regarding this entire topic is simple: how do we incentivize the people who want to contribute to a tier to commit to doing so while still promoting a level of inclusiveness that allows for the votes to matter? To me, winrate accomplishes this as well as or better than GXE does. You still have to play 30 games minimum in most GXE-driven suspect tests, so the qualified player-base isn't really playing that many more games, which ultimately shouldn't skew the outcomes. Lower tiers have pretty unfriendly ladders at times, and using GXE seems more exclusive than inclusive.
I remain open to other alternatives and am hoping that this post, along with the recent trial run in NU using winrate over GXE, spurs some kind of action in the community. I hope that we collectively can decide on something that promotes healthy tiering across the board and makes the game more enjoyable. I look forward to any responses from people who bothered to read this all and I thank you all for your time.