Partially Implemented SV OU Suspect Reform

So as Finch alluded to in the mess of a Kyurem suspect thread, there are ongoing discussions about changing voting reqs, and I would like to weigh in with a proposal: adding 2 GXE points to the current system, allowing reqs to carry over between suspects, and possibly implementing a game and alt maximum.

The GXE requirement seems the most logical and clean-cut to me. Using Excel, I found that the average GXE on the top 500 of the ladder was about 78.3%. Keep in mind that lots of people are playing casually, memeing, and in general clicking hard, and the average is still 78.3%. Yes, there are high outliers in the high 80s and low 90s, but there are also low outliers in the high 60s and low 70s. Either way, I and most people I've talked to or seen discuss this matter agree that consistency in SV OU is easier now than at the start of the gen. Yet the GXE requirement remains the same. Why? It would make sense to increase it slightly, and I find 2 to be the magic number, as 1 is too slight and 3 begins to teeter on excessive.

Game limits used to be a part of suspects but were removed. While I don't feel strongly about them, they are possibly worth revisiting. Lastly, alt limits are usually a non-issue, but there are people who make several accounts to try and get reqs, and when they do end up getting them this seems wrong. People will often say they aren't getting reqs because they don't have enough time, but what about someone who only gets reqs because he has too much time despite not being that good? After 15 or so attempts you can eventually get on a lucky streak and reach an 80 GXE. Sure, you can say that guy earned it, but I can say that it was bound to happen statistically. A suspect signup with 3, or however many, alts tied to your name, like OLT, would completely solve this issue. It might end up being more trouble than it's worth, but I'm throwing it out there.
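
To put a rough number on the "bound to happen statistically" point: if each fresh alt has some small, independent chance of ending on a qualifying run, the chance that at least one of several alts makes it grows quickly. The sketch below is only an illustration. The 10% per-alt chance is a made-up figure, not something measured from the ladder, and real attempts are not perfectly independent.

```python
def chance_of_any_success(per_attempt_chance: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= per_attempt_chance <= 1.0:
        raise ValueError("per_attempt_chance must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must not be negative")
    # Complement rule: 1 minus the chance that every single attempt fails.
    return 1.0 - (1.0 - per_attempt_chance) ** attempts


if __name__ == "__main__":
    # Hypothetical 10% chance per alt; purely illustrative.
    for attempts in (1, 5, 15):
        print(attempts, round(chance_of_any_success(0.10, attempts), 3))
```

With that assumed 10% chance, 15 alts give roughly a 79% chance of at least one qualifying run.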

Finally, I think suspect tests should carry over if they're within, say, 2 weeks of each other. The main reason is that people don't get rusty in just two weeks of not playing, and most people who get reqs are involved in the tier and playing at least semiactively anyway. While not having a policy on the surface makes it fairer for everyone, a policy like this further incentivizes people to participate in suspects and, possibly though not guaranteed as it could happen later down the line anyway, rewards them for doing so.

I feel most strongly about the gxe requirement and the reqs carry over and would urge the council to increase the requirements and at least entertain the other limitations I have listed.
 
I am ok moving GXE up for suspect tests if there is support for it. I would prefer trying 81 before 82 as to not move too much initially, but I will defer to people in this thread, my council, and tiering circles before drawing any conclusion. I do think reqs are easier now than they used to be, making it a reasonable time to discuss movement.

I do not think suspect reqs should carry over between suspects due to tiers changing over time and with potential bans. We have seen metagames change drastically between 1-2 suspects after release periods or around tournaments like OLT and WCoP. While it may be convenient, I do not think someone who qualified with the minimum on a suspect early in a month is always qualified to vote on a suspect a few weeks later that has its vote at the end of that month, which is roughly as close together as two suspects can be given how long suspects run.

Alt limits are probably harder to implement and I do not care much for or about. No comment there.

I hope others share their experience and thoughts!
 
I have no comment about the difficulty of OU reqs and if they should be raised. Obviously raising the GXE requirement would raise the difficulty of reqs.

As long as reqs are GXE based, a maximum games limit (per account) is not productive. There is nothing easier about getting 80 GXE at 1,000 games than getting 80 GXE at 40 games. If anything, the 1,000 game reqs are more legitimate since the error is lower and we’re more confident that your true GXE is 80. The only way your GXE trends up with more games played is if you got unlucky early and your true GXE is higher than it currently is, in which case, that’s not something we want to filter out.
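
The "error is lower" point can be illustrated with the standard error of an observed win rate. This is a simplification: GXE is derived from a Glicko rating rather than a raw win rate, so the binomial formula below is only a stand-in that I am assuming for illustration, and the 0.80 win rate is just an example value.

```python
import math


def win_rate_standard_error(true_win_rate: float, games: int) -> float:
    """Standard error of the observed win rate after `games` independent games."""
    if games <= 0:
        raise ValueError("games must be positive")
    if not 0.0 <= true_win_rate <= 1.0:
        raise ValueError("true_win_rate must be between 0 and 1")
    # Binomial proportion: sqrt(p * (1 - p) / n).
    return math.sqrt(true_win_rate * (1.0 - true_win_rate) / games)


if __name__ == "__main__":
    for games in (40, 1000):
        se = win_rate_standard_error(0.80, games)
        print(f"{games} games: +/- {100 * se:.1f} percentage points (1 standard error)")
```

Under this stand-in, the uncertainty at 40 games is five times larger than at 1,000 games, since the ratio is sqrt(1000/40) = 5.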

Limiting the number of suspect alts would work but is logistically challenging. Realistically, we would need people to sign up with their alt(s) in the suspect thread before they begin laddering, which adds another step for tier leaders. This step is also time sensitive as once games have been played, it’s difficult to validate they were signed up with first. Anything PS side to validate how many alts someone used would be challenging to implement, as creating alts is very very easy and would likely still be manipulatable by bad faith actors. We would likely also end up disqualifying several legitimate req getters for not following procedure.

Carrying over reqs I generally think is a bad call; metas change rapidly, and a few months apart even in a stable late-gen meta can be wildly different. If we’re in a situation where two suspects are being run in two weeks, the meta is probably very unstable as well. Especially since your goal here seems to be to have more informed voters, letting people use reqs from previous metas seems counterproductive.

In short, if you want to raise reqs, raise GXE, don’t implement a maximum games limit, and maximum number of alts is probably more trouble than it’s worth.
 
Personally I feel that implementing COIL is worth a shot in OU as well, since you are able to specify the parameters to suit the needs of the suspect test. More importantly, the relationship between GXE and number of games played in COIL is exponential, while that in the current reqs system is linear. In other words, a lower GXE requires a more than proportionate number of games played, which makes the suspect test more difficult for those players. In the Excel sheet provided by UT in the COIL thread, a COIL value of 3140 (equal to a GXE cutoff of 3140/40 = 78.5%, the average GXE of the top 500 ladder) and a B value of 3 yields the following results:

[Image: 1726818969293.png (spreadsheet results for COIL 3140, B = 3)]


Even if you went undefeated, realistically it is near impossible to hit more than 84% GXE in less than 30 games, and as UT mentioned earlier having 80% (in this example 78.7%) GXE in 1000 games is more legitimate due to the smaller error. Understanding the math behind COIL is not even necessary, all the public needs to know is to hit the specified COIL value in order to qualify.
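
For anyone who wants to check the numbers without the spreadsheet, here is a small sketch. It assumes the commonly cited COIL formula, COIL = 40 * GXE * 2^(-B/N) with N the number of games played. The thread itself does not spell the formula out, so treat that as my assumption. The function names are made up for this example.

```python
import math
from typing import Optional


def coil(gxe_percent: float, games: int, b_value: float) -> float:
    """COIL under the assumed formula 40 * GXE * 2^(-B / N)."""
    if games <= 0:
        raise ValueError("games must be positive")
    return 40.0 * gxe_percent * 2.0 ** (-b_value / games)


def min_games_for_coil(gxe_percent: float, target_coil: float,
                       b_value: float) -> Optional[int]:
    """Fewest games at which a constant GXE reaches target_coil, or None if never."""
    ceiling = 40.0 * gxe_percent  # COIL approaches this value as games grow
    if ceiling <= target_coil:
        return None  # this GXE can never reach the target
    # Solve 40 * GXE * 2^(-B / N) >= target for N.
    return math.ceil(b_value / math.log2(ceiling / target_coil))


if __name__ == "__main__":
    # Parameters from the post: COIL 3140, B = 3.
    for gxe in (84.0, 82.0, 80.0, 78.5):
        print(gxe, min_games_for_coil(gxe, 3140.0, 3.0))
```

Under these assumptions, 84 GXE needs 31 games and 80 GXE needs 110 games for COIL 3140 with B = 3, and anything at or below 78.5 GXE can never qualify.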
 
I may be wrong, but I believe that relying so heavily on a GXE-dependent suspect test model may not be the most effective solution. Specifically, when it comes to OU, I find it absurd that potentially reaching just 1650 ELO can qualify someone to have a say in the tier's future (the average ELO in suspect test is usually around 1600-1700), regardless of their GXE. In my opinion, GXE is not necessarily a measure of skill or consistency but rather reflects the time it took to reach a certain goal. Someone who finishes a suspect test with 80 GXE instead of 84, at least in my view, is simply a person who took longer to achieve the required qualifications.

I don't want to dwell on this too much since there isn’t a perfect solution, but if any ideas are needed, here are a couple of suggestions:

  1. Set a minimum ELO requirement to reach, either related or unrelated to GXE. If you have a certain level of competence, you should be able to climb to at least 1800-1900 ELO (if not higher...).
  2. I'm not sure if this would be ideal, but I remember several years ago, during the SM OU era, tour requirements were implemented ( https://www.smogon.com/forums/threads/meet-the-overused-tiering-council.3598286/#post-7754114 ), and I don't understand why they were removed or why they're now only used in Old Gens. I think this approach could reward the playerbase that participates in tournaments and bring a sense of realism and credibility to suspect tests, which don't seem as popular as they once were. We can try to bring back tour requirements alongside the usual ladder system (I don't see why someone who goes x-0 in SPL or top cuts OLT should not vote unless they do the suspect, while someone who gets reqs at 1650 ELO should).
Thank you for your attention. I just want to emphasize that these are merely ideas and, as such, they can be refined and may have flaws that need fixing.
 
I'm not sure if this would be ideal, but I remember several years ago, during the SM OU era, tour requirements were implemented ( https://www.smogon.com/forums/threads/meet-the-overused-tiering-council.3598286/#post-7754114 ), and I don't understand why they were removed or why they're now only used in Old Gens.
Tournament reqs are needed for old generations as they do not have the same active ladder scene that OU or any current generation tier does.

These tiers evolve at a slower pace than CG tiers and the way they change tends to be prompted by the big tournaments themselves rather than ladder activity. Ladder reqs for old generations tend to be a hard sell — just look at the last two BW tests where people cheated in droves to try and get reqs in the first and only few people got it the second time, for example. The ladders aren’t active enough to make standardized reqs easy or convenient like SV OU.

But for CG tiers, tournament qualifier reqs were deemed undesirable because playing a handful of games months ago did not necessarily qualify someone for the current metagame when the suspect was ongoing. We received countless complaints about people qualifying because they played in X tour (e.g. WCoP or SPL) but didn’t play during the suspect just a few weeks or a month later, and there were also complaints about what truly qualifies someone (which also exist in old generations, but at least those reached a point of uniformity).

If there was a way to do this without it devolving into disagreements about current players, quality of qualifiers, etc., it would be an easier sell, but I worry history will repeat itself on a lot of fronts.
 
Lastly, alt limits are usually of no issue but there are people who make several accounts to try and get reqs and when they do end up getting it this seems wrong.
You mention how people can get on a lucky streak and get in by making quite a few alts; however, the reverse can also be true. If you limit the number of alts someone can make to get reqs, then you are going to end up denying players who just get unlucky their first few times. I strongly oppose any count limit on alts.

Otherwise, I am all for making suspect reqs more stringent. The other suggestions all seem fine to me.
 
The combination of putting a cap on games played AND raising the required GXE AND attempting to put some limitation on alt spamming has the effect of denying a massive swathe of players the opportunity to earn voting requirements.

If my "true" skill level was 82 GXE (it's not, I'm nowhere near that good) then I'd need to avoid better players starting out in my laddering, and catching one or two would simply end that alt's attempt - can't make it up by playing enough games for luck to even out, and can't expect to go on a huge winning streak because I'm not significantly better than others I'll be matched up against as I climb. The combination of the three has the effect of requiring either much higher "true" GXE than the limit, or getting lucky.

I oppose all three proposals separately, as well; raising the GXE floor just means I'd need to improve even more before having a chance; capping the allowed number of games, as mentioned by others, actually increases the effects of luck; and capping the number of alts is punishing to anyone who loses a few games early, whether to facing better opponents or to RNG.

If the goal is to minimize the ability of a marginal player to luck into requirements, just increasing the number of games required for lower GXEs would do that. To clarify: instead of 0.2 GXE required per additional game played, set it at 0.1, or 0.15.

Borderline players would require many more games to luck into making reqs (and in that many games, surely they'd have learned more about the tier, making them better informed and more deserving of a vote?), and anyone who could cruise to 84 GXE is unaffected, as the minimum games is untouched.
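
To make the slope idea concrete, here is a sketch of the linear requirement as described in this thread: a floor of 80 GXE, no further reduction past 84 GXE, and one extra game for every 0.2 GXE below 84. The 30-game minimum is my assumption (the thread only says "30ish"), and the function name is made up.

```python
import math
from typing import Optional


def games_required(gxe: float, gxe_per_game: float = 0.2,
                   floor_gxe: float = 80.0, cap_gxe: float = 84.0,
                   min_games: int = 30) -> Optional[int]:
    """Games needed at a given GXE under a linear rule, or None below the floor."""
    if gxe_per_game <= 0:
        raise ValueError("gxe_per_game must be positive")
    if gxe < floor_gxe:
        return None
    shortfall = cap_gxe - min(gxe, cap_gxe)
    # round() guards against floating point noise such as 17.00000000000003.
    extra_games = math.ceil(round(shortfall / gxe_per_game, 9))
    return min_games + extra_games


if __name__ == "__main__":
    for slope in (0.2, 0.15, 0.1):
        print(slope, {gxe: games_required(gxe, slope) for gxe in (80.0, 82.0, 84.0)})
```

With these numbers, the games needed at exactly 80 GXE go from 50 at a slope of 0.2 to 57 at 0.15 and 70 at 0.1, while 84 GXE stays at 30.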

This would increase the time commitment to make voting requirements, but if reducing the effects of luck is important, then that's a required sacrifice.
 
raising the GXE floor just means I'd need to improve even more before having a chance
That is the point of these proposed reforms, the current suspect process rewards time put in as opposed to actually assessing the skill of the players who get reqs. Of course, it does remove 90% of the idiots in the OU forum, but I think raising the requirements back to the old threshold of 82% would be a welcomed change. Suspect test voting is not a right, and getting 80% GXE on a CG OU ladder is barely a barrier to entry if you ask me.

Alt limits and game limits seem meh to me, I think bumping up the GXE fixes a lot of issues.
 
While we're on the topic, is the 60% supermajority on the table for this discussion?

If the requirements become more difficult and consequently the level of voters is higher, the 60-40 "safety" margin would be too much to change the status quo.

I was expecting something like this in the Gen X preview, but given the opportunity... I'm on the side of 50% + 1 being enough regardless of the "quality" of the voters. But perhaps a middle ground could at least be considered.
 
Another point against an alt account limit that I haven't seen stated is: what happens when you play another suspect test player who's very good in your first 1-10 games? This is a very realistic situation during a suspect test, especially towards the start of one. Should people just wait until the final day so they don't have accounts ruined by playing people that, ELO-wise, they shouldn't be playing yet? Just this suspect test I played a few 1800-1900 players on suspect alts while I was in the 1000-1300 ELO range on my suspect alt. When accounting for this possibility, I think alt limits are an unviable idea imo.
 
That is the point of these proposed reforms, the current suspect process rewards time put in as opposed to actually assessing the skill of the players who get reqs. Of course, it does remove 90% of the idiots in the OU forum, but I think raising the requirements back to the old threshold of 82% would be a welcomed change. Suspect test voting is not a right, and getting 80% GXE on a CG OU ladder is barely a barrier to entry if you ask me.

Alt limits and game limits seem meh to me, I think bumping up the GXE fixes a lot of issues.

Raising the floor disincentivizes people from going for reqs, leading to fewer people even trying to improve.

Also, blunt and simple: where is the evidence that reqs are too low, beyond nebulous vibes? What tangible improvements would be made, what are the "lot of issues"?
 
While we're on the topic, is the 60% supermajority on the table for this discussion?

If the requirements become more difficult and consequently the level of voters is higher, the 60-40 "safety" margin would be too much to change the status quo.
The 60% majority is to ensure that there is a clear majority for changing the status quo, it’s not really an insurance against “bad voters.”

There can of course be a discussion on whether the threshold needs changing, but it’s not really a factor in this conversation and is probably better suited to when we get ready for gen 10.
 
Just for reference, I took a look at some numbers based on a normal distribution of ratings. Basically, I took the OG GXE formula and a sample Glicko rating distribution from Lichess.
Assuming deviation doesn't matter (because higher deviation is taken care of by the game count requirement):

80% GXE is a little higher than 81% of the playerbase
82% GXE is closer to 83% of the playerbase

All you are really doing by changing this is excluding another 2% of the playerbase, so out of the current 19% or so of players that will achieve reqs you are removing about 1 in 10.

However, this is probably wrong, as Pokemon is a game with higher variance, which would lead to a tighter rating distribution than chess (meaning fewer people at 80%+ and a bigger relative change from increasing reqs). Maybe I could do better with the real ladder GXE distribution, but I'm not sure exactly how to find it. Suffice to say 1 in 10 is probably a low estimate.
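
This kind of estimate can be sketched in a few lines. The code assumes the commonly quoted GXE formula (with rating deviation set to 0, as in the post) and a normal rating distribution. The mean of 1500 and standard deviation of 300 are placeholder values I picked for illustration, not the Lichess sample or real ladder data, so the outputs will not match the post's figures exactly.

```python
import math


def rating_for_gxe(gxe_percent: float, rating_deviation: float = 0.0) -> float:
    """Invert the assumed GXE formula to get the Glicko rating for a given GXE."""
    if not 0.0 < gxe_percent < 100.0:
        raise ValueError("gxe_percent must be strictly between 0 and 100")
    scale = 400.0 * math.sqrt(
        1.0 + 0.0000100724 * (rating_deviation ** 2 + 130.0 ** 2)
    )
    p = gxe_percent / 100.0
    return 1500.0 + scale * math.log10(p / (1.0 - p))


def share_below(rating: float, mean: float = 1500.0, sd: float = 300.0) -> float:
    """Normal CDF: fraction of players rated below `rating` (placeholder mean/sd)."""
    return 0.5 * (1.0 + math.erf((rating - mean) / (sd * math.sqrt(2.0))))


def share_of_qualifiers_lost(old_gxe: float, new_gxe: float,
                             mean: float = 1500.0, sd: float = 300.0) -> float:
    """Fraction of players above old_gxe who fall short of new_gxe."""
    old_share = 1.0 - share_below(rating_for_gxe(old_gxe), mean, sd)
    new_share = 1.0 - share_below(rating_for_gxe(new_gxe), mean, sd)
    return (old_share - new_share) / old_share


if __name__ == "__main__":
    for sd in (300.0, 200.0):
        print(sd, round(share_of_qualifiers_lost(80.0, 82.0, sd=sd), 3))
```

A narrower standard deviation makes the same 2-point bump cut a larger fraction of qualifiers, which is the direction of the higher-variance caveat.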
 
If people are concerned that suspect test requirements are too easy, adopting ELO as the primary metric of skill seems like the clear solution. I'm surprised that only Pais has suggested it so far. After skimming through the latest suspect thread, I noticed that some players qualified at around the low 1600s. This means they played most of their games in low ladder, which doesn't seem very reflective of understanding the metagame where the suspected pokemon is at its most impactful, especially since 1600 ELO isn't even top 500.

On the other hand aiming for something like 1900 ELO would provide a more significant and reliable indicator that the player is well-versed in the metagame. Reaching that level not only demonstrates skill but also shows that the player has likely faced the suspected pokemon more frequently, allowing for a more informed judgment.
 
If people are concerned that suspect test requirements are too easy, adopting ELO as the primary metric of skill seems like the clear solution. I'm surprised that only Pais has suggested it so far. After skimming through the latest suspect thread, I noticed that some players qualified at around the low 1600s. This means they played most of their games in low ladder, which doesn't seem very reflective of understanding the metagame where the suspected pokemon is at its most impactful, especially since 1600 ELO isn't even top 500.

On the other hand aiming for something like 1900 ELO would provide a more significant and reliable indicator that the player is well-versed in the metagame. Reaching that level not only demonstrates skill but also shows that the player has likely faced the suspected pokemon more frequently, allowing for a more informed judgment.
I mostly agree with what you said, but I personally think it would take way too long, and good players or tournament players probably won’t bother at that point. For balance users, it could take 8 hours or more (I don’t get how you can play balance for suspect reqs, but whatever), and hitting 1900 is still a lot if you have to reach it from the 1000s every time there’s a suspect. That being said, this is definitely the best idea I’ve seen so far (I didn’t even notice Pais mentioned it…). Also, laddering takes a lot of practice to get good at, and there are solid players stuck around 1775/1800 who definitely deserve the right to vote. I still support your idea, though.
 
I, too, am open to boosting the minimum GXE up a bit in order to raise the floor. I agree with Finch that jumping to 82 is too much too quickly; if the needle still needs to move in a direction, then I think we can reevaluate later.

That being said, I have always been a big proponent of giving less experienced users and players the opportunity to be involved with our process and grow as players/contributors. While those opinions should not be tier defining, I do think having additional options to attain reqs is a great route, as there are skilled players who don't mesh well with laddering culture due to stamina and time-related issues. I mentioned internally that I think some vein of suspect tournament would be a decent angle, and although that's been debated a decent deal, I think having ELO or COIL as a potential set of alternatives would also be a great way to leverage competent players/metagame thinkers who struggle with stamina while not lowering the skill floor per se. I think people overvalue GXE relative to reqs and in general given how testing teams is standard culture. I can't tell you how many people I've seen tout 70 GXE or whatever as a sign that players suck, when literally that can be influenced by a myriad of factors beyond just winning and losing.

I think ELO as a metric in general is underrated, since your ranking will often plateau relative to your experience. A lot of people around 1600 or so tend to be at least familiar with the metagame, as it's this point where players begin to actually engage with it beyond just it being the highest usage tier. Once you hit the 1700s - 1750s, you start to tread into high ladder where people are beginning to show experience, and then once you hit the 1800s/1900s I'd argue you're at a point where you know what's going on without question. As is, the GXE system is efficient and a decently objective metric for testing player skill and stamina, but I would argue is not really an amazing way to test for experience and moreso rewards players who can win in long stretches and who are lucky enough to run into more inexperienced opponents. This is also why I think the current GXE system is fickle, since your runs tend to end right around where the ladder starts to get serious.

I also think it's telling that there are strategies around getting reqs that in part have to play around random bullshit in low-mid ladder and running into opposing reqs-getters. In fact, this is literally what I've done to get reqs for the past 2 years. Across the dozen or so votes I've done, I've literally designed teams that are honestly pretty bad from a holistic pov and mostly excel at farming low ladder before pivoting to a ladder farming team, which to me just feels bad and undermines the point of reqs in the first place. Alternate systems that use methods like COIL or ELO focus specifically on your experience with the tier by measuring where you are relative to the average player with middle of the road experience. The main issue I personally see is time and availability, but that's why we're here to discuss what that sweet spot would look like; it's also why I favor the idea of not outright abandoning the GXE system and having multiple ways of qualifying, since a major appeal of it is its time efficiency, while other systems focus specifically on your skill under higher level pressure.

TL;DR: I'm fine with increasing the floor of GXE a bit to make it more effective at weeding out less experienced players, but I think GXE as a sole measurement for reqs is not super desirable both because it can punish experienced players with a different set of limitations to other experienced players, and because GXE can be a fickle way of measuring metagame knowledge. It's important to provide more options than to take, so long as we're deliberate about what those options are as to keep reqs competitive and test adequately for metagame knowledge.

Also people who unironically think there should be alt limits: respectfully, please abandon that argument. There's no real way to measure who hits the limit and who doesn't, and it just seems like it punishes people who get dealt a bad hand as opposed to people who shouldn't be voting anyway. These players will have their accounts fumble around when they become unsalvageable regardless.
 
If people are concerned that suspect test requirements are too easy, adopting ELO as the primary metric of skill seems like the clear solution. I'm surprised that only Pais has suggested it so far. After skimming through the latest suspect thread, I noticed that some players qualified at around the low 1600s. This means they played most of their games in low ladder, which doesn't seem very reflective of understanding the metagame where the suspected pokemon is at its most impactful, especially since 1600 ELO isn't even top 500.

On the other hand aiming for something like 1900 ELO would provide a more significant and reliable indicator that the player is well-versed in the metagame. Reaching that level not only demonstrates skill but also shows that the player has likely faced the suspected pokemon more frequently, allowing for a more informed judgment.
This is my stance as well. Even if, say, just hitting 1900 is "easier" because it requires a less perfect record, you actually have to fight a few decent players on the way, and you are more likely to see teams that are indicative of the real meta. At least half of the suspect test in the GXE method is just wading through unserious teams and random players, and I don't see how that proves any real understanding of the tier (which is what reqs should be based on). During the Gouging suspect I fought 5 monotype teams and 2 hard trick rooms. I won easily, and I don't think that kind of play proves anything.

Additionally, I agree with what ausma said about more options being better than fewer, as well as metagaming low ladder being undesirable but effective in the current system. I spammed balance with bulky chip mons that weaker players struggle to play around to farm low ladder. I did not use my main teams because they are less consistent into off-meta cheese. That should not be the standard.
 
i have skipped the majority of suspects this gen as i often find them frustrating and incredibly time consuming. gxe at low elo battles is a nightmare as one early loss (pretty much any game before 20) means you are probably better off tossing the account and this can happen due to luck or inattention pretty easily. i dont really see what raising gxe requirement does, gxe is mainly gained by luck and patience and all you’re doing is eliminating a lot of 28-2 runs for fun. this system already doesn’t require skilled players and doesn’t expose players to the relevant metagame as you will spend most of your run fighting PU pokemon, raising gxe doesn’t make it “harder” or cut off new players it just makes it more frustrating for everybody.

my preferred solution is absolutely something based around elo or coil, preferably elo. elo would reduce the need to toss accounts with poor starts and force players to actually engage with the tier. the current system can create a short suspect process but often leads to a bunch of wasted time with bad gxe pulls or early losses to luck. elo would let you ride out poor starts and bring riskier teams that create faster games.

i think hitting 1850/1900 elo is perfectly reasonable for skilled players, not sure if there should be an additional gxe/coil requirement but i lean towards no. there’s definitely also a way to allow people who have been above that cutoff for a long time to save some time if they remain around there, this could be tracked via surveys, but that’s a different discussion i suppose.
 
Possibly just echoing some of the points above, but it is the best way to push something through

Using GXE in the way we do for suspects seems inherently flawed. It's meant to be a prediction of winrate vs a theoretical average player, but it's clearly meant to be used over a large sample size, and the idea is especially broken when you're facing top level opponents on fresh 1000 suspect accounts, as you will during the suspect process. The idea is predicated on 1) facing a diverse range of opponents from low to very high skill and 2) those opponents actually playing on accounts reflecting their true ranking, so the fact that everyone is on reset accounts defeats the entire purpose.

We end up in a situation (especially in old gens here tbh) where you just need to avoid fellow top players for your first 20 or so games, because any loss to someone at 1000 or just above curses your account. You end up rewarding people who can put the time in to run multiple accounts and find a dream run, and this doesn't always overlap with the people you want to be voting.

I'm a strong advocate for having a high ELO requirement and a lower GXE requirement. GXE over a small number of games is meaningless and makes the suspect process needlessly match-up reliant in the early stages.

I would love to see this adopted for SV, and if successful, porting across to old gens also.

(I'm also an advocate for reqs carrying over in some instances, like sometimes it is just common sense but not rly going to die on this hill rn)
 
The problem with gxe reqs is that they essentially measure consistency on low ladder. If you have a gxe below the cutoff with a low rating deviation, then raising your gxe is harder than making new alts until you get reqs as early as possible, which is essentially the problem being posed currently. Coil gets around this by allowing a lower gxe across a larger sample size, which is not what's currently desired. Introducing an elo requirement alongside gxe reqs measures both consistency and ability to reach high ladder, which is probably a better test of skill (gxe is more useful here, as its sample size is however long it takes you to hit the elo threshold: if you hit it very quickly your gxe is likely very high, and the longer it takes, the closer you will likely be to the floor). The difficulty is in where you set the floor for each of these.

Because raising gxe with a low rating deviation is more difficult than maintaining a gxe above the threshold with a low rating deviation, early resetting is still incentivised if you set the threshold too high, which I personally just think is kind of tedious and not really helpful in determining someone's competency. But if it's too low, then it essentially becomes a pure elo requirement, which can be brute forced a bit too easily over a large sample size. My guess is that allowing slightly lower gxe ratings than what's allowed currently is probably fine, or you could potentially allow lower gxe ratings at a higher elo floor (ie 80% at 1800 or 78% at 1900).
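
As a sketch of how such a tiered rule could be checked, here are the two example pairs from the paragraph above (80% at 1800 or 78% at 1900). The tier values come straight from that example; the data layout and function name are made up, and nothing here reflects an actual council policy.

```python
from typing import Sequence, Tuple

# (minimum Elo, minimum GXE percent) pairs; values are the post's examples only.
EXAMPLE_TIERS: Tuple[Tuple[int, float], ...] = ((1800, 80.0), (1900, 78.0))


def meets_reqs(elo: float, gxe: float,
               tiers: Sequence[Tuple[int, float]] = EXAMPLE_TIERS) -> bool:
    """True if the account satisfies at least one (Elo floor, GXE floor) pair."""
    return any(elo >= min_elo and gxe >= min_gxe for min_elo, min_gxe in tiers)


if __name__ == "__main__":
    # Hypothetical accounts, purely for illustration.
    for elo, gxe in ((1800, 80.0), (1900, 78.0), (1850, 79.0), (1650, 85.0)):
        print(elo, gxe, meets_reqs(elo, gxe))
```

The point of the pairing is that a high GXE alone (the 1650 example) no longer qualifies, while a higher Elo buys a little slack on GXE.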

The elo requirement would probably need to be set at a certain ladder cutoff (ie top 20, top 100). I'm unsure whether it's better to take this from immediately before the test begins or from a previous suspect ladder. The latter more closely simulates the activity levels, but the cutoff would presumably just rise every time as people try to hit a specific elo floor, so it is unfeasible. Wherever the cutoff lies is somewhat arbitrary, but it could be adjusted based on how it functions in practice until the right balance is found.

Increasing the minimum games with gxe could also work but it's basically the same as the current system except you have to play more games which doesn't seem like it'd appeal to most people.

Ultimately suspect tests are a kind of weird intersection between the tours community and more casual playerbase regardless of how they function and you will never have a perfect system but I do think moving away from reqs being basically solved as resetting until you get the perfect run would be positive and I think this is the only real way you can accomplish this.
 
I still don’t quite understand why everyone has to settle on one method only. Being able to use multiple methods, such as GXE or ELO or COIL, all with somewhat elevated difficulty requirements, always seemed like a neat idea to me. It lets you get reqs in the way that works best for you personally, and the difficulty is high enough to ensure the pool of voters is competent.
 
Both GXE and Elo have their merits as potential requirements, for different reasons. GXE is a good indicator for a player's consistency, but not necessarily higher ladder experience. Conversely, Elo is a good indicator for higher ladder experience, but not necessarily consistency. I see a COIL requirement as the best option, because it combines the positive elements of GXE and Elo in one system.

For those unfamiliar with COIL, it is calculated by using three variables. Two of these variables are in the player's control, namely GXE and the number of games played. A higher GXE and a higher number of games played both improve the player's COIL (in a similar way as you'd go about raising your Elo, but with some key differences). The third variable is the B value, which is a parameter the council would pick before the suspect test. The higher the B value, the more the number of games played improves COIL. Most of the issues raised in this thread can be addressed by using COIL and picking the right numbers for both the COIL required and the associated B value.

Issues with GXE reqs
  • Players can get reqs with just 30ish games by farming low ladder, with Elo in the 1600s, without any higher ladder experience.
  • Players can get reqs by creating numerous new alts and going for a lucky winstreak.
Using COIL, players are rewarded for playing a higher number of games and subsequently breaching the higher ladder (the pros of using Elo). This makes resetting by creating new alts counterproductive. If these issues were to persist, they can be addressed by raising the B value, which increases the weight of a higher number of games played, as opposed to GXE.

Example (B value of 4, 3100 COIL required):
jZlc7PR.png


Issues with Elo reqs
  • Inconsistent players could get reqs even with low GXE by spamming games and getting a lucky winstreak (by using cheese strategies, for instance).
  • The minimum number of games played would be higher than before and could be regarded as tedious.
As COIL calculation uses GXE, a certain baseline of consistency is required (the pro of using GXE). A high number of games played improves COIL, but inconsistency is still punished. While it's bound to be harder than when using just GXE, it's still possible for talented players to get reqs with a relatively low number of games. If these issues were to persist, they can be addressed by lowering the B value (and increasing the COIL requirement), which increases the weight of GXE, as opposed to the number of games played.
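As a rough illustration of how the B value and the COIL requirement trade off, the sketch below works out the fewest games needed to hit a target COIL at a given GXE. It assumes the commonly cited formula COIL = 40 * GXE * 2^(-B/N), which isn't spelled out in this post, and the names `coil` and `min_games` are made up for the example.

```python
import math
from typing import Optional


def coil(gxe: float, games: int, b: float) -> float:
    """COIL under the ASSUMED formula 40 * GXE * 2^(-B / N)."""
    return 40.0 * gxe * 2.0 ** (-b / games)


def min_games(gxe: float, b: float, target: float) -> Optional[int]:
    """Fewest games at which a player holding `gxe` reaches `target` COIL.

    Returns None when the target is out of reach at that GXE, because
    COIL can only approach (never hit) its ceiling of 40 * GXE.
    """
    ceiling = 40.0 * gxe
    if ceiling <= target:
        return None
    # Solve 40 * GXE * 2^(-B / N) >= target for N.
    return math.ceil(b / math.log2(ceiling / target))


if __name__ == "__main__":
    # The two (B, COIL) pairs used as examples in this post.
    for b, target in ((4.0, 3100.0), (3.0, 3140.0)):
        for gxe in (78.0, 80.0, 85.0, 90.0):
            print(b, target, gxe, min_games(gxe, b, target))
```

The `None` case is the consistency baseline in action: below a certain GXE, no amount of spamming games gets you there.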

Example (B value of 3, 3140 COIL required) (same example used by shooting star):
Dp2Gwv6.png


I am partial to using COIL because it combines pros of both GXE and Elo and grants the council a lot of flexibility with the ability to tinker with both the COIL requirement and the associated B value to ensure a proper voterbase. The key here is to find a sweet spot and pick the right values for the job. (These values can be decimals, by the way.) Because COIL combines elements of both GXE and Elo (and even uses GXE directly in its calculation) and even eliminates some of their issues, I don't believe it's necessary to combine COIL with other methods.
 
Feel free to correct me if I'm misunderstanding something, but it seems to me that most people would approach COIL reqs by looking at the table of gxe required across a sample size of games and treat it the same way they currently treat reqs, which basically means aiming to get reqs within the first 30-50 games and instantly resetting any run that loses within the first 20 or so games. I just fail to see how this disincentivises people from resetting early. I don't wanna fixate on the specific numbers in the above tables too much, but if it's gonna take me 100 games to qualify at 80% gxe instead of 50, if anything I feel more incentivised to reset early because I now want my gxe to be higher early. If you instead require 80% gxe after 50 games, as is the case now, with a lower threshold after 100 games, then you're making reqs easier over a large sample size, which doesn't seem to be desirable based off people feeling reqs are too easy now (and is one of the issues that people cited with COIL in the past). With that said, the examples in your post would make reqs more difficult while being slightly more forgiving than just raising the gxe threshold, and they keep the time requirement at a pretty similar level, so it isn't necessarily a bad option; it's just functionally almost the same as the current system.
 
This thread is interesting to me because it's not trying to improve suspect reqs, but just make them harder. I sympathize with that, since I've also always held the opinion that suspect reqs should be much harder than they currently are, but as I was thinking about how I'd improve them (or even why I think that way) I realized that that's an impossible question to answer, since I don't know what the purpose of having suspect reqs even is. You improve a sword by sharpening it, but to know that, you first need to know that swords are meant to cut things.

I saw blunder's newest video where he got reqs, copied his team and hopped on the ladder cause I was bored, and after getting the reqs I'm even more confused about what the purpose of even having suspect reqs is. The tiering policy framework doesn't answer the question of what suspect reqs are supposed to do and I couldn't find anything anywhere on the forums that does either, so I suggest we do it now.

For this thread to be productive we should determine the answer to the question "What are suspect reqs supposed to do?" and then optimize the suspect test process based on that.

Since I don't believe the purpose of suspect reqs has ever been written down, I'll kickstart the discussion using conjecture. The 3 reasons I feel exist in the zeitgeist as justification for reqs are (feel free to provide more reasons if you feel like I missed any):
  1. to make sure the people voting demonstrate mastery over the tier
  2. to make sure the people voting are familiar with the tier, to be able to think through the consequences of the suspect
  3. to gatekeep the lowest common denominator but allow anyone with a modicum of investment in the tier to be able to vote
1. If it's supposed to demonstrate mastery, then it fails miserably at that, since a cursory glance at other posts in the suspect thread shows me that most people (myself included) ended up around 1600-1700, with a few outliers in the low 1700s. If your spl teammate said he was gonna test his SV OU team on the ladder and you saw him playing at 1600 you would think he lost his fucking mind. No real Pokémon is getting played that low. And even if it was, you're only there for the last 5 or so games; the bulk of your games are even lower.

2. If it's supposed to make sure the people voting are familiar with the tier, then it fails even harder, cause the majority of your games happen in a separate realm from the tier known as SV OU, as you and I understand it. I wish I had saved all my replays to be able to check the exact numbers for everything else, but since I played all my games in a row I noticed some insane stats as I was playing: in 38 games I played against exactly 1 Raging Bolt, 1 Landorus-T, 1 Glimmora, 1 Iron Moth and 1 Kingambit (which was on a monodark team). In no world where you play 40 serious games of SV OU are you walking away with these numbers. I legitimately forgot RAGING BOLT was a mon in the tier cause it took me 35 games to run into one. Frankly, what the fuck did you learn about how SV OU plays with or without Kyurem by playing 40ish games, 30 of which are against people that run Harvest Sitrus Berry Trevenant on their team?

3. And if it's just supposed to block out the lowest common denominator while still allowing anyone minimally invested to participate in the democratic process, then that seems kinda silly to me, but by all means, let's optimize for that, because it doesn't even do that particularly well. A guy like daddybuzzwole clearly cares about the tier since he posts about it 24/7, but can't make the reqs under the current structure cause GXE is really punishing if you lose a game early, which at that skill level might happen at any time since both players are just clicking random buttons.

As an addendum, I believe Pais and others in this thread are correct, and the same solution, with slight tweaks, creates the optimal reqs for each of those 3 scenarios: minimum Elo requirements.

If we decide that to vote you have to demonstrate mastery: Set a high Elo requirement, for example 1900, and attach to it a minimum GXE, which other people that are better at math than I am can calculate. Getting this is pretty hard, so it'll demonstrate that the players that got it are good at the game.

If we decide that to vote you have to demonstrate tier knowledge: Set a high Elo requirement, for example 1900, without any GXE component. Getting this isn't as hard, but it will require one to play lots of games against real teams, thereby resulting in knowledge of the tier. (still harder than the status quo btw)

If we decide that to vote you need to demonstrate you care about the tier: Set a relatively low Elo requirement, for example 1750/1800 (there are over 200 people above 1750 and over 100 above 1800 on the ladder rn, and those numbers would definitely be higher if these were the requirement for a suspect), without a GXE component. This isn't particularly hard, anyone that can memorize the type chart can do it, but it would take a while and demonstrate a commitment to the tier.
 