Inconsistency in the Suspect Process

Status
Not open for further replies.

ZoroDark

esse quam videri
is a Tiering Contributor

(written up by dcae, who should probably get access in case he wants to add something)

I would like to preface this by saying: I do not want any bans retroactively affected by this change. This is just a proposition for the future.

After the Dugtrio ban in BW OU, where a simple majority was used, I noticed that despite Smogon's general attempts to stay in line with precedent and remain consistent, our suspect process is still heavily inconsistent. Different gens and different tiers use different requirements. A quick glance at the blind voting forum shows this:

1. OU - 60%
2. UU - 51%
3. NU - 51%
4. RU - 51%
5. LC - 60%
6. DOU - 60%
7. PU - 51% to ban, 60% to unban
8. BW OU - 10/19 (no percentage specified in the OP)
9. Monotype - 60%

I believe it's ideal for this process to be standardized. There does not appear to be a reason for the variation across tiers, and the current system can leave suspect tests susceptible to outside biases based on the suspect in question.

The recent BW OU suspect test is a strong example of the lack of standardization. Since the last Excadrill suspect used a different process, I took a look at the initial Excadrill suspect test. That one required a 60% vote, while the recent Dugtrio ban required only 51%. Considering both were for a relevant old gen that is still heavily played in official tournaments, it is detrimental to the suspect process that two different requirements were used for the same tier.

Another example of this can be seen in LC, where the tests have seen significant inconsistencies:

1. Torchic - 60%
2. Vulpix - 60%
3. ORAS Drifloon - 60%
4. ORAS Diglett - 50%
5. ORAS Diglett + Gothita - 66%

The complete lack of consistency actually mattered here: Drifloon received 62% ban votes, which would have kept it in the tier had it been held to the same 66% requirement as the earlier Diglett + Gothita test. In fact, LC changed its requirements across three back-to-back suspect tests. Both of these situations highlight the importance of a standardized system.

Personally, I believe we should standardize on the 60% vote requirement rather than a 51% majority. In a hypothetical situation in which votes are tied at 20-20 in a field of 41 and the final vote secures a simple majority, the resulting ban is not as definite as it would have been with a 60% majority. A 60% requirement could also help minimize future unbanning of controversial suspects.
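
For anyone who wants the arithmetic in that hypothetical spelled out, here is a minimal Python sketch. The function name `min_ban_votes` is made up for this example. It assumes a simple majority means strictly more than half of the votes cast and that a 60% requirement means at least 60% of the votes cast, which is one reasonable reading rather than an official definition.

```python
from fractions import Fraction
from math import ceil, floor


def min_ban_votes(voters: int, threshold: Fraction, strict: bool = False) -> int:
    """Smallest number of ban votes that satisfies the threshold.

    strict=True  -> the ban share must exceed the threshold (simple majority).
    strict=False -> the ban share must reach the threshold (assumed for 60%).
    """
    exact = threshold * voters  # exact rational value, no float rounding
    if strict:
        return floor(exact) + 1
    return ceil(exact)


voters = 41
simple_majority = min_ban_votes(voters, Fraction(1, 2), strict=True)
supermajority = min_ban_votes(voters, Fraction(3, 5))

print(simple_majority)  # 21
print(supermajority)    # 25
```

So in a field of 41, the 20-20 split is settled by a single vote under a simple majority, while a 60% requirement needs 25 ban votes. With 19 voters, as in the BW OU vote listed above, the same function gives 10 for a simple majority and 12 for 60%.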

Implementing a standardized system has several benefits. It prevents goalpost moving, which stops suspect runners from potentially affecting the results with their own biases. It provides a consistent framework across the site, something Smogon has historically strived for. It also streamlines the suspect process and can potentially reduce tiering controversies. These benefits make standardization a desirable change.
 

Camden

Hey, it's me!
is a Battle Simulator Admin Alumnus
is a Social Media Contributor Alumnus
is a Senior Staff Member Alumnus
is a Community Contributor Alumnus
is a Tiering Contributor Alumnus
Congratulations, you figured out how to get me to post in policy. All you have to do is bring up LC's suspect history.

First of all, you have the numbers mixed up for LC. The Drifloon suspect was 50% and solo Diglett was 60%. I've always believed that 60% is the best number to use for suspects. The only reason the Drifloon suspect was 50% was because of macle.

Anyway, as tier leader I always believed that 60% was the perfect balance between having a sizable number of people in agreement but not getting carried away with needing too many votes so that nothing gets done. Honestly, simple majority is crap. 50% + 1, 51%, however you want to look at it, is not a healthy way to be deciding major changes for a metagame. It's not that 51% is too small, it's that 49% is way too large of a group for you to ignore in a suspect when the "victorious" group isn't much larger than that. In some suspects, that can be fewer than 10 people.

There really isn't much else to say about this topic. Feel free to argue 60% vs 66% if you want, but 50% just doesn't cut it for me. Also, regardless of what we end up deciding for suspects, whether you're banning or unbanning something, the % used to make a change should be the same. I don't understand why they would be different.
 

pokemonisfun

Banned deucer.
If this percentage is standardized then why not standardize the requirements too? And the length of suspect? And the process of choosing suspects? And the role of the tiering councils?

Suspect tests are not just the vote in PR and the percentage needed to ban or unban. A suspect test should, in my opinion, be a community coming together to discuss what they want their metagame to be.

I agree 100% with the OP that standardization brings the benefits they mentioned. However, keep in mind that having a pluralist attitude towards tiering also has benefits, as the different tiers can see what works and doesn't.

For example, if tiering was totally standardized, UU would never have been able to invent the "kokoloko" method, which is now widely used, to start tiering at the beginning of a generation, where Pokémon would be quickly banned at the start and then later reintroduced. And UU has just recently discovered how game limits for COIL requirements are antithetical to their purpose, since they wouldn't let scores converge. No doubt GameFreak will keep changing the game and mechanics, so our tiering system will likely have to keep up the pace as well - for example, many of the complaints in UU this generation stemmed from the fact that GF kept releasing Megas in the middle of the generation, which kept shaking our metagame up.

Standardization to some degree is of course necessary, since this is all under the banner of Smogon, so I don't at all disagree with the recommendation to find a common number. I just want us to remember that tiering is much more encompassing than a mere vote, and that having diversity in our methods is not necessarily undesirable.
 
After the Dugtrio ban in BW OU, where a simple majority was used, I noticed that despite Smogon's general attempts to stay in line with precedent and remain consistent, our suspect process is still heavily inconsistent. Different gens and different tiers use different requirements. A quick glance at the blind voting forum shows this:

1. OU - 60%
2. UU - 51%
3. NU - 51%
4. RU - 51%
5. LC - 60%
6. DOU - 60%
7. PU - 51% to ban, 60% to unban
8. BW OU - 10/19 (no percentage specified in the OP)
9. Monotype - 60%

I believe it's ideal for this process to be standardized. There does not appear to be a reason for the variation across tiers, and the current system can leave suspect tests susceptible to outside biases based on the suspect in question.

The recent BW OU suspect test is a strong example of the lack of standardization. Since the last Excadrill suspect used a different process, I took a look at the initial Excadrill suspect test. That one required a 60% vote, while the recent Dugtrio ban required only 51%. Considering both were for a relevant old gen that is still heavily played in official tournaments, it is detrimental to the suspect process that two different requirements were used for the same tier.
The precedent is giving as much independence as possible to all sections of our site. Fixing small inconsistencies across similar areas isn't a top priority, because every area has different userbases, needs and preferences. Consistency can be neat, but neatness alone isn't enough justification to force everyone to do the same.

Why should OU, a tier in which the list of available Pokemon only changes when new games or the rare event Pokemon are released, have the same standards as UU / RU / NU / PU, tiers that can see major changes on a monthly basis? Why should older gens, which have very inactive ladders but a very strong tournament presence, follow the same standards as current gens, where the least active ladder is more active than all ladders from RBY to BW put together?

Forced consistency isn't necessarily a good thing. There are some arguments in favor of keeping things such as the percentage of votes needed consistent across tiers, but precedent isn't one of them.

Another example of this can be seen in LC, where the tests have seen significant inconsistencies:

1. Torchic - 60%
2. Vulpix - 60%
3. ORAS Drifloon - 60%
4. ORAS Diglett - 50%
5. ORAS Diglett + Gothita - 66%

The complete lack of consistency actually mattered here: Drifloon received 62% ban votes, which would have kept it in the tier had it been held to the same 66% requirement as the earlier Diglett + Gothita test. In fact, LC changed its requirements across three back-to-back suspect tests. Both of these situations highlight the importance of a standardized system.

That's pretty much macle's fuckery in 2016. Why? Who the fuck knows, but afaik it's an isolated incident and way beyond my time as tiering admin. That aside, I agree that this kind of inconsistency is terrible and I'll make sure it doesn't happen again. However, forcing tiers to be consistent with their % doesn't mean we need to force all tiers to have the same %.


Personally, I believe we should standardize on the 60% vote requirement rather than a 51% majority. In a hypothetical situation in which votes are tied at 20-20 in a field of 41 and the final vote secures a simple majority, the resulting ban is not as definite as it would have been with a 60% majority. A 60% requirement could also help minimize future unbanning of controversial suspects.
It could also increase the chances of a controversial suspect not getting banned, which can force the majority of the voters to play in a tier with a Pokemon they said they didn't want and/or force tier leaders to do quick resuspects. 51% isn't as definite as 60%, but you don't risk going against the wishes of 59% of the voters. One option favors the status quo and potentially more stability; the other favors the majority of the voters.

Both options have their flaws and merits. Tiers like OU will always favor the status quo, as the tier rarely changes outside of teambuilding trends. Tiers like UU change on a regular basis: for example, Amoonguss went up last month, and that alone suddenly left the tier without one of the best checks / counters to some of the metagame's biggest threats. There's very little reason to go with the "pro status quo" option there, so simply following the wishes of the majority of voters is preferable.

It's really not a coincidence that the 4 tiers that can see monthly changes are the ones using 50% + 1.

Anyway, as tier leader I always believed that 60% was the perfect balance between having a sizable number of people in agreement but not getting carried away with needing too many votes so that nothing gets done. Honestly, simple majority is crap. 50% + 1, 51%, however you want to look at it, is not a healthy way to be deciding major changes for a metagame. It's not that 51% is too small, it's that 49% is way too large of a group for you to ignore in a suspect when the "victorious" group isn't much larger than that. In some suspects, that can be fewer than 10 people.

There really isn't much else to say about this topic. Feel free to argue 60% vs 66% if you want, but 50% just doesn't cut it for me. Also, regardless of what we end up deciding for suspects, whether you're banning or unbanning something, the % used to make a change should be the same. I don't understand why they would be different.
I don't understand how "49% is too big to ignore" is an argument against the 50% cutoff. With the other cutoffs you can ignore 59% and 65%, so by that metric alone they are worse. 50% has some flaws, but it's pretty clearly the better option in terms of not ignoring a large number of voters.
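
To put rough numbers on "how many voters can be ignored", here is a small Python sketch. The function name `largest_overruled_group` is invented for the example, the 19-voter field is borrowed from the BW OU vote mentioned in the OP, 66% is treated as two thirds, and the 60% and two-thirds cutoffs are assumed to be inclusive.

```python
from fractions import Fraction
from math import ceil, floor


def largest_overruled_group(voters: int, cutoff: Fraction, strict: bool = False) -> int:
    """Largest number of voters who can end up on the losing side."""
    exact = cutoff * voters
    # Minimum ban votes needed for the ban to pass
    needed = floor(exact) + 1 if strict else ceil(exact)
    pro_ban_overruled = needed - 1         # ban fails by a single vote
    anti_ban_overruled = voters - needed   # ban passes with the bare minimum
    return max(pro_ban_overruled, anti_ban_overruled)


voters = 19
results = {
    "50% + 1": largest_overruled_group(voters, Fraction(1, 2), strict=True),
    "60%": largest_overruled_group(voters, Fraction(3, 5)),
    "66%": largest_overruled_group(voters, Fraction(2, 3)),
}
print(results)  # {'50% + 1': 9, '60%': 11, '66%': 12}
```

Under these assumptions, a simple majority can overrule at most 9 of 19 voters, while the 60% and 66% cutoffs can leave 11 or 12 pro-ban voters without the ban they voted for.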
 
I agree with some of the above sentiments that different tiers have different needs and maybe shouldn't be forced to standardize. Standardization and consistency are nice things as long as they don't undermine the individual tiering needs.

If we were to establish a modicum of consistency across tiering, there are several ways to group tiers together.
1) Ubers is the "highest" tier with the lowest number of total bans, so it has the highest % required to ban: 66.6%
2) OU, LC, and Monotype are regular tiers with no drops affecting them; they should be mostly static but not too difficult to see change in: 60%
3) UU, RU, NU, and PU see way more frequent change, so a suspect shouldn't really be considered as out of the ordinary: 50% (+1)
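
As a rough illustration, the grouping above could be written as a lookup table plus a pass/fail check. This is only a sketch: the names `THRESHOLDS` and `ban_passes` are made up for the example, 66.6% is treated as two thirds, the 60% and two-thirds cutoffs are assumed to mean "at least", and "50% (+1)" is read as strictly more than half.

```python
from fractions import Fraction

# (cutoff, strict) per tier, following the three groups in the list above.
# strict=True means the ban share must exceed the cutoff (50% + 1).
THRESHOLDS = {
    "Ubers": (Fraction(2, 3), False),  # 66.6% treated as 2/3 (assumption)
    "OU": (Fraction(3, 5), False),
    "LC": (Fraction(3, 5), False),
    "Monotype": (Fraction(3, 5), False),
    "UU": (Fraction(1, 2), True),
    "RU": (Fraction(1, 2), True),
    "NU": (Fraction(1, 2), True),
    "PU": (Fraction(1, 2), True),
}


def ban_passes(tier: str, ban_votes: int, total_votes: int) -> bool:
    """Return True if the ban share clears the tier's cutoff."""
    cutoff, strict = THRESHOLDS[tier]
    share = Fraction(ban_votes, total_votes)  # exact ratio, no float rounding
    return share > cutoff if strict else share >= cutoff


print(ban_passes("UU", 21, 41))     # True
print(ban_passes("OU", 21, 41))     # False
print(ban_passes("OU", 25, 41))     # True
print(ban_passes("Ubers", 25, 41))  # False
```

The same 21-20 result bans a Pokemon in UU but not in OU, which is the trade-off between following the majority and favoring the status quo.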

If the above is worth applying, I'm not sure. As OU Leader I'm personally comfortable with establishing this so count me in if other leaders are on board as well. At the same time, I'm not sure it's totally necessary to force such a thing if you can have faith in all leaders to act in the best interest of their tiers.

As for the BW Dugtrio vote itself, ideally it would have been 60%. However, there was no formal unifying standard established beforehand so we can't really say it crossed any lines that would justify nullifying the vote. Going forward, I'd definitely like to see 60% required in all old gen tiering decisions.
 
We'll be using the values in ABR's post as the soft standards for current gens. Tier Leaders are allowed to use different percentages if they wish, as long as they justify them properly and plan to maintain their own standards long term; "pulling a macle" is strictly forbidden.
 