I feel, and I'm sure others concur, that the current system for determining tiers is substandard. Many people with voting privileges don't necessarily have a firm grasp of our current metagame or Smogon's philosophy. That said, I don't believe Bold Voting will do much better, since it can only filter the votes in the current testing process through the arbitrary judgment of whoever chooses which votes count. For these reasons, I'm proposing a new way of testing suspects. Voting will still be based on merit, and I feel this process will take a lot of the bias out of it. The testing system will require work to implement, but that work is necessary in order to filter out the bias.
Setup:
1. Everyone who wishes to partake in the process must register their account and decide whether the suspect is OU or Uber before the testing commences. They will then be separated into two groups, OU and Uber, based on their decision.
2. People in the Uber group must use the suspect.
3. People in the OU group may not use the suspect.
4. Testing will commence for a period of time on a different, suspect ladder (a different server or an honor system clause may be necessary).
5. If a player wishes to switch their decision, they must use a new account with a blank rating. Obviously, each person may only have one active account at a time.
6. We will use the statistics on the ladder at the end of the testing to determine the suspect's tiering. It will be deemed Uber if there are significantly more top players in the Uber group than in the OU group (more than a 2:1 ratio).
(Optional) Deviation requirements for voting to promote battling.
This testing method cements the implied definition of Uber that was unverifiable before: a suspect is considered Uber if its usage generates a decided advantage over an opponent who is not using the suspect, even when said opponent has full knowledge of the presence of the suspect on the opposing team and tries their best to inhibit the suspect from fulfilling its intended role. This definition should fit all Ubers and all previous suspects voted into Uber. The question "How much of an advantage makes it Uber?" will be answered by the arbitrary margin set in Step 6.
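To make Step 6 concrete, here is a minimal Python sketch of how the final decision could be read off the ladder. It is only an illustration under my own assumptions: the `Entry` record, the `decide_tier` name, and the `top_n` cut-off for who counts as a "top player" are hypothetical, and the only number taken from the proposal is the 2:1 margin.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    group: str     # "Uber" or "OU", fixed at registration (Step 1)
    rating: float  # final rating on the suspect ladder


def decide_tier(ladder, top_n=10, ratio=2.0):
    """Return "Uber" if Uber-group players outnumber OU-group players
    among the top_n ratings by MORE than ratio:1, otherwise "OU".

    top_n is an assumed cut-off; the proposal does not fix one.
    """
    top = sorted(ladder, key=lambda e: e.rating, reverse=True)[:top_n]
    uber = sum(1 for e in top if e.group == "Uber")
    ou = sum(1 for e in top if e.group == "OU")
    # Strict inequality: exactly 2:1 is not enough to call it Uber.
    return "Uber" if uber > ratio * ou else "OU"


def make_ladder(n_uber, n_ou):
    """Toy ladder: Uber-group players rated above OU-group players."""
    uber = [Entry(f"u{i}", "Uber", 1700 - i) for i in range(n_uber)]
    ou = [Entry(f"o{i}", "OU", 1600 - i) for i in range(n_ou)]
    return uber + ou
```

With this rule, 7 Uber-group players against 3 OU-group players in the top 10 gives Uber, while 6 against 4 gives OU. Changing `ratio` is how you would move the margin mentioned above.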
Why it works:
-By doing well on the ladder, you are essentially "proving" your vote. If the suspect is indeed broken, then those in the Uber group will have a decided advantage and will therefore reach the rank necessary to prove it Uber. The opposite case holds as well: doing well on the ladder against the suspect without employing it reinforces an OU vote.
-The system forcibly creates a faux centralization and gauges the suspect’s performance in a certain environment.
-Unlike the current system, there are a limited number of spots which matter, which gives motive for the battlers to do their best instead of simply meeting the requirements to vote.
-The test guarantees that everyone judging the suspect's tiering has to be in an environment where they are forced to play with or against the suspect.
-It gets rid of most of the self-interest, because in this process people can only prove or disprove that a suspect is Uber. For example, under the current system someone whose team is only weak to Skymin will vote it into Uber no matter what the circumstances, because it helps his team out. In this test, however, he can only benefit the testing process, because he is proving or disproving his opinion by winning or losing.
Foreseeable problems for this method of testing:
-Insufficient number of testers- If there are many more players in one group than in the other. In that case, take volunteers for the other side or pick testers at random to switch positions.
-Mass Conspiracy- If somehow the best players are lumped into one category while the worst are in the other, then the results would be obviously skewed.
-Human Error- If somehow the testers voting Uber do not use the suspect to the fullest extent, the results will understate it, similar to how Deoxys-S’s full potential was realized quite late into its usage period.
-Too difficult to implement- Unavoidable, that one is really a bummer =/
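For the "insufficient number of testers" case, the random switch could look something like the sketch below. This is just one way to do it and rests on my own assumptions: the `rebalance` function is hypothetical, and I am treating "balanced" as the two groups ending up within one player of each other, which the proposal itself does not specify.

```python
import random


def rebalance(uber, ou, seed=None):
    """Move randomly chosen testers from the larger group to the smaller
    one until the group sizes differ by at most one.

    Returns new (uber, ou) lists; the inputs are left untouched.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    uber, ou = list(uber), list(ou)
    big, small = (uber, ou) if len(uber) >= len(ou) else (ou, uber)
    n_move = (len(big) - len(small)) // 2
    # Assumes tester names are unique.
    for tester in rng.sample(big, n_move):
        big.remove(tester)
        small.append(tester)
    return uber, ou
```

Anyone moved this way would start on a blank rating, as in Step 5. Taking volunteers first and only drawing at random for the remainder would be a small change to the same idea.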
Thoughts, responses and criticisms are, of course, welcome. Anything thoughtful will only help the testing method.
Lastly, thanks to outofdashwz for helping me bounce around ideas and proofread the post, and to Obi/ipl for inspiring this idea.