The suspect test process is drawing to a close and this thread is really covering two issues.
The stage 3 testing process is beginning, but it isn't completely defined in terms of exactly how things will work, so I am hoping we can work something out here.
Related to this is what will happen in future when we want to test something. The suspect test process has strengths and weaknesses, and now that we have some experience to fall back on, I think we can plan for a solid system that can be applied even in future generations.
I think it is important to understand how we will make decisions in the future before we decide how to approach stage three voting. Obviously we want stage three to be as final as possible, but how easily we will be able to revisit things should affect how comprehensive our testing needs to be before the end of this process.
So I am going to start from a philosophical level and then try to get more specific toward the end.
Ok, well, the goal of our tiering is to create a ruleset that results in a balanced metagame. The assumption is that balance results in a good game, and that a good game is our goal.
This goal trumps other considerations, such as what individual players want. We are essentially the caretakers of the rules of competitive pokemon, and we need to protect the integrity of the game from the personal preferences of current players on behalf of all future players. If we fail to do this, then in the future Smogon will lose its market, either to another pokemon site or to another game entirely. Seeking balance is the only way we know how to do this.
This is all well and good, but at some point we need to consider practicality. It is no good having a process that guarantees the best possible metagame if that process breaks down every time we try to implement it. The official server's failure to introduce Wobbuffet shows that philosophical purity is only worthwhile if people are able to accept it.
The point is we have to come up with a system that is philosophically pure but still acceptable to the general population. I think the suspect test process has been mostly successful in this regard.
But practicality goes beyond public acceptance. I think the general consensus at the moment is that the suspect test process in its current form, while more or less working, has just been too hard. The impression I get is that most people wouldn't want to go through this again. Perhaps part of that is because this is our first time through, and the second time would be easier.
So I am going to identify some areas where I think the suspect test is not adequate for the future, at least in its current form.
- Deciding suspects. We haven't really got an effective system for this; we just kinda wing it. People seem to be accepting it though, so this seems kinda minor.
- Voters. There haven't been enough at any point. It has been a big disappointment, and I would say that, of anything, this is the real problem with the suspect test process. But I also know how difficult it is to make the commitment to qualify. Several times I tried and gave up just because of the time needed.
- Timeframe. It has taken us a year. To be honest, I think a year is probably a reasonable timeframe to take an untested metagame and create something with a sense of finality about it.
- Effort. I feel so relieved this process is finally finishing, and I haven't even been heavily involved since the Deoxys test. People like Aeolus, Jumpman and Doug have really given a fuckload to this, and I wouldn't like to be the one to ask them to go through it again.
There have been some other, less tangible reasons for opposition to the process, such as suggestions that the tests are artificial and are cultivating biases for various reasons. These are hard to prove but are surely worth noting. I'll leave it to others to raise these in this thread.
The solution I have to issues 2 and 4 (voters and effort) and the claims of artificiality is that we need to test and hold votes based on the standard ladder. This will result in more voters, and it asks less effort of them, as they presumably already battle on the standard ladder. I was originally going to justify this by arguing that people are already accepting regular changes to the OU tier throughout this process, but then I realised we can just continue with the process as it is: whenever the status quo is changed, we change the standard ladder, and a month later we hold a vote based on standard ladder results within that month. What we need to do during this period is calculate two separate ratings for battlers on the OU ladder: the normal rating, which won't be affected at all, and a second "suspect rating", which would be reset to the default position at the beginning of each test. This way people wouldn't complain about losing their ratings, but they also wouldn't qualify without a lot of battling during the month.
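To make the two-rating idea concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: the rating formula is plain Elo used as a stand-in (not whatever the ladder actually uses), and the default rating, K-factor, class name and voting cutoff are all made up for the example.

```python
from dataclasses import dataclass, field

DEFAULT_RATING = 1500.0  # assumed default; the post does not give one
K = 32.0                 # assumed Elo K-factor, a stand-in rating system


def elo_update(winner: float, loser: float) -> tuple[float, float]:
    """Return new (winner, loser) ratings after one battle."""
    expected_win = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
    delta = K * (1.0 - expected_win)
    return winner + delta, loser - delta


@dataclass
class DualLadder:
    """Hypothetical OU ladder that tracks two ratings per battler."""
    standard: dict[str, float] = field(default_factory=dict)
    suspect: dict[str, float] = field(default_factory=dict)

    def start_suspect_test(self) -> None:
        """Reset only the suspect ratings; standard ratings are untouched."""
        self.suspect = {}

    def record_battle(self, winner: str, loser: str) -> None:
        """Apply one battle result to both rating tracks."""
        for track in (self.standard, self.suspect):
            w = track.get(winner, DEFAULT_RATING)
            l = track.get(loser, DEFAULT_RATING)
            track[winner], track[loser] = elo_update(w, l)

    def qualified_voters(self, cutoff: float) -> list[str]:
        """Battlers whose suspect rating reached the (hypothetical) cutoff."""
        return sorted(p for p, r in self.suspect.items() if r >= cutoff)
```

Because every battle feeds both tracks, nobody plays on a separate ladder. The only difference at voting time is that the suspect track started from the default at the beginning of the test, so old ratings alone can't qualify anyone.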
Also, for future generations I would recommend that we start with all suspects unbanned in OU, then test by subtraction rather than addition. I.e. test the stage 3 metagame first, and then test removing pokemon rather than adding them. This is because the philosophy states that we ban only when we are certain. Obviously we couldn't do this practically this time, but in future I think it would be a better idea, mainly because this way we force the changes we make into OU. If we unban a pokemon on the OU ladder and people just don't use it, we don't gain anything by testing. But if we ban something, then people are guaranteed to be testing what we want them to.
So when a new gen comes out, the process would be:
Theorymon OU and some suspects. This will probably be largely based on the rule of thumb that anything with a base stat total over 600 is probably uber.
Then allow all the suspects and test the metagame for a month or so to make a priority list of pokemon. This can be done by Badgeholders, PR members, whoever. Ideally, if a suspect is an obvious counter to another, then it should be tested after the pokemon it counters. Hopefully you won't get circles lol. Perhaps someone can come up with better details.
Then we have a suspect test for the first suspect for a month. If it is voted uber, we change the rules for OU.
We repeat this process.
As we repeat this process, whenever it results in a change to OU, we spend a month keeping track of scores on the OU ladder as though ratings had been reset at the start of the month. Then at the end of the month we have a vote whose result will be final.
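The ordering rule in the priority list step (test a counter after the pokemon it counters, and hope there are no circles) is basically a topological sort. Here is a small Python sketch using the standard library's `graphlib`. The function name and the input format are my own assumptions, and the suspects in the usage note are placeholders, not real counter relationships.

```python
from graphlib import CycleError, TopologicalSorter


def suspect_test_order(counters: dict[str, set[str]]) -> list[str]:
    """Order suspects so each is tested after the suspects it counters.

    `counters` maps a suspect to the set of suspects it counters.
    Raises ValueError if the counter relationships form a circle.
    """
    # TopologicalSorter expects node -> predecessors, which matches
    # "counter -> pokemon that must be tested before it".
    sorter = TopologicalSorter(counters)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        # args[1] holds the nodes of the detected circle
        raise ValueError(f"circular counters: {err.args[1]}") from err
```

For example, `suspect_test_order({"B": {"A"}, "C": {"B"}})` puts A first, then B, then C. If the input contains a circle, the function raises instead of guessing, and someone would have to break the circle by hand.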
I am unsure whether a supermajority is necessary. Remember, this pokemon would have been OU at first, so to be voted uber it needs to pass two votes, which will probably have reasonably different voters, as the second vote can be expected to be significantly larger.
Now that the suspect process is more or less finished, if we want to make changes we just need a method of deciding when to suspect test something. Once we have that, we can have a month of suspect testing, then a month of OU ladder with a second suspect rating, and then a vote. Now that we have the groundwork in place, I think the work needed should be seriously reduced; the hardest part will be deciding what to test. I don't have an elegant solution for this.
What I would also suggest is that if the stage three vote overturns the original vote, we set up a suspect rating on the OU ladder, as I suggested, for a month (or two?) after that vote, and then have an additional vote that will be the final decision on the positioning of that suspect. Hopefully that vote will be large enough to be conclusive (worst case scenario, the result has won 2 out of 3 votes [a supermajority!]).
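The "2 out of 3" point can be spelled out with a tiny Python sketch. The function and argument names are hypothetical, and it assumes each vote simply yields "OU" or "Uber".

```python
from typing import Optional


def final_placement(original_vote: str,
                    stage_three_vote: str,
                    additional_vote: Optional[str] = None) -> str:
    """Return the final tier for a suspect under the proposed rule.

    If stage three agrees with the original vote, that result stands.
    If stage three overturns it, the additional vote decides.
    """
    if stage_three_vote == original_vote:
        return stage_three_vote
    if additional_vote is None:
        raise ValueError("votes disagree; an additional vote is needed")
    return additional_vote
```

When the first two votes disagree, the additional vote has to match one of them, so whatever it picks has won two of the three votes.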
Have a nice day.