I said I'd make a new thread where I'd argue for a simpler rating/deviation basis for our suspect tests, but it doesn't really seem worth its own thread. It's just rehashing arguments made elsewhere, and this seems like a good enough place for that.
But first I am going to go back to what I have said in this thread earlier.
After reading the UU testing thread, I have decided that testing on the OU ladder the way I described probably won't work, because of this:
Finally, since they can't play with the suspect during this one month process, how are they supposed to know if its broken or not? The very idea of taking the suspects out compromises peoples opinions on the matter. Could you imagine if Jump and Aeolus made a post tomorrow saying "ok, to see whether or not you think Garchomp is broken, we are going to prevent you from using it"? That is what the current UU process is doing. Jabba and I propose removing this part of the process completely, since it is not only useless but it also goes against the entire idea of a suspect test.
So if we are testing by subtraction, and we are going to do so on the suspect ladder, it still doesn't really make sense to just test by not using a suspect. What I recommend doing is testing in a manner similar to the Garchomp test, except without the Deoxys thing. Basically we have all the suspects in OU, and all the suspects except the one we feel is probably the most broken on the suspect ladder. We require people to meet qualification requirements on both ladders and then we let them vote.
I mean, just looking back, it seems like the Garchomp test was the most successful test. It had basically a year of pretesting, and Garchomp was overwhelmingly the most used Pokemon in OU at the time, which helped. But it really did seem to go smoothly. Just to remind people, the Garchomp test required passing qualification requirements on both ladders.
However, if we have a suspect ladder, then we have fewer voters. We got 54 in the Garchomp test. I have some ideas for how we could get a few more, but these are more or less longer term things. I may get back to them later.
Also, the suspect ladder raises the issues of artificiality and bias. I think having the OU ladder requirement mitigates these somewhat. For instance, the worry that people who like the suspect metagame more are more likely to battle more on the suspect ladder is eased somewhat by the fact that those people will have to battle a lot in OU. However, unless we have a double rating, or reset people's OU ratings, people probably won't actually have to battle much at all on OU to qualify. Ultimately though, the suspect ladder is necessary and it is artificial. Perhaps other people have some suggestions, but based on what has gone on so far in this thread I doubt it.
Ok, there are also three other issues which have been raised that I didn't cover in this thread. We have two solutions to these issues which I am not a big fan of, so I am going to go over those now (i.e. the point of this post).
The first is bad faith voting. We are currently dealing with this with written paragraphs. I have a lot of issues with those, but I don't really feel the need to go into that at this point. Suffice it to say I think they are a bad idea.
Clearly we know that some people are going to vote with something other than a rule set that creates a balanced metagame in mind. But honestly, I know enough about the people in this community to know that those people are an extremely small minority. To be fair, when we have a very small pool of voters, an extremely small minority can appear quite large. Still, I don't think that, with the exception of the Shaymin vote, they have had a decisive impact on our testing at all, or have even been close to having one.
Now the Shaymin test is a special case. I'm pretty sure there were people who would have been able to qualify for that test without battling at all in the testing month. For example, there is one person I know who qualified to vote on Shaymin. I asked him if he really thought his vote truly represented his evaluation of Shaymin's uberness, and he replied no. But I also asked whether he would have voted the same way had we specifically asked him to vote in good faith, and he replied no to that as well.
I will always believe that if we explained to the community that when we ask them to vote we are asking them to protect the integrity of Smogon, 99% of them would vote based on Smogon's philosophy.
The next problem is idiots who qualify to vote. Firstly, I think that the people who qualify for suspect tests are probably generally pretty smart, but that they maybe just don't understand exactly what we are asking of them. Basically I think that if we had a well written guide explaining the responsibilities of suspect voters, we should be able to trust them. And then we just need a couple of failsafe measures to fix any freak accidents that might occur. I mean, if we do make mistakes it really isn't the end of the world. I think that by trying to foolproof the system we have actually made it a lot weaker.
The third is the issue of tests where people refuse to test. This to me is a symptom of testing by addition rather than subtraction.
Firstly, when you subtract, people have had a lot of time before the test begins to experience the suspect, which helps them hit the ground running during the testing period.
And teambuilding has become quite a big task over the years since RBY. It takes a lot of time and effort to build a decent team, so I think people will try to avoid it if they can. Even when people are eager to test well, if you add things instead of subtracting you are asking them to do a lot of work that they probably will feel isn't really necessary. They can put teambuilding off until later and just use an old team. When you remove things from the rule set, you kind of prevent that from happening. People have to build new teams because their old teams will be banned. And the suspect will still get tested because it is already in OU.
But even when you aren't testing by subtraction, I think if you set your deviation requirements well enough you shouldn't need suspect experience qualifications. If you have, say, 1 in 5 people using the suspect (so it would be roughly top 10 on our latest suspect ladder), and you want people to battle the suspect at least 20 times in order to have a reasonable go at experiencing it (just a random number that makes the maths easy), then set a deviation requirement that makes people battle 100 times, since 100 battles at 1 in 5 works out to about 20 meetings with the suspect on average.
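For anyone who wants to play with those numbers, here is a rough sketch of the arithmetic in Python. The function names are made up for this post, and it assumes every battle is an independent chance of meeting the suspect at the stated usage rate, which real laddering won't quite match.

```python
from fractions import Fraction
from math import ceil, comb


def battles_needed(target_meetings: int, usage: Fraction) -> int:
    """Battles required so the *expected* number of meetings reaches the target."""
    return ceil(target_meetings / usage)


def chance_of_at_least(meetings: int, battles: int, usage: float) -> float:
    """Chance of meeting the suspect at least `meetings` times in `battles` battles,
    treating each battle as an independent coin flip (an assumption, see above)."""
    return sum(
        comb(battles, k) * usage**k * (1 - usage) ** (battles - k)
        for k in range(meetings, battles + 1)
    )


# The example from the post: 1 in 5 usage, 20 meetings wanted.
battles = battles_needed(20, Fraction(1, 5))
print(battles)  # 100
print(chance_of_at_least(20, battles, 0.2))
```

The first number is the 100 battles from the example. The second is there because 100 battles only gets you to 20 meetings on average, so if 20 is meant as a hard minimum the battle count would need some padding.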
And about people not using the suspect. Some people can really find suspects broken while also finding them completely inappropriate to use (like Skymin on a stall team). If you let these people battle how they want, qualify if they can, and then vote even if they don't use the suspect, you still get a decent view of the impact of that suspect. If these people's opinions are not reflective of the community, then they can just be outvoted. If theirs is the view of the majority of people, and no one is able to convince them otherwise, then why not just accept their views? If those views turn out to be godawful, then when that becomes obvious enough we can fix it.
The only issue I see with that is an issue with our entire testing method: ladder battling requires slightly different strategy to tournament battling or wifi battling or whatever, and our rule sets shouldn't be entirely based on laddering. This is another form of bias inherent in our testing method, and my only solution is to mention it in the guide I suggested and hope that people think about it.
Ok, so about trying to find more voters (note this only works with subtractive testing). I think one thing we could do is use the top 50 on the suspect ladder, and by doing so guarantee having 50 voters (alts wouldn't count). This has another benefit: people won't stop battling after they qualify, since that could mean dropping off the bottom of the list. The other benefit is that when people battle on the ladder they essentially do so to try to climb it. I don't know enough about the ladder to really comment on this, but it seems like technically a different style of battling could be better for qualifying for suspect tests than it is for climbing the ladder, which is essentially adding more unnecessary artificiality to the test. But of course you need to have faith in your ladder. x-act has suggested that CRE is not ideal for Pokemon, and has another method that hasn't been implemented yet, but if that is a lot better (and I really just have no idea) then that would be a nicer way of finding qualified voters.
But of course that doesn't work with testing on two ladders. So either they have to make the top 50 on both ladders, which means you end up with fewer than 50 voters, or they only need to meet a rating requirement on the OU ladder (or perhaps the other way round, though that doesn't seem right) and then you take the top 50 on the suspect ladder out of all of those who met that requirement.
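To make the two options concrete, here is a minimal sketch of how picking voters could work. Everything in it is hypothetical: the `Entry` fields, the `owner` field used to spot alts, and the rating numbers are just for illustration and have nothing to do with how the actual ladder stores things.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    account: str   # ladder account name
    owner: str     # person behind the account (hypothetical, used to drop alts)
    rating: float  # ladder rating, in whatever system the ladder uses


def top_voters(
    suspect: List[Entry],
    n: int = 50,
    ou: Optional[List[Entry]] = None,
    ou_min: Optional[float] = None,
) -> List[str]:
    """Pick the top `n` people on the suspect ladder, one spot per person.

    If `ou` and `ou_min` are both given, a person also needs an OU account
    rated at least `ou_min` to be eligible.
    """
    check_ou = ou is not None and ou_min is not None
    best_ou = {}
    if check_ou:
        for e in ou:
            best_ou[e.owner] = max(e.rating, best_ou.get(e.owner, e.rating))

    voters: List[str] = []
    seen = set()
    for e in sorted(suspect, key=lambda e: e.rating, reverse=True):
        if e.owner in seen:
            continue  # an alt of someone already considered
        seen.add(e.owner)
        if check_ou and best_ou.get(e.owner, float("-inf")) < ou_min:
            continue  # did not meet the OU rating requirement
        voters.append(e.owner)
        if len(voters) == n:
            break
    return voters
```

Calling `top_voters(suspect_ladder)` gives the plain top 50, and passing `ou=` and `ou_min=` gives the second option, where the list is filled from the top of the suspect ladder by whoever met the OU requirement. Either way you only fall short of 50 if fewer than 50 eligible people laddered at all.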
So yeah, what I would like: tests by subtraction (similar to how the Garchomp test was done), voters decided entirely by battling ability, and then just having faith that if we make it clear to people what we want them to do, they will do their best to do it.
Have a nice day.