dragontamer said:
That was a test, and now Deoxys is pending in the OU ladder environment as a result of that test. However, right now, by changing the ladder pokemon all of a sudden, you're essentially forcing the players to play with Wobbuffet without a test. The Ladder is the game, tests should not be conducted in the game. Hell, you're risking people's ratings on tests. This goes against every good virtue of a game.
When we had a tournament, practically everybody managed to write off the results for one reason or another. "All the players were bad." "Everybody was prepared for Deoxys-e." etc. People repeatedly stated that the ladder would be a better test (even though this was part two of the plan anyway). Furthermore, the tournament was a huge effort with a thoroughly disappointing turnout (and it took two and a half months). Something of that caliber is just not worth it if the results are going to be shrugged off by nearly everybody anyway.
As for ratings, one of the virtues of the Glicko-2 rating system is that your rating can quickly adjust if your ability suddenly changes. A rule change that drastically changes which players are the best would naturally result in rating adjustments, both when the pokemon is unbanned and when it is banned again (if this happens). You're already familiar with how the rating system works, so it seems like you're throwing this out here even knowing it's an empty concern.
I disagree that players should never have to do any testing. In essence, this is the best way to find out if Wobbuffet is broken. At the end of February, we can compare the number of pokemon whose cumulative usage percents add up to X% to the number from the month before (and I plan to do this this month for Deoxys-e). If we find the number to have decreased, and we find Wobbuffet to be common himself, we have hard statistical evidence that Wobbuffet is creating centralisation. On the other hand, if we find the number to have changed negligibly, or increased, or if Wobbuffet is not common himself, we know that Wobbuffet has either not affected or diversified the metagame. (As I said, I also plan to do this analysis to help decide whether to leave Deoxys-e banned.) This sort of statistical test simply cannot be done in a tournament because there is no before and after to compare. (You could compare to the last month of statistics, but this would be less than ideal because the tournament does not reflect the actual state of the metagame anyway, since a new pokemon has just been introduced.)
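To make the before-and-after comparison concrete, here is a rough sketch of how it could be computed. This is only an illustration, not the actual ladder statistics code: the function names, the usage figures, the 50% value for X, and the 10% cutoff for "common" are all made up, and it assumes usage is available as a raw count per pokemon.

```python
def pokemon_needed_for_share(usage, threshold):
    """Count how many of the most-used pokemon it takes for their
    combined usage to reach `threshold` percent of all usage.

    `usage` maps pokemon name -> raw usage count (assumed input format).
    """
    total = sum(usage.values())
    running = 0
    for count, uses in enumerate(sorted(usage.values(), reverse=True), start=1):
        running += uses
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= threshold * total:
            return count
    return len(usage)


def looks_centralising(before, after, name, threshold, common_share):
    """True if fewer pokemon cover `threshold` percent of usage than before
    AND `name` itself holds at least `common_share` percent of usage.

    `common_share` is a hypothetical cutoff for "common"; the post does not
    define one.
    """
    fewer = (pokemon_needed_for_share(after, threshold)
             < pokemon_needed_for_share(before, threshold))
    total = sum(after.values())
    common = total > 0 and after.get(name, 0) * 100 >= common_share * total
    return fewer and common


# Made-up usage counts for two months, purely for illustration.
month_before = {"A": 40, "B": 30, "C": 20, "D": 10}
month_after = {"Wobbuffet": 60, "A": 20, "B": 10, "C": 10}

print(pokemon_needed_for_share(month_before, 50))  # 2
print(pokemon_needed_for_share(month_after, 50))   # 1
print(looks_centralising(month_before, month_after, "Wobbuffet", 50, 10))  # True
```

With real numbers you would also want to decide in advance how large a change counts as "negligible", since the count will wobble a little from month to month regardless.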
The main role of the tournament in the case of Deoxys-e was to establish whether there was a chance he might not be broken. In the case of Wobbuffet, many top players (some of whom have been testing Wobbuffet) already agree that it might not be broken, which is good enough for a full-scale test.
You do raise a legitimate concern, however. The ladder should not be a continual testing ground, or it stops being a stable competitive environment. That said, Wobbuffet is the last pokemon we plan to test on the ladder in the near future, so you don't need to worry about the ladder becoming nothing but a testing ground for possibly standard "ubers".