Would you play on a Smogon server if they had one on Shoddy?

Cathy · Feb 27, 2008

Warthog said:
My main point is that just because something isn't used as much doesn't mean it isn't still "uber".

I agree. In fact, Wobbuffet and Deoxys-E are both used plenty. I have never made this argument, and that is the first thing I say in my post:

Colin said:
I've never said anything like this. I have never said that if something is used more it is better than something that is used less. Nor have I said the reverse. Maybe somebody's said that, but it wasn't me, and that reasoning does not underpin the method I have been advocating.

Warthog said:
And I have a theory on why it isn't centralized in the metagame, and it's simply because it isn't accepted yet. Most respectable players have the same mindset, that the pokemon is cheap, and stay clear of it.

I responded to this already as well:

Colin said:
One objection that often comes up is: "Well maybe the pokemon isn't centralising because of X, Y, and Z and therefore it's still uber". Often X, Y, and Z take the form of social taboos. These people like to think of the Shoddy Battle ladder as just an approximation of a theoretical metagame where everybody is using the best sets on each pokemon, and the best pokemon. But the mistake here is that there is no reason to believe such a theoretical metagame exists: it is more likely that the metagame will in fact fluctuate endlessly, rather than approach a final "limit" as the best sets are discovered.

So how do I counter this problem? Instead of viewing "uberness" as an intrinsic property of pokemon, I consider it a function of the pokemon and the metagame it is being used in. There are some metagames where pokemon we now consider uber would not be uber. They are uber in our metagame, however. Rather than viewing an uber list as something that we discover and then let sit, I think it should be viewed as something that can be periodically revised as the metagame shifts.

Does the metagame actually fluctuate enough to warrant revisions? Probably not. But it might, and if factors X, Y, and Z are currently causing Wobbuffet not to be uber, then when those factors cease to exist it will be uber and it can be banned then.

I'm sorry to be so persistent about this, but you are really misrepresenting my opinion on these things. If that was not your intent, I apologise, but you did attribute that position to me (and it's not mine). I agree that the method you outline is naive. Fortunately, it's not a method I advocate.

Jumpman16 · Feb 28, 2008

gaudetjaja said:
Firstly, it isn't; secondly, it wouldn't be undergoing a test like that, as every sane person knows that Deoxys-E, Wobbuffet, and Wynaut do not belong in OU. They were ubers in Advance, and I don't see any reason why they would be less uber in D/P. Statistics can't determine a metagame; otherwise, Chansey would've been UU by now. Most people on Smogon, or any other sane competitive pokemon site for that matter, probably agree on this.

For at least the third time, the tests on Wobbuffet and Deoxys-S were not 100% initiated by Colin alone. He has been listening with an open mind to people like Obi, Amazing Ampharos, and myself from the beginning.

And just so we're clear on what that actually means: we would definitely be testing Wobby and DX-S on whichever Smogon battle sim was being used at the moment. That may change a few of your votes in this topic, as if it matters...

Jiggy-Ninja · Feb 28, 2008

Warthog said:
And I have a theory on why it isn't centralized in the metagame, and it's simply because it isn't accepted yet. Most respectable players have the same mindset, that the pokemon is cheap, and stay clear of it. If people didn't frown on it, and it therefore got more use and centralized the metagame, then I think we might actually come to a conclusion (even if we don't need one, I think the last generation serves as a template, and it told us Wobb needed to be a banned deucer).

Advance Gen is not D/P. Things learned from previous generations don't always automatically carry over to the next ones.

Colin has argued in the past that the psychological argument (that people do not use Wobbuffet because it is cheap) cannot be considered when making tiers, because the psychological argument can be twisted to justify literally anything. Instead, the assumption has to be made that in a competitive environment, people will use what is best in order to win. That's the only way any objectivity can be put into determining a Pokemon's uber status.

gaudetjaja · Feb 28, 2008

the Lexx said:
While Deoxys-E and Wobbuffet don't belong in OU in my opinion, tests don't hurt anything. My only issue is that there isn't a way to play on the ladder without them, and that the public is forced into testing.

Then why didn't he make a separate ladder with Deoxys-E and fat Wobba or something?

Chansey would be extremely broken in UU, and it would trash the diversity in the metagame. Have you ever played UU?

Have you ever put any effort into reading my post? I said that if the only reason something were uber or OU was that it was used more than other pokemon, Chansey would have been UU. Of course, this isn't the case.

Bologo said:
Dude, no offense, but this kind of post is the reason that this thread is rated 1 star.

You don't speak for everyone, and the kind of crap you were talking about in that post makes you sound like you've been playing DP for 3 days or something.

We had a poll for Deoxys-S a while ago. Over 50% of people voted to say that it should be unbanned. Are you saying that over 50% of people on Smogon are insane? There have also been a lot of people who believed that Wobbuffet should be unbanned, so it's being tested. Are all those people insane too?

Just because something was uber in ADV, that sure as hell doesn't mean that it's uber in DP. If you don't see how they could be any less uber, then maybe you should actually prove that instead of just spouting theorymon.

Because Wobbuffet requires no strategy to use and there's no way to counter it, except if all your pokemon know U-turn or something. Deoxys-E is maybe slightly less obviously broken, but a base 180 Speed and 90 or 100 in about every other single stat, as well as a crazy movepool, does seem a bit broken to me.

Warthog · Feb 28, 2008

Alright, I'm starting to agree with you guys more, because I'm finding so far that Wobb isn't as awesome as he was in D/P after testing him in a few battles last night. I'm starting to think, though, that he would fit just fine in an aggressive team and not in the control-type theme I was thinking of before (I used the old chaosbi team, if some of you guys remember that team, plus a few other additions). Not too sure about Deoxys-E yet, though; I have yet to test it, but from experience it has wrecked my teams. I still think, though, that Wobb has its place in an aggressive team, the 1-for-1 type of teams where you break down walls, etc.

Since I have a lot of time on my hands next week, I'm gonna mess around with Wobb/Deoxys-E to see if I can at least come to my own conclusions.

Jumpman16 · Feb 28, 2008

gaudetjaja said:
Deoxys-E is maybe slightly less obviously broken, but a base 180 Speed and 90 or 100 in about every other single stat, as well as a crazy movepool, does seem a bit broken to me.

Didn't Bologo just directly advise against spouting theorymon? The actual numbers concerning DX-S usage literally prove you wrong. It is 44th in weighted usage, by no means "broken". Its usage declined steadily after the initial boom when it was allowed, which is a clear indication that it indeed is not nearly as "uber" as people like you think it is. Don't post stuff like this when there are facts that can educate you whether or not you think you're right.

Slice n Dice · Feb 28, 2008

Warthog said:
Alright, I'm starting to agree with you guys more, because I'm finding so far that Wobb isn't as awesome as he was in D/P after testing him in a few battles last night. I'm starting to think, though, that he would fit just fine in an aggressive team and not in the control-type theme I was thinking of before (I used the old chaosbi team, if some of you guys remember that team, plus a few other additions). Not too sure about Deoxys-E yet, though; I have yet to test it, but from experience it has wrecked my teams. I still think, though, that Wobb has its place in an aggressive team, the 1-for-1 type of teams where you break down walls, etc.

Since I have a lot of time on my hands next week, I'm gonna mess around with Wobb/Deoxys-E to see if I can at least come to my own conclusions.

....sigh.

I agree. If they still don't believe that Deoxys-E and Wobb are broken by now, it is time to prove a point. No one will be laughing at Wobb > Bellypass Smeargle > Lucario.

Cynthia · Feb 28, 2008

Jumpman16 said:

Didn't Bologo just directly advise against spouting theorymon? The actual numbers concerning DX-S usage literally prove you wrong. It is 44th in weighted usage, by no means "broken". Its usage declined steadily after the initial boom when it was allowed, which is a clear indication that it indeed is not nearly as "uber" as people like you think it is. Don't post stuff like this when there are facts that can educate you whether or not you think you're right.

But usage isn't necessarily the same thing as power. A Pokemon can still be uber even if it isn't the most used Pokemon, as there can be various reasons why people don't use a particular Pokemon. For example, fewer people use Wobbuffet because they consider it cheap or unfun to play, but this does not mean it is actually any weaker.

umbarsc · Feb 28, 2008

Jumpman16 said:
Didn't Bologo just directly advise against spouting theorymon? The actual numbers concerning DX-S usage literally prove you wrong. It is 44th in weighted usage, by no means "broken". Its usage declined steadily after the initial boom when it was allowed, which is a clear indication that it indeed is not nearly as "uber" as people like you think it is. Don't post stuff like this when there are facts that can educate you whether or not you think you're right.

I thought the tiers worked something like this: OU, UU, and NU are defined by usage, then Ubers and BL exist to move Pokemon that are too powerful out of their respective environments, meaning that usage has absolutely nothing to do with whether a Pokemon is in OU or Ubers.

Personally, it's never crossed my mind to use (or prepare for) Deoxys-S, and I have only seen a couple, both of which were shit sets like Knock Off/Spikes/Recover/Psychic or something. I had forgotten it was still being tested even. The same goes for Wobbuffet, except I did make a half-assed Wobba team that I tested a couple of times. I think that for a testing period to be effective, it must be thrust upon the community, because frankly no one has been taking the Deoxys-S test periods seriously at all, and people have found no reason to change their current teams. I don't see how that proves that Deoxys-S is ineffective.

Cathy · Feb 28, 2008

-Cynthia- said:
But usage isn't necessarily the same thing as power. A Pokemon can still be uber even if it isn't the most used Pokemon, as there can be various reasons why people don't use a particular Pokemon. For example, fewer people use Wobbuffet because they consider it cheap or unfun to play, but this does not mean it is actually any weaker.

It's difficult for a pokemon to centralise anything if it isn't used quite a bit. However, the most damning evidence that these pokemon are not broken is that the number of pokemon that together account for the top 75% of usage has changed by only one over the last three months or so. See my other posts for more commentary.
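One way to read the "top 75% of usage" measure, as a minimal sketch: sort per-pokemon usage counts from most to least used and count how many pokemon it takes to reach 75% of all usage. The function name and the counts below are hypothetical placeholders, not real Shoddy data, and treating usage as a raw appearance count is an assumption.

```python
from typing import Dict


def pokemon_covering_share(usage: Dict[str, int], share: float = 0.75) -> int:
    """Count how many of the most-used pokemon it takes to reach `share` of all usage."""
    total = sum(usage.values())
    running = 0
    count = 0
    # Walk the usage list from most to least used until the threshold is reached.
    for uses in sorted(usage.values(), reverse=True):
        running += uses
        count += 1
        if running >= share * total:
            break
    return count


# Made-up counts for illustration only (not real Shoddy data).
example = {"Mon A": 50, "Mon B": 30, "Mon C": 10, "Mon D": 5, "Mon E": 5}
print(pokemon_covering_share(example))  # 2
```

If that count stays roughly flat from month to month, play is not being squeezed onto fewer and fewer pokemon, which is the sense in which the number is read as evidence against centralisation.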

Jumpman16 · Feb 29, 2008

-Cynthia- said:
But usage isn't necessarily the same thing as power. A Pokemon can still be uber even if it isn't the most used Pokemon, as there can be various reasons why people don't use a particular Pokemon. For example, fewer people use Wobbuffet because they consider it cheap or unfun to play, but this does not mean it is actually any weaker.

How do you know whether or not a pokemon is uber even if it is not "the most used Pokemon" in a metagame in which it has never been tested? How can you say for sure? We didn't have any data to go on regarding how DX-S worked in the standard DP metagame because it had never been allowed there. The data we do have now in no way points towards it being uber more than not uber, which is my point. If people are still playing to win, then you would expect a pokemon capable of dominating the game so much that it would overcentralize it to actually do so, and therefore not to be used less than 43 other pokemon that have proven themselves (with the dubious exception of Garchomp) not to overcentralize the metagame.

I also don't buy the "cheap or unfun" argument, because I have actually heard those same characteristics attributed to Garchomp and Blissey respectively, and that isn't stopping them from being the most overused pokemon in both weighted and unweighted play. In fact, going by all the threads and posts I've seen on Unchartered Territory and Stark Mountain since the fall of 2006, I am virtually positive I haven't seen any pokemon complained about more than Garchomp and Blissey with regard to cheapness and boringness, which goes a long way toward telling you how much stock we can place in mere talk if we want to use it to come to conclusions about what battlers actually use.

Dragontamer · Feb 29, 2008

umbarsc said:

I think that for a testing period to be effective, it must be thrust upon the community, because frankly no one has been taking the Deoxys-S test periods seriously at all, and people have found no reason to change their current teams.

Except that any sort of test is essentially a rule change in the game. Every untested aspect you throw onto the ladder makes it that much worse as a competitive environment.

The one thing I hated most about the Wobbuffet test was how it was conducted. It was a blatant slap in the face to the greater community, forcing everyone to become playtesters instead of simply players.

Regardless of your position on Wobbuffet, it is clear to me that it has created a schism in the Pokemon community, and the arbitrary testing of Wobbuffet directly on the ladder unnecessarily aggravated it. Uber or not, these sorts of tests are detrimental to the fabric of the community. Next time the Uber list is changed, be damn sure about the change.

And if you are going to hide this sort of thing behind the mask of "testing", then make it a real test and not a rule change on the ladder. If the Deoxys-S tournament was that bad, find another means of testing. At least it didn't force the entire community to agree with the change.

I'm willing to move on and forget about this. But I just find it appalling that someone would be willing to "thrust" these sorts of things onto the community again, especially when you are unsure of the results. The ladder is not the place for playtesting.

EDIT: And just so I'm clear, I accept the Deoxys-S test as valid. It went through the tournament, and real opinions were formed before it was allowed on the ladder. I would prefer that future tests be conducted in this manner, but I understand the frustration of the moderators and organizers.

DougJustDoug · Feb 29, 2008

I am an advocate of using statistics as the primary determinant for the tiering of pokemon. Since the Shoddy usage statistics were first posted, they have changed many perceptions about the power and usefulness of competitive pokemon.

Take a look at Jumpman's recent threads to revise the Threat List. Usage statistics have had a huge impact on the definition of a "Threat". In some cases, the statistics have brought to light some massive differences between theorymon-driven general perception and tangible quantitative measurements.

For example, Slaking is a pokemon whose power and impact in the competitive metagame have long been debated. The combination of clearly-uber base stats, a huge movepool, AND a near-crippling ability yields a pokemon that will forever be hotly debated in the world of theorymon. But the "tale of the tape" is unquestionable: Slaking is virtually worthless in the upper echelons of competitive D/P pokemon. Slaking ranks 127th on the most recent weighted usage list, nestled just below the lowly Electrode, a pokemon not normally mentioned in the same breath as "threat" or "uber".

So, I feel that usage statistics, while not perfect, are an incredibly useful tool in proving or disproving certain perceptions of the metagame.

In looking at the usage statistics, the most alarming thing that stands out to me is the hockey stick in the slope of the weighted usage graph for the very top-used pokemon. By eyeballing the graph, the average increment separating any two adjacent ranked pokemon appears to be roughly 1,000,000 points. However, at the top end of the graph, the difference between ranks 3 and 4 is a whopping 11,450,000 points. That is an incredibly high spike in usage.
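The "hockey stick" can be checked mechanically rather than by eye: rank the weighted scores, take the difference between each rank and the one directly below it, and compare the largest gap with the typical (median) gap. This is a sketch only; the function name is hypothetical and the scores are invented, in millions of points, to mirror the shape described above, not the actual list.

```python
from statistics import median
from typing import List, Tuple


def largest_rank_gap(scores: List[float]) -> Tuple[int, float, float]:
    """Return (rank just above the biggest gap, size of that gap, median gap)."""
    ranked = sorted(scores, reverse=True)
    # Difference between each rank and the one directly below it.
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    biggest = max(range(len(gaps)), key=lambda i: gaps[i])
    return biggest + 1, gaps[biggest], median(gaps)


# Invented scores in millions of points, shaped like the spike described above.
scores = [40.0, 39.0, 38.0, 26.0, 25.0, 24.0]
print(largest_rank_gap(scores))  # (3, 12.0, 1.0)
```

Using the median rather than the mean for the "typical" gap keeps one huge spike from inflating the baseline it is being compared against.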

When you realize that Garchomp and Blissey occupy the top two spots in the usage list, it is not surprising that the statistics are skewed so severely at the top end. In fact, this is consistent with general perception. It is commonly argued that Garchomp and Blissey are overcentralizing influences on the game. The separation these pokemon have from the rest of the field in terms of usage seems to support the common claim -- Garchomp and Blissey should be uber.

But as they say, "There are three kinds of lies: lies, damned lies, and statistics."

As you may have noticed earlier, the huge gap in usage is not between ranks 2 and 3 -- it is between ranks 3 and 4. The "massively overused section" of the usage list doesn't have just two pokemon -- it has three. The number three pokemon on the usage list is Gengar.

Based on usage, Gengar is almost equal to Blissey and Garchomp. However, I don't recall any notable debates in the pokemon community that Gengar should be uber. I certainly don't think it's uber. Useful, versatile, and powerful -- but not uber. The question is -- Is it overcentralized? Based on usage statistics, you can certainly make that argument. The argument will quickly be met by lay justifications for all that usage. Gengar is a rare pokemon that fits a combination of needs. While no one thinks it is an overcentralizing force in the game, it DOES seem to fit nicely on an overwhelmingly large number of team concepts. Is this overcentralizing? I think other ghosts like Mismagius and Froslass might think so. I, however, do not.

Despite this fact, I actually think that usage statistics, and perhaps other statistics not yet available, should be the SOLE determinant in making changes to pokemon tier membership. Even if that means that pokemon like Gengar might get "unfairly" mis-categorized. IMO, that is a relatively small sacrifice in order to have a clear, public, deterministic model for setting tiers. I think it is far preferable to the current model, which is based on rumor and gut-feel, then wrapped in a false cloak of "expert judgment based on playtesting results". I also think it is preferable to sitting around arguing over who should be the experts and how opinion-based data should be gathered.

There is no perfect solution. And I DO NOT think that Shoddy usage statistics are the only statistics that should be analyzed. But I do think if we could move towards some purely statistical model for determining tiers, it would be preferable to the current system -- which isn't a system at all.

Dragontamer · Feb 29, 2008

While the "pure" usage statistics are indeed purely objective data, it is clear that the "Weighted" usage list is arbitrary. I personally have objections to using the weighted usage list, even though it certainly has a "feel" in the right direction (higher ranked players count more on that list).

Case in point: what do the points represent on the weighted usage list? I mean, what do they really represent? Perform a dimensional analysis... we are multiplying an arbitrary ranking system by the usage statistic. Do remember that the starting mean rank was arbitrarily decided to be 1500 (IIRC, ~800 Conservative), and that the whole system would work even if the ranks were negative, or hell, even if the starting rank was negative. And there are players with a negative rank.

(Note to Colin if you are reading this: I've accepted the Conservative Rating Estimate as a valid rank based on my own research. No more complaints from me on that part.)

-------------------

Anyway, I think we are not looking at the right statistics in the first place. The question we are asking is simply: What is an Uber Pokemon? I propose the following simple statistic:

If a certain pokemon increases the probability of win dramatically over other pokemon, then that pokemon is overcentralizing.

For example, if we calculate that Garchomp wins 95% of battles, I think we can solidly conclude that it is an Uber pokemon. If Garchomp wins say... 60% of battles... then we can probably agree that it isn't. But this sort of statistic would be useful. All in all, yes, statistics should be gathered, but we should also opt for more diverse and possibly more telling statistics for these sorts of conclusions.
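A minimal sketch of the proposed win-probability statistic, assuming each record is one team in one battle plus whether that team won, and that a pokemon is credited with a win whenever it sat on the winning team (the proposal does not pin down the counting rule). The function name and records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def win_rates(battles: Iterable[Tuple[List[str], bool]]) -> Dict[str, Tuple[float, int]]:
    """Map each pokemon to (fraction of its battles won, number of battles played)."""
    wins: Dict[str, int] = defaultdict(int)
    played: Dict[str, int] = defaultdict(int)
    for team, won in battles:
        for mon in set(team):  # count each pokemon once per team
            played[mon] += 1
            if won:
                wins[mon] += 1
    return {mon: (wins[mon] / played[mon], played[mon]) for mon in played}


# Hypothetical records: one entry per team per battle, with whether that team won.
battles = [
    (["Mon A", "Mon B"], True),
    (["Mon A", "Mon C"], False),
    (["Mon A", "Mon B"], True),
    (["Mon D", "Mon C"], True),
]
rates = win_rates(battles)
print(rates["Mon D"])  # (1.0, 1)
```

The sketch returns the number of battles alongside each rate on purpose: "Mon D" shows a perfect record from a single game, which is the small-sample problem raised in the reply below.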

Cathy · Feb 29, 2008

DougJustDoug seems to be proposing the method I explicitly said I was not advocating in every post in this topic. Just to be clear, I am not suggesting to ban the most used pokemon. Maybe Doug is, but I am not. The reason I am highlighting this is that I get accused of this all day long and I have never once advocated it. I agree with Doug's sentiments about a firm statistical basis being the only thing to underpin tiers, however.

dragontamer: your metric is very problematic. Let's suppose Garchomp often occurs with another pokemon X. Now do we ban pokemon X as well if Garchomp and X are winning 95% of battles? Your metric doesn't even attempt to measure centralisation. It just measures something entirely arbitrary, with no connection to centralisation at all. A pokemon that is rarely used might be on a winning team here and there and end up with 100% win rate. Not to mention a pokemon might be able to effect centralisation without actually winning its battles, simply by forcing everybody to be prepared for it -- it would not tend to win that many in that environment. Aside from being arbitrary, this metric is inferior to one I've already outlined many times in this topic and elsewhere.

I already agreed with you elsewhere that the current weighted list is arbitrary, but so is the one you proposed, and the difference in the top of the list is quite minor. And yes, the top is all that matters, because NU would be decided by statistics in UU, not statistics in standard. I could easily calculate the "sum of probability of each user winning against a 1500-rated player" weighted list for each previous month as well since I already have the script.
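A rough sketch of the "sum of probability of each user winning against a 1500-rated player" list. It assumes an Elo-style logistic expectation, 1 / (1 + 10^((1500 - r) / 400)), with the rating deviation term left out; the actual script may well use the full Glicko-2 formula instead. The function names and teams are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def p_beats_1500(rating: float) -> float:
    """Elo-style expected score against a 1500-rated opponent (rating deviation ignored)."""
    return 1.0 / (1.0 + 10 ** ((1500.0 - rating) / 400.0))


def weighted_usage(teams: Iterable[Tuple[float, List[str]]]) -> Dict[str, float]:
    """For each pokemon, sum the win probability of every user who brought it."""
    totals: Dict[str, float] = defaultdict(float)
    for rating, team in teams:
        weight = p_beats_1500(rating)
        for mon in set(team):
            totals[mon] += weight
    return dict(totals)


# Hypothetical users: (rating, team).
teams = [(1500.0, ["Mon A", "Mon B"]), (1900.0, ["Mon A"])]
print(round(weighted_usage(teams)["Mon A"], 3))  # 1.409
```

Because every weight is a probability between 0 and 1, each pokemon's total reads as "expected wins against a 1500-rated player" rather than a raw rating-times-count product, and it never goes negative.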

jrrrrrrr · Feb 29, 2008

Dragontamer said:
While the "pure" usage statistics are indeed purely objective data, it is clear that the "Weighted" usage list is arbitrary. I personally have objections to using the weighted usage list, even though it certainly has a "feel" in the right direction (higher ranked players count more on that list).

Case in point: what do the points represent on the weighted usage list? I mean, what do they really represent? Perform a dimensional analysis... we are multiplying an arbitrary ranking system by the usage statistic. Do remember that the starting mean rank was arbitrarily decided to be 1500 (IIRC, ~800 Conservative), and that the whole system would work even if the ranks were negative, or hell, even if the starting rank was negative. And there are players with a negative rank.

(Note to Colin if you are reading this: I've accepted the Conservative Rating Estimate as a valid rank based on my own research. No more complaints from me on that part.)

Your last blurb in there completely contradicts your first two. You said that the statistics don't really represent anything because of the "arbitrary ranking system", which you then go on to claim is "valid".

Theorymon is fun and all, but the statistics do not lie. Sure, having to get used to the rule changes of Wobbuffet (which I have yet to see in any ladder match) and Deoxys-S might take a while, but having this test data is a good thing. "Forcing" the community to test these pokemon is the only reliable way of accumulating the data that we thankfully now have. The simple fact is that Deoxys-E and Wobbuffet are not overpowered for OU. They may still be high-class OU pokemon, and very good at what they do, but the mountains of battle statistics show that they are both more not-uber than uber (as someone earlier said).

Dragontamer said:
Except that any sort of test is essentially a rule change in the game. Every untested aspect you throw onto the ladder makes it that much worse as a competitive environment.

The one thing I hated most about the Wobbuffet test was how it was conducted. It was a blatant slap in the face to the greater community, forcing everyone to become playtesters instead of simply players.

Regardless of your position on Wobbuffet, it is clear to me that it has created a schism in the Pokemon community, and the arbitrary testing of Wobbuffet directly on the ladder unnecessarily aggravated it. Uber or not, these sorts of tests are detrimental to the fabric of the community. Next time the Uber list is changed, be damn sure about the change.

And if you are going to hide this sort of thing behind the mask of "testing", then make it a real test and not a rule change on the ladder. If the Deoxys-S tournament was that bad, find another means of testing. At least it didn't force the entire community to agree with the change.

I'm willing to move on and forget about this. But I just find it appalling that someone would be willing to "thrust" these sorts of things onto the community again, especially when you are unsure of the results. The ladder is not the place for playtesting.

EDIT: And just so I'm clear, I accept the Deoxys-S test as valid. It went through the tournament, and real opinions were formed before it was allowed on the ladder. I would prefer that future tests be conducted in this manner, but I understand the frustration of the moderators and organizers.

Forcing players to test it is the only way anything will ever get done. A 1-week testing period on the ladder is more than fair, especially since it was announced far in advance. Saying that this test was "thrust" upon the community is misleading and unfair. If you didn't want to play in an environment with Wobby, you could have just not played for that week, and if it really were broken, that would obviously have been proven. Except it failed its test miserably and cannot perform at the "Uber" level that people once thought, so that entire argument is thrown out the window. "The end justifies the means".

You also claim that the test of Wobbuffet was "arbitrary", which is as close to wrong as one can get about the situation. Pokemon like Kyogre, Dialga, etc. are obviously overcentralizing, but the same cannot be said about Wobbuffet. There has been much doubt as to its tier status since ADV. It would be a "slap in the face to the greater community" to not allow this kind of testing, as long as there is reasonable suspicion, because it broadens the metagame, opens up new strategies, ideas and concepts, and ultimately makes the game more fun to play.

DougJustDoug · Feb 29, 2008

ColinJF said:
DougJustDoug seems to be proposing the method I explicitly said I was not advocating in every post in this topic. Just to be clear, I am not suggesting to ban the most used pokemon. Maybe Doug is, but I am not. The reason I am highlighting this is that I get accused of this all day long and I have never once advocated it. I agree with Doug's sentiments about a firm statistical basis being the only thing to underpin tiers, however.

I realized at the end of the post that the bulk of my post seemed to be heavily advocating the usage statistics for tier determination. For that reason, I specifically stated in capital letters that I did not want usage as the ONLY statistic.

There is no perfect solution. And I DO NOT think that Shoddy usage statistics are the only statistics that should be analyzed. But I do think if we could move towards some purely statistical model for determining tiers, it would be preferable to the current system -- which isn't a system at all.

However, I understand that you get accused of that position, and since my post follows several of yours, others might misinterpret your point.

I refer to usage stats because that is the only objective measurement available at this time. I don't think it is an adequate statistical base to measure the state of the metagame regarding tiering. My Gengar example highlights the limitations of usage statistics as the sole determinant for tiers. I do think usage statistics tell us quite a bit. But not everything.

In the absence of perfect measurements, which will never happen -- I would prefer using flawed stats instead of using a subjective system for tier determination. Actually, I would even prefer a visible, defined, subjective system to a hidden system or no system at all.

Novatek · Feb 29, 2008

Awwwwwwwwww yeeeeeeeeeeaaaaaaaaaaaahhhhhhhhh!!!!!!!!!!!!!!!!!!!!!!!

Dragontamer · Mar 1, 2008

jrrrrrrr said:

Your last blurb in there completely contradicts your first two. You said that the statistics don't really represent anything because of the "arbitrary ranking system", which you then go on to claim is "valid".

The blurb was for Colin, not for you. Colin knows what I'm talking about and I'd rather not get into the unnecessary details.

Nonetheless, I feel this deserves a decent response. While every parameter of the Conservative Rating Estimate is arbitrary, that does not change the fact that it is a valid method for estimating the true skill of a player. The mathematics are solid in this respect. The Glicko-2 rating system would work if every number was negative. It would even work if the best ranked player was at 0. This is what I mean by arbitrary. Now, it is fortunate that the default parameters (~800 initial ranking and so forth) put people generally at a positive number, but as far as the mathematics are concerned... the "middle range" of the ranking system could have been chosen anywhere.

This becomes a problem when you start multiplying numbers. Suppose, for example, that the #1 player had a ranking of 0 (and the worse players had negative rankings, which is entirely possible in Glicko-2). In this case, the pokemon used by good players would be ignored (their usage statistic is multiplied by a number ~0), while the pokemon that poor players use would be multiplied by a relatively large negative number.
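A toy illustration of this point, assuming the weighted list is a plain sum of rating times appearances (the real Shoddy formula may differ, for instance by using the Conservative Rating Estimate). Adding the same constant to every rating leaves every head-to-head prediction unchanged, since those depend only on rating differences, yet it can reverse the order of two pokemon on the weighted list. The names and ratings are hypothetical; the shift of -1700 mirrors the case where the top player sits at 0.

```python
from typing import Dict, List, Tuple


def weighted_scores(teams: List[Tuple[float, List[str]]], shift: float = 0.0) -> Dict[str, float]:
    """Rating-times-usage score per pokemon, after adding `shift` to every rating."""
    scores: Dict[str, float] = {}
    for rating, team in teams:
        for mon in set(team):
            scores[mon] = scores.get(mon, 0.0) + (rating + shift)
    return scores


def ranking(scores: Dict[str, float]) -> List[str]:
    """Pokemon ordered from highest to lowest weighted score."""
    return sorted(scores, key=lambda mon: scores[mon], reverse=True)


# Hypothetical ladder: one strong player on Mon A, two weaker players on Mon B.
teams = [(1700.0, ["Mon A"]), (1000.0, ["Mon B"]), (1000.0, ["Mon B"])]
print(ranking(weighted_scores(teams)))                 # ['Mon B', 'Mon A']
print(ranking(weighted_scores(teams, shift=-1700.0)))  # ['Mon A', 'Mon B']
```

With the original numbers Mon B wins on sheer volume (2000 vs 1700); after the shift Mon A scores 0 while Mon B drops to -1400, so the "more weighted usage" verdict depends on where the rating scale happens to be anchored.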

Theorymon is fun and all, but the statistics do not lie.

Statistics do not lie if you interpret them correctly. However, I do not see a correct interpretation of the "Weighted List".

Sure, having to get used to the rule changes of Wobbuffet (which I have yet to see in any ladder match) and Deoxys-S might take a while, but having this test data is a good thing. "Forcing" the community to test these pokemon is the only reliable way of accumulating the data that we thankfully now have. The simple fact is that Deoxys-E and Wobbuffet are not overpowered for OU. They may still be high-class OU pokemon, and very good at what they do, but the mountains of battle statistics show that they are both more not-uber than uber (as someone earlier said).

That does not change the fact that direct testing on the ladder has unnecessarily pissed people off. If the whole thing had been tested with more consideration for this community, fewer people would be complaining about Wobbuffet and more people would be accepting the statistics.

My issue was neither with Wobbuffet nor with Deoxys-S. The politics of this issue were ignored, and now the community is paying the price.

Forcing players to test it is the only way anything will ever get done. A 1-week testing period in the ladder is more than fair, especially since it was announced far beforehand. Saying that this test was "thrust" upon the community is misleading and unfair.

Sorry to interrupt, I was making a point to Blaziken_57 by using elements from his post.

If you didn't want to play in an environment with Wobby, you could have just not played for that week and it obviously would have been proven broken. Except it failed its test miserably, and cannot perform to the "Uber" level that people once thought, so that entire argument is thrown out the window. "The end justifies the means".

The end? Wobbuffet is unbanned, relatively few people use him now, and yet when he is used a distinct portion of the community feels isolated: split between quitting the game they love or playing the game with Pokemon they hate.

I feel this could have been at least partially avoided if the test had been conducted in a more appropriate manner. Significantly fewer people complain about Deoxys-S than about Wobbuffet.

You also claim that the test of Wobbuffet was "arbitrary", which is as close to wrong as one can get about the situation. Pokemon like Kyogre, Dialga, etc. are obviously overcentralizing, but the same cannot be said about Wobbuffet. There has been much doubt as to its tier status since ADV. It would be a "slap in the face to the greater community" to not allow this kind of testing, as long as there is reasonable suspicion, because it broadens the metagame, opens up new strategies, ideas and concepts, and ultimately makes the game more fun to play.

If you don't mind, I'll use your own words.

Theorymon is fun and all, but the statistics do not lie.

There were no statistics to warrant the unbanning of Wobbuffet on the ladder. Period. This is why I am against this method of testing. You force the community to change to the new rules before you gather statistics.

I only ask that when the next test occurs... that some friggen statistics are gathered before the ladder is affected. Honestly, I don't think that is too much to ask.

dragontamer: your metric is very problematic. Let's suppose Garchomp often occurs with another pokemon X. Now do we ban pokemon X as well if Garchomp and X are winning 95% of battles? Your metric doesn't even attempt to measure centralisation. It just measures something entirely arbitrary, with no connection to centralisation at all. A pokemon that is rarely used might be on a winning team here and there and end up with a 100% win rate. Not to mention a pokemon might be able to effect centralisation without actually winning its battles, simply by forcing everybody to be prepared for it -- it would not tend to win that many in that environment. Aside from being arbitrary, this metric is inferior to one I've already outlined many times in this topic and elsewhere.
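The small-sample and co-occurrence objections are easy to reproduce. This sketch assumes the metric being criticised is a plain per-Pokemon win rate (the exact formula isn't quoted here), and the team records are invented: Garchomp and a partner X share nineteen wins and one loss, and a rarely used Pokemon appears exactly once, on a winning team.

```python
from collections import Counter

# Hypothetical battle records: (Pokemon on the team, did that team win?).
records = (
    [(("Garchomp", "X"), True)] * 19
    + [(("Garchomp", "X"), False)]
    + [(("Garchomp", "Rare"), True)]
)


def win_rates(records):
    """Naive per-Pokemon win rate: wins divided by appearances."""
    used, won = Counter(), Counter()
    for team, team_won in records:
        for pokemon in team:
            used[pokemon] += 1
            if team_won:
                won[pokemon] += 1
    return {p: won[p] / used[p] for p in used}, used


rates, used = win_rates(records)
for pokemon in sorted(rates, key=rates.get, reverse=True):
    print(f"{pokemon}: {rates[pokemon]:.3f} over {used[pokemon]} battles")
```

Here "Rare" tops the list on the strength of a single battle, and X (0.950) is nearly indistinguishable from Garchomp (0.952) simply because it rides along on the same teams.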

I can agree with that. Nonetheless, I'd like to continue to think up better methods than the current one. The current measurement just does not sit well in my stomach.

I already agreed with you elsewhere that the current weighted list is arbitrary, but so is the one you proposed, and the difference at the top of the list is quite minor. And yes, the top is all that matters, because NU would be decided by statistics in UU, not statistics in standard. I could easily calculate the "sum of probability of each user winning against a 1500-rated player" weighted list for each previous month as well since I already have the script.
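For comparison, a weighting built from "probability of beating a 1500-rated player" keeps every weight strictly between 0 and 1, so the negative-multiplier problem raised earlier in the thread cannot occur. The sketch below is only a guess at what such a script might compute: it uses the standard Elo-style logistic expectation on a 400-point scale and ignores rating deviation (Glicko proper would also shrink the rating difference by a factor that depends on the opponent's deviation). The records are invented.

```python
# Sketch: weight each usage by the user's chance of beating a 1500-rated player.
# Assumption: Elo-style logistic expectation with the usual 400-point scale,
# ignoring rating deviation.
def p_beats_reference(rating, reference=1500.0):
    """Expected score of `rating` against `reference`; always in (0, 1)."""
    return 1.0 / (1.0 + 10 ** ((reference - rating) / 400.0))


def probability_weighted_usage(records, reference=1500.0):
    """Sum each Pokemon's usage weighted by its user's win probability."""
    totals = {}
    for rating, pokemon in records:
        totals[pokemon] = totals.get(pokemon, 0.0) + p_beats_reference(rating, reference)
    return totals


# Hypothetical usage records: (rating of the player, Pokemon they used).
battles = [(1600, "Alpha")] * 10 + [(1000, "Beta")] * 14

for pokemon, weight in probability_weighted_usage(battles).items():
    print(f"{pokemon}: {weight:.2f}")
```

Because each weight is a probability, a weak player's usage shrinks toward zero instead of flipping sign, and a strong player's usage can never count for more than one battle's worth.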

Actually, OU pokemon are defined by their statistics in OU, which is what affects UU. (Aka: Tentacruel's banning from UU). So yes, the middle of the list actually matters a lot for the UU metagame.

Cathy · Mar 1, 2008

dragontamer said:
Actually, OU pokemon are defined by their statistics in OU, which is what affects UU. (Aka: Tentacruel's banning from UU). So yes, the middle of the list actually matters a lot for the UU metagame.

The two lists are similar through the middle. By "the top", I meant all of OU. However, we can still check both lists.

obi · Mar 1, 2008

Dragontamer said:
There were no statistics to warrant the unbanning of Wobbuffet on the ladder. Period. This is why I am against this method of testing. You force the community to change to the new rules before you gather statistics.

I only ask that when the next test occurs... that some friggen statistics are gathered before the ladder is affected. Honestly, I don't think that is too much to ask.

I can guarantee that the next change to the ladder in deciding whether a Pokemon should be banned or unbanned will be based on at least as much statistics as what went into the original decision for that Pokemon.

Dragontamer · Mar 1, 2008

Obi said:
I can guarantee that the next change to the ladder in deciding whether a Pokemon should be banned or unbanned will be based on at least as much statistics as what went into the original decision for that Pokemon.

I can only hope that a legitimate amount of real testing will take place in the future (and I'm not talking "just as much" as the original discussion, since I know your opinion is that there was little testing for the initial metagame).

EDIT: As long as a minimal empirical test is conducted before throwing it into the ladder, I'll be happy. Till then, I will simply wait.

obi · Mar 1, 2008

Not just as much; at least as much. I suspect there will be more.

Dragontamer · Mar 1, 2008

obi said:
Not just as much; at least as much. I suspect there will be more.

Then I have no further complaints.

NoMercy · Mar 1, 2008

Yeah, if it would work, unlike Shoddy right now...
