Stage three and beyond.

Jumpman16 · May 31, 2009

obi said:
I have to question why you're calling certain theoretical gen-5 Pokemon "uber" already. No Pokemon is inherently uber. You seem to think that, if Gamefreak introduces another 20 or so Pokemon that can compete with the current ubers, you can make a "balanced ubers" tier and have fun with that, but they're still uber because of some magical, intangible status. That is not how the tiers work. Ubers is defined as the set of Pokemon incapable of being in a balanced tier. In other words, if you can have a balanced uber tier, then it's not ubers...

If you think this isn't the case, you should update the tiers article on the site to include non-competitive terminology.

Or it's more an issue of practicality than correctness. I think that after we learn the new ins and outs of Gen 5 we will be better able to see which, if any, of the current ubers should be Suspect. We were willing to do this with Ho-oh because of Stealth Rock, a move we didn't know would be so central to Gen 4 competitive Pokemon until over a year after we started.

As with Chris is me and Lemmiwinks, the onus is squarely on you to make a compelling argument why, right now, with the information about Generation 4, that Rayquaza is not uber, because these thoughts are directly applicable to how I envision we should begin Gen 5. I don't want to hear "because we've never tested it in Standard", because the letter of the law doesn't matter here. I'm completely aware of how much sense it makes to start with no bans, but I think it makes more sense to ban the 680 BST legendaries and learn about Gen 5 with fewer "variables" (not as few as possible, but fewer).

And kindly don't tell me which articles I should and should not update. I think you need to flesh out your thoughts and arguments on the Suspect Test Process in this forum before implying that what I'm suggesting is so incorrect or contrary to our philosophy, obi.

lilyhollow · May 31, 2009

Caelum said:
The "uber-lite" point or something. I'm not even sure what that means. If you mean other ubers checking other ubers I don't buy that. Kyogre can be dealt with by Soul Dew Latias or Special Defensive Dialga; however, I think it's clear to most players that Kyogre is still obviously broken. I don't think we could ever create a balanced metagame out of most of our ubers by introducing other ubers; at least at this time.

Chris is me said:
The idea is that if Nintendo introduces 20 or so BST 680 Pokémon, that would essentially be enough to create a balanced metagame based on Ubers. If that happens, most of the above posters still want the traditional OU bar to be "OU", rather than the first balanced metagame.

This isn't really a "what if" scenario. I am literally saying, "given what Gamefreak has done so far, and how the current Ubers tier is holding up, if we start the Standard 5th gen metagame without any bans, we will most likely end up with a game that centers largely around extremely high BST Legendaries." Yes, if they added 20 680+ BST Pokemon next gen, the possibility of the "first balanced metagame" looking very similar to the Ubers tier would be ridiculously high. I don't think that Gamefreak would need to even come close to going that far, though, to reach an "old UU" kind of situation where the initial banlist is large enough that resetting it results in a completely different-looking metagame that never would have been reached if the original banlist were used as the baseline. In fact, all Gamefreak would have to do is exactly what most of us already expect: add another three or four 680 BSTs, another Darkrai, and another Manaphy/Skymin, increasing the banlist enough so that it almost doubles Advance's right off the bat. Remembering Pokemon like Garchomp, Tyranitar and Lucario, I find it hard to believe that we wouldn't be able to ban the most ridiculous, centralizing forces in that environment and come up with something balanced and enjoyable.

As it stands though, the idea is pretty much incompatible with our current tiering system. Considering that I think such a mode of play isn't just possible, but has merit, I believe that's a waste.

Lemmiwinks MkII · May 31, 2009

Jumpman16 said:
As with Chris is me and Lemmiwinks, the onus is squarely on you to make a compelling argument why, right now, with the information about Generation 4, that Rayquaza is not uber, because these thoughts are directly applicable to how I envision we should begin Gen 5.

I can't answer that and I'll tell you why: I think Rayquaza is 100% Uber. Never did I even suggest otherwise, nor did I profess the desire to start Gen 5 with no bans at all. What I think is best is if we carry forward our Gen 4 Uber list as a preliminary, unofficial Gen 5 Uber list, then decide on an individual basis whether each Pokemon deserves to be tested in standard play, based on how the changes introduced in the transition have affected the metagame, and whether they have made said Pokemon's Uber status less than obvious. There is a 99+% probability that all 670+ base stat Pokemon without any obvious drawback would remain obviously Uber, but I'd stop short at saying that this is an absolute certainty.

My concern was with the way the thread was going in the direction of suggesting that we decide our Gen 5 Uber list purely on decisions made in Gen 4, i.e. if it's definitely Uber in Gen 4, then it's definitely Uber in Gen 5. From your response to my last post, I take it that this is not going to be the case, so I have no problems here. I don't see how you've managed to deduce anything more than that from my previous post.

X-Act · Jun 1, 2009

All I want to say is that, for Gen. 5, I would certainly support a system that contains much less testing to determine the Uber tier. And the BL tier at that.

Caelum · Jun 24, 2009

This seemed like the most appropriate thread for this. In the Stage 3 currently going on, how is the situation of possible checks being handled in terms of voting? If my current understanding is correct, all of the suspects are getting their final votes at once. Someone raised a possible issue: if Latios / Latias are voted uber, but Garchomp is voted OU under the assumption that Lati@s are there to check him, couldn't that be a problem?

I don't think this would be a problem working under the assumption that the voters had played in the previous suspect ladders frequently to get that perspective, but I was considering for those players that didn't actively play in the previous suspect test and don't have that background experience. I thought a potential problem could be introduced because of that lack of background experience on the isolated ladders.

Leading to my point, should we have certain Pokemon voted on prior to others in the suspect list in stage 3 so the voter knows going into it that some checks may not exist? For example, Swords Dance Garchomp is often regarded as the most "dangerous" set and arguably the reason it was voted uber (without SD, I doubt it would even be a suspect but yeah) - that leaves Lati@s as potential checks for it always. Should those two Pokemon's status be decided first, so a voter going into the Garchomp vote knows he can't use the reasoning of Lati@s as potential checks? Or Shaymin-S could possibly act as a check to Manaphy, etc.

I'm not even sure this is entirely an issue, but I thought it was at least worth bringing up as a potential problem. Again, I don't think this is an issue if the voters played on the isolated suspect ladders; but I'd assume that, because it's the "finale", there will be more participants and fewer (by percentage anyway) of those who did play on the suspect ladders.

(if my assumption regarding the voting is incorrect, someone just delete this heh).

lilyhollow · Jun 24, 2009

http://www.smogon.com/forums/showpost.php?p=1988198&postcount=409

So to my understanding, if Latias and Latios were voted Uber while Garchomp remained OU, Garchomp would just be tested again with whatever other Suspects happened to remain.

Hipmonlee · Jul 23, 2009

I said I'd make a new thread where I'd argue for a simpler rating/deviation basis for our suspect tests, but it doesn't really seem worth its own thread. It's just rehashing arguments made elsewhere, and this seems like a good enough place for that.

But first I am going to go back to what I have said in this thread earlier.

Based on reading the UU testing thread, I have decided that testing on the OU ladder like I have described probably won't work, based on:

Finally, since they can't play with the suspect during this one month process, how are they supposed to know if it's broken or not? The very idea of taking the suspects out compromises people's opinions on the matter. Could you imagine if Jump and Aeolus made a post tomorrow saying "ok, to see whether or not you think Garchomp is broken, we are going to prevent you from using it"? That is what the current UU process is doing. Jabba and I propose removing this part of the process completely, since it is not only useless but it also goes against the entire idea of a suspect test.

So if we are testing by subtraction, and we are going to do so on the suspect ladder, it still doesn't really make sense to just test not using a suspect. What I recommend doing is testing in a manner similar to the Garchomp test, except without the Deoxys thing. Basically we have all the suspects in OU, and all the suspects except the one we feel is probably the most broken on the suspect ladder. We require people to meet qualification requirements on both ladders and then we let them vote.

I mean, just looking back, it seems like the Garchomp test was the most successful test. It had basically a year of pretesting, and it was overwhelmingly the most used Pokemon in OU at the time, which helped. But it really did seem to go smoothly. Just to remind people, the Garchomp test required passing qualification requirements on both ladders.

However, basically if we have a suspect ladder, then we have fewer voters. We got 54 in the Garchomp test. I have some ideas for how we could get a few more, but these are more or less longer term things. I may get back to them later.

Also the suspect ladder raises the issues of artificiality and biases. I think having the OU ladder requirement mitigates these somewhat. For instance, the argument that people who like the suspect metagame more are more likely to battle more on the suspect ladder is weakened somewhat by the fact that those people will have to battle a lot in OU. However, unless we have a double rating, or reset people's OU ratings, people probably won't actually have to battle much at all on OU to qualify. Ultimately though, the suspect ladder is necessary and it is artificial. Perhaps other people have some suggestions, but based on what has gone on so far in this thread I doubt it.

Ok, also there are three other issues which have been raised, which I didn't cover in this thread, and we have two solutions to these issues which I am not a big fan of, so I am going to go over those now (i.e. the point of this post).

The first is bad faith voting. We are currently dealing with this with written paragraphs. I have a lot of issues with those, but I don't really feel the need to go into that at this point. Suffice it to say I think they are a bad idea.

Clearly we know that some people are going to vote with something other than a rule set that creates a balanced metagame in mind. But honestly, I know enough about the people in this community to know that those people are an extremely small minority. To be fair, when we have a very small pool of voters, an extremely small minority can appear quite large. Still, I don't think that, with the exception of the Shaymin vote, they have had a decisive impact on our testing at all, or have even been close to having one.

Now the Shaymin test is a special case. I'm pretty sure there were people who would have been able to qualify for that test without battling at all in the testing month. For example, there is one person I know who qualified to vote on Shaymin. I asked him if he really thought his vote truly represented his evaluation of Shaymin's uberness, and he replied no. But I also asked whether he would have voted the same way had we specifically asked him to vote in good faith, and he replied no to that as well.

I don't think I will ever believe that, if we explained to the community that when we ask them to vote we are asking them to protect the integrity of Smogon, 99% of them wouldn't vote based on Smogon's philosophy.

The next problem is idiots who qualify to vote. Firstly, I think that the people who qualify for suspect tests are probably generally pretty smart, but that they maybe just don't understand exactly what we are asking of them. Basically I think that if we had a well written guide explaining the responsibilities of suspect voters, we should be able to trust them. And then we just need a couple of failsafe measures to fix any freak accidents that might occur. I mean, if we do make mistakes it really isn't the end of the world. I think that by trying to foolproof the system we have actually made it a lot weaker.

The third is the issue of tests where people refuse to test. This to me is a symptom of testing by addition rather than subtraction.

Firstly when you subtract you have a lot of time prior to the beginning of the test where people have experienced the suspect, which helps people hit the ground running during the testing period.

And teambuilding has become quite a big task over the years since RBY. It takes a lot of time and effort to build a decent team, so I think people will try to avoid it if they can. Even when people are eager to test well, if you add things instead of subtracting, you are asking them to do a lot of work that they will probably feel isn't really necessary; they can put teambuilding off until later and just use an old team. When you remove things from the ruleset, you kind of prevent that from happening. People have to build new teams because their old teams will be banned. And the suspect will be tested because it will already be in OU.

But even when you aren't testing by subtraction, I think if you set your deviation requirements well enough you shouldn't need suspect experience qualifications. If you have, say, 1 in 5 people using the suspect (so it would be roughly top 10 in our latest suspect ladder), and you want people to battle the suspect at least 20 times in order to have a reasonable go at experiencing it (just a random number that makes the maths easy), then set a deviation requirement that makes people battle 100 times.
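The arithmetic in that example can be written out as a rough sketch. The function name and parameters below are made up for illustration and are not part of any real ladder code. It also only gives the number of battles at which the expected number of suspect encounters reaches the target; any individual battler could see the suspect more or less often than that.

```python
def battles_needed(target_encounters: int, suspect_users: int, out_of: int) -> int:
    """Battles required so that the EXPECTED number of games against the
    suspect reaches target_encounters, assuming suspect_users out of every
    out_of opponents bring it. Hypothetical helper, not real ladder code."""
    if target_encounters < 0:
        raise ValueError("target_encounters must be non-negative")
    if suspect_users <= 0 or out_of < suspect_users:
        raise ValueError("need 0 < suspect_users <= out_of")
    # Integer ceiling division, so a fractional battle is rounded up.
    numerator = target_encounters * out_of
    return (numerator + suspect_users - 1) // suspect_users


# The example from the post: 1 in 5 opponents use the suspect, 20 encounters wanted.
print(battles_needed(20, 1, 5))
```

A deviation requirement would then have to be tuned so that reaching it takes roughly that many battles, which depends on how the rating system shrinks deviation per game.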

And about people not using the suspect. Some people can really find suspects broken while also finding them completely inappropriate to use (like Skymin on a stall team). If you let these people battle how they want, qualify if they can, and then vote even if they don't use the suspect, you still get a decent view of the impact of that suspect. If these people's opinions are not reflective of the community, then they can just be outvoted. If this is the case for the majority of people, and no one is able to convince them otherwise, then why not just accept their views? If they turn out to be godawful, then when that becomes obvious enough we can fix it.

The only issue I see with that is an issue with our entire testing method: ladder battling requires a slightly different strategy from tournament battling or wifi battling or whatever, and our rule sets shouldn't be based entirely on laddering. This is another form of bias inherent in our testing method, and my only solution is to mention it in the guide I suggested and hope that people think about it.

Ok, so about trying to find more voters (note this only works with subtractive testing). I think one thing we could do is use the top 50 on the suspect ladder, and by doing so guarantee having 50 voters (alts wouldn't count). This has another benefit: people won't stop battling after they qualify, in case they drop off the bottom of the list. The other benefit is that when people battle on the ladder they essentially do so to try to climb it. I don't know enough about the ladder to really comment on this, but it seems like a different style of battling could technically be better for qualifying for suspect tests than it is for climbing the ladder, and that essentially adds more unnecessary artificiality to the test. But of course you need to have faith in your ladder. X-Act has suggested that CRE is not ideal for Pokemon, and has another method that hasn't yet been implemented, but if that is a lot better (and I really just have no idea) then that would be a nicer way of finding qualified voters.

But of course that doesn't work with testing on two ladders. So either they have to make the top 50 on both ladders, which means you end up with fewer than 50 voters, or they only need to make a rating requirement on the OU ladder (or perhaps the other way round, though that doesn't seem right) and then we allow the top 50 voters out of all of those who made that requirement.
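To make the second option concrete, here is a minimal sketch of picking voters that way. Everything in it is an assumption for illustration: the `Entry` record, the `owner` field (which presumes we can already tell which accounts are alts of the same person), and the function names are all hypothetical and are not how the real ladder stores anything.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Entry(NamedTuple):
    account: str   # ladder account name
    owner: str     # hypothetical: the person behind the account (alts share an owner)
    rating: float  # ladder rating, whatever system is in use


def ou_qualified(ou_ladder: Iterable[Entry], min_rating: float) -> Set[str]:
    """Owners with at least one OU account at or above the rating requirement."""
    return {e.owner for e in ou_ladder if e.rating >= min_rating}


def top_voters(suspect_ladder: Iterable[Entry], n: int = 50,
               eligible: Optional[Set[str]] = None) -> List[str]:
    """Top n people on the suspect ladder, counting only each person's best
    account, optionally restricted to people who met the OU requirement."""
    best = {}
    for e in suspect_ladder:
        if eligible is not None and e.owner not in eligible:
            continue
        if e.owner not in best or e.rating > best[e.owner].rating:
            best[e.owner] = e
    ranked = sorted(best.values(), key=lambda e: e.rating, reverse=True)
    return [e.owner for e in ranked[:n]]
```

Used with made-up ratings, `top_voters(suspect, n=50, eligible=ou_qualified(ou, 1450))` would return up to 50 people, all of whom met the OU requirement. Requiring top 50 on both ladders would instead be the overlap of two such lists, which is why it yields fewer than 50 voters.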

So yeah, what I would like: tests by subtraction (similar to how the garchomp test was done) and voters being decided entirely by battling ability and then just having faith that if we make it clear to people what we want them to do that they will do their best to do it.

Have a nice day.

cim · Jul 24, 2009

Interesting thoughts. I'll go into more detail later, but the Top 50 requirement has a flaw in that alternate accounts ruin the system.
