What did we learn from tiering in generation four?

Cathy · Sep 12, 2010

I logged onto Smogon today and was disturbed to discover this thread, which proposes to ban a long list of Pokémon before the game is even released.

It's worth noting that the goals mentioned in the original post of that thread fly in the face of the ostensible philosophy of Smogon:

Smogon attempts to avoid bans as much as possible—only when it becomes very apparent that a Pokémon is far too powerful to be in line with a balanced metagame is it banished permanently from the standard arena.

They also fly in the face of Smogon's front page slogan: "Smogon is a Pokémon website and community specializing in the art of competitive battling." If the community chooses to accept the proposals in that thread, we will have to change the slogan to "Smogon is a web site dedicated to having a good time playing Pokémon with friends. No skarmbliss, no substitute, no vilopumes!"

This is not a mistake to be taken lightly. It is for all intents and purposes irreversible. As soon as you publish some long ban list, two things happen: (1) A notion develops that certain Pokémon are intrinsically Uber or OU just because of their name. We saw this happen last generation not just with the standard metagame, but also with UU. It was eventually fixed with UU. However, the standard metagame resisted correction, largely due to tradition and a lack of desire to challenge established but arbitrary norms. (2) It becomes all too easy to ban a host of other Pokémon, because the power level for Ubers has already been fixed at a point that has no particular significance for the new game.

It's easy to say that we will just recognise big shifts in the metagame and adjust this "preliminary ban list" accordingly. Experience shows that doesn't happen. It took years for it to happen with UU this generation, thanks to some dedicated people with a basic grasp of competitive games. It never happened with the standard game.

Before we ruin generation five prior to its release, let's consider an objective review of tiering in generation four. We all know the cliche about letting history repeat itself.

In June 2007, DP tiers were discussed seriously for the first time. There was no consensus in the thread. Some people wanted to ban fewer Pokémon than others. Obi and AA argued for handling Pokémon like a real competitive game and not hashing out a long ban list based on nothing. Let's keep one thing in mind. In this generation, there are already enough Ubers that there is a very playable Uber metagame. It's not close to being as balanced as standard, but it's playable. How many changes do you think it will require to make it reasonably balanced? Probably just a few moves sprinkled here and there, and maybe a couple more Pokémon added at the highest echelons of power.

One argument repeated in that thread is that people will not play Ubers anyway. That misses the point. It may be possible to have a balanced game with far fewer Pokémon banned. People today talk occasionally about making a "balanced Ubers", but if this can be done, it should be the standard. This isn't just a linguistic argument over what to name the tiers. The tier that is identified as OU will get the most play, be the most explored, and generally be the focal point of competitive play.

Ultimately, the discussion in that thread was shut down with such characteristically solid arguments as "Can we please prevent the consensus that was building from being undone by a wall of text?" (Emphasis mine.) If you examine the thread, you will notice that there is nothing even vaguely resembling a consensus. Nonetheless, banning a long list of Pokémon early in the game's life went through and stuck until the end.

The power level defining Ubers was never seriously reconsidered. We have a duty as competitive players to explore that power level properly, especially in the face of a new game. I have seen a lot of posts by people confidently stating that not much will change. This couldn't be more wrong. The truth is you have no idea what effect subtle changes to move pools, move stats (e.g. power, PP), and new Pokémon will have on the relative quality of Pokémon. It doesn't take much to shift the game significantly, and deciding a ban list in advance will effectively blind you to it.

This mistake, made early in the history of DP, laid the foundation for all of the tiering debates to come. It is a mistake that should have been avoided. Banning only broken Pokémon, and only after plenty of play experience, would have taken years less than the process that actually ensued, and would not have been tainted by doubts of illegitimacy.

By November 2007, Shoddy Battle 1 had ladder functionality. The Smogon arbitrary ban list had not changed in that time. Unfortunately, that arbitrary ban list was already well ingrained, and any major change to it was impossible. Independent of Smogon, we (me, AA, obi, tenchi, and others) adopted a very minor testing scheme, involving a tournament to test Deoxys-S. One thing we learned from the tournament is that Swiss tournaments are too complex for most players in this community, at least without software support. More significantly, not one of the hundreds of people who played in the tournament voiced a problem with Deoxys-S being unbanned.

Two weeks after the conclusion of the tournament, some notable Smogon members who had until then been uninvolved with Official Server were so excited by our unbanning of Deoxys-S that they asked if I could unban Wobbuffet immediately, without the benefit of another tournament. It turned out that Wobbuffet was the next item on our list anyway, but we mulled over whether another tournament was worth it. Ultimately, in light of the fact that the previous tournament had failed to convince anybody of anything, we decided to forge ahead and unban Wobbuffet. The backlash was intense. No one wanted to test Wobbuffet. In public, I defended our move, but in private, I was quite upset with AA. I had put hundreds of hours of work into writing a Pokémon simulator, which was extremely popular and was the basis of competitive Pokémon on the internet at that point, and everybody hated me for some minor tier experimentation that wasn't even my idea. This was extremely grating.

I was so upset by the backlash that I attempted to devise a statistical argument for banning Wobbuffet. Unfortunately, it couldn't be done. Barely anybody even used Wobbuffet on the ladder. You could play the game as though Wobbuffet did not exist, and you would only lose the occasional match. In effect, this was not a broken Pokémon, because it didn't affect how you constructed your team at all, as far as ladder play was concerned. This never changed for the entirety of Official Server.

The lesson learned here is that popular opinion cannot be ignored in tiering decisions. Strong feelings that a Pokémon is broken prevent it from being tested. In fact, the hatred for this Pokémon was so intense that any vote to ban it would easily have passed by a 2/3 supermajority, and probably by much more.

Smogon proper re-entered the tiering scene in March 2008. A process was devised to decide whether to ban Garchomp, a Pokémon that everybody knew was popularly disliked. The first Smogon attempt at a Pokémon banning system was the closest the community has come to a good process. It was very simple. Anybody who met simple rating and deviation checks on the ladder got to vote on whether Garchomp should be banned. A supermajority should have been required to ban the Pokémon, but ultimately, the process was still quite good. It didn't even take that long, and it directly measured opinion among competitive players.

Unfortunately, things went very far downhill shortly after this. The next year was spent on entirely pointless "tests" because by its very design, so-called "Stage 2" was 100% pointless. Eventually, when Stage 3 rolled around, the results of Stage 2 were irrelevant.

Let's take a step back and think about the previous paragraph. A whole year was wasted by a process that was designed out of the box to be pointless. I want to make sure that is very clearly understood. Stage 2 was pointless. This is so important to understand because it is often bandied about that proper tiering processes take too long. In reality, poor decisions regarding the tiering process are what make it take too long. Unfortunately, DP was a case of the latter. A sane process would have been similar to Stage 3 from the start. Also important is that a sane process would have stopped at the design of the first test, and considered only a simple rating and deviation check.

Another way in which things went far downhill was the introduction of two extra metrics to filter the voter pool. First, voters had to submit "paragraphs" which were never published for public inspection, and which were arbitrarily used to decide who would vote. This measure alone ruined the system. Particularly ironic is the fact that in ruining the system, it was also made slower, and one big complaint is always how slow things are; this was the fault of the people making this complaint.

The second big mistake that was made around this time was the introduction of "suspect experience". This is a secret measure whose definition no one except three people knows. We were told repeatedly that it was good and useful, but of course, since we couldn't see it, we had no idea. At this point, the process was devastated. Voters were excluded on completely mysterious grounds, both through paragraph submissions and a top secret formula that was a terrible idea, and remains a terrible idea.

As previously mentioned, the next year was a complete waste of time, and it was wasted entirely by the same people who complain that the process took too long. No one wanted the terrible process they devised, including paragraph submissions and voodoo formulas. Most people wanted a populist system like the first Garchomp test. This would have been the way to go.

The next substantive thing to happen wasn't until August 2009, when so-called "Stage 3" started. This represented a process similar to what the process should have been from the start. Particularly jarring was the way it had been designed to make the previous year's work useless. The flaw here was wasting the previous year; Stage 3 should have been the entire process. Stage 3 was still a mess, though. My attempts to improve it slightly ended up wasting many dozens of hours of my time, and ultimately led to nothing, despite the large number of people who supported something along the lines I was proposing.

After Stage 3, things became even worse. After messing up immensely over the last year and a half, the wasted time was used as a reason to introduce another bad process. First of all, after messing up so badly, there should have been a major leadership change in tiering policy. How does it make any sense that after messing up badly you get a second chance? We have plenty of people far more capable of handling tiering than the people who handled it this generation. We need people with special skills. People who not only enter tournaments, but place well in them. People who engage with strategy and the community in Stark Mountain. People who have contributed to site content more recently than two or three years ago. People who are capable of putting in the technical work required to make processes a reality. It's time for other capable members of the community to set the direction for tiering policy.

The Smogon Council was a very bad idea. When it was first mentioned in #stark, I said in a private message that it was not even worth the time to argue against it, because no one would swallow it. Obviously, I was wrong. Smogon's culture of respect (people with status must be respected unconditionally) has prevented people from pointing out the obvious: that the Smogon Council was the worst idea since suspect experience. The Council was not even faster than a simple vote based on a simple rating/deviation metric. The Council consists of people handpicked by two people in a process based on nothing tangible and with no oversight. It's effectively no different from those two people banning Pokémon by fiat. It may be better than the previous process, but that's a low bar.

That brings us to today. Everybody knows the first process was a disaster. After all, its flaws are continually cited as the reason for introducing the Council. This alone should raise eyebrows about the people who designed that previous process continuing to influence Pokémon policy. Although they don't realise it yet, they also messed up a second time with the "Smogon Council". Twice is more than enough chances. You may not agree with my personal position of not banning Pokémon before the game is released, but if there is one thing you should take away from the history of tiering in DP, it's that some new qualified people need to step up to the plate to spearhead tiering in the next generation. We should avoid banning things hastily. We have plenty of time to do it right. So long as we avoid developing a process as bad as paragraph submissions, top secret formulas, and other arbitrary delays and exclusions, we don't run the risk of wasting years this time. Such a working process is a simple vote with the only filter being a ladder statistic check.

The bottom line is that there is no justification for starting off the next generation with arbitrary bans. The DP ban list is already very long, and the next generation is only going to introduce more pokemon of a similar level of power, or revise older pokemon up to that level. Even the argument about saving time doesn't hold water, because, using a good process, we can balance the game far faster than was done this generation. The best process is a simple vote based on a completely open metric. This is efficient, fair, representative, and completely peer reviewable. Most importantly, we should not ban any pokemon without having played the game for a while.

obi · Sep 12, 2010

I absolutely agree with this post. I hear a lot about how we need to streamline the process, and that's why we can't go with a "start with no bans" proposal, but it seems like the most streamlined process we've used so far is Gen 4 UU, which was essentially a no initial bans strategy.

I completely agree that our tiering decisions should have no "secret sauce" (suspect exp, for instance), nor should we limit consensus-building by anointing people to the "suspect council" in the name of expediency, when a simple vote does not take too much time. An outline of a process that I think would be both more fair and rational is to have set timelines for voting. Pokemon are nominated to be voted on in the next voting period by whatever process, and then we hold voting open for a set amount of time, and then go on that. No additional ladders, no paragraph submissions, etc. Simple, transparent, and fair.

Most of all, I am against trying to legislate rules for a game that does not yet exist. It would be like banning Roar of Time on Smeargle in Generation 4 because it sets the game back 5 turns, or like banning Rhyperior because it literally has no weaknesses with Solid Rock, and that's broken. We should wait to see what the game actually has before we start deciding certain things are clearly broken.

JabbaTheGriffin · Sep 12, 2010

Cathy said:
Let's take a step back and think about the previous paragraph. A whole year was wasted by a process that was designed out of the box to be pointless. I want to make sure that is very clearly understood. Stage 2 was pointless.

As the person who pretty much developed the suspect test process, I feel the need to respond to this point. It has been years since I did this, and years since Stage 1 and Stage 2 were implemented, so sorry if I'm a bit off on exactly what they were. As I recall, my original plan was that Stage 1 would test each Pokemon individually. That way you could judge each Pokemon's individual effect on the metagame and determine, within the confines of that metagame, whether it was broken. What we believed to be the proper approach at the time was that we'd have this base metagame as the accepted standard, and anything dropping down would have to fit within that standard.

Next came Stage 2. My original plan for Stage 2 was to test together the Pokemon that were deemed not broken within the confines of the "standard metagame". This was meant to be a quick test, used mostly as a safeguard: say two individual Pokemon are not broken on their own, but combined they break the metagame. It really should not have taken longer than a month.

However, these things got misconstrued and were not properly implemented: Stage 1 became something weird, Stage 2 was actually what Stage 1 was supposed to be, and we never tried the proper Stage 2. So, within the scope of my plans, your argument really says that Stage 1 was the wasted year, but I'll just stick with my terminology.

So it boils down to "we test all these individual Pokemon, do all this shit, then test them all at the same time, and then ban individually again." This is because Stage 3 was completely fucked up. I've complained about this many times, but Stage 3 was implemented insanely incorrectly. The problem with my Stage 1 and Stage 2 was this: say a Pokemon was broken within the confines of the "standard metagame", but once other Pokemon were unbanned, it was no longer broken. Stage 3 was developed by Jumpman as a safeguard against this. I assumed that, much like Stage 2, it would be a quick process in which we just checked whether the situation had changed at all. When I accepted it into the Suspect Test process, I assumed it would be another short testing period, at the end of which we would look at the Pokemon that were previously considered broken and ask, "Did anything change for this Pokemon?" I tried talking to Jumpman about what I called "chain reasoning": you would be forced to explain what had changed in the game for a Pokemon that was previously broken, and only if you could successfully chain reason for that Pokemon could you claim it was no longer broken. However, the process turned into more individual votes on Pokemon and broke down into an insane process that could have been avoided.

I'd therefore say the problem lay between planning and implementation. What was developed seems somewhat sound on paper, at least as sound as any other process you could configure. However, the implementation was absolutely terrible. None of the stages were implemented the way they were designed, and as for the times I did say something, I don't want to say I was ignored, but that's pretty much what happened.

Anyway, I'd just like to throw in that I think the process that was developed can be used for Gen V; it just needs to be instituted correctly this time. I have explained before why starting off with something resembling Stage 3 is dangerous; if called for, I'd be glad to go dig up my posts. (Although since you contend we should start completely fresh, i.e. the way UU was done with no banlist, my process is useless, since it was developed for a situation in which arbitrarily banned Pokemon are tested in the lower tier. I guess I just wanted to defend the process I created, since I'm obviously a bit peeved it turned out this way because of bad implementation. So in this parenthetical I'm saying I wouldn't be opposed to starting with no banlist (except maybe a 670+ banlist, if that's what people want) and going from there the way UU was done, but my process is still reasonable for dealing with an arbitrary banlist, which, you have to admit, is extremely likely given our community.)

rory · Sep 12, 2010

"hey tiering expert, why is pokemon x already banned in gen V?"
"because it was REALLY GOOD in gen IV"

Like obi, I completely agree. Not only do we not know what Pokemon will be in the games, we don't know if there will be any mechanics changes, which could have a serious effect on the game as well.

I see no reason to treat this like anything other than what it is: a new game.

Rising_Dusk · Sep 12, 2010

I applaud you for the creation of this thread. I breathed a sigh of relief when I saw that someone with the voice to speak and make people listen posted a thread that basically echoed my deepest concerns about Aeolus's other one.

First, though, before we can even consider a process to ban things the right way, we need to consider what we want to accomplish by banning things in the first place.

We need to identify the desirable standard metagame

One of my favorite threads on this site is a topic related directly to discussing what we need to do here. We need to decide what it is we actually want to achieve by tiering anything in the first place. Many people are comfortable with what we have, in principle. Maintaining the status quo is a naive approach to tiering, but at the same time, it has shown us the kinds of things we want.

As Doug rightly discusses in that topic, there are several characteristics that we want, and one of them is variety. We have learned from DPP Uber that if we allow certain vastly dominant Pokemon, namely the 670+ BST monsters, the entire metagame revolves around them, with lower Pokemon interspersed that fill niche roles in answering them. I think everyone here can agree that the desirable standard metagame is one that is plentifully diverse, and allowing those Pokemon inherently goes against such an idea.

That is why I support the 670+ BST Pokemon as being the only Pokemon that we initially ban to Ubers. Because they go against the principles of what we, the players, perceive as our desirable metagame, they should be removed. I think I can safely speak for basically every competitive Pokemon player on this website in making that claim.
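The 670+ BST rule is mechanical enough to state as a filter. What follows is only a minimal sketch, assuming each Pokemon's six base stats are available as plain numbers; `BaseStats` and `initial_uber_bans` are hypothetical names, not part of any Smogon or simulator API.

```python
from typing import NamedTuple

# Threshold from the post above; ">=" because the proposal says "670+".
BST_BAN_THRESHOLD = 670


class BaseStats(NamedTuple):
    """Six base stats of a Pokemon (hypothetical container)."""
    hp: int
    attack: int
    defense: int
    sp_attack: int
    sp_defense: int
    speed: int


def base_stat_total(stats: BaseStats) -> int:
    """BST is simply the sum of the six base stats."""
    return sum(stats)


def initial_uber_bans(pokedex: dict[str, BaseStats],
                      threshold: int = BST_BAN_THRESHOLD) -> list[str]:
    """Names whose BST meets the threshold, sorted alphabetically."""
    return sorted(name for name, stats in pokedex.items()
                  if base_stat_total(stats) >= threshold)


if __name__ == "__main__":
    sample = {
        "Mewtwo": BaseStats(106, 110, 90, 154, 90, 130),   # BST 680
        "Garchomp": BaseStats(108, 130, 95, 80, 85, 102),  # BST 600
    }
    print(initial_uber_bans(sample))  # ['Mewtwo']
```

The point of writing it this way is that the criterion is fully public and checkable: anyone can rerun it against the new Pokedex, which is the opposite of a hand-picked list.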

Cathy said:
We should avoid banning things hastily. We have plenty of time to do it right. So long as we avoid developing a process as bad as paragraph submissions, top secret formulas, and other arbitrary delays and exclusions, we don't run the risk of wasting years this time. Such a working process is a simple vote with the only filter being a ladder statistic check.

Absolutely why I made my post in the other thread. Notice how absolutely no Pokemon is banned from the get-go in my list except those that go against what we desire in our metagame. Perhaps people would like Wobbuffet to be banned because he creates some undesirable features in the metagame (an argument is that he removes skill from the game because he inhibits an entire and major skill-based mechanic of the game). Either way, we should strive for as few, if any, initial bans for the new game. Maybe my post in that thread is more than we really need; I'm open for discussion on the matter. I strongly feel that we should not ban any more Pokemon than I have listed, though, or we are going to run into the problems that Cathy outlined in the first post.

I also think that Aeolus is getting the picture in his OP. Besides his rather extensive list of banned Pokemon, he wants us to instate the idea of a "quick ban" for things that are broken as they come early on in the generation based upon those principles that we've decided are desirable in our metagame. This idea of a quick ban, with proper cultivation, is right up the alley of what you want to see judging from your post, Cathy, and exactly what I'd like to have.

We want a simple process

None of this gimmicky paragraph crap. I don't like and have never liked that a few people determined who got to vote, even if I did trust their judgment more or less to be fair about it. I didn't understand that SEXP thing when I first learned of it and still don't care for it. We need to eliminate all of these things from our process. Our process must be simple.

Going off of the idea of a "quickban", basically this means that:

People play the metagame

No gimmicks. You play the game. It doesn't matter how much you use the Pokemon we're trying to test or how often you play against it, because you are playing in a game defined by all of its parts. No matter what specific team you are using against whatever team your opponent is using, it was influenced by the presence of the Pokemon in question. In this regard, all you do is play the game and win matches. That's it.

People get "high enough" ratings on the ladder

High enough would be some arbitrarily high rating. The only reason we need voters to meet this rating is to guarantee their battling competency. We do not want random people who cannot play properly and do not understand the game voting. If there is a problem with the rating system such that someone would not trust a user's vote as competent and relevant after merely meeting this threshold, then our rating system needs to be improved. What that arbitrary threshold ends up being doesn't matter here; we can sort it out later. What matters is that the only requirement for voting is meeting this threshold. You don't have to use the Pokemon, you don't have to write paragraphs, you only have to play the game enough to do well, and then you can voice your opinion in the voting process.

These people, all of them, vote in a public forum

They vote on Pokemon. That's it. If you think about this, the process of tiering multiple Pokemon shouldn't take more than a month. Most of that time is playing the game. The formality of the vote is a trivial and almost insignificant period of time in the grand scheme of things.
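The three steps above reduce to one filter and one count. The following is a minimal sketch, assuming ladder ratings can be exported as (user, rating, deviation) records; the optional deviation cap and the 2/3 supermajority echo Cathy's opening post, while `LadderEntry`, `eligible_voters`, `tally`, and every number are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class LadderEntry:
    """One exported ladder record (hypothetical format)."""
    user: str
    rating: float     # ladder rating
    deviation: float  # rating deviation: lower means the rating is more certain


def eligible_voters(ladder, min_rating, max_deviation=None):
    """The only filter: a rating threshold, plus an optional deviation cap."""
    return {
        e.user for e in ladder
        if e.rating >= min_rating
        and (max_deviation is None or e.deviation <= max_deviation)
    }


def tally(ballots, voters, supermajority=Fraction(2, 3)):
    """Count public ballots from eligible voters only.

    ballots maps user -> "ban" or anything else (treated as "no ban").
    Returns (ban_votes, votes_counted, passed); the ban passes only if
    ban votes are at least the supermajority share of votes counted.
    """
    counted = {u: v for u, v in ballots.items() if u in voters}
    bans = sum(1 for v in counted.values() if v == "ban")
    total = len(counted)
    passed = total > 0 and Fraction(bans, total) >= supermajority
    return bans, total, passed


if __name__ == "__main__":
    ladder = [LadderEntry("a", 1500, 40), LadderEntry("b", 1300, 40),
              LadderEntry("c", 1600, 120)]
    voters = eligible_voters(ladder, min_rating=1400)
    print(tally({"a": "ban", "b": "ban", "c": "no ban"}, voters))
```

Using `Fraction` instead of floats keeps an exact 2-of-3 result from being misjudged by rounding at the threshold. Because the voter list is just a function of public ladder data, the whole result can be recomputed and checked by anyone.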

This, as I outlined above, is how I envision the ideal tiering process occurring. I like what Aeolus suggested in the other thread of only considering bans when "disturbances" occur in the metagame. We don't need some rolling ban process that constantly allows people to ban Pokemon from the game. We only need to raise the flag of the tiering process when something happens that causes a disturbance in the metagame. I will carry over Aeolus's definition of a disturbance here, because I completely agree with it.

A disturbance is defined as when a new discovery, innovation, or Nintendo release dramatically alters the game by substantially improving the effectiveness of a Pokemon in a tier other than Uber.

If we expand upon this idea and then elaborate upon the conditions under which a Pokemon causes a disturbance, then we can identify when something is problematic as it occurs. Some people have raised concerns that sometimes a Pokemon is just not making the metagame a better place even though it has been allowed forever. A lot of people, for instance, felt that DP Garchomp was broken for a long time even before the discovery of the Yache Berry set. I agree with them. We should refine the definition slightly, and I think in the following way, based on the desirable metagame characteristics Doug outlined in his thread:

A disturbance arises when the presence or qualities of a Pokemon dramatically and negatively impact the desirable characteristics of any metagame other than Uber.

When written like this, all of the Uber characteristics we currently have and all of the characteristics of the metagame we find desirable are factored in. For example: If we as a community feel that BW Garchomp at some point is constantly forcing our metagame to be too centralized and not diverse enough like DP Garchomp did, we may propose a simple test of it as I outlined above using the Offensive Uber characteristic as the explanation for why it is causing the impact it causes. The great thing about this, though, is that we can just automatically pull the ratings off of the official server and generate our list of voters on the spot. No radical policies, no intentionally and forcibly centralized Suspect ladder on the server, nothing. We pull the ratings on the spot and hold a vote. The tiering process takes less than 2 weeks.

Hopefully I've been able to convey with this post how I envision the ideal tiering process working in Gen V. I also hope that with this topic existing, others who feel the same way will stand up and chime in their thoughts for a simpler, more efficient, less biased process. If there were ever a time to fix the tiering process, the onset of a new generation is the time to do so.

lati0s · Sep 12, 2010

Rising_Dusk said:
That is why I support the 670+ BST Pokemon as being the only Pokemon that we initially ban to Ubers. Because they go against the principles of what we, the players, perceive as our desirable metagame, they should be removed. I think I can safely speak for basically every competitive Pokemon player on this website in making that claim.

No, you cannot. Ubers is not nearly as undiverse as many people tend to think. According to the August stats, there were 35 Pokemon in Ubers with over 3% usage, compared with 46 in OU. While this difference is significant, it is likely due in part to the fact that the Ubers playerbase is smaller and more experienced; the same effect can be seen by looking at the stats for the suspect ladders. Also, when Gen 5 is released with several more high-BST legendaries, the diversity will almost certainly increase. I support starting Gen 5 OU with no bans and working from there.

cim · Sep 12, 2010

Cathy said:
Unfortunately, things went very far downhill shortly after this. The next year was spent on entirely pointless "tests" because by its very design, so-called "Stage 2" was 100% pointless. Eventually, when Stage 3 rolled around, the results of Stage 2 were irrelevant.

Let's take a step back and think about the previous paragraph. A whole year was wasted by a process that was designed out of the box to be pointless. I want to make sure that is very clearly understood. Stage 2 was pointless. This is so important to understand because it is often bandied about that proper tiering processes take too long. In reality, poor decisions regarding the tiering process is what makes it take too long. Unfortunately, DP was a case of the latter. A sane process would have been similar to stage 3 from the start. Also important is that a sane process would have stopped at the design of the first test, and considered only a simple rating and deviation check.

Words cannot express how glad I am that I am no longer the only person saying this.

We spent an entire year on a pointless part of the test that literally accomplished nothing, and now we claim any testing process is doomed to be long just because an idiotic process was used last time that made battlers do nothing for an entire year. The process can easily be made more efficient, and I hate this whole "testing is inherently slow / flawed / inefficient" argument that keeps going around.

With no efficiency improvements, testing every Suspect with Stage 3 alone exactly as is despite its fatal flaws would have taken 7 months. Cut a few more bits of the process down and speed it up some more and you could easily bring that to 4-6 months.

Cathy said:
After Stage 3, things became even worse. After messing up immensely over the last year and a half, the wasted time was used as a reason to introduce an another bad process. First of all, after messing up so badly, there should have been a major leadership change in tiering policy. How does it make any sense that after messing up badly you get a second chance? We have plenty of people far more capable of handling tiering than the people who handled it this generation. We need people with special skills. People who not only enter tournaments, but place well in them. People who engage with strategy and the community in Stark Mountain. People who have contributed to site content more recently than two or three years ago. People who are capable of putting in the technical work required to make processes a reality. It's time for other capable members of the community to set the direction for tiering policy.

I'm going to completely throw myself under the bus and agree with this entire paragraph. I don't feel the leadership in Generation 4 should extend to Generation 5. I think Pokemon battlers who are actually active in tournaments, on the ladder, or in competitive Pokemon in general would be better suited to the job, and this is yet another reason for such a change.

Thank you Cathy for bringing everything I ever possibly wanted to say to the table in such an eloquent way.

lilyhollow · Sep 13, 2010

This thread is just going to continue the now years-long cycle of bickering over surface-level issues. I feel like people have been failing to address the actual core problem for literally years now, which is that we don't even know what all of our objectives are in designing metagames in the first place.

First of all, I want to throw out whatever issues of "time-management" or "stability" anyone may be tempted to bring up for whatever reason. We are apparently planning to have a relatively short testing period this time around with or without an initial banlist, so, thankfully, none of that seems to matter at this point. I actually cannot express how happy it makes me that these obscuring factors have been dealt with, because now we're really just left with one question to answer: what do we want our metagames to achieve?

I think it's important to make clear that this is the pertinent question. Everything else comes afterwards-- including the question of whether or not we want to have an initial banlist for Gen 5. The reason that people support an initial banlist at all is that they hate the idea of having "Ubers-lite" as their "Standard" metagame. With the obscuring factors of "stability" and "time-management" done away with, that is the only reason I can think of for an initial Gen 5 banlist to exist. People have this idea that "Having our main metagame be one rife with over-the-top threats like Mewtwo, Lugia, and Kyogre is bad." They may or may not be correct, but what's important here is that this wouldn't even be an issue in the first place if, for example, we didn't have a purportedly "main" metagame to begin with. Or if we did have a "main" metagame, but it wasn't necessarily the "first balanced metagame." Or if metagames were not necessarily dependent on one another at all! All of these aspects of our current tiering system have seemingly avoided scrutiny entirely for probably years now. As in, nobody has even talked about them except in short conversations on IRC, or maybe me whining periodically in Policy Review. At the very least, we need to reaffirm that we are indeed working with what we believe to be the ideal method of tiering. I personally do not believe that we are, but if I'm in the minority, then we should at least be able to explicitly justify the current system, making it crystal clear that our tiering decisions from here on out are going to be made on stable and reliable groundwork. I honestly believe that this entire banlist issue could be completely solved if we looked at our tiering system with a more critical eye and made some key changes. I'll continue to pursue this as a potential solution until it either happens, or it is clarified that our current system is definitely the one we will be committing ourselves to next gen.

I do understand that most people are probably arguing under the assumption that we're just sticking with the same system as always, though. In that case, we're asking ourselves "is it a good idea to have our 'OU' metagame-- the metagame that we purport to be the 'main,' 'most popular' metagame, and on which all other metagames are dependent-- be one rife with over-the-top threats like Mewtwo, Lugia, and Kyogre?" Do we want a dependent "UU" metagame that most players identify with "OU"? Are these things perhaps more confusing and bothersome than a long banlist of Pokemon that even casual players will mostly identify as "Legendary" anyway? Could this even have a negative impact on the popularity of Smogon, or of competitive Pokemon itself? And really, is "philosophical purity," or a resemblance to more "traditional" competitive games, really all that important? I think these are all legitimate qualms with the "start with no banlist" standpoint that need to be addressed.

Really though, I feel everybody would be happiest if we simply stopped promoting the notion of a "Standard" metagame on which all other metagames are dependent. If I had my way, we'd probably have one metagame constructed with "no banlist" as an initial benchmark, one that resembles Aeolus's, and then others based on what happens to excite players the most (probably one that resembles "traditional UU" in some way, and a VGC one or something). I feel like many people, such as you yourself, Cathy, like the elegance of the "one main metagame which also serves to indirectly define the metagames below it" structure. It doesn't seem terribly compatible with your call for "philosophical purity," though, if your opposition (which has been articulated very thoroughly here) is to be believed in their claim that such a direction would actively lower Smogon's credibility/popularity.

Cathy · Sep 13, 2010

I am not attempting to define a comprehensive policy for the next generation in the opening post of this thread. A variety of directions are possible, including unrelated metagames being developed at the same time. I even suggested this myself a year ago in #stark, and I'm quite supportive of it. It's not a core part of the opening post.

What I am advising against is particularly bad things that could be done, such as rushing to make decisions either before the game is released (which is a bad idea regardless of where you are coming from) or shortly after, using paragraph submissions or mystic super top secret formulas, using an elite council handpicked by two administrators, or those kinds of things. Those things are all terrible (see the first post of this thread for details). You may say it's too early to discuss those things, but experience shows that on Smogon, failure to address these issues from the get-go renders them impossible to change later on. So whichever direction we go with tiering, let's purge ourselves of some particularly bad ways to do it.

I don't agree these are surface issues. In fact, to use your term, they obscure the real issues. So getting them out of the way is necessary before we can discuss the details.

Colonel M · Sep 13, 2010

Honestly, a lot of what Cathy and R_D have said reflects my thoughts. I have hated the "hidden" factors of the suspect process, and the concept of it all is probably why I've hated OU so much to this day. I understand that Jabba's take on Stage 2 came from a different perspective, but ooh man, do I agree that Stage 2 was a big "?" process in general.

lilyhollow · Sep 13, 2010

I am responding to the part of your post that opposes the idea of forming a long banlist for OU before Gen 5 is even released. I think that if we have significant reason to believe that "Mewtwo in 'OU' + Starmie in 'UU' = bad community reception," it would make a lot of sense, under our current tiering system, to ban Mewtwo right off the bat--even before the game is released-- and just let him wallow in the obscurity of Ubers. I think that this says more than anything that we need to take a good long look at our tiering system, but if people are opposed to doing that, I think the "initial banlist" position is at the very least defensible.

In the past I have had huge problems with creating an initial banlist, but if that initial banlist can be expected to become and remain stable quickly, ultimately I am less concerned with philosophical purity or resemblance to traditional competitive games than I am with popularity (or other measures of 'community strength,' like loyalty to the game). Since our approach this gen seems to be one where a stable banlist will indeed be formed relatively quickly (regardless of its initial benchmark), my stance on the issue is that the banlist which can be expected to be better-accepted by the community in the long run is the most reasonable one to adopt. I think it's impossible to determine which one would actually be better-accepted, which is why I consider the "Ubers Lite vs. Traditional OU" debate to be a surface issue, with the core issue lying in the structure of our tiering system itself. But again, if no one is interested in scrutinizing our tiering system, current attitudes lead me to believe that it would more likely be one with an initial banlist.

I agree with most of the other things in your post though, particularly your assessment of Stage 2.

Erazor · Sep 13, 2010

I do agree with a few points, but with regard to the council:

I think the council will work fine if it is revamped a little. Instead of having two people choose the members, the whole of Policy Review would choose them. The size could be increased to 13 or so. And requirements for the council? Nothing but an appropriate ladder rating and metagame activity.

Chou Toshio · Sep 13, 2010

This:

Smogon attempts to avoid bans as much as possible—only when it becomes very apparent that a Pokémon is far too powerful to be in line with a balanced metagame is it banished permanently from the standard arena.

and the system built around it are products of mid-4th-gen thinking, not some ultimate philosophy Smogon has always held.

What we learned from 4th Gen is that Tiering ultimately comes down to preferences no matter how you dress it-- it's subjective, not objective. A ban list made by a picked council is just as "legitimate" as one made after months and months of testing, as long as the resulting metagame is playable, competitive, and popular.

We learned that while it's all well and good to dress up lofty philosophies, it's another thing to face the realities of an exhausted, overworked testing staff, months of tedious labor, and a ban list that was only solidified (sort of) at the end of a generation.

Yeah, I agree we learned what not to do again from Gen IV.

Edit:

I should note that I would be completely up for a "start from no bans" and the use of voting to decide tiering, under the following:

1) The first rounds of votes would be held 1 week after the 5th Generation Simulator went up (so the masses could ban Kyogre, Mewtwo etc. etc. without too much pain and suffering).

2) ALL players were allowed to vote, without having to write chapters or be hand-picked.

This process would be fast and painless, and I'm confident it would ultimately result in a playable metagame. I really don't think Smogon's elitism needs to come into the process at all-- but if a council system is more likely to be approved and forwarded (which I would assume), I will support that.

I.e., it's going to come down to people's subjective preferences either way-- I'm in favor of the way that gets it done the fastest, with the least amount of crap (i.e., testing that has no real meaning or value) in between.

You all know that every player in this community (except maybe obi, maybe) already has preconceived notions about what an OU metagame should be like, and that these notions will ultimately give us our resulting ban list. Whether it is a group of 13 hand-picked elites, the mass community with a week of play experience, or high-ranked players over 3 years of testing, the resulting lists will be equally legitimate. It all comes down to preferences anyway.

With no efficiency improvements, testing every Suspect with Stage 3 alone, exactly as is and despite its fatal flaws, would have taken 7 months. Cut a few more bits out of the process, speed it up some more, and you could easily bring that down to 4-6 months.

I will try to say this respectfully (out of respect)-- but while years is ridiculous, over half a year is still really long for a process that, to me, is inherently unimportant (testing, that is). You can make it more efficient like this:

Make a list in a week. If bad shit comes up, ban it. It's that simple.

Rising_Dusk · Sep 13, 2010

Blame Game said:
I am responding to the part of your post that opposes the idea of forming a long banlist for OU before Gen 5 is even released. I think that if we have significant reason to believe that "Mewtwo in 'OU' + Starmie in 'UU' = bad community reception," it would make a lot of sense, under our current tiering system, to ban Mewtwo right off the bat--even before the game is released-- and just let him wallow in the obscurity of Ubers. I think that this says more than anything that we need to take a good long look at our tiering system, but if people are opposed to doing that, I think the "initial banlist" position is at the very least defensible.

What you say here makes a lot of sense and is something I totally agree with. Evan put this very eloquently during an #is discussion on the matter last night: "there is a lot of inertia going for a metagame without the Mewtwo, Ho-Oh, or Kyogre of every generation". Not to pretend that purity is a major focus of my agreement with Evan, but the fact that Nintendo constantly bans these Pokemon from its own in-game endeavors, like the Battle Tower, only adds to this inertia. People ultimately want a stable metagame that appears like our current metagame because it is what we've always had and is what Nintendo has always offered us when it comes to singles. (Doubles are a completely different ballgame.)

I don't feel that it is a failure of our current tiering policy to be inclined to tier generation V in the same manner as all previous generations in that regard just because we wasted a lot of time trying to handle the more difficult cases. As ChouToshio and Aeolus and many others have suggested, whatever metagame we wind up with will be just as playable as any other. Regardless of this fact, we do need to improve our tiering process to avoid such travesties as stage 2. A lot of generation IV was Smogon trying to learn how best to handle these things, and we learned very harshly that what we were doing was nonsensical and a waste of time. Cathy is right about this on all counts, no matter how responsible, and therefore accountable, any one of us feels for our participation in those tests. Now we need to move on and try to improve the process.

That isn't to say that I don't support initial bans, though. We can wait 4 days or whatever until the game comes out if we so please and if it makes a few people a lot happier, but no matter what, we are going to be faced with the decision of an initial ban list. Because of that inertia and because of what people ultimately want, we should by all rights keep that the focus of our initial tiering policy. Remove the Mewtwo, the Ho-Oh, and the Kyogre of all generations from play, and then let the dust settle for what remains. That isn't to pretend that we should always listen to the status quo, but in this case, we have an obligation as the premier competitive Pokemon website to at the very least create a desirable metagame that most people want to play in as our "Standard" metagame. If we don't create it, someone else will, and that's only because that's what they want to play.

ChouToshio said:
I will try to say this respectfully (out of respect)-- but while years is ridiculous, over half a year is still really long for a process that, to me, is inherently unimportant (testing, that is). You can make it more efficient like this:

Make a list in a week. If bad shit comes up, ban it. It's that simple.

I would very much like any banning we do to be based on irrefutably extended periods of battling and metagame development / centralization. A week is not enough time to validate one's decision to ban something from a tier, with repercussions for the entire competitive Pokemon community at large. We have a moral obligation to attempt to do tiering with as much competitive backing as possible. 6-7 months for tiering every Pokemon in Gen V, while not short, sounds to me like a "good" amount of time for us to actually be able to competitively back whatever bans we enforce.

Aeolus · Sep 13, 2010

I agree with the process suggested in this post. If the past is any indication of what is to come in the future, I believe pretty strongly that a beginning banlist of the 670+ pokemon and Wobb/Wynaut is necessary to give us a game that we want to designate as our primary standard.

Rising_Dusk said:
I applaud you for the creation of this thread. I breathed a sigh of relief when I saw that someone with the voice to speak and make people listen posted a thread that basically echoed my deepest concerns about Aeolus's other one.

First, though, before we can even consider a process to ban things the right way, we need to consider what we want to accomplish by banning things in the first place.

We need to identify the desirable standard metagame

One of my favorite threads on this site is a topic directly about what we need to do here. We need to decide what it is we actually want to achieve by tiering anything in the first place. Many people are comfortable with what we have, in principle. Maintaining the status quo is a naive approach to tiering, but at the same time, it has shown us the kinds of things we want.

As Doug rightly discusses in that topic, there are several characteristics that we want, and one of them is variety. We have learned from DPP Uber that if we allow certain vastly dominant Pokemon, namely the 670+ BST monsters, the entire metagame revolves around them, with lower Pokemon interspersed only where they fill a niche by responding to them well. I think everyone here can agree that the desirable standard metagame is one that is plentifully diverse, and allowing those Pokemon inherently goes against such an idea.

That is why I support the 670+ BST Pokemon as being the only Pokemon that we initially ban to Ubers. Because they go against the principles of what we, the players, perceive as our desirable metagame, they should be removed. I think I can safely speak for basically every competitive Pokemon player on this website in making that claim.

Absolutely why I made my post in the other thread. Notice how absolutely no Pokemon is banned from the get-go in my list except those that go against what we desire in our metagame. Perhaps people would like Wobbuffet to be banned because he creates some undesirable features in the metagame (one argument is that he removes skill from the game because he inhibits an entire and major skill-based mechanic of the game). Either way, we should strive for as few initial bans as possible, if any, for the new game. Maybe my post in that thread is more than we really need; I'm open to discussion on the matter. I strongly feel that we should not ban any more Pokemon than I have listed, though, or we are going to run into the problems that Cathy outlined in the first post.

I also think that Aeolus is getting the picture in his OP. Besides his rather extensive list of banned Pokemon, he wants us to instate the idea of a "quick ban" for things that are broken as they come early on in the generation based upon those principles that we've decided are desirable in our metagame. This idea of a quick ban, with proper cultivation, is right up the alley of what you want to see judging from your post, Cathy, and exactly what I'd like to have.

We want a simple process

None of this gimmicky paragraph crap. I don't like and have never liked that a few people determined who got to vote, even if I did trust their judgment more or less to be fair about it. I didn't understand that SEXP thing when I first learned of it and still don't care for it. We need to eliminate all of these things from our process. Our process must be simple.

Going off of the idea of a "quickban", basically this means that:

People play the metagame

No gimmicks. You play the game. It doesn't matter how much you use the Pokemon we're trying to test or how often you play against it, because you are playing in a game defined by all of its parts. No matter what specific team you are using against whatever team your opponent is using, it was influenced by the presence of the Pokemon in question. In this regard, all you do is play the game and win matches. That's it.

People get "high enough" ratings on the ladder

High enough would be some arbitrarily high rating. The only reason we need them to meet this rating is to guarantee their battling competency. We do not want random people who cannot play properly and do not understand the game voting. If the rating system is flawed such that a user who meets this threshold still cannot be trusted to cast a competent and relevant vote, then our rating system needs to be improved. What that arbitrary threshold ends up being doesn't matter here; we can sort it out later. What matters is that the only requirement for voting is meeting this threshold. You don't have to use the Pokemon, you don't have to write paragraphs, you only have to play the game enough to do well, and then you can voice your opinion in the voting process.

These people, all of them, vote in a public forum

They vote on Pokemon. That's it. If you think about this, the process of tiering multiple Pokemon shouldn't take more than a month. Most of that time is playing the game. The formality of the vote is a trivial and almost insignificant period of time in the grand scheme of things.
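The three steps above can be sketched as a small program. This is only an illustration under stated assumptions: the rating threshold, the player names, and the vote options are all hypothetical (the post deliberately leaves the threshold to be sorted out later), and it assumes ratings can be pulled from the server as a simple name-to-rating snapshot.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical cutoff: the post deliberately leaves the real threshold undecided.
RATING_THRESHOLD = 1500


def eligible_voters(ratings: Dict[str, float], threshold: float = RATING_THRESHOLD) -> Set[str]:
    """Step 2: anyone whose ladder rating meets the threshold may vote.

    `ratings` is assumed to be a snapshot pulled from the server at vote time.
    """
    return {name for name, rating in ratings.items() if rating >= threshold}


def tally(votes: Dict[str, str], voters: Set[str]) -> Counter:
    """Step 3: count the public votes, ignoring anyone below the threshold."""
    return Counter(choice for name, choice in votes.items() if name in voters)


if __name__ == "__main__":
    # Made-up players and ratings, for illustration only.
    ratings = {"alice": 1620.0, "bob": 1480.0, "carol": 1555.0}
    votes = {"alice": "ban", "bob": "ban", "carol": "no ban"}
    voters = eligible_voters(ratings)
    print(sorted(voters))        # ['alice', 'carol']
    print(tally(votes, voters))  # Counter({'ban': 1, 'no ban': 1})
```

Step 1 (playing the game) has no code counterpart here; the only thing the vote needs from it is the ratings snapshot, which is why the formality itself stays short.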

This, as I outlined above, is how I envision the ideal tiering process occurring. I like what Aeolus suggested in the other thread of only considering bans when "disturbances" occur in the metagame. We don't need some rolling ban process that constantly allows people to ban Pokemon from the game. We only need to raise the flag of the tiering process when something happens that causes a disturbance in the metagame. I will carry over Aeolus's definition of a disturbance here, because I completely agree with it.
A disturbance is defined as when a new discovery, innovation, or Nintendo release dramatically alters the game by substantially improving the effectiveness of a Pokemon in a tier other than Uber.
If we expand upon this idea and then elaborate upon the conditions of when a Pokemon causes a disturbance, then we can identify when something is problematic as it occurs. Some people raised concerns about it, namely that sometimes a Pokemon is just not making the metagame a better place despite having been allowed forever. A lot of people, for instance, felt that DP Garchomp was broken for a long time even before the discovery of the Yache Berry set. I agree with them. We should refine the definition slightly, and I think in the following way based on the desirable metagame characteristics Doug outlined in his thread:
A disturbance arises when the presence or qualities of a Pokemon dramatically and negatively impact the desirable characteristics of any metagame other than Uber.
When written like this, all of the Uber characteristics we currently have and all of the characteristics of the metagame we find desirable are factored in. For example: If we as a community feel that BW Garchomp at some point is constantly forcing our metagame to be too centralized and not diverse enough like DP Garchomp did, we may propose a simple test of it as I outlined above using the Offensive Uber characteristic as the explanation for why it is causing the impact it causes. The great thing about this, though, is that we can just automatically pull the ratings off of the official server and generate our list of voters on the spot. No radical policies, no intentionally and forcibly centralized Suspect ladder on the server, nothing. We pull the ratings on the spot and hold a vote. The tiering process takes less than 2 weeks.

Hopefully I've been able to convey with this post how I envision the ideal tiering process working in Gen V. I also hope that with this topic existing, others who feel the same way will stand up and chime in with their thoughts on a simpler, more efficient, less biased process. If there were ever a time to fix the tiering process, the onset of a new generation is the time to do so.

I think an important thing to realize is that no process will be perfect and there are legitimate complaints and criticisms of every method of tiering along the spectrum from complete democracy to absolute autocracy. We started out Gen 4 with an almost completely democratic process, but that was eventually eschewed because, despite high ratings, some people demonstrated an unwillingness to actually evaluate a Pokemon based on its effectiveness and decided to vote one way or the other for nonsense reasons (it's legendary, or Nintendo lets me use it, etc.). At all points, emphasis was placed on voters being people who had above-average expertise in the game, though the actual methods of choosing those players evolved over time to combat issues that arose as testing progressed.

If we decide to return to a completely democratic process, we should realize that the same things will happen this time around that happened last time. I'm hopeful that the level headed population will outnumber the people who vote on unhelpful bases and the results will be acceptable, but one cannot be sure about that. I'd personally prefer a vote of Policy Review people or vote of a council selected by Policy Review people over a vote of the people who populate Stark general.

Chou Toshio · Sep 13, 2010

Rising_Dusk said:
I would very much like any banning we do to be based on irrefutably extended periods of battling and metagame development / centralization. A week is not enough time to validate one's decision to ban something from a tier, with repercussions for the entire competitive Pokemon community at large. We have a moral obligation to attempt to do tiering with as much competitive backing as possible. 6-7 months for tiering every Pokemon in Gen V, while not short, sounds to me like a "good" amount of time for us to actually be able to competitively back whatever bans we enforce.

Please elaborate on:

-Definition of a "valid" decision

-What horrific "repercussions" could come

-What you mean by "moral obligation," considering I see no point at all at which ethics have any relevance with this discussion, or how one process could be more "morally correct" than another for categorizing game characters.

Aeolus said:
If we decide to return to a completely democratic process, we should realize that the same things will happen this time around that happened last time. I'm hopeful that the level headed population will outnumber the people who vote on unhelpful bases and the results will be acceptable, but one cannot be sure about that. I'd personally prefer a vote of Policy Review people or vote of a council selected by Policy Review people over a vote of the people who populate Stark general.

I cannot help but be skeptical of this lack of trust in the community. While there are certainly many bad players or players with skewed ideas of what the metagame should be, we have to remember that this is Smogon, and those who come here (and stay here) are at least partly indoctrinated by our community culture. This is where we "grew up" as Pokemon players. I'm sure that if you held an open vote on Shoddy, where players from all over the internet congregate, we could see some very odd decisions (I never cease to be amazed how often I see Cress and Electivire when I play on Pokemon Online), but here, the vast majority of members who stick with this forum consider themselves Smogonites, respect the staff, and are ingrained in our culture. They understand the expectations of what we will call OU, and would (on average) chime in well on it.

While I may not have faith in a small group of randomly pulled members, I would have a great deal of faith in our members as a collective pool.

Ultimately, it is the community to which this metagame belongs, and considering tiering is all about preferences at the bottom line-- even if the results were hugely different from what we would expect (which I doubt), who is to say that the result might not be what actually is most desirable?

obi · Sep 13, 2010

I would argue that we'd need more than 4 days to determine whether Arceus, Dialga, Giratina-A, Giratina-O, Ho-Oh, Lugia, Mewtwo, Palkia, Rayquaza, Groudon, Kyogre, and the at least 3 new "top tier legendary" Pokemon of Generation 5 can balance out with each other, along with all of the other Pokemon with weather-related abilities (just an example; maybe they won't be that good, I don't know), many "base 600" Pokemon, and the random Pokemon here and there with other redeeming qualities, to create a varied tier. Consider that all of the Pokemon I mentioned by name (plus the minimum of 3 new Pokemon) already amount to 30% of the size of OU alone, and those are just the base 670+ Pokemon.

In other words, if 5.7% of all Pokemon in the next generation (assuming we get 600 Pokemon total) are usable, that's already as much variety as we have currently deemed acceptable in Gen 4 OU. Let's not forget that the strongest Pokemon tend to have variety in their movepools and all-around good stats, so an individual Pokemon (such as Arceus) can actually give more variety than something like Milotic. The argument that certain Pokemon are obviously bad and will be broken by some measure suddenly stops being so obvious.

Rising_Dusk said:
This, as I outlined above, is how I envision the ideal tiering process occurring. I like what Aeolus suggested in the other thread of only considering bans when "disturbances" occur in the metagame. We don't need some rolling ban process that constantly allows people to ban Pokemon from the game.

I do agree that we need a way to not ban stuff, too. However, I would use the exact same process I suggested earlier for banning. Create some sort of nomination process (the details aren't too important to me right now). You can nominate to ban a Pokemon, unban a Pokemon, or do nothing. If the "do nothing" nominations reach a sufficient number (another detail to be decided later), then we simply do not vote to ban anything that time. I would be OK with having the time between votes get longer at some point after we've been playing the game for a while (and then shorten again after the release of a new game), under the assumption that we'll have less need to change things when the game is constant.

In other words, we use the nomination process to determine when a "disturbance" is large enough to change things.
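The nomination step can be sketched as follows. This is only an illustration under assumptions the post leaves open: the "do nothing" threshold, the 14-day base interval, and the point at which the gaps lengthen are all hypothetical numbers, and `Nomination`, `plan_round`, and `next_interval_days` are made-up names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Nomination:
    action: str                    # "ban", "unban", or "nothing"
    pokemon: Optional[str] = None  # left as None for a "do nothing" nomination


def plan_round(nominations: List[Nomination], do_nothing_threshold: int) -> List[Tuple[str, Optional[str]]]:
    """Return the (action, pokemon) pairs to put to a vote this round.

    An empty list means no vote is held this round.
    """
    counts = Counter(n.action for n in nominations)
    if counts["nothing"] >= do_nothing_threshold:
        return []  # enough people see no disturbance, so skip the vote
    ballot: List[Tuple[str, Optional[str]]] = []
    for n in nominations:
        item = (n.action, n.pokemon)
        if n.action in ("ban", "unban") and item not in ballot:
            ballot.append(item)  # keep first-nominated order, drop duplicates
    return ballot


def next_interval_days(rounds_since_release: int, base_days: int = 14, settle_after: int = 4) -> int:
    """Hypothetical schedule: short gaps right after a release, longer once things settle.

    Resetting rounds_since_release to 0 when a new game comes out shortens the gap again.
    """
    return base_days if rounds_since_release < settle_after else base_days * 2


if __name__ == "__main__":
    noms = [Nomination("ban", "Garchomp"), Nomination("nothing"), Nomination("ban", "Garchomp")]
    print(plan_round(noms, do_nothing_threshold=2))  # [('ban', 'Garchomp')]
```

Treating "do nothing" as just another nomination means the same mechanism that raises a disturbance can also decide that none has occurred.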

Cathy · Sep 13, 2010

Aeolus said:
If the past is any indication of what is to come in the future, I believe pretty strongly that a beginning banlist of the 670+ pokemon and Wobb/Wynaut is necessary to give us a game that we want to designate as our primary standard.

Although banning a smaller list of pokemon arbitrarily is a step in the right direction, there is actually no reason to ban those pokemon either. It's not "necessary" to create a "primary standard" before it is known what that standard should look like based on playing the game. In fact, there are excellent reasons for not doing so, which I already outlined in my first post.

Banning only pokemon whose base powers exceed a particular value is indeed objective. Banning all pokemon whose names begin with S or G or contain exactly six letters is also objective. Objectivity doesn't impart any measure of quality, and banning pokemon based on literally nothing is a bad idea if we want to call ourselves a competitive community. Whatever banning philosophy you subscribe to, assuming it's a sensible one, any bans will have to be based on the game.
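To make the distinction concrete: both rules in the sketch below are objective in the sense that any two people applying them will ban exactly the same Pokemon, yet neither consults a single battle. The roster, the names, and the numbers are placeholders invented for illustration, not real data, and `apply_rule`, `bst_rule`, and `name_rule` are hypothetical names.

```python
from typing import Callable, Dict, Set

# Placeholder roster: these names and base stat totals are invented for illustration.
ROSTER: Dict[str, int] = {"Alpha": 680, "Bravo": 600, "Sigma": 540, "Gamma": 670}

# A ban rule here looks only at a Pokemon's name and base stat total.
Rule = Callable[[str, int], bool]


def apply_rule(roster: Dict[str, int], rule: Rule) -> Set[str]:
    """Return every Pokemon the rule bans; anyone applying it gets the same answer."""
    return {name for name, bst in roster.items() if rule(name, bst)}


def bst_rule(name: str, bst: int) -> bool:
    # Ban anything at or above a base stat total cutoff.
    return bst >= 670


def name_rule(name: str, bst: int) -> bool:
    # Ban names beginning with S or G, or containing exactly six letters.
    return name[0] in "SG" or len(name) == 6


if __name__ == "__main__":
    print(sorted(apply_rule(ROSTER, bst_rule)))   # ['Alpha', 'Gamma']
    print(sorted(apply_rule(ROSTER, name_rule)))  # ['Gamma', 'Sigma']
```

Swapping one predicate for the other changes the ban list without any reference to how the game actually plays.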

Psychic powers run high when people suggest that a certain ban in a game that isn't out yet is so obvious as to be worth discussing not just before playing the game, but before the game is released. It's one (dubious) thing to ban arrays of pokemon based on "theorymon", but it's quite another thing to channel spirits and divine: exactly what old moves have been given to existing pokemon; what new moves have been created, what their effects are, and how relevant these effects will be to the game; which pokemon have been given these new moves; how the stats of old moves (PP, power, and accuracy) have been altered and how this affects the game; which new pokemon have been created and what their stats, move pools, and abilities are; which new abilities have been created, what their effects are, and how relevant these are to the metagame; which old pokemon have been given these new abilities as a second choice; which subtle mechanics changes to existing moves, abilities, and battle procedure have been instated and what their effects on the game are; and any number of other things; and then, after having this information delivered to you through a crystal ball, synthesise it quickly enough to declare that certain things are obvious or necessary. This is an impressive feat, but not one that a competitive community can indulge in.

The individual effect of each of these changes may be small, but no matter how many years you have been playing pokemon there is no way, barring an inside contact at GameFreak, to assess the cumulative effect of all these unpublished changes and additions on the metagame, and on whether certain pokemon are so "obviously" broken. These analyses have to be based on the game itself.

It is well worth the time investment to explore this game properly. We are looking at a month or two to possibly dispose of pokemon that psychic powers predict should be banned. The worst case scenario is that the clairvoyants were right and we used up two months. But this isn't so bad! By doing so, we avoid lingering doubts about whether the results would have been different, and our ban list is actually legitimate. It's better to get this out of the way than to speculate about what things might have been like for the next three years, and risk missing out on a rich game.

DougJustDoug · Sep 13, 2010

Here's what I learned from tiering in Gen 4:

Tiering decisions in Smogon are almost entirely subjective, and every "test" will be driven mostly by bandwagon thinking and popularity contests.

The first major tiering decision of the 4th generation, Garchomp, was a straight-up community vote. To prevent that vote from being a pure popularity contest, we instituted a ratings minimum for voters. Presumably, if we selected "good battlers", we would get mostly intelligent and logical voters that would make a decision based on good reasoning and competitive experience.

Almost immediately after Garchomp was banned, people started bitching about the quality of the vote and the factors driving individual decisions. People were concerned about cheaters inflating their ratings so they could vote. People complained that "voter recruiting drives" were occurring on other sites and within certain clans, and they were voting based on a mandate from others instead of their own personal decisions. Rumors flew around about voters intentionally voting one way or the other "just to fuck with the results". Some people openly argued that their vote was based on what Nintendo allowed in VGC play, or the BST total of the pokemon in question. Complaints flew in from every direction, and they all pretty much said the same thing:

"This was supposed to be a vote of skilled, intelligent, competitive pokemon players and the results would reflect what is best for the metagame based on solid competitive principles. It wasn't. This was a big clusterfuck popularity contest with very little rigor and no meaningful controls to ensure a high-quality end result."

From there, we began instituting more controls and more process to attempt to make tiering decisions less of a big uncontrolled popularity contest and more of a scientific test or legal trial. All the stages of suspect testing, the Characteristics of an Ubers, paragraph submissions, ratings thresholds, deviation requirements, suspect EXP formulas, special testing ladders, voter checkmark badges, restricted access suspect voter forums -- EVERYTHING... it was all an attempt to make a "high-quality decision" on tiering.

At the time it was all going on, I agreed with some aspects of the tiering process and disagreed with others. But, on the whole, I felt that every element of the tiering process was a reasonable way to achieve the result that it was intended to achieve. I emphasize that for a reason, and I worded it carefully. Was each step executed to perfection? No. Did we get perfect results from each step? No. I'm saying I think that each decision regarding the process along the way was a reasonable decision for what was intended at the time it was done.

The big change of heart for me is that I no longer think we should attempt to make "high-quality" tiering decisions based on scientific rigor or legal procedures. Not because I disagree with it in principle, but because I do not think ANY process will ever prevent the result from boiling down to a big popularity contest anyway. Despite all the process and machinations of the Gen 4 tiering process, I think every vote was mostly influenced by populist thinking and a community bandwagon mentality. I can't say that for sure, that's just my read on the situation. And I suspect that the metagame we have right now would not be much different than it would have been if we had been conducting uncontrolled popularity votes all along.

That's what I learned from Gen 4 tiering.
(to answer the question posed in the title of this thread)

Going forward into Gen 5, I think we should stop trying to ensure our tiering decisions are based on logic, specific competitive experience, or even ensuring that the decision is being made by the "best" or "most qualified" people. We should just accept that this is a big popularity contest, and is driven by factors like tradition, spectacle, information and playstyles from past generations, and the publicity of opinions espoused by popular people in the most visible forums and media. If tiering is a science -- then it is political science, at best. Future tiering processes should just acknowledge that this is a big subjective popular opinion issue, and dispense with all the stuff we did in Gen 4 to try and make it a scientific test or trial.

Rising_Dusk · Sep 13, 2010

DougJustDoug said:
Going forward into Gen 5, I think we should stop trying to ensure our tiering decisions are based on logic, specific competitive experience, or even ensuring that the decision is being made by the "best" or "most qualified" people. We should just accept that this is a big popularity contest, and is driven by factors like tradition, spectacle, information and playstyles from past generations, and the publicity of opinions espoused by popular people in the most visible forums and media. If tiering is a science -- then it is political science, at best. Future tiering processes should just acknowledge that this is a big subjective popular opinion issue, and dispense with all the stuff we did in Gen 4 to try and make it a scientific test or trial.

I like this. I like that this basically emphasizes my points, too. We create the metagame we want to play. That's what makes the populist thinking so popular in the first place. I do think that we should enforce only one requirement upon our voters, and that is that they play the game. As I detailed above and Aeolus agreed with, I think a fair way to verify that they play the game is to require that they meet some yet-to-be-determined rating threshold. Not to prove that they are the best or most intelligent players, but rather to prove that they play the game that they are tiering. That, to me, is the critical point. Otherwise, your post is sensible and I agree with what you've gleaned from Gen IV and what you suggest for Gen V. It is a political science, and this is a strong and far less stressful mentality to have for our staff and community.

Aeolus said:
If we decide to return to a completely democratic process, we should realize that the same things will happen this time around that happened last time. I'm hopeful that the level headed population will outnumber the people who vote on unhelpful bases and the results will be acceptable, but one cannot be sure about that. I'd personally prefer a vote of Policy Review people or vote of a council selected by Policy Review people over a vote of the people who populate Stark general.

I am sure that there will be a non-negligible number of people who unfortunately vote for those reasons. That's unavoidable. However, I feel that as a community we've matured to the state where I think our "better" players will vote according to the metagame they want to play. That's crucial. If a Pokemon creates a metagame that we all agree is undesirable for many reasons, then the vote will reflect that and the Pokemon will go. If, however, the general viewpoint is that a Pokemon creates a good metagame, but there are a few outspoken individuals who are masters of English, I think that Pokemon should remain unbanned. Ultimately, it's all about creating the metagame that the people want to play, and I think we all can attest to that in some sense. I'm very pleased that you are on board with my proposal, it is very reassuring.

I do agree with Cathy, though; she presents a valid case that cannot be ignored. We need more information before we set any of these banlist proposals in stone. We don't know if the qualities of these 670+ Pokemon that make them undesirable are going to still be around. That said, I stand by my claim that the desirable metagame is one where these blatantly powerful Pokemon are absent for reasons I've already explained in my other posts... Just as long as they are actually as blatantly powerful as we suspect they are.

Obi said:
The argument that certain Pokemon are obviously bad and will be broken by some measure suddenly stops being so obvious.

I agree, it isn't obvious that they're broken. I would argue, however, that it is obvious that they create a metagame we do not want as our standard baseline metagame. Centralization is something that has never been a part of our banning process, although it has laid the foundation for many bans. Many people hated the Manaphy metagame of 3-4 because it was totally centralized around Tyranitar and Manaphy. These players fabricated interesting paragraphs about the "offensive characteristic" and so forth when the real underpinnings of the argument were in how centralized the metagame was. This is why I believe variety to be such a quintessentially important part of our desirable metagame. This is also why I can say with absolute certainty that these 670+ Pokemon will disrupt that. You're right, it's entirely possible that Gamefreak turns Kyogre into the next Regigigas in BW. The proposed 670+ banlist is tentative for that reason. Until we know the details of those Pokemon, we can't say for sure.

I do think that there is overwhelming support for a metagame that does not include these BST monsters. I don't think it's just to maintain the status quo, but rather, because it is what most people enjoy. Having DPP Ubers as a metagame, we've learned that more people still prefer our OU metagame to the no-holds-barred metagame of Ubers, where all of these big 'mons balance each other out and centralize the metagame around themselves. So that means that even when a metagame where everything is allowed is available, people still generally preferred the metagame with the BST titans banned. I think that speaks largely for itself when considering what qualities people like to see in our standard metagame. If Ubers saw more games than OU, I'd be on the side of raising the ban bar on OU to almost nothing being banned.

ChouToshio said:
Please elaborate on:

As you wish. I've elaborated in the hidden tag below.

Response

ChouToshio said:
-Definition of a "valid" decision

I didn't specify anywhere a valid decision, rather that we must validate our decisions. We must have sufficient knowledge to claim we understand that a Pokemon should be tiered one way or the other. This is only achieved by letting a metagame settle over time.

ChouToshio said:
-What horrific "repercussions" could come

They're not horrific, but certainly tragic. Much of the singles competitive battling in the western world is based on Smogon tiering. If we wrongfully tier something because we didn't give the metagame time to settle, those repercussions will include both a major hit to our credibility as a competitive site and the unfortunate restriction of tiers for an audience potentially far greater than just the one that reads these forums. This leads directly into...

ChouToshio said:
-What you mean by "moral obligation," considering I see no point at all at which ethics have any relevance with this discussion, or how one process could be more "morally correct" than another for categorizing game characters.

We have a moral obligation, as the basis for singles tiers across a lot of the internet, to perform the testing properly and fairly. If we do not, we are harming more than just ourselves; we are harming other communities. That's where morals come into play. If we were just screwing ourselves up, we wouldn't be morally obligated to anyone else. (Other than ourselves, of course!) That's all I am saying here.

lati0s · Sep 13, 2010

Rising_Dusk said:
This is also why I can say with absolute certainty that these 670+ Pokemon will disrupt that.

You can say things with absolute certainty about a metagame that no one has ever played yet and that you have few details at all about? damn you're good.

On a more serious note, there are many reasons to doubt the effectiveness of fourth gen theorymon on the theoretical gen5 metagame. The addition of at least 3 more 670+ BST legends, as well as the probable addition of a couple more high-powered 600 BST legends and 150 other pokemon that could find their niches, will very likely cause an increase in diversity, especially since the low diversity in the current ubers metagame can be attributed at least in part to the relatively low number of very powerful pokemon. As I said before, another reason that ubers is more centralized has nothing to do with the pokemon at all but with the player base. The main ubers players are a small group of experienced players, whereas OU has a much broader player base, which leads to more diversity. The same effect can be seen by comparing stats of the salamence suspect test ladder to current OU stats: they are the exact same metagame, but the salamence ladder is more centralized.

The argument that a metagame with high BST legends is undesirable because OU is more popular than current ubers is ridiculous. OU is the most popular tier because it is the expected standard, not because it is necessarily the best.

Rising_Dusk · Sep 13, 2010

lati0s said:
You can say things with absolute certainty about a metagame that no one has ever played yet and that you have few details at all about? damn you're good.

No. I'm not talking about what the B/W metagame will be like. No one knows that yet. I am talking about what the desirable metagame is like. The difference therein is absolutely crucial. For more information, go look at Doug's thread. It describes it in much greater detail.

I normally wouldn't post something this short, but lati0s's misunderstanding is one I think a lot of people share about this and really needs the explanation.

Hipmonlee · Sep 13, 2010

I disagree with some of what Doug said. While there were people voting for terrible reasons, the majority of people voted based on what they thought was best for the metagame.

And had we better explained what it was we wanted them to vote on, the vast majority of those who didn't would have.

Ultimately we need to trust people to make this decision, because if we don't trust them, there is no way they will trust us.

What I learned from the gen 4 testing is that ladder tests aren't nearly as powerful as we thought they would be. There are probably a lot of reasons for this, and I can think of a few potential ones, but we need to accept that the basis for most people's decisions has been, and is still going to be, largely theorymon.

And theorymon isn't as bad as you all make out. When a pokemon has stats that are far higher than other similar pokemon, it has a great typing and trait, and an excellent movepool, then, unless there is some obvious huge drawback, that pokemon is going to be broken. I mean, theorymon is basically how people play the game. They work out in their heads beforehand what pokemon to put in their teams.

Have a nice day.

Chou Toshio · Sep 13, 2010

Rising_Dusk said:
I didn't specify anywhere a valid decision, rather that we must validate our decisions. We must have sufficient knowledge to claim we understand that a Pokemon should be tiered one way or the other. This is only achieved by letting a metagame settle over time.

My point is that there is no "should" in this, and there is no inherent way a pokemon "should" be valid. Also, a stable metagame can certainly be achieved by an initial ban list followed by subsequent bans should a pokemon suddenly go AWOL and find a set that happens to be ridiculous (à la Yache-Chomp).

Rising_Dusk said:
They're not horrific, but certainly tragic. Much of the singles competitive battling in the western world is based on Smogon tiering. If we wrongfully tier something because we didn't give the metagame time to settle, those repercussions will include both a major hit to our credibility as a competitive site and the unfortunate restriction of tiers for an audience potentially far greater than just the one that reads these forums. This leads directly into...

You missed my question: what are the repercussions? I certainly can't think of anything serious that could actually happen.

Also "wrongfully tier" is something you will also have to define to make your argument sound.

Rising_Dusk said:
We have a moral obligation, as the basis for singles tiers across a lot of the internet, to perform the testing properly and fairly. If we do not, we are harming more than just ourselves; we are harming other communities. That's where morals come into play. If we were just screwing ourselves up, we wouldn't be morally obligated to anyone else. (Other than ourselves, of course!) That's all I am saying here.

I fully understand the dependency of everyone on Smogon. That does not change the fact that no matter what process we pick, the ban list comes down to the same decision of human preferences. Whether we painfully test things, or rapidly ban abusers via council or open voting, we will arrive at a stable metagame. It is just a matter of whether you want to do it fast and efficiently or slowly and painfully.

If you want to discuss the "entirety of the western singles battling world" depending on us for a list, I would like to highlight that the same "entirety of the singles battling world" waited an entire generation for a list that never actually came. :/

When really, it should have been done in a month, maybe 2. I'll grant that 4-5 months is a fairly reasonable amount of time, but we have a "moral obligation" to make this list as quickly as possible.

We also have a great moral responsibility not to force this huge community of players depending on us into an unprecedented and certainly unpopular metagame style like the one Obi is suggesting.

This metagame exists for the players. Players have expectations based on tradition. I think those expectations must be respected for us to be respected.

DougJustDoug said:
Here's what I learned from tiering in Gen 4:

Tiering decisions in Smogon are almost entirely subjective, and every "test" will be driven mostly by bandwagon thinking and popularity contests.

Going forward into Gen 5, I think we should stop trying to ensure our tiering decisions are based on logic, specific competitive experience, or even ensuring that the decision is being made by the "best" or "most qualified" people. We should just accept that this is a big popularity contest, and is driven by factors like tradition, spectacle, information and playstyles from past generations, and the publicity of opinions espoused by popular people in the most visible forums and media. If tiering is a science -- then it is political science, at best. Future tiering processes should just acknowledge that this is a big subjective popular opinion issue, and dispense with all the stuff we did in Gen 4 to try and make it a scientific test or trial.

Doug is a genius. I wish I myself could have voiced it as well but this is the bottom line of this discussion.

edit: I also want to echo/agree with everything Hip just said as well! We should view our own users as positively as possible.

capefeather · Sep 13, 2010

I'm not going to name who I agree with or who I don't agree with because that would take way too long and this thread has split into several topics already.

I would first like to say that, for all my defending and agreeing with the general idea of what Jumpman and Aeolus were doing, I couldn't help but feel that a lot of miscommunications were involved in the implementation. Stage 3 was honestly really vague, so that when it finally came into being as a replacement of Stage 2, there wasn't much anyone could say about it. (I'd like to know what took the first couple of rounds so long, if it had to do with red tape effects on Smogon, because I suspect that these delays served to make the process even vaguer.) The general system proposed and built upon throughout this thread seems to be pretty sound, and I agree with it. It's the little disagreements that are the problem.

I would also like to post my agreement that we really have to break away from the cultural influences that have shaped tiering policy as much as we can. I'd go as far as to say that this is the single most important task to achieve. I'm really glad that this thread came up so that I could see what the current contributors really think about all this, instead of it being diluted by people pretending that they care, as it so happens on IRC. I'm glad that we can come to this thread and talk about what is fundamentally "good" vs what we want.

But let's be really honest about a few things here:

1. OU is popular because we made it the standard, and that's it. For all we know, the "best" metagame is Charmander line + Spheal line + Snorunt lines + all basics, or something crazy like that.

2. Diversity is not necessarily good. I've heard a LOT about how important team matchup is in this generation, and so I'm honestly quite puzzled as to why, in spite of this, we keep going on as if we really need more diversity. Of course we need diversity to a point, but we also need to balance that against dependence on team matchup.

3. Smogon's popularity does not ride solely on its tier list! Seriously, we should stop with this belief that masses of people will leave or join because of whether this or that Pokémon is Uber. Smogon is what it is because it strives to build and maintain a standard of excellence in playing Pokémon competitively. We have so many tools to prove this and carry it out OTHER THAN the tier list that I don't think that it's such a huge concern.

All this is why I'm inclined to support starting with no Pokémon bans. My posts in the other thread are based on the assumption that we don't want superbosses in our metagame, but with this post I'm questioning that assumption. One point that I don't think has been brought up is the team matchup factor in 2. above. If we start with no bans, we can let the absolute power level of OU rise naturally as more superbosses are added, so that such additions make up for the loss of inferior Pokémon to the nether regions of "UU", instead of fixing the power level and letting OU "bloat" with each passing generation. And again, whether people accept this or not is their decision (like it's SUPPOSED to be), and Smogon should advertise itself based on other credentials, not some politically-charged tiering process. Ironically, I'm echoing an old user's argument about Salamence here, that having a Pokémon in its own league isn't necessarily a bad thing. (Not that I dispute Salamence's ban.)

I always forget to say some things when writing big posts like this... Oh well, I'm done here for now.

EDIT: OH MAN remembering things 30 mins later

I think that, with our current processes, we focus too much on picking out the good instead of weeding out the bad. Yeah, the paragraph submissions aren't that demanding if you're proficient in English, but I think we can all agree that that is not good enough. I think that having Council members participate in a conversation was a good start. We should have a medium of communication that welcomes posts that aren't stupid, instead of the Salamence thread where anybody could say anything and no one would listen, or the Council where even being a Frontier Brain wasn't good enough. We're deciding to have a rating minimum for voting anyway, so perhaps something similar should apply to a forum thread or IRC channel, maybe even with the allowed talker list updated in real time as people fall off or get onto the requirements. This "place" could also be moderated just to kick out people giving invalid reasoning. I just think that someone like DarkLucario should be able to come somewhere, say "this Pokémon deals 40+% to most switch-ins", and be heard and responded to. Then, should we implement a specific "voting time", most of it will effectively have been done by then! I don't know how feasible such a mechanism would be, though. I just think that we judge each other too much sometimes.

I also think that having long testing periods of a month or more, at least at first, is extremely important. We really need to make sure that we're deciding things based on whether a ban will substantially improve Black and White (I felt that it needed to be emphasized...), and not based on preconceived notions from previous generations. We need time to get used to the fact that, whatever happens with the initial banlist, we're in it for the long haul, and that it's not in our place simply to "undo" what we've started.
