Introduction
Hello all, jpw234 here with some meta-level thoughts about our tiering system/suspect process. Nothing more fun than talking about forum policy, right? Fear not, this is not an essay about the implementation details of the existing system (I think the mods actually do a pretty good job with the whole thing). Rather, the genesis of this essay was a concern with our current procedures of tiers and bans and their effects on our metagame.
I'm going to start by rehashing the need for a method of banning/changing the pool of OU viable pokemon - the threat of metagame staleness, which is what drove the creation of the suspecting and banning infrastructures we have in place. I'll then go on to identify the problems in our current tiering method and show why it fails to meet some of its core goals. Finally, I'll present what I believe to be the necessary components of any metagame-shifting procedure, and offer an initial proposal (which is not meant to be a definitive final product) of how we could better approach bans and tiering in the future.
Ready? Let's kick it off, then.
The Threat: Becoming Stale
It is initially necessary to justify the existence of bans/tiering (a discussion which has been done to death, but I would like to quickly revisit). After all, it would be possible for us to all just play Ubers (which until recently was the ban-free, anything goes format). The most common justification for tiering is based on enjoyment. Frankly, it can be boring to play nothing but Ubers, since the existence of massively powerful pokemon crowds out the ability to play anything but a small selection of top-tier threats. In short, the game gets stale and un-fun. By banning the egregiously overpowered pokemon to Ubers and creating the OU tier, we can remove this stifling effect and create a more diverse metagame. The suspect process allows players to identify pokemon (or, more recently, other game elements) in OU that contribute to a stale metagame and remove them, making OU "fresh" again and more playable.
This is the accepted narrative, and it's also a true one. But I think it's important to go past the subjective criteria and explain why a stale metagame is not only unfun, but problematic for competitive battlers. The simple truth is that OU will never be perfectly balanced. There will always be more powerful, and likely overpowered, elements of any metagame. But fortunately, the complexity of pokemon and the limited time players have to test things makes finding the overpowered elements of any particular metagame difficult. As players search through millions of possible sets and team combinations, we initially choose sub-optimal sets and use sub-optimal pokemon, which contributes to the diversity of a metagame. But, as time goes on and the metagame stays the same, it gets more and more figured out. There is what I term a "settling effect", where initially worse pokemon are exchanged for better, then worse sets changed for better, then worse EV spreads, etc. Eventually, as the metagame becomes more and more optimized, and the difference between good and great hinges on a couple of EVs rather than a couple of pokemon, the relative power differences between different pokemon become more and more pronounced. That is to say, it's possible that in metagame X pokemon A is broken. But in the initial stages of X, when players aren't using the best possible supporting cast for A, or the best moveset for A, or the best EVs for A, A doesn't seem that oppressive. As metagame X gets older and all of these "bests" are discovered, the latent potential of A is recognized, and it becomes more and more central to X, and makes X more and more stale. This "figuring-out" tendency makes fighting metagame staleness not just a subjective, "fun"-based necessity, but a competitive necessity.
The Current Approach
The current approach to combating metagame staleness is a simple tier list. I'm going to limit myself to a discussion of OU and Ubers here (since this is the OU forum and I'm only concerned with the upper bounds of the tier), but the process is effectively the same with the lower tiers and their respective BLx tiers. As it stands, at the beginning of a generation we have a list of "no-shit" pokemon that are unarguably far too strong for the metagame (Arceus will never be fine in OU). After a bit of time playing, we can add to this "no-shit" list with the use of quickbans. Finally, after we settle down, candidates for bans are identified and we hold suspect tests to decide if pokemon should be moved to Ubers. It's important to note that bans are essentially a one-way street. To my limited knowledge, a pokemon that was banned in a suspect test has never been unbanned (with the exception of Garchomp, but the original ban was due in large part to the belief that Sand Veil pushed it over the top, and the Evasion Clause ban was what saw it returned to OU).
Problems
The essential problem with the current system is that every ban represents a constriction of the metagame. Whenever we lose a pokemon to a suspect test, we don't get it back. While this fact is not a structural necessity of the current system (it is technically possible to unban pokemon), it is politically unviable to expect unbannings. Users invest lots of time and energy into banning a pokemon and precedent indicates that reversing such a decision is infeasible. In short, every ban is an irrevocable loss of creative potential in our metagame. Now, for the obvious quickbans that I mentioned above, this isn't really an issue, because the downsides of losing a pokemon like Arceus are far outweighed by the gain in diversity that is unlocked when such a ridiculously strong pokemon leaves the metagame. However, the crux of my argument is that by and large, the bans we make in suspect testing are not that sort of ban.
An argument long bandied about (typically by anti-Smogonites like Verlisify, which admittedly does not put me in great company) is that the banning process of Smogon leads to a "next one up" conundrum where the successively best pokemon in each metagame is removed. E.g., we ban A, but then B is the best pokemon, so we ban B, but then C is the best, so we ban C, etc. Of course, the weakness of this argument comes out when it is applied to all bans - some bans are in fact justified to improve diversity. But, at some point, I fear that Smogon does fall into the "next one up" problem where bans are not done because of some massive problem in the metagame, but simply as a way to solve the issue of metagame staleness.
As evidence for my suspicion, I looked to the timings of the Gen 5 suspect tests. Links to the suspect threads are in the hide tag:
Gen 5 suspect test times:
1: 12/1/2010, http://www.smogon.com/forums/threads/np-ou-suspect-testing-round-1-wait-im-not-jumpman16.83031/
2: 1/3/2011, http://www.smogon.com/forums/thread...ng-round-2-who-am-i-to-break-tradition.84513/
3: 2/12/2011, http://www.smogon.com/forums/thread...-3-so-long-and-thanks-for-all-the-fish.86062/
4: 4/24/2011, http://www.smogon.com/forums/threads/np-ou-suspect-testing-round-4-blaze-of-glory.3446403/
5: 6/15/2011, http://www.smogon.com/forums/thread...sandstorm-excadrill-thundurus-banned.3449630/
New suspect process caused delay
6: 9/12/2012, http://www.smogon.com/forums/threads/np-bw-ou-suspect-testing-round-6-enter-sandman.3472253/
7: 10/17/2012, http://www.smogon.com/forums/threads/np-bw-ou-suspect-testing-round-7-ice-ice-baby.3473636/
8: 11/13/2012, http://www.smogon.com/forums/thread...sting-round-8-mr-roboto-see-post-240.3474686/
9: 12/25/2012, http://www.smogon.com/forums/thread...ng-round-9-rock-you-like-a-hurricane.3476523/
10: 2/4/2013, http://www.smogon.com/forums/threads/np-bw-ou-suspect-testing-round-10-hazard.3478726/
11: 6/14/2013, http://www.smogon.com/forums/thread...nie-in-a-bottle-landorus-is-now-uber.3484919/
12: 8/14/2013, http://www.smogon.com/forums/thread...ays-i-wanna-be-with-you-see-post-263.3487169/
Gen 6:
1: 1/26/2014, http://www.smogon.com/forums/thread...n-extreme-speed-demon-read-post-1278.3498655/
2: 3/28/2014, http://www.smogon.com/forums/thread...-round-2-voter-identification-thread.3503177/ (THIS ONE IS THE ID THREAD, CAN’T FIND THE SUSPECT TESTING THREAD)
3: 5/25/2014, http://www.smogon.com/forums/threads/xy-ou-suspect-process-round-3-baton-pass-read-post-590.3507765/
4: 6/21/2014, http://www.smogon.com/forums/thread...g-round-4-alienation-of-the-wretched.3509824/
Outside of the massive gap between Gen 5's 5th and 6th suspect test caused by a revamp of the process (which doesn't really count), there has been a maximum of roughly 4 months between tests. And in fact, that one 4 month gap (between tests 10 and 11) seems to be an outlier, as every other gap is less than 2.5 months, with most falling between 1 and 2. Now, if we believed that each of these suspected pokemon was actually broken to the extent that a suspect test was definitely needed, we might be surprised that these suspect tests occur so regularly, since that implies that they are each about "the same level" of broken (as it took about the same amount of time to determine that they required a suspect). On the other hand, I think this data points to a different conclusion. I think it indicates that it takes, on average, about 1.5 to 2 months for the "settling effect" to kick in and a metagame to become stale, and that when the metagame becomes stale, players turn to their only available mechanism of changing the metagame - the suspect process. And so they clamor for the "next one up" - the best pokemon at the time - to be removed. I think this observation is additionally supported by the fact that the voting margins for suspects that occur earlier in Gen 5 are much more decisive than for the suspects introduced later.
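For anyone who wants to check the spacing themselves, here is a rough Python sketch that recomputes the gaps from the dates in the list above. It assumes the dates are month/day/year and that each date marks the start of a test. Treating the generation change as a break (alongside the process revamp), the grouping names, and the 30.44-day average month are my own assumptions for illustration.

```python
from datetime import date
from statistics import mean

# Suspect test dates from the list above, read as month/day/year.
# Split at the process revamp and at the generation change so that
# neither of those two long breaks is counted as a gap (an assumption).
GEN5_BEFORE_REVAMP = [
    date(2010, 12, 1), date(2011, 1, 3), date(2011, 2, 12),
    date(2011, 4, 24), date(2011, 6, 15),
]
GEN5_AFTER_REVAMP = [
    date(2012, 9, 12), date(2012, 10, 17), date(2012, 11, 13),
    date(2012, 12, 25), date(2013, 2, 4), date(2013, 6, 14),
    date(2013, 8, 14),
]
GEN6 = [
    date(2014, 1, 26), date(2014, 3, 28), date(2014, 5, 25),
    date(2014, 6, 21),
]

DAYS_PER_MONTH = 30.44  # average month length, for rough conversion only


def gaps_in_days(dates):
    """Days between each consecutive pair of dates."""
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


all_gaps = (
    gaps_in_days(GEN5_BEFORE_REVAMP)
    + gaps_in_days(GEN5_AFTER_REVAMP)
    + gaps_in_days(GEN6)
)

for gap in all_gaps:
    print(f"{gap:4d} days  (~{gap / DAYS_PER_MONTH:.1f} months)")

print(f"longest gap: {max(all_gaps)} days")
print(f"mean gap:    {mean(all_gaps):.1f} days")
```

On these dates the longest gap is the 130 days between rounds 10 and 11, and the mean works out to a little over 52 days, or roughly 1.7 months.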
Let me state the entire argument. Essentially, I argue that since no metagame is perfectly balanced, shaking up any metagame every so often (the data points to about every 2 months) is necessary to keep it healthy. The concern is that the only method to do so that is made available by our tiering system is with a ban, which is effectively permanent. In effect, over time we are frivolously shrinking the pool of good OU pokemon just to keep the metagame fresh. In fact, I suspect the only thing that has previously saved us from doing this to an unsustainable extent is the fortunate fact that GameFreak reliably releases a new generation before we hang ourselves by overly restricting the metagame (and recalling the grumbling and divisiveness over the late Gen 5 suspects of Landorus-I and Keldeo, it may have been a close call).
What We Need
What the above argument seems to suggest is that we need a method of tiering that does not completely exclude entire pokemon. Regrettably, we are not DotA 2, so we cannot balance Mega-Kangaskhan by lowering its Attack stat. We must work with what GameFreak gives us, which means we can't change the characteristics of a pokemon itself. What's more, we've committed to not fiddling with a pokemon's movepool (e.g. "Mewtwo is allowed in OU without Psystrike"). It seems we will have to deal with entire pokemon.
In order to do a decent job of "shaking up the metagame", any system must deal with the top-tier pokemon in OU (as changing around bit players isn't changing much at all). To avoid the criticisms I've leveled of the current system, a proposal should either allow for flexibility in unbanning previously banned pokemon, or else provide some not-so-strict delineation between tiers. My proposal will rely on the latter.
A Proposal
My initial proposal (which is meant to serve mostly as a jumping-off point for additional discussion, rather than any sort of polished and complete blueprint) relies on the observation that our banning decisions are typically more clear-cut earlier in each generation, initially with quickbans and then with large margins in suspect voting, before settling down such that each potential ban is much more controversial. What this implies to me is that there is a set of clearly OP pokemon to which the "next one up" concern is not applicable, but the rest of the bans can probably be pinned more to the settling effect and the need to keep the metagame fresh than any overwhelming concern about brokenness.
This motivates a 2-tiered banning process that serves as the basis for my proposal.
First, at the beginning of a new generation (some explicit timeframe would be established - I would suspect between 2-6 months after the gen is fully working on PS) we would play with only a barebones list of clearly uber pokemon. Then, analogous to current quickbans, we could identify clearly broken pokemon (e.g. M-Kanga, M-Blaze, M-Gengar, etc.) and remove them solidly to Ubers. This first tier of banned pokemon would be irrevocably banned, much like the status quo.
After this period was up, a pool of top-tier pokemon would be created. There are several ways this could be done - through usage statistics, voting (perhaps modelled on our current Viability Rankings), etc. This "Rotational Pool" (RP) would represent the "cream of the crop" of OU that, while strong (and potentially eventual targets for suspect tests in the old system), were not self-evidently broken. Then, we could set defined time periods (an initial suggestion would be 2 months based on the above data) where we rotationally banned, say, some 20% of this pool. We'd pick an initial 20% to ban and, after a 2 month period, unban them and ban a separate 20% of the pool. 20% is an arbitrary number (I suspect if it changed it would go higher, perhaps even much higher), but the effect of the structure is to consistently and predictably shake up the metagame without ever conclusively removing any pokemon from consideration.
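To make the rotation concrete, here is a minimal Python sketch of one way the schedule could be generated. Everything in it is an assumption for illustration: the placeholder names A through J, the function names, and especially the choice to ban consecutive slices of the pool in order. How each rotation's bans actually get picked is an open question.

```python
def rotation_schedule(pool, percent_banned=20):
    """Split the Rotational Pool into consecutive ban groups.

    Each group holds roughly `percent_banned` percent of the pool
    (rounded up), and every pool member appears in exactly one group.
    """
    if not 0 < percent_banned <= 100:
        raise ValueError("percent_banned must be between 1 and 100")
    if not pool:
        return []
    # Integer ceiling division, so a small pool still bans at least one.
    group_size = max(1, (len(pool) * percent_banned + 99) // 100)
    return [pool[i:i + group_size] for i in range(0, len(pool), group_size)]


def legal_pool(pool, banned_group):
    """Pool members that remain usable while `banned_group` is banned."""
    banned = set(banned_group)
    return [member for member in pool if member not in banned]


# Hypothetical ten-member pool using placeholder names.
pool = list("ABCDEFGHIJ")
schedule = rotation_schedule(pool, percent_banned=20)

for period, banned_group in enumerate(schedule, start=1):
    usable = legal_pool(pool, banned_group)
    print(f"period {period}: banned {banned_group}, usable {usable}")
```

With ten pool members and 20% banned at a time, this gives five two-month periods before the cycle repeats. Shuffling the pool between cycles, or banning by vote, would slot in by changing only how the groups are built.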
Advantages
The effect of the current system is to say when pokemon A is suspected, "Ah, it turns out A was broken in metagame X all along - well, off it goes". The proposed system recognizes that even after we get rid of very strong pokemon that are broken across most or all metagames (the first tier of bans), there will always be some pokemon broken in metagame X. And, rather than feebly attempting to always find it and ban it, we can instead say "Ah, we've discovered that A is broken in X. Well, on to metagame Y - maybe A isn't broken there." Rather than shutting off potential for creativity by banning a pokemon, this system shifts us into a new metagame and challenges us to be even more creative again.
Additionally, this system is much more effective in actually shaking up the metagame. In the current system, when we ban pokemon A from metagame X, we don't so much create metagame Y as we create metagame "X without A" (see, for some very clear examples, the ban of Landorus and proposed ban of Keldeo in Gen 5). Simply removing the top threat doesn't necessarily do a very good job of changing the rest of the metagame, as evidenced by the fact that we keep on having to remove the next top threat. A Rotational Pool strategy would be more likely to create a fully new metagame on each change, which is the desired effect.
Finally, the proposed system is far more predictable and transparent, which is nice for competitive battlers. If there is a clear signal that "Pokemon A-E will be banned for the next 2 months, then it will be F-L, then M-Q, etc.", there is potential for testing ahead of time, clear boundaries to the metagames, etc. The current system is uncertain (since we don't know how votes will turn out) and somewhat unpredictable (since we don't know when suspect tests will be announced). This is not the largest of concerns, but it would be a nice extra benefit.
Disadvantages
This is by no means a perfect proposal; there are many problems with it. First, the Rotational Pool will have to be updated, and this would be a pain. It is possible that some pokemon in the RP would need to become fully Uber; does this defeat the purpose of the system? What about pokemon that aren't initially in the RP but become popular? How is their inclusion managed? Each variable in the proposal would need to be tuned (length of the metagames, % of pokemon in the RP banned, etc). How would we choose what pokemon in the RP get banned in each rotation? There are many implementation concerns.
What's more, would it even be worth the trouble? The most clear-cut benefit of a tier list is that it's simple. Anybody new to pokemon can look at a tier list and understand what pokemon are usable in what tiers; an RP system would be much more complicated.
What if some metagames are simply terrible? It could be that an unfortunate combination of bans in the RP makes it so that one or two pokemon are simply unstoppable. Could a metagame be shut down if this were the case? To what extent would that defeat the purpose?
Conclusion
I don't have a definitive answer to the problem of tiering at Smogon. However, I do have a very real concern that the existing system is majorly holding back the potential of competitive Pokemon. Please use the discussion on this thread to offer new suggestions or defenses of the current system. I don't expect anything to be implemented immediately (or really any changes to be made at all, given the entrenched-ness of the current system), but I do think intellectual stagnation is a problem and more discussion is always a good thing, so please chime in.