[META] The Limitations of the Tiering System, and a Proposal

jpw234 · Jun 30, 2014

Introduction

Hello all, jpw234 here with some meta-level thoughts about our tiering system/suspect process. Nothing more fun than talking about forum policy, right? Fear not, this is not an essay about the implementation details of the existing system (I think the mods actually do a pretty good job with the whole thing). Rather, the genesis of this essay was a concern with our current procedures of tiers and bans and their effects on our metagame.

I'm going to start by rehashing the need for a method of banning/changing the pool of OU viable pokemon - the threat of metagame staleness, which is what drove the creation of the suspecting and banning infrastructures we have in place. I'll then go on to identify the problems in our current tiering method and show why it fails to meet some of its core goals. Finally, I'll present what I believe to be the necessary components of any metagame-shifting procedure, and offer an initial proposal (which is not meant to be a definitive final product) of how we could better approach bans and tiering in the future.

Ready? Let's kick it off, then.

The Threat: Becoming Stale

It is initially necessary to justify the existence of bans/tiering (a discussion which has been done to death, but I would like to quickly revisit). After all, it would be possible for us to all just play Ubers (which until recently was the ban-free, anything goes format). The most common justification for tiering is based on enjoyment. Frankly, it can be boring to play nothing but Ubers, since the existence of massively powerful pokemon crowds out the ability to play anything but a small selection of top-tier threats. In short, the game gets stale and un-fun. By banning the egregiously overpowered pokemon to Ubers and creating the OU tier, we can remove this stifling effect and create a more diverse metagame. The suspect process allows players to identify pokemon (or, more recently, other game elements) in OU that contribute to a stale metagame and remove them, making OU "fresh" again and more playable.

This is the accepted narrative, and it's also a true one. But I think it's important to go past the subjective criteria and explain why a stale metagame is not only unfun, but problematic for competitive battlers. The simple truth is that OU will never be perfectly balanced. There will always be more powerful, and likely overpowered, elements of any metagame. But fortunately, the complexity of pokemon and the limited time players have to test things makes finding the overpowered elements of any particular metagame difficult. As players search through millions of possible sets and team combinations, we initially choose sub-optimal sets and use sub-optimal pokemon, which contributes to the diversity of a metagame. But, as time goes on and the metagame stays the same, it gets more and more figured out. There is what I term a "settling effect", where initially worse pokemon are exchanged for better, then worse sets changed for better, then worse EV spreads, etc. Eventually, as the metagame becomes more and more optimized, and the difference between good and great hinges on a couple of EVs rather than a couple of pokemon, the relative power differences between different pokemon become more and more pronounced. That is to say, it's possible that in metagame X pokemon A is broken. But in the initial stages of X, when players aren't using the best possible supporting cast for A, or the best moveset for A, or the best EVs for A, A doesn't seem that oppressive. As metagame X gets older and all of these "bests" are discovered, the latent potential of A is recognized, and it becomes more and more central to X, and makes X more and more stale. This "figuring-out" tendency makes fighting metagame staleness not just a subjective, "fun"-based necessity, but a competitive necessity.

The Current Approach

The current approach to combating metagame staleness is a simple tier list. I'm going to limit myself to a discussion of OU and Ubers here (since this is the OU forum and I'm only concerned with the upper bounds of the tier), but the process is effectively the same with the lower tiers and their respective BLx tiers. As it stands, at the beginning of a generation we have a list of "no-shit" pokemon that are unarguably far too strong for the metagame (Arceus will never be fine in OU). After a bit of time playing, we can add to this "no-shit" list with the use of quickbans. Finally, after we settle down, candidates for bans are identified and we hold suspect tests to decide if pokemon should be moved to Ubers. It's important to note that bans are essentially a one-way street. To my limited knowledge, a pokemon that was banned in a suspect test has never been unbanned (with the exception of Garchomp, but the original ban was due in large part to the belief that Sand Veil pushed it over the top, and the Evasion Clause ban was what saw it returned to OU).

Problems

The essential problem with the current system is that every ban represents a constriction of the metagame. Whenever we lose a pokemon to a suspect test, we don't get it back. While this fact is not a structural necessity of the current system (it is technically possible to unban pokemon), it is politically unviable to expect unbannings. Users invest lots of time and energy into banning a pokemon and precedent indicates that reversing such a decision is infeasible. In short, every ban is an irrevocable loss of creative potential in our metagame. Now, for the obvious quickbans that I mentioned above, this isn't really an issue, because the downsides of losing a pokemon like Arceus are far outweighed by the gain in diversity that is unlocked when such a ridiculously strong pokemon leaves the metagame. However, the crux of my argument is that by and large, the bans we make in suspect testing are not that sort of ban.

An argument long bandied about (typically by anti-Smogonites like Verlisify, which admittedly does not put me in great company) is that the banning process of Smogon leads to a "next one up" conundrum where the successively best pokemon in each metagame is removed. E.g., we ban A, but then B is the best pokemon, so we ban B, but then C is the best, so we ban C, etc. Of course, the fallacious weakness of this argument comes out when it is applied to all bans - some bans are in fact justified to improve diversity. But, at some point, I fear that Smogon does fall into the "next one up" problem where bans are not done because of some massive problem in the metagame, but simply as a way to solve the issue of metagame staleness.

As evidence for my suspicion, I looked to the timings of the Gen 5 suspect tests. Links to the suspect threads are in the hide tag:

Outside of the massive gap between Gen 5's 5th and 6th suspect test caused by a revamp of the process (which doesn't really count), there has been a maximum of 4 months between tests. And in fact, that one 4 month gap (between test 10 and 11) seems to be an outlier, as every other gap is less than 2.5 months, with most being between 1.5 and 2. Now, if we believed that each of these suspected pokemon was actually broken to the extent that a suspect test was definitely needed, we might be surprised that these suspect tests occur so regularly, since that implies that they are each about "the same level" of broken (as it took about the same amount of time to determine that they required a suspect). On the other hand, I think this data points to a different conclusion. I think it indicates that it takes, on average, about 1.5 to 2 months for the "settling effect" to kick in and a metagame to become stale, and that when the metagame becomes stale, players turn to their only available mechanism of changing the metagame - the suspect process. And so they clamor for the "next one up" - the best pokemon at the time - to be removed. I think this observation is additionally supported by the fact that the voting margins for suspects that occur earlier in Gen 5 are much more decisive than for the suspects introduced later.

Let me state the entire argument. Essentially, I argue that since no metagame is perfectly balanced, shaking up any metagame every so often (the data points to about every 2 months) is necessary to keep it healthy. The concern is that the only method to do so that is made available by our tiering system is with a ban, which is effectively permanent. Essentially, over time we are frivolously shrinking the pool of good OU pokemon just to keep the metagame fresh. In fact, I suspect the only thing that has previously saved us from doing this to an unsustainable extent is the fortunate fact that GameFreak reliably releases a new generation before we hang ourselves by overly restricting the metagame (and recalling the grumbling and divisiveness over the late Gen 5 suspects of Landorus-I and Keldeo, it may have been a close call).

What We Need

What the above argument seems to suggest is that we need a method of tiering that does not completely exclude entire pokemon. Regretfully, we are not DotA 2, so we cannot balance Mega-Kangaskhan by lowering its Attack stat. We must work with what GameFreak gives us, which means we can't change the characteristics of a pokemon itself. What's more, we've committed to not fiddling with a pokemon's movepool (e.g. "Mewtwo is allowed in OU without Psystrike"). It seems we will have to deal with entire pokemon.

In order to do a decent job of "shaking up the metagame", any system must deal with the top-tier pokemon in OU (as changing around bit players isn't changing much at all). To avoid the criticisms I've leveled of the current system, a proposal should either allow for flexibility in unbanning previously banned pokemon, or else provide some not-so-strict delineation between tiers. My proposal will rely on the latter.

A Proposal

My initial proposal (which is meant to serve mostly as a jumping-off point for additional discussion, rather than any sort of polished and complete blueprint) relies on the observation that our banning decisions are typically more clear-cut earlier in each generation, initially with quickbans and then with large margins in suspect voting, before settling down such that each potential ban is much more controversial. What this implies to me is that there is a set of clearly OP pokemon to which the "next one up" concern is not applicable, but the rest of the bans can probably be pinned more to the settling effect and the need to keep the metagame fresh than any overwhelming concern about brokenness.

This motivates a 2-tiered banning process that serves as the basis for my proposal.
First, in the beginnings of a new generation (some explicit timeframe would be established - I would suspect between 2-6 months after the gen is fully working on PS) we would play with only a barebones list of clearly uber pokemon. Then, analogous to current quickbans, we could identify clearly broken pokemon (e.g. M-Kanga, M-Blaze, M-Gengar, etc.) and remove them solidly to Ubers. This first tier of banned pokemon would be irrevocably banned, much like the status quo.
After this period was up, a pool of top-tier pokemon would be created. There are several ways this could be done - through usage statistics, voting (perhaps modelled on our current Viability Rankings), etc. This "Rotational Pool" (RP) would represent the "cream of the crop" of OU that, while strong (and potentially eventual targets for suspect tests in the old system), were not self-evidently broken. Then, we could set defined time periods (an initial suggestion would be 2 months based on the above data) where we rotationally banned, say, some 20% of this pool. We'd pick an initial 20% to ban and, after a 2 month period, unban them and ban a separate 20% of the pool. 20% is an arbitrary number (I suspect if it changed it would go higher, perhaps even much higher), but the effect of the structure is to consistently and predictably shake up the metagame without ever conclusively removing any pokemon from consideration.
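To make the rotation concrete, here is a rough sketch in Python. It assumes the simplest possible selection rule (walk through the pool in a fixed order and ban consecutive slices); the proposal itself leaves the selection rule open. The pool contents, the function name `rotation_schedule`, and its parameters are all illustrative, not part of any existing tool.

```python
from itertools import cycle, islice


def rotation_schedule(pool, ban_percent=20, rotations=5):
    """Return one ban list per rotation period.

    Assumption: bans are taken as consecutive slices of the pool,
    wrapping around once every pokemon has sat out one period.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    size = max(1, len(pool) * ban_percent // 100)
    chunks = [pool[i:i + size] for i in range(0, len(pool), size)]
    return list(islice(cycle(chunks), rotations))


# Hypothetical Rotational Pool, for illustration only.
RP = [
    "Aegislash", "Keldeo", "Kyurem-B", "Landorus", "Mega-Mawile",
    "Mega-Pinsir", "Talonflame", "Thundurus", "Mega-Venusaur", "Deoxys-D",
]

for period, banned in enumerate(rotation_schedule(RP, ban_percent=20, rotations=6), start=1):
    print(f"Period {period}: banned = {banned}")
```

With a 10-pokemon pool and a 20% ban rate, each pokemon sits out exactly one period in five, and the sixth period repeats the first. A real system would presumably want a less predictable or a voted selection rule; this only shows that the bookkeeping is simple.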

Advantages

The effect of the current system is to say when pokemon A is suspected, "Ah, it turns out A was broken in metagame X all along - well, off it goes". The proposed system recognizes that even after we get rid of very strong pokemon that are broken across most or all metagames (the first tier of bans), there will always be some pokemon broken in metagame X. And, rather than feebly attempting to always find it and ban it, we can instead say "Ah, we've discovered that A is broken in X. Well, on to metagame Y - maybe A isn't broken there." Rather than shutting off potential for creativity by banning a pokemon, this system shifts us into a new metagame and challenges us to be even more creative again.

Additionally, this system is much more effective in actually shaking up the metagame. In the current system, when we ban pokemon A from metagame X, we don't so much create metagame Y as we create metagame "X without A" (see, for some very clear examples, the ban of Landorus and proposed ban of Keldeo in Gen 5). Simply removing the top threat doesn't necessarily do a very good job of changing the rest of the metagame, as evidenced by the fact that we keep on having to remove the next top threat. A Rotational Pool strategy would be more likely to create a fully new metagame on each change, which is the desired effect.

Finally, the proposed system is far more predictable and transparent, which is nice for competitive battlers. If there is a clear signal that "Pokemon A-E will be banned for the next 2 months, then it will be F-L, then M-Q, etc.", there is potential for testing ahead of time, clear boundaries to the metagames, etc. The current system is uncertain (since we don't know how votes will turn out) and somewhat unpredictable (since we don't know when suspect tests will be announced). This is not the largest of concerns, but it would be a nice extra benefit.

Disadvantages

This is by no means a perfect proposal; there are many problems with it. First, the Rotational Pool will have to be updated, and this would be a pain. It is possible that some pokemon in the RP need to become fully Uber. Does this defeat the purpose of the system? What about pokemon that aren't initially in the RP but become popular? How is their inclusion managed? Each variable in the proposal would need to be tuned (length of the metagames, % of pokemon in the RP banned, etc). How would we choose what pokemon in the RP get banned in each rotation? There are many implementation concerns.

What's more, would it even be worth the trouble? The most clear-cut benefit of a tier list is that it's simple. Anybody new to pokemon can look at a tier list and understand what pokemon are usable in what tiers. An RP system would be much more complicated.

What if some metagames are simply terrible? It could be that an unfortunate combination of bans in the RP makes it so that one or two pokemon are simply unstoppable. Could a metagame be shut down if this was the case? To what extent would that defeat the purpose?

Conclusion

I don't have a definitive answer to the problem of tiering at Smogon. However, I do have a very real concern that the existing system is majorly holding back the potential of competitive Pokemon. Please use the discussion on this thread to offer new suggestions or defenses of the current system. I don't expect anything to be implemented immediately (or really any changes to be made at all, given the entrenched-ness of the current system), but I do think intellectual stagnation is a problem and more discussion is always a good thing, so please chime in.

doughboy · Jun 30, 2014

Sorry to say jpw, but I think the premise of your argument to change the tiering system is misguided.

jpw234 said:
As metagame X gets older and all of these "bests" are discovered, the latent potential of A is recognized, and it becomes more and more central to X, and makes X more and more stale. This "figuring-out" tendency makes fighting metagame staleness not just a subjective, "fun"-based necessity, but a competitive necessity.

This simply isn't true. If this were the case, then why are ADV OU and DPP OU metagames considered some of the best around by many players in the highest level of play, the tournament circle? The GSC metagame has not changed in a decade, yet some still consider it more fun than BW and XY, whose metagames are still evolving. These older metagames are competitive, without rotating bans, and are arguably much more competitive than the ones still evolving (BW and XY).

The false assumption you made is that players find enjoyment in a metagame for depth of strategy. You extended that idea on to the tiering process, whose purpose you assumed to be to artificially increase the depth of strategy by allowing for "more options" or the "discovery of new options". You show your thought process here:

jpw234 said:
We'd pick an initial 20% to ban and, after a 2 month period, unban them and ban a separate 20% of the pool. 20% is an arbitrary number (I suspect if it changed it would go higher, perhaps even much higher), but the effect of the structure is to consistently and predictably shake up the metagame without ever conclusively removing any pokemon from consideration.

The people, at least within this community, find the fun in winning the game. "Broken pokemon" are not removed from the metagame because they limit the amount of options people have, but because they remove the chances or ability of victory from the hands of a player. It just so happens that when you ban an uncompetitive element of the game (e.g. Arceus, Mega-Kangaskhan, etc.) people are forced to move on from the best strategy that gives them the win to the ones that are less obvious to find or more difficult to pull off. The past generations have less depth in strategy simply because they have a smaller pool of pokemon, items, moves, and hell even stat distribution. Those past generations though are not uncompetitive nor stale, but your argument suggests otherwise.

Competitiveness is the goal; diversity just happens to be a symptom of aiming for that goal. It isn't the other way around (i.e. diversity leads to competitiveness)!
__________________________________

That being said, I agree with you that the tiering system now is not optimal considering the current state of XY. Your solution, however, is misguided and would not be a fix to our problems. Our problem is that there is simply a large pool of pokemon / strategies that only have a handful, if any, ways to play against. This just stems from the amount of pokemon growing and GF going overboard with the power creep. IMO, UU's system of liberal bans and dropping pokemon into OU would have allowed a healthy OU metagame to develop faster and get the verdict on troublesome pokemon more quickly.

jpw234 · Jun 30, 2014

doughboy, thanks for the contribution.

My suspicion is that you're extrapolating our experiences with generations 4 and previous a bit too heavily. Your observation that DPP OU (which is the earliest I was around for) was a fantastic metagame with few bans (for those who weren't around, the tier list is here and to my memory the only bans voted on were Garchomp, Salamence, Manaphy, Lati@s, Deo forms and Wobbuffet/Wynaut) is true; DPP OU was (and still is) a great metagame. But I think we were fortunate to have a set of pokemon that weren't nearly as ridiculous as the power creep we're presented with today. I think the only close bans in DPP OU were Chomp and Mence.

When we look at what happened last generation (and I think two things are generally agreed upon - last gen was the true bringer of the power creep, and last gen's tiering was an absolute shitshow) I think we need to recognize that what GameFreak has given us is qualitatively different. Introducing the musketeers and the genies alone (the second set with two forms, no less) instantly gave us 10 very viable OU pokemon and by my count 7 (all but Virizion/Cobalion/Tornadus-I) top-tier OU pokemon. Gen 6 brings mega evolutions and a couple more top threats. Basically, the number of truly formidable and possibly suspectable pokemon has been much higher in the 5th and 6th gens than in any previous generation.

I think the critical portion of your comment was here:

"Broken pokemon" are not removed from the metagame because they limit the amount of options people have, but because they remove the chances or ability of victory from the hands of a player. It just so happens that when you ban an uncompetitive element of the game (eg. Arcues, Mega-Kangaskhan, etc.) people are forced to move on from the best strategy that gives them the win to the ones that are less obvious to find or more difficult to pull off.

This was true at one point. I don't think we can say this has been true in Gen 5 or Gen 6. This was my argument about the two tiers of banned pokemon.
This is subjective and everybody would set a cutoff differently, but in Gen 5 I think it's reasonable to say the first portion of bans - Skymin, DrizzleSwim, Manaphy, Blaziken, and possibly Excadrill would be my list - were the type of bans that you talked about: bans on pokemon that "remove the chances or ability of victory from the hands of a player". I would be hard-pressed to believe any argument that the later suspects of Deo-D, Genesect, Tornadus-T, Landorus-I and Keldeo were necessary to salvage an utterly broken metagame. On the contrary, there were thriving metagames with all of these pokemon, and I think it would be ludicrous to say that any of them were self-evidently broken in the style of Speed Boost Blaziken or perma-flinch Skymin.

When I talked about the settling effect and the competitive necessity of shaking up the meta, it was in reference to these tests in particular (which should have been made more clear in the OP). I think your post fails to take into account the specifics of the testing that was done in Gen 5 and I fear will soon be done in Gen 6. I think that people felt that Genesect was broken because the metagame was allowed to become figured out to the extent that Genesect's qualities became overwhelming. I think the same was true for Torn-T, Lando-I and Keldeo. I don't think these pokemon were banned because they made competitive battling impossible or extraordinarily difficult, but rather because the meta settled around them as the "next best pokemon" and they became stale.

In Gen 4 it was possible to ban all of the pokemon that were uncompetitive at all simply because there weren't that many. We could comfortably ban Wobb, Chomp, Mence and Manaphy and have fun with the remaining set, aware that none were likely to become oppressively dominant. Any experienced OU battler can look at the DPP OU tier list and be close to 100% confident that nothing on there is broken at all (seriously, though, people should look, nothing seems even potentially out of control). To do the same in Gen 6 would be much more difficult. I think we've already passed the first tier of "obviously broken mons" with Mega-Kanga, Mega-Gengar, Blaziken and Mega-Lucario. We're starting on the tier 2 stuff already with the Deo forms. Just looking at the current list, I can foresee legitimate suspect tests for Aegislash, both Charizards, both Deo forms (already), Keldeo, Kyu-B, Landorus, Mega-Mawile, Mega-Pinsir, Talonflame, Thundy-I and Mega-Venusaur without thinking too hard. I think we'd eventually clear out upwards of 15 pokemon to attain the same kind of effect we enjoyed in Gen 4. The question becomes whether that sort of clear out is worth the benefits. I think that something like the RP proposal would do a better job keeping the metagame competitive without running roughshod through the large selection of devastating threats that GameFreak has given us.

Ash Borer · Jul 22, 2014

I'll make a longer response to this perhaps at some point, but your essay does not address political inertia in Smogon. Your argument is based on the premise that suspect tests take place right when the community decides or when it's actually apparent something is broken, and not just when the OU council gets around to it.

Another thing I think is inconsistent in your essay is that you state that banning only shrinks the pool of OU Pokemon, but it can make a lot of lower tier threats much more viable. Certainly Medicham, Alakazam, and Gardevoir will flourish in post-Aegis OU.

Finally, I do think that current tiering is partially suffering from the idea of simply knocking off the next best Pokemon every time, but at the same time we have old metagames that have gone through that and exhausted this process to the point of near perfect balance. ADV OU really has no Pokemon on the cusp of suspecting not because it's old but because it truly doesn't need it. Who is to say playing the regicide game with XY OU won't produce something similar? I just wish it would happen faster.

But, regardless of these points I do agree that there are definitely Pokemon that can be revisited and should at some point.

Edit by Haunter: if you don't know the facts, don't make ignorant statements.
