[OUTDATED] OU's tiering policy framework


Aldaron

In all OU debates, be sure to use the following framework when debating points and proposing actions.

Posts will be monitored and moderated with the expectation that everyone is following the framework.

Disagreements with the framework will NOT be part of specific policy threads; they will occur in threads about the framework itself.

We are looking to keep specific policy threads on point as much as possible and separating the framework / basis for how you argue and debate from the actual argument / debate will help us in this effort.

Be sure to take the assumptions (first section) as stated and use the way we have defined uncompetitive, broken, and unhealthy to show how a specific suspect element reduces a component of skill (second section) while keeping our goals (third section) in mind.

For some added context, we on the OU Tiering Council and Policy Review in general noticed a few pressing concerns.

First, people were throwing around words like "uncompetitive" and "broken" and "unhealthy" but using different definitions when debating. This caused a lot of confusion and many times completely derailed topics.

Second, people were applying their personal philosophies on how we should tier in these debates. While it is fine to have your own thoughts on how tiering should work, if the debate ends up having numerous tiering philosophies conflicting with each other, the topic will once again be derailed. Therefore, we decided to separate the tiering philosophy (the how and why) from the actual debates (the what). If you disagree with how we tier, bring it up in framework topics and not the actual tiering topics.

Third, we needed to update the community to show our assumptions and goals for tiering. Doug wrote the characteristics of a desirable metagame and the offensive / defensive / support clauses for suspecting elements in gen 4, and while we borrowed from that, we felt it was necessary to update what the community looks to as a tiering basis.

This version of the framework will hopefully guide all OU tiering policy debaters and narrow and focus topics to more beneficial terms.

-----------------------------------------------------------------------------------------------------------------------------

Assumptions in Tiering Policy:

I.) We play, to the best of our simulator's capabilities, with the mechanics given to us on the cartridge.
A.) The ONLY exception to this is Sleep Clause.
B.) Suggestions to "remove critical hits" or "make Baton Pass fail in battle" are not valid tiering solution proposals.
II.) We cater to both ladder players (the higher end of the ladder) and tournament players.
A.) The majority of our accepted "elitism skill" is concentrated in tournaments, but the overwhelming majority of our battles occur on ladder.
B.) For actions to be taken in tiering policy, it is important to show how that action affects BOTH the ladder scene and the tournament scene.
C.) Stats for both will be highly emphasized but not a sole determining factor.
III.) Providing justification is the onus of the side changing the status quo.
A.) It is important to note that the status quo can be changed in the case of releases. This is the situation with Hoopa-Unbound, where it started directly in OU unlike other 680 BST legendaries which start as Ubers and then potentially get suspected to drop to OU.
B.) If a proposal is made to ban a Pokemon, Ability, Item, or Move, the side suggesting this ban must demonstrate why it is necessary and how it affects the ladder and the tournament scene, and provide evidence for both.
C.) If a proposal is made to unban a Pokemon, Ability, Item, or Move, the side suggesting this unban must demonstrate why it is necessary and how it affects the ladder and the tournament scene, and provide evidence for both.
D.) Complex ban proposals must provide additional information on why the simpler bans are not sufficient.
IV.) Probability management is a part of the game.
A.) This means we have to accept that moves have secondary effects, that moves can miss, that moves can critical hit, and that managing all these potential probability points is a part of skill.
B.) This does NOT mean that we will accept every probability factor introduced to the game. Evasion, OHKO, and Moody all affected the outcome "too much" and we removed them.
C.) "Too much" is if a particular factor has the more skilled player at a disadvantage a considerable amount of the time against a less skilled player, regardless of what he does. In relation to the latter part, "too much" also refers to factors that nearly completely take a game out of the player's hands and turn the PRIMARY point of the game into waiting for the RNG.
1.) OHKO moves are an example of the "too much" portion. With a 30% success rate, the other player will be put at an immediate disadvantage by the OHKO move user a considerable amount of the time no matter what he does.
2.) Moody and SwagPlay are examples of the "taking the game out of a player's hands" portion. Both turn the PRIMARY point of the game into waiting to see what the RNG spits out.
V.) Team match up management is a part of the game.
A.) This means we have to accept that we will be at an advantage or disadvantage from the very beginning.
B.) This does NOT mean we will accept a component that the majority of the time will turn the battle against the more skilled player. This component must both be an issue a majority of the time AND influence the battle dramatically.
C.) With optimal team building skills, the pool of options (Pokemon, Moves, Items) present in the tier should allow you to build teams addressing the different team archetypes at least decently, and offer a solution in-battle to a large majority of the principal threats of the metagame.
D.) There is also an important point to note in that team match up is only an issue if there is an extraordinarily low chance to win from the get go.
1.) This means that, even if the better skilled player made the right plays, he lost.
2.) Team match up is only a concern if no matter what the better player did, he had zero or an extremely slim chance of winning.
3.) Basically, for tiering debate purposes, even if the better player had a team disadvantage and made the better moves the majority of the game, did he screw up a turn or two? If he did, then yes, part of the reason he lost was the team match up, but a major factor was also the poor decision.
VI.) Even though assumptions I., IV., and V. limit us, we will, within those limitations, work to maximize the concept of "player skill" determining the result of a match the majority of the time.
A.) Skill is defined in more depth in the next section.
B.) The majority of our potential suspect discussion will center around the defined versions of uncompetitive, broken, and unhealthy and how a particular suspect element lowers some component of player skill within those 3 constructs.
C.) Any of the sub-sections in skill can be emphasized for a potential suspect.
1.) If Shadow Tag reduces the battling skill component too much via removing smart switching and reducing the ability to assess risk, these should be mentioned when stating Shadow Tag is uncompetitive, broken, or unhealthy.
2.) If Mega-Sableye is uncompetitive, broken, or unhealthy, point out how it reduces player skill from being the major determining factor in a match and which component of skill it drastically takes away from.​
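To put a number on the OHKO example in IV.C.1 above, here is a rough sketch of how quickly a 30% move adds up over repeated uses. It assumes every attempt is independent and lands with the same fixed 30% chance, and it ignores immunities, Sturdy, level differences, and anything else that would change the odds in a real battle. The function name is purely illustrative.

```python
def chance_at_least_one_hit(accuracy: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent uses lands.

    Assumption: each use succeeds with the same probability `accuracy`
    and the uses do not influence each other.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # The complement of "every single attempt misses".
    return 1.0 - (1.0 - accuracy) ** attempts


if __name__ == "__main__":
    # 30% is the success rate quoted in IV.C.1.
    for attempts in range(1, 5):
        chance = chance_at_least_one_hit(0.30, attempts)
        print(f"{attempts} attempt(s): {chance:.1%}")
```

Under these assumptions the user is already slightly better than a coin flip to land one after two attempts (51%), and about 76% after four, which is the sense in which the opponent is at a disadvantage "a considerable amount of the time" regardless of how he plays.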
---------------------------------------------------------------------------------------------------------------------

For what it is worth, we lay out skill in various sub-sections before defining uncompetitive, broken, and unhealthy because all three "buzzwords" have to be used within the context of the suspect element reducing some component of player skill. Our tiering goal, stated later in more detail, is to create a game where the better player wins the majority of the time.

This means that, in suspect debates, we need to show why and how a suspect element is uncompetitive, broken, and/or unhealthy within the context of reducing the effect skill has on the outcome of a battle. Specifically point out which component or components of skill are being affected and how and to what extent.

---------------------------------------------------------------------------------------------------------------------

Definitions for Tiering Policy:

I.) Skill - the subjective metric we use to judge player worth in competitive Pokemon
A.) Team Building Skill - the part of skill that is involved in the preparation for a battle
1.) Assessing threats - ability to recognize major threats in the metagame and identify how they both individually and in tandem deal with your team
a.) Involves having metagame knowledge through playing and observing
b.) Involves the ability to think beyond individual Pokemon threats and into the realm of threatening strategies and concepts​
2.) Dealing with threats - ability to maximize the 6 Pokemon slots, 24 move slots, and 6 item slots to handle metagame threats
a.) Ability to recognize which slots are not serving maximum utility
b.) Ability to replace low efficiency slots with higher efficiency options​
3.) Building Towards a Strategy (or strategies) - ability to build a team that is "greater than the sum of the individual parts"
a.) Having the 6 Pokemon work together to cover weaknesses and emphasize strengths instead of just having 6 Pokemon with no cohesive strategy
* The most basic and common examples for covering weaknesses include combinations like CeleTran (Celebi and Heatran) or GyaraZone (Gyarados and Magnezone) in DPP
* One of the most basic and common examples for emphasizing strengths is a combination like DoubleDragon (using two Dragon Dancers to punch holes for each other).
b.) Obviously isn't limited to combinations or trios; can refer to overall team strategies (think BP chains before outlawed or simple stall cores that work to cover each other's flaws)​
4.) Creativity - ability to come up with unique strategies or sets to swing momentum in your favor
a.) This means being able to surprise the opponent with a unique set or strategy without losing on general utility (too much)
b.) Doesn't just mean creating new sets, but also being able to use existing sets in a creative manner​
5.) Catering to Metagame / Opponents - ability to predict opponent trends, patterns, and tendencies
a.) Involves knowing the percentages of what you'll encounter on ladder and being able to build accordingly.
b.) Involves knowing your opponents in tournaments, taking note of their common trends in building, and preparing accordingly.
B.) Battling Skill - the part of skill involved in actually battling
1.) Picking the Right Lead - ability to look at your team and your opponent's Pokemon and make an intelligent determination of what your win condition is and which Pokemon will best promote that in the beginning
2.) Recognizing the Win Condition - ability to look at your opponent's team in addition to the information gathered during a battle to recognize viable win conditions
3.) Picking the Right Move - ability to pick the best move in a discrete moment in time
a.) Encompasses ability to judge the opponent's potential moves
b.) Encompasses ability to choose between short and long term benefits and choose accordingly​
4.) Smart Switching - ability to switch intelligently to swing momentum in your favor
a.) Encompasses the ability to predict an opponent's moves and switch for the best scenario
b.) Encompasses the ability to continuously switch (double or triple switching) if necessary​
5.) Gathering Information and Making Assumptions
a.) The ability to predict or assume opponent sets in order to better plan a win condition
b.) The ability to set probabilities for what the opponent has based on his actions in order to maximize predictions
6.) Long Term vs. Short Term Goals
a.) The ability to weigh when to bring in a potential win condition
b.) The ability to judge whether an immediate benefit, such as a revenge kill, is worth showing your hand or bringing out the win condition too early.
7.) Assessing Risk
a.) Knowing when to sacrifice for a greater position later
b.) Knowing when and how to make a high risk, high reward move​
8.) Probability Management
a.) The ability to take into account the numerous probability factors that are in the game, including accuracy, secondary effects, and critical hits, and consider the best strategy
b.) Knowing how to minimize the risk presented by probability factors
9.) Prediction
a.) The ability to take into account all of the opponent's potential actions, apply weights to them, and move accordingly
b.) The ability to double or triple switch based on opponent tendencies to move momentum back in your favor
II.) Uncompetitive - elements that reduce the effect of player choice / interaction on the end result to an extreme degree, such that "more skillful play" is almost always rendered irrelevant
A.) This can be match up related; think the determination that BP took the battling skill aspect out of the player's hands and made it overwhelmingly a team match up issue, where even the best moves made each turn by a standard team often were not enough.
B.) This can be external factors; think endless battle clause, where the determining factor becomes internet connection over playing skill.
C.) This can be probability management issues; think OHKOs, SwagPlay, Evasion, or Moody, all of which turn the battle from emphasizing battling skill to emphasizing the result of the RNG more often than not.
D.) Note uncompetitive elements are almost always present in the battling skill aspect; they will, however, be present in the team building aspect should we allow them in the sense of having to rely on excessively specific counters (such as loading teams with Sturdy or Keen Eye Pokemon and the like).
III.) Broken - elements that are too good relative to the rest of the metagame such that "more skillful play" is almost always rendered irrelevant
A.) Important to note that it is a relative statement; a 200/200/200/200/200/200 BST Pokemon with standard movepool would be broken in a metagame where the average is say, 100/100/100/100/100/100, not where the average is 200/200/200/200/200/200
B.) Examples are mostly Pokemon and include strong Ubers like Kyogre, Groudon, and Arceus. These aren't necessarily completely uncompetitive because they don't take the determining factor out of the player's hands; both can use these Pokemon and both probably have a fair chance to win. They are broken because they almost dictate / require usage, and a standard team facing a standard team with one of them would be at a drastic disadvantage. These examples limit team building skill.
C.) Examples also include ones whose only counters or checks are extraordinarily gimmicky Pokemon that would put the team at a large disadvantage elsewhere. These examples also limit team building skill.
D.) Uncompetitive and Broken defined like this tend to be mutually exclusive in practice, but aren't necessarily entirely so.
1.) BP was deemed uncompetitive because of how drastically it removed battling skill's effects and brought the battle down to match up, but it could also be deemed broken because of the unique ways in which you had to deal with it.
2.) While this isn't always the case, an uncompetitive thing probably isn't broken, but a broken thing is more likely to be uncompetitive simply due to the unique counter / check component. For example, Mega Kangaskhan was deemed broken because it was simply too good relative to the rest of the metagame and caused the tier to centralize around it, but it could also be labeled as uncompetitive because of the severe team match up restriction it caused by punishing players if they did not pack one of the few gimmicky and obscure counters or checks for it.
IV.) Unhealthy - elements that are neither uncompetitive nor broken, yet deemed undesirable for the metagame such that they inhibit "skillful play" to a large extent
A.) These are elements that may not limit either team building or battling skill enough individually, but combine to cause an effect that is undesirable for the metagame.
1.) We haven't really had an example of an unhealthy ban yet, but a potential example is Stealth Rock; it is certainly on every player's mind when team building, and games are often steeped in Stealth Rock strategy. Whether or not this adds up to limiting team building skill or battling skill is part of the conversation to be had.
2.) One important thing to note with this is that distribution both matters (in the case of large distributions) and doesn't matter (in the case of low distributions).
a.) If Stealth Rock or Scald weren't so common, they probably would not be as controversial issues as they are.
b.) However, just because something isn't highly distributed, like Shadow Tag, doesn't mean it isn't unhealthy. Some tried to state that Shadow Tag wouldn't be broken on a 10/10/10/10/10/10 BST mon, but this is the wrong way to look at it.
c.) Things aren't broken (or unhealthy or uncompetitive) only in vacuums; they can contribute to the whole being greater than the sum of its parts. Instead, consider how potentially broken elements would be with average distribution on average BST Pokemon. If Shadow Tag was on, let's say, 4-5 potential OU Pokemon as opposed to 1-2 and the average BSTs were something like 80/80/80/80/80/80, would it be broken? The takeaway from this is to not ignore distribution, but, if an element is lowly distributed, to consider how it would take away from team building or battling skill if it was distributed to average Pokemon in an average quantity. (Yes, we will provide average statistics.)
B.) This can also be a state of the metagame. If the metagame has too much diversity wherein team building ability is greatly hampered and battling skill is drastically reduced, we may seek to reduce the number of good to great threats. This can also work in reverse; if the metagame is too centralized around a particular set of Pokemon, none of which are broken on their own, we may seek to add Pokemon to increase diversity.
1.) The Mega-Metagross suspect could be said to fall under this umbrella; Mega-Metagross wasn't really broken, but it was the best Pokemon in a game with far too many good to great threats. It was felt that, for the sake of metagame health, we should seek to reduce the number of these threats (however, you'll note the community voted to keep it in the tier).​
C.) This is the most controversial and subjective one, and will therefore be used the most sparingly. The OU Council will only use this amidst drastic community outcry and a conviction that the move will noticeably result in the better player winning over the lesser player.
D.) When trying to argue a particular element's suspect status, please avoid this category unless absolutely necessary. This is a last ditch, subjective catch-all, and tiering arguments should focus on uncompetitive or broken first. We are coming to a point in the generations where the number of threats is close to overwhelming, so we may touch upon this more often, but please try to focus on uncompetitive and broken first.​
-----------------------------------------------------------------------------------------------------------------------

Again, you'll note that for all three of uncompetitive, broken, and unhealthy, reducing skillful play was emphasized. We're going to center future OU tiering debates around people showing exactly how a potential suspect element is any of uncompetitive, broken, or unhealthy as defined here and which aspects of skill are affected.

-----------------------------------------------------------------------------------------------------------------------

Overall Goal and Purpose of Tiering Policy:

I.) To create a metagame that is conducive to the more "skilled" player winning over the less "skilled" player a majority of the time.
A.) "Skilled" is, as stated previously, a bit of a nebulous term, but it encompasses both team building ability and battling ability. More on this in the previous definitions section.
B.) What this means is that, with all of the probability management inherent in the mechanics of Pokemon and with all the team match up factors inherent in the sheer number of threats in Pokemon, we strive to create a metagame in which the better player winning over the less skilled player happens significantly more than the less skilled player winning over the better player.
C.) This does NOT provide justification for using win:loss ratios in tiering decisions; win:loss ratios don't tell us anything because they don't take skill into account.
D.) It is difficult to break down whether or not the metagame is achieving this, but certain metrics can help us, for example records on the Showdown ladder and tour records for a tier. If we have people winning consistently, we are moving towards the goal of having better players win the majority of the time (for what it is worth, look at the Tour statistics from Adv - BW and you will note that every generation has had players who win consistently).
E.) What all 4 of the previous points seek to maximize is keeping the biggest determining factor in the match PLAYER CHOICE such that the better player wins the majority of the time.
II.) To ensure that both our ladder and tournament crowds are catered to regarding I.)
A.) Because ladder tends to be a scene where you play many battles in a short amount of time, "skill" for the ladder emphasizes beating the overall set of threats in a general sense.
B.) Because tournament battles tend to be a scene where a well played surprise wins the match, "skill" for tournament battles emphasizes the ability to both possess creativity and deal with creativity.
C.) The previous two are not mutually exclusive, just pointed out for emphasis. We strive to create a metagame in which someone can both deal with the general set of threats and be creative while dealing with creativity (read: balancing act between diversity and centralization).
D.) For tiering change suggestions, justification can be provided for either or both tournaments or ladder. Both is preferred as it makes an argument more complete.
1.) There will very rarely be a case where a true suspect element is not a problem in both environments, so be sure to be complete in your suspect justification.
2.) If something is overwhelmingly a problem in one environment and not the other, be sure to show how it is a problem in one and try to explain why it isn't a problem in the other.
3.) We expect some differences in both environments, but if a suspect element is non-existent in one environment, it is worth delving into why this is the case and whether there is something else to look at.
III.) To ensure that actions are taken with appropriate and complete justification.
A.) Statistics help frame the context of a discussion.
1.) Be careful with adding spin to statistics instead of just reporting them; there are countless examples of using statistics incorrectly to draw deterministic conclusions that inevitably ruin a thread. Don't do this.
2.) Usage statistics and their implications correlate most strongly with how we have defined broken.
a.) While they can certainly provide context for uncompetitive and unhealthy, the way we have defined both means something does NOT need to be highly used to be either.
b.) This doesn't mean they don't need to be used at all. If say, Shadow Tag / Gothitelle is brought up as an uncompetitive suspect element but it has only 1 usage in 100 competitive tournament battles, people would rightly be justified in pointing this out as a counterpoint.
c.) What constitutes "enough usage" is deliberately left open-ended and will be judged on a case by case basis for each suspect element.
B.) Do not haphazardly and brazenly declare anything is uncompetitive or unhealthy and shoot down objective counterarguments.
1.) It will be on you to demonstrate how a particular component of skill is drastically reduced to a significantly damaging extent in spite of any potential low usage.
2.) This is NOT an easy task and suspects for uncompetitive or unhealthy will NOT be pushed through "willy nilly".​
C.) We will expect and demand in-depth analysis into what particular factor(s) of skill is reduced, how the proposed suspect element is actually the cause, and why and how removing (or adding) this element will improve the metagame.
D.) If logs are provided, don't simply provide logs where the suspect element won the battle.
1.) Show that the battle was won or lost in spite of the player's mostly correct moves.
2.) If another person points out that the battler did not make optimal moves, be prepared to debate if the element's suspect nature was the cause of the loss or the player not making the best moves (or both).​
E.) Arguments that show how a specific suspect element affects skill in relative terms to other elements in the metagame will be very highly emphasized.
1.) Relativity was already emphasized in defining broken, but this is referring to the "how x is it" part of the argument, where x is any of uncompetitive, broken, unhealthy.
2.) Simply stating Gothitelle is uncompetitive because it reduces player skill by limiting smart switches is not enough; show how it does so more than other elements in the metagame and how this is detrimental.​

-----------------------------------------------------------------------------------------------------------------------------
 

Aldaron

In case you missed the Policy Review thread that led to this:

http://www.smogon.com/forums/thread...rk-for-competitive-and-uncompetitive.3550201/

OU mods will be moderating threads to ensure that people follow this framework when debating policy.

Be sure to recognize the separation of the how and what; don't clutter threads with your own paradigms of tiering.

We have one and that is what you will use for debates. If you wish to discuss the framework itself, a separate thread focusing on the framework should be opened.

Note that the framework will be fluid throughout and between generations. Examples will be updated to reflect the current state, and points within the framework itself may be changed.
 