The Usage Stats Problem: Using 1760 Stats for Tiering

Jukain · Mar 3, 2014

Everyone knows the usage stats are a problem. Anyone who's played OU competitively for a while knows that Chansey is generally better than Blissey, that Donphan should not be OU, that Thundurus should not be #45 in the usage stats, and that Manaphy/Kyurem-B should be nowhere near UU. The fact that we use these stats to determine what's in a competitive tier is silly. The weighted stats aimed to fix this problem, but they're just covering up stuff even worse than this. People complain about the usage stats more and more each time they come out. It's just getting tiring at this point. After talking to some others, I'm proposing we use the 1760 stats (Antar just released these) for tiering. A few OU jumps:

- Thundurus, a top 3 mon, up to #13 (from #45)
- Manaphy from UU, up to the 30s
- Kyurem-B from UU, up to the 30s
- Chansey from UU, to #33, and Blissey down to #48

List of BL/UU mons that would become OU: Landorus, Deoxys-S, Latias, Keldeo, Kyurem-B, Chansey, Deoxys-D, Manaphy, Terrakion, Zapdos

List of OU mons that would become UU: Donphan, Klefki, Smeargle, Starmie, Trevenant, Galvantula, Tentacruel, Sableye, Cloyster, Forretress, Salamence

The 1760 stats are so much better. They should be used for Smogon tiering.

Jukain · Mar 3, 2014

Sorry for double post but these won't work if I just edit

Tagging relevant tier leaders Aldaron M Dragon McMeghan Haunter Nachos kokoloko Molk for opinions (didn't tag leaders of tiers unaffected by usage)

kokoloko · Mar 3, 2014

don't care tbh. but if we do this I want it to be done like literally right now. not waiting around for this to go through to move on with UU tiering.

basically, support if it's done within 48 hours opposed if it takes longer.

that is all.

Aldaron · Mar 3, 2014

DougJustDoug posted some tl;dr about being against weighted stats for tiering and Antar agreed

so ask them first :P

Imanalt · Mar 3, 2014

Aldaron said:
DougJustDoug posted some tl;dr about being against weighted stats for tiering and Antar agreed

so ask them first :P

Just for clarity... aren't we already using weighted stats for tiering...? they're weighted for 1500 not 1760, but my understanding was our "standard" stats now are weighted...

So all we're talking about doing is lifting the bar from being the top 50% to the top 2% roughly
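
For readers unfamiliar with what "weighted for 1500" versus "weighted for 1760" means, here is a minimal sketch of one way rating-weighted usage could be computed. It is an illustration under stated assumptions, not the actual stats script: it assumes each team is weighted by the probability that its player's true skill exceeds the cutoff, modelling skill as normal around the player's rating with the given deviation. The function names and the sample teams are hypothetical.

```python
import math
from collections import defaultdict


def weight(rating, deviation, cutoff):
    """Probability that a player's true skill exceeds `cutoff`.

    Assumption: skill is normally distributed around `rating`
    with standard deviation `deviation`.
    """
    z = (rating - cutoff) / (math.sqrt(2.0) * deviation)
    return 0.5 * (1.0 + math.erf(z))


def weighted_usage(teams, cutoff):
    """Return {pokemon: weighted share of teams carrying it}.

    `teams` is an iterable of (rating, deviation, [pokemon, ...]).
    """
    totals = defaultdict(float)
    total_weight = 0.0
    for rating, deviation, team in teams:
        w = weight(rating, deviation, cutoff)
        total_weight += w
        for mon in set(team):  # count each Pokemon once per team
            totals[mon] += w
    if total_weight == 0.0:
        return {}
    return {mon: t / total_weight for mon, t in totals.items()}


# Hypothetical sample: one high-rated and one low-rated player.
sample = [
    (1800, 50, ["Thundurus", "Chansey"]),
    (1400, 50, ["Donphan", "Blissey"]),
]
usage_1500 = weighted_usage(sample, 1500)
usage_1760 = weighted_usage(sample, 1760)
```

With a scheme like this, raising the cutoff does not discard low-rated players outright; it shrinks their weight toward zero, which is why the same battles can produce very different rankings at 1500 and 1760.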

Antar · Mar 3, 2014

Honestly, I'm going to need Doug to remind me why I was so against it.

First off, we can easily decide to use 1630 (one std.dev, or top ~17%) if 1760 is too severe. I would prefer this, I think, because otherwise we're looking at an incredibly small handful of users determining the tiers, but even then there's some validity to that, since the only way to get to the top (besides cheating, which is always a concern) is to be the best.

Basically, we need to decide who our tiers are for: are they for the typical player, or the typical competitive player? If the former, we need to keep the cutoff right where it is. Otherwise, there's no harm in entertaining the idea of raising the cutoff.

Imanalt, yes and yes. But you say that as if going from 50% to 2% were nothing.
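
As a rough check on the percentages being thrown around, the sketch below computes what fraction of players sits above a given rating. It assumes ratings are normally distributed with mean 1500 and standard deviation 130; the 130 is inferred from 1630 being described as one standard deviation, and the real ladder distribution need not be exactly normal. `fraction_above` is a hypothetical helper name.

```python
import math


def fraction_above(cutoff, mean=1500.0, std=130.0):
    """Fraction of a normal(mean, std) population rated above `cutoff`.

    Assumption: ladder ratings are normally distributed, which is only
    an idealization of the real ladder.
    """
    z = (cutoff - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for cutoff in (1500, 1630, 1760):
    print(cutoff, f"{fraction_above(cutoff):.1%}")
# 1500 50.0%
# 1630 15.9%
# 1760 2.3%
```

Under this idealized model, 1630 comes out near 16% rather than the ~17% quoted, and 1760 lands at roughly the top 2%.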

Jukain · Mar 3, 2014

They should be for the typical competitive player for one key reason: they're used in our tournaments. True tournament stats are unrealistic, but it's silly for the best to play with ladder stats not representative of a competitive playerbase. Typical players will play our tiers, but should not affect the usage stats that determine what randomgoodplayer123 plays with.

dcae · Mar 3, 2014

I'd like to note that tiers, in the way that they are currently made, are more oriented towards competitive play. However, the situation is not as black and white as you seem to make it out to be. People who want to have fun will still use the Pokemon available to them. For example, I and some others enjoyed using Mightyena in RU last generation, despite it being classified as NU.

Unless I am misunderstanding, I do not see how changing the cutoff will prevent people from having fun. I'd like to use the mons listed in the OP as an example:

List of BL/UU mons that would become OU: Landorus, Deoxys-S, Latias, Keldeo, Kyurem-B, Chansey, Deoxys-D, Manaphy, Terrakion, Zapdos

List of OU mons that would become UU: Donphan, Klefki, Smeargle, Starmie, Trevenant, Galvantula, Tentacruel, Sableye, Cloyster, Forretress, Salamence

Out of the BL/UU Pokemon, the majority of the Pokemon are already banned due to their broken aspects, as they are generally agreed to be OU Pokemon level.

Any Pokemon dropping from OU to UU does not affect people who want to use them in OU in any way, shape or form. In fact, this encourages variety, as bad Pokemon in OU can potentially become a great part of a lower metagame, while they were only inflated due to people using them for fun.

As such, the tiers as a whole enjoy an increase in variety, it prevents lower tiers from having to ban loads of things that should be tiered higher but aren't used like they should in the lower ladder, and people who want to have fun can still have fun.

The underlying argument I am making is that people who want to have fun should have fun, but that should not be affecting a tiering process that is crucial in high level tournaments. The two do not clash, and tbh there is really no downside to using 1760 stats.

trc · Mar 3, 2014

I whole-heartedly agree with this proposal. To be honest, each tier (descending) is considered to logically contain the Pokemon that are, to say, not up to the standards of the tiers above it. Therefore, it is logical to assume that a tier is meant to consist of the most viable (or best) Pokemon available. The reason we use usage as the way of creating our tiers is because of the assumption that a Pokemon is used a lot because it is good. While this is generally a true statement, it is not indicative of every player, as some players are (obviously) mediocre. We can determine that the purpose of tiers is to determine the viability of each Pokemon; how good it is. Therefore, in order to create the most accurate competitive tiers, we need to base the creation of these tiers on the most competitive players.

It's not like there are no benefits obtained from this either. Players in lower tiers will be able to use Pokemon more suited for that tier, while players in higher tiers will be encouraged to use Pokemon that are better in that tier, instead of using ones that just aren't as good. To be honest, there really isn't an inherent downside.

DougJustDoug · Mar 3, 2014

The original argument on weighted stats was back in 2008-2009, IIRC. It was a much less sophisticated world then, as it pertained to online competitive pokemon simulator play. I won't go back through all the arguments from then, but they were heavily influenced by the amount of server traffic (which is orders of magnitude higher now), the amount of blatant cheating on ladders (not sure if that has gone up or down these days, but it doesn't seem to be as top of mind as it was back then), and general concern with the quality of our ratings and whether we actually measure player skill or not (still a hot debate, but I think we mostly accept the tradeoffs and limitations). I could list more differences between now and then, but I think you get the idea -- different times demand different policies.

I think the positives of using weighted stats will probably outweigh the negatives.

Subjectively, I can't deny the gap in pokemon used by good players and new players is wider than ever. We were talking on IRC earlier today scratching our heads over the relative stat differences between pokemon and movesets used in OU, Suspect, and even Battle Spot play. It is hard to argue that general usage stats really reflect any kind of "competitive consensus" (not that usage was ever that great, but still).

Like Antar mentioned, we should all take pause at the idea of using the usage patterns of 2% of the player base to set the tiering policy for 100% of the player base. And yes, I know players can use shitty pokemon in OU if they want to. Our tiering is not forcing anyone to do anything. But Smogon tiering policy is perceived as a binding thing, and we need to own the repercussions of having that kind of influence on the metagame. So yeah, deriving policy from 2% of the players is not great. But then again, 2% of the players today is probably a greater number than 100% of the OU playerbase in 2009, right?

So, FWIW, even I agree with using higher rating weighted stats for tiering nowadays.

a fairy · Mar 4, 2014

couldn't Control+F either on this page, so just making sure because honestly i assumed this was Elo and there didn't seem to be any hints to the contrary

23:58 jukain ls
23:58 jukain it is not
23:58 jukain elo
23:58 jukain it is
23:58 jukain in glicko2
23:58 jukain uh glicko

i might write up a legitimate post on why i think this is a bad idea but honestly i dont think it'd change anything at this point

Raseri · Mar 4, 2014

This is an issue that would need a lot of discussion between everyone involved (so PS staff, antar, and tier leaders) before any sort of fair decision could be made.

There are a few issues I do have with this proposal that I would like clarified before I take a formal stance:

What will you do if you don't like the way a tier looks in the future? Say Donphan does make it back to OU under 1760, what do we do then? Because it is pretty clear that the point of this is to make OU look the way you want it to, will this system be modified further in the future to make the tier look the way you want it to?

Why 1760? It is apparently the top 2%, but that seems like a pretty arbitrary number, and it also just seems so small to me. There are a lot of competent players below that, but they don't count as much?

Truthfully, my main opposition is because I don't like the idea of manipulating how we tier just to get a "better" result. We didn't do this in BW to my knowledge, even when things didn't belong in their tier. So why start this now?

So I'm going to oppose this for now, because there are too many unknowns about how this is going to be handled, and why this is better. I can be swayed, but I really think we need to put a lot more thought into this rather than just running with the idea.

trc · Mar 4, 2014

I don't think that "we didn't do this in BW, why do it now" is a good argument at all. A proposal of change shouldn't be opposed just because it wasn't in effect at another time, that's an absurd notion. Not that I personally disagree with your post as a whole, I was just addressing one part of it.

Super Mario Bro · Mar 4, 2014

The main reason that we should increase the tier baseline is to prevent people experimenting with new (and often flawed) teams, as well as people who have little to no experience battling competitively, from having as much influence on the stats as those who do play competitively on a regular basis.

Another option we could entertain is raising the tier cutoff from 3.41%. Frankly, this number resembles a credit union interest rate, and I'm not entirely sure of its mathematical basis, even after reading Smogon articles on the matter. In any case, the metagame now is much different than when that percentage was conceived, and I would wager that we can come up with a better one.

Antar · Mar 4, 2014

Super Mario Bro said:
The main reason that we should increase the tier baseline is to prevent people experimenting with new (and often flawed) teams, as well as people who have little to no experience battling competitively, from having as much influence on the stats as those who do play competitively on a regular basis.

Oh right! This was part of my objection to doing tiers based on "1850" stats--if a player wanted to spam some Pokemon but didn't want them to influence tiering, he or she would just have to alt-reset before they got close to 1760. Granted, I don't know *why* someone would do this... it's really the reverse that tends to be the problem.

Really, though, I very much want Smogon PS to limit alts. I really want all alts to be "display only," and I want every alt to have a different Elo but share the same Glicko. It'd help with so many problems. But this is a topic for another discussion.

Another option we could entertain is raising the tier cutoff from 3.41%. Frankly, this number resembles a credit union interest rate, and I'm not entirely sure of its mathematical basis, even after reading Smogon articles on the matter. In any case, the metagame now is much different than when that percentage was conceived, and I would wager that we can come up with a better one.

Sure, if your goal is to shrink the size of OU. That's all upping the cutoff will do. Also, just to be pedantic 3.41% = 1-0.5^(1/20), so to be clear, 3.41% is not the arbitrary number, 20 (as in, 1 in 20 teams) is.
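
The arithmetic behind that number can be sketched as follows. The assumptions are that "usage" means the fraction of opposing teams carrying a given Pokemon and that battles are independent; the function names are hypothetical.

```python
def usage_cutoff(battles=20, chance=0.5):
    """Per-team usage u at which 1 - (1 - u)**battles equals `chance`.

    Solving for u gives u = 1 - (1 - chance)**(1 / battles).
    """
    return 1.0 - (1.0 - chance) ** (1.0 / battles)


def chance_of_meeting(usage, battles=20):
    """Probability of facing a Pokemon at least once in `battles` games."""
    return 1.0 - (1.0 - usage) ** battles


cutoff = usage_cutoff()  # about 0.0341, i.e. 3.41%
```

Changing `battles` is the real policy lever here: under the same assumptions, a smaller number of battles gives a higher cutoff and a smaller OU, while a larger number gives a lower cutoff and a bigger OU.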

a fairy · Mar 4, 2014

I find the idea of making Donphan UU because "the tier lists are shitty" is a really bad trade off for giving into elitism again.

I'm honestly surprised Control+F gave no hits, because that's what this is - elitism.

I won't argue with the OP & supporters on the concept of "tier lists are shitty" because I find that to be something completely up to opinion. If I built a team that no Scizor ever could touch, Mega/SD/CB/Lefties/you name it, then I would naturally think the tier lists are shitty, because this crappy Pokémon that can't touch my team is so high up on it!

The idea of "let's do this so stuff can drop to OU to increase diversity" is also one that makes me confused. That is a one time deal - if we change this RIGHT NOW, and retroactively put this into effect, then yeah, dcae's list of OU-worthy Pokemon like Klefki (hell we're discussing about banning that thing from OU) and Trevenant will drop into UU, and then you can use them in 2 tiers, three if you count Ubers. Then what? What happens when next tier shift, we find that <insert group name here> had seen this thread, and decided to abuse this easily exploitable system, and move a whole ton of shit into OU; shit that was needed in UU for balance? Are we going to become more elitist?

To make a comparison, you don't see the 2% (or 1%, if you're into all that cool numbers) of the richest people in the USA lobbying to drop literally everyone else from stuff like the average income, just because "it makes our average income look shitty :("

This is not your OU, your UU, your RU, NU (if you are the tier leader of any of these and you are reading this, this doesn't technically apply to you). This is Smogon's tiers as a whole, and we shouldn't be using the stats of the 1% because "the tier lists are shitty". Elitism. Do you want a revolt? This is how you get revolts.

DTC · Mar 4, 2014

Raseri said:
What will you do if you don't like the way a tier looks in the future? Say Donphan does make it back to OU under 1760, what do we do then? Because it is pretty clear that the point of this is to make OU look the way you want it to, will this system be modified further in the future to make the tier look the way you want it to?

People will always have their own personal tiering opinions. However, that does not mean they will suggest a change just because they disagree with a few things. We feel as if the current tiers do not accurately represent the metagame at all. That is why we are suggesting change. We aren't suggesting change because "lol Donphan is OU, Trevenant is OU, and so on and so forth"; we are suggesting change because there are so many Pokemon displaced. (this is subjective; yet there is a large enough displacement that a lot of people agree that there is a problem) People won't seriously suggest change until they feel the tiers are very inaccurate. Regarding your point (paraphrasing) "we didn't do this in BW when people had a problem - why start now"... because even though people had issues with the Pokemon in the tier, not many people felt it was enough of a problem to warrant change like this. In addition, we DID do something about it: it was when we first implemented weighted stats in the first place. Anyways, now people feel as if there is a problem with the stats, which is why they are suggesting change.

I emphasize - we're not proposing this change because Donphan is OU and Latias is not; we're proposing the change because there are so many Pokemon displaced that the tiers are not representing what is actually good in the tier. This may not be the central goal of tiers, but it is certainly an important attribute in them.

Lady Salamence, this is very similar to how we do suspect tests (and for the record, these account for even less users). You can always have influence on tiering, as long as you are willing to put in the time and are good enough. It may seem "elitist", but in the end, we are the lead in our tiering system and... again... you can still affect usage stats as long as you put in the time and are good enough... which should be a goal in a competitive Pokemon community. Tiers should accurately represent (for the most part, of course people will have their personal opinions) what is good in the metagame.

I think you bring up a good point - whatever we go with, the exact Glicko number should not be announced, so it's harder to take advantage of. We can of course agree on exact % we want (which is still up for discussion), but whatever Glicko number that ends up being should not be revealed. While we're on that topic, it's not as easy to take advantage of as you think. For one, the top 2% of the ladder is still a huge number. Even if one person decides to spam something like say Suicune in OU, they're still only 1 person. Plus, if a Suicune is getting them that high on the ladder... isn't it worth some consideration at least? And while you could bring up people like ShakeItUp who manage to get really high with stuff like Mightyena in OU... the number of people that can and are willing to do that is very small in comparison to the number playing "legitimately".

The exact % is still up for discussion. The top 2% may seem very restricting, but in a huge ladder like OU, it still accounts for a lot of people. Of course, we can still change the numbers if people feel as if it's too restricting still. UU and RU do not need to use the exact % as OU either - it can be modified to account for more people if it would be beneficial.

In essence, even if this solution may seem arbitrary and elitist, in the end, if it makes our tiers look a lot "better" (again this is subjective... but a lot of people agree there is a problem... and it's clear that taking stats from "better" ranked people on the ladder improves our tiers) then as DJD stated, the tradeoff is worth it.

a fairy · Mar 4, 2014

DTC

Regarding suspect tests: We do this because it is literally the only legitimate option for the most part. We tried a "my way or the highway" system in ADV with Aeolus and Jumpman. We tried various different systems throughout the entirety of DP, DPP, BW1, and BW2, which included: Pure Ladder, Pure Council, Open Council (think DPP's Salamence test) and possibly others I'm forgetting. Each one of those, for various reasons, did not work. If they did, we'd be using those systems, and not the current one.

Regarding "tiers should be what is good in the tiers": Then why the hell are we using usage stats to decide our tiers? Usage stats is what the name implies - what is used in OU (or UU/RU if you go down lower). It doesn't imply what's good in OU/UU/RU, so clearly, instead of marginalizing 98% of our users, give or take, we should run something akin to SPL, 24/7, and derive our stats from that, since SPL and co. are considered the top of competitive? At least then, we could say "We tried usage stats, and since people got pissy about Donphan being OU, we've stopped using that and started using a more competitive system because people want OU to be "what's good" as opposed to it being based on usage stats."

Antar · Mar 4, 2014

Lady Salamence said:
That is a one time deal - if we change this RIGHT NOW, and retroactively put this into effect

So just a heads-up, folks, we *can't* do this retroactively. Glicko2 was unequivocally *broken* (did not accurately assess player skill) before mid January, when we "rescaled" the ratings. So we can't generate "1760" or even "1630" stats on two-month-old data as of now. Okay, fine--there's a way I *can* do it, and that's by calculating a player's Glicko score myself as it would have been if we'd been using the new rating system and had reset the ladder on December 1, but that'll involve quite a bit of coding and will take a *long time* to compute.

DTC, I was with you up until:

DTC said:
I think you bring up a good point - whatever we go with, the exact Glicko number should not be announced, so it's harder to take advantage of.

Security through obscurity is never a good policy. Also it would mean never publishing the usage stats we actually use for tiering and taking my scripts off of github (since you could probably find the number buried somewhere there).

Lady Salamence, when we made the decision to use weighted stats, we made a decision about what tiers primarily mean. Based on this premise:

Antar said:
Again, this is a controversial subject, but my answer is that OU means what it stands for, that these Pokemon are simply "overused," and that the primary function of tiers is as threat lists. To elaborate, I'm going to point you folks to the original defining of our current OU-UU cutoff: in short, a Pokemon is OU if, in playing 20 battles, there's at least a 50% chance of you encountering that Pokemon at least once. This is an acknowledgement of the fact that there are 649 Pokemon out there--if you're designing a team of six Pokemon, it's unlikely that you're going to be able to make sure that your team has a way of dealing with each and every Pokemon out there. But if you're making an OU team, you probably will never have to worry if your team gets completely wrecked by Leavanny, since it doesn't even appear on one team in a thousand. What the OU/UU cutoff literally says is: "if said Pokemon is UU or below, you still have a good shot of going 20-0 even if your team is super weak to that Pokemon."

we made the decision that the tiers should be threatlists for the "average player." But do we really want that, when, I'd venture, at least 50% of the Showdown playerbase are not particularly interested in having competitive teams?

All upping the cutoff to 1630 would do is say that the threatlist is now for the "above average" player. Ideally we'd want to settle on a number that corresponds to the "average competitive player," but that's so subjective, I don't think we'll ever be able to do that.

Antar · Mar 4, 2014

Zarel, a thought occurs: you know how there used to be a ladder for unrated randbats? Why don't we do that for OU? It's possible that will solve our problem quite nicely.

DTC · Mar 4, 2014

Lady Salamence said:
Regarding "tiers should be what is good in the tiers": Then why the hell are we using usage stats to decide our tiers? Usage stats is what the name implies - what is used in OU (or UU/RU if you go down lower). It doesn't imply what's good in OU/UU/RU, so clearly, instead of marginalizing 98% of our users, give or take, we should run something akin to SPL, 24/7, and derive our stats from that, since SPL and co. are considered the top of competitive? At least then, we could say "We tried usage stats, and since people got pissy about Donphan being OU, we've stopped using that and started using a more competitive system because people want OU to be "what's good" as opposed to it being based on usage stats."

This is a case of black-and-white syndrome: what you are saying is we should either go with our current system, or go with the #1 top level of play. The problem with the #1 top level of play is that, while it is the top level of play, the sample size is very small. We are not suggesting we should only go with the absolute best of the best, but we are suggesting we should cut out all of the players that are not great while still maintaining a large enough sample size. Literally the only reason we suggested 2% is because that is what Antar went with in his stats... like I said, it can be changed.

Regarding using ways other than usage stats for deciding the tiers... there is no other good way to decide tiers. If we manually decided stats, either by say Viability Rankings or the 5 guys at the top, it becomes a gigantic mess. Usage Statistics are a lot simpler and cleaner and less subjective, even if we go with an arbitrary cutoff.

Antar -- good point regarding security through obscurity. I didn't really think it through well enough.

a fairy · Mar 4, 2014

Antar I'll not bother responding to the retroactive thing because it's not the point, but in regards to tier lists being an average user's threat list...

Antar said:
What the OU/UU cutoff literally says is: "if said Pokemon is UU or below, you still have a good shot of going 20-0 even if your team is super weak to that Pokemon."

Then, if the tier lists are an average user's threat list, that is all the more reason to not use weighted stats.

If I am randomuser7086, just starting out in competitive, and I'm making a team, and the 1 rate I got on my exportable before Remedy locks it for breaking 3 RMT Forum rules is "weak 2 donpahn", and he goes on the ladder anyway, because hey, Donphan is in the lower tier, and the stats guy on the forum says that due to this, you should theoretically be able to go 20 battles without seeing one, only to find a Donphan in his first three+ battles, because nobody let him know that the stats are based off of a super elite with hours to burn on the ladder, as opposed to actual usage stats, then that defeats the purpose of an average user's threatlist, because it is completely different than what you will actually see on the ladder.

A threat list is a list of Pokemon that can cause issues to an individual team. A threat list is not a general list of "this thing can use outrage and it has 120 base attack stat".

I personally disagree on your definition of a tier, but the time to argue that is sadly long gone.

edit DTC: We should not cut out large parts of our community in order to make Donphan UU for an unknown amount of time. That is a horrible idea.

Antar · Mar 4, 2014

Lady Salamence, but if only bad players use Donphan, then it doesn't matter if your team is weak to Donphan--if you're a good player, you can probably still net the win.

a fairy · Mar 4, 2014

I thought this was an average user's threat list. An average user has to get used to a new team he or she has built, has to get the feel of the ladder, et cetera. If we're changing OverUsed / UnderUsed / RarelyUsed / NeverUsed to "threatlist for spl players/OST semifinalists" then whatever, fine. I just heavily disagree with that, because

[09:34] <ladysalamence> this is a suggestion where it would invalidate who knows how many thousands of users ... i think that's a horrible idea

Antar · Mar 4, 2014

Lady Salamence said:
I thought this was an average user's threat list.

This is what we're discussing, because quite frankly, the average player is *horrible.* There should be no reason why I would be facing mono-normal teams on the OU ladder at ~1500 Elo.
