114, it's true that top players count a lot more than players with decent ratings, but as I showed in the sample calculations, (most) players with decent ratings count just as much as they used to (a bit more, actually) as a fraction of the entire stats. It's just that players with ratings less than ~1600 (keep in mind, I'm talking Glicko, not Elo) don't count at all, and their contribution is made up for by people at the top.
And no, it turns out that rating does not correlate with number of battles played, even at the top-most levels. Surprising, I know.
As for how many players have a weight of at least 0.5? That's actually an easy question to answer, because all I need to do is count the number of players with Glicko rating of at least 1760: that number is 1510.
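The thread never spells out the weighting formula, so the following is only a sketch of one weighting that is consistent with the statement above (weight 0.5 exactly at a Glicko rating of 1760). It assumes the weight is the probability that a player's true skill exceeds the cutoff, treating the Glicko rating as the mean and the rating deviation as the standard deviation of a normal distribution. The function names and the sample deviation values are hypothetical.

```python
import math


def weight(rating, deviation, cutoff=1760.0):
    """Assumed weighting: P(true skill > cutoff) for a normal distribution
    with mean `rating` and standard deviation `deviation`."""
    z = (rating - cutoff) / (math.sqrt(2.0) * deviation)
    return 0.5 * (1.0 + math.erf(z))


def count_at_least_half(players, cutoff=1760.0):
    """Count players whose weight is at least 0.5.

    `players` is an iterable of (rating, deviation) pairs. Under the
    assumed formula this is the same as counting ratings >= cutoff.
    """
    return sum(1 for rating, deviation in players
               if weight(rating, deviation, cutoff) >= 0.5)
```

Under these assumptions, a player sitting exactly at 1760 gets a weight of 0.5 regardless of deviation. A player at 1500 with a large deviation of 350 (the usual Glicko starting values, assumed here rather than taken from the thread) gets roughly 0.23, while a player at 1600 with a small deviation of 50 gets less than 0.001.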
Kind of off-topic but not sure where else to ask this: why is the top 500 based off Elo while weighting is based off Glicko? Why not use one system for both? Been curious about this for a while.
My one and only complaint about this decision is that this topic should have been made before implementing this change on PS.

UU is really desperate to get off the ground, and given that there was no actual precedent (no previous UU banlist for Gen VI), it was decided that, for the sake of expediency, we'd move on this right away.
Big Concern: So if I'm understanding the new policy, each month you use a combination of common sense and magical statistics mumbo jumbo that I can't even begin to understand to determine these "candles of known brightness", which you will use in combination with more magical statistics mumbo jumbo to produce a single number that we interpret as the Glicko rating of the "average competitive player".

In truth, you're exactly right: this is a bunch of BS that we're using to justify our decisions. I really can't say I care for the decision to increase the ratings cutoff beyond 1500, because it means that players with ratings above 1500 can end up mattering less than players who have never played a match before. And we are exploring alternatives to make this no longer necessary. But in the meantime, raising the cutoff is the least bad decision, and trying to inform where to put the cutoff based on data and metrics rather than just straight-up "I think it should go here" makes this least bad decision slightly better.
Related Sub-Concern 1: Are you determining these candles unilaterally?

I'll start off by proposing some candles, but I don't have the best knowledge of the metagame, so I'll be listening to suggestions (from both the tiering councils and from ordinary members) to come up with more.
"cutoff number" (massive misnomer btw, you may want to consider trying to come up with a better name, as it's causing a fair bit of confusion even to relatively informed members, like the ones reading and posting in this topic)

I know, but I can't think of a better name for it.
in which case you should probably open it up to others for some discussion.

I'll probably start a thread, and I'll definitely listen to feedback, but at the end of the day, these "candles" are being used as helpful markers, not to directly determine anything, so it'll probably be a constantly evolving process rather than a matter of "okay, here is the set of candles everyone has agreed to."
Related Sub-Concern 2: So I noticed that you are largely using the presence of certain moves and items as flags for possible "candles".

Items are probably better candles than moves, by and large. Assault Vest / Sitrus Berry / Leftovers in Little Cup is the best example. But Sitrus Berry on non-Harvest/Recycle sets would probably be a useful candle. And Focus Sash on Sturdy sets. The easiest candle metric would definitely be subpar abilities: there is no competitive reason to use Truant Durant (outside of Doubles), Flame Body Talonflame (you give up too much not running Gale Wings)...
Example 1: Choice Specs Adaptability Porygon-Z with Hyper Beam

IMO, this is the exception that proves the rule.
Example 2: Slaking with Giga Impact

I've seen this discussed before. First off, if you're using Slaking in OU, I think you're "doing it wrong," but secondly, the Truant turn can (and should) be used to switch out, so Giga Impact isn't even generally considered to be a good move on Slaking, especially when Return has 102 base power. Or so I understand it...
Related Sub-Concern 3: How do we handle Pokemon of lower tiers being used in higher tiers? For example, Donphan is currently UU, but you have specifically stated that Donphan in itself is not a "candle". However, what about RU and NU mons? What about LC mons? Some more examples for consideration.

It's fine to use lower-tier Pokemon in OU. I'd argue every team should have at least one. The only Pokemon that could be considered candles are ones that are 100% outclassed by another Pokemon. So Gyarados > Feraligatr used to be an example. Chansey/Blissey > Audino would be another example (Regenerator doesn't help *that* much).
If anyone reading this objects to my specific examples, please save your thoughts for when I make the "candles" discussion thread. I don't want this thread getting derailed.
And with that, I think I've responded to everything! But feel free to ask more questions and attack any of these arguments, as long as you're not discussing specific examples (you can say, "I disagree with some of your examples," but please leave it at that).
That being said, judging from responses on the candle thread, there is prolly going to be some considerable discussion on what exactly is a candle, because many players felt the bar was set way too low, which defeats the purpose of using 1760 stats.
This is also a bit of a concern of mine. Swarm Scolipede is terrible. Hyper Beam on Mega Gardevoir is terrible. Competitive players will never use them.
Limber Ditto is not just terrible, it's not just something an "uncompetitive" player will do, it's something that's going to be used either by someone so functionally incompetent they can't handle a Pokemon with two abilities and one move, or by someone who just accidentally forgot to set the ability. I know the ladder has become saturated with awful players, but I think the bar is being set too low; some of this stuff is only going to be found under 1200 rating or whatever.
It'd be nice to see some stats, I'd be glad to be proven wrong.
Or someone who simply doesn't know what Imposter does. When I first started playing PS, I actually hadn't played BW, so I only really had knowledge of moves/mons/abilities from gen 4 and before. It is entirely possible that a guy, seeing some random ability he's never heard of before, is going to go for the status immunity ability just because he knows it's at least somewhat helpful. It doesn't help that a lot of descriptions get cut off when you click on them or try to /data them in PS.
Not saying that we should call this guy "competitive", but ignorance != incompetence.
Not to mention some competitive sets are just downright non-intuitive. Sheer Force special attacker Nidoking, one of UU's top wallbreakers, anyone?
If there were a shortage of crap stuff used on the ladder, I'd be going for the niches. But the case is that as long as Limber Ditto is around, there's no need to worry about Water Shuriken.
I *do not* want to be thorough. I'm hoping for a small number of really good candles.

And by "really good" I mean stuff that's in no way ambiguous or controversial. There is zero competitive advantage to Limber Ditto. There might be some small niche for Donphan or Water Absorb Politoed. So what that means is that if I plot the distribution of Glicko R's for players that use Limber Ditto, I can put the cutoff above all of them, but if I use Water Absorb Politoed, the cutoff will probably be only such that it filters out 99% of them. Does that make sense?
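The idea of placing the cutoff relative to the rating distribution of a candle's users can be sketched as below. This is a minimal illustration, not the procedure actually used for the stats: `candle_cutoff` and its `fraction` parameter are hypothetical names, and the sketch simply picks the smallest observed rating that has at least the requested share of candle users at or below it.

```python
import math


def candle_cutoff(ratings, fraction=1.0):
    """Return the smallest observed rating such that at least `fraction`
    of the candle users have a rating at or below it.

    fraction=1.0 corresponds to a cutoff covering every user of the candle
    (the unambiguous case); a lower fraction such as 0.99 leaves room for a
    candle that might have a small niche.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(ratings)
    # Index of the last user that must fall at or below the cutoff.
    index = math.ceil(fraction * len(ordered)) - 1
    return ordered[index]
```

With `fraction=1.0` the function returns the highest rating among the candle's users, so a cutoff set there or just above filters out all of them. A lower fraction deliberately lets the top few users of a more debatable candle through.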
I think the major issue here is that this stuff is so bad that even the 1500 stats filtered it out just fine. Even with the ladder as awful as it is, there is no way anyone will ever win with Limber Ditto, and there is also no way anyone who would ever purposefully put a Limber Ditto on their team would ever win, let alone stay above or anywhere near 1500 Glicko. The same is true for Water Absorb Politoed, Pressure Bisharp, Focus Sash Blissey (heck, Blissey itself is bad enough since it is outclassed in every meaningful way in OU by Chansey), and other such examples, all of which are completely useless options and are actually somewhat comparable to Leftovers in LC. The Limber Ditto users were never a problem because they never even influenced the usage stats to begin with. The entire reason the cutoff was raised was that nearly all above-average OU players felt that the OU list did a horrible job of actually representing what was good in the tier, which is the entire purpose of the tier lists. Staples of good OU teams such as Keldeo, Kyurem-B, and Manaphy were languishing in UU or BL, while awful and entirely outclassed Pokemon such as Donphan and Forretress that saw next to zero use in high or even mid-level competitive play were solidly OU. Determining the cutoff based on candles such as these would completely undermine the whole idea Jukain was proposing in suggesting the use of 1760 stats for tiering, and would quite honestly make next to no sense.
What about obvious troll/scouting teams, like one-mon teams (usually packing Imposter Ditto or Transform Smeargle)?
Except, of course, that it didn't. All of these candles appear at nonzero percentages in the stats. This is why I ruled out shit like "Pound," which *never* appears, even though it's even worse than Limber Ditto.

Fair enough, I should have looked it up first. However, this does raise a question: how many of the Limber Ditto were given Limber intentionally, and how many were used by players who simply forgot to change Ditto's ability in the teambuilder? This is easy to miss, considering the fact that the default ability is Limber. You can get at least reasonably high on the ladder if you forget to change the ability and simply win your first few battles without ever sending out Ditto. I doubt that anywhere near 0.758% of Ditto users intentionally ran Limber. Stuff like this is even easier to miss with Pokemon that have useless default abilities that aren't as visible as non-Imposter Ditto. For example, 3.311% of Starmie run Illuminate, a completely useless ability that happens to be the default one. Realizing that you forgot to change your Starmie's ability is actually reasonably hard, as none of its abilities have an easy-to-notice impact on the battle. For this reason, I don't think Limber Ditto is a good candle, as it signifies absent-minded players more than it signifies bad ones (and even if it didn't, there's no real way to tell).
I think you're confusing 1500 Glicko with 1500 Elo. The only way to see your Glicko rating is to type /rating or go onto the leaderboard. That starts at 1500. Elo is the one that starts at 1000.
I personally think this analogy is just plain wrong: ORAN BERRY in OU is what's comparable to Leftovers in LC. But all these above examples are somewhat viable, just out-classed by better options. You can make a team that wins matches with all of the Pokemon you mentioned... you just probably won't end up very close to the top of the ladder.

Oran Berry in OU is also comparable. There is absolutely zero reason to use Leftovers in LC when you can use Berry Juice or Eviolite, just like there is absolutely zero reason to use Water Absorb Politoed in OU when you can use Drizzle Politoed or any of the dozens of Water-types that are better than Water Absorb Politoed in every single way. Even if it weren't outclassed in any way, Water Absorb Politoed would still be worthless in OU because its stats are just so bad. Heck, Ninetales is barely viable in XY OU with Drought, let alone without it. It's just that bad.
I understand the idea behind this, but it does raise some big concerns about seeming arbitrary and elitist. To work around that, I'm curious if you've considered finding ways to filter out noncompetitive teams directly? It might be possible to compile a list of characteristics which we could reasonably say no competitive team would have, at least in a given tier: particularly certain Pokemon, moves, and items which, if present on a team, would disqualify the team entirely from counting for a subclass of usage stats which would be considered "competitive" and therefore used for tiering. This seems along the lines of your "candles" idea, but targeting the candles directly rather than blanket sweeping away 98% of the teams just because that group contains the candles.
...
Whether what we really want are usage tiers or effectiveness tiers, these could be ways to pick one of them and make it work without letting it be influenced by blatantly uninterested players either way, rather than going for a strange midpoint by deciding over and over again which group of players represents the "right" usage tiers.

It took me a while to realize this, but the purpose of the standard candles is to identify a subset of Pokemon Showdown users who use Pokemon, movesets, abilities, and/or items possessing unequivocally no competitive merit. The candles are not meant to filter anyone out, but to provide a sample of obviously uncompetitive players, which one would then use to determine what the rating of an uncompetitive player should be. The designated criteria of the standard candles are obviously theoretically useless in gameplay, and thus their usage should be correlated with a player's ineptitude (such as ignorance of game mechanics and/or lack of knowledge of the Pokemon's movepool) or lack of competitive intent (such as trolling). Moreover, even competent players who use the standard candles would be handicapped severely, and this would adversely affect their potential rating.
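The direct-filtering proposal in this exchange (drop any team that contains a listed disqualifying combination, instead of raising the rating cutoff) could look something like the sketch below. Everything here is illustrative: the team representation, the function names, and the contents of `CANDLES` are assumptions, with the two entries borrowed from examples mentioned earlier in the thread.

```python
# Hypothetical candle list: each entry names the fields that must all match
# for a team member to count as a candle.
CANDLES = [
    {"species": "Ditto", "ability": "Limber"},
    {"species": "Durant", "ability": "Truant"},
]


def matches(member, candle):
    """True if the team member has every field value the candle specifies."""
    return all(member.get(key) == value for key, value in candle.items())


def is_competitive(team, candles=CANDLES):
    """A team counts as competitive here if no member matches any candle."""
    return not any(matches(member, candle)
                   for member in team
                   for candle in candles)


def competitive_teams(teams, candles=CANDLES):
    """Keep only the teams that contain no candle."""
    return [team for team in teams if is_competitive(team, candles)]
```

Each team is assumed to be a list of dicts such as `{"species": "Ditto", "ability": "Imposter"}`. The usage stats would then be computed over `competitive_teams(all_teams)` only, which targets the candles themselves rather than every player below some rating.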