Wow. I expected the 1630 stats to be quite a bit more of an improvement than this. Call it confirmation bias, but the 1760 stats represent the OU metagame at a competitive level much better, while the 1630s are just disappointing. In making this proposal, my goal was for us to end up using usage stats that are good, not slightly less mediocre, and slightly less mediocre is exactly what the 1630 stats are.
Initially, one may look at the 1630 stats and think, well, most of the things that 'should' be OU are OU. But we still see mediocre Pokemon in OU: Trevenant, Donphan, Klefki, Tentacruel, Starmie, Galvantula... all Pokemon that are either plain bad or at least fairly mediocre in OU. The 1630 stats are not nearly as much of an improvement as I had hoped, because they do not represent Pokemon used in a competitive environment by good players. Take Thundurus, the most-used Pokemon in SPL. It is only just inside the top 30 in the 1630 stats, but in the top 15 in the 1760 stats. The 1630s do not place Manaphy and Deoxys-D in OU. Less obvious issues are Chansey ranking below Blissey, Zapdos's MUCH lower usage, and Infernape sitting so high.
At this point, it might seem like I'm trying to shape the OU tier according to my preferences, but the fact of the matter is that with Chansey used less than Blissey (which is clearly different in the 1760 stats), top-tier sweeper Manaphy and top hazard setter Deoxys-D in UU, and one of the five best Pokemon in the tier (Thundurus) barely in the top 30, the 1630 stats are just disappointing. They're an improvement, but not good. Is the goal to settle for a little bit better but still pretty mediocre, or to actually get something good/decent? I sincerely doubt anybody wants mediocre stats; thus, I see little reason not to favor 1760 stats.
One problem that has been brought up is sample size. 50,000 battles of what we can deem to be competitive play between two players that are at least pretty good seems like a more than large enough number to me.
So it seems to me that the only clear reason not to use 1760 stats would be concerns that a small sample size lets one truly dedicated player have a distressingly large effect. (By my math, the effective sample size for last month was 40,792 in the 1760 stats.) This means someone playing 500 battles (which seems to be about as much as is practical in one month) can be as much as 1.2% of the stats, which is troubling. (For reference, 1630 gives a sample of ~356,936.) A better number than the ~40,000 yielded by the 1760 stats would be ~150,000 games (this drops one dedicated person down to ~0.333%, basically removing the sampling concerns from the 1760 stats). If we assume this is our goal, and that we have a relatively normal distribution of people, I'm coming out with ~1690 giving the correct sample size... would it be possible to get these stats to compare (and check my math on how large of a sample this gives)?
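For anyone who wants to check the percentages quoted above, here is a minimal Python sketch. The function name `max_share` is made up for illustration, and it assumes (as the paragraph does) that one player's maximum share is simply their battle count divided by the effective sample size.

```python
def max_share(player_battles: int, effective_sample: int) -> float:
    """Largest fraction of the stats one player could account for.

    Assumption: share = battles played / effective sample size,
    which is the simple model used in the post above.
    """
    return player_battles / effective_sample


# Sample sizes quoted in the post; 500 battles is the assumed monthly maximum.
samples = {
    "1760 cutoff": 40_792,
    "1630 cutoff": 356_936,
    "150k target": 150_000,
}

for label, sample in samples.items():
    print(f"{label}: {max_share(500, sample):.3%}")
```

Under this simple model, 500 battles comes out to roughly 1.2% of the 1760 sample and roughly 0.333% of a 150,000-game sample, matching the figures in the post.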
There are also questions about what this would do for lower tiers with smaller usage, but 150k battles in OU was about what we got in February 2012, and this wasn't raised as a concern for lower tiers then (with the one exception being Metang, but that would be very difficult seeing as this forces you to keep a reasonable winning record), so I'm tempted to say this isn't too big of a concern.
One player with a high ranking who has played 500 whole battles (which no high-up current alt on the PS ladder has -- or at least had before the reset) making up 1.2% of the stats is not a bad thing IMO. This player is both skilled and extremely dedicated, and 1.2% is not exactly a large amount. The most I've seen is like 350 battles anyway, and this is one person. Very skilled, dedicated players making a real impact on the stats with a lot of battles is also a motivation for said players to ladder more, thus increasing the quality of players on the ladder. There isn't anything inherently wrong with these players making a noticeable impact on the stats that isn't even that big -- at least, as far as I can see.
Anyway, let me respond to the thread itself:
Weighting is a good idea. PS, by being so easy to get into, has massively increased the number of casual users, and specifically the number of users who simply aren't playing to win (by the Sirlin definition).
I think it's relatively uncontroversial to say that only teams built for the purpose of winning games should count towards usage stats. Teams of favorites (Karen Pokémon), monotype teams, and show-off teams (laddering OU with an LC team) should not.
This is, after all, why we have weighted stats.
However, our weighted stats aren't very good at correcting for this anymore. A new player has a weight of 0.5, and no player has a weight above 1.0. In other words, a team used by the best player ever has a weight of less than twice that of a team built by someone who has never played before. Our new Elo system should cut down on alt creation somewhat, but I'd venture that there are a lot more players playing PS for the first time than good players.
(Antar, feel free to answer that: What percentage of usernames on the ladder have never played games on more than one day ever? I'd guess at least 25% and probably closer to 50%.)
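To put a rough number on how little the current weights correct for this, here is a small Python sketch. It is only an illustration under loud assumptions: the function name `new_player_weighted_share` is made up, the fraction of new-player teams is a hypothetical input (not real ladder data), every new player gets the 0.5 weight, and every other player is generously given the maximum weight of 1.0.

```python
def new_player_weighted_share(
    fraction_new: float,
    new_weight: float = 0.5,   # weight of a brand-new player, per the post
    max_weight: float = 1.0,   # ceiling on any player's weight, per the post
) -> float:
    """Share of the total weighted usage that comes from new players.

    Assumption: everyone who is not new has the maximum weight, so this
    is a lower bound on the new players' share under the current scheme.
    """
    new_total = fraction_new * new_weight
    other_total = (1.0 - fraction_new) * max_weight
    return new_total / (new_total + other_total)


# Hypothetical fractions, taken from the 25%-50% guess above.
for fraction in (0.25, 0.50):
    share = new_player_weighted_share(fraction)
    print(f"{fraction:.0%} new-player teams -> {share:.1%} of weighted usage")
```

Even in this best case for the weighting, new players still carry a large slice of the weighted stats whenever they make up a large slice of the teams, because the weights can differ by a factor of two at most.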
With a playerbase that tilts casual, this simply isn't good enough anymore. You may call it elitism, LS, but a new player with a desire to play to win and an hour of tutoring can probably reach 1600-1700ish Glicko with no problem, so I don't think it's an unreasonable cutoff at all. It is not elitism to say that while anyone can play for whatever reason they want, only players interested in winning should affect the usage stats, and having a weighted cutoff above 1600ish is probably our best approximation of that.
I think it should take some amount of effort. There are a lot of players playing to win. I don't think anyone walks onto the ladder saying 'I want to lose'. That's not how it works. A cutoff of around 1600 says 'hey, you somewhat know what you are doing'. A cutoff of around 1760 says 'hey, you're pretty good'. 1760 is not too elite -- there is still a plenty large sample of players battling at this rating. If we go with 1630, we are not getting good players above the weighted cutoff -- we are getting players just above the average (mediocre) player.
Overall, using 1760 stats gives us stats that actually, at least partially, represent a high level of competitive play. The 1630s do not give us the same benefits.
e: oh and 3k :]