Apologies in advance if I'm somewhat hijacking this thread with my wall of text, but I find the 'rated statistics' discussion especially important.
I think we need to distinguish between usage and viability, since they do not necessarily go hand in hand. The OU tier is based strictly on overall usage, and it should remain that way. However, unweighted statistics cannot tell us everything we need to know about how healthy the standard metagame is. I really believe it is a mistake to treat a rise in the number of pokemon used on one in twenty teams as meaning that the metagame is by definition 'better'. We need to weight the statistics in order to determine that, as competitive games are balanced around the pinnacle of play, not the average of all players.
For instance, if Electivire is used by 0 players in the top hundred, can we reasonably say it contributes to diversity? Or, something less extreme: if those same hundred players use Porygon2 (which is UU now) more often than Electivire, what does that tell us?
I think we need to ask ourselves:
What questions do we want the statistics to answer?
Question 1: Which pokemon are Overused?
This is mostly about where we draw the line based on the statistics, and why.
So, where should the cutoff line be placed? How much should we take into account estimates of usage in the past metagames that OU was originally based on? Is T=20 a valid choice? If so, is it okay that the top pokemon in OU is used over 7 times more often than the bottom one? Should OU pokemon account for over 75% of all pokemon usage in standard, as they currently do? Even T=10 covers well over 50%. This is where someone much better versed in statistics than I am needs to take over. These aren't rhetorical questions; I'm genuinely unsure whether the numbers are acceptable.
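To make the coverage and spread questions concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a cutoff of T means a pokemon must appear on at least one in T teams. The counts and the 'Pokemon A'-style names are invented, and a real usage file would list far more pokemon, so coverage here is only the share of the uses recorded in this toy list.

```python
# Hypothetical tiering sketch. ASSUMPTION: a cutoff of T means a pokemon
# must appear on at least one in T teams. All counts below are invented.
team_counts = {
    "Pokemon A": 400, "Pokemon B": 300, "Pokemon C": 120, "Pokemon D": 60,
    "Pokemon E": 40, "Pokemon F": 30, "Pokemon G": 30, "Pokemon H": 20,
}
total_teams = 1000


def ou_cutoff(team_counts, total_teams, t):
    """Pokemon appearing on at least 1 in t teams, most used first."""
    threshold = total_teams / t
    ou = [mon for mon, n in team_counts.items() if n >= threshold]
    return sorted(ou, key=lambda mon: team_counts[mon], reverse=True)


def tier_summary(team_counts, total_teams, t):
    """Return (OU list, OU share of all recorded uses, top/bottom OU ratio)."""
    ou = ou_cutoff(team_counts, total_teams, t)
    if not ou:
        raise ValueError("no pokemon clears this cutoff")
    coverage = sum(team_counts[mon] for mon in ou) / sum(team_counts.values())
    spread = team_counts[ou[0]] / team_counts[ou[-1]]
    return ou, coverage, spread


ou, coverage, spread = tier_summary(team_counts, total_teams, 20)
print(ou, f"coverage={coverage:.2f}", f"spread={spread:.1f}")
```

On these toy counts T=20 admits four pokemon covering 88% of recorded uses, with the top one used about 6.7 times as often as the bottom one; with T=10 only three qualify, coverage drops to 82% and the spread to about 3.3. The real answers would of course come from the actual usage statistics.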
Question 2: How many pokemon are competitively viable?
Here, we have removed the distinction of overall use and can focus strictly on true diversity. This is where weighted statistics are necessary, or perhaps statistics from only the top 100 or so battlers. If we have a picture of what usage looks like on the leaderboard, we can get a good idea of which pokemon are actually viable in high-level play.
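Both options (weighting every team, or only looking at the top N) can be sketched in a few lines of Python. The team format (a rating paired with a roster) and the weight function are hypothetical; any real weighting scheme would need to be agreed on, so `max(0, rating - 1500)` below is only a placeholder. Usage is counted per team, so a pokemon listed twice on one team counts once.

```python
from collections import Counter

# Hypothetical input: one (rating, roster) pair per team. Data is invented.
teams = [
    (1800, ["Pokemon A", "Pokemon B"]),
    (1700, ["Pokemon A", "Pokemon C"]),
    (1400, ["Pokemon C", "Pokemon D"]),
    (1300, ["Pokemon C", "Pokemon D"]),
]


def top_n_usage(teams, n):
    """Fraction of the n highest-rated teams that include each pokemon."""
    top = sorted(teams, key=lambda team: team[0], reverse=True)[:n]
    counts = Counter(mon for _, roster in top for mon in set(roster))
    return {mon: c / len(top) for mon, c in counts.items()}


def weighted_usage(teams, weight):
    """Usage where each team counts weight(rating) instead of 1."""
    totals = Counter()
    total_weight = 0.0
    for rating, roster in teams:
        w = weight(rating)
        total_weight += w
        for mon in set(roster):
            totals[mon] += w
    if total_weight == 0:
        raise ValueError("weight function gave every team zero weight")
    return {mon: w / total_weight for mon, w in totals.items()}


print(top_n_usage(teams, 2))
# Placeholder weight: teams rated 1500 or below count for nothing.
print(weighted_usage(teams, lambda rating: max(0.0, rating - 1500)))
```

Unweighted, Pokemon C appears on three of the four teams, yet in both views it ranks below Pokemon A (0.5 vs 1.0 in the top two, 0.4 vs 1.0 weighted), which is the kind of gap between overall usage and high-level usage described above.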
For example: Porygon-Z is OU, but can I use it on a team and reach the top 10-20 of the leaderboard? It has serious trouble switching in on anything due to its typing and weak defensive stats, and its speed lets it down for sweeping purposes. It can run a scarf set, but then it becomes much easier to wall. It's incredibly powerful in some circumstances, but is it viable against the top teams? Finding out how often P-Z actually appears in top-level games can help us answer questions like these, in a way that nothing short of asking IPL and other top battlers to build a team around a given pokemon and try to reach the top could.
ST5 has a lot of potential to help in that respect. Will the best teams be utterly standard, or will they feature pokemon we thought weren't as viable?
The main concern with weighting is that the better players often react to the metagame faster, so it becomes tough to gauge exactly what it means when 75% of the top few hundred players are using both Scizor and Heatran (a made-up statistic, but not much of a stretch). Will those two pokemon be replaced on these players' teams in a month or two by threats that are less well covered at that time? Is this a predictable trend that is healthy for the metagame? If it turns out the top teams rarely contain pokemon outside a group of 20 for a period of 6 months, what would that say about how many pokemon are truly viable? We could theoretically have a 50-pokemon OU with only half of them usable once you reach leaderboard level.
Question 3: What is suspect?
This one is obviously more difficult to assess. However, I feel it shows exactly how weighted statistics can be paired with detailed ones to help determine suspects. Specifically, DJD's recent post on predictability opens up the use of detailed statistics to see just how unpredictable the most used pokemon are.
We've already seen how long it can take for us to realize a pokemon is suspect, and, as much as it seems like it shouldn't matter, overall usage is a real factor in how long something sticks around before officially becoming suspect. This happened for quite some time with Wobbuffet. Many people believed that it wasn't worth considering Wobb uber as long as his usage remained lower than many other pokemon's. Of course, that stance completely disregarded the many other factors at hand: unwillingness to use a taboo (and 'boring') pokemon; loyalty toward certain websites' tiering; and, most importantly, the disparity in use between good and bad players (not to mention correct use). If everyone had compared the proportion of top players using Wobb with the proportion of other players using it, how much sooner would it have been made suspect?
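The comparison suggested above is simple to compute once rated team data exists. This sketch assumes the same hypothetical (rating, roster) format as any rated dataset would need; 'Pokemon W' and every number are invented, not actual Wobbuffet figures.

```python
# Hypothetical rated teams: (rating, roster). All data is invented.
teams = [
    (1900, ["Pokemon W", "Pokemon A"]),
    (1850, ["Pokemon A", "Pokemon B"]),
    (1800, ["Pokemon W", "Pokemon B"]),
    (1500, ["Pokemon A", "Pokemon B"]),
    (1450, ["Pokemon W", "Pokemon C"]),
    (1400, ["Pokemon B", "Pokemon C"]),
    (1350, ["Pokemon A", "Pokemon C"]),
    (1300, ["Pokemon B", "Pokemon C"]),
]


def usage_disparity(teams, mon, top_n):
    """Return (share of the top_n highest-rated teams using mon,
    share of all remaining teams using mon)."""
    if not 0 < top_n < len(teams):
        raise ValueError("top_n must leave at least one team in each group")
    ranked = sorted(teams, key=lambda team: team[0], reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]

    def share(group):
        # True counts as 1 when summed, so this is "teams containing mon".
        return sum(mon in roster for _, roster in group) / len(group)

    return share(top), share(rest)


top_share, rest_share = usage_disparity(teams, "Pokemon W", 3)
print(f"top: {top_share:.2f}, rest: {rest_share:.2f}")  # top: 0.67, rest: 0.20
```

Here the pokemon sits on two of the three best teams but only one of the other five, a gap that its overall usage (3 of 8 teams) would hide.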
Obviously, Wobbuffet is a striking example of predictability not making a pokemon any less viable. Of course, it doesn't have much of a movepool. Garchomp is a better case for that, as detailed statistics would have let us see exactly how predictable his moveset was becoming over the months. It would then have been easy to see that Garchomp's viability was increasing even though he was becoming more predictable. That, combined with the fact that 'there was no end in sight', would have made Garchomp an easy suspect.
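DJD's actual predictability measure isn't reproduced here. As a stand-in, one common way to quantify how predictable a moveset is would be the Shannon entropy of the observed moveset distribution: lower entropy means more predictable. The move names and monthly samples below are invented.

```python
import math
from collections import Counter


def moveset_entropy(movesets):
    """Shannon entropy, in bits, of the observed moveset distribution.

    Move order is ignored (each moveset becomes a frozenset). 0.0 means every
    sample used the same moveset, i.e. perfectly predictable.
    """
    counts = Counter(frozenset(moves) for moves in movesets)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Invented samples for one pokemon in two different months.
month_1 = [
    ["Move A", "Move B", "Move C", "Move D"],
    ["Move A", "Move B", "Move C", "Move E"],
    ["Move A", "Move B", "Move E", "Move F"],
    ["Move A", "Move C", "Move D", "Move F"],
]
month_2 = [
    ["Move A", "Move B", "Move C", "Move D"],
    ["Move B", "Move A", "Move D", "Move C"],  # same set, different order
    ["Move A", "Move B", "Move C", "Move D"],
    ["Move A", "Move B", "Move C", "Move D"],
]

print(moveset_entropy(month_1))  # 2.0
print(moveset_entropy(month_2))  # 0.0
```

Entropy falling month over month while usage among top players holds steady or rises would be the Garchomp pattern described above.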
An interesting subset of this method is lead pokemon. We can see that Aerodactyl usage has been rising substantially, and also that he is without a doubt a shining exemplar of predictability (seriously). However, scarf Jirachi usage has jumped by a huge margin to counter Aerodactyl's rise (it doesn't hurt that it also stops the number 1 and 2 leads). Since Aerodactyl can't counter this other than by running a scarf itself, which defeats its entire purpose as a lead, we can consider it standard metagame behavior that his usage as a lead will likely stop increasing as we continue to see more Jirachi leading. In this way, the lead metagame is the microevolution to the overall metagame's macroevolution.
-----
I wanted to address the specific OU-BL-UU issues separately from weighted statistics, so there wouldn't be any confusion.
Porygon2 is a great example of this, because it is certainly reasonable that it will be considered balanced for play in UU, but at the same time it handles so many top-tier OU threats that it could also breach into OU based on usage (in December it was less than 500 behind Donphan, placing 54th). Not too long ago, this exact thing happened when Aerodactyl filled a niche in OU as a fast Taunt/Stealth Rock user.
Having a pokemon switch between OU and UU raises the question 'What is BL?', which is pretty damn important right now. Is the definition the same as Uber, or does it need more specific criteria? If BL pokemon are simply those which 'break' UU, then UU itself needs a clear definition. Does UU need to be distinct from OU? Does it need to be 'fun', or is competition the most important aspect?
If Aerodactyl usage drops off but holds steady slightly below the OU cutoff line, will it be tiered back into UU? This creates a potential situation in which Aero would be used quite a bit in both tiers, likely for much the same function (this is happening to some of the previously BL pokemon as we speak). I'm not sure this is a problem, but it certainly is 'weird', and it would alienate a lot of the traditional UU crowd. If closeness to OU is a concern, I could definitely see BL becoming a 'limbo' tier with upper and lower bounds along the lines of: all pokemon which are too powerful for UU, or which have usage between the T=20 and T=30 cutoffs. Previously, BL seems to have played that part without ever being defined as such.
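The 'limbo' definition above can be written as a small rule. This sketch keeps an assumed reading of T (a pokemon must appear on at least one in T teams), treats 'too powerful for UU' as a separate judgement flag rather than something the statistics decide, and ignores Ubers entirely; `assign_tier` is a hypothetical name.

```python
def assign_tier(team_share, breaks_uu, ou_t=20, bl_t=30):
    """Hypothetical OU/BL/UU rule.

    team_share: fraction of teams the pokemon appears on (0.0 to 1.0).
    breaks_uu:  judgement call that it is too powerful for UU; not derived
                from the usage numbers.
    ASSUMPTION: a cutoff of T means appearing on at least 1 in T teams.
    """
    if team_share >= 1 / ou_t:
        return "OU"  # clears the T=20 usage line
    if breaks_uu or team_share >= 1 / bl_t:
        return "BL"  # too strong for UU, or between the T=30 and T=20 lines
    return "UU"


print(assign_tier(0.06, False))  # OU
print(assign_tier(0.04, False))  # BL
print(assign_tier(0.02, True))   # BL
print(assign_tier(0.02, False))  # UU
```

Under this rule a pokemon like Aero that settles just below the OU line would land in BL rather than dropping straight back into UU.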