August's stats, and the new tier cutoffs
As I'm sure most of you have seen, I've been working on compiling usage stats for August. However, there have been some technical issues.
The long and short of it is that the method R_D used to come up with the usage stats for May and June is NOT POSSIBLE any more, since the plugin that generated those stats is no longer present on the server. Instead, all we have to go on is the battle logs saved on the server. For anyone who's actually looked at a PO battle log, it's clear that they're less than ideal. Ignoring the technical headache of extracting ANYTHING from an html file designed to be easily converted into a warstory, these battle logs do not contain the full teams, only the pokemon that actually appear in the battle. Note that the current version of PO (1.0.30) *does* generate full team lists, but it only does so on client-side battle logs.
With all that said, I have, using the Smogon PO battle logs for August (plus July 31), generated two different versions of the usage stats. In the first version, I look at all battles that lasted at least six turns (which discards only forfeits and accidental disconnects) and count the pokemon that appear in each battle. Note that "percent" refers to the odds of a pokemon appearing in a given battle, divided by two (there are two teams in every battle, so this works out to usage per team). I'm posting only the OU stats here for the sake of brevity, but the full stats for all five "Standard" tiers are available here:
Original Method
Code:
Standard OU Rated Battles for August 2011
In addition, I've looked at what happens if we only count teams where all six pokemon appear in the battle. About 55% of all teams were represented.
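For anyone who wants to see the two counting schemes spelled out, here is a rough Python sketch. It is an illustration only, not the script that produced the stats: the `Battle` record, its field names, and the toy data are all made up, it assumes the logs have already been parsed into the pokemon each side revealed, and it assumes the six-turn filter is applied to both methods.

```python
from collections import Counter
from dataclasses import dataclass

MIN_TURNS = 6   # shorter battles are treated as forfeits/disconnects
TEAM_SIZE = 6

@dataclass
class Battle:
    """Hypothetical parsed log: turn count plus the pokemon each side revealed."""
    turns: int
    seen: tuple  # (player 1's revealed pokemon, player 2's revealed pokemon)

def original_method(battles):
    """Count every pokemon seen; percent = appearances / (2 * battles)."""
    kept = [b for b in battles if b.turns >= MIN_TURNS]
    if not kept:
        return {}
    counts = Counter(p for b in kept for side in b.seen for p in set(side))
    teams = 2 * len(kept)  # two teams per battle
    return {p: 100.0 * n / teams for p, n in counts.items()}

def full_team_method(battles):
    """Only count teams where all six pokemon appeared in the battle."""
    teams = [set(side) for b in battles if b.turns >= MIN_TURNS
             for side in b.seen if len(set(side)) == TEAM_SIZE]
    if not teams:
        return {}
    counts = Counter(p for team in teams for p in team)
    return {p: 100.0 * n / len(teams) for p, n in counts.items()}

# Toy data with placeholder names, not real logs.
toy_battles = [
    Battle(turns=20, seen=(("A", "B", "C", "D", "E", "F"), ("A", "G"))),
    Battle(turns=3, seen=(("A",), ("B",))),        # dropped: under six turns
    Battle(turns=10, seen=(("G", "H"), ("B",))),
]
original = original_method(toy_battles)
full_teams = full_team_method(toy_battles)
```

In the toy data only one of the four surviving teams revealed all six pokemon, so the full-team numbers rest on a much smaller sample than the original-method numbers.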
Full Team Method
Code:
Standard OU Rated Battles for August 2011
So the question now becomes, how do we move forward? Except for suspect bannings, UU has been unchanged since it was introduced back in May, and those active in the metagame are anxious for an update. However, we're left with the issue that August's stats are incompatible with May's and June's. The first method underrepresents sweepers while overrepresenting leads and "pivots," while the second method, compared to R_D's method, overrepresents switch-heavy teams and underrepresents, well, winning teams. R_D did not keep the battle logs for May and June (he had no reason to, and the files are HUGE), so performing my analysis on previous months' stats is not possible. So we're left with a choice. Do we:
This is the first question I'm posing to PR. Assuming the community decides on (a), historically, we have done a 20-3-1 weighting scheme (I still think this is valid, even though the server was down for July--let's just pretend that the metagame was "frozen in time" during that month). But the problem is that, with the "original method," the combined usage stats do not add up to 600%--if we just naively add the stats, August's usages across the board will be understated. The obvious fix would be to simply multiply these stats so they DO add up to 600% (the factor is 1.17). However, if we do this, leads and pivots will be overrepresented in the totals.
I present below the combined three-month usage percentages using the 20-3-1 weighting for all three of the above schemes: naively using the "original method" stats, taking MaestroXXVI's suggestion and multiplying August's stats by 1.17 before adding, and using only the data for teams where all six pokemon were seen. Ignore the "Usage" column--it was needed for compatibility with some of my scripts (UU stats for determining the UU-RU cutoff, and RU stats for determining a hypothetical RU-NU cutoff, will follow as soon as a decision is made on how to move forward).
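To make the arithmetic concrete, here is a small Python sketch of the rescaling and the 20-3-1 weighting. The function names, the `WEIGHTS` table, and the toy numbers are all made up for illustration. Dividing by the sum of the weights (24) is an assumption, mirroring the two-month RU formula (aug*20+june*3)/23 that appears later in the thread.

```python
# Most recent month weighted heaviest; month labels are illustrative.
WEIGHTS = {"aug": 20, "june": 3, "may": 1}

def normalize(stats, target=600.0):
    """Rescale one month's percentages so they add up to `target`."""
    factor = target / sum(stats.values())
    return {p: v * factor for p, v in stats.items()}, factor

def combine(monthly, weights=WEIGHTS):
    """Weighted average of monthly usage; a missing pokemon counts as 0%."""
    total_weight = sum(weights[m] for m in monthly)
    pokemon = set().union(*monthly.values())
    return {
        p: sum(weights[m] * monthly[m].get(p, 0.0) for m in monthly) / total_weight
        for p in pokemon
    }

# Toy data: ten placeholder pokemon, not real usage stats.
names = "ABCDEFGHIJ"
aug_raw = {p: 50.0 for p in names}
aug_raw["A"], aug_raw["J"] = 80.0, 20.0          # raw total is 500%, not 600%
june = {p: 60.0 for p in names}
may = {p: 60.0 for p in names}
may["A"], may["J"] = 36.0, 84.0

aug_scaled, factor = normalize(aug_raw)           # factor is 600 / 500
combined = combine({"aug": aug_scaled, "june": june, "may": may})
```

With the real August data the factor would be the 1.17 quoted above. Skipping the `normalize` step corresponds to the naive "original method" combination.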
Original method
All teams used, percent is calculated as odds of being seen in a battle, divided by two. Code:
+ ---- + --------------- + ------ + ------- +
MaestroXXVI's method
All teams used, percent is calculated as usage divided by total pokemon, multiplied by six. Code:
+ ---- + --------------- + ------ + ------- +
Full Teams Only
Only counts teams where all six of the pokemon on a team were shown. Code:
+ ---- + --------------- + ------ + ------- +
Here, of course, is the bottom line: OU list for...
At stake are:
So then, the second question I pose to PR is, if we DO decide to go ahead and try to make do with the May-June-August stats, which of the three methods do we use? Finally, I'd like to present one further aside, which is more relevant to how we move forward from here and ties into the first question I posed. For future cycles, we need, as a community, to discuss exactly what we want "OU" to mean. I think the old definition--that a pokemon is OU if there's greater than a 50% chance of it appearing in at least one of twenty battles--is a good one, but it leaves open the question, what about pokemon that don't appear in battle? Should they count? I can see this one of two ways:
Phew. |
I'm going to be the person who says we should arbitrarily combine the stats from August with those from May and June. My reasons are that 1. I just want the new tiers and 2. the results (from all three methods) really aren't that far off from what R_D would have collected with the plug-in. I think everyone can agree that all three methods pretty accurately describe what someone would see in the OU metagame, and whether one is more accurate than another is really just a matter of preference. The top 10 for all three are pretty much the same, and the only things really being affected are the bottom Pokes, which only matters for UU (and what drops isn't that big of a deal in itself, so we shouldn't worry about it. What happens happens).
I'm more in favor of the full teams method simply because it still represents more than half of the playerbase's battles. Another thing that we could theoretically do is just not even worry about May and June, and just use the August stats for the new tier usage placements. I know it kinda goes against our normal policy, but I believe we've had enough of a time gap to just start fresh, and since we probably won't have this kind of problem again because we've re-added the plug-in to the server, we will be able to go back to the old method of determining stats. That way we wouldn't have to somehow combine the weird methods of August with May and June, or with September, October, and November. Just my thoughts, I really don't care what we end up doing, I just want to get things done lol. |
Agreeing on "we should just settle for something as soon as possible", so essentially I agree with using the May, June & August stats. I don't have a thorough opinion regarding the second question though.
|
I don't know why you even made your removals the way you did because both of them remove too many battles. Is there no way to find out if a given battle ended with one player fainting all six of the opponent's Pokémon? I find that unlikely given how logs are written.
I also think that we should just stick with stat compatibility for now and use the full stats of August with the stats from May and June. (It's kind of funny because I was just thinking about the usage stats today. I don't have much of a useful conclusion, though.) |
Just moving the convo here.
Quote:
Quote:
Quote:
Whichever method wins, run with that, and combine those stats with those from May and June. I think that's the best course of action. |
Quote:
If we go with this proposal, we still need to decide what to do for future months, where stats can be collected consistently. Do we use the "Original method" with a cutoff of 2.91%? Do we do Maestro's every time that full-team stats aren't available (Innocent Criminal says his patch should be done by the end of the week)? At first glance, this is just a case of semantics, but the underlying issue is this: should unseen pokemon count towards usage? Maestro's correction is a way of accounting for the unseen pokemon, whereas the original method with a lower cutoff is saying that they DON'T count, but we want to keep the cutoff in roughly the same place for historical reasons. Does that make sense? Can you see why that might matter? |
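For reference, the cutoff arithmetic can be worked through in a few lines of Python. This is only a sketch under assumptions: it takes the old definition at face value (a pokemon is OU if there is a better than 50% chance of meeting it at least once in twenty battles), treats each battle as an independent draw with the same per-team usage p, and assumes the 2.91% figure is just that cutoff divided by the 1.17 factor. The function name is made up.

```python
def ou_cutoff(battles=20, chance=0.5):
    """Solve 1 - (1 - p)**battles = chance for the per-team usage p."""
    return 1.0 - (1.0 - chance) ** (1.0 / battles)

cutoff = ou_cutoff()          # roughly 0.0341, i.e. about 3.41% usage
rescaled = cutoff / 1.17      # roughly 0.0291, i.e. about 2.91% usage
```

Under those assumptions the two options describe the same line on the list: either scale the stats up by 1.17 and keep the cutoff near 3.41%, or leave the stats alone and lower the cutoff to about 2.91%.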
I guess I'd say that unseen Pokemon should count towards usage. I wouldn't say that for last gen, but with the advent of Team Preview, where you can plan to "hide" certain Pokemon for the whole battle because you know your opponent's team and vice versa, they should count towards usage.
Also I thought that we wouldn't consider using any other method besides the original (R_D's) since we now have the plug-in for our servers again? IIRC that method takes into account unseen Pokemon, and it gives a full look at what was really going on in every battle that took place, instead of only half guessing or missing minute pieces of detail. |
Quote:
We do? If so, that's news to me. All that I've heard is that Innocent Criminal is working on making an alternative and that PK Gaming is working with coyotte on getting the original one working. |
Quote:
EDIT: course he could just have it and hasn't implemented it yet.... |
It's my personal opinion that battle logs are the superior raw data set (assuming we can get them to contain a teeny tiny bit more information). With any kind of plugin, we can only analyze the data that the programmer of the plugin thought to include. With a full battle log, there's really no limit to what we can do.
For example, who would be interested in seeing a "sweeper score," that is, the average number of kills a pokemon gets per match it appears in? Similarly, what about a score of percent of times KOed in a battle? We could also get data for the most frequent switches into any given pokemon or most frequent replacements for a given pokemon (as in, you switch out x--or x faints--and you switch in y). A proper battle log will let you analyze anything you want, rather than just anything you thought of before the battle. They also provide a sort of "paper trail," in case anything needs to be re-counted or verified. The downside is, of course, the huge size of the datasets. But disk space is cheap these days, and I really don't think that should stop us from keeping and utilizing them. Of course, this is all a conversation for another day. |
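As a rough idea of how little code those extra stats would take once a log has been parsed, here is a Python sketch of the "sweeper score" and the percent-of-times-KOed figure. The input format (one tuple per pokemon per battle it appeared in) is invented for the example, as are the numbers. Nothing here reflects what PO logs actually contain.

```python
from collections import defaultdict

def per_pokemon_scores(appearances):
    """appearances: iterable of (pokemon, kills, was_koed) tuples, one per
    pokemon per battle it appeared in (a hypothetical parsed format)."""
    matches = defaultdict(int)
    kills = defaultdict(int)
    times_koed = defaultdict(int)
    for pokemon, n_kills, was_koed in appearances:
        matches[pokemon] += 1
        kills[pokemon] += n_kills
        times_koed[pokemon] += int(was_koed)
    return {
        p: {
            "sweeper_score": kills[p] / matches[p],        # avg kills per match
            "ko_rate": 100.0 * times_koed[p] / matches[p],  # % of matches KOed
        }
        for p in matches
    }

# Invented numbers, purely to show the shape of the output.
toy_appearances = [
    ("Mamoswine", 3, True),
    ("Mamoswine", 1, False),
    ("Chansey", 0, True),
]
scores = per_pokemon_scores(toy_appearances)
```

The switch-in and replacement stats would need the turn-by-turn order of events rather than per-battle totals, which is another reason to keep the full logs around.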
well I guess I would suggest to continue working with log-based stats, but until that time where you are able to gather all of that extra information, we should just use R_D's method. (August being the exception of course.)
I'm just thinking for time's sake. We've been waiting for these stats for a long time and I just want to get the tiers situated asap. |
I normally have no reason to chime in on PR topics due to my rather infrequent play time, but since I am primarily involved in c&c, this is pretty important to me. Yes, the technical limitations imposed by battle logs are unfortunate, but they do reflect literally what is used in battle - saccing and pivoting are just as important as sweeping/walling/etc, for the record. This is supported by the graphs shown above and the strong statistical similarities between the two methods.
Like Oglemi says, the tiers are better off being generated now. Until we have a better solution, logs are the way to go. |
For the purposes of tier generation, usage should mean what is selected for use on teams imo, rather than what is sent out, which is vulnerable to early forfeits. Usage is our best objective estimate of power and this is the reason it is used for tier lists.
Thought experiment: Imagine a situation where in one month players all played the games through to the end, and in another they all gave up as soon as a game was effectively unwinnable. The power of each Poke in the two months is the same, and the results given by the classic "this is what was selected for each team" method are, correctly, the same. However, the "this is what was sent out" method gives different results (even in wifi clause metagames, since some pokes will be led with more often), showing that it introduces a confounding factor into the equation. Even restricting it to battles with x turns or x pokes seen does not solve it entirely, and new biases are introduced. For example, battles with all pokes seen are probably less likely to have hyper-offense teams on both sides, since those games last less time and have far fewer switches than, say, a stall v stall game.
This is not to say we can't get some very, very interesting information from logs, just that if the aim is to use usage as a measure of power to base tiers on, team selection is more appropriate than what is sent out. Additionally, log-collected usage stats should easily be close enough to the classic ones to base tiers on until another source can be found. |
Quote:
UU-OU are problematic. |
Quote:
Quote:
Quote:
Quote:
Quote:
|
If the tiers were generated TODAY, using MaestroXXVI's method for UU and RU as well, here are the results:
Three-month stats for UU
Code:
+ ---- + --------------- + ------ + ------- +
Two-month stats for RU
Using two-month = (aug*20+june*3)/23 Code:
+ ---- + --------------- + ------ + ------- +
And these would be the tier lists: [Removed in favor of the tier list in the post below]
The rest would, of course, be NU. HOWEVER, before you get all excited about the birth of NU, consider the following: TANGROWTH IS NU because it was UU when RU was created, yet dropped *just* below the threshold into RU this time around--but since Tangrowth was never used in RU... I suspect there are many such anomalies [Edit: a few additional examples--Tornadus and Darmanitan]. This is the problem with updating all your tiers at the same time, while basing the tier cutoffs on usage in the tier above, rather than basing everything on OU usage (which, by the way, I think we still need to do--otherwise, Eviolite pokemon that are *almost* as good as their evos will fall to NU simply because there's no reason to use them in OU).
Edit: the solution is simple. A pokemon is only allowed to drop one tier between cycles. This would involve a bit of extra coding, but I'm confident I can get a new tier list to you guys by the end of the day tomorrow.
Edit 2: Crap. This is the WRONG METHOD (teams greater than 4 pokemon, rather than all teams). Still, as the above post shows, the results are pretty much identical, so I wouldn't expect too much to change. |
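The "only drop one tier between cycles" rule is simple enough to sketch in Python. This is a guess at the logic, not the code that was actually written: the `TIERS` ordering and the function name are assumptions, Uber and BL are left out for simplicity, and rises are assumed to be unrestricted.

```python
# Usage-based tiers from highest to lowest (assumed ordering).
TIERS = ["OU", "UU", "RU", "NU"]

def limit_drop(old_tier, new_tier, tiers=TIERS):
    """Cap any fall at one tier below where the pokemon was last cycle."""
    old_i = tiers.index(old_tier)
    new_i = tiers.index(new_tier)
    if new_i > old_i + 1:          # would fall two or more tiers
        return tiers[old_i + 1]
    return new_tier                # rises and single-tier drops pass through

# The Tangrowth case: UU last cycle, raw stats say NU, so it lands in RU.
tangrowth_tier = limit_drop("UU", "NU")
```

A pokemon capped this way would then have a full cycle of usage in its new tier before it could fall any further.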
So from what I gathered talking to people on IRC, we're working on the ability to gather stats the way R_D did, right? I think we should definitely just wait until that happens so that we have fully compatible stats rather than trying to fit puzzle pieces together by cutting the edges.
Now if this is how we'd have to gather stats permanently, obviously for the time being we'd have to fit them all together. However, this is a temporary issue and the only thing it matters for is creating a new OU (which of course only matters for what drops in/out of UU). But for people who want it just for that reason, I'd like to note that I intend to keep UU static anyway until some important tiering decisions are made. |
Quote:
Also, holding the suspect test (which I assume is what you mean by tiering decisions) before the update will only give us less accurate bans. The things on the chopping block will likely be Chansey and Hail (either an outright ban, or a nerf strong enough to send it back into NU obscurity). But things like Mamoswine going OU and Whimsicott falling to UU have a direct effect on those votes. Maybe without Mamoswine, Froslass is the only broken part of Hail? Maybe with Whimsicott and Mienshao in UU, Chansey becomes enough of a liability that she isn't broken after all? We can't know the answers until we try it, but if we ban first and update later we won't get a chance to. I know you're not any happier about the long delay than the rest of us, but waiting until after testing to update the tiers seems to me like it's just making a bad situation worse. Updating UU ASAP will give us more accurate bans and a more enjoyable metagame. Please reconsider it. |
Quote:
First, no matter what I use to crunch the stats--only counting full teams, only counting teams of four or more, or counting all teams--the tiering that I get is the same (with the exception of Hippowdon), as long as I normalize the stats. This invariance supports the validity of the various methods. I'd be willing to wager that if we were able to run the stats using R_D's method, we would get the exact same result (with the exception of maybe one or two pokemon).
Second, even if the plugin went into effect TODAY, you wouldn't get a full month's stats until NOVEMBER. At this point, UU is four-and-a-half months old. By November, it'll be six. |
Sorry about that, folks. THIS is the new tier list, assuming we use my stats and Maestro's method:
Uber Code:
Mewtwo
Code:
Venusaur
Code:
Wobbuffet
Code:
Blastoise
Code:
Charizard
Three-month stats for UU
Code:
+ ---- + --------------- + ------ + ------- +
Two-month stats for RU
Code:
+ ---- + --------------- + ------ + ------- + |