Proposal Weigh Usage Stats for Multi-Month Periods by Games Played in Each Month

KineSquared

Ubers UU Founder
This is my first PR proposal, so sorry if there's an issue! Just let me know.
The Problem
I have been calculating the rises and drops between Ubers and Ubers UU myself, as well as the usage stats for Ubers UU, since the switch from 1-month to 3-month tiering in generation 9 (starting in Q2 of 2024). However, some people have noted that my numbers don't always line up with what they'd expect. They took the simple average of a Pokémon's monthly usage across the three months and called that the statistic. For example, let's look at Kyurem-White in Q3 of 2024:
(2.16% in July + 4.73% in August + 11.72% in September) / 3 = 6.20%

So if there were ever a usage-based tier below us (if "Ubers RU" became official, for example), Kyurem-White would be at no risk of dropping down; it's a solid Ubers UU 'mon. But is that a truly objective description of how much Kyurem-White was used?

In July, we had 3186 games played. In August, we had 4449 games played, a 40% increase! In September, the number plummeted to 2861! That's also when Kyurem-White had its highest usage in terms of percent. One way to handle this is to weigh each month by the number of games played.

If we redo the Kyurem-White numbers this way, they come out to 5.80%. A 0.4-point drop is pretty significant when you're only one or two percentage points away from the cutoff. In reality, people used Kyurem-White less than 6% of the time, but if you lump all 2861 September games into "one block" and give that block the same weight as one made up of over one and a half times as many games, there are bound to be some discrepancies.
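As a rough check on the arithmetic above, here is a minimal Python sketch comparing the simple average with a games-weighted average. It assumes the only inputs are the rounded monthly percentages and game counts quoted in this post; the function names are made up for illustration and this is not how the official stats are computed. With these rounded figures the weighted result prints as about 5.86%, a little above the 5.80% stated, presumably because the original calculation used data not shown here.

```python
def simple_average(usages):
    """Average the monthly usage percentages, one equal vote per month."""
    return sum(usages) / len(usages)


def weighted_average(usages, games):
    """Average the monthly usage percentages, weighing each month by games played."""
    total_games = sum(games)
    return sum(u * g for u, g in zip(usages, games)) / total_games


# Kyurem-White, July to September 2024, as quoted in the post (rounded values).
usages = [2.16, 4.73, 11.72]   # usage in percent
games = [3186, 4449, 2861]     # games played in each month

simple = simple_average(usages)
weighted = weighted_average(usages, games)

print(f"simple average:    {simple:.2f}%")    # 6.20%
print(f"weighted by games: {weighted:.2f}%")  # 5.86%
```

Either way, the weighted figure lands below 6%, which is the point being made.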

This conversation began here between me and Marty, and I think it would be an easy-to-implement measure for all tiering that would prevent future problems with little to no downside, especially if we continue switching between 1-month and 3-month tiering each generation (which I fully support and don't want to discuss in this thread). I would just like to re-open the conversation and ask if weighing each month's usage by games played is as simple, obvious, and beneficial as I see it.

The Benefits
Preventing Disaster
Put simply, what if there's some big problem on Pokémon Showdown, and one or more metagames are unplayable for a significant part of one month? In the extreme case, should one game in month one be treated as equal to a hundred thousand games the next? That feels ridiculous, and being prepared for this sort of fiasco would be excellent.

Makes 3 month tiering more similar to 1 month
In the 1-month system, it does not matter whether you use your team at the beginning or end of the month. The 1st and the 31st are weighted equally. Imagine if instead the people who played at the end of the month got their usage counted more than the people at the beginning. That seems wacky, but it's the system we currently have in place. We've accidentally decided that 1 month is the unit to chunk these stats by, just because we wanted to extend the total stat collection time. This proposal fixes the problem.

Helps represent reality
Let's say some big event, like a suspect test, viral video, or big competition, begins and one month sees a ton of usage compared to the other two. Shouldn't that be represented? We almost have an "electoral college"-like system (it's a US politics reference) going on where the highest month (most populated state) has each individual team (voter) count the least. Every team should have equal representation. This is especially important for non-OU/randbats tiers, where usage and total games played are much more variable.

It's simple
This would be a tiny backend change that almost no one would notice. I'd be truly shocked if implementation was a limiting factor. It also won't matter for 99% of cases, and very consistent metagames like current gen OU probably would tier exactly the same.

Anticipating Counter-Arguments
As Marty first suggested,
Each month has fewer games than the one prior, that's just how every format pans out as interest goes down throughout the generation. Having the first month of the three be worth the most when the meta has had two more months to develop since then just seems bad on paper.
That will not always be the case, especially in lower tiers. Individual tiers may have events such as suspects, tournaments, or memes at any point in the timeline that can inflate one month over another. I'm also not thinking of it as "the first month of the three be worth the most" but rather "why should one game in a low-gameplay month be worth more to the overall usage stats than one game in a high-gameplay month?" Inherently, if you don't factor in the amount of games, the impact of each game is greater in the lower months. That doesn't feel to me like "usage."

This is only a problem in lower tiers like yours that have low gameplay. Just get more games and you'll fix this problem.
Believe me, I am trying that as well for Ubers UU, but the fact is that ladders with fewer games are more susceptible to these trends, and it would help people like us the most. I also believe it wouldn't really affect higher tiers, so why not?
 
Sorry to bump this without a real contribution / conclusion, but I just wanted to point out that statistically there's not much reason to do something like this. A statistical snapshot of the metagame only needs a certain minimum sample size to get a confident projection for usage overall. With the standard being, I think, based on a 50% chance to be seen in 15 games, it should be pretty easy to get a minimum theoretical sample size such that the estimation is statistically valid. I would guess it is not very high. (For example, a sample size of 1658 teams or 829 games gives a 95% confidence level that a Pokémon will hit the 4.52% threshold, with a 1% margin of error. Someone check my math pls.)
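For anyone who wants to check that math, here is a small Python sketch. It assumes the figure comes from the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / e², with z = 1.96 for 95% confidence; that is a guess at the method, not something stated in the post. It also computes how many games a 50% encounter chance implies for a 4.52% cutoff. The function names are invented for this example.

```python
import math


def required_sample_size(p, margin, z=1.96):
    """Teams needed to estimate a proportion p within +/- margin (normal approximation)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def games_for_half_chance(p):
    """Number of games after which a Pokemon with per-game chance p has been seen with 50% probability."""
    return math.log(0.5) / math.log(1 - p)


threshold = 0.0452                            # the 4.52% cutoff mentioned in the thread
n_teams = required_sample_size(threshold, margin=0.01)
n_games = n_teams // 2                        # two teams per game

print(n_teams)                                # 1658
print(n_games)                                # 829
print(round(games_for_half_chance(threshold), 1))
```

Under these assumptions the 1658 teams / 829 games figure checks out, and the last line comes out very close to 15 games for the 4.52% cutoff.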

With that in mind, one month being 10x the required sample size vs another being 100x does not make the second one worth ten times as much. In fact they are probably very close to the same accuracy in terms of meta snapshots due to the diminishing returns of a bigger sample size.
 
Does this argument assume the meta is stagnant within the 3 month window? If a 'mon is barely used in the first two months while the ladder is sparsely played, but then the 'mon gets tons of usage as some big event pops up which also increases the ladder's population, shouldn't that be included? This feels like a question of future-proofing.

I agree that if we took 3 random samples of games from the entire 3 month period, of varying sample sizes, as our "units" the way a month of usage currently acts, there should be no discernible difference between the weighted and raw averages. But we're not taking random samples; they're grouped by time, and the usage shifts in that time. Ubers UU has very different usage stats month-to-month within both the top 10-30 range and for the 'mons that make the 4.52% cutoff.
 
Does this argument assume the meta is stagnant within the 3 month window?
The opposite, actually: the method values each month as a completely separate entity, so the meta shifts are included in that each month is allowed to be totally different. The only change in combining three months into one result is that, I believe, the most recent month is weighted more, so as to more easily reflect recent changes in the meta vs older trends.

The only effect of weighting by number of games would be if the ladder is super active in month 1 and then not in months 2 and 3, but also somehow months 2 and 3 have a radically different meta that somehow we wouldn't want reflected in the usage stats? Which does not seem ideal.
 
sorry, but I'm confused. why is the argument that we should weight by games played in each month? that's the totally wrong framing.

using usage from each month as is is the unweighted usage and simply adding all 3 months together to get 3-months worth of usage is what I would have assumed we were doing when we made the change. If that's not what we've been using ever since we changed to "even" weighting, then we've just basically been imposing an arbitrary weighting of the months. when we compile 3 months of stats together, we're effectively raising the importance of relative usage from months where there are fewer games and lowering it for months where there are more games. That doesn't seem like what people intended when we agreed to not weight months.

Now, maybe there's a technical reason we can't easily add 3 months together at a rawer level and recompute usage. If so, doing some multiplication to weigh the three together by % of games played as proposed by the OP may be an "easier" way to back into an unbiased usage for the 3 months.

secondly, any argument about sample sizes is entirely irrelevant here. we don't calculate usage stats from a random sample of games, but rather from the whole population.
 
using usage from each month as is is the unweighted usage and simply adding all 3 months together to get 3-months worth of usage is what I would have assumed we were doing...
I agree that this framing makes sense and might be a better/less confusing way to market it. The reason I worded it the way I did is that when I calculated the stats for the tier, it was far simpler for me to just weigh each month appropriately than to re-stuff all the stats into one big pool. Therefore I've been thinking about it in terms of weighing, but you're right, they're equivalent. But yeah, neither of those options is currently being done.
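To illustrate why the two framings are equivalent, here is a short Python sketch using made-up appearance and game counts (none of these numbers come from real usage stats, and it treats usage as appearances divided by games for simplicity). Pooling the raw counts over three months gives the same result as averaging the monthly percentages weighed by games played, while the plain average of the three percentages differs.

```python
import math

# Hypothetical counts for one Pokemon over three months (illustrative only).
appearances = [120, 300, 90]
games = [2000, 5000, 1000]

# Monthly usage as a fraction of that month's games.
monthly_usage = [a / g for a, g in zip(appearances, games)]

# Framing 1: pool everything and recompute usage once.
pooled = sum(appearances) / sum(games)

# Framing 2: weigh each month's usage by its share of games played.
weighted = sum(u * g for u, g in zip(monthly_usage, games)) / sum(games)

# Current approach as described in the thread: each month counts equally.
unweighted = sum(monthly_usage) / len(monthly_usage)

print(f"pooled:     {pooled:.4f}")
print(f"weighted:   {weighted:.4f}")
print(f"unweighted: {unweighted:.4f}")
print(math.isclose(pooled, weighted))  # True
```

The identity holds for any counts, since each month's usage times its games is just that month's appearances.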
 
Now, maybe there's a technical reason we can't easily add 3 months together at a rawer level and recompute usage. If so, doing some multiplication to weigh the three together by % of games played as proposed by the OP may be an "easier" way to back into an unbiased usage for the 3 months.

secondly, any argument about sample sizes is entirely irrelevant here. we don't calculate usage stats from a random sample of games, but rather from the whole population.
Sample size isn't quite the right term, I agree, but I think it's correct to say that each month's stats are equally valid regardless of the number of games. Just because the player count is lower in the third month doesn't mean the third month's stats are a less important view of the metagame.

Let's think of an example:
In month 1 Tyranitar has 7% usage in OU with 1000 games
In month 2 Tyranitar has 2% usage in OU with 500 games
In month 3 Tyranitar has 2% usage in OU with 400 games.

With the current model, our favourite sand streamer would end up with 3.7% usage over the 3 months and fall out of OU to UU.
With weighted stats, however, Tyranitar would end up with 4.6% usage over the 3 months and stay OU. This is despite the fact that not only was he not OU usage in the last two months, he wasn't even particularly close.
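The two Tyranitar figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the example; the 4.52% cutoff is taken from elsewhere in the thread and is assumed to be the OU threshold meant here.

```python
CUTOFF = 4.52  # percent; assumed OU cutoff, as mentioned earlier in the thread

usages = [7.0, 2.0, 2.0]    # Tyranitar's monthly usage in percent
games = [1000, 500, 400]    # games played in each month

# Current model: every month counts equally.
equal_months = sum(usages) / len(usages)

# Weighted model: every month counts in proportion to its games.
by_games = sum(u * g for u, g in zip(usages, games)) / sum(games)

print(f"equal months:      {equal_months:.1f}%  stays OU: {equal_months >= CUTOFF}")
print(f"weighted by games: {by_games:.1f}%  stays OU: {by_games >= CUTOFF}")
```

This prints 3.7% (drops) for the current model and 4.6% (stays) for the weighted one, matching the numbers in the example.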

Sure, the metagame might have been more active in the first month, but how does that make it worth more in terms of the view of the metagame as a whole? A single game's worth isn't the point; our threshold is based on % of teams. Your single game isn't what determines usage, it's the percent of games (and teams) where a Pokémon showed up.

This becomes even more obvious in terms of bans and other metagame changes. If Excadrill gets banned from OU at the end of month 1, it's clear Tyranitar is going to take a dip in usage. But if the ladder experiences a slump after that month (say because it's summer and people are going outside more), then weighted stats would preserve the metagame state from the first month instead of giving a more accurate picture.

The previous system of weighting the most recent month much more highly (which I didn't realize had been discontinued until now) was meant to ensure that the most recent state of the metagame is reflected in the stats. That's been removed now, but I don't see any reason to make it even less reflective of the most recent month.

As for abuse, weighting stats would make it actually way easier to "stack the deck" and get a mon boosted into a higher tier, as you would only really need to do the work in one of the three months as long as you could spam enough games to get that month weighted higher than the other two.
 
Let's think of an example:
In month 1 Tyranitar has 7% usage in OU with 1000 games
In month 2 Tyranitar has 2% usage in OU with 500 games
In month 3 Tyranitar has 2% usage in OU with 400 games.

With the current model, our favourite sand streamer would end up with 3.7% usage over the 3 months and fall out of OU to UU.
With weighted stats, however, Tyranitar would end up with 4.6% usage over the 3 months and stay OU. This is despite the fact that not only was he not OU usage in the last two months, he wasn't even particularly close.
ok and the obvious counterexample is just reverse the order of the months? now tar is "clearly" OU in the last month, shouldn't we care about that? No. The point of not weighting by month is that the order of the months shouldn't matter. Subsumed in that decision that took place many years ago to remove the weighting 20*3*1 by month, I thought, was that N of games in a month also shouldn't matter, we'd just be calculating usage over 3 months. I mean, the current regime of 1*1*1 literally already says we care about january and march games 10% less than february, since they have 31 days and february has 28. every game in february is worth 1.1 games in january or march.

As for abuse, weighting stats would make it actually way easier to "stack the deck" and get a mon boosted into a higher tier, as you would only really need to do the work in one of the three months as long as you could spam enough games to get that month weighted higher than the other two.
No. You really don't seem to understand this issue. Multiplying by N of games in a month is essentially serving to "deweight" back to truly raw usage over a 3 month period, effectively the same way as we calculate 1 month usage, just over a 3 month period. The current regime is what encourages stacking the deck. If you want your games to have greater impact, ladder in february, not january or march. there's fewer days, hence fewer games. Currently, playing more games in any given month makes your games in that month matter less not more, as compared to playing those same games in a month that had fewer games.
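The February point can be made concrete with a short Python sketch. It assumes, purely for illustration, that the ladder sees the same number of games every day (the `daily_games` value is made up), so that a month's game count depends only on its length. Under equal month weights, each month gets one third of the total, and that third is split among however many games the month had.

```python
def per_game_weight(games_in_month, n_months=3):
    """Share of the final stat carried by one game when every month counts equally."""
    return 1 / (n_months * games_in_month)


daily_games = 100  # hypothetical constant activity level
days = {"January": 31, "February": 28, "March": 31}

weights = {month: per_game_weight(d * daily_games) for month, d in days.items()}

# How much more one February game counts than one January game.
ratio = weights["February"] / weights["January"]
print(round(ratio, 3))  # 1.107
```

The ratio is just 31/28, so the result does not depend on the assumed activity level, only on February having fewer games than the other two months.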
 