How Should We Be Counting for Usage Stats?

Antar

is a Battle Server Administrator, a Programmer, a Super Moderator, and a Community Contributor
Official Data Miner
#1
So right now, we calculate usage stats the following way:
  • count up the occurrences of each species on each player's team, weighting using our weighting function
  • empty slots get counted as well
  • duplicate mons (or multiple empty slots) get counted once per slot (so a team with one Magikarp and five empty slots will produce 5x as many empty-slot counts as Magikarp counts)
  • at the end, you divide by the total and multiply by 6
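To make the steps above concrete, here is a minimal sketch of the slot-based count. It is not the actual stats script: the `slot_usage` name, the `"empty"` label, and the idea that each team arrives with a precomputed weight are all assumptions for illustration.

```python
from collections import defaultdict

EMPTY = "empty"  # hypothetical label for an unfilled slot

def slot_usage(teams, team_size=6):
    """teams: list of (species_list, weight) pairs; weight comes from the
    weighting function, which is not modeled here."""
    counts = defaultdict(float)
    for species, weight in teams:
        # Pad short teams so empty slots are counted as well.
        slots = list(species) + [EMPTY] * (team_size - len(species))
        for s in slots:
            counts[s] += weight  # duplicates are counted once per slot
    total = sum(counts.values())
    # Divide by the total and multiply by the team size (6).
    return {s: team_size * c / total for s, c in counts.items()}

# One Magikarp and five empty slots.
usage = slot_usage([(["Magikarp"], 1.0)])
print(usage)
```

With that single team, the empty-slot figure comes out at five times the Magikarp figure, which is the multiple counting described in the list.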
I don't like this, and here's why: it's not how we assume usage stats behave with regard to tiering policy. Our tiering policy (the 3.41% cutoff) is based on the premise that if a Pokemon's usage is below 3.41%, then if you were to play 20 battles in a row, there'd be less than a 50% chance of encountering that Pokemon.
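For reference, the 3.41% figure is what you get from solving 1 - (1 - p)^20 = 0.5 for p. A quick sketch, assuming the 20 battles are independent and every opposing team carries the Pokemon with the same probability p (the function name is mine):

```python
def encounter_probability(p, battles=20):
    """Chance of seeing the Pokemon at least once in `battles` battles,
    assuming independent battles with per-team probability p."""
    return 1.0 - (1.0 - p) ** battles

# Solve 1 - (1 - p)**20 = 0.5 for p.
cutoff = 1.0 - 0.5 ** (1.0 / 20)

print(round(100 * cutoff, 2))         # cutoff as a percentage
print(encounter_probability(cutoff))  # about 0.5 by construction
```

Note that p here is "fraction of teams carrying the Pokemon", which is why the counting method matters.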

As an example, imagine you're playing a tier without species clause. 1 in 100 teams consists of six Magikarp; otherwise, no one uses Magikarp. Using the above system, 'Karp's total usage would be 6% (well over the OU threshold), even though the odds of encountering a Magikarp in 20 battles are significantly less than 1 in 2.

So what I propose is this:
  • count the number of teams on which each Pokemon appears, weighted by our weighting function
  • divide by the sum of the weights of all teams
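Here is a rough sketch of the proposed count, run on the six-Magikarp example from above. The `team_usage` name, the unit weights, and the `Filler` species are placeholders I made up; the real weights would come from the weighting function.

```python
from collections import defaultdict

def team_usage(teams):
    """teams: list of (species_list, weight) pairs.
    Each team counts at most once per species, however many copies it runs."""
    counts = defaultdict(float)
    total_weight = 0.0
    for species, weight in teams:
        total_weight += weight
        for s in set(species):  # set() collapses duplicate mons
            counts[s] += weight
    return {s: c / total_weight for s, c in counts.items()}

# 1 team in 100 runs six Magikarp; the other 99 run placeholder species.
filler = [f"Filler{i}" for i in range(6)]
teams = [(["Magikarp"] * 6, 1.0)] + [(filler, 1.0)] * 99

usage = team_usage(teams)
chance_in_20 = 1.0 - (1.0 - usage["Magikarp"]) ** 20

print(usage["Magikarp"])  # fraction of teams running Magikarp
print(chance_in_20)       # chance of meeting one in 20 independent battles
```

Under this count Magikarp sits at 1% rather than 6%, and the implied chance of running into one over 20 battles is a bit over 18%, which matches the intuition behind the cutoff.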
For tiers with species clause (read: all the usage-based tiers), I *think* the two methods are equivalent, so maybe this is a distinction without a difference. But I still think it's important to be clear about this.
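The reasoning behind the equivalence: with species clause, each team adds its weight once per species either way, and the slot total is always 6 times the sum of team weights, so the 6s cancel. A small numerical check of that, with made-up teams and weights (both function names are hypothetical, and empty slots are left out of the comparison since they still get counted once per slot under the current method):

```python
from collections import defaultdict

def slot_usage(teams, team_size=6):
    # Current method: count every slot, pad with empties, scale by team size.
    counts = defaultdict(float)
    for species, weight in teams:
        padded = list(species) + ["empty"] * (team_size - len(species))
        for s in padded:
            counts[s] += weight
    total = sum(counts.values())
    return {s: team_size * c / total for s, c in counts.items()}

def team_usage(teams):
    # Proposed method: count each team once per species, divide by total weight.
    counts = defaultdict(float)
    for species, weight in teams:
        for s in set(species):
            counts[s] += weight
    total_weight = sum(weight for _, weight in teams)
    return {s: c / total_weight for s, c in counts.items()}

# Made-up sample obeying species clause (no duplicate species on a team).
teams = [
    (["Snorlax", "Skarmory", "Blissey"], 0.5),
    (["Snorlax", "Tyranitar"], 2.0),
    (["Skarmory", "Tyranitar", "Gengar", "Starmie", "Blissey", "Zapdos"], 1.25),
]
slot = slot_usage(teams)
team = team_usage(teams)
agree = all(abs(slot[s] - team[s]) < 1e-12 for s in team)
print(agree)
```

This only checks one sample, so treat it as a sanity check rather than a proof.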
 

Antar

#2
I should also explain how this translates to other metrics:
  • Leads: how many teams lead off with X? For Doubles/Triples we do not double-count
  • Moveset stats: how many X run Y? Here, you DO count every member of a team. This means usage stats will no longer be directly computable from moveset stats, the way they are now
  • Teammate stats: how many teams that run X also run Y? No double-counting, but there's also a special case: how many teams with X run a second X?
What I also like about this schema is that it generalizes in some cool ways: how many teams with Snorlax run a Dark-type? How many teams with a Stealth Rocker user run a second Stealth Rocker?
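One way to picture the generalization: every one of these questions has the form "of the teams that satisfy condition A, what weighted share also satisfies condition B?". A minimal sketch under that reading, where `conditional_share`, `runs`, and the sample teams are all invented for illustration:

```python
from collections import Counter

def conditional_share(teams, given, then):
    """Weighted fraction of teams satisfying `given` that also satisfy `then`.
    Both predicates receive a Counter of the team's species."""
    given_weight = 0.0
    both_weight = 0.0
    for species, weight in teams:
        counts = Counter(species)
        if given(counts):
            given_weight += weight  # each team counts once, no double-counting
            if then(counts):
                both_weight += weight
    return both_weight / given_weight if given_weight else 0.0

def runs(name, at_least=1):
    """Predicate: the team carries at least `at_least` copies of `name`."""
    return lambda counts: counts[name] >= at_least

# Made-up sample teams with made-up weights.
teams = [
    (["Snorlax", "Tyranitar"], 1.0),
    (["Snorlax", "Skarmory"], 1.0),
    (["Snorlax", "Snorlax"], 2.0),
    (["Skarmory", "Tyranitar"], 1.0),
]

with_ttar = conditional_share(teams, runs("Snorlax"), runs("Tyranitar"))
second_lax = conditional_share(teams, runs("Snorlax"), runs("Snorlax", 2))
print(with_ttar, second_lax)
```

The "second X" special case is just the same question with a threshold of 2. Questions about types or moves (a Dark-type, a Stealth Rock user) would need predicates backed by species and moveset data, which this sketch does not include.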