Graphical Representations of Centralization

eric the espeon

maybe I just misunderstood
Forum Moderator Alumnus, Researcher Alumnus, Top CAP Contributor Alumnus, Tiering Contributor Alumnus, Top Contributor Alumnus
When X-Act was working to find a mathematical method to measure centralization as a single number (here, here, here and here) he made several graphs which I thought were extremely interesting. The graphs contained more information than any single number, and could be understood in a general intuitive sense quite easily. Sadly whichever host he used no longer has them up, but the concept lives on and I decided to try my hand at putting together a few of them myself. When my data sheet is a little more organized I'll add it as an attachment to this post so anyone who wants can play around with it.

This is a simple line graph of % usage against Pokemon rank for the three most popular tiers. You can quite clearly see that usage in Ubers is much more tightly bunched around the high ranking Pokemon, that OU is somewhat middle ground (note Scizor sticking out there like a sore thumb!), and UU is spread out much more evenly compared to the other two.

% Usage from this graph is the same as given in Doug's stats: the % chance you will find the Pokemon on a random team.


Next I decided to make a cumulative usage graph, both to stop the "bumpiness" from making the centralization less clear and to show the information in a different form. After trying a few versions I settled on 100 - (cumulative usage / 6) as the version which displays the information best. The division by six changes the usage from the % chance of finding a Pokemon on a random team to its % usage out of all Pokemon used, since every team has six slots.
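For anyone who wants to rebuild the cumulative curve from the raw numbers, here is a minimal Python sketch. It reads the formula as 100 - (cumulative usage / 6), which is an assumption about the order of operations, and the usage figures are made-up placeholders rather than real stats. The names `remaining_usage` and `team_size` are invented for illustration.

```python
from itertools import accumulate


def remaining_usage(usage_pct, team_size=6):
    """Return 100 - (cumulative usage / team_size) at each rank.

    usage_pct: per-Pokemon usage as "% of teams", sorted from rank 1 downwards.
    """
    # Dividing by the team size turns "% of teams" into "% of all Pokemon used".
    shares = [u / team_size for u in usage_pct]
    # Running total of those shares, subtracted from 100 at every rank.
    return [100 - running_total for running_total in accumulate(shares)]


# Placeholder usage percentages (hypothetical, not taken from the real stats).
usage = [30.0, 24.0, 18.0, 12.0]
print(remaining_usage(usage))  # -> [95.0, 91.0, 88.0, 86.0]
```

A curve that drops quickly over the first few ranks means a handful of Pokemon account for most of the usage, so the steeper the drop, the more centralized the tier.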

If anyone has suggestions for more of these I'd like to hear them. I intend to put together versions to compare the last few months' centralization, and probably some comparing centralization over the last few years, taking a sample every 4 months (original stats from the Official Server can be used for this), if there is interest.

So, any surprises? UU seems significantly less centralized now than just after the BL merge judging from what X-Act was saying. Has it just settled naturally into a more diverse state or was the previous centralization due to broken Pokemon which are now banned? Do you think that the centralization of all metagames (particularly OU) will have generally decreased or increased over the 2 and a bit years we have statistics for? And what will have caused spikes and dips in centralization?

Big thanks to Doug for the stats and X-Act for the original graphs.


Edit: If you're going to download one of the Jan stats zips use the second (Jan '10 stats2.zip), the other is only for historical interest/keeping track of how many people have got it, yet more people keep downloading it while the other is almost untouched.

Also, BSR BST Usage is probably more interesting for anyone who does not want to mess around and make their own graphs (Jan stats is Doug's stats in Excel form, with a few extra columns of data quickly generated from them), it has a big table of numbers about predicted BST/BSR based on usage rank and a graph included.

Edit2: If you just want the stats for easy copy-pasting rather than messing about with them in Excel, this is probably easier.
 


Going by the number of OUs and the usage of the top 10, OU is certainly far more centralized than, say, a year ago. How many things have managed to get into OU? Only one that I can think of (Roserade), versus several leaving: Alakazam, Rhyperior, Cress, Porygon-Z.
 

Erazor

✓ Just Doug It
is a Smogon Media Contributor Alumnus
This is really interesting ete! I love the way Scizor's part of the curve just sticks up :)
 
It's funny, because Scizor has the lowest overall stats of all the Pokemon in the OU top 10 in December.
It goes to show how far the right stat distribution, typing, ability and movepool can take a Pokemon.
It even made the Ubers top 10 for the same month at number 9, being on a healthy 26% of teams.
 
I'm curious what the results would be if one tried to fit a functional form to those distributions. They look like they go as e^(-ax).
 

eric the espeon

Made a new data sheet from the Jan stats with all of the Pokemon that have 0.01% total usage or more rather than just the top 100 (was forced to cut the rest off since it was just over Smogon's max file size) and attached to the OP. No graphs since they would push it over the max filesize, but you can add those pretty easily and it would be nice to see some variety.

Antar: That's exactly what X-Act thought; they do indeed follow an exponential distribution very, very closely (though not exactly). The problem is approximating the value of the exponent accurately. He settled on checking the value at only a single point and using that to compare centralization in different metagames. This works so long as each graph sticks very closely to the exponential, but it does not take account of the slight bumpiness, which could lead to a slightly skewed measure of centralization.
 
Edit: alright thanks cantab, the file is working now for me, it probably just was my computer trying anything to get me to download a virus
 
Yeah, just did a quick analysis myself:



The integral of e^(-x/t) (from zero to an arbitrary value) is t(1-exp(-x/t)). Thus, a good way of checking the validity of the exponential fitting would be to compare the decay constants for the fits of the usage and cumulative usage. If the t's are close, then the exponential fit is probably pretty good.
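As a rough illustration of how a decay constant t can be pulled out of a usage curve, here is a minimal Python sketch. It uses a straight-line least-squares fit to ln(usage) against rank, which is a swapped-in technique and not necessarily the one behind the numbers in this post. The data is synthetic (generated from a known t), and `fit_decay_constant` is a hypothetical helper name.

```python
import math


def fit_decay_constant(values):
    """Estimate t in value ~ A * exp(-rank / t), with ranks 1, 2, 3, ...

    Fits a least-squares line to (rank, ln(value)); the slope equals -1/t.
    All values must be positive for the logarithm to exist.
    """
    n = len(values)
    ranks = range(1, n + 1)
    logs = [math.log(v) for v in values]
    mean_x = sum(ranks) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, logs))
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    slope = sxy / sxx
    return -1.0 / slope


# Synthetic curve with a known decay constant (made-up numbers, not real stats).
t_true = 20.0
usage = [30.0 * math.exp(-rank / t_true) for rank in range(1, 51)]
print(round(fit_decay_constant(usage), 3))
```

Since the integral t(1-exp(-x/t)) approaches its limit t with a gap of t*exp(-x/t), the gap between a cumulative curve and its limiting value could in principle be fed through the same function to get the second decay constant for comparison.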

So we see:

  • for Standard, t=35.3+/-0.3 for usage and 22.0+/-0.2 for cumulative (60% off). Not a very good match.
  • for Uber, the t's are 14.3+/-0.2 and 12.6+/-0.1 (14% off). Much better agreement.
  • for UU, 40.0+/-0.16 and 31.3+/-0.3 (28% off).

So for the Uber tier, the exponential decay is actually a pretty good model.
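The "% off" figures can be roughly checked from the rounded decay constants quoted above. This sketch assumes the difference is measured relative to the cumulative fit, which is a guess that happens to land close to the quoted numbers; `percent_off` is a hypothetical helper name.

```python
def percent_off(t_usage, t_cumulative):
    """Relative difference between the two decay constants, in percent.

    Assumption: the difference is taken relative to the cumulative-fit value.
    """
    return 100.0 * abs(t_usage - t_cumulative) / t_cumulative


# Rounded decay constants as quoted in the post: (usage fit, cumulative fit).
fits = {
    "Standard": (35.3, 22.0),
    "Uber": (14.3, 12.6),
    "UU": (40.0, 31.3),
}

for tier, (t_usage, t_cumulative) in fits.items():
    print(f"{tier}: {percent_off(t_usage, t_cumulative):.1f}% off")
```

With the rounded inputs this gives about 60%, 13.5% and 28%. The small gap from the quoted 14% for Ubers may simply come from rounding in the posted decay constants.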

Edit: I'd actually be really interested in looking at time-series plots of this sort and trying to develop some kind of theory of Pokemon usage.
 
ClamAV and BitDefender both agree the file is clean.

Most likely one of the following is the case:

* You already have malware, which is affecting your download.
* Your virus scanner is using a heuristic that reports a false positive.
* You have rogue security software.
 
* You have rogue security software.
This one's my guess. In the past few years, I've been seeing a series of increasingly convincing "You have a virus. Click here to fix it" pop-ups (that are, in fact, viruses). They disguise themselves quite cleverly to look like notifications by Symantec / McAfee. The people who develop those things should be drawn and quartered.
 
It's funny, because Scizor has the lowest overall stats of all the Pokemon in the OU top 10 in December.
It goes to show how far the right stat distribution, typing, ability and movepool can take a Pokemon.
It even made the Ubers top 10 for the same month at number 9, being on a healthy 26% of teams.
as opposed to ~29% in 7th place in Jan 2010.
 

eric the espeon

Very nice graphs Antar, what are you using to make them exactly? Surprising how far off the exponential distributions for some tiers look. This probably makes a measure of centralization from a single point somewhat unreliable. Wonder if there is a better model that's not prohibitively complex.

And it would be interesting to see about some kind of theory of centralization, to see how the major shakeups like new games with tutors or formes and various bannings affect centralization.

Maybe some comparisons for NU and LC at some point, to see what Misdreavus's 70+% usage looked like on a graph. My guess from the data so far would be that the further down the tiers you go (LC is a metagame, not a tier) the less centralized things will get, so long as broken Pokemon are banned. Usage weeds out the more powerful Pokemon very well, in many ways better than any tiers generated by a team of experts like in past gens. And due to the way Nintendo made Pokemon (competitively somewhat randomly, with a load of NFEs) there are clearly significantly more "meh" level Pokemon than "top class" ones.
 

eric the espeon

Few more

Improved the Excel sheet attached to the OP, fixed a few inconsistencies due to the removal of Rotom formes that had been missed before, and added as many Pokemon as possible within file size limits. Almost ready to be able to copy over past stats.

And some graphs made from my slightly extended version:





 
Something I think might be interesting: to try plotting usage data alongside the various stats that CAP came up with (sweepiness and tankiness, offensive/defensive and physical/special balance, I forgot who devised them).
 

eric the espeon

Something I think might be interesting: to try plotting usage data alongside the various stats that CAP came up with (sweepiness and tankiness, offensive/defensive and physical/special balance, I forgot who devised them).
That's actually quite a good idea, though plotting against BSR rather than 4 other variables would be much more useful. I'll see what I can do about digging up the required lists and getting them into the right format.

does this not belong in pokemetrics?

please delete when noted
Pokemetrics is locked for posting, I hope this will be worth moving there by the time I finish but I'd rather it stays here for the time being.
 

eric the espeon

Thought this would be better in a separate post:


Comparison of BSR (from the older thread, here, since I could not find an easily available list using the newer method) and BST against January Usage Rank for all Pokemon that had at least one OU usage in January. I'm just attaching the data file to the OP if anyone wants to play around with it. Edit: Done.

The data file also has some interesting extras, like a list of how far each Pokemon's BST/BSR is from the expected value given its usage rank, and the Standard Deviation/Variance of the expected BSR/BSTs. You can clearly see that Pokemon that have exceptional movepools or abilities, or that just fit the metagame well, are predicted a much higher BST and BSR than they actually have, and the reverse is also true. Shedinja and Smeargle are predicted more than double their BST, and Slaking is predicted over 200 less! Edit: you can also see where the old BSR runs into major trouble with spreads that have one very low defensive stat. Blissey is predicted 200 greater BSR than it has, and Forry, Skarm and Rotom-A are predicted well over 100 more each, all because the old BSR rates them too low.
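To make the "predicted vs actual" idea concrete, here is a minimal Python sketch. How the data file actually computes the expected value is not described here, so the straight-line least-squares fit of stat against usage rank is an assumption. The labels and numbers are hypothetical placeholders, not real BST or usage data, and `linear_fit` and `predicted_minus_actual` are invented names.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predicted_minus_actual(ranks, stats):
    """For each entry, the stat predicted from its usage rank minus its real stat."""
    slope, intercept = linear_fit(ranks, stats)
    return [slope * r + intercept - s for r, s in zip(ranks, stats)]


# Placeholder entries (hypothetical labels and totals, not real Pokemon data).
names = ["A", "B", "C", "D", "E"]
ranks = [1, 2, 3, 4, 5]
bst = [600, 500, 560, 520, 480]

for name, diff in zip(names, predicted_minus_actual(ranks, bst)):
    # Positive: used more than its stats alone would suggest.
    print(f"{name}: predicted - actual = {diff:+.1f}")
```

A large positive difference corresponds to the Shedinja/Smeargle case (usage far above what the stats suggest), while a large negative one corresponds to the Slaking case.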

I'll probably see about a version of this which compares actual use rather than usage rank at some point (unless someone else wants to try their hand at it?).

Edit: Also maybe coming soon: Lead centralization graphs.
 
Hey, Eric, is there any chance you can re-do the stat vs. usage rank plot as stat vs. usage pct.? I'm thinking that would probably be a more useful metric, and I'm curious what--if anything--that does to the trend.
 

eric the espeon

Makes sense, I'll have a go at it tomorrow. I'd guess that both would have a slightly closer fit?

Also, I uploaded my version of the PokeStats file (with almost everything I've done so far, bar the BSR/BST stuff) to RapidShare here. And the HTML table thing expired or something, so I'll find a better free host.
 
Augh! Don't have the new Excel. Any chance you could export it as a csv or tsv at some point?

By the way, to answer your question from a ways back, the plotting software I use is called Origin. It's Windows only, has evil licensing and is fairly buggy, but it makes beautiful graphs, and seeing as how I once invested an entire summer learning how to make it dance, I'm pretty much locked into it for life.
 
