Overall Rating vs Usage

Eraddd · Aug 22, 2009

You put Slaking and Pidgeot at the bottom two times. Better remove 'em.

I liked how you used Scizor and Slaking as reasons why they don't correlate. However, I would also add in typing as a major factor. Ice is shit. Literally, because of its weakness to Stealth Rock (a major part of the metagame) and its weaknesses to Fire, Fighting, Rock, and Steel, which are common types in the game.

Paperfairy · Aug 23, 2009

I want to, but how do I do that without going too much into the game itself? It felt like explaining typing would turn this into a primer on battling, as I stated before.

X-Act · Aug 31, 2009

The Overall Rating formula produces a rating of a Pokemon based purely on its Base Stats. While base stats are important, they are not the only aspect of a Pokemon that makes it good. In fact, what makes a Pokemon good (or bad) is the following list of properties together:

Base Stats
Typing
Movepool
Abilities

So Entei and Slaking have very good base stats, but are not even UU, while Pokemon like Forretress don't have very good base stats, and yet are OU.

If I somehow were to combine the above four properties together in one formula, then that should correlate with the usage of the Pokemon. Typing is not that difficult to do, but it's a long process. I managed to tackle the types of attacks, though, which pertains to the movepool of a Pokemon (watch out for my article in Issue #3 of The Smog). However, movepool and abilities are much harder tasks, if not impossible (how are you going to quantify abilities like Keen Eye and Own Tempo mathematically?)... and that is why I haven't done it.

As it is, the base stats of a Pokemon alone do not correlate with usage.

Res Ipsa Loquitur · Aug 31, 2009

My guess would be that it does correlate, even though it is not a 1:1 correlation. I would use a linear regression, including a ratio variable for each base stat, plus an additional one for total stats. I would probably also add one for the sum of each Pokemon's higher attack stat and speed, plus one for the sum of both defenses and HP. Then, I would add a binary dummy variable for every type and one for every ability. This might create some sample size problems, particularly in cases of extremely rare abilities, like Hydration, or those mostly given to unusually powerful Pokemon (I'm thinking specifically of Pressure). This list would of course not contain the distorting stats of the Ubers, since it would be based on OU. NFEs would also be omitted (except the totally different ones, like Vigoroth). The variable of interest would be usage in OU.

Of course, this would not consider movepool. Including a dummy variable for every single move, in addition to being very time-consuming, would not work due to massive multicollinearity. It might be possible to include binary variables just for a few specific coverage combinations normally considered to be powerful, such as Ice/Electric, Water/Ice/Grass, and Fire/Dragon. There could also be one more variable for "outclassed," which would be determined via consensus among high level players; it would be "1" for anything considered to have a flat-out superior counterpart, such as Blaziken, Rotom basic, etc, and "0" for everything else. If I were running the data, I would try it with and without this variable to see if it distorts any of the others, and my guess would be that it does have a statistically significant relationship with usage.

All that being said, I doubt there is much demand for such a robust model. I can't imagine it producing anything really useful. That's a lot of work, and I would guess that any of us with the statistical knowhow to manage it would have to be asked pretty darn nicely.

X-Act · Sep 2, 2009

Even though what you wrote above barely made sense to me (sorry), I'd like to comment on your last paragraph. Such an analysis would be really needed, as it would potentially remove the need for testing Pokemon for Uber/BL.

Res Ipsa Loquitur · Sep 2, 2009

Okay, my apologies for some obfuscating terminology there. Let me try to be a little more clear.

A regression allows you to check how a single thing is correlated with several other things when those things can depend on one another. For example, in politics, you might want to know whether how much money you make affects which party you support. However, we know that, on average nationwide, blacks make less money than whites and have a lower education level, and both education and race are highly correlated with political party. So you need to add race and education to your model; otherwise you get what appears to be data about income but is really data about a lot of things mixed together, and is therefore useless.

Dummy variables allow you to factor "yes/no" questions into a model. For example, in your voting model, you might have a variable just asking whether or not the person is a veteran, since most veterans vote Republican and you don't want that messing up the model. I am suggesting using a list of dummy variables to control for typing. Each Pokemon would have a value of "1" for one or two types, namely its own, and "0" for all of the others. The regression would then factor these in with the other stuff.

Essentially, the model would produce a "significance" score for each variable, answering the question, "all other things being equal, does this characteristic have an impact on how much play a given Pokemon gets?" We can guess, for example, that having a high attack stat will generally correlate with OU usage (even if there are a few outliers, like Rampardos). By including some coverage combinations, like Ice/Electric and Fire/Dragon, I think we can unravel type from movepool just a bit, also.
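
To make the dummy-variable idea concrete, here is a minimal sketch in Python of how one row of the regression's input could be coded. The function names (`type_dummies`, `design_row`) and the column layout (six base stats followed by 17 type columns) are assumptions for illustration, not a settled coding scheme; fitting the regression itself would be left to whatever stats package ends up being used.

```python
# The 17 types of the Gen 4 games, in a fixed column order.
TYPES = [
    "Normal", "Fire", "Water", "Electric", "Grass", "Ice", "Fighting",
    "Poison", "Ground", "Flying", "Psychic", "Bug", "Rock", "Ghost",
    "Dragon", "Dark", "Steel",
]


def type_dummies(pokemon_types):
    """Return one 0/1 value per type: 1 if the Pokemon has that type, else 0."""
    unknown = set(pokemon_types) - set(TYPES)
    if unknown:
        raise ValueError(f"unknown type(s): {sorted(unknown)}")
    return [1 if t in pokemon_types else 0 for t in TYPES]


def design_row(base_stats, pokemon_types):
    """Hypothetical row layout: six base stats, then the 17 type dummies."""
    if len(base_stats) != 6:
        raise ValueError("expected six base stats")
    return list(base_stats) + type_dummies(pokemon_types)


# Scizor is Bug/Steel with base stats 70/130/100/55/80/65
# (HP, Attack, Defense, Sp. Attack, Sp. Defense, Speed).
scizor_row = design_row([70, 130, 100, 55, 80, 65], ["Bug", "Steel"])
```

Each Pokemon gets a "1" in one or two type columns and "0" everywhere else. Ability dummies and the coverage-combination variables could be appended to the row in the same way.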

I would probably dedicate a whole thread to some discussion of what variables should be included and how to code them. In the end, the result would be a set of answers to a list of questions along the lines of "How much does having a high base HP have to do with OU usage?" and "How much does BoltBeam have to do with OU usage?" with each of these answers as free as possible from distortions due to other factors.

Does that make more sense? And if so, do you still think that it would be useful? It could be used to predict the usage level of a novel Pokemon, and possibly speed up some testing once HG/SS or Gen 5 come out. I should reiterate that it would be very time-consuming to build this model.

petrie911 · Sep 2, 2009

While the correlation between BSR and usage is admittedly low, I think a very interesting question would be this:

Does BSR predict usage significantly better than a simple Base Stat Total?
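
A first pass at that question could be as simple as computing both correlation coefficients on the same list of Pokemon. The sketch below is only that first pass, and the helper names (`pearson`, `compare_predictors`) are made up for illustration. Showing that one coefficient is significantly larger than the other would need a further test for comparing correlations that share a variable, which is not done here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def compare_predictors(bsr, bst, usage):
    """Return (r for BSR vs usage, r for BST vs usage) over the same Pokemon."""
    return pearson(bsr, usage), pearson(bst, usage)
```

The three inputs would be parallel lists, one entry per Pokemon, taken from the rating table and the monthly usage statistics.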

Tangerine · Sep 8, 2009

I agree that including a dummy variable for every single move is useless due to multicollinearity, but I think a lot of other things will have this problem when you start accounting for 100+ Pokemon. Imagine the slight differences in each Pokemon, imagine trying to treat them all as separate variables, and you get a lot of variables that are so similar to each other it'll be a nightmare to sort through. This is why once you get to abilities and movepool, it becomes an absolute nightmare to properly model.

What I was planning on doing (but I never got around to getting the data for - maybe I'll do it now) is to simply treat each Pokemon as a separate variable, on the grounds that they are all unique and different from each other, and then run a regression on usage, with the variables being tested being OTHER Pokemon. This doesn't answer the base case of how people initially start using Pokemon (I think the biggest flaw in any of the analysis done is that we're running this kind of analysis 2 years after we theorymonned everything). What matters in how people choose Pokemon is not usage/movepool/typing by itself so much as usage/movepool/typing relative to everything else in the game. The usage/movepool/typing analysis really needs to consider this - certain types will be used more often (steels) than others (ice) due to the presence of powerful offensive threats (dragon, ice), and this changes over and over again as we continually play and find new ways around the threats. That means our picks get more and more influenced by the metagame as the metagame goes on, so how we choose based simply on usage/movepool/typing changes across time.

This is just a theory of course - it could be the case that a single type just dominates for the entire time and our choices are static - but I'm going to theorize that how we choose changes over time as the metagame changes. I might actually just study this for a while and see where I go with it - I'd like more feedback on this theory so I can start setting up tests and experiments.

Res Ipsa Loquitur · Sep 9, 2009

Just to make sure I understand your model, you are talking about regressing each Pokemon's usage against the usage of all other Pokemon, correct? So we would have data for each period of time, presumably using Doug's data from each month, and then look at how rising and falling of one Pokemon's usage affects that of others? Are you thinking of a logit model, with variable of interest being the probability of a given Pokemon's usage on a novel team?

If this is indeed your idea, it would certainly solve a lot of the multicollinearity problems with the one I suggested. It would be worse for predicting the viability of a new, novel Pokemon (the CAP forums would much rather have the model coding stats, movepool, typing, etc.), but it would be much better for predicting the metagame impact of removing a certain Pokemon from the metagame. For example, if we saw that, as Scizor's usage has risen each month, the use of a large number of lower-ranked Pokemon has dropped out of proportion with that rise, with a few other high-ranked Pokemon, such as Salamence and Rotom-H, rising also, we might extrapolate that banning Scizor would have a much larger impact than just opening the gates for a couple of weaker revenge killers like Weavile, and in fact make the entire metagame substantially more diverse. Such data could be very useful to suspect voters, to say the least.

Tangerine, if you are strapped for time, I might be willing to compile some of this data. I think the first step would be getting an Excel spreadsheet with all of the data back as far as possible (at least back to the Garchomp days would be ideal), since we could import that to SPSS or whatever stats package we end up using for the model. I might also recode the usage into decimal form, if a logistic regression ends up being the way to go. I'd be interested in thoughts from you or anyone else with the stats knowledge to weigh in.
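
As a rough illustration of the recoding step, the sketch below turns a usage percentage into a proportion and applies the log-odds (logit) transform that a logistic model works with. The function names are hypothetical, and the assumption that usage arrives as a percentage between 0 and 100 is mine.

```python
import math


def usage_to_proportion(usage_percent):
    """Recode a usage percentage (0 to 100) as a decimal proportion (0 to 1)."""
    if not 0 <= usage_percent <= 100:
        raise ValueError("usage percentage must be between 0 and 100")
    return usage_percent / 100.0


def logit(p):
    """Log-odds of a proportion; undefined at exactly 0 or 1."""
    if not 0 < p < 1:
        raise ValueError("logit needs a proportion strictly between 0 and 1")
    return math.log(p / (1 - p))


def inverse_logit(x):
    """Map a log-odds value back to a proportion between 0 and 1."""
    return 1 / (1 + math.exp(-x))
```

Pokemon with zero recorded usage in a month would need special handling before the transform, since the log-odds of 0 is undefined.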

familyguyman · Sep 10, 2009

One way to try it, in a simple but not totally perfect way, is to give a Typing Score for Offense and Defense.

Using a type chart, see how much coverage a Pokemon's STAB gets (34 being super effective against everything and 0 being totally walled by immunities) and see how much its typing is affected by enemy STABs (0 being no damage from everything and 68 being 4x SE damage from everything; this would require some normalization to make comparison easier). You can weight types by usage/frequency so resisting Dragon or Steel is more important due to high frequency (Steel types were used in 400000 of the 500000 battles last month) or usage in OU (10 Pokemon in OU are Steel types).

The justification here is that almost every pokemon has an offensive STAB move unless they are purely for support reasons like Smeargle or Umbreon. Defensive typing is obvious too.
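
A minimal sketch of the two typing scores, in Python. To keep it short it uses a toy three-type chart (Fire, Water, Grass) rather than the full 17-type chart, so the totals are not on the 0 to 34 and 0 to 68 scales. Taking the better of two STABs against each defending type is my assumption, as are the function names and the optional `weights` argument for usage weighting.

```python
# chart[attacker][defender] = damage multiplier (toy three-type subset).
CHART = {
    "Fire":  {"Fire": 0.5, "Water": 0.5, "Grass": 2.0},
    "Water": {"Fire": 2.0, "Water": 0.5, "Grass": 0.5},
    "Grass": {"Fire": 0.5, "Water": 2.0, "Grass": 0.5},
}


def _weight(weights, type_name):
    """Usage weight for a type; 1.0 when no weighting is supplied."""
    return 1.0 if weights is None else weights.get(type_name, 1.0)


def offense_score(stab_types, chart, weights=None):
    """Sum, over single defending types, of the best STAB multiplier."""
    total = 0.0
    for defender in chart:
        best = max(chart[attacker][defender] for attacker in stab_types)
        total += best * _weight(weights, defender)
    return total


def defense_score(own_types, chart, weights=None):
    """Sum, over attacking types, of the multiplier taken by this typing."""
    total = 0.0
    for attacker in chart:
        multiplier = 1.0
        for own in own_types:
            multiplier *= chart[attacker][own]
        total += multiplier * _weight(weights, attacker)
    return total
```

Higher is better for offense and lower is better for defense. Passing something like `weights={"Dragon": 2.0}` with a full chart would make Dragon matchups count double.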

As for move pool, you can use the frequency of move usage (available on Smogon) and cross reference it with a Pokemon's move pool. Give it a score based on the moves appearing in the move pool, where each move is valued at its usage in some way. This way, Smeargle's perfect move pool will have the highest score, justifying its usage since it wouldn't be close with the second best move pool score.

For example: Earthquake, Thunderbolt and Stealth Rock add up to 46.69% of OU moves used so any pokemon with all 3 of those moves will have a score of 46.69 already, Smeargle would get 100 in this way.
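
The scoring rule in this example can be sketched as a simple sum. The usage shares in `MOVE_USAGE` below are placeholder numbers invented for illustration, not Smogon's figures, and the table lists only four moves, so the totals here do not reach 100.

```python
# Placeholder move-usage shares in percent (NOT real statistics).
MOVE_USAGE = {
    "Earthquake": 20.0,
    "Thunderbolt": 15.0,
    "Stealth Rock": 10.0,
    "Surf": 5.0,
}


def movepool_score(movepool, move_usage):
    """Add up the usage share of every tracked move the Pokemon can learn."""
    learnable = set(movepool)
    return sum(share for move, share in move_usage.items() if move in learnable)


# A hypothetical Pokemon that learns three of the tracked moves.
score = movepool_score(
    ["Earthquake", "Thunderbolt", "Stealth Rock", "Tackle"], MOVE_USAGE
)
```

A Pokemon that learns every tracked move, as Smeargle effectively does, gets the full sum of the table.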

As for abilities, you would really be hard pressed to find out the impact but you can "cheat" by doing Typing and move pool and putting the rest of your "trend" into Ability. Like if you want it to be 1 and Typing + Move Pool + Stats = 0.8 then Ability = 0.2 or something.

Oh and a good tip in general is to look at percentiles instead of just stats. Like, how much better is a 600 Pokemon than a 550, versus a 550 compared to a 500? I doubt it is linear. This could be done for any of the aforementioned ideas.
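
One way to do the percentile conversion, as a sketch: rank each value against the whole population of Pokemon and report where it falls. Counting ties as half below and half above is my choice here, and `percentile_rank` is a made-up name.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, population):
    """Percentage of the population below `value`, counting ties as half."""
    if not population:
        raise ValueError("population must not be empty")
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)
```

Feeding it base stat totals would show whether the jump from 550 to 600 covers more or fewer Pokemon than the jump from 500 to 550.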

Anyways, those are some ideas to consider, I like what you did in your paper as it was a nice refresher to stuff I learned last year. Good luck and if I find time, I might pitch in some more work.
