Hi Smogon!
You probably don't know me, as I am not very active on the forums, but I am an active Showdown user with a particular interest in math and the power of data analysis in competitive Pokémon. I was recently thinking about how suspect testing and ban discussions tend to be extremely subjective, which led me to read this thread: http://www.smogon.com/forums/threads/characteristics-of-a-desirable-pokemon-metagame.66515/. Doug identifies Balance as one of the main desired characteristics of a metagame, and one that should be taken into heavy consideration during the banning/suspecting process. But I noticed that one of the main responses is that Balance, like the other characteristics he identifies, is hard to quantify. So, inspired in part by the Gini coefficient I learned about in microeconomics, I attempted to formulate a way to quantify the relative balance of a given metagame at a given point in time by analyzing usage statistics. The fruits of my labor are contained in the Excel file I attached.
Now, on to my methodology. I started by pulling the usage percentages of the top 100 most used Pokémon in Gen 5 OU for every month from April 2011 to November 2013. For each month, I plotted those 100 points, with the rank in the usage stats as the x-coordinate and the usage percentage as the y-coordinate. I then ran a logarithmic regression (usage = a + b*ln(rank)) on each month's data set, most of which yielded a respectable R^2 of around 0.98. Now, logically speaking, a metagame in which there is a sharp drop-off in usage beyond the first handful of top Pokémon is relatively unbalanced, by Doug's definition of Balance. So one way to analyze balance is to look at the regression: the more negative the coefficient of the logarithmic term (b), the steeper the drop-off and the less balanced the metagame is. As points of comparison, I also calculated the 20:20 ratio and Palma ratio for each month's statistics (two metrics commonly used to measure income inequality). I was also going to calculate Gini, Hoover, and Theil indices, but those are more tedious, and I was lazy.
I then plotted the logarithmic coefficients, 20:20 ratios, and Palma ratios over time. Compared side by side, the graphs of the three balance metrics appear to be very consistent with one another. By looking at those graphs, we theoretically have a post-mortem of the Gen 5 OU metagame: valleys indicate relative balance, peaks indicate relative imbalance. Also attached is a picture of the logarithmic coefficient graph, marked with bans and significant events that may have affected the balance of the metagame. Some of the trends make sense: during the early stages of BW and BW2, the tier appeared relatively balanced as people were getting a feel for the new metagame. And the downward trend from September 2011 onward suggests that Smogon's bans were generally good for the balance of the OU metagame.
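For anyone who wants to try this outside of Excel, here is a minimal Python sketch of the regression and ratio steps described above. It is not a copy of the spreadsheet's formulas: the function names are made up for illustration, the usage numbers are synthetic rather than real Gen 5 OU stats, and it assumes the 20:20 and Palma ratios are applied by treating each Pokémon's usage percentage like an income (top 20% vs bottom 20% of the list, and top 10% vs bottom 40%).

```python
import numpy as np

def log_fit(usage):
    """Fit usage = a + b*ln(rank) by least squares; return (a, b, r_squared).

    `usage` is one month's usage percentages, already sorted from most
    used to least used, so rank 1 is the first entry.
    """
    usage = np.asarray(usage, dtype=float)
    ranks = np.arange(1, len(usage) + 1)
    # polyfit returns the highest-degree coefficient first: slope, then intercept
    b, a = np.polyfit(np.log(ranks), usage, 1)
    predicted = a + b * np.log(ranks)
    ss_res = np.sum((usage - predicted) ** 2)
    ss_tot = np.sum((usage - usage.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot

def share_ratio(usage, top_frac, bottom_frac):
    """Total usage of the top fraction divided by that of the bottom fraction."""
    ordered = np.sort(np.asarray(usage, dtype=float))[::-1]  # descending
    n = len(ordered)
    top = ordered[: int(round(n * top_frac))].sum()
    bottom = ordered[n - int(round(n * bottom_frac)):].sum()
    return top / bottom

def ratio_20_20(usage):
    # Assumed adaptation: top 20% of the list vs bottom 20% of the list
    return share_ratio(usage, 0.20, 0.20)

def palma(usage):
    # Assumed adaptation: top 10% of the list vs bottom 40% of the list
    return share_ratio(usage, 0.10, 0.40)

# Synthetic month (NOT real data): usage that decays exactly logarithmically.
ranks = np.arange(1, 101)
synthetic_usage = 20.0 - 4.0 * np.log(ranks)

a, b, r2 = log_fit(synthetic_usage)
print("log coefficient b:", b, "R^2:", r2)
print("20:20 ratio:", ratio_20_20(synthetic_usage))
print("Palma ratio:", palma(synthetic_usage))
```

Running `log_fit`, `ratio_20_20`, and `palma` on each month's top-100 list and plotting the three results against time would give the same kind of graphs as the attached ones, assuming the adaptations above match what the spreadsheet does.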
If this idea of mine takes off, it could have significant implications for Smogon policy. We could compare the balance of a metagame before and during a suspect test, and use the comparison to help reach a ban verdict. In a perfect world, we would have ways to quantify all of the characteristics that Doug described in his post. That would help shift the metagame development process from being subjective to being more objective. As of now, the only empirical evidence commonly used in suspect discussions is damage calcs (lol).
So anyways, if you managed to read all of this, congrats and thanks. Let me know what you think!