The OU List algorithm version 2.0

X-Act

np: Biffy Clyro - Shock Shock
The old algorithm for generating the OU list was used to create the current OU tier in early January, but I feel the algorithm has a few shortcomings:

1) It relies only on how many times each Pokemon WAS used in the past. That means the OU list is practically obsolete as soon as it is posted.

2) Its 75% cut-off point is not only arbitrary, but also lacks meaning. A more meaningful cut-off point would improve things.

Hence I set out to try to overcome these shortcomings in time for the tier list update due for the 1st of April.

To address point 1), I used a form of linear extrapolation to predict what the OU list would look like in the seventh month, given the percentage usage each Pokemon had in the previous six months. The recently added Wobbuffet and Deoxys-S only have data for the previous 2 months and the previous 3 months respectively, and hence predicting their future usage is less accurate.

To address point 2), instead of "listing the Pokemon within the first 75% of the cumulative frequency distribution" (which lacks concrete meaning), I list the Pokemon from the predicted list generated above that have a high probability of featuring in a team, using a threshold we can agree upon. Yes, this threshold will still be arbitrary, but at least the phrase "probability of a Pokemon featuring in a team" is much more understandable than "a Pokemon is within the first 75% of the cumulative frequency distribution", and our familiarity with the idea gives us a better chance of finding a good threshold.

Okay, so here's the new algorithm:

1) Take the weighted usage lists of the previous six months and convert them into percentage weighted usages.
2) For each Pokemon, predict what percentage usage it will have in the next month.
3) Convert each predicted percentage usage into the probability that the Pokemon will feature in a team.
4) Those Pokemon that have more than x% probability of featuring in a team make it into the OU list.

I'll leave out the function that does the prediction in step 2) for the time being. I'll just say that I've been experimenting with various methods of prediction, and finally settled on one that is both good and simple. Here's a graph of the predicted values for Garchomp, Gengar, Blissey, Gyarados and Tyranitar (this one uses only the previous 5 months instead of 6):


To find the probability that a Pokemon will feature in a team, I used the formula:

P = 6p(1 - p)^5, where p is the predicted percentage usage expressed as a fraction.

When applied to Garchomp, this would be 21.1%, meaning that Garchomp is predicted to be featured in more than 1 out of every 5 teams (unless it's banned, lol).
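As a quick illustrative Python sketch of this step (the 0.044 usage figure is an assumption back-solved from the 21.1% result, not an official stat):

```python
def team_probability(p):
    """P = 6p(1 - p)^5, where p is the predicted usage expressed as a fraction."""
    return 6 * p * (1 - p) ** 5

# Assumed predicted usage of ~4.4% (back-solved from the 21.1% quoted above):
print(round(team_probability(0.044) * 100, 1))  # ~21.1
```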

To conclude, I'm asking two things:

1) Do you agree with this new way of considering OU?
2) What should be the minimum percentage probability of a Pokemon being featured in a team to allow it in OU? This new threshold can be used even for the old system if you don't agree with the new one.
 

Great Sage

Banned deucer.
While I'm sure you put a lot of research into this, I don't think that picking a number for "x" is any less arbitrary than the 75% previously decided. Using an algorithm to determine predicted usage is also pretty arbitrary, even if the function itself is mathematically sound.
 

obi

formerly david stone
The usage statistics do not tell the probability of a Pokemon appearing on a team. If you sweep a team with just Garchomp, only Garchomp is counted toward the usage statistics. If I use all six of my Pokemon, all six are included.
 

Cathy

Banned deucer.
You could just substitute that language in his post with the Pokemon being "used" by the definition on the statistics page, and the algorithm is unchanged (except for the probability function, but that is not part of the core algorithm -- which is actually the prediction function, which hasn't been posted). But it doesn't represent much of an advancement over the old one, especially since the cut-off point is still arbitrary.

Another problem with this model is that it will necessarily have to make predictions based on the usage data alone. It might be the case that as the usage of Pokemon X goes up, the usage of Pokemon Y goes down. A clever enough prediction function would easily handle cases like this. But in practice I suspect most of the interesting changes are the result of forum posts etc. -- i.e. things external to the data, which you won't be predicting.

In order to avoid an arbitrary cut-off point, I think we need a better idea of the purpose of OU. Here is one possible method. The main functional purpose of OU is to ban Pokemon from UU (since if you just want a list of the most common Pokemon, you can look directly at the statistics). And since UU is supposed to contain Pokemon you don't see every day, perhaps we can take this literally. Let's say the average battler on the ladder sees x Pokemon a day (this is a figure we can actually determine). Then we could define OU as all the Pokemon satisfying f > t / x, where f is the number of usages of the Pokemon and t is the total number of usages. This would be a way of defining OU connected to OU's main function and UU's purpose, and all of the figures are directly derived from data.
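(A minimal Python sketch of that rule, with made-up usage counts and an assumed value of x:)

```python
def ou_by_daily_sightings(usages, total_usages, x):
    """A Pokemon is OU if f > t / x, i.e. if an average battler, seeing x
    Pokemon a day, would expect to run into it at least once per day."""
    return [mon for mon, f in usages.items() if f > total_usages / x]

# Hypothetical figures, purely for illustration:
usages = {"Garchomp": 52000, "Blissey": 40000, "Umbreon": 3000}
print(ou_by_daily_sightings(usages, total_usages=600000, x=60))  # ['Garchomp', 'Blissey']
```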

I mean this mainly as an example to illustrate that a less arbitrary way of defining OU can come from a more fleshed out understanding of the purpose of the OU list.
 

X-Act

np: Biffy Clyro - Shock Shock
I just had an idea.

If we sum up the lead Pokemon usages and divide by 2, we get the number of battles played during that month. For February, this was 120679.

If we divide the unweighted usage of a Pokemon by this number, we get the probability that the Pokemon shows up in a particular battle. We could use that.
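(A sketch of that calculation; the 48000 usage count is a made-up figure, while the lead total corresponds to the 120679 battles quoted above:)

```python
def battle_probability(unweighted_usage, total_lead_usages):
    """Probability that a Pokemon shows up in a particular battle.
    Each battle has two leads, so battles = total lead usages / 2."""
    battles = total_lead_usages / 2
    return unweighted_usage / battles

# February: 241358 lead usages -> 120679 battles; 48000 is an assumed usage count.
print(round(battle_probability(48000, 241358), 2))  # ~0.4
```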
 

chaos

Owner
I recommend having Brain look over this. (And when you're done, you need to give me some pseudocode for me to update the Smogon sources with)

oh yeah and the tiers description page will need updating too
 

X-Act

np: Biffy Clyro - Shock Shock
Okay, I'll talk to Brain.

I need to know:
  • if you prefer the predictive approach to the usage (old) approach or not
  • either way, how to change the criterion for a Pokemon to be called OU
From ColinJF:
Let's say the average battler on the ladder sees x pokemon a day (this is a figure we can actually determine). Then we could define OU as all the pokemon satisfying f > t / x where f is the number of usages of the pokemon, and t is the total number of usages

So you're saying, if I'm understanding correctly, that an OU Pokemon is one that is seen at least once by every player every day on average. Isn't this also an arbitrary choice? Why not twice a day, for instance? Or three times? There is also the problem of not knowing how many players played on the ladder in a month, though I'd assume you have access to such data.
 

Cathy

Banned deucer.
Well, the reason a day isn't arbitrary is that it is also the time used as a rating period in the rating system and is supposed to represent a 'session' of play. I suppose the number of times you would expect to see an OU Pokemon each day by that definition is still arbitrary, but the idea was that if a Pokemon is common enough that you would expect (in the mathematical sense) to see it in a typical session of standard play, then it is too common for UU. We can call anything arbitrary, but ideally this is justified by the fact that a player would remember everything they saw in a typical session of standard, and would not want to see those Pokemon in a typical game of UU.

If even that is 'arbitrary', then at least we have shifted the place at which an 'arbitrary' decision has to be made to something more directly related to the purpose of the OU list.

As I said above, this method is just intended to supplement my point about how we might justify a cut off with reference to the purpose of the OU list. This method might not be acceptable for actual use.
 

X-Act

np: Biffy Clyro - Shock Shock
Okay, to answer some posts:

Great Sage: I said that in the original post. :( My point is that the definition is still better (now it's changed to 'probability of a Pokemon featuring in a battle') since it's more intuitive.

Obi: I fixed that problem now by using half the sum of lead Pokemon as the total number of matches played, from which the probability that a Pokemon features in a battle can be determined.

Anyway, this seems to not have too much support, unless someone else posts to the contrary.
 

obi

formerly david stone
I'd have to see just how you plan on predicting before I can fully support the idea in its implementation, but I like it in theory.
 

Aldaron

geriatric
I support this method of determining OU, as I too have always had an issue with Tier Lists being outdated.

However, I would like you to go into detail regarding this "I'll leave out the function that does the prediction in step 2) for the time being. I'll just say that I've been experimenting with various methods of prediction, and finally I settled to one which is both good and simple."
 

Blue Kirby

Never back down.
I like the sound of the predictive approach, as it seems able to overcome many of the problems the old method faced. However, I'm also very interested in the function you "settled" on using, and whether it is indeed any less arbitrary than just using the old 75% cut-off.
 

X-Act

np: Biffy Clyro - Shock Shock
Before I reveal the prediction function, I need to first say that prediction is something that will never be exact, no matter what you do.

With that out of the way, the prediction function is based on weighted linear extrapolation. Given points (1, y_1), (2, y_2), (3, y_3), ..., (n, y_n), the gradient between (i, y_i) and (i+1, y_(i+1)) is found for all i between 1 and n-1. Each of these gradients is then assigned a weight depending on how far it is from the last point: the further away, the less weight it gets. I assigned the weights 1, 3, 6, 10, 15, ..., n(n-1)/2 to the gradients, from the one furthest from the last point to the one nearest to it. Finally, the weighted average of these gradients gives the gradient with which the function should continue at x = n+1. This yields the following formula:

y_(n+1) = (n+4)(y_n)/(n+1) - 6(y_1 + 2y_2 + 3y_3 + ... + (n-1)y_(n-1))/(n(n^2-1))

For example, suppose we have the following values:

1,5,4,3,2,4.

Predicted next value = y_7.

y_7 = 10(y_6)/7 - 6(y_1 + 2y_2 + 3y_3 + 4y_4 + 5y_5)/(6(6^2-1))
y_7 = 10(4)/7 - 6(1 + 2(5) + 3(4) + 4(3) + 5(2))/(6(35))
y_7 = 40/7 - 6(1+10+12+12+10)/210
y_7 = 40/7 - 6(45)/210
y_7 = 40/7 - 9/7
y_7 = 31/7
y_7 = 4.43

The predicted value makes sense: there was an increase between the last two values, but the increase between the last value and the predicted one is modest, since the values before that had been decreasing slowly.
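Here's a small Python sketch of the extrapolation (the function name is mine); it reproduces the worked example above:

```python
def predict_next(values):
    """Weighted linear extrapolation as described above.

    The gradient between consecutive points gets a triangular weight
    (1, 3, 6, 10, ...), heaviest for the gradient nearest the last point.
    The weighted average gradient is then added to the last value.
    """
    n = len(values)
    gradients = [values[i + 1] - values[i] for i in range(n - 1)]
    weights = [(k + 1) * (k + 2) // 2 for k in range(n - 1)]  # 1, 3, 6, 10, ...
    avg_gradient = sum(w * g for w, g in zip(weights, gradients)) / sum(weights)
    return values[-1] + avg_gradient

print(round(predict_next([1, 5, 4, 3, 2, 4]), 2))  # 4.43
```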
 

X-Act

np: Biffy Clyro - Shock Shock
I need to start a debate regarding the criterion for what makes a Pokemon OU, keeping in mind that OU means 'used a lot'. The question is: how much is 'a lot'? Something I've noticed about our OU list is that it contains too many Pokemon, which makes something like Umbreon (listed as OU) a rather 'debatable' OU, considering it was used less than 10% as much as Blissey at that time.

We could use either u_n, the predicted percentage usage of a Pokemon, or p_n, the predicted probability of a Pokemon appearing in a battle. I prefer p_n over u_n, but there's not much difference either way.

Garchomp's predicted probability of appearing in a battle is about 40%, to give you an idea, and its probability is the highest of all Pokemon. That means you're more likely NOT to see a Garchomp in a battle than to see one, and this, of course, holds for EVERY Pokemon. So keep this in mind before saying, for example, "OU should have Pokemon that have at least a 50% chance of being seen in a battle", since such a definition would put 0 Pokemon in OU.

Let me provide with a few suggestions of criteria for Pokemon to be OU:

1) Sort the Pokemon by their p_n in descending order. The first x Pokemon in this list are OU. (For example, if x is 40, then OU would consist of the 40 most used Pokemon.)

2) All Pokemon having p_n of at least x% are OU. (For example, if x is 10%, then OU consists of those Pokemon that appear in at least 10% of battles.)

3) Let p_1 be the maximum of all the p_n's, and calculate p_n/p_1 for every Pokemon. Those for which this ratio is at least x% are OU. (This means that, if x is 25% and the most common Pokemon is Garchomp, say, then OU would consist of those Pokemon that appear in battle at least 25% as often as Garchomp does.)
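A quick Python sketch of the three options (the probabilities below are made-up figures, purely for illustration):

```python
def ou_top_x(probs, x):
    """Criterion 1: the x Pokemon with the highest p_n are OU."""
    return sorted(probs, key=probs.get, reverse=True)[:x]

def ou_absolute(probs, threshold):
    """Criterion 2: OU if p_n is at least the given threshold."""
    return [mon for mon, p in probs.items() if p >= threshold]

def ou_relative(probs, fraction):
    """Criterion 3: OU if p_n is at least the given fraction of the largest p_n."""
    p_max = max(probs.values())
    return [mon for mon, p in probs.items() if p / p_max >= fraction]

# Made-up predicted battle probabilities:
probs = {"Garchomp": 0.40, "Blissey": 0.30, "Umbreon": 0.03}
print(ou_top_x(probs, 2))        # ['Garchomp', 'Blissey']
print(ou_absolute(probs, 0.10))  # ['Garchomp', 'Blissey']
print(ou_relative(probs, 0.25))  # ['Garchomp', 'Blissey']
```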

Which of the above is the best criterion for OU, and what should the cut-off point be? Or is there maybe a better criterion? I lean towards options 2) or 3) more than option 1), with around 10% for option 2) or 20-25% for option 3), but that's just my opinion.

Debate starts!
 
I am not sure if I entirely understand #3 but #2 seems the right one to me.

EDIT: Alright, still second one then.
 

obi

formerly david stone
The third is the same as the second, except that instead of saying "Pokemon with a 5% chance to appear in a match" we say "Pokemon with a chance to appear in a match equal to 20% of the probability of the most common Pokemon". The former is an absolute usage figure; the latter is relative to the most used Pokemon.
 

X-Act

np: Biffy Clyro - Shock Shock
After researching further, I am finally happy with the cut-off point we chose in the old OU algorithm.

Let's say that there are 10000 usages in a month. The old criterion states that the OU Pokemon should together account for 75% of those usages. This means that whenever a Pokemon is used for the first time in a battle, it's an OU Pokemon 3/4 of the time.

Now a battle will contribute at least 7 usages and at most 12 usages, whose median value is 9.5. This is in line with the observed statistics, since the average number of usages per battle was 9.63 in November, 9.79 in December, 9.67 in January and 9.52 in February, for an overall average of 9.65 usages per battle. (EDIT: In March, it was 9.44.) With this definition of OU, a battle should contain an average of 9.65 x 3/4 = 7.24 OU Pokemon, and since this number is slightly more than the minimum number of Pokemon usages that a battle must have, I think it's a very good number indeed.

Because of this, I propose to leave the criterion of deciding whether a Pokemon is OU or not unchanged. The only thing that will change in the algorithm is using the predicted usages instead of the old usages to determine what will be OU. The algorithm in words is thus the following:

1) Take the weighted usage lists of the previous six months and convert them into percentage weighted usages.
2) For each Pokemon, predict what percentage usage it will have in the next month, using the prediction function described in an earlier post.
3) Sort the Pokemon according to their predicted percentage usage (highest first), and make a cumulative frequency of these percentage usages.
4) Those Pokemon that have less than 75% cumulative predicted percentage usage are OU.
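A minimal sketch of steps 3) and 4), assuming the predicted percentage usages have already been computed (the figures below are placeholders, and "less than 75% cumulative" is read as: keep adding Pokemon while the running total is still below 75%):

```python
def ou_list(predicted_usage, cutoff=0.75):
    """Steps 3) and 4): sort by predicted percentage usage (highest first)
    and keep every Pokemon while the cumulative share is still below the cutoff."""
    total = sum(predicted_usage.values())
    ou, cumulative = [], 0.0
    for mon, usage in sorted(predicted_usage.items(), key=lambda item: item[1], reverse=True):
        if cumulative / total >= cutoff:
            break
        ou.append(mon)
        cumulative += usage
    return ou

# Placeholder predicted usages, for illustration only:
predicted = {"Garchomp": 0.045, "Blissey": 0.035, "Gyarados": 0.030, "Umbreon": 0.004}
print(ou_list(predicted))  # ['Garchomp', 'Blissey', 'Gyarados']
```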
 

X-Act

np: Biffy Clyro - Shock Shock
I decided to check the performance of the prediction function, comparing the predicted values for April with the actual April values.

Only 3 Pokemon in the top 50 were mispredicted:

Mamoswine
Oct: 0.0089 Nov: 0.0087 Dec: 0.0080 Jan: 0.0086 Feb: 0.0108 Mar: 0.0122 Apr: 0.0097 Predicted Value: 0.0134

Here the algorithm predicted that Mamoswine would continue to rise in usage, given the stats for January, February and March. But it didn't.


Yanmega
Oct: 0.0113 Nov: 0.0104 Dec: 0.0096 Jan: 0.0090 Feb: 0.0098 Mar: 0.0119 Apr: 0.0093 Predicted Value: 0.0128

The same thing happened here. Given the stats for January, February and March, the algorithm predicted Yanmega would continue to rise in usage, but it didn't.


Tentacruel
Oct: 0.0060 Nov: 0.0118 Dec: 0.0082 Jan: 0.0086 Feb: 0.0075 Mar: 0.0068 Apr: 0.0078 Predicted Value: 0.0061

Tentacruel's usage was falling steadily in the January, February and March stats, so the algorithm predicted it would fall further. However, it didn't.


The others were more or less on par with the predicted values. I'm especially pleased that the algorithm correctly predicted that Spiritomb would not be OU in April even though it was technically OU in March, which let us correctly remove Spiritomb from the OU list early instead of having to wait until July to remove it.
 

obi

formerly david stone
I was just about to post either asking someone to do this or doing it myself. Thanks.
 

X-Act

np: Biffy Clyro - Shock Shock
July 1st is nearing, and as I did last March when April 1st was looming, I decided to look into the algorithm again.

Overall I'm satisfied with how it works. What I'm not completely satisfied with is the prediction function used. I tried using the prediction function to predict May's stats, then compared them with the actual May stats posted recently, and the result wasn't quite what I wanted.

What I did was as follows: for each Pokemon, I worked out the absolute difference between the predicted value and the actual value (the deviation), and then summed all these deviations. For example, if my predicted value for Garchomp was 0.0413 and the actual value was 0.0431, Garchomp's deviation would be 0.0431 - 0.0413 = 0.0018. The total was too high for my liking, meaning that too many values were mispredicted.
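(A sketch of that check; in practice the two dictionaries would cover every Pokemon:)

```python
def total_deviation(predicted, actual):
    """Sum of absolute differences between predicted and actual usage."""
    return sum(abs(predicted[mon] - actual[mon]) for mon in actual)

# Tiny illustrative slice of the real lists:
predicted = {"Garchomp": 0.0413}
actual = {"Garchomp": 0.0431}
print(round(total_deviation(predicted, actual), 4))  # 0.0018
```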

To add insult to injury, when I made the prediction function simply copy the stat of the last month (April), the sum of the deviations was lower than the one for my prediction function! :( This means that something had to be done to improve the prediction function. It also showed me that using stats from months too far away from the last month is not good, since using only the last month produced a reasonable approximation of what happened in the next month (better than my prediction function, anyway).

I thought of the following. We have lots of data to play with now. Why not just find the prediction function that produces the smallest sum of deviations artificially, i.e. via a short computer program? And that's what I did. I wrote a program to find the best prediction function, i.e. the one that generates values nearest to the actual ones, using just the three previous months. The function produced was the following:

Artificial Prediction Function (APF): u_0 = (86 x u_1 + 13 x u_2 + 4 x u_3) / 103

where u_n is the percentage usage from n months ago, so u_0 is the predicted value for the coming month.
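(As a one-line sketch in Python:)

```python
def apf(u1, u2, u3):
    """Artificial Prediction Function: a weighted mean of the last three
    months' percentage usage, dominated by the most recent month (u1)."""
    return (86 * u1 + 13 * u2 + 4 * u3) / 103

# e.g. a Pokemon at 4.5% last month, 4.0% and 3.5% in the two months before:
print(round(apf(0.045, 0.040, 0.035), 4))  # ~0.044
```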

This confirmed my suspicion that taking values from months too far away from the current month is useless. The latest month contributes 83.5% of the predicted value; the month before that 12.6%; and the month before that only 3.9%. This is illustrated by the following pie chart, showing the extent to which the latest month overshadows the other months in its contribution to the APF:


Anyway, it goes without saying that I'll be using the artificial prediction function instead of the one used on April 1st in subsequent OU charts. Hopefully this version (2.1?) will be the last tweak to the OU algorithm I'll ever make.
 
