Greetings fellow Smogon nerds,

Have you ever experienced the frustrations of laddering under our current ranking system? I am reaching out specifically to players who are ranked near the very top of the ladder. Anyone who ladders competitively and has made it into the top 10 or so will know exactly what I'm talking about. I have been on Pokemon Showdown since late 2013, right when Generation 6 made its debut, and have played a variety of metagames. These days I focus on Anything Goes, which is by far my favorite metagame. I have consistently been a high ranked player on the Gen 7 AG ladder; I have only made it to number 1 once, although I am quite often in the top 10. Anyone who plays Anything Goes knows that, apart from Random Battle and occasionally OU, the AG ladder typically contains the highest Elo rated players on the website. Have a look at this screenshot for instance:

As can be seen, 4 players are rated above 2000, and there is a big gap between the number 4 player and the number 5 player. On other ladders, such as the Ubers ladder, the top rated players are in the high 1800s, and on inactive ladders the top rated players can be as low as the 1200s-1300s. At this time I was at my highest ever Elo rating, and I felt very satisfied. I had been playing 6 games per day, which as far as I know is the minimum needed to prevent decay. The next day, however, in the very first game I played, I was matched against a player rated in the low 1800s, and I lost to a terrible misplay that knocked 30-something points off my rating. I then won several games in a row to get my rating back up, but before I made it to 2065+ again, I took another crippling loss that knocked me down even further than the first one had. It wasn't long before I started playing badly out of frustration, and I fell to the low 1900s. Each time I lost, I would win several games in succession, each one earning me a mere 4-6 ladder points, only to get knocked down by 30 points again from a single loss.

After I had "tilted," I decided to take a break from laddering for several days because of how frustrating it was. And trust me, I had been through this many times already. Tilting is a common experience for high ladder players, and to prevent it they often adopt a policy such as "if I lose 2 games in a row, I will take a break from laddering until tomorrow." After going through the tilting process countless times, it dawned on me one day: "is the Pokemon Showdown rating system really that accurate?" I raised this question not only out of frustration but also because of several severe issues that I noticed with the way players on Showdown are ranked.

**Issue Number 1: Elo Ratings From One Ladder Cannot Be Accurately Compared to Those From Another**

This is something that I hinted at earlier. Every ladder has a different peak Elo, and every ladder has a different degree of variance. As of right now, for instance, the number 1 player in Anything Goes has an Elo of 2087 and the number 500 player has a rating of 1581, while in OU the number 1 player is at 1979 and the number 500 player is at 1688, as can be seen in these screenshots:

So the big question about these screenshots is, which player is better, the number 1 AG player or the number 1 OU player? What about the number 500 AG player and the number 500 OU player? Based on Elo, this question is simply impossible to answer. A 1688 Elo in OU is not equivalent to a 1688 Elo in AG, because the two ladders have different degrees of variance and different population sizes. The AG ladder obviously has more variance than the OU ladder, even though I am pretty sure that the OU ladder contains a greater number of players. To top that off, Elo, simply put, is not an accurate method of displaying a player's unknowable true rating, especially when it comes to the game of Pokemon. The next issue emphasizes this further.

**Issue Number 2: Good Players Can Decay and then Rob Higher Elo Rated Players of Points**

This, I must say, is a horrendous issue that needs to be fixed. There are some players who are extremely skilled, and easily capable of topping the ladder, yet they do not ladder very often. As a result, they are rated much lower than they should be, because their rating decays when they go long periods without laddering. When they actually do ladder, any higher rated sucker who is unfortunate enough to get matched up with them risks getting beaten and losing more points than they should.

**Issue Number 3: All High Rated Players Decay to the Same Rating**

As far as I know, rating decay begins at 1500 and does not occur at ratings of 1499 or below (although I've heard that on some ladders this value is different, such as 1399, but maybe this isn't true). Regardless of the "minimum rating" that Elo decays to, I do not think that every player above this rating should eventually decay down to the same rating. Why on earth should a 2100 rated player who has not played for a very long time decay to 1499, the same rating that a 1550 rated player decays to (albeit in a shorter amount of time)?

**Issue Number 4: Pokemon Is Not Only a Game of Skill, but Also a Game of Luck**

This is perhaps the biggest reason why the Elo rating system is not suited for Pokemon. As we all know, there exists a little battle mechanic that Game Freak decided to implement called accuracy. This goes hand in hand with the secondary effect chances that some moves have, such as a 10% chance to burn, paralyze, or freeze. We all know the frustration of getting "haxed," when we miss a 90% accurate move 3 times in a row, or when we get smacked with a double or even triple crit (I witnessed a triple crit in a battle yesterday; I'm not sure if this is a bug or not, but the chance of a crit is supposed to be only 1/24 in Gen 7, down from 1/16 in Gen 6), or when the opponent freezes a Pokemon with Ice Beam. And don't forget about the dreaded para or flinch hax! Because of these "hax" factors, we can sometimes lose battles on the ladder that we clearly should have won, purely because of luck. People may argue that hax happens to everyone, that it is just a part of the game, and that while it results in upset losses for a player it also results in upset wins. That is true, but there is no reason why a 2050 rated player should lose 40 points to a 1700s player simply because of one extremely unlucky game. In other words, the Pokemon Showdown Elo rating system simply does not offer an accurate representation of a player's or team's consistency.
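To put rough numbers on that asymmetry, here is a minimal sketch of the textbook Elo update. The K-factor of 32 and the function names are assumptions for illustration only; the K values that Showdown actually uses may differ, so treat the output as a ballpark rather than the ladder's real point changes.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Textbook Elo expected score (win probability) for `rating` vs `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def elo_change(rating: float, opponent: float, won: bool, k: float = 32.0) -> float:
    """Point change after one game. k=32 is an assumed K-factor, not Showdown's."""
    score = 1.0 if won else 0.0
    return k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    favorite, underdog = 2050, 1750
    # The favorite must win at least this fraction of games just to break even.
    print(f"break-even win rate: {expected_score(favorite, underdog):.1%}")
    print(f"points for a win:  {elo_change(favorite, underdog, True):+.1f}")
    print(f"points for a loss: {elo_change(favorite, underdog, False):+.1f}")
```

Under these assumptions the favorite gains only a handful of points per win and loses several times that on a loss, so any luck factor that caps the achievable win rate below the break-even figure pushes the rating down regardless of skill.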

**So if Pokemon Showdown Does Not Use Elo to Rank Players, What Ranking System Can Possibly Replace It?**

This is an interesting question to answer. In the past, Pokemon Showdown used ACRE, a ranking system that was meant as a way of interpreting the Glicko-1 rating but was horribly inaccurate for ratings with a high RD (rating deviation, which is a fancy term for standard deviation). Eventually, Showdown switched to Elo, and made a few changes to the system afterward, such as introducing a rating floor of 1000, along with some other tweaks. Currently, the Pokemon Showdown ladder displays a player's Elo, Glicko-1, GXE, and, on suspect ladders, COIL. Of all of these ratings, which one takes the cake as the most accurate? The answer is, without any doubt whatsoever, GXE. If you don't believe me, take a look at X-Act's original post that introduced the concept of GXE and how it is calculated, which can be found here: http://www.smogon.com/forums/threads/glixare-a-much-better-way-of-estimating-a-players-overall-rating-than-shoddys-cre.51169/ . If you read everything that X-Act wrote, you will see that he calculated the *exact* true rating of 250 chosen players by examining the Glicko rating and RD of each player, and using an equation devised by Mark Glickman, the inventor of the Glicko-1 and Glicko-2 rating systems, to calculate the probability of every single player beating every single other player. He then matched these true ratings up with his GXE formula, and the results were astoundingly accurate. The GXE formula ordered the players in the exact same order that their true ratings would have, a degree of accuracy which is absolutely stunning.

**So if GXE is More Accurate Than Glicko-1 and Elo, Why Don't We Use it to Rank Players?**

The big reason why GXE is not used to rank players is that it is a percentage rather than a whole number. People tend to prefer their rating as a whole number, rather than as a percentage referring to their estimated chance of winning a match against a random opponent. Another issue with GXE is that it is less accurate when the player has a high rating deviation, which is why rating deviations above 100 result in a rating of "provisional," or 0. It also shares a problem with Glicko: after many battles have been played, the rating becomes difficult to change. However, I have a simple solution to all of these issues.

**My Rating System**

In reality, the goal of a rating system is to get as close to a player's true rating as possible. X-Act's GXE formula does an extremely good job of this, so I really think that we need to consider the work that he left behind, since he is not available on Showdown and hasn't been online since 2012. I am honestly baffled as to why the Smogon staff chose to use Elo as the primary ranking system, when GXE is far superior and can have its issues fixed with a few tweaks.

The original GXE formula looked like this, given a player rating R and a rating deviation RD:

GLIXARE Rating = 0, if RD > 100

GLIXARE Rating = round(10000 / (1 + 10^(((1500 - R) * pi / sqrt(3 * ln(10)^2 * RD^2 + 2500 * (64 * pi^2 + 147 * ln(10)^2)))))) / 100, otherwise
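For anyone who would rather check the formula in code than in a spreadsheet, here is a minimal Python sketch of it. The function name `glixare` and the optional `provisional_cutoff` parameter are my own labels for illustration, not anything from X-Act's post or from Showdown's code.

```python
import math
from typing import Optional

LN10_SQ = math.log(10) ** 2  # ln(10)^2, used twice in the formula


def glixare(rating: float, rd: float,
            provisional_cutoff: Optional[float] = 100.0) -> float:
    """GLIXARE (GXE) percentage from a Glicko rating R and deviation RD.

    Ratings with RD above `provisional_cutoff` return 0, as in the
    original formula. Pass None to skip the cutoff (hypothetical option).
    """
    if provisional_cutoff is not None and rd > provisional_cutoff:
        return 0.0
    denominator = math.sqrt(
        3 * LN10_SQ * rd ** 2 + 2500 * (64 * math.pi ** 2 + 147 * LN10_SQ)
    )
    exponent = (1500 - rating) * math.pi / denominator
    # Rounded to two decimal places, exactly as the formula specifies.
    return round(10000 / (1 + 10 ** exponent)) / 100


if __name__ == "__main__":
    for r, rd in [(1500, 50), (1800, 50), (2051, 50), (2051, 150)]:
        print(r, rd, glixare(r, rd))
```

A rating of exactly 1500 gives 50 at any RD below the cutoff, which is a quick sanity check that the formula has been typed in correctly.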

I have tested this formula with the Glicko ratings and RDs of current top ladder players, and I did not get the same GXEs that the ladder displays, so I am almost certain that the formula has been modified in some way. Regardless, the above formula is pretty darn accurate, and if it has been modified for the Pokemon Showdown ladder, then I don't doubt that the new formula is just as accurate, if not more so. I must say that despite the claim that GXE is less accurate when the Glicko RD is high, the Pokemon Showdown ladder caps RD at 350 for Glicko-1 ratings, and honestly, if you compare the GXE of a Glicko rating with an RD of 0 to the GXE of the same rating with an RD of 350, the results are not tremendously different.

Example:

2051 glicko rating rd of 0 >>> GXE: 89.3

2051 glicko rating rd of 350 >>> GXE: 84.6

Yes, there is a 4.7 point difference here, but it really isn't that big when you consider how far apart a deviation of 0 (which, by the way, is virtually impossible) is from a deviation of 350 (the deviation of a player who has not played any games). These results were calculated using the original GXE formula in Microsoft Excel.

I feel that it would be very nice if a player's GXE were converted into a whole number, preferably a 4 digit number that falls between 1000 and 2100-2300 or so, similar to the Elo rating on Pokemon Showdown's ladder. X-Act provided a simple way of doing this, proposing that the GXE simply be multiplied by 20, which gives a whole number rating with a floor of 0 (for a person with a 0 GXE, which is virtually impossible) and a maximum of 2000 (for a person with a 100 GXE, which is also virtually impossible). The problem with a rating system like this is that once a player gets into the 1900s, winning battles results in extremely small point gains, and reaching 2000 or even 1960 is nigh impossible (1960 would be the equivalent of a 98 GXE, which is extremely hard to obtain, unless you are that sweetlol2 guy on the Ubers ladder who apparently has a 98.3 GXE).

And so I wondered, "what if I solve the GXE formula for Glicko in terms of GXE, using an RD of 0?" And so I did. Here is how you would derive Glicko (without an RD) from the original GXE formula:

Glicko = -((LN(100/GXE-1)/LN(10))-2.50901023943244)/0.00167267349295496
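The two long decimals in that formula are not magic numbers: with RD set to 0, the GXE formula collapses to GXE = 100 / (1 + 10^((1500 - R) * k)) with k = pi / sqrt(2500 * (64 * pi^2 + 147 * ln(10)^2)), and solving for R gives the line above. Below is a small sketch that recomputes the constants and checks the inversion by round trip. The function names are mine, and the unrounded GXE is used so that the round trip is exact.

```python
import math

# Slope of the RD = 0 GXE curve; this is the 0.00167267... constant.
K = math.pi / math.sqrt(2500 * (64 * math.pi ** 2 + 147 * math.log(10) ** 2))
OFFSET = 1500 * K  # this is the 2.50901... constant


def gxe_from_glicko(rating: float) -> float:
    """Unrounded GXE for a Glicko rating, assuming RD = 0."""
    return 100 / (1 + 10 ** ((1500 - rating) * K))


def glicko_from_gxe(gxe: float) -> float:
    """Inverse of gxe_from_glicko; only defined for 0 < GXE < 100."""
    if not 0 < gxe < 100:
        raise ValueError("GXE must be strictly between 0 and 100")
    return -(math.log10(100 / gxe - 1) - OFFSET) / K


if __name__ == "__main__":
    print(K, OFFSET)
    for rating in (1200, 1500, 2051):
        print(rating, glicko_from_gxe(gxe_from_glicko(rating)))
```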

I then considered how the Elo rating system relates to the Glicko rating system, and I found this: http://www.glicko.net/ratings/report08.txt , which contains a formula for doing a rough conversion of the FIDE chess rating (a chess rating system which is virtually equivalent to Elo) to the USCF (United States Chess Federation) chess rating (which is virtually equivalent to Glicko). From this, it is possible to derive a rough conversion from Glicko to Elo, with a centered Elo rating of 1250 (which I am sure is pretty close to the current mean Elo rating on Pokemon Showdown's ladders, and yes, this would mean that the starting rating would NOT be 1000).

The equation looks like this:

Elo = ROUND(((Glicko-720)/0.624),0) if Glicko < ~1969.2

Elo = ROUND(((Glicko+350)/1.1585),0) if Glicko > ~1969.2
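As a quick check of the conversion, here is a sketch of it as a two-branch function: (Glicko - 720) / 0.624 below the ~1969.2 crossover and (Glicko + 350) / 1.1585 at or above it. The function name and the choice to send the exact crossover value to the upper branch are my own assumptions. Note also that Python's `round` rounds exact halves to the nearest even integer while Excel's ROUND rounds them away from zero, so the two could differ by 1 in the rare exact .5 case.

```python
CROSSOVER = 1969.2  # approximate point where the two lines meet


def glicko_to_elo(glicko: float) -> int:
    """Rough Glicko -> Elo-scale conversion using the two-branch formula."""
    if glicko < CROSSOVER:
        return round((glicko - 720) / 0.624)
    return round((glicko + 350) / 1.1585)


if __name__ == "__main__":
    for g in (1000, 1500, 1969.2, 2051, 2400):
        print(g, glicko_to_elo(g))
```

A Glicko rating of 1500 lands on 1250, which is where the "centered Elo rating of 1250" comes from, and the two branches agree to within a point around the crossover, so there is no visible jump in the converted rating.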

From all of this, we can derive an equation for essentially converting GXE into Elo. However, the rating obtained from this equation is not equivalent to Elo, but rather an estimate of what a player's TRUE Elo is, a value that is FAR more accurate in depicting a player's skill level than the crappy Elo system that Pokemon Showdown uses. I'm not sure how the community will view this, but if I were to come up with a name for this rating, I suppose the most basic name would be TRUE ELO, or TELO. The equation is listed here:

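TELO = ROUND((((-((LN(100/GXE-1)/LN(10))-2.50901023943244)/0.00167267349295496)-720)/0.624),0) if GXE < 85.9

TELO = ROUND((((-((LN(100/GXE-1)/LN(10))-2.50901023943244)/0.00167267349295496)+350)/1.1585),0) if GXE is > or equal to 85.9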
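Here is the whole GXE-to-TELO chain as one Python function, for anyone who wants to try their own GXE without opening a spreadsheet. It is a direct transcription under two assumptions: the lower branch (GXE below 85.9) uses (Glicko - 720) / 0.624, the upper branch uses (Glicko + 350) / 1.1585, and Python's `round` stands in for Excel's ROUND (they differ only on exact .5 values). The function name `telo` is just a label.

```python
import math


def telo(gxe: float) -> int:
    """TELO from a GXE percentage (0 < GXE < 100), per the formulas above."""
    if not 0 < gxe < 100:
        raise ValueError("GXE must be strictly between 0 and 100")
    # Step 1: GXE -> Glicko rating at RD = 0.
    glicko = -((math.log(100 / gxe - 1) / math.log(10)) - 2.50901023943244) \
        / 0.00167267349295496
    # Step 2: Glicko -> Elo scale, with the branch chosen by GXE.
    if gxe < 85.9:
        return round((glicko - 720) / 0.624)
    return round((glicko + 350) / 1.1585)


if __name__ == "__main__":
    for gxe in (10, 30, 50, 70, 85.9, 90, 95, 98):
        print(f"GXE {gxe:>5} -> TELO {telo(gxe)}")
```

Running it over a spread of GXE values shows that the gap between 95 and 98 GXE is worth far more TELO points than the gap between 50 and 53, which is exactly the stretching at the top end that multiplying GXE by 20 lacks.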

**The Advantages of This New Rating System**

- It has built-in decay, due to the fact that GXE decays as RD gets bigger.
- Not all players will decay to the same final value, since the maximum RD is 350.
- It is FAR more accurate in depicting a player's true skill than Elo, due to the fact that it uses GXE.
- Good players will not lose a TON of points from losing to weaker players because of hax/misplays.
- Players with high ratings will not get robbed of points by good players who have a misrepresentative rating.
- The range of ratings will go from as low as 500 to as much as 2500 (for players who are INSANELY good), which is similar to the range of ratings for the current Elo system.
- Ratings will be able to be accurately compared across ladders.
- It displays GXE as an Elo-style rating, rather than as a percentage.
- It will not be ridiculously hard for top rated players to gain points, because the rating takes into account the fact that GXE becomes harder to raise as it approaches 100.
- It doesn't share the problem that ACRE had, in which some players can get ridiculously high ladder scores.
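The first two points can be illustrated with a short sketch. It combines the original GXE formula (with the provisional cutoff removed, as proposed in this post) with the standard Glicko-1 rule for inactivity, where RD grows as sqrt(RD^2 + c^2 * t) and is capped at 350. The constant `c`, the starting RD of 50, and the number of rating periods are made-up values for illustration; I do not know what Showdown actually uses.

```python
import math

LN10_SQ = math.log(10) ** 2
MAX_RD = 350.0


def gxe(rating: float, rd: float) -> float:
    """Unrounded GXE with no provisional cutoff (the tweak proposed here)."""
    denominator = math.sqrt(
        3 * LN10_SQ * rd ** 2 + 2500 * (64 * math.pi ** 2 + 147 * LN10_SQ)
    )
    return 100 / (1 + 10 ** ((1500 - rating) * math.pi / denominator))


def inflate_rd(rd: float, idle_periods: int, c: float = 34.6) -> float:
    """Glicko-1 RD growth over idle rating periods; c is an assumed constant."""
    return min(math.sqrt(rd ** 2 + c ** 2 * idle_periods), MAX_RD)


if __name__ == "__main__":
    for rating in (2051, 1800):
        for idle in (0, 10, 50, 100, 1000):
            rd = inflate_rd(50, idle)
            print(f"R={rating} idle={idle:>4} RD={rd:6.1f} GXE={gxe(rating, rd):.2f}")
```

With these assumed numbers, the inactive 2051 player's GXE drifts down and then stops once RD hits 350, and the inactive 1800 player stops at a different, lower value, so the two never collapse to a shared floor. One caveat worth stating: this downward drift only applies to ratings above 1500, since a growing RD pulls every GXE toward 50.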

**Issues With This Rating System**

- Shares a similar problem with Glicko in that after many battles are played, the rating will not be as sensitive to change. HOWEVER, it will still be possible to regain the ability to gain points more quickly by taking a break and letting the RD increase.
- Players may be able to park themselves at the top of the ladder. HOWEVER, their ratings would still decay due to the increase of RD.

**Easy Fix to the Problems With TELO**

- If a "ladder reset" option were programmed into the game, players could easily reset their rating back to 1250 with an RD of 350, if their rating was at a standstill and the player did not feel like waiting for his/her RD to increase.
- It would have to be required that a player could not perform a ladder reset unless they have played "x" number of games, or until their RD is at least as low as "x," to prevent players from immediately resetting their ladder ranking upon losing their first match or their first few matches.
- If desired, it could be programmed that ONLY players with an RD of 350 have their rating temporarily removed from the ladder, until they decide to ladder again. This would prevent parking.

https://docs.google.com/spreadsheets/d/1ITHZxJcczf4Hd9xZT0mPLYjCDi8tc7HfRS0t61Y7mB8/edit?usp=sharing

**But wait, isn't this just ranking players by GXE?**

Essentially, yes, but it displays GXE in a more convenient fashion. I do not feel that we should make any changes to the current GXE formula, except that no rating with an RD below 350 be displayed as "0" or "provisional," since the Glicko RD acts as a decay agent.
