Don't panic: PS ratings have simply been rescaled

Antar · Jan 15, 2014

Those of you who have already logged onto PS this morning might have noticed something new. On the ladder, players are ranked not by ACRE but by Elo rating.

ACRE has always been problematic, and our implementation of Glicko2 has some bugs that have been leading to wildly inaccurate results. The Senior Staff is currently in the process of developing a new ladder framework that is separate from skill rating, but until that is done, we're defaulting the ladder to a tried and tested rating system: Elo, which is the same rating system that PO uses (we're not doing decay for now).

You may have noticed that you don't gain points on the new ladder nearly as quickly as you used to. This is actually by design, as ratings have a mathematical meaning (a rating difference of 200 means that the higher rated player should win roughly 75% of the time). Before you freak out, just remember that everyone else is on the same ladder that you are. For now, all that's needed is a change in mentality: 1300 is roughly the new 2000. Don't worry--suspect reqs will be adjusted accordingly.

This is not open to discussion, but feel free to PM me if you have questions.

If you want to know more about rating systems, read here.

Antar · Jan 15, 2014

Update: I understand people are experiencing some frustration with the ladder along the lines of "I gain practically no points from winning and lose a ton from losing."

Yes. This is something that happens with Elo (and most practical rating systems, actually). Under our implementation, you should typically gain 25 points per win and lose 25 per loss, but the larger the discrepancy between your rating and your opponent's, the fewer points the higher-ranked player will gain from winning and the more points they will lose by losing. The formula is rather simple:

Point change = 50 x (score - expected)

where score is 1 for the winner and 0 for the loser, and "expected" is your probability of winning. So if you and your opponent have the same rating, then the probability of winning is 1/2, and you'll either gain or lose 25 points. On the other hand, if your rating predicts that you have a 90% chance of winning the match, you only stand to gain 5 points, and if you lose, you lose 45.
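For anyone who wants to play with the numbers, here is a minimal sketch of that formula in Python. The factor of 50 comes from the formula above; the `expected_score` function uses the standard Elo logistic curve on a 400-point scale, which is an assumption on my part about how "expected" is computed on PS (it does reproduce the "200 points is roughly 75%" rule of thumb). All function names are made up for illustration.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score (assumed: logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def point_change_from_expected(expected, won, k=50):
    """Point change = k x (score - expected), with score 1 for a win, 0 for a loss."""
    score = 1.0 if won else 0.0
    return k * (score - expected)


def point_change(rating, opponent_rating, won, k=50):
    """Same formula, with 'expected' derived from the two ratings."""
    return point_change_from_expected(expected_score(rating, opponent_rating), won, k)


if __name__ == "__main__":
    # Equal ratings: win or lose 25 points.
    print(point_change(1500, 1500, won=True), point_change(1500, 1500, won=False))
    # A 90% favorite: small gain on a win, large loss on a defeat.
    print(point_change_from_expected(0.9, won=True), point_change_from_expected(0.9, won=False))
```

Note that the gain from a win and the loss from a defeat always add up to 50 in size, whatever the rating gap.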

The bottom line is that, with Elo, your rating fluctuates fairly wildly. This is one reason I personally prefer the slower-updating Glicko systems, but in many ways, it's a matter of personal preference.

PS' matchmaking algorithm should ensure that you are--as often as possible--paired with players of similar rating (to keep the expected outcome as close to 1/2 as possible), but that's not always possible, depending on who's online.

It's come to my attention that PO uses a slightly modified version of the above system, where that factor of "50" (which actually starts at "32") goes down once your rating passes a certain threshold. We may or may not go to the trouble of implementing something similar, since Elo is supposed to be a temporary stopgap while we work on a more permanent solution.

Calm_Mind_Latias · Feb 1, 2014

Antar,

Is there any equivalence table or simple formula to convert the ACRE or Glicko2 from the last system to the Glicko1 or ELO to this system? What are the considerations of this system.

Short question: do the elements of prediction and hax limit ELO in Pokemon? Is it possible to have a Magnus Carlsen or Garry Kasparov (ELO > 2800) by the nature and elements of Pokemon battling?

Antar · Feb 1, 2014

Calm_Mind_Latias, I have moved your post to what I feel is a more appropriate thread (which was previously locked, so it's not like you *could* have posted it there).

Calm_Mind_Latias said:
Is there any equivalence table or simple formula to convert the ACRE or Glicko2 from the last system to the Glicko1 or ELO to this system? What are the considerations of this system.

The problem with the last system was that it was mathematically flawed and "broken" (as evidenced by the rating numbers in the 3000+ range, which should be statistically impossible under Glicko/2). It was also based on an incorrect assumption that I will address in answering the next part of your question.

With that in mind,

ACRE is strictly a function of Glicko rating: ACRE=R-1.4*RD, meaning you can, if you'd like, calculate your own ACRE under this new system.
*Theoretically* your new Glicko rating should be the same as your old one once you've played enough battles. However, this only holds true if, on average, everyone else is playing more battles, and you don't have new players joining. In reality, the "age" of the ladder, as it were, doesn't change, with something like 80% of players on the ladder having played only a single game (and about 95% having played fewer than 5).
*Most likely* what you'll find is that your Glicko ratings are roughly equivalent if you rescale, that is: R'=(R-1500)/350*130+1500. The short of this is that 1630 is the new 1850.
Zarel's still tweaking Elo, so a correspondence between Elo and anything else isn't really possible right now.
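
If you'd like to run those two formulas yourself, a minimal Python sketch follows. The constants (1.4, 1500, 350, 130) are the ones quoted above; the function and parameter names are invented for illustration, and the rescaling is only the rough equivalence described here, not an exact conversion.

```python
def acre(rating, rd):
    """ACRE = R - 1.4 * RD, where R is the Glicko rating and RD its deviation."""
    return rating - 1.4 * rd


def rescale(old_rating, old_sd=350.0, new_sd=130.0, center=1500.0):
    """R' = (R - 1500) / 350 * 130 + 1500: squeeze the old spread into the new one."""
    return (old_rating - center) / old_sd * new_sd + center


if __name__ == "__main__":
    print(rescale(1850))    # the "1630 is the new 1850" example
    print(acre(1630, 50))   # ACRE for a hypothetical player with R=1630, RD=50
```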

Short question: do the elements of prediction and hax limit ELO in Pokemon? Is it possible to have a Magnus Carlsen or Garry Kasparov (ELO > 2800) by the nature and elements of Pokemon battling?

Elo and Glicko are perfectly capable of handling hax. The difference is reflected in the scaling of ratings. An Elo rating difference of 200 roughly means that the higher rated player has a 75% chance of winning. So if there is hax, it will mean that the rating (Glicko as well as Elo) of a more skilled player won't be quite as high as it would have been for a game with no hax, since the higher-skill-level player can still lose to "bs hax." It's still theoretically possible to get to an Elo of "master" range, but it's a lot harder. Also note that the equivalent master range wouldn't be as high, as I believe the initial rating under FIDE is higher than 1000, as we have it for our implementation of Elo.

cuteflounder · Mar 13, 2014

Not sure if this is the right place to ask this but how is rating decay applied? My friend and I have both been laddering recently but after a short period of inactivity (~a few hours), my rating will decay by 4-5 points whereas my friend's rating doesn't decay even after close to a day of inactivity.

Antar · Mar 18, 2014

cuteflounder, decay is, I believe, 1 point per day per 100 points above 1400. Is that right, Zarel?

Frostfluff, it has nothing to do with your Glicko/GXE.

Zarel · Mar 18, 2014

Antar said:
cuteflounder, decay is, I believe, 1 point per day per 100 points above 1400. Is that right, Zarel?

For active users, yes. For inactive users, it's approximately twice that.

phoopes · Mar 18, 2014

Zarel said:
For active users, yes. For inactive users, it's approximately twice that.

How is inactivity determined? Is it a week without logging in, a week without battling...? (using one week as an arbitrary time)

Zarel · Mar 18, 2014

phoopes said:
How is inactivity determined? Is it a week without logging in, a week without battling...? (using one week as an arbitrary time)

It's per day. i.e. You lose twice as much if you played 0 battles that day.
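
Putting the decay rule from the last few posts together, a rough Python sketch might look like the following. It assumes the decay scales linearly with every point above 1400 (rather than in whole steps of 100) and that "approximately twice" is exactly twice; neither detail is confirmed in this thread, and the function name is made up.

```python
def daily_decay(rating, battles_today, threshold=1400.0):
    """Points lost per day: 1 per 100 points above the threshold, doubled on idle days.

    Assumptions: linear proration between hundreds, and an exact factor of 2
    for days with zero battles.
    """
    if rating <= threshold:
        return 0.0
    base = (rating - threshold) / 100.0
    return base if battles_today > 0 else 2.0 * base


if __name__ == "__main__":
    print(daily_decay(1600, battles_today=3))  # active day
    print(daily_decay(1600, battles_today=0))  # no battles that day
```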

Zebstrika · Mar 21, 2014

Maybe I'm remembering wrong, but back when ACRE determined the ladder, I would see that at the top few spots of the ladder, nearly everyone had a GXE of about 95-100, and now with Elo people's GXE for all the top 500 seem to just be a number from about 70 to 90, regardless of whether you're in the top few (well, the very top 50 or so are 85-90, but after that it's a lot less consistent) or near the 500th spot.

Is that just a bug or something that got worked out with the change from Glicko2 to Glicko1?

Antar · Mar 21, 2014

Zebstrika, it's something that's been fixed. GXE/Glicko was woefully inaccurate before--it's better now?

Zebstrika · Mar 21, 2014

Yeah. I would like to think that no one beats an average ladderer 99% of the time. It makes me feel kind of pathetic :p

Antar · Mar 22, 2014

Zebstrika said:
beats an average ladderer 99% of the time.

Minor nitpick: GXE of 99 doesn't mean the player beats an average ladderer (who would be a player with a Glicko rating of 1500 and a GXE of 50) 99% of the time--it means the player wins 99% of all matches against randomly selected players on the ladder. Subtle distinction, but important.

Zebstrika · Mar 22, 2014

Antar said:
Minor nitpick: GXE of 99 doesn't mean the player beats an average ladderer (who would be a player with a Glicko rating of 1500 and a GXE of 50) 99% of the time--it means the player wins 99% of all matches against randomly selected players on the ladder. Subtle distinction, but important.

From what I read here it's supposed to be your chance of winning against a random opponent but that was too tedious to compute for an entire ladder, and the chance of beating an average, new opponent with a clean record was a very good approximation.

X-act said:
I then simulated 250 players, each having their own rating and deviation, and found the probability of every player beating every other player using the above formula, and averaged out the probabilities for every player. This provides the true rating for every player.

However, this is a strenuous effort to do, and hence I wanted to approximate this probability for every player using just his R and RD (not everyone else's as well). After considering various possibilities, it dawned on me that the probability of the player beating a 1500 rating, 350 deviation player (the rating and deviation of a player that has just joined the ladder) would provide a good approximation. When testing it out, it did provide a good approximation of the true rating... a very, very good approximation actually!

The only time it didn't provide a good approximation was when the deviation of the player was high. This confirmed yet again that players that have a rating deviation that is too high (meaning that his rating is too uncertain) shouldn't even be listed on the leaderboard. And this is what I propose for the estimated rating to be done.

After consulting a bit with the community, it was decided that this system's rating should represent the estimated percentage that that player has of winning a battle against a random opponent.

Antar · Mar 22, 2014

Zebstrika, the average player has an RD much less than 130 (350 in that post), but the distribution of all players is centered at 1500 with a standard deviation of ~130.

Zebstrika · Mar 22, 2014

So both a new player and an average player have 1500 rank and different deviations, but the new player with 350 deviation could really have any skill (so that's how it simulates a random player) and we won't know until they play a few battles. And that's what the difference between a new 1500+350 player and average 1500+130 is?

Antar · Mar 22, 2014

Okay, you've got that mostly right, but there is no 350. We assumed for the longest time that the distribution of ratings had a stddev of 350. It does not. It's 130. (Approximately)
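
To make the 350-versus-130 point concrete, here is a hedged Python sketch of the approximation X-act describes: the chance of beating a reference player rated 1500 with a given deviation. The formula X-act refers to isn't reproduced in this thread, so the sketch uses Glickman's published expected-outcome formula for two Glicko players, which is an assumption. The function names are hypothetical, and this is not necessarily how PS computes GXE.

```python
import math

Q = math.log(10) / 400.0  # Glicko scaling constant


def g(rd):
    """Glicko attenuation factor: shrinks the rating gap as uncertainty grows."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def win_probability(r, rd, opp_r, opp_rd):
    """Expected outcome between two Glicko players (Glickman's prediction formula)."""
    combined_rd = math.sqrt(rd ** 2 + opp_rd ** 2)
    return 1.0 / (1.0 + 10.0 ** (-g(combined_rd) * (r - opp_r) / 400.0))


def gxe_estimate(r, rd, ladder_sd=130.0):
    """Percent chance of beating a 'random' opponent, modeled as 1500 +/- ladder_sd."""
    return 100.0 * win_probability(r, rd, 1500.0, ladder_sd)


if __name__ == "__main__":
    # Same player, scored against the old (350) and corrected (~130) spread.
    print(gxe_estimate(1800, 50, ladder_sd=350.0))
    print(gxe_estimate(1800, 50, ladder_sd=130.0))
```

Under this sketch, the same above-average rating produces a higher estimate when the ladder's spread is taken to be 130 instead of 350, because a given rating gap counts for more when the distribution is narrower.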
