Don't panic: PS ratings have simply been rescaled

Discussion in 'Pokémon Showdown!' started by Antar, Jan 15, 2014.

  1. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,879
    Those of you who have already logged onto PS this morning might have noticed something new. On the ladder, players are ranked not by ACRE but by Elo rating.

    ACRE has always been problematic, and our implementation of Glicko2 has some bugs that have been leading to wildly inaccurate results. The Senior Staff is currently in the process of developing a new ladder framework that is separate from skill rating, but until that is done, we're defaulting the ladder to a tried and tested rating system: Elo, which is the same rating system that PO uses (we're not doing decay for now).

    You may have noticed that you don't gain points on the new ladder nearly as quickly as you used to. This is actually by design, as ratings have a mathematical meaning (a rating difference of 200 means that the higher rated player should win roughly 75% of the time). Before you freak out, just remember that everyone else is on the same ladder that you are. For now, all that's needed is a change in mentality: 1300 is roughly the new 2000. Don't worry--suspect reqs will be adjusted accordingly.
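The "200 points means roughly 75%" figure can be checked with a minimal sketch. It assumes PS uses the standard logistic Elo curve with a 400-point scale, which the post does not state explicitly; the function name `elo_expected` is made up for illustration.

```python
def elo_expected(rating: float, opponent: float) -> float:
    """Expected score (win probability) for `rating` against `opponent`.

    Assumption: the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


if __name__ == "__main__":
    # A 200-point favorite comes out at roughly three wins in four.
    print(elo_expected(1500, 1300))
    # Evenly matched players each get exactly one half.
    print(elo_expected(1300, 1300))
```

Under that assumption, a 200-point gap gives the higher-rated player an expected score of about 0.76, in line with the "roughly 75%" quoted above.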

    This is not open to discussion, but feel free to PM me if you have questions.

    If you want to know more about rating systems, read here.
    Last edited: Jan 15, 2014
    MooMoo82, Malleon, Magnemite and 6 others like this.
  2. Antar

    Antar
    Update: I understand people are experiencing some frustration with the ladder along the lines of "I gain practically no points from winning and lose a ton from losing."

    Yes. This is something that happens with Elo (and most practical rating systems, actually). Under our implementation, you should typically gain 25 points per win and lose 25 per loss, but the larger the discrepancy between your rating and your opponent's, the fewer points the higher-ranked player will gain from winning and the more points they will lose by losing. The formula is rather simple:

    Point change = 50 x (score - expected)

    where score is 1 for the winner and 0 for the loser, and "expected" is the probability of winning. So if you and your opponent have the same rating, then your probability of winning is 1/2, and you'll either gain or lose 25 points. On the other hand, if your rating predicts that you have a 90% chance of winning the match, you only stand to gain 5 points, and if you lose, you lose 45.
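The formula and both worked examples can be reproduced with a short sketch. The factor of 50 and the formula come from the post; the names `K` and `point_change` are hypothetical.

```python
K = 50  # update factor quoted in the post


def point_change(score: float, expected: float, k: float = K) -> float:
    """Point change = k * (score - expected).

    score is 1 for a win and 0 for a loss; expected is the
    pre-match probability of winning.
    """
    return k * (score - expected)


if __name__ == "__main__":
    # Evenly matched: gain or lose 25.
    print(point_change(1, 0.5), point_change(0, 0.5))
    # 90% favorite: gain about 5 on a win, lose about 45 on a loss.
    print(point_change(1, 0.9), point_change(0, 0.9))
```

The asymmetry for the favorite falls straight out of the formula: the closer "expected" is to 1, the less there is to gain and the more there is to lose.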

    The bottom line is that, with Elo, your rating fluctuates fairly wildly. This is one reason I personally prefer the slower-updating Glicko systems, but in many ways, it's a matter of personal preference.

    PS' matchmaking algorithm should ensure that you are--as often as possible--paired with players of similar rating (to keep the expected outcome as close to 1/2 as possible), but that's not always possible, depending on who's online.

    PO's rating system

    It's come to my attention that PO uses a slightly modified version of the above system, where that factor of "50" (which actually starts at "32") goes down once your rating passes a certain threshold. We may or may not go to the trouble of implementing something similar, since Elo is supposed to be a temporary stopgap while we work on a more permanent solution.
    DHR-107 likes this.
  3. Calm_Mind_Latias

    Calm_Mind_Latias

    Joined:
    Aug 20, 2013
    Messages:
    462
    Antar,

    Is there any equivalence table or simple formula to convert ACRE or Glicko2 ratings from the last system to the Glicko1 or Elo ratings of this system? What are the considerations of this system?

    Short question: do the elements of prediction and hax limit Elo in Pokemon? Is it possible to have a Magnus Carlsen or Garry Kasparov (Elo > 2800) given the nature and elements of Pokemon battling?
  4. Antar

    Antar
    Calm_Mind_Latias, I have moved your post to what I feel is a more appropriate thread (which was previously locked, so it's not like you *could* have posted it there).

    The problem with the last system was that it was mathematically flawed and "broken" (as evidenced by the rating numbers in the 3000+ range, which should be statistically impossible under Glicko/2). It was also based on an incorrect assumption that I will address in answering the next part of your question.

    With that in mind,
    • ACRE is strictly a function of Glicko rating: ACRE=R-1.4*RD, meaning you can, if you'd like, calculate your own ACRE under this new system
    • *Theoretically* your new Glicko rating should be the same as your old one once you've played enough battles. However, this only holds true if, on average, everyone else is playing more battles, and you don't have new players joining. In reality, the "age" of the ladder, as it were, doesn't change, with, like, 80% of players on the ladder having only played a single game (and like 95% having played fewer than 5).
    • *Most likely* what you'll find is that your Glicko ratings are somewhat equivalent if you rescale, that is: R'=(R-1500)/350*130+1500. The short of this is that 1630 is the new 1850.
    • Zarel's still tweaking Elo, so a correspondence between Elo and anything else isn't really possible right now.
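The two formulas in the list above are easy to try yourself. This is a minimal sketch; the formulas are from the post, while the function names `acre` and `rescale_glicko` are made up for illustration.

```python
def acre(rating: float, rd: float) -> float:
    """ACRE = R - 1.4 * RD, where R is the Glicko rating and RD its deviation."""
    return rating - 1.4 * rd


def rescale_glicko(old_rating: float) -> float:
    """R' = (R - 1500) / 350 * 130 + 1500.

    Maps a rating from the old scale (assumed spread of 350)
    onto the new one (spread of about 130).
    """
    return (old_rating - 1500) / 350 * 130 + 1500


if __name__ == "__main__":
    print(rescale_glicko(1850))  # the "1630 is the new 1850" example
    print(acre(1630, 50))        # ACRE for a hypothetical player with RD 50
```

Note that the rescaling leaves 1500 fixed and only shrinks the distance from it, which is why ratings near the middle of the ladder barely move.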
    Elo and Glicko are perfectly capable of handling hax. The difference is reflected in the scaling of ratings. An Elo rating difference of 200 roughly means that the higher rated player has a 75% chance of winning. So if there is hax, the rating (Glicko as well as Elo) of a more skilled player won't be quite as high as it would have been in a game with no hax, since the higher-skill player can still lose to "bs hax." It's still theoretically possible to reach an Elo in the "master" range, but it's a lot harder. Also note that the equivalent master range wouldn't be as high, since I believe the initial rating under FIDE is higher than the 1000 we use in our implementation of Elo.
  5. cuteflounder

    cuteflounder

    Joined:
    Dec 21, 2013
    Messages:
    103
    Not sure if this is the right place to ask this but how is rating decay applied? My friend and I have both been laddering recently but after a short period of inactivity (~a few hours), my rating will decay by 4-5 points whereas my friend's rating doesn't decay even after close to a day of inactivity.
  6. Antar

    Antar
    cuteflounder, decay is, I believe, 1 point per day per 100 points above 1400. Is that right, Zarel?

    Frostfluff, it has nothing to do with your Glicko/GXE.
  7. Zarel

    Zarel Not a Yuyuko fan
    is a member of the Site Staff, is a Battle Server Administrator, is a Programmer, is a Pokemon Researcher, is an Administrator
    Creator of PS

    Joined:
    Aug 16, 2011
    Messages:
    3,572
    For active users, yes. For inactive users, it's approximately twice that.
  8. djanxo unchained

    djanxo unchained Junichi Masuda likes this!!
    is a Battle Server Moderator Alumnus

    Joined:
    Jul 9, 2011
    Messages:
    2,059
    How is inactivity determined? Is it a week without logging in, a week without battling...? (using one week as an arbitrary time)
  9. Zarel

    Zarel Not a Yuyuko fan
    It's per day, i.e., you lose twice as much if you played 0 battles that day.
    djanxo unchained likes this.
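Putting the decay answers above together gives the following rough sketch. It is not the server's code: the rule "1 point per day per 100 points above 1400" was stated as "I believe", the doubling is "approximately twice", and the sketch assumes no rounding and no decay at or below 1400. The name `daily_decay` is hypothetical.

```python
def daily_decay(rating: float, battles_today: int) -> float:
    """Points lost to decay for one day.

    Assumptions (not confirmed in the thread): decay is proportional
    with no rounding, and ratings at or below 1400 do not decay.
    """
    if rating <= 1400:
        return 0.0
    base = (rating - 1400) / 100.0  # 1 point per 100 points above 1400
    # Playing zero battles that day roughly doubles the loss.
    return base if battles_today > 0 else 2.0 * base


if __name__ == "__main__":
    print(daily_decay(1600, battles_today=3))  # active day
    print(daily_decay(1600, battles_today=0))  # inactive day
    print(daily_decay(1350, battles_today=0))  # below the threshold
```

Under these assumptions a 1600-rated player loses 2 points on a day they battle and 4 on a day they don't, which would explain why two players can see very different decay depending on their rating and activity.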
  10. Zebstrika

    Zebstrika

    Joined:
    Oct 3, 2010
    Messages:
    1,070
    Maybe I'm remembering wrong, but back when ACRE determined the ladder, nearly everyone in the top few spots had a GXE of about 95-100. Now, with Elo, GXEs across the top 500 seem to fall anywhere from about 70 to 90, regardless of whether you're in the top few or near the 500th spot (well, the very top 50 or so are 85-90, but after that it's a lot less consistent).

    Is that just a bug or something that got worked out with the change from Glicko2 to Glicko1?
  11. Antar

    Antar
    Zebstrika, it's something that's been fixed. GXE/Glicko was woefully inaccurate before--it's better now?
  12. Zebstrika

    Zebstrika

    Yeah. I would like to think that no one beats an average ladderer 99% of the time. It makes me feel kind of pathetic :p
  13. Antar

    Antar
    Minor nitpick: GXE of 99 doesn't mean the player beats an average ladderer (who would be a player with a Glicko rating of 1500 and a GXE of 50) 99% of the time--it means the player wins 99% of all matches against randomly selected players on the ladder. Subtle distinction, but important.
  14. Zebstrika

    Zebstrika

    From what I read here it's supposed to be your chance of winning against a random opponent but that was too tedious to compute for an entire ladder, and the chance of beating an average, new opponent with a clean record was a very good approximation.
    Last edited: Mar 22, 2014
  15. Antar

    Antar
    Zebstrika, the average player has an RD much less than 130 (350 in that post), but the distribution of all players is centered at 1500 with a standard deviation of ~130.
  16. Zebstrika

    Zebstrika

    So both a new player and an average player have 1500 rank and different deviations, but the new player with 350 deviation could really have any skill (so that's how it simulates a random player) and we won't know until they play a few battles. And that's what the difference between a new 1500+350 player and average 1500+130 is?
  17. Antar

    Antar
    Okay, you've got that mostly right, but there is no 350. We assumed for the longest time that the distribution of ratings had a stddev of 350. It does not. It's 130. (Approximately)
    Zebstrika likes this.
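To make the "random opponent" idea from the last few posts concrete, here is a rough sketch of a GXE-style estimate. It is not PS's actual formula: it uses the expected-outcome formula from Glicko-1 and, as an assumption, treats the spread of the ladder (mean 1500, standard deviation about 130, per the posts above) as if it were the opponent's rating deviation. The names `g` and `gxe_estimate` are chosen for illustration.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Glicko-1 attenuation factor: larger uncertainty flattens the curve."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def gxe_estimate(rating: float, rd: float,
                 pop_mean: float = 1500.0, pop_sd: float = 130.0) -> float:
    """Approximate percent chance of beating a randomly drawn ladder player.

    Assumption: the opponent is modeled as rating `pop_mean` with the
    population spread `pop_sd` standing in for their deviation.
    """
    combined = math.sqrt(rd**2 + pop_sd**2)
    expected = 1.0 / (1.0 + 10.0 ** (-g(combined) * (rating - pop_mean) / 400.0))
    return 100.0 * expected


if __name__ == "__main__":
    for r in (1500, 1700, 1900):
        print(r, round(gxe_estimate(r, rd=50), 1))
```

Swapping `pop_sd=130` for `350` in this sketch flattens the curve considerably, which illustrates why the choice of spread matters so much for the resulting number.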
