Suggested changes to Rating system in our new Smogon ladder

Discussion in 'Pokémon Policy' started by X-Act, Jul 5, 2008.

Thread Status:
Not open for further replies.
  1. X-Act

    X-Act np: Biffy Clyro - Shock Shock
    Site Staff Alumnus, Programmer Alumnus, Smogon IRC SOp Alumnus, Researcher Alumnus, CAP Contributor Alumnus, Tiering Contributor Alumnus, Contributor Alumnus, Administrator Alumnus

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    chaos asked for suggestions for a rating system for our Smogon ladder, and here are mine.

    Basically, I propose using the Glicko-2 system, the same one implemented on the Shoddy ladder, with a few modifications. Where R is the mean rating, RD is the rating deviation and v is the volatility of a player, the changes I suggest are the following:
    1. The Rating displayed to the player is just round(R), not R - 4*RD as used on Shoddy.
    2. The Rating of a player is not always shown, however. It is only shown if RD < 100; otherwise the player's Rating is provisional. This way, a new player would need to play between 20 and 25 games for his or her rating to become visible, which should hopefully deter players from creating multiple accounts.
    3. RD cannot drop below the threshold value of 60: if a player's RD becomes less than 60, it is set to 60. This lets the rating of a frequently-competing player keep changing at a reasonable pace instead of very slowly, which should help players keep playing with their current account.
    4. RD cannot go above the threshold value of 350: if it becomes greater than 350, it is set to 350. This is a very minor change, made so that a player's rating deviation never exceeds that of a beginning player, even if the player stops playing completely.
    5. If a player does not battle on a particular day, phi (which equals RD / 173.7178) becomes sqrt(phi^2 + 4*v^2) instead of sqrt(phi^2 + v^2) as currently implemented, and the new RD is then the new phi * 173.7178. This change makes a frequently-competing player's rating (RD at the floor of 60) go provisional after about 14 consecutive days of inactivity, which should deter players from occupying the top of the ladder for a long time without playing. It also makes a player's rating as uncertain as that of a beginning player after about 9 months of inactivity (so if you don't play for 9 straight months, the ladder would consider you a noob even if you were #1 before you stopped.)
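The five rules above can be sketched as a few helper functions. This is a minimal Python sketch under stated assumptions, not Shoddy's code: the function and constant names are hypothetical, the per-game Glicko-2 update is left out, and the 60/350 clamp is assumed to be applied every time RD changes.

```python
import math

SCALE = 173.7178        # Glicko-2 conversion factor: phi = RD / SCALE
RD_MIN = 60.0           # rule 3: RD floor
RD_MAX = 350.0          # rule 4: RD ceiling (a beginning player's RD)
PROVISIONAL_RD = 100.0  # rule 2: ratings with RD >= 100 are provisional


def clamp_rd(rd: float) -> float:
    """Rules 3 and 4: keep RD within [60, 350]."""
    return min(max(rd, RD_MIN), RD_MAX)


def displayed_rating(r: float, rd: float):
    """Rules 1 and 2: show round(R), or None while the rating is provisional."""
    if rd >= PROVISIONAL_RD:
        return None
    return round(r)


def idle_day(rd: float, vol: float, factor: float = 4.0) -> float:
    """Rule 5: inflate RD for one day without battles.

    Standard Glicko-2 corresponds to factor = 1; the proposal uses 4.
    """
    phi = rd / SCALE
    phi = math.sqrt(phi * phi + factor * vol * vol)
    return clamp_rd(phi * SCALE)
```

For example, displayed_rating(1909.66, 55.60) gives 1910, while displayed_rating(2001.39, 234.74) gives None because an RD of 234.74 is still provisional.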
    I'd like some comments from players who participate on the ladder, to see whether the above points address what they believe are the shortcomings of the Shoddy ladder, and to hear suggestions for further improvement.
  2. Ancien Régime

    Ancien Régime capitalism delenda est
    Team Rater Alumnus, Battle Server Moderator Alumnus

    Joined:
    Jul 21, 2007
    Messages:
    1,453
    Obviously we talked about it on #insidescoop, but I agree 100% with these changes. I hated having to make new nicks and alts because my progress on the ladder was basically halted after a certain amount of time. I feel that a rating system that rewards consistency (or at least doesn't punish it) is the best way to go.
  3. Misty

    Misty oh
    Site Staff Alumnus, Battle Server Admin Alumnus, Programmer Alumnus, Smogon IRC SOp Alumnus, Researcher Alumnus, Contributor Alumnus, Administrator Alumnus

    Joined:
    Mar 8, 2005
    Messages:
    7,152
    all of this sounds excellent
  4. Great Sage

    Great Sage

    Joined:
    Jul 31, 2006
    Messages:
    6,666
    I also agree with all of these changes. The only part I have a slight objection to is the bolded part of number 5; 10 days is a bit short, IMO.
  5. X-Act

    Okay, so how many days do you suggest?
  6. darkie

    darkie mfw i see alison brie
    Member of the Site Staff, Smogon Social Media Contributor, Smogon IRC AOP, Super Moderator, CAP Contributor Alumnus, Contributor Alumnus, Smogon Media Contributor Alumnus, Battle Server Moderator Alumnus
    Public Relations

    Joined:
    Dec 25, 2005
    Messages:
    6,152
    14 days sounds good to me. Otherwise, everything else looks good, X-Act.
  7. X-Act

    Okay, I'll make it 14 days. It's a pretty simple fix; I just need to replace the '6' in the formula with '4'. :) As a result, the time taken to return to an RD of 350 is now 9 months, not 6 months.
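The arithmetic behind the 6-to-4 change can be checked directly. Assuming a volatility v of 0.06 (a common Glicko-2 starting value; a real player's v may differ, but it does not change on days without games) and a starting RD at the 60 floor, each idle day adds factor * v^2 to phi^2. The helper name below is hypothetical.

```python
import math

SCALE = 173.7178  # phi = RD / SCALE


def days_until(rd_start: float, rd_target: float, vol: float = 0.06, factor: float = 4.0) -> int:
    """Count idle days until RD reaches rd_target.

    Each idle day applies phi <- sqrt(phi^2 + factor * vol^2).
    """
    phi = rd_start / SCALE
    days = 0
    while phi * SCALE < rd_target:
        phi = math.sqrt(phi * phi + factor * vol * vol)
        days += 1
    return days


for factor in (6.0, 4.0):
    print(factor, days_until(60, 100, factor=factor), days_until(60, 350, factor=factor))
# 6.0 10 183
# 4.0 15 274
```

Under these assumptions, factor 6 gives about 10 days to provisional and about 6 months (183 days) to RD 350, while factor 4 gives about two weeks and about 9 months (274 days), matching the figures in the thread.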

    Just wanted to ask something. The Shoddy page says that the ladder system tries to match you with a player whose conservative rating estimate (CRE) is close to yours. The CRE is the infamous R - 4 x RD used by Colin to represent a rating. Since we're going to use just R to represent a player's rating, that part of the program should be changed so that the ladder searches for a player whose R is close to yours, not one whose CRE is.
  8. Aeolus

    Aeolus Bag
    Tutor Alumnus, Tournament Director Alumnus, Site Staff Alumnus, Battle Server Admin Alumnus, Smogon IRC SOp Alumnus, Tiering Contributor Alumnus, Contributor Alumnus, Administrator Alumnus

    Joined:
    Sep 12, 2005
    Messages:
    3,639
    looks great to me. Another thing I like about this is that comparing ratings on our server to player ratings on the Official Server will not be possible.
  9. Kumar

    Kumar
    Site Staff Alumnus, Forum Moderator Alumnus, Researcher Alumnus, Battle Server Moderator Alumnus

    Joined:
    Dec 19, 2004
    Messages:
    3,106
    sweet. this might get me laddering again. i hated it when my rating on shoddy got to like 1600 and never increased which made me quit shoddying :(
  10. david stone

    david stone Fast-moving, smart, sexy and alarming.
    Site Staff Alumnus, Smogon IRC AOp Alumnus, Programmer Alumnus, Super Moderator Alumnus, Researcher Alumnus, Contributor Alumnus, Battle Server Moderator Alumnus

    Joined:
    Aug 3, 2005
    Messages:
    5,150
    Unfortunately, I don't know how much someone would have to play to get their RD below 100, so it's possible that rule 2 makes my first point below a non-issue.

    The reason Shoddy uses the 4*RD part is that Glicko doesn't attempt to give you a single rating, but rather a range of values. Displaying just R is saying that the player has a 50% chance of having an actual skill level at or above that value. For new players, the rating range is rather large because Glicko isn't quite sure where they are. When Colin looked at the list sorted by R, nearly every player at the "top" was someone he and I had never heard of. Subtracting four deviations is saying "This player has a 99%+ chance of having this rating or higher," which has the effect of only including players whose ratings are more certain.
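The 50% and 99%+ figures follow from treating a player's skill as normally distributed with mean R and standard deviation RD, which is how Glicko models a rating. A minimal sketch (the function names are hypothetical):

```python
import math


def cre(r: float, rd: float, k: float = 4.0) -> float:
    """Conservative rating estimate as used on Shoddy: R - k * RD."""
    return r - k * rd


def prob_skill_at_least(x: float, r: float, rd: float) -> float:
    """P(true skill >= x) if skill is treated as Normal(mean=r, sd=rd)."""
    z = (x - r) / rd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


r, rd = 1909.66, 55.60
print(prob_skill_at_least(r, r, rd))           # 0.5
print(prob_skill_at_least(cre(r, rd), r, rd))  # about 0.99997
```

So "99%+" is, under this model, roughly 99.997% for four deviations.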

    As for rule change 5, that really gets to the heart of what the purpose of the ladder is. If the purpose is to create an environment in which people are trying to get to the top and then have to fight to maintain it, then yes, having more "rating decay" is good. If the purpose of the ladder is to rank players in terms of their skill, then the "rating decay" should be roughly equal to the loss of skill over time (so much, much lower than on the Official Server).

    As far as I can tell, in combination with what you proposed in 1. (the use of R over anything involving RD), this will give no "rating decay", so the only issue is keeping yourself from becoming provisional.

    How is this a good thing?
  11. Ancien Régime

    It avoids unnecessary arguing, in the sense of "well, my rating's better on Official/my rating's better on Smogon" or even "Official/Shoddy has better players", which I'm not sure we want to get into.
  12. X-Act

    It takes roughly 20 to 25 battles for your RD to drop below 100.
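How many battles it takes depends heavily on who you play. The sketch below is a simplified Glicko-2 loop, not Shoddy's implementation: it treats every game as its own rating period, assumes each opponent has the same mean rating as you (so the expected score stays 0.5), holds volatility fixed at 0.06, and skips the volatility and rating updates. The helper names and both opponent scenarios are hypothetical.

```python
import math

SCALE = 173.7178  # phi = RD / SCALE


def g(phi: float) -> float:
    """Glicko-2 weighting factor for an opponent with deviation phi."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def games_until_rd_below(threshold: float = 100.0, opp_rd: float = 60.0,
                         vol: float = 0.06, rd: float = 350.0) -> int:
    """Simplified: one game per rating period, equal ratings, fixed volatility."""
    phi = rd / SCALE
    g_j = g(opp_rd / SCALE)
    info = g_j * g_j * 0.25  # 1/v for one game when the expected score is 0.5
    games = 0
    while phi * SCALE >= threshold:
        phi_star_sq = phi * phi + vol * vol              # pre-period RD inflation
        phi = 1.0 / math.sqrt(1.0 / phi_star_sq + info)  # post-game deviation
        games += 1
    return games


# Established opponents (RD 60) versus brand-new opponents (RD 350)
print(games_until_rd_below(opp_rd=60.0), games_until_rd_below(opp_rd=350.0))
```

Games against players with a low RD carry more information, so RD falls faster against established opponents than against newcomers; a real ladder mixes both.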

    I know this, and this is why I'm making all ratings with an RD of 100 or more provisional. If RD is that large, the rating isn't reliable but extremely uncertain; hence, it's provisional. And yeah, I looked into that list that Colin made. All of the players who came up at the top that you 'did not know' would have had a provisional rating in this new system, so they would not appear on the ladder at all (or would appear at the bottom as 'provisional').

    Here is an old list that Colin posted to prove his point that R - 4 x RD is the way to go. I added each player's RD as the last column:

    Code:
    +-----------------+------------------+------------------+------+--------+
    | name            | mean             | CRE              | rank |   RD   |
    +-----------------+------------------+------------------+------+--------+
    | Riptor          | 2001.38839416590 | 1062.40892193233 | 3005 | 234.74 |
    | TAY             | 1986.88929284469 | 1539.31163028140 |   67 | 111.89 |
    | pokeboy         | 1936.96546149701 | 1286.74050683289 |  936 | 162.56 |
    | Cruel           | 1922.73953480444 | 1260.29075970672 | 1135 | 165.61 |
    | Dietrich        | 1922.46008104249 | 1474.02080852881 |  157 | 112.11 |
    | Astrohawke      | 1912.12921847407 | 1506.23953086525 |  113 | 101.47 |
    | goofball        | 1909.65642964082 | 1687.27542634793 |    2 |  55.60 | *
    | depom           | 1905.66564656608 | 1680.62135901767 |    3 |  56.26 | *
    | Ultimatehero124 | 1904.45073688254 | 1065.74445853246 | 2966 | 209.68 |
    | cfickle         | 1903.27813162474 | 1207.76685711231 | 1549 | 173.88 |
    | icepick         | 1892.06731203119 | 965.700557231403 | 3932 | 231.59 |
    | Cerberus.       | 1885.06702338470 | 1242.31276758645 | 1268 | 160.69 |
    | jrrrrrrr        | 1884.51791441567 | 1631.38600598089 |   14 |  63.28 | *
    | KingGarchomp    | 1878.47795320330 |  733.72568498120 | 5734 | 286.19 |
    | Slice-T_A       | 1878.02889092308 | 1624.60981701875 |   17 |  63.35 | *
    | goofballSKY     | 1873.45047086468 | 1640.56885806744 |   12 |  58.22 | *
    | goofballANGRY   | 1870.11539263519 | 1620.81293485045 |   19 |  62.33 | *
    | chansey_slayer  | 1857.58419165324 | 972.744433035372 | 3864 | 221.21 |
    | Swordzman       | 1856.33698216188 | 858.235378512444 | 4815 | 249.53 |
    | Infernape       | 1856.04245327237 | 672.297485107073 | 6170 | 295.94 |
    +-----------------+------------------+------------------+------+--------+

    In this new system, the only players in the above list who would be listed on the ladder are the ones marked with *. They would be listed as #1, #2, #3, etc. All the other players would have provisional ratings.
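The listing rule can be applied mechanically to the table. A minimal sketch, assuming the rule 2 threshold (RD < 100 is listed) and ranking listed players by R; the function name is hypothetical and the data are copied from the rows above.

```python
# (name, mean R, RD) copied from the table above
ROWS = [
    ("Riptor", 2001.38839416590, 234.74),
    ("TAY", 1986.88929284469, 111.89),
    ("pokeboy", 1936.96546149701, 162.56),
    ("Cruel", 1922.73953480444, 165.61),
    ("Dietrich", 1922.46008104249, 112.11),
    ("Astrohawke", 1912.12921847407, 101.47),
    ("goofball", 1909.65642964082, 55.60),
    ("depom", 1905.66564656608, 56.26),
    ("Ultimatehero124", 1904.45073688254, 209.68),
    ("cfickle", 1903.27813162474, 173.88),
    ("icepick", 1892.06731203119, 231.59),
    ("Cerberus.", 1885.06702338470, 160.69),
    ("jrrrrrrr", 1884.51791441567, 63.28),
    ("KingGarchomp", 1878.47795320330, 286.19),
    ("Slice-T_A", 1878.02889092308, 63.35),
    ("goofballSKY", 1873.45047086468, 58.22),
    ("goofballANGRY", 1870.11539263519, 62.33),
    ("chansey_slayer", 1857.58419165324, 221.21),
    ("Swordzman", 1856.33698216188, 249.53),
    ("Infernape", 1856.04245327237, 295.94),
]


def new_ladder(rows, provisional_rd=100.0):
    """Rank non-provisional players by R (shown as round(R)); the rest are provisional."""
    listed = sorted((row for row in rows if row[2] < provisional_rd),
                    key=lambda row: row[1], reverse=True)
    ranked = [(i + 1, name, round(mean)) for i, (name, mean, _) in enumerate(listed)]
    provisional = [name for name, _, rd in rows if rd >= provisional_rd]
    return ranked, provisional


ranked, provisional = new_ladder(ROWS)
for place, name, rating in ranked:
    print(f"#{place} {name} {rating}")
# #1 goofball 1910
# #2 depom 1906
# #3 jrrrrrrr 1885
# #4 Slice-T_A 1878
# #5 goofballSKY 1873
# #6 goofballANGRY 1870
```

The other 14 players, including TAY, Dietrich and Astrohawke with RDs just above 100, stay provisional.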

    There would be no rating decay in this system, and that's why I made RD increase faster. One could obtain a top 10 ranking and then stop playing, just watching his rating sit up there. With RD increasing faster, his rating would drop to provisional within 14 days (that is, if his RD is at the minimum of 60; if it is higher, he would become provisional even sooner).

    Exactly. And that's why I made the rating go provisional quicker than normal. At first I actually made it go provisional in one week, then in 10 days. Then people suggested 14 days, and I changed it to that.
  13. X-Act

    I did a simulation in Excel using the proposed rating system. Interestingly, the volatility increases dramatically when the two players in a game have mean ratings very far from each other. This happens in Shoddybattle because the opponent it finds for you is the one whose CRE is nearest to yours, not the one whose rating is nearest. I confirmed this by playing two games yesterday (with a crap team).

    When the opponent had a nearer mean rating (and, if possible, a similar RD as well), the volatility stayed much more stable.
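The direction of this effect can be checked with the volatility step of Glicko-2 (step 5 in Glickman's description, solved with the Illinois variant of regula falsi). This is a one-game sketch, not X-Act's spreadsheet or Shoddy's code: tau = 0.5 is an assumed system constant, the function names are hypothetical, and both scenarios are made up. An upset between players whose ratings are far apart pushes the volatility up, whereas an evenly matched game nudges it down; a single game moves it only slightly, and the size of the effect depends on tau and on how often such games happen.

```python
import math

SCALE = 173.7178  # phi = RD / SCALE, mu = (R - 1500) / SCALE
TAU = 0.5         # assumed system constant (Glickman suggests 0.3 to 1.2)


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def new_volatility(r, rd, sigma, r_opp, rd_opp, score, tau=TAU, eps=1e-6):
    """Glicko-2 volatility update for a single game (Illinois algorithm)."""
    mu, phi = (r - 1500.0) / SCALE, rd / SCALE
    mu_j, phi_j = (r_opp - 1500.0) / SCALE, rd_opp / SCALE
    g_j = g(phi_j)
    e = 1.0 / (1.0 + math.exp(-g_j * (mu - mu_j)))  # expected score
    v = 1.0 / (g_j * g_j * e * (1.0 - e))           # estimated variance
    delta = v * g_j * (score - e)                    # estimated improvement
    a = math.log(sigma * sigma)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta * delta - phi * phi - v - ex)
                / (2.0 * (phi * phi + v + ex) ** 2)) - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2.0
        B, f_b = C, f_c
    return math.exp(A / 2.0)


even = new_volatility(1500, 60, 0.06, 1500, 60, score=1.0)   # evenly matched win
upset = new_volatility(1500, 60, 0.06, 900, 60, score=0.0)   # loss to a player 600 points lower
print(f"even match: {even:.8f}, upset: {upset:.8f}")
```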

    I don't know if this is possible to implement, but I'd suggest that the opponent the ladder proposes for you be one with a similar mean rating and a similar RD to yours, not a similar CRE.
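The difference between the two matching rules can be sketched as a nearest-neighbour pick over the players currently waiting for a battle. This is a hypothetical illustration, not Shoddy's matchmaking code: the Player class, the function names, the Euclidean distance and rd_weight are all assumptions, and the two opponents are made-up examples.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    r: float   # mean rating R
    rd: float  # rating deviation RD

    @property
    def cre(self) -> float:
        return self.r - 4.0 * self.rd


def nearest_by_cre(me, pool):
    """Matching as described for Shoddy: closest conservative estimate."""
    return min(pool, key=lambda p: abs(p.cre - me.cre))


def nearest_by_r_and_rd(me, pool, rd_weight=1.0):
    """Suggested: closest mean rating and RD (rd_weight is an arbitrary choice)."""
    return min(pool, key=lambda p: math.hypot(p.r - me.r, rd_weight * (p.rd - me.rd)))


me = Player("me", 1500, 60)               # CRE 1260
pool = [Player("newcomer", 1900, 160),    # CRE 1260, but R is 400 higher
        Player("regular", 1530, 70)]      # CRE 1250, R only 30 higher
print(nearest_by_cre(me, pool).name)       # newcomer
print(nearest_by_r_and_rd(me, pool).name)  # regular
```

Matching on CRE pairs "me" with a player whose mean rating is 400 points away, which is the kind of lopsided game that the volatility simulation flags.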
  14. chaos

    chaos
    Member of the Site Staff, Battle Server Administrator, Programmer, Smogon IRC SOP, Contributor to Smogon, Administrator, Tournament Director Alumnus, Researcher Alumnus
    Owner

    Joined:
    Dec 18, 2004
    Messages:
    9,776
    i'm thinking it's best just to drop the ladder changes. too many people are freaking out, and until the bug is fixed in the shoddy client we can't do anything about it
  15. X-Act

    That's okay. I'll continue researching this so that hopefully Competitor will implement it.