Rating System For Competitor

Discussion in 'Site Projects' started by X-Act, Aug 7, 2008.

  1. tennisace

    tennisace brock you like a hurricane
    Twitter Head

    Joined:
    Dec 16, 2007
    Messages:
    6,706
    There are two scenarios where an 1150/60 Player (Player A) could beat a 1950/60 Player (Player B).

    Scenario 1: Player A is just an alt of a very good player, and the battle was fair, with little to no hax, and whatever hax happened really didn't make a difference because the two players were close in skill naturally.

    Scenario 2: Player A is a no-name who just started, and somehow managed to beat Player B, with copious amounts of hax.

    This is assuming Player B makes no mistakes in the course of the battle, of course, and is either out-played or haxed. There is no way to tell the probability of either happening really, since I assume due to the way the ladder is set up that it doesn't happen often (that is, players whose ratings are close together are paired up more often).
  2. X-Act

    X-Act np: Biffy Clyro - Shock Shock

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    Both scenarios you mention are impossible in this case. A deviation of 60 suggests that the player has played a good number of battles already. This eliminates Scenario 2 immediately. Scenario 1 would be impossible as well since a very good player that has already played his fair share of games wouldn't have a rating of 1150.

    By the way, I wrote the algorithm so that such a mismatch would never even happen. This is just for consideration's sake.

    My question is simply this: if a 1405/60 player wins against a 1695/60 player (this matchup would be allowed), is it more probable that it is due to luck than if a 1495/60 player wins against a 1505/60 one?
  3. X-Act

    X-Act np: Biffy Clyro - Shock Shock

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    It's me chiming in again with what I've been doing today.

    Basically I spent about 6 hours of today thinking about this, in the process coming up with 4 formulae, 2 of which I ditched because they worked badly.

    If you remember, I suggested that the rating system have a constant p that makes the rating change less drastically because of luck-based wins, and asked the community about such a value of p. I managed to incorporate this p in the formula for the rating change. The new rating with p will be called Glicko with Constant Weighting.

    Later, TAY raised the point that if a player beats another player that is much higher-ranked than him, it is more probable that his win was due to luck. Hence I tried my hand at making Glicko with Variable Weighting. The problem was about how to make the weighting change, and that's why I came up with 4 formulae: each formula corresponds to a different method of varying the weighting. After testing them, it was apparent that 2 of them were counter-intuitive so I'm left with two.

    One of them, which I'll call Glicko with Linear Weighting, assumes the following. Say that PW is the probability of Player 1 winning against Player 2 on merit. This can be calculated from both players' ratings and deviations by a formula invented by Glickman. Also suppose W is 1 if Player 1 won, and 0 if Player 2 won. Then the absolute value of (PW-W), which we shall call x, measures how wrong we were in our PW prediction. If x = 0, then the player was either sure to win according to PW and won, or sure to lose according to PW and lost. If x = 1, the player was either sure to win according to PW and lost, or sure to lose according to PW and won. Needless to say, in practice x is always a number strictly between 0 and 1.

    Given this x, Glicko with Linear Weighting assumes that

    If x = 0 then P = 1
    If x = 0.5 then P = p
    If x = 1 then P = 2p-1

    where p is the constant discussed before.

    The other formula, which I'll call Glicko with Quadratic Weighting, assumes that

    If x = 0 then P = 1
    If x = 0.5 then P = p
    If x = 1 then P = 3p-2

    i.e. it makes the rating change even less in the case of an unexpected result than for the Linear Weighting one.

    I then tested a hypothetical Player 1 having rating 1500 and deviation 100 playing against Player 2 having deviation 100 and the following ratings, assuming that he wins in all cases. Here are the updated ratings of Player 1 using the original Glicko, Glicko with Constant Weighting, Glicko with Linear Weighting and Glicko with Quadratic Weighting, assuming p = 0.9375:

    Code:
    Player 2 Rating                         New Player 1 Rating using
                      Normal Glicko   Constant Weighting   Linear Weighting   Quadratic Weighting
    ---------------------------------------------------------------------------------------------
          600              1500           1500  (=)           1500  (=)            1500  (=)
          650              1501           1500 (-1)           1501  (=)            1501  (=)
          700              1501           1501  (=)           1501  (=)            1501  (=)
          750              1501           1501  (=)           1501  (=)            1501  (=)
          800              1501           1501  (=)           1501  (=)            1501  (=)
          850              1501           1501  (=)           1501  (=)            1501  (=)
          900              1502           1502  (=)           1502  (=)            1502  (=)
          950              1503           1502 (-1)           1502 (-1)            1503  (=)
         1000              1503           1503  (=)           1503  (=)            1503  (=)
         1050              1504           1504  (=)           1504  (=)            1504  (=)
         1100              1505           1505  (=)           1505  (=)            1505  (=)
         1150              1507           1506 (-1)           1507  (=)            1507  (=)
         1200              1509           1507 (-2)           1508 (-1)            1508 (-1)
         1250              1511           1509 (-2)           1510 (-1)            1510 (-1)
         1300              1513           1511 (-2)           1512 (-1)            1512 (-1)
         1350              1516           1514 (-2)           1515 (-1)            1515 (-1)
         1400              1519           1516 (-3)           1517 (-2)            1517 (-2)
         1450              1522           1519 (-3)           1520 (-2)            1520 (-2)
         1500              1526           1522 (-4)           1522 (-4)            1522 (-4)
         1550              1529           1525 (-4)           1525 (-4)            1525 (-4)
         1600              1533           1528 (-5)           1527 (-6)            1527 (-6)
         1650              1536           1531 (-5)           1530 (-6)            1529 (-7)
         1700              1539           1534 (-5)           1532 (-7)            1530 (-9)
         1750              1542           1537 (-5)           1534 (-8)            1531 (-11)
         1800              1544           1539 (-5)           1535 (-9)            1532 (-12)
         1850              1546           1541 (-5)           1536 (-10)           1533 (-13)
         1900              1548           1542 (-6)           1537 (-11)           1533 (-14)
         1950              1550           1543 (-7)           1538 (-12)           1534 (-16)
         2000              1551           1544 (-7)           1539 (-12)           1534 (-17)
         2050              1552           1545 (-7)           1539 (-13)           1534 (-18)
         2100              1552           1546 (-6)           1540 (-12)           1534 (-18)
         2150              1553           1546 (-7)           1540 (-13)           1534 (-19)
         2200              1553           1547 (-6)           1540 (-13)           1534 (-19)
         2250              1554           1547 (-7)           1541 (-13)           1534 (-20)
         2300              1554           1547 (-7)           1541 (-13)           1534 (-20)
         2350              1554           1547 (-7)           1541 (-13)           1534 (-20)
         2400              1554           1548 (-6)           1541 (-13)           1534 (-20)
    (Glicko with Constant Weighting makes P = p no matter what x is, while normal Glicko makes P = 1 no matter what x is.)

    Notice that when Player 1 beats a player having a rating lower than his, none of the three weightings changes the original Glicko rating by much. However, when he beats a player having a rating higher than his, the difference from the original Glicko rating ranges from a little (Constant Weighting) to moderate (Linear Weighting) to a lot (Quadratic Weighting).
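For reference, the "Normal Glicko" column appears to be reproducible with Glickman's standard single-game Glicko-1 update. The sketch below is a minimal illustration under that assumption. The function names `g` and `glicko_update` are made up for this example, and it deliberately leaves out the weighted variants, since the way P enters the rating change is not spelled out in this post.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant q


def g(rd):
    """Glicko-1 attenuation factor for an opponent with deviation rd."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def glicko_update(r, rd, r_opp, rd_opp, score):
    """New rating after a single game (score: 1 = win, 0 = loss).

    Assumption: plain Glicko-1 with a one-game rating period.
    """
    g_opp = g(rd_opp)
    expected = 1 / (1 + 10 ** (-g_opp * (r - r_opp) / 400))
    inv_d2 = Q**2 * g_opp**2 * expected * (1 - expected)
    return r + Q / (1 / rd**2 + inv_d2) * g_opp * (score - expected)


# Player 1 (1500/100) beats opponents with deviation 100, as in the table.
for opp in (600, 1500, 2400):
    print(opp, round(glicko_update(1500, 100, opp, 100, 1)))
```

For opponents rated 600, 1500 and 2400 this gives 1500, 1526 and 1554 after rounding, which agrees with the corresponding rows of the "Normal Glicko" column.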

    I'd like to hear your opinions on all the above. And sorry for the longish post. :(
  4. X-Act

    X-Act np: Biffy Clyro - Shock Shock

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    No opinions? Well, that makes it a bit easier. :)

    Seriously though, if I don't get any input on this by tomorrow, I'll just decide what is best myself.
  5. Hipmonlee

    Hipmonlee Have a rice day

    Joined:
    Dec 19, 2004
    Messages:
    7,327
    I don't really know if the concept of meritorious wins is relevant when everything is derived from probabilities.

    If this is nothing more than saying that ratings won't move as much as they have done in the past, to me that seems sorta relative anyway. If you lose less when you lose, but gain more when you win, does it really make any difference?

    I can't really follow the mathematics of it, but it just doesn't quite sound right to me.. I dunno..

    Have a nice day.
  6. X-Act

    X-Act np: Biffy Clyro - Shock Shock

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    I suspected that the lack of response was due to people not understanding what I've done rather than lack of interest.

    Actually, Hipmonlee, you don't lose less when you lose and gain more when you win.

    Every game has one out of the following four possibilities:
    1. Player 1 is a more skilled player than Player 2. Player 1 wins against Player 2.
    2. Player 1 is a more skilled player than Player 2. Player 1 loses against Player 2.
    3. Players 1 and 2 are equally skilled players. Player 1 wins against Player 2.
    4. Players 1 and 2 are equally skilled players. Player 1 loses against Player 2.
    In scenario 1, luck probably did not decide the outcome of the battle, since the more skilled player won. In scenario 2, however, luck has a good probability of having decided the outcome, since the less skilled player won. In scenarios 3 and 4, it is more probable that luck has decided the battle than in scenario 1, but less probable that it decided the match than in scenario 2.

    In view of this, since luck was probably not a factor, the rating should change roughly as Glicko says in Scenario 1. In Scenarios 3 and 4, the rating should change slightly less drastically than Glicko since luck might have played a part. In Scenario 2, the rating should change even less drastically than in Scenarios 3 and 4 since luck has more of a chance of having played a part in the win.

    The above paragraph is how linear weighting and quadratic weighting work. The only difference between the two is in Scenario 2: quadratic weighting would make the rating change even less drastically than linear weighting.

    Constant weighting assumes that luck is constant no matter who plays. (Glicko assumes that there is no luck in games, which is a special case of Constant weighting.) It assumes that luck played the same part both in battles where a skillful player wins against a less skillful one and also in others where an unskillful player wins against a more skillful one. In my opinion this is not true, and hence my choice would be between Linear Weighting and Quadratic Weighting.

    Look at the chart in my previous post to have an idea by how much the ratings change in Glicko, Constant weighting, Linear weighting and Quadratic weighting.

    So if you tend to get all the four scenarios above equally frequently, your final rating won't change by much between Constant, Linear and Quadratic weighting, and that is probably what you meant, Hipmonlee. But if you tend to get one scenario more than the others, the rating for all of those would be different.

    So yeah, choice time.
  7. X-Act

    X-Act np: Biffy Clyro - Shock Shock

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    I'm obviously going to use GLIXARE for this, to provide a very good estimate of the player's skill.

    About the previous post, yesterday me and Caelum kinda agreed that Quadratic Weighting is the best way of weighting luck. The only problem that is left is the choice of a suitable value of p.
  8. Caelum

    Caelum qibz official stalker

    Joined:
    Apr 5, 2008
    Messages:
    1,656
    Poor X-Act and his signature topic getting all the buzz =(.

    Since we both agreed that quadratic was best (hey guys, if you don't say anything others decide for you ^__^) we need a suitable value for p.

    I'm going to try different values of p in various simulations and what not and report back results based on that (and hopefully I can use my experience on shoddy to help interpret the results). I'll post back with the data and my reasoning for a choice when I finish that (probably the next two days).

    I just wanted to know, X-Act (or anyone else that's interested), do we agree that the value should be greater than p=0.9 (I personally find that much too low myself but just asking)? After some initial experimentation with everything I also believe that p=0.95xxx is too high (although I'll look into it more). So do you think we should stick to a guideline around there (which would incidentally make your random choice of p actually not half bad!)?
  9. TAY

    TAY You and I Know

    Joined:
    Nov 7, 2007
    Messages:
    1,542
    Quadratic is more helpful for better players, so I would assume that most of the people with IS access would be in favor of it ^__^

    The value of p should be at least .9300. I mentioned before that I thought that maybe 10% of my games were decided by luck; however, upon reconsideration I think it should have been a lower value. Additionally, I think that the value of p should take into account a player's ability to prepare for bad luck via team building and smart playing. That has always been an integral part of the game, and I feel that .9 is a low enough value that it makes that skill significantly less relevant.

    So I would support anything between .9300 and .9500 (No one can compensate for luck that much!).
  10. X-Act

    X-Act np: Biffy Clyro - Shock Shock

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    I agree that p is probably between 0.9 and 0.95.

    0.95 seems to be too high, while anything under 0.9 seems to be too low. Even anything less than 0.92 seems too low to me, but I don't know.

    But if I get empirical evidence to the contrary I'd change my views.
  11. X-Act

    X-Act np: Biffy Clyro - Shock Shock

    Joined:
    Feb 17, 2006
    Messages:
    4,675
    I forgot to provide the formula that gives the probability that a person won on merit assumed by Quadratic Weighting. I'm rectifying that in this post.

    First of all, let P_Win be the probability that a person wins against the opponent according to their ratings and deviations. This is provided by Glickman. For posterity, it is repeated here. Given Player 1's Rating R_1 and Deviation RD_1 and Player 2's Rating R_2 and Deviation RD_2, the expected probability that Player 1 beats Player 2 is:

    Code:
    P_Win = 1 / (1 + 10^(((R_2 - R_1) / (400 * sqrt(1 + C * (RD_1^2 + RD_2^2))))))
     
    where C = 3 * ln(10)^2 / (400 * pi)^2 (approximately 0.0000100724)
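As a quick check of the formula above, here is a minimal Python sketch of P_Win. The function name `p_win` is just an illustrative choice. The two matchups evaluated are the ones asked about earlier in the thread (1405/60 vs 1695/60 and 1495/60 vs 1505/60).

```python
import math

# C = 3 * ln(10)^2 / (400 * pi)^2, as defined in the post.
C = 3 * math.log(10) ** 2 / (400 * math.pi) ** 2


def p_win(r1, rd1, r2, rd2):
    """Expected probability that Player 1 beats Player 2."""
    scale = 400 * math.sqrt(1 + C * (rd1**2 + rd2**2))
    return 1 / (1 + 10 ** ((r2 - r1) / scale))


print(f"C = {C:.10f}")
print(f"1405/60 vs 1695/60: {p_win(1405, 60, 1695, 60):.3f}")
print(f"1495/60 vs 1505/60: {p_win(1495, 60, 1505, 60):.3f}")
```

With these inputs the lower-rated player is expected to win roughly 17% of the time in the first matchup and roughly 49% of the time in the second.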
    Now let W be equal to 1 if Player 1 won against Player 2, and 0 if Player 1 lost against Player 2, and let X be the difference between W and P_Win. That is:
    Code:
    X = abs(P_Win - W)
    The nearer X is to 0, the more accurate Glickman's prediction of Player 1's result against Player 2 was. On the other hand, the nearer X is to 1, the less accurate Glickman's prediction was. Hence, the nearer X is to 0, the smaller the chance that the result was due to luck. Quadratic Weighting assumes that the chance that the winner won on merit (i.e. not due to luck) is:
    Code:
    P_Merit = 1 - (1-p) * X * (1+2*X)
    Note that p in the above equation is the constant we're looking for. It is actually the probability that a player wins on merit against another equally skilled player. In this particular scenario, i.e. when two equally-skilled players play each other, P_Win = 0.5, so X is equal to 0.5 whichever player wins. In fact, when X = 0.5, P_Merit = p.

    By comparison, Linear Weighting's formula was
    Code:
    P_Merit = 1 - (1-p) * X * 2
    i.e. it replaces the (1+2*X) for Quadratic Weighting with 2. Here, it is also the case that when X = 0.5, P_Merit = p.

    Constant Weighting's formula was simply P_Merit = p. Here it is assumed that the probability that Player 1 won on merit against Player 2 is constant, no matter what the relative skill of both players is.
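The three P_Merit formulas can be compared side by side with a short Python sketch. The function names are made up for illustration, and p = 0.9375 is simply the value used for the table earlier in the thread, not a final choice.

```python
def x_error(p_win, won):
    """X = abs(P_Win - W), where W is 1 for a win and 0 for a loss."""
    return abs(p_win - (1 if won else 0))


def merit_constant(x, p):
    """Constant Weighting: P_Merit does not depend on X."""
    return p


def merit_linear(x, p):
    """Linear Weighting: P_Merit = 1 - (1-p) * X * 2."""
    return 1 - (1 - p) * x * 2


def merit_quadratic(x, p):
    """Quadratic Weighting: P_Merit = 1 - (1-p) * X * (1+2*X)."""
    return 1 - (1 - p) * x * (1 + 2 * x)


p = 0.9375  # value used in the earlier table; an assumption here
print("   X   constant  linear  quadratic")
for x in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"{x:4.1f}   {merit_constant(x, p):.4f}   "
          f"{merit_linear(x, p):.4f}   {merit_quadratic(x, p):.4f}")
```

At X = 0, 0.5 and 1 the linear formula gives 1, p and 2p-1, and the quadratic formula gives 1, p and 3p-2, matching the values listed in the earlier post. Between 0 and 0.5 the quadratic formula sits slightly above the linear one, and above 0.5 it drops below it.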

    As a last note, for those that want to find a value of p by empirical means, the important thing is that you play against people of roughly the same skill as you, otherwise the value of p that you find won't be good.