#1
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
As you probably noticed from some of my posts, I really don't like ShoddyBattle's 'Conservative Rating Estimate' (CRE) method of providing the overall rating of a player. For the record, the formula used to calculate the CRE of a player is:
Code:
CRE = Rating - 4 * Deviation
Because of this, I set out to find a better way of computing a player's overall rating from his Rating R and Deviation RD... and I managed to do this yesterday. I read the paper by Glickman (the inventor of the Glicko and Glicko-2 rating systems), in which he provides an equation that calculates the probability that a player with rating R_1 and deviation RD_1 beats another player with rating R_2 and deviation RD_2. It is written below:
Code:
Probability = 1 / (1 + 10^(((R_2 - R_1) / (400 * sqrt(1 + C * (RD_1^2 + RD_2^2))))))
where C = 3 * ln(10)^2 / (400 * pi)^2 (approximately 0.0000100724)
pi = 3.14159265359
sqrt(x) is the square root of x
ln(x) is the natural logarithm of x
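For anyone who wants to try the equation out, here is a minimal Python sketch of it. The function name `win_probability` and the constant name `C` are only illustrative choices made for this example; the arithmetic is a direct transcription of the formula quoted above.

```python
import math

# C = 3 * ln(10)^2 / (400 * pi)^2, approximately 0.0000100724
C = 3 * math.log(10) ** 2 / (400 * math.pi) ** 2


def win_probability(r1: float, rd1: float, r2: float, rd2: float) -> float:
    """Probability that player 1 (r1, rd1) beats player 2 (r2, rd2)."""
    scale = 400 * math.sqrt(1 + C * (rd1 ** 2 + rd2 ** 2))
    return 1 / (1 + 10 ** ((r2 - r1) / scale))


# Two players with identical rating and deviation are an even match.
print(win_probability(1500, 350, 1500, 350))  # 0.5

# A strong, well-established player against a brand-new 1500/350 player.
print(win_probability(1991.408347, 52.4913089, 1500, 350))
```

Swapping the two players gives the complementary probability, so the two results always add up to 1.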
However, computing this against every opponent is strenuous, and hence I wanted to approximate this probability for every player using just his own R and RD (not everyone else's as well). After considering various possibilities, it dawned on me that the probability of the player beating a 1500 rating, 350 deviation player (the rating and deviation of a player that has just joined the ladder) would provide a good approximation. When I tested it, it did provide a good approximation of the true rating... a very, very good approximation actually! The only time it didn't was when the deviation of the player was high. This confirmed yet again that players whose rating deviation is too high (meaning that their rating is too uncertain) shouldn't even be listed on the leaderboard, and this is what I propose for the estimated rating.

After consulting a bit with the community, it was decided that this system's rating should represent the estimated percentage chance that the player has of winning a battle against a random opponent.

So, finally, here is what I propose to be a much better estimate of the player's rating. I'm calling it GLIXARE, short for 'Glicko - X-Act Rating Estimate':
Code:
Given a player rating R and a rating deviation RD:
GLIXARE Rating = 0, if RD > 100
GLIXARE Rating = round(10000 / (1 + 10^(((1500 - R) * pi / sqrt(3 * ln(10)^2 * RD^2 + 2500 * (64 * pi^2 + 147 * ln(10)^2)))))) / 100, otherwise
Code:
Rank  Rating       Deviation    True Rating  CRE          Rank For CRE  GLIXARE  Rank For GLIXARE
1     1991.408347  52.4913089   86.25%       1781.443112  1 (=)         86.77%   1 (=)
2     1992.528854  68.34131414  86.21%       1719.163597  13 (+11)      86.73%   2 (=)
3     1989.461831  80.60840798  85.95%       1667.028199  18 (+15)      86.51%   3 (=)
4     1969.23615   50.08961382  85.08%       1768.877695  2 (-2)        85.78%   4 (=)
5     1968.509612  50.00035476  85.04%       1768.508193  3 (-2)        85.75%   5 (=)
6     1972.675135  99.88585936  84.88%       1573.131698  34 (+28)      85.58%   6 (=)
7     1963.494349  50.60336877  84.76%       1761.080874  5 (-2)        85.51%   7 (=)
8     1962.47472   50.03981759  84.71%       1762.31545   4 (-4)        85.46%   8 (=)
9     1953.22279   52.20270923  84.18%       1744.411953  6 (-3)        85.00%   9 (=)
10    1958.445485  96.63628542  84.13%       1571.900343  36 (+26)      84.94%   10 (=)
11    1943.00504   50.10101356  83.60%       1742.600986  7 (-4)        84.51%   11 (=)
12    1941.259317  52.64803719  83.48%       1730.667168  10 (-2)       84.41%   12 (=)
13    1938.159617  50.92018226  83.31%       1734.478888  8 (-5)        84.26%   13 (=)
14    1936.183665  51.14032287  83.19%       1731.622373  9 (-5)        84.16%   14 (=)
15    1930.057618  50.16740178  82.84%       1729.388011  11 (-4)       83.85%   15 (=)
16    1927.779647  51.0262639   82.70%       1723.674591  12 (-4)       83.73%   16 (=)
17    1922.220146  57.65787498  82.32%       1691.588646  14 (-3)       83.40%   17 (=)
18    1918.558883  74.43201955  81.99%       1620.830805  28 (+10)      83.09%   18 (=)
19    1908.558873  59.43764926  81.47%       1670.808276  15 (-4)       82.65%   19 (=)
20    1898.317729  81.4379487   80.68%       1572.565934  35 (+15)      81.93%   20 (=)
21    1876.359618  52.21151457  79.44%       1667.513559  17 (-4)       80.86%   21 (=)
22    1870.366607  50.00313579  79.06%       1670.354064  16 (-6)       80.51%   22 (=)
23    1867.964646  51.3491295   78.89%       1662.568128  21 (-2)       80.36%   23 (=)
24    1867.766236  50.56176086  78.88%       1665.519192  19 (-5)       80.35%   24 (=)
25    1863.669023  50.11759802  78.61%       1663.198631  20 (-5)       80.10%   25 (=)
26    1866.541991  95.77888912  78.50%       1483.426435  48 (+22)      79.95%   26 (=)
27    1859.313589  55.68806985  78.28%       1636.561309  25 (-2)       79.81%   27 (=)
28    1854.389046  51.55994467  77.97%       1648.149268  22 (-6)       79.52%   28 (=)
29    1855.299562  72.27265208  77.92%       1566.208954  37 (+8)       79.46%   29 (=)
30    1853.503073  52.54886057  77.90%       1643.307631  23 (-7)       79.46%   30 (=)
31    1841.225405  50.60827664  77.06%       1638.792298  31 (=)        78.70%   31 (=)
32    1834.513128  50.14590546  76.59%       1633.929506  26 (-6)       78.26%   32 (=)
33    1834.145367  52.8071478   76.55%       1622.916775  27 (-6)       78.23%   33 (=)
34    1801.474785  50.04016895  74.21%       1601.314109  29 (-5)       76.04%   34 (=)
35    1804.267316  94.72400253  74.15%       1425.371306  55 (+20)      75.93%   35 (=)
36    1795.795989  50.01171801  73.78%       1595.749117  30 (-6)       75.64%   36 (=)
37    1793.496594  50.25237181  73.61%       1592.487107  31 (-6)       75.47%   37 (=)
38    1782.059218  51.06317634  72.75%       1577.806512  32 (-6)       74.65%   38 (=)
39    1775.843559  50.18930449  72.28%       1575.086341  33 (-6)       74.20%   39 (=)
40    1748.052482  56.82618848  70.11%       1520.747729  42 (+2)       72.08%   40 (=)
41    1748.543861  92.78899389  69.97%       1377.387886  67 (+26)      71.89%   41 (=)
42    1744.455748  50.99377091  69.85%       1540.480664  39 (-3)       71.83%   42 (=)
43    1743.084558  60.21302067  69.71%       1502.232476  44 (+1)       71.68%   43 (=)
44    1742.067022  50.19645562  69.66%       1541.2812    38 (-6)       71.65%   44 (=)
45    1740.620853  84.04522098  69.40%       1404.439969  58 (+13)      71.37%   46 (+1)
46    1738.536514  51.44593668  69.38%       1532.752767  40 (+6)       71.35%   45 (-1)
47    1728.468483  59.51691032  68.55%       1490.400841  45 (-2)       70.54%   47 (=)
48    1727.948725  50.16195168  68.54%       1527.300918  41 (-7)       70.54%   48 (=)
49    1708.596637  59.16778978  66.96%       1471.925478  50 (+1)       68.94%   49 (=)
50    1705.528967  50.05894259  66.73%       1505.293197  43 (-7)       68.72%   50 (=)
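The two estimates can be compared with a short Python sketch. This is only an illustration under stated assumptions: the function names `cre` and `glixare` and the constant `MAX_RD` are made up for this example, and the arithmetic follows the CRE and GLIXARE formulas given earlier in this post.

```python
import math

LN10_SQ = math.log(10) ** 2
MAX_RD = 100  # above this deviation the proposed GLIXARE rating is 0


def cre(rating: float, deviation: float) -> float:
    """Conservative Rating Estimate: Rating - 4 * Deviation."""
    return rating - 4 * deviation


def glixare(rating: float, deviation: float) -> float:
    """GLIXARE as a percentage with two decimal places (0 if RD > 100)."""
    if deviation > MAX_RD:
        return 0.0
    denom = math.sqrt(3 * LN10_SQ * deviation ** 2
                      + 2500 * (64 * math.pi ** 2 + 147 * LN10_SQ))
    p = 1 / (1 + 10 ** ((1500 - rating) * math.pi / denom))
    return round(10000 * p) / 100


# Player ranked 1 in the table: Rating 1991.408347, Deviation 52.4913089
print(glixare(1991.408347, 52.4913089))  # 86.77
print(cre(1991.408347, 52.4913089))      # about 1781.44

# A player whose deviation is above 100 gets 0 under the proposal.
print(glixare(1900, 120))                # 0.0
```

The GLIXARE expression is the win probability against a 1500 rating, 350 deviation player with the constants folded together, which is why only R and RD appear in it.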
__________________
http://users.smogon.com/X-Act For all your Pokemon needs (and more!) including: the Defensive EVs applet, the Probabilities of Breeding IVs in Pokemon applet, and the Ratings of Pokemon Base Stats applet (now Version 2.0!). And also the IV to PID applet!
Last edited by X-Act; Feb 13th, 2009 at 12:47:09 PM. Reason: Modification to display GLIXARE as a percentage

#2
qibz official stalker
Join Date: Apr 2008
Posts: 1,656
good question
Heh. I did this stupid competitive league thing in high school, and a few of my friends and I used the TrueSkill rating system (pretty similar in origin to Elo and Glicko) to devise a better version of the conservative skill estimate (the same thing as CRE, except that our league used 3 as the multiplier of the deviation, and it was called SKE). I was planning to dig it up and see if I could adapt it slightly for Glicko when I heard you were doing this; I didn't think you'd get it done so quickly! Anyway, that's my background story for obviously supporting the change. I just checked, and ours was similar in form (obviously not identical, since it was a different game and a slightly different system), so I'm going to assume you manipulated everything correctly, since it looks pretty similar and you are usually right anyway.

The only issue I have isn't a mathematical or factual one at all. I think many players will be quite upset to see their "rating" (honestly, people only look at the CRE, or GXE in the future I guess, when talking about rating) listed as 0 simply because their deviation is greater than 100. When this is implemented, I'd prefer that the rating be listed just like anyone else's, with leaderboard appearance restricted solely to those with a deviation less than or equal to 100. This is more of an aesthetic thing to please the user base than anything.

Edit: Also, GXE is cooler than GLIXARE. GXE = Glicko X-Act Estimate!

Last edited by Caelum; Feb 10th, 2009 at 9:18:59 AM.

#3
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
Oh, I was only suggesting we do this for the Smogon leaderboards, as I say at the end of the original post.
Also I don't mind a name change; if you prefer GXE, then so be it.

#4
qibz official stalker
Join Date: Apr 2008
Posts: 1,656
good question
Oh I see.
On Shoddy, when you want to see your record, the first line listed is your CRE. I was assuming that this rating was just going to replace that line. If you are just using it for leaderboard rankings, then I have no complaints.

#5
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
I would also like the CRE displayed in the program to change to GLIXARE (or GXE, whatever), but I believe that is at Colin's discretion.

#6
Join Date: Jun 2005
Posts: 4,905
Irvine, CA
You're the math guy; if you think this is a better option you'll get no objections from me.
__________________
Black/White Friend Code: 1721 2578 4968 My Pokemon | Free Pokemon | YouTube | Wonder Cards (now with Movie Celebis for Platinum and HG/SS!)

#7
coolcoolcool
Join Date: Dec 2005
Posts: 5,356
Plano, TX
That issue with Trolly is certainly a large one. I don't mind it at all, especially if it can help curb the number of alts people need to make. I'm sure Doug would appreciate the alts deal as well.

#8
:D
Moderator
Join Date: May 2008
Posts: 4,175
I love the idea of this, especially if it keeps the whole bazillion alts thing down.

#9
Knows the great enthusiasms
Administrator
Join Date: Jun 2007
Posts: 2,901
Houston, TX
I'm not in a position to validate your formulas, X-Act. However, I completely agree with your motivations and reasoning. Just eye-balling the results, combined with my intuitive recollection of past empirical data -- it looks to be a better fit for our needs.
Here's my proposal:
__________________
My Art Thread: ArtJustArt - The Art of DougJustDoug

#10
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
Actually Doug, GLIXARE can be used by all systems that are based on Glicko, including the current Glicko-2 system implemented by Shoddy. This is just a replacement of CRE, not a replacement of Glicko-2. This is just a better way of interpreting a player's current Rating and Deviation as a single rating than CRE.
Last edited by X-Act; Feb 11th, 2009 at 12:23:05 PM.

#11
:D
Moderator
Join Date: May 2008
Posts: 4,175
After looking at the data, it looks like the biggest issue is that CRE takes Deviation into account on a massive scale, because CRE is meant to be an average of how good a player is, and Deviation specifically means that the system can't pin down exactly how good the player is. I foresee that the main issue will be that it's a double-edged sword: while it makes it less necessary to have a bunch of alts, it also makes it easier for new alts to climb the ladder, because the rating is based less on deviation.
That's just my take on it, though. There's probably something in there I missed that makes it moot.

#12
Knows the great enthusiasms
Administrator
Join Date: Jun 2007
Posts: 2,901
Houston, TX
Quote:
Quote:

#13
:D
Moderator
Join Date: May 2008
Posts: 4,175
Oh, I did miss that. Thanks. It looks good, actually!

#14
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
I've made a very slight modification to the GLIXARE rating formula. Basically, I made it so that you know at a glance your playing strength as the probability of you winning a random battle.
I've now changed the rating so that it is a number between 0 and 2000. The probability of you winning a random battle is approximately (Rating / 20)%. This can be quickly calculated by halving the rating and then moving the decimal point one place to the left. For example, if your GLIXARE rating is 1743, the probability of you winning against a random opponent is approximately 1743 / 20, or 87.15%. This means that if your rating is more than 1000, you are better than average, and if it is less than 1000, you are worse than average. I'll edit the original post shortly.
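The conversion described in this post can be sketched in a few lines of Python. The function names here are hypothetical and chosen only for this example; the factor of 20 and the 1000 threshold come from the post itself.

```python
def to_scaled_rating(win_percentage: float) -> int:
    """Turn a win percentage (0 to 100) into the 0 to 2000 rating."""
    return round(win_percentage * 20)


def to_win_percentage(scaled_rating: int) -> float:
    """Recover the approximate win percentage from the 0 to 2000 rating."""
    return scaled_rating / 20


def is_better_than_average(scaled_rating: int) -> bool:
    """More than 1000 means a win chance above 50%."""
    return scaled_rating > 1000


print(to_win_percentage(1743))       # 87.15
print(to_scaled_rating(87.15))       # 1743
print(is_better_than_average(1743))  # True

# The mental shortcut: halve the rating, then shift the decimal point.
print((1743 / 2) / 10)               # 87.15
```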

#15
coolcoolcool
Join Date: Dec 2005
Posts: 5,356
Plano, TX
That is a nifty little nuance, and to be honest, it makes perfect sense for rating to be correlated to the chance that a player could win a battle where really only playing skill (and some luck) matters.

#16
Knows the great enthusiasms
Administrator
Join Date: Jun 2007
Posts: 2,901
Houston, TX
Quote:
We don't have to do it, but I think it might be a good idea.

#17
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
Well, in that case, we could just make the GLIXARE rating simply this percentage. I didn't want to do this because I don't like decimals when showing a rating. Whole numbers are much simpler. That's why I multiplied this percentage by 20 and rounded to the nearest whole number.
I could also have multiplied by any other number in theory. 20 seemed to be a good all-round number though. Numbers less than 20 would have made ratings too near (or equal) to each other (remember that the number is then rounded to the nearest whole number). Numbers greater than 20, on the other hand, produced ratings that were too big for my tastes. Also, 20 has the advantage that you can know immediately whether you're better than average or not just by counting the number of digits in your rating (4 digits = better than average, fewer than 4 digits = worse than average). You can also say "I'm over 1200 so I have more than a 60% chance of winning" or "I'm over 1500 so I have more than a 75% chance of winning", or something like that.
Another option would be to multiply the percentage by 100. That way you'd get a rating between 0 and 10000, and the percentage would be calculated very easily. For example, if your rating is 4598, you'd have a 45.98% chance of beating a random player. This would have the following advantage:
0 - 1000: Percentage of beating someone is between 0% and 10%
1000 - 2000: Percentage of beating someone is between 10% and 20%
2000 - 3000: Percentage of beating someone is between 20% and 30%
etc.
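To make the point about rounding concrete, here is a small Python sketch that scales two close percentages by different multipliers. The helper name `scaled` is hypothetical; the two percentages are the GLIXARE values of the players ranked 28 and 29 in the table in the first post, and the multiplier 10 is just one example of a number less than 20.

```python
def scaled(percent: float, multiplier: int) -> int:
    """Scale a win percentage by `multiplier` and round to a whole number."""
    return round(percent * multiplier)


# GLIXARE percentages of the players ranked 28 and 29 in the table.
a, b = 79.52, 79.46

for multiplier in (10, 20, 100):
    print(multiplier, scaled(a, multiplier), scaled(b, multiplier))
# 10 795 795
# 20 1590 1589
# 100 7952 7946
```

With a multiplier of 10 the two players collapse onto the same whole number, while 20 and 100 keep them apart.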
Last edited by X-Act; Feb 13th, 2009 at 5:46:43 AM.

#18
Knows the great enthusiasms
Administrator
Join Date: Jun 2007
Posts: 2,901
Houston, TX
I like the winning-chance percentage, even if it is a decimal, because it is easily explained to new users, and users will have an intuitive sense of the magnitude of the numbers.
A rating of 1723 is basically a "meaningless" four-digit number for most users. It only gains meaning when stacked up against other numbers, by looking at the leaderboard. Very few users will ever bother to find out what that number means, even if it is as simple as dividing by 20. Most users will assume the number is undecipherable by non-math-wizards, and they will go about their business.
But a rating of 87.15% almost guarantees that every player will ask a question of, "87.15% of what?". When they are told it represents their chance of winning against a random ladder opponent, that will be quickly understood and never forgotten. Users will tell other users and the "meaning" of the GLIXARE estimate would likely be disseminated quickly and understood by all.
Also, if someone is told their rating is 43.25%, for example -- there is an intuitive appreciation for the magnitude of that number, since the user implicitly knows that 100% is the upper limit. If I told them their rating is 825, they likely have no idea where that number falls in the range of possible values. The percentage has the benefit of "being a percentage", which means that it is part of the common parlance of numbers used by mere math mortals.
I think the percentage is a number more easily digested by lay users, and could be a distinguishing feature of the GLIXARE estimation system.

#19
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
I have no problem with displaying the GLIXARE rating as a percentage to two decimal places. Let me edit the OP.

#20
Administrator
Join Date: Jul 2007
Posts: 1,051
This looks good and since there are really no advantages to CRE there's no need for anything fancy to allow the client to use both CRE and this new measure. The client (i.e. the view rating dialogue box) might as well just be changed to use this new measure exclusively.
For display purposes, I'd recommend displaying a user with too high of a rating deviation as "provisional" or something similar, rather than showing a 0.
Last edited by Cathy; Feb 14th, 2009 at 11:12:57 PM.

#21
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
I agree with using the word 'provisional' instead of 0 too.
Both have the same effect, really: that of placing your rating at the bottom of the leaderboard. Another way, maybe, would be to display the word provisional, but then also display the rating that GLIXARE would give you if your RD weren't too high. This wouldn't be done on the leaderboard, but rather in the program itself. Something like:
Rating: Provisional (63.44%)
This would be done so that the player would know approximately how he's doing. Also recall that Glicko-2 increases the RD of a player who is not playing. If this becomes too high, his rating would become provisional, which, in my opinion, is a good thing.
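A minimal Python sketch of that display rule follows. The function name `format_rating` and its parameters are assumptions made for this example and are not taken from the Shoddy client; the 100 cutoff and the "Rating: Provisional (63.44%)" layout come from the thread.

```python
def format_rating(glixare_percent: float, deviation: float,
                  max_rd: float = 100) -> str:
    """Format a rating line for the program's own rating display.

    `glixare_percent` is the percentage GLIXARE would give without the
    RD cutoff; `max_rd` is the deviation above which the rating is
    treated as provisional.
    """
    if deviation > max_rd:
        return f"Rating: Provisional ({glixare_percent:.2f}%)"
    return f"Rating: {glixare_percent:.2f}%"


print(format_rating(63.44, 140))      # Rating: Provisional (63.44%)
print(format_rating(86.77, 52.4913))  # Rating: 86.77%
```

The leaderboard itself would still omit or sink provisional players; this only covers what an individual player sees.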

#22
coolcoolcool
Join Date: Dec 2005
Posts: 5,356
Plano, TX
A thought: it would be interesting to see how the results of, say, a randbat ladder would correspond with the probability percentage of actually winning a randbat.

#23
Knows the great enthusiasms
Administrator
Join Date: Jun 2007
Posts: 2,901
Houston, TX
One possible negative of using the percentage -- people may mistakenly think that it is their "Percentage of wins so far". I still think we should use the percentage, but I won't be surprised if new users jump to this conclusion.
I am changing the code for testing on the CAP server. Like Colin suggested, I plan to keep both CRE and GLIXARE for testing purposes. I totally agree with the proposal for "Provisional", and I think X-Act's suggestion for showing the user's provisional estimate is a good idea.

#24
Administrator
Join Date: Jul 2007
Posts: 1,051
I don't really like X-Act's proposed name for this measure and if it's in the client I'd rather it be called something more intuitive that describes what it actually is, perhaps just Percentage Rating Estimate.
Last edited by Cathy; Feb 20th, 2009 at 5:27:17 PM.

#25
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
I don't care about the name. Name it what the hell you like.
Its real definition is "the probability that you win against a player with a 1500 rating and 350 deviation". Such a player may be considered to be a random opponent, since a player with such a rating may be the best or worst player, or something in between. In fact, this probability estimate is very near the true percentage of winning against a random opponent. There, you now have hints on what to call it.