As you probably noticed from some of my posts, I really don't like ShoddyBattle's 'Conservative Rating Estimate' (CRE) method of providing the overall rating of a player. For the record, the formula used to calculate the CRE of a player is:
 
Code:
	CRE = Rating - 4 * Deviation

The disadvantages of CRE are the following:

- Rating changes are too slow. You need to beat quite a lot of players before your rating moves noticeably, which pushes players towards using more alts.
- The higher a player's rating deviation, the more that player's true skill is underestimated.
- It gives horribly incorrect ratings for people whose rating deviation is very high. For an example, just visit this page.

It does have one advantage:

- It is simple to calculate.

Because of this, I set out to find a better way of computing a player's overall rating from his Rating R and Deviation RD... and I managed to do this yesterday.

I read the paper by Glickman (the inventor of the Glicko and Glicko-2 rating systems), in which he provides an equation that essentially calculates the probability that a player with rating R_1 and deviation RD_1 beats another player with rating R_2 and deviation RD_2. It is written below:

Code:
	Probability = 1 / (1 + 10^(((R_2 - R_1) / (400 * sqrt(1 + C * (RD_1^2 + RD_2^2))))))

	where C = 3 * ln(10)^2 / (400 * pi)^2 (approximately 0.0000100724)
	      pi = 3.14159265359
	      sqrt(x) is the square root of x
	      ln(x) is the natural logarithm of x

I then simulated 250 players, each with their own rating and deviation, found the probability of every player beating every other player using the above formula, and averaged those probabilities for each player. That average is the player's true rating.

However, doing this for a whole ladder is a lot of work, so I wanted to approximate this probability for each player using just his own R and RD (not everyone else's as well). After considering various possibilities, it dawned on me that the probability of the player beating a player with rating 1500 and deviation 350 (the rating and deviation of a player that has just joined the ladder) would provide a good approximation. When I tested it, it did provide a good approximation of the true rating... a very, very good approximation actually!
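To make the two ideas above concrete, here is a minimal Python sketch of Glickman's win-probability formula, the "average against everyone" calculation, and the shortcut of playing against a 1500/350 newcomer. The function names and the tiny three-player ladder are made up for illustration; they are not the 250 simulated players and not anything taken from ShoddyBattle's code.

```python
import math

# C = 3 * ln(10)^2 / (400 * pi)^2, roughly 0.0000100724
C = 3 * math.log(10) ** 2 / (400 * math.pi) ** 2

# Rating and deviation of a player who has just joined the ladder
NEWCOMER_R = 1500.0
NEWCOMER_RD = 350.0


def win_probability(r1, rd1, r2, rd2):
    """Probability that player 1 (r1, rd1) beats player 2 (r2, rd2)."""
    scale = 400 * math.sqrt(1 + C * (rd1 ** 2 + rd2 ** 2))
    return 1 / (1 + 10 ** ((r2 - r1) / scale))


def average_win_chance(index, players):
    """Mean probability of players[index] beating each of the other players."""
    r, rd = players[index]
    others = [p for j, p in enumerate(players) if j != index]
    return sum(win_probability(r, rd, r2, rd2) for r2, rd2 in others) / len(others)


def newcomer_estimate(r, rd):
    """Shortcut: win chance against a 1500/350 newcomer, using only r and rd."""
    return win_probability(r, rd, NEWCOMER_R, NEWCOMER_RD)


if __name__ == "__main__":
    # Hypothetical ladder of (rating, deviation) pairs, purely for illustration
    ladder = [(1500.0, 50.0), (1600.0, 50.0), (1400.0, 50.0)]
    print(average_win_chance(0, ladder))
    print(newcomer_estimate(1600.0, 50.0))
```

On a real ladder the list passed to `average_win_chance` would hold every listed player, which is exactly the work the newcomer shortcut avoids.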
The only time it didn't provide a good approximation was when the player's deviation was high. This confirmed yet again that players whose rating deviation is too high (meaning that their rating is too uncertain) shouldn't even be listed on the leaderboard, and that is how I propose the estimated rating should treat them.

After consulting a bit with the community, it was decided that this system's rating should represent the estimated percentage chance that the player has of winning a battle against a random opponent.

So, finally, here is what I propose as a much better estimate of a player's rating. I'm calling it GLIXARE, short for 'Glicko - X-Act Rating Estimate':
		Code:
	
	Given a player rating R and a rating deviation RD:
GLIXARE Rating = 0, if RD > 100
GLIXARE Rating = round(10000 / (1 + 10^(((1500 - R) * pi / sqrt(3 * ln(10)^2 * RD^2 + 2500 * (64 * pi^2 + 147 * ln(10)^2)))))) / 100, otherwise
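For anyone who wants to try it out, the formula above could be written in Python roughly as follows. This is only a sketch: the names `glixare` and `MAX_RD` are my own labels for this example, and Python's built-in `round` stands in for the `round` in the formula.

```python
import math

LN10_SQ = math.log(10) ** 2
MAX_RD = 100  # players with a deviation above this get a rating of 0


def glixare(r, rd):
    """GLIXARE rating, as a percentage with two decimals, for rating r and deviation rd."""
    if rd > MAX_RD:
        return 0.0
    denom = math.sqrt(3 * LN10_SQ * rd ** 2
                      + 2500 * (64 * math.pi ** 2 + 147 * LN10_SQ))
    # round(10000 * p) / 100 keeps two decimal places of the percentage
    return round(10000 / (1 + 10 ** ((1500 - r) * math.pi / denom))) / 100


if __name__ == "__main__":
    print(glixare(1991.408347, 52.4913089))  # 86.77
    print(glixare(1500, 50))                 # 50.0
    print(glixare(1800, 150))                # 0.0 (deviation too high)
```

The constants 64 and 147 come from substituting the newcomer's rating of 1500 and deviation of 350 into Glickman's formula, so this should give the same probability as that formula evaluated against a 1500/350 opponent.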
The table below shows the top 50 of the 250 players I've tested, ranked according to the true percentage they had of winning against the other players. Next to them is the rank they would have obtained if CRE were used, and the rank they would have obtained if GLIXARE were used. Notice how closely the GLIXARE ranking follows the true ranking: within the top 50, only ranks 45 and 46 are swapped, whereas CRE moves some players by as many as 28 places.
Code:

Rank  Rating       Deviation    True Rating  CRE          Rank For CRE  GLIXARE  Rank For GLIXARE
  1   1991.408347  52.4913089   86.25%       1781.443112     1 (=)       86.77%       1 (=)
  2   1992.528854  68.34131414  86.21%       1719.163597    13 (+11)     86.73%       2 (=)
  3   1989.461831  80.60840798  85.95%       1667.028199    18 (+15)     86.51%       3 (=)
  4   1969.23615   50.08961382  85.08%       1768.877695     2 (-2)      85.78%       4 (=)
  5   1968.509612  50.00035476  85.04%       1768.508193     3 (-2)      85.75%       5 (=)
  6   1972.675135  99.88585936  84.88%       1573.131698    34 (+28)     85.58%       6 (=)
  7   1963.494349  50.60336877  84.76%       1761.080874     5 (-2)      85.51%       7 (=)
  8   1962.47472   50.03981759  84.71%       1762.31545      4 (-4)      85.46%       8 (=)
  9   1953.22279   52.20270923  84.18%       1744.411953     6 (-3)      85.00%       9 (=)
 10   1958.445485  96.63628542  84.13%       1571.900343    36 (+26)     84.94%      10 (=)
 11   1943.00504   50.10101356  83.60%       1742.600986     7 (-4)      84.51%      11 (=)
 12   1941.259317  52.64803719  83.48%       1730.667168    10 (-2)      84.41%      12 (=)
 13   1938.159617  50.92018226  83.31%       1734.478888     8 (-5)      84.26%      13 (=)
 14   1936.183665  51.14032287  83.19%       1731.622373     9 (-5)      84.16%      14 (=)
 15   1930.057618  50.16740178  82.84%       1729.388011    11 (-4)      83.85%      15 (=)
 16   1927.779647  51.0262639   82.70%       1723.674591    12 (-4)      83.73%      16 (=)
 17   1922.220146  57.65787498  82.32%       1691.588646    14 (-3)      83.40%      17 (=)
 18   1918.558883  74.43201955  81.99%       1620.830805    28 (+10)     83.09%      18 (=)
 19   1908.558873  59.43764926  81.47%       1670.808276    15 (-4)      82.65%      19 (=)
 20   1898.317729  81.4379487   80.68%       1572.565934    35 (+15)     81.93%      20 (=)
 21   1876.359618  52.21151457  79.44%       1667.513559    17 (-4)      80.86%      21 (=)
 22   1870.366607  50.00313579  79.06%       1670.354064    16 (-6)      80.51%      22 (=)
 23   1867.964646  51.3491295   78.89%       1662.568128    21 (-2)      80.36%      23 (=)
 24   1867.766236  50.56176086  78.88%       1665.519192    19 (-5)      80.35%      24 (=)
 25   1863.669023  50.11759802  78.61%       1663.198631    20 (-5)      80.10%      25 (=)
 26   1866.541991  95.77888912  78.50%       1483.426435    48 (+22)     79.95%      26 (=)
 27   1859.313589  55.68806985  78.28%       1636.561309    25 (-2)      79.81%      27 (=)
 28   1854.389046  51.55994467  77.97%       1648.149268    22 (-6)      79.52%      28 (=)
 29   1855.299562  72.27265208  77.92%       1566.208954    37 (+8)      79.46%      29 (=)
 30   1853.503073  52.54886057  77.90%       1643.307631    23 (-7)      79.46%      30 (=)
 31   1841.225405  50.60827664  77.06%       1638.792298    24 (-7)      78.70%      31 (=)
 32   1834.513128  50.14590546  76.59%       1633.929506    26 (-6)      78.26%      32 (=)
 33   1834.145367  52.8071478   76.55%       1622.916775    27 (-6)      78.23%      33 (=)
 34   1801.474785  50.04016895  74.21%       1601.314109    29 (-5)      76.04%      34 (=)
 35   1804.267316  94.72400253  74.15%       1425.371306    55 (+20)     75.93%      35 (=)
 36   1795.795989  50.01171801  73.78%       1595.749117    30 (-6)      75.64%      36 (=)
 37   1793.496594  50.25237181  73.61%       1592.487107    31 (-6)      75.47%      37 (=)
 38   1782.059218  51.06317634  72.75%       1577.806512    32 (-6)      74.65%      38 (=)
 39   1775.843559  50.18930449  72.28%       1575.086341    33 (-6)      74.20%      39 (=)
 40   1748.052482  56.82618848  70.11%       1520.747729    42 (+2)      72.08%      40 (=)
 41   1748.543861  92.78899389  69.97%       1377.387886    67 (+26)     71.89%      41 (=)
 42   1744.455748  50.99377091  69.85%       1540.480664    39 (-3)      71.83%      42 (=)
 43   1743.084558  60.21302067  69.71%       1502.232476    44 (+1)      71.68%      43 (=)
 44   1742.067022  50.19645562  69.66%       1541.2812      38 (-6)      71.65%      44 (=)
 45   1740.620853  84.04522098  69.40%       1404.439969    58 (+13)     71.37%      46 (+1)
 46   1738.536514  51.44593668  69.38%       1532.752767    40 (-6)      71.35%      45 (-1)
 47   1728.468483  59.51691032  68.55%       1490.400841    45 (-2)      70.54%      47 (=)
 48   1727.948725  50.16195168  68.54%       1527.300918    41 (-7)      70.54%      48 (=)
 49   1708.596637  59.16778978  66.96%       1471.925478    50 (+1)      68.94%      49 (=)
 50   1705.528967  50.05894259  66.73%       1505.293197    43 (-7)      68.72%      50 (=)
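As a quick sanity check on the table above, the short Python sketch below re-ranks its first six rows by CRE and by GLIXARE. The ratings and deviations are copied from the table; the helper names (`cre`, `glixare`, `order_by`) are just labels chosen for this example.

```python
import math

# (true rank, rating, deviation) for the first six rows of the table
ROWS = [
    (1, 1991.408347, 52.4913089),
    (2, 1992.528854, 68.34131414),
    (3, 1989.461831, 80.60840798),
    (4, 1969.23615, 50.08961382),
    (5, 1968.509612, 50.00035476),
    (6, 1972.675135, 99.88585936),
]

LN10_SQ = math.log(10) ** 2


def cre(r, rd):
    """Conservative Rating Estimate: rating minus four deviations."""
    return r - 4 * rd


def glixare(r, rd):
    """GLIXARE rating as a percentage; 0 when the deviation exceeds 100."""
    if rd > 100:
        return 0.0
    denom = math.sqrt(3 * LN10_SQ * rd ** 2
                      + 2500 * (64 * math.pi ** 2 + 147 * LN10_SQ))
    return round(10000 / (1 + 10 ** ((1500 - r) * math.pi / denom))) / 100


def order_by(score):
    """True ranks of ROWS, listed from best to worst according to score(r, rd)."""
    ranked = sorted(ROWS, key=lambda row: score(row[1], row[2]), reverse=True)
    return [true_rank for true_rank, _, _ in ranked]


if __name__ == "__main__":
    print(order_by(cre))      # [1, 4, 5, 2, 3, 6]
    print(order_by(glixare))  # [1, 2, 3, 4, 5, 6]
```

Even on these six rows, CRE pushes the players with larger deviations (true ranks 2, 3 and 6) below players they would be expected to beat, while GLIXARE keeps the true order.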
Needless to say, I propose that the GLIXARE rating be used for our Smogon leaderboards. I would also advise Colin to do the same for his ShoddyBattle rating but, of course, I'll leave that up to him, since it's his program after all.