#1 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
I have been researching a fair but simple rating system for the upcoming Competitor program, and here it is.
It is based heavily on the Glicko system, created by Professor Mark E. Glickman. However, I added one detail: time. So I'll unashamedly call this rating system the Timed Glicko Rating system, or TGR for short.

Notice that the ShoddyBattle rating system is not Glicko, but Glicko-2. Glicko-2 extends Glicko by adding volatility, which measures the degree of consistency of the player: the more consistent the player, the lower it is. However, since Pokemon is a game in which it is practically impossible to be consistent (because it is easy to win or lose unexpectedly due to luck), I deemed volatility basically superfluous, and stuck to the simpler Glicko rating.

I think a short explanation of how the Glicko system works would be welcomed by lots of people, so here it is. The Glicko system assumes that a player has a rating R and a rating deviation RD. The player would normally perform roughly as expected by his rating R, but sometimes he has a good day, performing better than his rating would suggest, and sometimes he has a bad day, performing worse than his rating. Glickman assumed that these ‘performance fluctuations’ follow a logistic distribution whose spread is given by the rating deviation RD. The logistic distribution is very similar to the normal distribution but is preferred because, from observation of chess games, it models the probability of one player beating another better than the normal distribution does. As a consequence of the logistic distribution, a player has about a 72% chance of playing at a level within one rating deviation from the rating (i.e. at a rating between R – RD and R + RD), a 14% chance of playing better than that level (i.e. at a rating better than R + RD) and a 14% chance of playing worse than that level (i.e. at a rating worse than R – RD).
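The 72% / 14% / 14% split can be checked numerically. The following is only a minimal sketch, assuming that RD is read as the standard deviation of a zero-mean logistic distribution; the function names are made up for illustration and are not part of Glicko or TGR.

```python
import math

def logistic_cdf(x: float, scale: float) -> float:
    """CDF of a zero-mean logistic distribution with the given scale parameter."""
    return 1.0 / (1.0 + math.exp(-x / scale))

def prob_within_one_sd() -> float:
    # A logistic distribution with scale s has standard deviation s * pi / sqrt(3),
    # so a standard deviation of 1 corresponds to s = sqrt(3) / pi.
    scale = math.sqrt(3) / math.pi
    return logistic_cdf(1.0, scale) - logistic_cdf(-1.0, scale)

within = prob_within_one_sd()
tail = (1.0 - within) / 2.0  # chance of playing above R + RD (same for below R - RD)
print(f"within one deviation: {within:.1%}, each tail: {tail:.1%}")
```

Under that assumption the result does not depend on the actual RD value, which is why the same percentages apply to every player.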
It can be seen from the above that if the player’s RD is large, then his performance is more uncertain than if his RD is small. For example, suppose there are two players, Albert and Ben, both having a rating of 1500, but whereas Albert has an RD of 200, Ben has an RD of 50. This would mean that Albert is expected to play at a level between 1500 – 200 and 1500 + 200, or between 1300 and 1700, whereas Ben would be expected to play at a level between 1500 – 50 and 1500 + 50, or between 1450 and 1550. It can be clearly seen here that Ben’s playing performance is much more certain than that of Albert, even though they have the same rating.

In the Glicko system, the RD becomes smaller the more games the player plays, because each game makes the rating more certain. However, if the player stops playing for a long time, then his performance when he returns will be more uncertain. Thus, the Glicko system increases the RD of players that are inactive for long periods of time.

Having a low RD also results in a player’s rating changing more slowly, especially if he plays against a player with a high RD. Conversely, having a high RD results in that player’s rating changing more quickly, especially if he plays against a player with a low RD. Consider, for example, that Albert plays against Ben and wins. Albert’s rating would increase by a whopping 86, becoming 1586, while Ben’s rating would decrease only by 6, becoming 1494. If Albert loses, his rating would decrease by 86, becoming 1414, while Ben’s rating would increase by 6, becoming 1506.

The reason this happens is the following. Since Albert’s rating is much more uncertain than Ben’s, beating Ben would mean that Albert’s rating is supposed to be much more than 1500. On the other hand, Ben’s rating wouldn’t lower by much after Albert beats him because Albert’s performance is very uncertain, and hence little information can be gained from such a loss.
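The Albert and Ben figures can be reproduced with the standard single-game Glicko update. This is a sketch of plain Glicko only (no time adjustment, no luck adjustment), and the helper names `g` and `glicko_update` are chosen here for illustration.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant

def g(rd: float) -> float:
    """Factor that discounts a result according to the opponent's uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)

def glicko_update(rating, rd, opp_rating, opp_rd, score):
    """Return (new_rating, new_rd) after one game; score is 1 for a win, 0 for a loss."""
    g_opp = g(opp_rd)
    expected = 1.0 / (1.0 + 10 ** (-g_opp * (rating - opp_rating) / 400))
    info = Q**2 * g_opp**2 * expected * (1.0 - expected)  # 1 / d^2 in Glickman's notation
    denom = 1.0 / rd**2 + info
    new_rating = rating + Q * g_opp * (score - expected) / denom
    return new_rating, math.sqrt(1.0 / denom)

albert = (1500.0, 200.0)  # (rating, RD)
ben = (1500.0, 50.0)

albert_after, _ = glicko_update(*albert, *ben, score=1)  # Albert wins
ben_after, _ = glicko_update(*ben, *albert, score=0)     # Ben loses
print(round(albert_after), round(ben_after))
```

Both updates are computed from the pre-game ratings and RDs, which is how Glicko treats the games of one rating period.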
The reverse argument would follow if Albert loses to Ben.

The rating system that I am proposing basically follows the same line of thought as what’s been said above. The main differences are in the way the RD changes. The normal Glicko system always increases the RD slightly by the same amount, governed by a constant c, after every match, and subsequently lowers it according to both the player’s and the opponent’s ratings and RDs. The Timed Glicko Rating system does not increase the RD by the same amount, but by an amount proportional to the time passed between a battle and the one before it. If a player plays frequently, his RD would thus be lowered by more than for a player that plays rarely, because his RD would first only be increased by a relatively small amount and then lowered accordingly.

Another difference is that players whose RD is at least 100 have their rating listed as provisional. Provisional ratings do not feature in the ladder leaderboard (or, alternatively, are placed at the very bottom of the leaderboard). Furthermore, every player’s RD is updated once per day depending on how long it has been since their last battle, so that players that are not playing would see their rating turn provisional.

As was said before, having a low RD makes a player’s rating change slowly. Sometimes, it is so low that the rating does not change appreciably even if that player starts to win or lose a lot of games. However, with SC = 20, the RD won't lower a lot anyway.

The final difference is not exactly related to the Glicko system per se, but to the way in which players play against each other. As in ShoddyBattle, players who wish to play on the ladder wait in a queue and get assigned a player to play against. In the new system, however, the opponent that will be assigned to a player will have between 15% and 85% chance of beating him, governed by an exact formula.
This roughly translates to the opponent having a difference in rating of 300 or less, and prevents huge mismatches from occurring.

Here is the TGR algorithm in pseudocode. I have made an implementation of it in Excel, and I know it works well.

The following are the constants used for TGR. SC is the factor by which your deviation increases over time. With this number, the rating of the most ardent of players would still take at most 20 days of inactivity to become provisional. Q is a number that is multiplied by the rating deviation to convert it to the real standard deviation used in the logistic distribution. PI is also involved in the logistic distribution, since the standard deviation of a logistic distribution is equal to s * pi / sqrt(3), where s is the scale parameter of the distribution. P is a constant equal to the expected probability that the result of a battle between two equally skilled players is a deserved outcome.
Code:
SC = 20;
Q = ln(10) / 400;
PI = 3.14159265359;
P = ...
Code:
Subroutine DayFrac():
    Get current Hours, Minutes, Seconds;
    return (Hours * 3600 + Minutes * 60 + Seconds) / 86400;
Code:
Subroutine WinProb(Player1, Player2):
    G = 1 / sqrt(1 + 3 * Q^2 * (Player1.RD^2 + Player2.RD^2) / PI^2);
    return (1 / (1 + 10^(-G * (Player1.Rating - Player2.Rating) / 400)));
Code:
Create NewPlayer;
NewPlayer.Rating = 1500;
NewPlayer.RD = 350;
NewPlayer.Time = DayFrac();
Code:
Time = DayFrac();
BestDiff = 0.35; [Closest distance from a 50% win chance found so far; 0.35 means no suitable opponent yet.]
While BestDiff = 0.35 and DayFrac() < Time + 300/86400 do { [300 refers to 300 seconds = 5 mins. This can be changed.]
    For every Opponent waiting for a ladder match do {
        ProbWin = WinProb(Player, Opponent);
        If abs(ProbWin - 0.5) < BestDiff then {
            Opp = Opponent;
            BestDiff = abs(ProbWin - 0.5);
        }
    }
}
If BestDiff = 0.35 then {
    Player.Time = DayFrac();
    display("Sorry, no player is available to battle");
}
else Play Match versus Opp;
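As a rough illustration of the matchmaking step, here is a sketch of a single pass over the queue. It leaves out the five-minute waiting loop, and the `Player` class, `pick_opponent` and `rating_gap_for` are hypothetical names used only for this example.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional

Q = math.log(10) / 400

@dataclass
class Player:
    name: str
    rating: float
    rd: float

def win_prob(p1: Player, p2: Player) -> float:
    """Expected chance that p1 beats p2, discounted by both players' RDs."""
    g = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * (p1.rd**2 + p2.rd**2) / math.pi**2)
    return 1.0 / (1.0 + 10 ** (-g * (p1.rating - p2.rating) / 400))

def pick_opponent(player: Player, queue: Iterable[Player],
                  max_diff: float = 0.35) -> Optional[Player]:
    """Return the queued opponent closest to a 50% matchup, or None if
    nobody falls strictly inside the 15%-85% window."""
    best, best_diff = None, max_diff
    for opp in queue:
        diff = abs(win_prob(player, opp) - 0.5)
        if diff < best_diff:
            best, best_diff = opp, diff
    return best

def rating_gap_for(prob: float) -> float:
    """Rating difference that gives `prob` when both RDs are zero (G = 1)."""
    return 400 * math.log10(prob / (1.0 - prob))

me = Player("me", 1500, 60)
queue = [Player("a", 1900, 60), Player("b", 1620, 60), Player("c", 1450, 200)]
choice = pick_opponent(me, queue)
print(choice.name if choice else "no match", round(rating_gap_for(0.85)))
```

The last function shows where the "300 or less" figure comes from: with no uncertainty at all, an 85% win chance corresponds to a gap of about 301 points, and larger RDs widen the gap that still fits inside the window.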
Code:
If Player1 won the battle against Player2 then Win = 1 else Win = 0;
Time = DayFrac();
UpdatePlayer(Player1,Player2,Time,Win);
UpdatePlayer(Player2,Player1,Time,1-Win);
Code:
Subroutine UpdatePlayer(Player1,Player2,Time,Win):
    PTime = Time - Player1.Time;
    PRD = min(sqrt(Player1.RD^2 + PTime * SC^2), 350);
    PG = 1 / sqrt(1 + 3 * Q^2 * Player2.RD^2 / PI^2);
    PE = (1 / (1 + 10^(-PG * (Player1.Rating - Player2.Rating) / 400)));
    XD = abs(Win - WinProb(Player1, Player2)); [Deviation between the a priori expected probability of winning and the real outcome after the battle.]
    PMERIT = 1 - (1 - P) * XD * (1 + 2 * XD); [Quadratic Weighting's assumed probability that the result was on merit.]
    V = 1 / (PG^2 * PE * (1 - PE) * Q^2);
    Player1.Rating = Player1.Rating + Q * PG * (Win - PE) * (2 * PMERIT - 1) / (1 / PRD^2 + 1 / V);
    Player1.RD = 1 / sqrt(1 / PRD^2 + 1 / V);
    Player1.Time = Time;
Code:
For every Player on the ladder do {
PTime = 1 - Player.Time;
Player.RD = min(sqrt(Player.RD^2 + PTime * SC^2), 350);
Player.Time = 0;
}
UpdateLeaderboard;
Code:
If Player.RD >= 100 then
    display "Rating is Provisional"
else
    display "Rating is " + round(10000 / (1 + 10^(((1500 - Player.Rating) * PI / sqrt(3 * ln(10)^2 * Player.RD^2 + 2500 * (64 * PI^2 + 147 * ln(10)^2)))))) / 100 + "%"
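For readers who prefer real code, the UpdatePlayer pseudocode above could be sketched in Python roughly as follows. Two things here are assumptions rather than part of the post: the value of P is not given, so it is passed in as a parameter (the 0.9 in the usage lines is purely illustrative), and both players are updated from their pre-battle values instead of mutating one before updating the other.

```python
import math
from dataclasses import dataclass

Q = math.log(10) / 400
SC = 20.0       # RD growth constant from the post
MAX_RD = 350.0  # RD of a brand new player, also used as the cap

@dataclass
class Player:
    rating: float = 1500.0
    rd: float = 350.0
    time: float = 0.0  # fraction of the day at the last update

def win_prob(p1: Player, p2: Player) -> float:
    g = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * (p1.rd**2 + p2.rd**2) / math.pi**2)
    return 1.0 / (1.0 + 10 ** (-g * (p1.rating - p2.rating) / 400))

def updated(p1: Player, p2: Player, now: float, win: float, p: float) -> Player:
    """Return p1's new state after a battle against p2, without mutating either."""
    prd = min(math.sqrt(p1.rd**2 + (now - p1.time) * SC**2), MAX_RD)
    pg = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * p2.rd**2 / math.pi**2)
    pe = 1.0 / (1.0 + 10 ** (-pg * (p1.rating - p2.rating) / 400))
    xd = abs(win - win_prob(p1, p2))                  # surprise of the result
    pmerit = 1.0 - (1.0 - p) * xd * (1.0 + 2.0 * xd)  # quadratic weighting
    denom = 1.0 / prd**2 + pg**2 * pe * (1.0 - pe) * Q**2  # 1/PRD^2 + 1/V
    rating = p1.rating + Q * pg * (win - pe) * (2.0 * pmerit - 1.0) / denom
    return Player(rating, 1.0 / math.sqrt(denom), now)

def play(winner: Player, loser: Player, now: float, p: float):
    # Both results are computed from the pre-battle snapshots (an assumption).
    return updated(winner, loser, now, 1.0, p), updated(loser, winner, now, 0.0, p)

# No idle time before the battle, so the RDs are not inflated first.
a, b = Player(1400, 60, 0.5), Player(1500, 80, 0.5)
new_a, new_b = play(a, b, now=0.5, p=0.9)  # p = 0.9 is illustrative only
print(round(new_a.rating), round(new_b.rating))
```

With p = 1 the merit factor is exactly 1 and the update reduces to plain Glicko with the time-adjusted RD.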
__________________
http://users.smogon.com/X-Act For all your Pokemon needs (and more!) including: the Defensive EVs applet, the Probabilities of Breeding IVs in Pokemon applet, and the Ratings of Pokemon Base Stats applet (now Version 2.0!). And also the IV to PID applet! Last edited by X-Act; Feb 13th, 2009 at 11:50:38 AM. Reason: updated with GLIXARE and Quadratic Weighting |
|
|
|
|
#2 |
|
I'm not retarded I'm Canadian it's different
Join Date: Dec 2004
Posts: 6,128
Canada eh
|
this system appeals to me more than the current shoddybattle one, well done X-Act.
__________________
|
|
|
|
|
#3 |
|
Bag
Join Date: Sep 2005
Posts: 3,636
St. Louis
|
I trust X-Act for this. :)
Thanks for the hard work! |
|
|
|
|
#4 |
Join Date: Dec 2004
Posts: 7,299
|
When you say time, is that measured in days, or are you gonna get up in the morning and find your rating is less accurate?
Have a nice day.
__________________
|
|
|
|
|
#5 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
When I say time, I mean, for example 13:44:35 or 19:04:36 or 05:13:49 etc, and then, from that time, the number of seconds between midnight and that time is found (basically Hours x 3600 + Minutes x 60 + Seconds). This cannot be greater than 86400, the number of seconds in a day.
So if, for example, your last match ended at 19:46:37 and your RD was 60, your time would be 19 x 3600 + 46 x 60 + 37 = 71197. At midnight, your RD would become sqrt(RD^2 + (86400-71197) * SC^2 / 86400). Since SC=20, we get sqrt(60^2 + (86400-71197) * 20^2 / 86400) = sqrt(3600 + 15203 * 400 / 86400) = sqrt(3600 + 70.384) = sqrt(3670.384) = 60.58.

If you didn't play at all during that day, your time would be 0. Suppose your RD is 60. At midnight, your RD would become sqrt(60^2 + (86400-0) * 20^2 / 86400) = sqrt(3600 + 86400 * 400 / 86400) = sqrt(3600 + 400) = sqrt(4000) = 63.25.

Notice how RD increased much more for the person who didn't play at all during the day than for the person who played at least one game. Also, remember that you will not know your RD in this rating system. You will only know whether your RD is below 100 or not; if it is below 100, your rating is visible, and if not, your rating is provisional and invisible.
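The two midnight calculations above can be written as a small function. This is just a sketch of that arithmetic; the function names and the `days_until_provisional` helper are additions for illustration, and the helper assumes the player does not play at all in the meantime.

```python
import math

SC = 20.0
SECONDS_PER_DAY = 86400

def seconds_since_midnight(hours: int, minutes: int, seconds: int) -> int:
    return hours * 3600 + minutes * 60 + seconds

def rd_at_midnight(rd: float, last_battle_seconds: int, cap: float = 350.0) -> float:
    """RD after the daily update, given when (in seconds) the last battle ended."""
    idle_fraction = (SECONDS_PER_DAY - last_battle_seconds) / SECONDS_PER_DAY
    return min(math.sqrt(rd**2 + idle_fraction * SC**2), cap)

def days_until_provisional(rd: float, threshold: float = 100.0) -> float:
    """Full idle days needed for RD to grow from `rd` to `threshold`."""
    return (threshold**2 - rd**2) / SC**2

played = rd_at_midnight(60, seconds_since_midnight(19, 46, 37))
idle = rd_at_midnight(60, 0)
print(f"{played:.2f} {idle:.2f} {days_until_provisional(60):.0f}")
```

Because the squared RD grows by SC^2 = 400 per idle day, a player sitting at an RD of 60 would need 16 days without a battle before reaching the provisional threshold of 100.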
|
|
|
|
#6 | |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
Quote:
If the goal is to make the ratings as accurate a measurement of the player's skill as possible, then this makes no sense. The reason RD increases over time is because your rating becomes more uncertain if you go for a while without playing (although on this system, your rating doesn't decrease over time). If we want an accurate ladder, as opposed to a competitive one, then we would want the rating to decrease over time (under the assumption that people who are out of it for a while would play slightly worse when they come back), not just the rating deviation. I suspect this decrease in rating would follow something like a sigmoid ("S") curve, in that people who haven't played for a few hours will experience little-to-no change, but after a few days it's slightly noticeable. People who are out for several weeks, however, will find that the game has changed a bit, and all of those changes added up mean they will almost certainly perform worse for the first few battles when they come back. Being gone for 9 months, the player would be worse than if they were gone for only 8, but the difference between the two wouldn't be that big. Another issue is that an experienced player will regain their skills a lot faster than a newer player would get up to the experienced player's peak level. I'm also not too sure about just measuring the time since the last battle. Imagine a player that was gone for four months. They play one game during the day at 23:00. Another player of identical stats to the first also comes back after four months. This player plays games at 2:00, 3:00, 4:00, 5:00..., and 22:00. The rating deviation of the second player should drop more, but unless I'm mistaken, it will be the first player whose deviation drops more (only slightly more). So as I said, we have to decide just what we want out of the ladder. I prefer a slightly less competitive, but more accurate ladder. The main drawback of this to me is that it is far more complicated. 
As it stands, the ratings will overestimate the better player's chance of winning. As an example, consider the top player playing an average player. Going by the rating difference, the top player may have an apparent 95% chance to win, but because Pokemon is not a game of pure skill (it contains elements of luck), the average player's actual chance of winning is much higher than the implied 5%, as long as they have the critical amount of skill needed to use something other than Tackle Swampert and Mud Slap Pidgey.
__________________
Previously obi. Technical Machine, a Pokemon AI. "Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat." - Sun Tzu |
|
|
|
|
|
#7 | ||||
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
First, I'll start from the things I understood more.
Quote:
Quote:
Quote:
Quote:
||||
|
|
|
|
#8 | ||||
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
Quote:
Quote:
Quote:
Quote:
Hope I did a better job explaining this time.
||||
|
|
|
|
#9 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
The actual skill level of a player is never known no matter what you do. That's the whole point of the Glicko system! As I said in the post, a player always has roughly a 28% chance of playing more than one RD better or worse than his rating.
I don't like the rating becoming lower with inactivity. What should become lower is the certainty of his rating. If the top player is rated 2000, then, after a month of inactivity, that rating should become so uncertain that it's not even visible anymore. However, when he returns, he starts again at 2000, not 1500, and, given his performance during the period he plays while his rating is provisional, a new rating is assigned to him. That's another reason why a person with a very high RD has his rating changing very quickly: the system is trying to assign him a reliable rating that's near his playing capabilities.

For example, if the player with 2000 rating and 130 RD (quite uncertain) comes back after a month of inactivity and loses the first game against a player with 1700 rating and 60 RD (maybe due to being rusty or due to changes in the metagame as you say), his rating drops immediately to 1925 and his RD becomes around 125 (so his rating is still provisional). After playing a few more games, his RD becomes less than 100 and the system can then assign him a new reliable rating.

I know that the actual probability of winning in Pokemon depends also on luck, but how can you quantify luck? It would depend partly on the teams the players are using. The reason I put 15% to 85% is only to ensure that the players playing each other are not of totally dissimilar playing ability, not to calculate their exact probability of winning or losing.

Anyway, I have to go now. See you next Tuesday!
|
|
|
|
#10 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
I'm updating this system slightly. Basically, I'm giving more freedom to the rating deviation - I'm allowing it to become as low as it wants instead of having 60 to be its minimum.
|
|
|
|
#11 |
|
Knows the great enthusiasms
Administrator
Join Date: Jun 2007
Posts: 2,901
Houston, TX
|
I'd like to see if we could implement this on the current Smogon University ladders. I think this rating system would solve the problem of people racing up the ladder on new accounts (I know we can somewhat solve that on the current ladder by ignoring players with an RD greater than 100). And it could curb the incentive for players to constantly create new accounts. It also seems to be a system much more suited to Pokemon, which has so much luck involved in battle outcomes.
I know we created a lot of confusion when we implemented a different rating system back when we first brought SU online. But, with a little planning, I think we could successfully transition to a new system. The big problem last time was that the new system did not uniformly "convert" all player ratings at the time of implementation. We used a "lazy conversion" -- meaning that players' ratings were converted when they fought their first battle after the new system was put in place. This caused many players to freak out, because they noticed that ratings were jumping around wildly, as "new rating system players" were ranked alongside "old rating system players". There was also the problem of players' lack of familiarity with the dynamics of the new system. So when their rating changed in an unexpected way, it caused confusion.

I can think of several ways to mitigate these problems:
1) Run some form of "conversion process" that updates all player ratings to the new system at one time.

I'm not saying we should implement this immediately, on a whim. But, I'd like to open a dialog and explore the pros and cons of implementing a new rating system on our current battle server.
__________________
My Art Thread: ArtJustArt - The Art of DougJustDoug |
|
|
|
|
#12 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
I'll be glad to have this system used for our ladder. As you say, however, the players need to be 'educated' beforehand so that the transition is as smooth as possible.
|
|
|
|
#13 |
|
Bag
Join Date: Sep 2005
Posts: 3,636
St. Louis
|
I saw what I thought was a new thread with "Competitor" in the title... I gasped.
|
|
|
|
|
#14 |
|
Fast-moving, smart, sexy and alarming.
Join Date: Aug 2005
Posts: 5,152
|
Glad I wasn't the only one. I was quite confused when I saw the icon indicating I had already posted in this thread.
|
|
|
|
#15 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
I decided to refine the rating system a bit. To do so, however, I need a bit of help, specifically from the hardcore players. :)
Glicko is a great rating system, but it is not completely suited to Pokemon because it assumes that if you lose or win, you did so on merit. That is not always the case in Pokemon. This is not a criticism of the Glicko system - it is just designed for a different purpose. Hence, I would like to know what percentage of your games you think you lost but shouldn't have. (This would be equal to the percentage of games that you won but shouldn't have.) This would contribute to an even fairer rating system.
|
|
|
|
#16 |
|
You and I Know
Join Date: Nov 2007
Posts: 1,543
San Diego, CA
|
When you say "shouldn't have lost" should that include bad team matchups in addition to luck? I know it seems silly, but iirc Glicko was designed for chess, which obviously does not have that concept, so if you are trying to account for the differences then perhaps that should be taken into account?
Anyway, I would say that I win / lose maybe 5% of the time when I shouldn't have. If team matchups (and random specialized threats) are involved then that number is probably 10%. |
|
|
|
|
#17 |
|
it's a revolution, i suppose
Join Date: Sep 2007
Posts: 3,572
New Jersey
|
I agree with TAY, but I think I win / lose around 10% of the games when I shouldn't have. I don't play all the time, so it seems like most of my matches have a good amount of luck in them.
|
|
|
|
|
#18 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
TAY, I mean in whatever circumstance where you feel you should have won but you lost. Of course, without being silly - for example "damn it I've just replaced HP Ice with HP Fire... if I hadn't done that I would have won" is NOT what I mean by "you should have won but you lost".
Any other input?
|
|
|
|
#19 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
Also, when you say 10%, do you mean:
1) 10% where you lose undeservedly, 10% where you win undeservedly and 80% where you win/lose on merit, or
2) 5% where you lose undeservedly, 5% where you win undeservedly and 90% where you win/lose on merit?
|
|
|
|
#20 |
|
You and I Know
Join Date: Nov 2007
Posts: 1,543
San Diego, CA
|
If I lose because my opponent used a Psychic / Grass Knot / Rapid Spin Starmie, then I would consider that "shouldn't have lost" for the purposes of the rating system. Which means that I think 90% of matches are "fair", or have the "correct" outcome; which means that 10% of the time the "wrong" outcome occurs.
|
|
|
|
|
#21 |
|
capitalism delenda est
Join Date: Jul 2007
Posts: 1,428
Maryland
|
Hmm - could such a system be based on the likelihood of "luck"? - for example, the 6.25% CH rate as a baseline, and then calculating the likelihood that a certain move, such as Fire Blast, misses? Or could it record "important" misses, such as a Hydro Pump missing Heatran on the switch, and that same Heatran netting two KOs?
Probably too much calculation though. |
|
|
|
|
#22 |
Join Date: May 2007
Posts: 3,132
|
I don't see why the rating system should calculate for something the user himself should have calculated for.
Say you lose because your opponent used a Psychic/Grass Knot/Rapid Spin Starmie. But what if, in another game, your opponent lost because you used a Will O Wisp + Protect Rotom A? It evens out in the long run. Especially with Critical Hits - why should the system put a weight on them when you should have been managing those risks yourself? Why should "luck" matter, when it is likely going to even out in the long run anyway? Of course, it is true that "better players are more prone to luck", since most of the time they don't need luck to win. But such happenstances are usually so rare that it doesn't matter anyway.
|
|
|
|
#23 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
Here's a situation where I feel I should have lost but I won:
My last Pokemon is Scarf Gengar at 45% health. The opponent's last Pokemon is Celebi at 100% health. The best move I have to hit Celebi, a STAB Super Effective Shadow Ball, doesn't OHKO Celebi. However, I actually do OHKO Celebi, since I get a Critical Hit, thus winning the battle. This is what I mean by a battle where the outcome wasn't fair to the players. And the above scenario DID happen to me - I'm not inventing this. I'm sure everyone agrees that the above is not a fair outcome. Things like this do happen in Pokemon. And that is why I want to modify the rating system slightly to allow for this...

... and I have already come up with the way of fixing the rating system to take this into account. I'll introduce a parameter p which is the probability that a battle's result (win or loss) is fair. Then the only thing I need to fix in the rating system is the following line, taken from as soon as the battle ends. The line
Code:
If Player1 won the battle against Player2 then Win = 1 else Win = 0;
becomes
Code:
If Player1 won the battle against Player2 then Win = p else Win = 1-p;
Here's how the new system compares with the old one. I considered a 1400/60 player beating a 1500/80 player, and tried p=0.9 for the new system (this number is too low in my opinion). With the old system, the 1500/80 player's new rating became 1477 (a drop of 23), while the 1400/60 player's new rating became 1413 (a gain of 13). With the new system, the 1500/80 player's new rating became 1481 (a drop of 19), while that of the 1400/60 player became 1411 (a gain of 11).

This makes sense. The new system is assuming that the battle's result might not have been fair, and hence the rating for both players changed less drastically than for the case where the battle's result is considered as being sure to be fair. I'm sure that p is less than 1 even for competitive Pokemon, but I'm also sure that it is near 1.
|
|
|
|
#24 |
|
You and I Know
Join Date: Nov 2007
Posts: 1,543
San Diego, CA
|
Will the value of P change depending on the ratings of the players involved (e.g. someone with 1400 rating beating someone with 1750 rating)? Or is it going to just be set at a value less than one for all battles?
|
|
|
|
|
#25 |
|
np: Biffy Clyro - Shock Shock
Join Date: Feb 2006
Posts: 4,679
Malta
|
That is an interesting consideration, TAY.
Well, you tell me, really. If a lower rated player wins against a higher rated player, is it more probable that it is due to luck than when a player wins against another equally rated player? To clarify, if 1150/60 beats 1950/60, is it more probable that it is due to luck than if 1490/60 beats 1510/60? If you deem that it is, then I'll make P change depending on the difference in rating. Right now, it is assumed to be a constant for every battle.

Also, I've been thinking about this while I was sleeping (really) and I realised that I fixed the code incorrectly. What I should have done is keep
Code:
If Player1 won the battle against Player2 then Win = 1 else Win = 0;
as it was, and change
Code:
Player1.Rating = Player1.Rating + Q * PG * (Win - PE) / (1 / PRD^2 + 1 / V);
to
Code:
Player1.Rating = Player1.Rating + Q * PG * (Win - PE) * (2 * P - 1) / (1 / PRD^2 + 1 / V);
This way, the example of a 1400/60 player beating a 1500/80 player becomes (with P = 0.9):
1400/60's rating becomes 1410 (instead of 1411)
1500/80's rating becomes 1482 (instead of 1481).
Minor difference, really, but this reflects better that the player's probability of having a fair result is P and that of having an unfair result is (1-P). Interestingly, if P = 0.5, then the rating stays unchanged no matter who you play and whether you win or lose. This makes a lot of sense; if the probability of the game's result being fair is equal to that of being unfair, then you cannot rate any player at all. Hopefully competitive Pokemon isn't like that!
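To compare the two luck adjustments side by side, here is a small sketch. The variant names "soft_score" and "scaled" are labels invented for this example, and the function uses the RDs exactly as given, with no time adjustment beforehand, so its figures may differ by a point or so from the ones quoted in the posts.

```python
import math

Q = math.log(10) / 400

def rating_change(rating, rd, opp_rating, opp_rd, won, p, variant):
    """Rating change for one battle under either luck adjustment.

    variant="soft_score": the result counts as p for a win and 1 - p for a loss.
    variant="scaled":     the result stays 1 or 0 and the change is scaled by 2p - 1.
    """
    pg = 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * opp_rd**2 / math.pi**2)
    pe = 1.0 / (1.0 + 10 ** (-pg * (rating - opp_rating) / 400))
    denom = 1.0 / rd**2 + Q**2 * pg**2 * pe * (1.0 - pe)
    if variant == "soft_score":
        score, scale = (p if won else 1.0 - p), 1.0
    elif variant == "scaled":
        score, scale = (1.0 if won else 0.0), 2.0 * p - 1.0
    else:
        raise ValueError(f"unknown variant: {variant}")
    return Q * pg * (score - pe) * scale / denom

for variant in ("soft_score", "scaled"):
    winner = rating_change(1400, 60, 1500, 80, True, 0.9, variant)
    loser = rating_change(1500, 80, 1400, 60, False, 0.9, variant)
    print(f"{variant}: winner {winner:+.1f}, loser {loser:+.1f}")
```

In the scaled variant every change is simply the plain Glicko change multiplied by 2P - 1, which is why P = 0.5 freezes all ratings and P = 1 gives back the unadjusted system.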
__________________
Last edited by X-Act; Jan 31st, 2009 at 3:08:29 AM. |