Glicko2 suggestion: Short-term and long-term ratings

capefeather · Aug 19, 2013

I've been meaning to suggest this for a while, but I kept putting it off and prioritizing other things. Apparently aldaron is also discussing the rating system with Zarel and that's what reminded me to do this, so I'm sorry if I'm stepping on people's toes with this. It's just not the kind of thing I can explain on IRC. Otherwise, I might have given better warnings of my intention to make this post than I did.

The Glicko2 rating system models a player's rating as a probability distribution, which it summarizes with three quantities: a mean, an error (a.k.a. deviation), and a volatility. For example, "IceGhost" currently has a rating of 1994 +- 53, which shows the mean followed by the error.

The way I understand the rating mechanism at the moment, assuming nothing drastic has happened with it since I last checked, is that ratings are updated after every battle, and there are "rating periods" at the end of which ratings "decay", meaning the errors would increase depending on the volatility quantities. However, the way Glicko2 is supposed to work is that ratings are updated only at every rating period, when they "decay". It's generally agreed that this is impractical for a ladder on an internet gaming server.

My proposition is to have a provisional "short-term" rating that applies to the current rating period. For example, let's look at IceGhost with his/her overall rating of 1994 +- 53. With what I'm proposing, this rating would be fixed for the whole rating period, while there would be a "short-term" rating starting at 1994 +- 300 (or whatever the error is for a new account), which gets updated every battle. At the end of the rating period, IceGhost might have gotten a short-term rating of 2081 +- 100. When the long-term rating updates, it incorporates the short-term rating through averaging methods, then "decays".
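
To make the end-of-period step concrete, here is a minimal Python sketch. The "averaging methods" are not specified above, so the inverse-variance weighting in `merge` is purely an assumption, as are the names `Rating`, `merge`, and `decay` and the volatility value of 0.06. The decay step follows the Glicko2 rule of adding the squared volatility to the squared deviation on Glicko2's internal scale.

```python
import math
from dataclasses import dataclass

# Glicko2 works on an internal scale; 173.7178 converts deviations to and from it.
GLICKO2_SCALE = 173.7178


@dataclass
class Rating:
    mean: float
    deviation: float


def merge(long_term: Rating, short_term: Rating) -> Rating:
    """Combine two ratings by inverse-variance weighting (an assumed method)."""
    w_long = 1.0 / long_term.deviation ** 2
    w_short = 1.0 / short_term.deviation ** 2
    total = w_long + w_short
    mean = (long_term.mean * w_long + short_term.mean * w_short) / total
    return Rating(mean, math.sqrt(1.0 / total))


def decay(rating: Rating, volatility: float = 0.06) -> Rating:
    """Widen the deviation at the end of a rating period, Glicko2 style."""
    phi = rating.deviation / GLICKO2_SCALE
    widened = math.sqrt(phi ** 2 + volatility ** 2)
    return Rating(rating.mean, widened * GLICKO2_SCALE)


# IceGhost's numbers from the example above.
long_term = Rating(1994, 53)
short_term = Rating(2081, 100)
merged = merge(long_term, short_term)
new_long_term = decay(merged)
print(merged, new_long_term)
```

Under these assumptions, 1994 +- 53 and 2081 +- 100 merge to roughly 2013 +- 47, and the decay step then widens the error slightly. Note that a plain inverse-variance average treats the two ratings as independent, which they are not if the short-term rating starts at the long-term mean, so a real implementation would likely need a different weighting.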

I think that this would help solve the current incentivization of alts and lessen the impact of early win streaks. For one thing, the short-term rating would be like a new account but better (unless your long-term mean rating is below 1500). For another thing, the notion of "early win streaks" would be rendered almost meaningless since that opportunity would exist every time and simultaneously have less of an impact. I'm not sure of the long-term implications, but I speculate that it would also take longer for accounts to "go bad" since we're doing the equivalent of drastically reducing the number of data points that an account directly takes into... account.

There are some matters that would need to be resolved with this idea. The main one I can think of is: What quantity should be used to place a player on the ladder, or for matchmaking purposes? I'm not entirely sure, but the short-term rating would presumably be involved, since we want to keep updating the player's ladder-relevant rating dynamically. Related to this is my slight reservation about actually starting a short-term rating at the long-term rating's mean, as opposed to 1500 like a new account. Another issue is the matter of "rating safety": people often grind to a low error so that their ratings don't vary as wildly with each battle. However, from what I remember, PS!'s rating mechanism adjusts an account's rating period depending on how frequently it battles, so it doesn't seem like it would be too much of an issue. Otherwise, I'd be much less keen on the whole idea.

I see a lot of people saying, "Glicko2 doesn't work," "Elo is better," etc. but it seems to me that the problems with the rating mechanism used by PS! (and Shoddy and PL before it) come not from the rating system itself, but from its implementation. I hope that by using something like my proposal, we can improve the implementation of Glicko2 on the ladder. Remember, what matters is not the absolute numbers, but the relative comparisons between numbers.

Antar · Aug 21, 2013

@Zarel should probably weigh in, but what you describe is what I believe is already in place: all players actually have *two* Glicko-like ratings, the "provisional" rating, which is displayed, and the actual Glicko2 rating, which only gets updated once every two days or 15 matches. As I understand it, the "real" rating is what's used for updating ratings, but otherwise matchmaking, laddering and stats weighting all use provisional.

I recommend that anyone who's interested check out the Ratings FAQ I'm working on for details about how rating systems in general--and ours in particular--work.

Zarel · Aug 21, 2013

Yeah, what Antar said is correct, except matchmaking now uses (PR*2+R)/3.

Glicko2 is designed around using real rating for matchmaking; purely using provisional rating causes too much rating spread.
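
As a small illustration of the matchmaking blend, the sketch below computes (PR*2+R)/3 in Python. It assumes PR means the provisional rating and R the real Glicko2 rating, as the posts above suggest; the function name `matchmaking_rating` and the sample numbers are hypothetical and not taken from the server's code.

```python
def matchmaking_rating(provisional: float, real: float) -> float:
    """Blend the two ratings, weighting provisional twice as heavily as real."""
    return (provisional * 2 + real) / 3


# Hypothetical player: provisional rating 2081, real rating 1994.
blended = matchmaking_rating(2081, 1994)
print(blended)  # 2052.0
```

The result always lies between the two inputs and sits twice as close to the provisional rating as to the real one, which pulls matchmaking partway back toward the slower-moving real rating.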
