WCAO: Possibly slightly improving GXE

Zarel · Mar 21, 2018

So anyone looking closely might notice that on the PS website ladder pages (the ones outside of the sim, which I assume no one looks at), there's a new column, WCAO:

https://pokemonshowdown.com/ladder/gen7ou

WCAO is the result of me staring at the GXE formula and wondering if a minor adjustment might improve it.

So, GXE, if you were wondering, is just the Glicko win-chance estimate for winning a battle against a player with 1500±130 rating – in other words, the win chance against a random opponent.

WCAO is the Glicko win-chance estimate for winning a battle against a player with 1500±0 rating – in other words, the win chance against an average opponent.
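For anyone who wants to play with the numbers, here's a minimal sketch of both scores. It assumes the standard Glicko expected-outcome formula, with the two players' RDs combined in quadrature. The function names (`win_chance`, `gxe`, `wcao`) and the sample RD of 50 are made up for illustration, and PS's actual code may differ in details such as rounding.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Glicko attenuation factor: exactly 1 at rd=0, shrinking as rd grows."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def win_chance(rating, rd, opp_rating, opp_rd):
    """Expected score against an opponent.

    Assumption: standard Glicko expected outcome, using the combined
    uncertainty of both players.
    """
    combined_rd = math.hypot(rd, opp_rd)
    return 1 / (1 + 10 ** (-g(combined_rd) * (rating - opp_rating) / 400))


def gxe(rating, rd):
    """Win chance vs a random opponent (1500 +/- 130)."""
    return win_chance(rating, rd, 1500, 130)


def wcao(rating, rd):
    """Win chance vs an average opponent (1500 +/- 0)."""
    return win_chance(rating, rd, 1500, 0)


if __name__ == "__main__":
    # Player RD of 50 is an arbitrary example value.
    for rating in (1300, 1500, 1700, 1900):
        print(rating, f"GXE={gxe(rating, 50):.1%}", f"WCAO={wcao(rating, 50):.1%}")
```

Both functions return exactly 50% for a 1500-rated player; the only difference between them is the opponent RD plugged into the same formula.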

If you're a good player, playing a good player instead of an average player is going to decrease your win chance more than playing a bad player instead of an average player is going to increase your win chance. So WCAO scores will be more spread out than GXE scores, so small differences between player skill will be more noticeable.
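To put rough numbers on the spread, the sketch below compares the gap between two nearby players under each score. It assumes the standard Glicko expected-outcome formula with RDs combined in quadrature; the player RD of 50 and the sample rating pairs are arbitrary illustration values, not PS ladder data, and `expected_score` and `score_gap` are hypothetical helper names.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def expected_score(rating, rd, opp_rating, opp_rd):
    """Assumed standard Glicko expected outcome with combined RD."""
    combined = math.hypot(rd, opp_rd)
    g = 1 / math.sqrt(1 + 3 * Q**2 * combined**2 / math.pi**2)
    return 1 / (1 + 10 ** (-g * (rating - opp_rating) / 400))


def score_gap(low, high, opp_rd, rd=50):
    """Difference in win chance vs a 1500 opponent between two ratings."""
    return (expected_score(high, rd, 1500, opp_rd)
            - expected_score(low, rd, 1500, opp_rd))


for low, high in [(1500, 1550), (1700, 1750), (1900, 1950)]:
    gxe_gap = score_gap(low, high, opp_rd=130)   # random opponent
    wcao_gap = score_gap(low, high, opp_rd=0)    # average opponent
    print(f"{low}->{high}: GXE gap {gxe_gap:.2%}, WCAO gap {wcao_gap:.2%}")
```

Under these assumptions the WCAO gap is the larger one near 1500, but both curves flatten toward 100% as ratings climb, so it's worth checking the high end separately rather than assuming the extra spread carries all the way up.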

I also took the opportunity to make the name of the rating much clearer in terms of what it actually means: "Win Chance vs Average Opponent" needs less explaining than "Glicko X-Act Estimate".

Thoughts, feedback, etc?

Kink · Mar 21, 2018

Interesting idea. Why average as opposed to the current top 500 on the ladder? Wouldn't that give a better indicator of how good that player actually is?

teal6 · Mar 22, 2018

Zarel, first things first: for a community that so obsessively uses your creation, I feel you don't get enough credit for the amount of passion and effort you put into it. I know there have been instances of bickering here and there, but genuinely, thanks for always working to refine PS.

My question is: would this number be better used for suspects?

Zarel · Mar 22, 2018

Kink said:
Why average as opposed to the current top 500 on the ladder? Wouldn't that give a better indicator of how good that player actually is?

There's a variety of reasons, including:

- "Average" is much easier to calculate (it's just 1500) than "current top 500 on the ladder".

- It makes an easy reference point at 50%: above 50% means "above average" and below 50% means "below average"

- "Average" is a static value, while "current top 500" changes over time; neither is perfect at capturing an "absolute" skill level (partially because metagame changes will always change where you are relative to average), but I think "average" will do it better, in general

- In theory it makes the spread better: on very popular ladders, average players will all have very low win chances against the top 500, so it'd be hard to tell their skill apart

The main advantage of WCAO over GXE is "better spread", i.e. people are further apart from each other, so it's slightly easier to see small skill differences.

teal6 said:
Would this number be better used for suspects?

Yes, if people like it enough, I'd probably replace GXE with it. There's no need to have both numbers because of how similar they are.

(You would always be able to calculate either score from their actual Glicko rating and RD, which PS does also publish.)

Bughouse · Mar 22, 2018

I guess what gives me pause is that the whole point of the Glicko rating system is that including deviation is (supposedly - I’m not enough of a stats guy to say) an improvement over Elo.

Why does taking it out of GXE now make it better?

Just seems counterintuitive.

Info about rating systems: https://www.smogon.com/forums/threads/everything-you-ever-wanted-to-know-about-ratings.3487422/

DragonWhale · Mar 22, 2018

Bughouse said:
I guess what gives me pause is that the whole point of the Glicko rating system is that including deviation is (supposedly - I’m not enough of a stats guy to say) an improvement over Elo.

Why does taking it out of GXE now make it better?

Just seems counterintuitive.

Info about rating systems: https://www.smogon.com/forums/threads/everything-you-ever-wanted-to-know-about-ratings.3487422/

If I read this correctly, then it's simply a definition thing. The probability of beating someone with 1500±130 Glicko (aka a random opponent) is GXE; the probability of beating someone with 1500±0 Glicko (someone who loses as much as he wins over a lot of battles, aka the average user) is WCAO. I presume the formula is exactly the same.

Glicko itself isn't losing deviation.

Kink · Mar 22, 2018

Zarel said:
There's a variety of reasons, including:

- "Average" is much easier to calculate (it's just 1500) than "current top 500 on the ladder".

- It makes an easy reference point at 50%: above 50% means "above average" and below 50% means "below average"

- "Average" is a static value, while "current top 500" changes over time; neither is perfect at capturing an "absolute" skill level (partially because metagame changes will always change where you are relative to average), but I think "average" will do it better, in general

- In theory it makes the spread better: on very popular ladders, average players will all have very low win chances against the top 500, so it'd be hard to tell their skill apart

The main advantage of WCAO over GXE is "better spread", i.e. people are further apart from each other, so it's slightly easier to see small skill differences.

cool. my next question is how well will this translate to lower tiers, where 1500 is on the slightly higher end of things?

Zarel · Mar 22, 2018

Bughouse said:
I guess what gives me pause is that the whole point of the Glicko rating system is that including deviation is (supposedly - I’m not enough of a stats guy to say) an improvement over Elo.

Why does taking it out of GXE now make it better?

Just seems counterintuitive.

It's important to realize that RD isn't just some number you add to your equations to make them "better". It actually means something.

Specifically, it means how sure you are that the rating is correct (lower = more sure).

GXE is your win chance against a random opponent. For that matchup, RD=130 is correct: it means you don't know anything about your opponent's rating.

WCAO is your win chance against an average opponent. For that matchup, RD=0 is correct: You're completely sure that the average opponent rating is 1500, because that's by definition how Glicko works.
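In formula terms, the opponent's RD enters through Glicko's attenuation factor g(RD), which pulls the predicted win chance toward 50% as the uncertainty grows. Here's a rough sketch of that effect, assuming the standard Glicko formula and, for simplicity, treating the player's own rating as exact (own RD of 0). The helper names and the 1800 example rating are hypothetical.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Attenuation factor: 1 when the rating is certain, lower otherwise."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def win_chance_vs_1500(rating, opp_rd):
    # Simplification: the player's own RD is ignored (treated as 0).
    return 1 / (1 + 10 ** (-g(opp_rd) * (rating - 1500) / 400))


for opp_rd in (0, 65, 130):
    print(f"opp RD {opp_rd:>3}: g = {g(opp_rd):.3f}, "
          f"1800-rated win chance = {win_chance_vs_1500(1800, opp_rd):.1%}")
```

With an opponent RD of 0 the factor is exactly 1, which is the WCAO case; RD=130 is the GXE case, where the same rating difference counts for a bit less.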

Kink said:
cool. my next question is how well will this translate to lower tiers, where 1500 is on the slightly higher end of things?

GXE and WCAO are based on Glicko rating, not Elo rating. In Glicko, the average is defined to be 1500.

Bughouse · Mar 22, 2018

Got it, that specific clarification makes sense to me. Both metrics are "right"; they're just trying to measure different things, and the question becomes which one is better for Smogon's purposes.

I'm wondering what the value of knowing someone's skill vs the literal average player is when everyone's ladder pairing experience will have variation in it. If there were no ladder and everyone played an infinite number of games via simulation, then sure, WCAO is 100% the better metric. But that's just not the case. To me the win chance vs a normalized "random" opponent seems more useful given that constraint.

Zarel · Mar 22, 2018

Bughouse said:
I'm wondering what the value of knowing someone's skill vs the literal average player is when everyone's ladder pairing experience will have variation in it. If there were no ladder and everyone played an infinite number of games via simulation, then sure, WCAO is 100% the better metric. But that's just not the case. To me the win chance vs a normalized "random" opponent seems more useful given that constraint.

Zarel said:
If you're a good player, playing a good player instead of an average player is going to decrease your win chance more than playing a bad player instead of an average player is going to increase your win chance. So WCAO scores will be more spread out than GXE scores, so small differences between player skill will be more noticeable.

Zarel said:
The main advantage of WCAO over GXE is "better spread", i.e. people are further apart from each other, so it's slightly easier to see small skill differences.

Zarel · Mar 22, 2018

Bughouse said:
I'm wondering what the value of knowing someone's skill vs the literal average player is when everyone's ladder pairing experience will have variation in it. If there were no ladder and everyone played an infinite number of games via simulation, then sure, WCAO is 100% the better metric. But that's just not the case. To me the win chance vs a normalized "random" opponent seems more useful given that constraint.

Okay, less snarkily: The point of GXE and WCAO is to give a standardized score. Neither GXE nor WCAO is intended to estimate your actual winrate on the ladder, which should be 50% unless you're near the top, because the point of a matchmaker is to match you with people at approximately your skill level.

The point of GXE and WCAO is to give a number that represents how good you are, and specifically to do that in a way that can be easily understood.

Bughouse · Mar 22, 2018

I read up more on X-Act's development of GXE in 2009: https://www.smogon.com/forums/threa...layers-overall-rating-than-shoddys-cre.51169/

Since suspect reqs now tend to be by GXE, I agree it makes sense to use WCAO if it provides greater granularity.

If I'm understanding the spread correctly, the higher you are above average Glicko, the bigger the increase you'll have from GXE->WCAO. (And the reverse for below-average Glickos as well.) This will mean that for suspect test reqs, assuming TLs keep using WCAO and WCAO+battle count as parameters, as they've been doing with GXE, better players will be able to finish reqs more quickly relative to weaker players than they could under GXE. I think that's a good outcome, so I would support this as well.

TLs should just adjust their parameters as needed when setting reqs, since it's NOT the same as the GXE they've used before.

EDIT: If I have that wrong and that's not the way the spread works, then I don't really see much point. And if I have it backwards and the spread occurs mostly in the middle and pushes people at the top closer together, then that's a bad outcome for suspect testing.

Zarel · Mar 26, 2018

Bughouse said:
EDIT: If I have that wrong and that's not the way the spread works, then I don't really see much point. And if I have it backwards and the spread occurs mostly in the middle and pushes people at the top closer together, then that's a bad outcome for suspect testing.

That's actually a really good point. I'd also been assuming the spread happened everywhere except the parts no one's reached before, but I did some more research and it actually doesn't happen much at the top at all. I'm going to stick with GXE.

Antar · Mar 30, 2018

Yeah, I don't exactly see the benefit here. You're 100% right about what WCAO means vs. what GXE means, but I'd still say GXE is a better metric for pretty much any purpose I can think of. The incredibly useful thing about GXE is that it's your expected winrate if we didn't have a matchmaking system. So the sell that I've always pushed is to use it as a drop-in substitute for the very understandable winrate metric that everyone intuitively wants to use.

Just seeing how often you win against the "average player" basically only tells you how much of an issue hax is on the ladder.
