WCAO: Possibly slightly improving GXE

Zarel

Not a Yuyuko fan
Site Content Manager, Battle Simulator Administrator, Programmer, Pokemon Researcher, Administrator
Creator of PS
So anyone looking closely might notice that on the PS website ladder pages (the ones outside of the sim, which I assume no one looks at), there's a new column, WCAO:

https://pokemonshowdown.com/ladder/gen7ou


WCAO is the result of me staring at the GXE formula and wondering if a minor adjustment might improve it.

So, GXE, if you were wondering, is just the Glicko win-chance estimate for winning a battle against a player with 1500±130 rating – in other words, the win chance against a random opponent.

WCAO is the Glicko win-chance estimate for winning a battle against a player with 1500±0 rating – in other words, the win chance against an average opponent.

If you're a good player, playing a good player instead of an average player is going to decrease your win chance more than playing a bad player instead of an average player is going to increase your win chance. So WCAO scores will be more spread out than GXE scores, so small differences between player skill will be more noticeable.
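As a rough sketch of the two definitions above, here is how both numbers could be computed from a rating and RD. This assumes the textbook Glicko-1 expected-outcome formula; the function names (`g`, `win_chance`, `gxe_like`, `wcao_like`) are made up for illustration, and PS's actual implementation may differ.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd: float) -> float:
    """Glicko-1 attenuation factor: 1.0 at rd=0, shrinking as rd grows."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def win_chance(rating: float, rd: float, opp_rating: float, opp_rd: float) -> float:
    """Expected score of (rating, rd) against (opp_rating, opp_rd)."""
    combined_rd = math.hypot(rd, opp_rd)  # sqrt(rd^2 + opp_rd^2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (rating - opp_rating) / 400))


def gxe_like(rating: float, rd: float) -> float:
    """Win chance vs a random opponent: 1500 +/- 130 (assumed formula)."""
    return win_chance(rating, rd, 1500, 130)


def wcao_like(rating: float, rd: float) -> float:
    """Win chance vs an average opponent: 1500 +/- 0 (assumed formula)."""
    return win_chance(rating, rd, 1500, 0)


if __name__ == "__main__":
    # Hypothetical players, all with RD 30, purely for illustration.
    for rating in (1300, 1500, 1700):
        print(rating,
              round(100 * gxe_like(rating, 30), 1),
              round(100 * wcao_like(rating, 30), 1))
```

Under these assumptions, both scores are exactly 50% at 1500, and the WCAO-style score sits further from 50% than the GXE-style score on either side of it.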

I also took the opportunity to make the name of the rating much clearer in terms of what it actually means: "Win Chance vs Average Opponent" needs less explaining than "Glicko X-Act Estimate".

Thoughts, feedback, etc?
 
Zarel, first things first: for a community that so obsessively uses your creation, I feel you don't get enough credit for the amount of passion and effort you put into it. I know there have been instances of bickering here and there, but genuinely, thanks for always working to refine PS.

My question is: would this number be better used for suspects?
 

Zarel

Why average as opposed to the current top 500 on the ladder? Wouldn't that give a better indicator of how good that player actually is?
There's a variety of reasons, including:

- "Average" is much easier to calculate (it's just 1500) than "current top 500 on the ladder".

- It makes an easy reference point at 50%: above 50% means "above average" and below 50% means "below average"

- "Average" is a static value, while "current top 500" changes over time; neither is perfect at capturing an "absolute" skill level (partially because metagame changes will always change where you are relative to average), but I think "average" will do it better, in general

- In theory it makes the spread better: on very popular ladders, average players will all have very low win chances against the top 500, so it'd be hard to tell their skill apart

The main advantage of WCAO over GXE is "better spread", i.e. people are further apart from each other, so it's slightly easier to see small skill differences.

Would this number be better used for suspects?
Yes, if people like it enough, I'd probably replace GXE with it. There's no need to have both numbers because of how similar they are.

(You would always be able to calculate either score from their actual Glicko rating and RD, which PS does also publish.)
 

DragonWhale

It's not a misplay, it's RNG manipulation
Top Social Media Contributor Alumnus, Community Leader Alumnus, Community Contributor Alumnus, Dedicated Tournament Host Alumnus, Battle Simulator Moderator Alumnus
I guess what gives me pause is that the whole point of the Glicko rating system is that including deviation is (supposedly - I’m not enough of a stats guy to say) an improvement over Elo

Why does taking it out of GXE now make it better?

Just seems counterintuitive.

Info about rating systems: https://www.smogon.com/forums/threads/everything-you-ever-wanted-to-know-about-ratings.3487422/
If I read this correctly, then it's simply a matter of definition. The probability of beating someone with 1500±130 Glicko (aka a random opponent) is GXE; the probability of beating someone with 1500±0 Glicko (someone who loses as much as he wins over a lot of battles, aka the average user) is WCAO. I presume the formula is exactly the same.

Glicko itself isn't losing deviation.
 

Kink

it's a thug life ¨̮
Tutor Alumnus, Forum Moderator Alumnus, Community Contributor Alumnus, Top Tiering Contributor Alumnus, Contributor Alumnus
There's a variety of reasons, including:

- "Average" is much easier to calculate (it's just 1500) than "current top 500 on the ladder".

- It makes an easy reference point at 50%: above 50% means "above average" and below 50% means "below average"

- "Average" is a static value, while "current top 500" changes over time; neither is perfect at capturing an "absolute" skill level (partially because metagame changes will always change where you are relative to average), but I think "average" will do it better, in general

- In theory it makes the spread better: on very popular ladders, average players will all have very low win chances against the top 500, so it'd be hard to tell their skill apart

The main advantage of WCAO over GXE is "better spread", i.e. people are further apart from each other, so it's slightly easier to see small skill differences.
cool. my next question is how well will this translate to lower tiers, where 1500 is on the slightly higher end of things?
 

Zarel

I guess what gives me pause is that the whole point of the Glicko rating system is that including deviation is (supposedly - I’m not enough of a stats guy to say) an improvement over Elo

Why does taking it out of GXE now make it better?

Just seems counterintuitive.
It's important to realize that RD isn't just some number you add to your equations to make them "better". It actually means something.

Specifically, it means how sure you are that the rating is correct (lower = more sure).

GXE is your win chance against a random opponent. For that matchup, RD=130 is correct: it means you don't know anything about your opponent's rating.

WCAO is your win chance against an average opponent. For that matchup, RD=0 is correct: You're completely sure that the average opponent rating is 1500, because that's by definition how Glicko works.
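For reference, this is how RD enters the win-chance estimate in the published Glicko-1 system. It is a sketch of the standard formula only: the symbols $r$, $RD$ and $RD_{opp}$ are notation chosen here, and PS's exact implementation may differ.

```latex
E = \frac{1}{1 + 10^{-g\left(\sqrt{RD^2 + RD_{opp}^2}\right)(r - 1500)/400}},
\qquad
g(x) = \frac{1}{\sqrt{1 + 3q^2x^2/\pi^2}},
\qquad
q = \frac{\ln 10}{400}
```

Setting $RD_{opp} = 130$ gives a GXE-style estimate and $RD_{opp} = 0$ gives a WCAO-style one. Since $g$ shrinks as its argument grows, a less certain opponent rating flattens the curve toward 50%. If both RDs are 0, then $g = 1$ and the expression reduces to the plain Elo expected score.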

cool. my next question is how well will this translate to lower tiers, where 1500 is on the slightly higher end of things?
GXE and WCAO are based on Glicko rating, not Elo rating. In Glicko, the average is defined to be 1500.
 

Bughouse

Like ships in the night, you're passing me by
Site Content Manager, Forum Moderator Alumnus, CAP Contributor Alumnus, Tiering Contributor Alumnus, Contributor Alumnus
Got it, that specific clarification makes sense to me. Both metrics are "right"; they're just trying to measure different things, and the question becomes which one is better for Smogon's purposes.

I'm wondering what the value of knowing someone's skill vs the literal average player is when everyone's ladder pairing experience will have variation in it. If there were no ladder and everyone played an infinite number of games via simulation, then sure, WCAO is 100% the better metric. But that's just not the case. To me the win chance vs a normalized "random" opponent seems more useful given that constraint.
 

Zarel

I'm wondering what the value of knowing someone's skill vs the literal average player is when everyone's ladder pairing experience will have variation in it. If there were no ladder and everyone played an infinite number of games via simulation, then sure, WCAO is 100% the better metric. But that's just not the case. To me the win chance vs a normalized "random" opponent seems more useful given that constraint.
If you're a good player, playing a good player instead of an average player is going to decrease your win chance more than playing a bad player instead of an average player is going to increase your win chance. So WCAO scores will be more spread out than GXE scores, so small differences between player skill will be more noticeable.
The main advantage of WCAO over GXE is "better spread", i.e. people are further apart from each other, so it's slightly easier to see small skill differences.
 

Zarel

I'm wondering what the value of knowing someone's skill vs the literal average player is when everyone's ladder pairing experience will have variation in it. If there were no ladder and everyone played an infinite number of games via simulation, then sure, WCAO is 100% the better metric. But that's just not the case. To me the win chance vs a normalized "random" opponent seems more useful given that constraint.
Okay, less snarkily: The point of GXE and WCAO is to give a standardized score. Neither GXE nor WCAO is intended to estimate your actual winrate on the ladder, which should be 50% unless you're near the top, because the point of a matchmaker is to match you with people at approximately your skill level.

The point of GXE and WCAO is to give a number that represents how good you are, and specifically to do that in a way that can be easily understood.
 

Bughouse

I read up more on X-Act's development of GXE in 2009: https://www.smogon.com/forums/threa...layers-overall-rating-than-shoddys-cre.51169/

Since suspect reqs now tend to be by GXE, I agree it makes sense to use WCAO if it provides greater granularity.

If I'm understanding the spread correctly, the higher you are above average Glicko, the bigger the increase you'll have from GXE->WCAO. (And the reverse for below-average Glickos as well.) This will mean that for suspect test reqs, assuming TLs keep using WCAO and WCAO+battle count as parameters, as they've been doing with GXE, better players will be able to finish reqs faster relative to weaker players than they could under GXE. I think that's a good outcome, so I would support this as well.

TLs should just adjust their parameters as needed when setting reqs, since it's NOT the same as the GXE they've used before.



EDIT: If I have that wrong and that's not the way the spread works, then I don't really see much point. And if I have it backwards and the spread occurs mostly in the middle and pushes people at the top closer together, then that's a bad outcome for suspect testing.
 

Zarel

EDIT: If I have that wrong and that's not the way the spread works, then I don't really see much point. And if I have it backwards and the spread occurs mostly in the middle and pushes people at the top closer together, then that's a bad outcome for suspect testing.
That's actually a really good point. I'd also been assuming the spread happened everywhere except the parts no one's reached before, but I did some more research and it actually doesn't happen much at the top at all. I'm going to stick with GXE.
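One way to check where the extra spread lands is to compare the score gap between two players under each definition, once near the average and once near the top. This is only a sketch under the textbook Glicko-1 expected-outcome formula; the helper names, the sample ratings and the player RD of 30 are all assumptions for illustration, not PS's code or data.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def win_chance(rating: float, rd: float, opp_rd: float) -> float:
    """Expected score against a 1500-rated opponent with the given RD."""
    g = 1 / math.sqrt(1 + 3 * Q**2 * (rd**2 + opp_rd**2) / math.pi**2)
    return 1 / (1 + 10 ** (-g * (rating - 1500) / 400))


def score_gap(low: float, high: float, opp_rd: float, rd: float = 30.0) -> float:
    """Score difference between players rated `high` and `low`."""
    return win_chance(high, rd, opp_rd) - win_chance(low, rd, opp_rd)


# Compare a mid-ladder pair and a top-of-ladder pair (hypothetical ratings).
for low, high in ((1500, 1700), (1900, 2100)):
    gxe_gap = score_gap(low, high, opp_rd=130)  # random opponent
    wcao_gap = score_gap(low, high, opp_rd=0)   # average opponent
    print(f"{low}-{high}: GXE-style gap {gxe_gap:.3f}, WCAO-style gap {wcao_gap:.3f}")
```

With these assumed inputs, the WCAO-style gap is wider than the GXE-style gap for the 1500-1700 pair but narrower for the 1900-2100 pair, since both curves flatten out as they approach 100%.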
 
Yeah, I don't exactly see the benefit here. You're 100% right about what WCAO means vs. what GXE means, but I'd still say GXE is a better metric for pretty much any purpose I can think of. The incredibly useful thing about GXE is that it's your expected winrate if we didn't have a matchmaking system. So the sell that I've always pushed is to use it as a drop-in substitute for the very understandable winrate metric that everyone intuitively wants to use.

Just seeing how often you win against the "average player" basically only tells you how much of an issue hax is on the ladder.
 
