Ladder and rating system policy

Zarel

Not a Yuyuko fan
Site Content Manager, Battle Simulator Administrator, Programmer, Pokemon Researcher, Administrator
Creator of PS
This is a thread on ladder and rating system policy, split off from another thread.

The main questions discussed here are:

- Should W/L be displayed in the ladder?

- Should we have a ladder reset option?

Summary points:

- W/L isn't an accurate reflection of anything. Winning against a 1000 player is completely different from winning against a 1700 player, but the count doesn't say who you won against, only that you won. The other rating numbers measure how good you are much more accurately; W/L on its own is misleading.

- Suspect tests often use total games played and apparently also W/L to determine qualifications. This is weird and should probably not be done.

- Some people like to see how many wins/losses they've had (either in a run or overall), more for sentimental value than because it means anything. This is a use-case we should support.
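To make the first summary point concrete, here is a minimal sketch using the textbook Elo expected-score formula. It is an illustration only: the thread does not say which formula the ladder actually uses, and the 1500-rated player and the function name `expected_score` are hypothetical.

```python
def expected_score(player_rating: float, opponent_rating: float) -> float:
    """Textbook Elo expected score (roughly, win probability) for the player."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - player_rating) / 400.0))


# Hypothetical 1500-rated player facing the two opponents from the summary.
for opponent in (1000, 1700):
    p = expected_score(1500, opponent)
    print(f"vs {opponent}: expected score {p:.2f}")
```

Under this model the win over the 1000 player is close to a sure thing and the win over the 1700 player is an upset, yet both add exactly one to the W column.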

P.S. ignore the Like from blarajan, the forum software is dumb so I had to edit one of my earlier posts to split off this thread.
 

PDC

street spirit fade out
Team Rater Alumnus, Top Tiering Contributor Alumnus, Smogon Media Contributor Alumnus, Four-Time Past WCoP Champion
why did you get rid of w/l when you type /rank? seriously i don't get this change at all. this also affects required w/l ratios for suspect tests (idk if other tiers have these, but we do for OU). please change this back lol.
 

Kiyo

the cowboy kid
Forum Moderator, Top Tiering Contributor, Top Social Media Contributor Alumnus, Community Leader Alumnus, Community Contributor Alumnus, Contributor Alumnus
if there's an easy way to make it so you can only see your own w/l when typing /rank, but others can't see your w/l when doing /rank [user], i'm sure that would suffice (assuming the only reason w/l was taken away was because other users were like shittalking people about it or something?)
 

Zarel

Not a Yuyuko fan
Creator of PS
why did you get rid of w/l when you type /rank? seriously i don't get this change at all. this also affects required w/l ratios for suspect tests (idk if other tiers have these, but we do for OU). please change this back lol.
Because:

We do not display win/loss stats because we don't want people to keep creating new accounts until they get a good win/loss value, and your win/loss should be 50-50 since you should be playing with other people about as good as you. Please use GXE instead, as it accurately estimates your "real" win/loss ratio.

Suspect tests should not require W/L; please yell at any suspect test that does.
 

Stratos

Banned deucer.
I thought OU moved to a "total battles" metric instead of W/L (either way the point is that you can't just play a shitload of games until you get reqs) so you had to get reqs in <x battles. In which case, we don't need to see W/L, but we do need to see total # of matches. I think it's a useful thing to have anyways. Could you add that category instead?

And "total battles" is actually a totally valid thing to put in reqs so :D
 
Because:

We do not display win/loss stats because we don't want people to keep creating new accounts until they get a good win/loss value, and your win/loss should be 50-50 since you should be playing with other people about as good as you. Please use GXE instead, as it accurately estimates your "real" win/loss ratio.

Suspect tests should not require W/L; please yell at any suspect test that does.
Does the competitive nature of it not matter, though? For example, in the UU community there's a sort of "record" of topping the ladder at 41-0. A few of us have tried beating that (and come close!) but now we can't do it as far as I can tell.
 
Because:

We do not display win/loss stats because we don't want people to keep creating new accounts until they get a good win/loss value, and your win/loss should be 50-50 since you should be playing with other people about as good as you. Please use GXE instead, as it accurately estimates your "real" win/loss ratio.

Suspect tests should not require W/L; please yell at any suspect test that does.
if you are any halfway decent player on the ladder, for example, a tournament player, i can guarantee that if you laddered for 50 games you would not have a 50/50 w/l. this ideology applies to the very lower end of those using the ladders, and i believed that these changes were intended to benefit the tournament community, which this change only frustrates without providing any real benefit at all.
 

Freeroamer

The greatest story of them all.
Community Contributor, Top Tiering Contributor
Well, completing reqs within a certain number of games is simply a GXE requirement more than anything else, i.e. for most suspect tests in OU I believe the games limit meant you needed at least a 77 GXE to be able to attain the required COIL within the limit.

I believe the main issue is people creating numerous alts on suspect tests because they didn't go like 10-0 in their first few games, which in fairness is something I know a hell of a lot of people do. Not sure if there's a practical way to solve this though, outside of asking for a certain GXE rather than setting a games limit.
 

soulgazer

I FEEL INFINITE
Forum Moderator Alumnus, Community Contributor Alumnus, Tiering Contributor Alumnus, Contributor Alumnus, Past SPL Champion
if the alts creation is the problem, let us be able to reset our rankings manually. Like Beds said, may it be GXE, W/L, or whatever, there will always be a reason to make new alts. By letting us reset our rankings, this shouldn't happen.

I'm sure it is possible to limit the usage of it too. Like, once every few days. That way nobody can abuse it and it remains useful for suspect tests laddering.
 

Zarel

Not a Yuyuko fan
Creator of PS
I thought OU moved to a "total battles" metric instead of W/L (either way the point is that you can't just play a shitload of games until you get reqs) so you had to get reqs in <x battles. In which case, we don't need to see W/L, but we do need to see total # of matches. I think it's a useful thing to have anyways. Could you add that category instead?

And "total battles" is actually a totally valid thing to put in reqs so :D
We already show total battles, it's the last column.

Does the competitive nature of it not matter, though? For example, in the UU community there's a sort of "record" of topping the ladder at 41-0. A few of us have tried beating that (and come close!) but now we can't do it as far as I can tell.
Yes, it is sad that that is no longer possible. If only other people didn't abuse the feature.

Not sure if i understand this but if I'm not happy with my ladder performance I will make a new alt, why does it matter if i measure that in W/L, elo or GXE?
Because we don't want you to do that. That's why:

W/L - removed

Elo - your reset Elo will be worse than your old Elo, why bother?

GXE - in theory fewer people care about GXE; also we're now hiding GXE until it stabilizes so this should be less worth-it, at least

if you are any halfway decent player on the ladder, for example, a tournament player, i can guarantee that if you laddered for 50 games you would not have a 50/50 wl. this ideology applies to the very lower end of those using the ladders
This is true, but your W/L still won't be a reflection of your actual skill. That's what GXE is for.

if the alts creation is the problem, let us be able to reset our rankings manually. Like Beds said, may it be GXE, W/L, or whatever, there will always be a reason to make new alts. By letting us reset our rankings, this shouldn't happen.

I'm sure it is possible to limit the usage of it too. Like, once every few days. That way nobody can abuse it and it remains useful for suspect tests laddering.
Unfortunately, several people (I think mainly Antar) strongly oppose resetting ratings, since it lets you "lie" about your real skill level by repeatedly resetting it to get a good number.
 

Stratos

Banned deucer.
the problem with glicko is that it's built on a theory which doesn't allow for improvement (basically it assumes that you were equally good for every game you played and it's just learning more about how good you are). if you're not going to let us reset our ratings, at least promise periodic ladder resets.
 

M Dragon

The north wind
Community Contributor, Top Tiering Contributor, Top Tutor Alumnus, Tournament Director Alumnus, Forum Moderator Alumnus, Top Dedicated Tournament Host Alumnus, Battle Simulator Moderator Alumnus, Smogon Tour Season 17 Champion, defending World Cup of Pokemon Champion, Past SPL Champion
World Defender
Removing W/L will not help with that problem. If people see they are doing badly (like they lose a couple of games in the first 20 games) they will just create another account. Being able to see W/L only gives you some extra information about how you are doing, which is useful.
Hiding it won't prevent people from making alts; people can tell they are doing badly without looking at the W/L record. I don't think this fixes anything, and it removes some useful information.

The best solution might be allowing users to reset their ratings, but limiting that so they don't abuse it.
 

soulgazer

I FEEL INFINITE
Unfortunately, several people (I think mainly Antar) strongly oppose resetting ratings, since it lets you "lie" about your real skill level by repeatedly resetting it to get a good number.
Yea, but making a new alt also brings the same result. Thing is, you guys seem to want to find solutions to stop that from happening, and the ability to reset our rankings should be able to greatly help with that. There's no real way to stop people from possibly "lying" about their "real skill level", and I believe that the solution I brought up will be more useful in the long run.
 
Yes, it is sad that that is no longer possible. If only other people didn't abuse the feature.
How is people making new alts "abusing" the fact that there's a win / loss ratio shown? If win / loss is as meaningless as you say it is then I don't see how it's a problem if people "abuse" something to get a good one.

Win / Loss is also a very basic statistic on something like a ladder and I really don't see how it makes much sense to not include it. If you want to limit the number of alts created then directly limiting this seems like a much more sensical solution than removing a very loosely related statistic that has a few positive uses.
 
Is there any good reason why the teambuilder fills 0's into the EVs thing where there was previously a blank? It makes it much more annoying and slower for me to quickly scan a team and check the evs for mistakes or just to see what they are.

Also I really really really want an option to have the old sprites back in the teambuilder (i'm chill with the new icons), and I would like this option to be separate from the option to see BW sprites in battle since I prefer the oras sprites then.

edit (regarding resetting ladder ranking): I also would like this feature, and the explanation for why this can't happen on this thread doesn't really make sense to me, but this does:

~Zarel: unfortunately, allowing users to reset their rank crashes the ladder
~Zarel: imagine you're playing Jenga
~Zarel: and resetting a rank is removing a block from the Jenga tower
~Zarel: if it's a very short tower: no problem
~Zarel: if you have 5 million users and dozens of ladders: problem
 

V4Victini

再起不能
Battle Simulator Admin Alumnus, Community Leader Alumnus, Programmer Alumnus, Top Researcher Alumnus
edit (regarding resetting ladder ranking): I also would like this feature, and the explanation for why this can't happen on this thread doesn't really make sense to me, but this does:

~Zarel: unfortunately, allowing users to reset their rank crashes the ladder
~Zarel: imagine you're playing Jenga
~Zarel: and resetting a rank is removing a block from the Jenga tower
~Zarel: if it's a very short tower: no problem
~Zarel: if you have 5 million users and dozens of ladders: problem
To clarify, this is for individual resets. Wiping and restarting the ladder wouldn't be a problem, and I would love periodic resets.
 

Zarel

Not a Yuyuko fan
Creator of PS
Is there any good reason why the teambuilder fills 0's into the EVs thing where there was previously a blank? It makes it much more annoying and slower for me to quickly scan a team and check the evs for mistakes or just to see what they are.
This was an oversight; fixed.

Also I really really really want an option to have the old sprites back in the teambuilder (i'm chill with the new icons), and I would like this option to be separate from the option to see BW sprites in battle since I prefer the oras sprites then.
This is kind of unfeasible.

edit (regarding resetting ladder ranking): I also would like this feature, and the explanation for why this can't happen on this thread doesn't really make sense to me, but this does:

~Zarel: unfortunately, allowing users to reset their rank crashes the ladder
~Zarel: imagine you're playing Jenga
~Zarel: and resetting a rank is removing a block from the Jenga tower
~Zarel: if it's a very short tower: no problem
~Zarel: if you have 5 million users and dozens of ladders: problem
After I said that, I thought about it further, and it's definitely possible to do it in a way that doesn't crash the ladder.

The main problem is, of course, policy. I'll decide on how to do this later.
 

Zarel

Not a Yuyuko fan
Creator of PS
For now, I've decided to allow people to see W/L, but only after clicking through a lecture on why W/L doesn't mean what they think it means.

Also, please tell us about the suspect testers using W/L in their suspect reqs; they are wrong and Antar and I need to yell at them.
 

AM

Community Leader Alumnus, Community Contributor Alumnus, Tiering Contributor Alumnus, Contributor Alumnus, Battle Simulator Moderator Alumnus, Past WCoP Champion
LCPL Champion
For now, I've decided to allow people to see W/L, but only after clicking through a lecture on why W/L doesn't mean what they think it means.

Also, please tell us about the suspect testers using W/L in their suspect reqs; they are wrong and Antar and I need to yell at them.
Yell at Antar then cause he approved on it when we just had our most recent suspect test or do what everyone else does, make a policy review thread.
 

Zarel

Not a Yuyuko fan
Creator of PS
Yell at Antar then cause he approved on it when we just had our most recent suspect test or do what everyone else does, make a policy review thread.
Tagging Antar

wtf are you looking for in a rating system? I thought you cared about accurate stats? What exactly do you want here? Yes/no to displaying W/L? Yes/no to a ladder reset button?

I remember you wanted to ban people for using alts so I'm confused why you've suddenly stopped caring about accurate stats?
 

Zarel

Not a Yuyuko fan
Creator of PS
Tagging Antar

wtf are you looking for in a rating system? I thought you cared about accurate stats? What exactly do you want here? Yes/no to displaying W/L? Yes/no to a ladder reset button?

I remember you wanted to ban people for using alts so I'm confused why you've suddenly stopped caring about accurate stats?
Tagged the wrong Antar, tagging Antar this time
 

atomicllamas

but then what's left of me?
Site Content Manager Alumnus, Senior Staff Member Alumnus, Community Contributor Alumnus, Top Tiering Contributor Alumnus, Contributor Alumnus
Zarel OU's suspect system limits total number of games played and ignores W/L, it is actually based on GXE. For OU tests the GXE cut off is 79.8 iirc (I calculated last time cause curious). When RU wanted to implement this (min GXE cutoff), which was before OU did, we were still told no by Antar. But maybe he changed his mind, I don't see why a minimum GXE would be an issue, as it allows for more customization of ladder systems, and the number of games played at each GXE can be fiddled with more easily.

Also, not displaying win/loss won't deter people from starting new alts, smart people know what GXE is and people make new alts for other reasons (ie. I'll usually make one per team I'm testing in RU).
 

Stratos

Banned deucer.
if you really want people to stop making alts just convince them all to play doubles where the ladder is so bad we don't use it in the first place.

And I'm going to address the argument of "coil has a built in GXE floor, why do we need another one besides" before it crops up: the problem with COIL's GXE floor is that it's approached exponentially, and you're basically given the option of hoping nobody will play 150 games to get reqs (they will) or forcing somebody with a GXE you would allow reqs at 70 games to play 150 for it. Just as an example, say I don't want anyone below 75 GXE voting, but at 75 GXE I want it to take about 70 battles to get reqs, with exponentially fewer games for higher GXEs. I could set a B value of 12, and a COIL of 2650 (min GXE 66.25), and get a spread that looks like this:

GXE: games needed
90: 23
85: 34
80: 45
75: 67
72: 100
70: 151

But while that looks good for 90, 85, 80, and 75, I don't want people with 70 or 72 GXE voting no matter how many games they play. So I could set the minimum GXE of the COIL formula to, say, 74, but then I'm given the choice between having people with 80 GXE play 18 games, or having the people at 75 GXE play 300 games. Neither of these is desirable.
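The shape of that trade-off can be checked with a short calculation. This is a sketch that assumes the commonly cited form of the formula, COIL = 40 * GXE * 2^(-B/N), where N is the number of games played. That form is not stated anywhere in this thread, although it is consistent with the minimum GXE of 66.25 quoted above (2650 / 40). The function names are made up for illustration.

```python
import math
from typing import Optional


def min_gxe(coil_target: float) -> float:
    """GXE floor implied by a COIL target (the limit as games -> infinity)."""
    return coil_target / 40.0


def games_needed(gxe: float, coil_target: float, b: float) -> Optional[float]:
    """Games N at which 40 * gxe * 2**(-b / N) reaches coil_target.

    Assumes the COIL form described in the lead-in. Returns None when gxe is
    at or below the floor, where no number of games is enough. The result is
    left unrounded.
    """
    ratio = 40.0 * gxe / coil_target
    if ratio <= 1.0:
        return None
    return b / math.log2(ratio)


if __name__ == "__main__":
    # Values taken from the post: B = 12, COIL target = 2650.
    for gxe in (90, 85, 80, 75, 72, 70):
        print(gxe, games_needed(gxe, 2650, 12))
    print("floor:", min_gxe(2650))
```

With B = 12 and a target of 2650, this gives roughly 33 to 151 games for GXE values from 85 down to 70, close to the figures listed above. Raising the target moves the floor up but stretches the whole curve with it, which is the bind described in the post.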

If I were to explain this with MS paint, I would say that COIL only allows us to make reqs curves that look like this:

[image not preserved]

When all the tier leaders WANT to make reqs curves that look like this:

[image not preserved]
