
Announcement The decision to base UU off of 1760 stats

Discussion in 'Competitive Discussion' started by Antar, Mar 13, 2014.

Thread Status:
Not open for further replies.
  1. Thorhammer

    Thorhammer

    Joined:
    Jun 18, 2008
    Messages:
    1,959
    I get that that's the idea behind it, but that doesn't mean concepts similar to the candles can't be used in other ways to avoid the huge problems this system has.
  2. Antar

    Antar Self-anointed Czar of LC UU
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,158
    Anyway guys, it turns out that the candle idea was a turd. I'll explain more later, but it has to do with what happens when you push the weighting system too far.
  3. Calm_Mind_Latias

    Calm_Mind_Latias

    Joined:
    Aug 20, 2013
    Messages:
    431
    Hmm... even though I haven't played a competitive match since Jan 27, I am still interested in the mathematical and theoretical reasons.
  4. Antar

    Antar Self-anointed Czar of LC UU
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,158
    Basically, for all of these candles, as the baseline (not calling it cutoff anymore. You happy, WebBowser and @Excitement?) increased, the candle usage didn't even come close to tending towards zero. In fact, above a certain threshold, candle usage actually started going up. The reason is that once the baseline is above the vast, vast majority of players, the rating R ends up mattering a lot less than the rating deviation RD. So if the baseline is, say, 1800, then a new alt (rating 1500±130) will have substantially greater weight than a player with rating 1700±25 (by a factor of over 300). And it doesn't matter if the new player has a weight that's 100x less than a player with a rating of 1900±25: there are hundreds of thousands upon thousands more new alts than there are really good players.

    In truth, what this demonstrates is the limits of our weighting system--it was never really designed to work at any level above 1500. I have some alternatives I may try. I'll make sure to keep everyone posted.
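
    The figures in this post can be checked with a short Python sketch. It assumes the weight is the probability that a player's true rating lies above the baseline, with the true rating modeled as a normal distribution centered on R with standard deviation RD. The post does not spell that formula out, so the function name `weight` and its exact form are assumptions made for illustration.

```python
from math import erfc, sqrt

def weight(rating, rd, baseline):
    """Assumed weight: P(true rating > baseline) for a normal
    distribution with mean `rating` and standard deviation `rd`."""
    # erfc keeps precision in the far tail, where 1 + erf(...) would lose digits.
    return 0.5 * erfc((baseline - rating) / (sqrt(2.0) * rd))

baseline = 1800
new_alt = weight(1500, 130, baseline)  # fresh alt, maximum RD
mid = weight(1700, 25, baseline)       # established player below the baseline
top = weight(1900, 25, baseline)       # established player above the baseline

# The new alt outweighs the 1700±25 player by a factor of over 300...
print(new_alt / mid)
# ...while the 1900±25 player outweighs the new alt by a bit under 100.
print(top / new_alt)
```

    Under this assumed formula the two ratios line up with the "over 300" and "100x" figures quoted above, which is why a large enough crowd of new alts can swamp the few players who are genuinely above the baseline.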
  5. Calm_Mind_Latias

    Calm_Mind_Latias

    Joined:
    Aug 20, 2013
    Messages:
    431
    Still, I cannot imagine how standard candle usage could increase after some R threshold, assuming by "threshold" you mean Glicko, not weight. Although it wasn't listed as an official candle, I cannot see how someone, even a skilled player with an otherwise well-constructed team, using a Gengar with physical attacks (besides Focus Punch) can get a rating above 1760.

    Your explanation isn't surprising at all if "threshold" means "weight". It seemed obvious to me that an increased "baseline" can be abused by someone for tiering purposes due to the properties of the normal distribution. Needless to say, if the threshold is that high (baseline >> R), someone lacking the skill or willingness to use competitive Pokemon to reach it can still increase the value of the integral by increasing the standard deviation, so that the tail is larger; playing more would simply decrease their rating deviation and their weight. It is a perverse incentive not to play many matches on a single alt.
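
    The tail argument can be written out explicitly. This is a sketch that assumes the weight is the area of a normal distribution (mean R, standard deviation RD) above the baseline B; the symbols w, B, \Phi and \varphi are introduced here for illustration and do not appear elsewhere in the thread.

```latex
w(R, \mathrm{RD}; B)
  = \int_{B}^{\infty} \frac{1}{\mathrm{RD}\sqrt{2\pi}}
    \exp\!\left(-\frac{(x - R)^2}{2\,\mathrm{RD}^2}\right) dx
  = \Phi\!\left(\frac{R - B}{\mathrm{RD}}\right)

\frac{\partial w}{\partial \mathrm{RD}}
  = \varphi\!\left(\frac{R - B}{\mathrm{RD}}\right)\,\frac{B - R}{\mathrm{RD}^2}
  > 0 \quad \text{whenever } B > R
```

    Here \Phi is the standard normal CDF and \varphi its density. Under this assumption, for any player rated below the baseline, a larger RD always means a larger weight, and playing more games (which shrinks RD) lowers it.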

    Still, I did not think this would affect tiering unless there is a conscious and mathematically informed effort to undermine it, since most newbies/trolls/uncompetitive players are likely unaware of the tiering process and rating system; they would simply play matches on their alt while their RD shrinks and their rating languishes at 1500 or below, unconcerned about their weight. But dedicated tier trollers could take advantage of this by making numerous alts with a high initial rating deviation, challenging newbies at random, winning one or two games fairly easily despite being handicapped by using uncompetitive Pokemon, and then moving on to a different alt. While the tier trollers' absolute weight has decreased, their relative weight has increased, since an increased baseline has diluted the influence of the majority of uncompetitive average players on the ladder, and this would make the tier trollers' influence more concentrated.

    And besides there are better things to do in life than abuse the tiering system on an unofficial Pokemon website. :)
    Last edited: Apr 3, 2014
  6. WebBowser

    WebBowser

    Joined:
    Oct 17, 2013
    Messages:
    483
    Wait... candle usage went... up? Could the mons/sets being used at low levels be even more random than we thought? Or perhaps candles are being purposely used by folks who know what they are doing, just to show off how much better they are than the average player (à la the PS OU forum's NU + Delibird challenge, where one must make a team consisting entirely of NU mons, one of which is Delibird, and try to ladder to a semi-respectable rating)? Yeah, I'm pretty baffled here. Well, good luck figuring all of this out. Also, yay, slightly more accurate terminology. I know it probably seems really petty to you, but describing things well is actually really important from a PR standpoint, as it's really hard for folks to have an informed opinion of what's going on when terminology is inconsistent and/or inaccurate.

    Also, you sound really tired. Get some sleep, it might help you think of new ideas (or at least help alleviate some stress. It works for me at least). I really hope this doesn't come off as patronizing, I'm just concerned is all.
  7. Antar

    Antar Self-anointed Czar of LC UU
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,158
    No, WebBowser and Calm_Mind_Latias, the problem is not trolls or people mistakenly using the wrong stuff at the top of the ladder. The problem is that the weighting system weights players with high RDs more strongly than players whose R is closer to the baseline, and you have the maximum RD when you first start off. There is nothing odd or mysterious about this: it's just math. It's not even wrong math: a new alt, just starting out, has a nonzero chance of belonging to a player at the top of the ladder, but if the alt's played 100 games and is in the 1600-1700 range, it's extremely unlikely that that player has a "true rating" of 1800.
    WebBowser likes this.
  8. Calm_Mind_Latias

    Calm_Mind_Latias

    Joined:
    Aug 20, 2013
    Messages:
    431
    Sorry,

    I never said trolls were a problem; I just noted the possibility that tier trolls can abuse a weighting system by playing fewer matches in order to maximize their weight by focusing on the high RD from a new alt. It also gives a perverse incentive not to play many matches on a single alt and discounts moderately skilled players relative to a nascent random alt.

    Still, I do not see how standard candle usage can increase after a given R (with the exception, at around 1500, that some candle users win their first battle or two and quit). The math easily allows some standard candle users to have increased weight relative to the average player who uses an alt for tens of matches. As I said before, I understand the math behind it (it mostly revolves around the AUC of the normal distribution) and do not see it as an incomprehensible black box.
  9. xam13124

    xam13124

    Joined:
    Sep 30, 2010
    Messages:
    13
    This makes me even more convinced that the rating system as a whole is flawed. If, at 1700 Glicko with a very low deviation, my contribution to usage stats can fall below that of a new player with high deviation, that is a major flaw. Players should be rewarded for playing many games at a high Glicko, not punished because their decreasing deviation reduces their contribution to the weighting.
  10. Zebstrika

    Zebstrika

    Joined:
    Oct 3, 2010
    Messages:
    890
    Could you make a cutoff deviation then, similar to how 100+ deviation players didn't appear on the ladder leaderboards?
  11. vayu

    vayu

    Joined:
    Sep 29, 2010
    Messages:
    32
    is there a rough glicko/elo to percentiles list out there?
  12. Antar

    Antar Self-anointed Czar of LC UU
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,158
    vayu, Elo, no, but Glicko is designed to match a normal distribution with center 1500 and standard deviation 130, so the formula for percentile is:
    Code:
    pctl=1-0.5*(1+erf((R-1500)/sqrt(2*130*130)))
    Glicko ratings also have a deviation RD which complicates things, but treat that as an "uncertainty."
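
    As a quick illustration of the formula above, here is a minimal Python sketch. The function name `fraction_above` and its default arguments are made up for this example; the model (normal distribution, center 1500, standard deviation 130) is the one stated in the post. Note that the expression as written gives the fraction of the population rated above R, so smaller values mean a higher standing.

```python
from math import erf, sqrt

def fraction_above(rating, mean=1500.0, sd=130.0):
    """Fraction of a normal(mean, sd) population rated above `rating`,
    ignoring each player's individual RD."""
    return 1.0 - 0.5 * (1.0 + erf((rating - mean) / sqrt(2.0 * sd * sd)))

for r in (1500, 1630, 1760):
    print(r, round(fraction_above(r), 4))
```

    Under this model 1760 sits exactly two standard deviations above the center, so only about 2.3% of ratings fall above it.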