Weighted Stats

Aldaron

geriatric
We should have them. Stats are very important and they determine our tiers; does it really matter what scrub7086 used dillydallying with Light Ball Pikachu and Frenzy Plant Venusaur at the bottom of the ladder?

No...it really doesn't. I'm not asking for a ridiculous weighted formula that totally cuts off any significance from lower rated people, but we definitely need something that emphasizes what people who play more at higher levels use. The most obvious example of this that I can think of was a stat taken 2 years ago that had Celebi at #1 in usage (I believe it was the official server's weighted usage points / total usages, meaning it was used by high rank people but not low rank people). If any of you remember, in mid 2007 Celebi was barely used (ranking in the 60s in usage), but as 2007 ended and 2008 came about, more and more good players were using Celebi and it ended up a standard on most teams.
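To make the stat described above concrete (weighted usage points divided by total usages), here is a minimal Python sketch. The sample records, the ratings, and the function name `average_rating_per_usage` are made up for illustration; this is a guess at the shape of the calculation, not the official server's actual formula.

```python
from collections import defaultdict

def average_rating_per_usage(usages):
    """usages: iterable of (pokemon, player_rating) pairs, one per team appearance.

    Returns {pokemon: sum of ratings / number of usages}. A high value means the
    Pokemon is mostly used by highly rated players, however rarely it appears.
    """
    points = defaultdict(float)
    counts = defaultdict(int)
    for pokemon, rating in usages:
        points[pokemon] += rating
        counts[pokemon] += 1
    return {p: points[p] / counts[p] for p in points}

# Hypothetical sample: Celebi is rare but used by strong players.
sample = [
    ("Celebi", 1700), ("Celebi", 1650),
    ("Pikachu", 800), ("Pikachu", 760), ("Pikachu", 900), ("Pikachu", 740),
]
result = average_rating_per_usage(sample)
```

In this toy sample Celebi tops the list despite having fewer raw usages than Pikachu, which is the effect the Celebi stat picked up.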

I know people claim "CRE is bullshit so weighted stats are bullshit" but honestly I don't get this. Yes, CRE isn't an end-all declaration of someone's skill. Still, I don't think anyone is going to argue that there isn't a high correlation between high CRE and good playing. If anything, at the least you can say having a higher CRE means it is more likely for you to be a good player than someone with a low CRE (don't bring the "what if loki uses a new alt" argument, that's obviously nonsense and I don't even want to acknowledge it). Regarding deviation and "significance of playing," we already have some minimum deviation required to reach leaderboard, so if necessary we can just require that (not sure if it will be).

This is especially important because what should be determining our tiers is what is used most often to win. We are a competitive community; that which wins should hold priority over that which doesn't. There is a very high correlation between having a high CRE and winning (I know the relationship is causal with ladder, but I'm referring to other competitive venues such as tournaments and tour).

Weighted stats won't necessarily end the issue and give us a perfect sample...in fact they obviously won't. But they'll get us closer to stats that represent what the "real metagame" is, namely the metagame that good people are playing when playing to win.

So...quick summary:

~We should have weighted stats because there is a high correlation between high CRE and winning (not only on ladder)
~We care about the metagame played by those winning
~"CRE is bullshit" doesn't apply due to a simple correlation analysis
~We care minimally (I really want to say we don't care at all but :X) about scrub7086 using Light Ball Pikachu and Frenzy Plant Venusaur at the bottom of the ladder with a 35 deviation and 756 CRE.
~Having weighted stats would help us prevent people from "gaming" the system. Right now, if someone wanted to, he could easily move borderline UU Pokemon to OU simply by getting in a bunch of battles and quitting. You could say "we'd stop this" but if someone wanted to do this with a team of borderline UU Pokemon + 5 level 1 exploders, you couldn't stop him. It would take about 30 seconds a battle assuming we stop the instant quitter, and he could still get about 200 usages a day and 6000 usages in a month, which is more than enough.


~There is one potential flaw and that is the "IPL with Sunkern" dilemma. A really bad Pokemon will have boosted stats if played on a team with 5 really good Pokemon and that team wins. I think this will be a non-issue mostly due to the sheer volume of battles that occur monthly. "IPL with Sunkern" would essentially have to win like 5000 battles with Sunkern on his team. A loser can do it because even with 6,7,8,9000 usages, it won't mean much at the bottom of ladder. A winner doing it will frankly be quite the feat. Sure, he can continue winning, but winning legitimate battles takes so much longer than losing purposely, and there are enough battles on the server to make this mostly a non-issue.
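The arithmetic behind the "gaming" point in the summary can be checked in a few lines. The 30 seconds per battle and 200 battles per day come from the post; the 30-day month is an assumption.

```python
SECONDS_PER_BATTLE = 30   # figure from the post
BATTLES_PER_DAY = 200     # figure from the post
DAYS_PER_MONTH = 30       # assumption: a 30-day month

# Time a deliberate loser has to spend each day, in minutes.
minutes_per_day = SECONDS_PER_BATTLE * BATTLES_PER_DAY / 60

# Each battle adds one unweighted usage for the Pokemon being pushed.
usages_per_month = BATTLES_PER_DAY * DAYS_PER_MONTH

print(minutes_per_day, usages_per_month)
```

So under these numbers, 6000 unweighted usages a month costs well under two hours of throwaway battles a day.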
 
I agree... Really, before I begin this, there's not much more I can say other than I agree that as a competitive community we should emphasise competitive stats. How do you think we should deal with tiering, though? I'm assuming from your 'gaming the system' point that you want us to use weighted stats, which I agree with, since if we have the stats in the first place based on the reasons they're superior to unrated stats (which I hope we'd retain) then we should use them for our competitive tier based on the same principle. I think this has a lot of potential to change our tiers though. For instance, stuff like Electivire might even drop to UU. Whether this is bad or not, I'm not going to make a judgment... I just want to put that out there.

Also, I don't know much about math and the rating system, but since we use rating in our suspect qualification gathering, would rating be a superior alternative to CRE? CRE is what determines our #1 player at a given time (with deviation for being allowed on the board). Also, what about ladder cheaters?
 

Colonel M

I COULD BE BORED!
Well, I guess CRE can cause a minor setback with it. After all, there's another minor issue: alt accounts don't get weighed down so easily by it. Of course, if SB2 comes out with axing alts, then it shouldn't be much of an issue to begin with, IMO. I like the idea of weighted stats more than anything. Telling us what winners use overall is what's important, not what's most common across every player. So yeah, I agree that we should find a way to measure what is used. Weighted stats hopefully will help this, even if there are minor inaccuracies.

Sorry to sound a little hazy with the subject. I'm in agreement, so long as we can avoid the instances such as "ipl using sunkern" and the ladder cheaters that come along with it.
 

Aldaron

geriatric
This is Doug's LONG rejection post for weighted stats.

http://www.smogon.com/forums/showpost.php?p=1714040&postcount=22

Posting here so people get both sides and for easy reference for me when I choose to argue his points, the relevance of some of which I question.

I have opposed using weighted numbers because I think the available metrics for determining "weighted usage" are far too arbitrary to be used effectively.

As I say that, I'm sure the first comment that comes to mind is: "What are you talking about Doug? Every player has a rating. That's a clean numeric representation of a player's strength. Add that every time a pokemon is used and you have a clean numeric representation of weighted usage!"

The presumption is that weighted statistics will tell us "What good players are using." That sounds so simple. But it isn't. Not really...

Here are several problems with using a player's rating to calculate weighted usage statistics:
---------------------------

What rating do you use? The rating at the time the stats are collected, or the rating at the time the battle was held? Actually neither of those is really accurate. If you use ratings at the end of the month, they're potentially very different from the ratings at the time of the battle. So we really can't use that. If you use the player's rating at the time of the battle, you aren't really using the rating that is the basis of the Glicko2 system. The Glicko system is based on the idea that ratings are calculated for all battles conducted during a rating period. So, the rating that you possess at the end of a given battle isn't actually the rating that will be recorded as your "real rating", which is the rating used to compute rating against other players. Your "real rating" is calculated every night at 11:30pm, against all battles conducted during 24 hours.

So, we would have to build new mechanisms for not only calculating ratings for every player, but also assigning ratings values to every battle conducted in a 24 hour period. Even if I was willing to do that work, it really isn't worth it for other reasons I will mention below.
---------------------------

At any given time, a player's rating is not a true assessment of the player, it is the rating of the account being used by the player at that time. We have no way to determine who are the good players at any point in time. We can only identify the IDs of people who have chosen to ladder actively at that time. Let's look at a few fictional users....

ElderChampion is a very knowledgeable and skilled battler, arguably one of the best in the history of the metagame. But he has not played in a certain length of time, and his CRE is somewhat low. That doesn't mean his rating is lower, it means his ratings deviation is higher -- which simply means we don't know whether he is still good or not. Maybe his skills have eroded, maybe not. But since we use CRE (that means CONSERVATIVE ratings estimate) -- it will be estimated on the low end. So, if this fantastic player with incredible knowledge and skill, uses a pokemon -- it will not be weighted very high. The CRE really is not a reliable indication of who is good or who is bad. It's simply a measurement of who is winning the current race.
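A small sketch of the ElderChampion effect, assuming the CRE takes the common form "rating minus some multiple of the ratings deviation". The multiplier `k`, the function name, and all the numbers below are hypothetical; the thread does not give the server's actual constant.

```python
def conservative_rating_estimate(rating, deviation, k=2.0):
    """Lower-bound style estimate: rating minus k deviations.

    k=2.0 is a placeholder, not the server's real constant.
    """
    return rating - k * deviation

# Same underlying rating, different uncertainty (made-up values).
active_player = conservative_rating_estimate(rating=1700, deviation=40)
elder_champion = conservative_rating_estimate(rating=1700, deviation=150)

print(active_player, elder_champion)
```

Both accounts carry the same rating, but the inactive one is estimated much lower purely because its deviation has grown, so its usages would be weighted down.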

Now look at CurrentChampionUndercover, which is the fictional alt of the current top-ranked player on the server. How should we regard this player's usage? In one scenario, the player may be trying out a new team, and wants to play a bit under an alt without affecting his main rating. Should we rate his pokemon lower? He is the best player on the server -- why would we NOT be interested in the pokemon he is using? If you are interested in what the best players are using CURRENTLY -- isn't this team the EXACT team that you want to weight highly? It's the current team of the current best player! But because of the alt system, these usages will be rated down with all the other noobs.

Maybe CurrentChampionUndercover is just screwing around and is playing with a gimmick team for fun. In this case, even if I could identify that the alt really is CurrentChampion (maybe possible by looking at IP addresses) -- I really DON'T care what the player is using, because in this case the best player on the server is intentionally NOT using his best pokemon. In that case I would prefer to completely ignore the player, because their usage is virtually meaningless for competitive weighting.

Then we have CurrentChampionStartingOver. Since rating volatility actually discourages players from keeping the same alias for a long period of time, the CurrentChampion decides to "start fresh" with a new alias. This is the exact same player, playing the exact same team, with the exact same strategy. But suddenly the weightings for that player are going to be rated down with the idiot noobs. How can that possibly be right? Well, since we have no idea that CurrentChampionStartingOver is really a great player -- then we can't rate his pokemon accurately.

So when it comes to weighting the pokemon being used by the various good players mentioned above -- there is no way to weight them accurately by simply looking at the numbers. The numbers are terribly arbitrary.
---------------------------

The usage system is already exposed to skew based on individual usage -- why would we ever want to increase that exposure? Let me explain...

Which pokemon is used more -- a single player using Ninjask over and over 100 times in a single day, or 100 different players using Dugtrio once in a day? I would certainly argue that Dugtrio being used by 100 different people is far more "Used" than one guy with a shitload of time on his hands, spamming the hell out of Ninjask. Right now, the usage stats really can't differentiate, so both cases are considered equal.
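The Ninjask/Dugtrio case can be sketched by counting raw usages next to distinct users. The log format, player names, and function name here are invented for illustration; the real usage stats are described as tracking only the raw count.

```python
def raw_and_distinct(usages):
    """usages: iterable of (pokemon, player_id) pairs.

    Returns {pokemon: (raw_usage_count, distinct_player_count)}.
    """
    raw = {}
    players = {}
    for pokemon, player in usages:
        raw[pokemon] = raw.get(pokemon, 0) + 1
        players.setdefault(pokemon, set()).add(player)
    return {p: (raw[p], len(players[p])) for p in raw}

# One player spams Ninjask 100 times; 100 players use Dugtrio once each.
log = [("Ninjask", "spammer")] * 100
log += [("Dugtrio", f"player{i}") for i in range(100)]
summary = raw_and_distinct(log)
```

Both Pokemon end up with 100 raw usages, and only the distinct-user column (1 versus 100) tells the two situations apart.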

Well, now let's change it up and say that the single player is one of the best players on the server. And he is spamming his Stallrein in 100 consecutive battles. On the other hand, 100 different other users, all rated highly, but not as highly as the Stallrein user -- they all are using Salamence once to great effect, because it's such a true stud of a pokemon. Which pokemon will receive a higher overall weighted usage? Stallrein will. Because a lone highly rated player spammed it severely. So, even though I could make a very valid argument that Salamence was 100 TIMES more popular amongst good players that day -- the weighted usage stats would say that Stallrein was "used more", whatever that means. That is ridiculous, in my opinion.
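The Stallrein/Salamence scenario works out as follows if each usage is simply weighted by the user's rating. The ratings of 1800 and 1700 are made-up values chosen to match "highly rated, but not as highly as the Stallrein user"; the function name is likewise hypothetical.

```python
def weighted_usage(usages):
    """usages: iterable of (pokemon, player_rating). Sums ratings per Pokemon."""
    totals = {}
    for pokemon, rating in usages:
        totals[pokemon] = totals.get(pokemon, 0) + rating
    return totals

# One top player (assumed rating 1800) uses Stallrein in 100 battles.
log = [("Stallrein", 1800)] * 100
# 100 different good players (assumed rating 1700 each) use Salamence once.
log += [("Salamence", 1700)] * 100

totals = weighted_usage(log)
print(totals)
```

With these numbers Stallrein scores 180000 against Salamence's 170000, even though 100 times as many good players chose Salamence.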

If you wonder how far people can go to spam usage, let me give you this little tidbit -- the most active battler on the server played over 2000 matches in the month of December. Many good, active players average less than 200 battles per month. I sure hope the guy with 2000 battles is lowly rated, or using "the right pokemon" -- because that guy would have a dramatically heightened effect on weighted usage numbers.

With the current system, every usage is equal, so an active player -- potentially a player intentionally trying to game the tiers -- can only affect the stats by 1 unit each time they play a game. Yes, a spammer can still game the stats, but only one unit at a time. If we use weighted stats, we increase that player's ability to manipulate the stats.
---------------------------

Since individual battles matter, we unfairly add weighting to pokemon that are part of offensive teams. Since offensive teams play faster, they complete more battles. As such, these pokemon get higher usage numbers. However, just like my previous point -- each usage currently only counts for 1 unit on the stats. If we add weightings to the mix, we are only enhancing the skew.
---------------------------

I also disagree with using only winning pokemon for weighted stats. This argument is frequently mentioned as a way to defend against people gaming the tiers, by spamming pokemon on losing teams. However, since players of like skill are intentionally matched against each other, you would be intentionally excluding good pokemon being used by good players, every time good players face each other. This likelihood of exclusion is enhanced during times of high server activity, since there is an increased likelihood of good players being matched during those times. So, the weightings would be enhanced for overseas players that ladder during times of inactivity. If something as arbitrary as the time of day has any bearing on our pokemon tiers, then something is severely wrong.
I'm going to stop now... I'm tired of typing, and for those of you that have made it this far -- you're probably tired of reading it. If you don't understand my point by now, then you never will.

The heart of my argument is this -- we have no way of knowing what pokemon are being used by good players. Yes we have usage, and yes we have player ratings. But that doesn't mean we can mix the two in an accurate and useful way.

All we can reasonably do is report usage of pokemon in plain-vanilla terms. As in, "If you press Find Match on the Standard Ladder, here is the percentage chance of facing Pokemon X on your opponent's team."
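The plain-vanilla figure described here is just the share of teams that contain a given Pokemon. A minimal sketch, with made-up teams (shortened to three members for brevity) and a hypothetical function name:

```python
def usage_percentage(teams, pokemon):
    """Percentage of teams that contain the given Pokemon."""
    if not teams:
        return 0.0
    hits = sum(1 for team in teams if pokemon in team)
    return 100.0 * hits / len(teams)

# Hypothetical opposing teams seen on the ladder.
teams = [
    {"Salamence", "Scizor", "Heatran"},
    {"Celebi", "Heatran", "Gyarados"},
    {"Salamence", "Celebi", "Tyranitar"},
    {"Blissey", "Skarmory", "Gliscor"},
]

salamence_pct = usage_percentage(teams, "Salamence")
blissey_pct = usage_percentage(teams, "Blissey")
```

Every team counts once regardless of who is playing it, which is exactly the property the weighted proposals want to change.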

Everyone loves to talk about how wonderful "weighted statistics" would be. Sure, I agree that "true" weighted statistics would be great. But, as yet, I have not seen any proposal that can be implemented with our currently available stats that can be anything other than a vague arbitrary guess, masquerading as "more informative stats".
 
Even if we don't base our tier system off of Weighted Statistics, I don't see what the issue is of having them available to players. Everyone will know that there's a chance of them being skewed: it's up to the player to determine how much of the statistics he or she should take to heart.

In case you can't tell, I agree with the OP.
 
While I'm very much in the "CRE is typically bullshit" camp, I think it is difficult to argue the following thesis about weighted stats: While not without flaws, weighted stats will almost always create a better representation of what good players are using than the current usage stats do. I don't think we're going to find some magical solution that gives us the secret answer for exactly what the best players are using the most often, but I think weighting on CRE is the best option we have available to try to move in that direction.

I also feel strongly that we shouldn't "care" about what players who aren't playing well enough to win games are using, regardless of whether or not they're trying to game the system. Removing people trying to manipulate the tiers is just a happy bonus to me - the real benefit is getting stats about what successful players are using. While tiers are based on usage, I think we use usage as kind of an indirect function of power because with this being a competitive game we can normally assume people are only using the stuff that works. Paring the usage down to focus on the players performing better should make tiers more representative of power, which is a good thing. We don't listen to everyone during parts of the tiering process relating to suspects, instead choosing to focus on the best players we can, so why listen to everyone for usage?

I think moving to weighted stats is a good decision.
 

X-Act

np: Biffy Clyro - Shock Shock
I agree with some weighting being used to determine the tiers.

What I'm not convinced about is that the weighting should be the CRE of the player. I mean, why would one Salamence usage from someone with a CRE of 1644 count as 1644 usages of Salamence, while another Salamence usage from someone with a CRE of 1356 count as 1356 usages of Salamence?

I'll ask a few questions below:

If a player rated 1700 wins against another one rated 1300, should this affect the weighting? What if the 1700 player lost? Should this affect the weighting? What about if the players' rating is similar, say both 1500? What if they are both 1300? Or both 1700? Should the losing player's Pokemon be weighted the same as the winning player's Pokemon even if they are both rated 1700?
 
I'm sorry if I'm missing something completely obvious by suggesting this, but what about using a player's rankings over the CRE?

For example, let's say you have 1000 players on the Smogon University server that played in the month of August 2011. If the #1 ranked player was always at the top using the same 6 pokemon, those six pokemon would get 1000 points for each match he played.... as long as he was still ranked #1 when the match ended.

To attempt to answer X-Act's questions:

"If a player ranked higher wins against a person of a lower rank, should this affect the weighting?"
Yes: let's say #4 and #8 play each other. When #4 wins he stays #4, but since #8 loses he gets knocked down to #10. #4's used pokemon would count 997 points, whereas #10's (previously #8) fainted pokemon count 991 points.

"What if the higher ranked player lost? Should this affect the weighting?"
Yes still: in the same scenario of 1000 users with #4 and #8 battling, if #4 loses and gets dropped down to #5 while #8 wins and goes up to #7, then #5's(previously #4) fainted pokemon counts 996 points whereas #7's(previously #8) used pokemon counts as 994 points.

"What if both players are ranked the same?"
Well this is highly unlikely since there is usually a small difference that causes players to be ranked differently. But if both players are, say, ranked 15 and play each other, obviously one player moves up and the other moves down: just at the end of the match count the scores accordingly towards the weighted statistics.
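The rank-based scoring in these answers follows one rule: with N ranked players, a usage is worth N + 1 minus the player's rank at the end of the match. A minimal sketch (the function name `rank_points` is made up) that reproduces the figures quoted above:

```python
def rank_points(rank, total_players):
    """Points per usage for a player holding `rank` after the match.

    Rank 1 earns total_players points; the last rank earns 1 point.
    """
    if not 1 <= rank <= total_players:
        raise ValueError("rank must be between 1 and total_players")
    return total_players + 1 - rank

N = 1000  # players active that month, as in the example

# Scenario 1: #4 beats #8, who drops to #10.
winner_stays_4 = rank_points(4, N)
loser_drops_to_10 = rank_points(10, N)

# Scenario 2: #4 loses and drops to #5; #8 wins and rises to #7.
loser_drops_to_5 = rank_points(5, N)
winner_rises_to_7 = rank_points(7, N)
```

Note that in the second scenario the loser's Pokemon still outscore the winner's (996 to 994), since only the post-match rank matters under this rule.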
 
