UU cutoff step 1: Cutoff method

What UU cutoff method should Smogon use?


Total voters: 38
Hi. Since the CAP server has become the new Smogon server, I thought that this should be discussed once again. My previous thread on the subject met with (I thought) a rather disappointing turnout, and though it could be blamed on the lack of usage stats, I'll admit that it could also have been blamed on its emphasis on explaining stuff rather than on actually sparking discussion. I especially thought that maybe it would at least catch the attention of the tiering leaders...

I decided to attach a poll to this because if I didn't do that, I might as well have bumped the old thread or changed the OP or something. You CAN read that thread not too far down on the list from this one; in fact, I'd recommend it. In any case, the options right now seem to boil down to "individual" and "collective" cutoff.

Individual cutoff is basically the way we've been tiering until now: A Pokémon's usage-based tiering is decided by its individual frequency of usage in our ladder. There are actually multiple ways to achieve this, as outlined in the previous thread. Mainly, there's a hard x% cutoff (like in PO) or a slightly more complicated calculation like X-Act's formula (like in Smogon's 4th gen tiers).
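To make the hard x% variant concrete, here is a minimal Python sketch. It does not reproduce X-Act's formula, and the team data, the function name `individual_cutoff`, and the 50% threshold are all made-up assumptions for illustration, not real ladder statistics.

```python
from collections import Counter

def individual_cutoff(teams, cutoff):
    """Return the Pokémon whose share of teams is at least `cutoff`.

    `teams` is a list of sets of Pokémon names and `cutoff` is a fraction
    between 0 and 1. Both are illustrative assumptions.
    """
    # Count each Pokémon at most once per team.
    counts = Counter(name for team in teams for name in set(team))
    total = len(teams)
    return {name for name, n in counts.items() if n / total >= cutoff}

# Hypothetical sample: four two-Pokémon "teams" to keep the example small.
teams = [
    {"Tyranitar", "Scizor"},
    {"Tyranitar", "Ninjask"},
    {"Scizor", "Rotom-W"},
    {"Tyranitar", "Scizor"},
]
ou = individual_cutoff(teams, cutoff=0.5)
```

With this sample, only the two Pokémon that appear on three of the four teams clear the 50% line. Each Pokémon is judged purely on its own frequency, which is the defining trait of the individual approach.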

Collective cutoff is an idea that X-Act proposed some time after Smogon started using his current cutoff formula and that, when mentioned again in my previous thread, got support from coyotte at the very least. Essentially, a Pokémon's usage-based tiering is decided by its inclusion in a group that makes up (or exists in) a certain proportion of all teams. One potential disadvantage is that it requires teammate statistics to be calculated correctly.
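One possible reading of the collective idea, sketched in Python: add Pokémon in order of usage until the group appears on a target share of all teams. This is only an interpretation, not X-Act's actual proposal. The greedy ordering, the name `collective_cutoff`, the `coverage` parameter, and the sample teams are all assumptions. It also shows why full team compositions (teammate data) are needed rather than per-Pokémon counts alone.

```python
from collections import Counter

def collective_cutoff(teams, coverage):
    """Grow a group of Pokémon until it touches `coverage` of all teams.

    A team counts as covered if it contains at least one group member.
    Pokémon are added in descending order of individual usage, with ties
    broken alphabetically. This ordering is an assumption.
    """
    counts = Counter(name for team in teams for name in set(team))
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    group = set()
    for name in ranked:
        group.add(name)
        # Needs the actual team compositions, not just usage totals.
        covered = sum(1 for team in teams if group & set(team))
        if covered / len(teams) >= coverage:
            break
    return group

# Hypothetical sample teams, not real ladder data.
teams = [
    {"Tyranitar", "Scizor"},
    {"Tyranitar", "Rotom-W"},
    {"Tyranitar", "Scizor"},
    {"Scizor", "Ninjask"},
    {"Umbreon", "Ninjask"},
]
group_80 = collective_cutoff(teams, coverage=0.8)
```

In this sample, two Pokémon are enough to touch four of the five teams. A Pokémon with decent individual usage can still be left out if the teams it appears on are already covered by more common Pokémon.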

Like I said last time, I believe that the choice comes mostly down to whether we want to reflect the utility of the individual or we want to reflect an accurate threat list. Of course, anyone is free to interpret this in a different way or even suggest a different cutoff method altogether. In any case, I'm really hoping that more people care about this now that we do have working usage stats, because sometimes I do see people criticizing the usage tiering system.

I don't know if I should be giving my own opinion on this in an OP, but I think that an individual-based cutoff would fit the best with how we think about OU vs UU. Whenever I see people talking about usage tiers, they may talk about how many Pokémon should be OU or whether a certain Pokémon should be OU, but in the end their thought process always seems to come down to how often they see a Pokémon (if at all). People may also notice that I'm personally not as serious about how arbitrary the system ends up as Cathy or X-Act might have been; I think that the system should just reflect our common intuition accurately.

This poll is not intended to be binding; it is only meant to get a quick feel for what people are thinking about this.
 

Destiny Warrior

also known as Darkwing_Duck
I support the individual cutoff, mostly for technical reasons. Teammate statistics are somewhat harder to generate this time from what I've come to understand from Dusk, and I feel that setting up this kind of page: http://91.121.73.228/index.html will be very very tedious. To simplify the problem, individual cutoff slots in well here.

On a competitive note, we used this system in DPP, and it worked pretty well. Like you said, when we discuss OU and UU, we discuss it based on how many teams we see a Pokemon on. That should be the foremost idea behind our tiering in my opinion, not how often it appears in a group. A group would most likely have to be arbitrary, something I think we can (and should) avoid.
 
We're going to be doing monthly posted stats anyway, not the streaming pages. I'll have to tweak the template files for usage stats so that they get outputted in a way that's meaningful to me and without the unwieldy HTML, but that's not a huge issue either way.

That said, I support the idea of moving to an ideal pure objective design for this sort of thing. I don't think the individual cutoff is bad by any means, and quite frankly, if we kept with it I wouldn't be bothered at all. Sure, we had Pokemon like Electivire, Ninjask, and Umbreon in OU for DPP, but I really don't think that made either metagame worse in any way. It might've been cool to see those guys drop, but I don't think it would've improved the metagame in any drastic way. Regardless, if it's feasible to make this cutoff totally objective, we should do it.

Notice how in there I didn't actually support the collective cutoff. I don't think there is a unique solution to the problem, and while this one is very cool and all, there may be better ones we can think up at not-7AM and in later days.
 

Nails

Double Threat
I think that the individual cutoff should be used, but with a caveat: weighted stats need to be used. This prevents the issue of the bottom dwellers in OU staying up because they're used at the bottom of the ladder, and it doesn't require teammate stats. If weighted stats aren't available, though, then I would support the second option. Either way, I feel weighted stats are completely necessary to get an accurate cutoff. The fact that random7087 uses Ninjask and Umbreon on his OU team doesn't mean his usage should count the same as that of a user who has a 1500 rating.

Nutshell: Though I slightly prefer the teammate stats determining usage, I don't really care which system is used as they produce similar results. However, weighted stats are necessary.
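A rough Python sketch of what rating-weighted usage could look like. The thread does not specify a weighting function, so the linear weight (weight equal to rating), the name `weighted_usage`, and the sample entries are all assumptions.

```python
from collections import defaultdict

def weighted_usage(entries, weight=lambda rating: max(rating, 0)):
    """Compute each Pokémon's share of the total weight.

    `entries` is a list of (rating, team) pairs. The default weight is
    simply the player's rating, which is a hypothetical choice.
    """
    totals = defaultdict(float)
    weight_sum = 0.0
    for rating, team in entries:
        w = weight(rating)
        weight_sum += w
        for name in set(team):
            totals[name] += w
    if weight_sum == 0:
        return {}
    return {name: w / weight_sum for name, w in totals.items()}

# Hypothetical entries: one high-rated team and one low-rated team.
entries = [
    (1500, {"Tyranitar", "Scizor"}),
    (500, {"Ninjask", "Umbreon"}),
]
usage = weighted_usage(entries)
```

Unweighted, each Pokémon here would sit at 50% usage. With the linear weight, the high-rated player's picks rise to 75% and the low-rated player's picks fall to 25%, which is the effect being asked for.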
 

JabbaTheGriffin

Stormblessed
definitely agreeing with nails. weighted stats NEED to be used this gen. also i'd like it if t was lowered from 20 to 10 depending on how the usage statistics look after a couple rounds. since there's the whole weather wars thing going on t=20 may be a better representation of OU than it was in gen 4 but i'd like to have the option open.
 

Erazor

✓ Just Doug It
Weighted stats should definitely be used, but then... alts? AFAIK alts are able to be used on PO.

Also, I don't see a problem in keeping OU slightly large. ~45 Pokemon seems fine, which t = 20 will give us, right?
 
Agreeing with previous posters on the weighted cutoff thing, since I think that's probably something that will help keep fluctuations in usage from making the UU tier as unnecessarily volatile as it was at times in Gen 4.

There were a handful of Pokemon just inside and outside of the line for UU in Gen 4 that were rarely used by high-end players but still maintained enough usage to stay out of UU for long periods of time (Heracross, who might have been nice to test before we banned so many other Pokemon), as well as weird cases with fluctuating usage (Umbreon, who UU might have been better off not losing). Weighted stats would probably normalize these, since this sort of significant strange usage shouldn't happen as much at the higher levels, barring actual significant metagame changes.

I found these fluctuations very irritating when trying to help deal with UU policy last generation, because they were a variable we couldn't control and potentially a big deal for voters trying to make the right long-term decisions. There was a time, for instance, when it looked like Tentacruel might have dropped down to UU (which admittedly was more due to OU suspect testing, but work with me as an example). That would have significantly changed the metagame by weakening former suspects like Moltres, and it could have meant an unnecessary ban had Moltres been banned and would-be hard counter Tentacruel dropped down later.

It would be nice for it to be less probable for situations like that to occur, and while changes in OU bans and move availability changes from new games will still cause there to be some fluctuation in usage, there hasn't been a huge change in the usage of Pokemon among the best players within the same era of the game in any generation I can remember, so using weighted stats would at least help with one major variable.

I voted individual just because it seems simpler, but I don't really have a strong opinion on the poll options without knowing BW OU's metagame well enough to know which Pokemon's tiering they would likely greatly influence... not my area of expertise at the moment.
 
I liked the old way, but yea I definitely prefer weighted stats (I mean Electivire and etc....)

I agree with Nails (and Jabba).
 
Weighting has issues right now because PO is not PL. PL prevented users from having alts, which made a rating of, say, 1550 very indicative of true skill. Now, a very good user might be sitting at around 1200 for days on many different alts, which totally corrupts the impact of their weighting—they'd be treated as lesser than another player who plays one account and sits at 1500 at all times. I am taking the user's rating down as I collect stats, but our stats will very likely be no less random than if we didn't weight them in the first place. Don't get me wrong, I like the idea of weighting stats, but I highly doubt it'll be as clean as you guys suspect. If a majority of PR is willing to weight stats regardless, though, it can still be done; I think I will output both weighted and unweighted stats for April so people can compare and get a feel for the differences on their own instead of taking my word for it.

Also, I still like the idea X-Act proposed long ago of using teammate statistics to determine a value of T from actual research rather than PR voting. I think that would ultimately do us a world of good.
Erazor said:
Also, I don't see a problem in keeping OU slightly large. ~45 Pokemon seems fine, which t = 20 will give us, right?
Roughly, yes. It depends on how centralized the metagame is, because the more centralized it is, the quicker the drop-off for lesser-used Pokemon tends to be.
 
Also, I still like the idea X-Act proposed long ago of using teammate statistics to determine a value of T from actual research rather than PR voting. I think that would ultimately do us a world of good.
The reason I set the thread up in this way was that I didn't think it would catch on. I'm frankly surprised and maybe the poll isn't even relevant anymore O_O

On rating-weighted stats: Perhaps you all have seen my disdain for the rating system on PO, lol. This is why I kind of don't want weights based on rating mostly for the reasons Rising_Dusk said, i.e. alts. (Also, PO encourages alt usage a lot more than Shoddy ever did IMO, due to the need to reduce risk from laddering with a bad team. I felt like I had to make an alt for every team I made if I wanted a legitimate shot at reqs, and I was unwilling to game the suspect system in this way.) I still think that, if we were to do this, it would have to use stat collection on winners only, rather than having a rating cutoff. I maintain that this would result in a more accurate view of what's "good" in the metagame, especially with a rating system as relatively flawed as PO's.
 

Nails

Double Threat
I hadn't thought about it, but only counting rating on wins seems like a cool idea. Maybe combine that with a weighted system? It'd be nice if all of those could be provided, just for comparison with each other, as seeing what's being used vs what's successful looks like it could be really really helpful.
 

Ice-eyes

Simper Fi
I think if we're going to have weighted stats then we need to fix the rating system. I think I remember someone on the PO staff saying there wasn't any reason why we couldn't implement a different rating system, either something Shoddy-esque or new.
 
Rising_Dusk said:
Now, a very good user might be sitting at around 1200 for days on many different alts, which totally corrupts the impact of their weighting—they'd be treated as lesser than another player who plays one account and sits at 1500 at all times.
A very good user will go as high as his team takes him, and he'll break the 1200 extremely easily...

Playing a lot won't magically grant you a good rating, if someone has a level of 1500 then he'll have several alts at 1500, not at 1200. Between someone who has a rating of 1530 and someone who has three alts at 1490, the one with the rating of 1530 is the better player.
capefeather said:
due to the need to reduce risk from laddering with a bad team. I felt like I had to make an alt for every team I made if I wanted a legitimate shot at reqs, and I was unwilling to game the suspect system in this way.
Then each of your teams has a weight corresponding to its level of strength (combined with your way of playing), it'll make a better weighting than giving even the crappy teams you can make the weight of your ladder peak, won't it?

That said, the best way to start with weighting is to not count any of the battles of players with 1000 rating or less. It'll easily remove a good chunk of unwanted stats.
 
I can't tell what you're attempting to prove with that post. The part you quoted was merely talking about how alts are encouraged, a side remark that's not really an argument for rejecting rating weightings but an explanation. The point is that alt use is an "artificial" way to manipulate your rank, and making alts to "preserve" your rating is equivalent to inflating it.

I suggested a winner-only system a few times before, during the Shoddy days. So no, PO is not getting "special" negative treatment from me in that regard. However, if you really insist on defending your rating system, it would be nice for the response not to boil down to, "No, there's nothing wrong with the rating system. You just suck."

Finally,
Playing a lot won't magically grant you a good rating,
Considering PO's rating system is more closely based on a win-loss difference than a win-loss ratio, assuming that the player is consistent, yes, playing a lot will make you more likely to have a higher rating. There's also the fact that alts often become hard to track and begin to decay...
 
coyotte508 said:
if someone has a level of 1500 then he'll have several alts at 1500, not at 1200
More often than not, an alt is used to test a new team and see how it fares. Sometimes that will not be very impressive, or sometimes even if the team is good he'll get haxed a lot and make another new alt to do it again with. Maybe he'll get to only 1300 and then get bored with the team. This is compounded by the fact that new alts can rise quickly, but ones that have played hundreds of battles tend to be more stable. Why not rise quickly if you can just make another alt to do it with? Lots of players take advantage of this. The point is, so long as alts are allowed, there is no reliable measurement of how good a single player is when you face them on the ladder. Countless 1000 rating players are alts for 1500+ players, and those matches played at that level will not be representative of their skill. This also creates the proverbial ladder deathtrap where a good player at 1000 stands to gain a tremendous amount from an equally skilled player at 1500; any major ladder player can attest to the situation where they'd gain 5 from a win but lose 50 from a loss to the same user. This scenario is a very common example of how the rating of a player is not very representative of their skill, and how the weights would be flawed as a result.

I see what you're getting at, though. You're trying to make the case that the rating corresponds to the effectiveness of a given team used by a given player. That's too idealized, though, and isn't true in practice. Many players use many different teams, and simultaneously, many good players are using good teams at a low rating because they're using new alts. This deathtrap for high-rated players really kills any meaning that rating actually has.
 
I suggested a winner-only system a few times before, during the Shoddy days. So no, PO is not getting "special" negative treatment from me in that regard. However, if you really insist on defending your rating system, it would be nice for the response not to boil down to, "No, there's nothing wrong with the rating system. You just suck."
I didn't mean to say that at all. I was quoting you saying that you make new alts when testing new teams that are potentially bad, and I was saying: why not? If the team is bad then the rating won't go so high; if the team is good it'll go higher.

Also, winner-only is kind of pointless. If you stay at 1200, you'll win as many battles as you lose against opponents around your level, and the same is true if you stay at an 800 rating, so the "winner-only" approach would basically give the same weighting to both of them. The only case in which it gives somewhat good results is at 1350+, where there begins to be a shortage of good battlers on the ladder and you're paired against people with a lower rating more often than against people with a higher rating.
Finally, Considering PO's rating system is more closely based on a win-loss difference than a win-loss ratio, assuming that the player is consistent, yes, playing a lot will make you more likely to have a higher rating.
When you get a high rating, due to the shortage of players at a high rating you'll more often than not get battles like +10 -21. It means you have to win 2 battles for each battle you lose, against an opponent 100 points lower than you. If you were on their level, you'd win and lose 1, and fall back to their rating. If you progress, it means you're better than them. The higher you go, the higher will be the opponents 100 points under you (or even the opponents at your level against which you have to maintain the same W/L ratio, or higher opponents against which you can lose more than you win but with a consistent W/L ratio). You have to maintain a win/lose ratio determined by how big the difference of points is between you and your opponent and whether they're better than you or not.
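The "+10 -21" arithmetic above can be checked with a short Python sketch. It assumes a fixed gain per win and a fixed loss per defeat against the same opponent pool, which is a simplification, and the function name is hypothetical.

```python
def break_even_win_rate(gain, loss):
    """Win rate at which expected rating change is zero.

    Solves p * gain == (1 - p) * loss for p, assuming every win is worth
    `gain` points and every loss costs `loss` points (a simplification).
    """
    return loss / (gain + loss)

# Figures from the post: +10 for a win, -21 for a loss.
rate = break_even_win_rate(gain=10, loss=21)
wins_per_loss = 21 / 10
```

With those figures the break-even rate is 21/31, roughly 68%, or a little over two wins for every loss, which matches the "win 2 battles for each battle you lose" estimate.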
There's also the fact that alts often become hard to track and begin to decay...
I don't see what you mean. If alts begin to decay, they're not being used anymore. If you mean that someone can get back on his old alt and have a lower rating than he's worth, then first, the decay will be gone after several battles (10 for a long inactivity, 1 or 2 for a short one), and you can assume that those battles are few compared to the number of battles he'll go on to play once the decay is erased; second, PO keeps track of the undecayed value, so that might as well be what's used.

Rising_Dusk said:
Maybe he'll get to only 1300 and then get bored with the team. This is compounded by the fact that new alts can rise quickly, but ones that have played hundreds of battles tend to be more stable.
That's not true. Shoddy acted like that, with Glicko, not PO. For the first 5 battles on PO it's true that you get more points, but that's it. After those first 5 battles, if you won all of them, you're probably at 1200, but then you progress as fast as anybody else: the number of battles you played previously has no effect at all, solely the difference of points between you and your opponent. New team or not, he has more interest in keeping this alt than in making a new one.
I see what you're getting at, though. You're trying to make the case that the rating corresponds to the effectiveness of a given team used by a given player. That's too idealized, though, and isn't true in practice. Many players use many different teams, and simultaneously, many good players are using good teams at a low rating because they're using new alts.
It was more in my mind that people may make new alts to test their team, and not use it on their main alt if the team isn't good.

This deathtrap for high-rated players really kills any meaning that rating actually has.
No. That's the part I don't agree with. If a (skilled) player wants to score high, he'll do so. Having many alts doesn't prevent you from achieving high rating with each alt.

Sure, you can have "garbage alts" in which you try ridiculous teams or just play 10 battles and leave after that, but I don't see how it kills the meaning of rating. If they keep having a low rating because they keep making new alts (as you say), they either don't care at all about rating, play for fun, and just love trying out new names, or have a main alt which is regularly played.

But you probably mean that those new alts should be considered with the same rating as the main alt for the purpose of stats weighting, which is partially true (because if they don't play enough to get a high rating then it's minor influence, and sometimes it's just that they don't like the test team they made so they drop the alt)

Also, this somehow turned into me "defending" PO's rating system. Originally I was just pointing out two things from your posts, but then it escalated up to this post. I agree that forcing players to keep one alt may make ratings more accurate for the purpose of the weighting formula, but then people aren't able to test teams on different alts, and so on. (Take a player at 1500 who wants to try a sun team for the first time when all he has used so far is a sand team: will he be forced to ruin his 1500 while he plays the sun team? Can't he just sort it out on another alt?) Also, in this case it's more an issue of forcing people to keep one alt than a rating system issue, but I guess it ends in the same result.

---

Otherwise, are you sure you want to give the players at 1500 more weighting than the players at 1200 in order to create UU? From what I understand, what you want is no gimmicks in OU. That's done easily by removing players at 1000 rating or less (any good player will have over 1000 rating after only one battle on a new alt). I'm not sure that, above some non-gimmick threshold, it's wise to give more weighting to some people than to others. There'll be a few gimmicks staying at 1050 rating or so, but they'll only be a fraction, and will never put Pokémon in OU.
 

Ice-eyes

Simper Fi
Otherwise, are you sure you want to give the players at 1500 more weighting than the players at 1200 in order to create UU? From what I understand, what you want is no gimmicks in OU. That's done easily by removing players at 1000 rating or less (any good player will have over 1000 rating after only one battle on a new alt). I'm not sure that, above some non-gimmick threshold, it's wise to give more weighting to some people than to others. There'll be a few gimmicks staying at 1050 rating or so, but they'll only be a fraction, and will never put Pokémon in OU.
While I disagree with a lot of what Coyotte says in his post (and I find that the PO rating system is extremely vulnerable to haxlosses and requires considerable grinding to get to the upper echelons, with not much incentive to play once you're there), I do like this idea.

Banning alts is a really stupid idea unless you allow like guest accounts or something.
 
Hi, apparently I can post here.

I disagree with banning alts because, as others have said, many users create alts to test teams and such. They will most likely win a few games, but the team won't be as good as they would like it to be, so they scrap it and make a new alt. It's no-harm, no-foul for the most part.

I also think we should just keep the UU decisions the way they are with the individual cutoff, but use weighted stats as well so we don't see Ninjask and Evire repeats this gen. Basically, I agree with Jabba et al.
 

obi

formerly david stone
Late to the party

I would support a winner-only method of looking at stats for several reasons.

If the argument is that the rating approximates the same thing as whether you win, then that would mean the two methods are no different. If the two methods are, in fact, almost the same, there is no reason to use one over the other. Only using the winner's stats seems more straightforward to me, which gives it the advantage of being understandable to people who aren't mathematical.

However, keep in mind that Pokemon players use multiple teams. The rating can be seen as an attempt to measure the "true" rating of the player, but the true rating will vary based on which team the player is using. A 1300 player with one team might be a 1400 with another. Using only stats from the winner will take that into account, as well.

It seems that winner-only stats would be a direct measure of what usage stats for tiers are apparently trying to measure: Pokemon that win. Why use a substitute when we can just directly measure the real thing?
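A minimal Python sketch of winner-only counting, assuming each battle record is a hypothetical (winning team, losing team) pair. The record format, the function name, and the sample battles are illustrative assumptions, not how PO actually logs battles.

```python
from collections import Counter

def winner_only_usage(battles):
    """Usage share of each Pokémon among winning teams only.

    `battles` is a list of (winner_team, loser_team) pairs. The losing
    team is ignored entirely, and no ratings are involved.
    """
    counts = Counter()
    for winner_team, _loser_team in battles:
        counts.update(set(winner_team))
    total = len(battles)
    return {name: n / total for name, n in counts.items()}

# Hypothetical battle log.
battles = [
    ({"Tyranitar", "Scizor"}, {"Ninjask", "Umbreon"}),
    ({"Scizor", "Rotom-W"}, {"Tyranitar", "Ninjask"}),
]
winner_usage = winner_only_usage(battles)
```

Because only the winning side is counted, a Pokémon that shows up often but only on losing teams never registers, while one that is used by the same player on a stronger team gets credit for that team's wins.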

Also, if there is still interest in trying to re-build a team based on usage stats, I have a program that does that very thing. I'm using it as my team prediction algorithm for Technical Machine, my Pokemon AI, but it has the effect of building a team based on any number of Pokemon by 'reversing' the team stats.
 

Ice-eyes

Simper Fi
The problem with that is that you tend to be matched up with people of a similar rating to you. A low-rated player beating a low-rated player doesn't necessarily mean the winner was good (one of them has to win, after all). Should that be weighted equally to a high-rated player beating another high-rated player?
 
yea, winner-only doesn't work because even if you're really good you may win only half your matches, and if you stay at an 800 rating you can also win half your matches. It's because you're paired with players of similar strength and not the whole server.
 
I'm really opposed to the whole concept of weighted stats in general, but I understand why we don't want people dicking around with Specs Manectric and Cotton Guard Altaria and still being counted. So I have an alternative idea in mind: why don't we set a rating cutoff that you must meet to have your stats analyzed? Once you're above, say, 1000 or 1100 or whatever we make the number, we count your stats, and if you aren't there, they don't matter. Once you're above said cutoff, however, your stats mean exactly the same whether you're at 1100 or 1500. I think this is a good measure to keep things like Electivire out while not skewing the stats too badly.
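The rating-cutoff idea can be sketched in Python as a filter followed by plain unweighted counting. The (rating, team) entry format, the name `usage_above_rating`, and treating "meeting" the cutoff as rating >= `min_rating` are assumptions made for illustration.

```python
from collections import Counter

def usage_above_rating(entries, min_rating):
    """Unweighted usage among teams whose player meets the rating cutoff.

    `entries` is a list of (rating, team) pairs. Teams below `min_rating`
    are dropped; every remaining team counts exactly the same.
    """
    counted = [set(team) for rating, team in entries if rating >= min_rating]
    if not counted:
        return {}
    counts = Counter(name for team in counted for name in team)
    return {name: n / len(counted) for name, n in counts.items()}

# Hypothetical entries, not real ladder data.
entries = [
    (1500, {"Tyranitar", "Scizor"}),
    (1100, {"Tyranitar", "Rotom-W"}),
    (900, {"Manectric", "Altaria"}),
]
filtered = usage_above_rating(entries, min_rating=1000)
```

The 900-rated team disappears from the stats entirely, while the 1100 and 1500 teams carry identical weight. That is the difference from a rating-weighted scheme, where the 1500 team would count for more.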
 
