Old Feb 20th, 2013, 12:56:01 PM   #576
Rayquaza_
 
Join Date: Sep 2010
Posts: 694
Default

Quote:
Originally Posted by Fat Melee Mewtwo View Post
Usage is irrelevant, see Gen 4 Wobb, Tornadus-T, etc. (and don't forget these same guys still use Ape in BW2)
Quote:
Originally Posted by Fat Curtains View Post
Oh what about volcarona? What does it do against that? Volcarona is probably the hardest of the hard of counters to deo-d as it can set up in its face and probably 2hko with unboosted bug buzz. Also volcarona isnt a bad pokemon... As most teams these days get whacked by it. (see rmt forum rates)
Quote:
Originally Posted by Fat Lavos Spawn View Post
first of all, volcarona saw 6.55% usage last month, meaning it's not especially common.
I love how usage is not a valid argument when addressing how broken a pokemon is, but it is an argument when it comes to its checks and counters.
Old Feb 20th, 2013, 2:04:09 PM   #577
Meru
is a Tiering Contributor
 
 
Join Date: Aug 2005
Posts: 599
Default

Anybody who says that they didn't notice the differences between the Suspect ladder and the other ladder must be really oblivious. Throughout the 80-something battles you did to get reqs, did the influx of SR Terrakion, Aero, Azelf, Mew, Froslass, Roserade, and Custap+Sturdy users not tip you off?
Old Feb 20th, 2013, 2:17:40 PM   #578
Shining Latios
 
 
Join Date: May 2010
Posts: 1,311
Default

Quote:
Originally Posted by Fat Meru View Post
Anybody who says that they didn't notice the differences between the Suspect ladder and the other ladder must be really oblivious. Throughout the 80-something battles you did to get reqs, did the influx of SR Terrakion, Aero, Azelf, Mew, Froslass, Roserade, and Custap+Sturdy users not tip you off?
I never saw Aero, Azelf, Mew, Froslass, or Roserade in any of my battles in the first place.

Last edited by Shining Latios; Feb 20th, 2013 at 5:40:52 PM. Reason: typo
Old Feb 20th, 2013, 2:33:10 PM   #579
Tassa
 
Join Date: Jan 2013
Posts: 48
Default

Quote:
Originally Posted by Fat Pwnemon View Post
Ferrothorn and Skarm can setup alongside.
When? The best they can do is attack; Magic Coat is a huge pain for them otherwise. (I played only a few matches with Deo-D, but I remember this stupid Skarm who tried to set up his hazards on me, giving me the SR I lacked in my first test team)

Quote:
Originally Posted by Fat Pwnemon View Post
E @ 2 above: Zam can run encore or Taunt (why don't more?)
Magic Coat, Mental Herb.

Quote:
Originally Posted by Fat Curtains View Post
What is stopping a focus sashed terrakion to get SR up? Nothing.
Sableye : taunt turn 1, burn turn 2, recover turn 3.
Espeon may get KOed by Stone Edge, but it leaves you at 1 HP while preventing SR, so you cannot set SR. This holds for whatever Espeon set you run, as long as you have Psychic.

Quote:
Originally Posted by Fat Curtains View Post
Also rain is dominant? I think i recently seen a stat that rain is only second to HO or non weather. So that is false.
Stats are one thing.
Analyzing them properly is another.
Do you realize that "non weather" is not a playstyle with a "non-weather mon" that we see in all these teams? That non weather is what we had in 493 (bar some Rain Dance and Aboma, which made real weather teams; teams with Tyra were rarely strongly weather-oriented)
Are you saying that 80% of the 493 metagame was a single playstyle?


To Meru: I played something like 140 matches (I failed at my first attempt, which is funny since my win rate was similar in both), and I cannot judge objectively since the second time I used a Custap Forre. It was even funny when someone asked me if I was MM because of that. It's pretty effective, but can do nothing against Taunt or smart play (first: don't hit it too effectively; second: kill it)
For Aero and Roserade I didn't see any; Mew/Azelf/Froslass maybe one of each. I've seen a fair number of SR Terrakion though (which I happily killed before safely spinning their hazards and saving Forre for later, or I just ended up with a half-health Forre and no hazards up on either side)
Old Feb 20th, 2013, 5:29:13 PM   #580
Jukain
rip numeros
is a Contributor to Smogon Media, is a Contributor to Smogon
 
 
Join Date: Feb 2011
Posts: 1,685
somewhere could be anywhere
Default

okay I got reqs a little while ago, and I must say, there were a ton of sash rak on the suspect ladder. there was quite a bit of azelf and custap forry / skarm. there were a few crustle and mew also. people were desperately trying to replace deoxys-d, and they're all really effective. I've found them to be as effective as deoxys-d if not more, as many of them are really fast and thus can prevent pretty much everything from getting up hazards. also, kyurem-b and latias are insanely good in this metagame. latias just checks a ton of important stuff, while kyurem-b (I ran mixed with fusion bolt / earth power / ice beam / roost with max satk / some hp / some spd) can hit so many threats hard.

the suspect ladder itself was really annoying; I lost a bunch of battles right off the bat, so I ended up playing like 150+ games. all I have to say is that glicko2 is fucked up, people had reqs with worse records than I had at various points.

anyway I'm going to vote ou, deoxys-d is nowhere near broken.

EDIT: also seriously forfeiting games to lower dev? it's fine if you play the game out and think you have no chance of winning, not wanting to waste time, but forfeiting right away is dumb.
__________________
C&C Work | 1k RMT | Contribute! | VM for an OU Rate! | gp member: vm/pm for a check | previously pokemon0078 / aka jew-cane

Last edited by Jukain; Feb 20th, 2013 at 9:22:04 PM.
Old Feb 20th, 2013, 9:20:07 PM   #581
Windsong
Aim for the moon, even if you miss you’ll land among the stars.
is a Tiering Contributor, is a Contributor Alumnus
 
 
Join Date: Apr 2009
Posts: 817
elsewhere
Default

I'd just like to say that I'm really disappointed with this ladder round and I feel that whatever the result, be it ban or unban for deo-d, it should be pretty much completely discounted. The system is great when we're letting players vote who have solid win ratios and rankings and clearly know the tier, but when players with 2:3 (and worse) win:loss ratios are qualifying it just completely makes the whole system kind of worthless. This was probably some error with the ladder of course, but if in the future making reqs could be based off w:l rate in addition to score somehow I'd have a lot more faith in the voting system.
Old Feb 20th, 2013, 9:31:21 PM   #582
Rhys DeAnno
Slacking Off
is a Tiering Contributor
 
 
Join Date: Dec 2009
Posts: 144
The Ladder
Default Multiple OU Ladders and Their Effects on Glicko2 and the Suspect Process

Windsong raises a good point, and I've actually been thinking about the whole thing lately and I'll post up my thoughts here.

It's likely that everyone who's been laddering in the two most recent OU suspect tests has noticed the stark differences between the environments and how Glicko2 has responded to them. In the Torn-T test, a relatively consistent performance overall was required to succeed, usually with a win percentage of at least 80 or so over about 70 games. In the Deo-D test, it required a win percentage of at least 60 or so over about 90 games. The reason behind this large difference was that the standard OU ladder was closed for the Torn-T test and open for the Deo-D test.

The open OU ladder meant that most people who were not looking to qualify did not play on the suspect ladder. This obviously included most of the poor and average battlers who make up the bulk of OU's population. This had a number of obvious and subtle consequences on the suspect ladder:

  • The suspect ladder had far fewer people battling on it, which resulted in wait times for battles. These wait times were often not too bad, but could be as long as a couple of minutes in off-hours.
  • The much smaller population of the suspect ladder meant that there was no large "heat sink" of Glicko2 to keep a normalized distribution. A large number of alts get abandoned by suspecters after a few early losses, and this causes their "negative points" to be trapped away and isolated from the system. If the population is large and few people do this it does not have noticeable effects, but if a large share of the population does this it results in inflated ladder ratings for the entire system.
  • This inflation did not actually seem to be time-independent. As more and more accounts were created and destroyed the average Glicko2 rating of a ladderer became very high, probably well over 2000 by the last days of the test.
  • These higher ratings resulted in more spread out ratings in general, making deviation fall more slowly and extending the number of battles needed to make +/-55.
  • Due to this large number of required battles and the generally inflated ratings of the ladder, many suspect testers finished with a string of forfeits to get qualifications more quickly. Unfortunately this probably poisoned and inflated the ladder even further.

While I lack the statistical and programming acumen to easily run simulations to investigate the above effects, I think the general outline they paint is clear enough. While the above situation might or might not be bad, it was certainly much different from the situation with the Torn-T test, and I'm concerned about the validity of our testing process if we test for bans under such different conditions. I think we should probably standardize whether we turn Standard OU on or off during OU suspect testing so we have more predictable tests, and so we can adjust ladder reqs accordingly for either environment. My main concern is a rating of 2000 +/- 55 probably does not indicate the same thing during this test as it did during the last test, and drawing our suspect voters from different populations during different tests is going to corrupt our suspect process.
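A rough way to check the "heat sink" argument above is a toy simulation. The sketch below is not Glicko2 and not the ladder's actual code: it swaps in plain fixed-K Elo because it is zero-sum, which makes the trapped "negative points" easy to see. The starting rating of 1500, K of 32, the pool size, and the quit rule (abandon an account that drops below its starting rating within its first 8 games) are all assumptions made for illustration, and every player is given identical skill so that any drift comes from abandonment alone.

```python
import random

START = 1500.0   # assumed starting rating for every new account
K = 32.0         # assumed constant Elo K-factor (not the ladder's real setting)

def expected(ra, rb):
    """Elo expected score of a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def simulate(players=50, games=5000, quit_window=8, seed=0):
    """Return (active_ratings, abandoned_ratings) after `games` random pairings.

    All players have equal skill, so each game is a coin flip. A loser whose
    account is within its first `quit_window` games and rated below START
    abandons it and re-enters on a fresh account at START.
    """
    rng = random.Random(seed)
    active = [{"rating": START, "played": 0} for _ in range(players)]
    abandoned = []
    for _ in range(games):
        i, j = rng.sample(range(players), 2)
        winner, loser = (i, j) if rng.random() < 0.5 else (j, i)
        w, l = active[winner], active[loser]
        gain = K * (1.0 - expected(w["rating"], l["rating"]))
        w["rating"] += gain          # zero-sum: the winner gains exactly
        l["rating"] -= gain          # what the loser gives up
        w["played"] += 1
        l["played"] += 1
        if l["played"] <= quit_window and l["rating"] < START:
            abandoned.append(l["rating"])              # deficit leaves the pool
            active[loser] = {"rating": START, "played": 0}
    return [p["rating"] for p in active], abandoned

active, abandoned = simulate()
print("mean active rating:", round(sum(active) / len(active), 1))
print("abandoned accounts:", len(abandoned))
```

Because every abandoned account sits below the starting rating and the total is conserved, the mean of the accounts still in play has to end up above 1500, and it keeps rising as more accounts are thrown away. How closely this carries over to Glicko2, which is not zero-sum, is not something this sketch settles.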
Old Feb 20th, 2013, 9:35:17 PM   #583
Deluks917
 
 
Join Date: Nov 2012
Posts: 247
Default

The problem seemed to be that most people realized the best plan was to quit if you lost any of your first, say, 8 games. When basically everyone does this, the avg rating will rise throughout the test, as in fact dramatically happened at the end.

It would have been much better imo to have just used regular ladder ranking for this. For one the ratings were reliable. And for another you actually would have gotten to play with Deo-D.

I mean by the end people were recommending forfeiting as a strat (I recommended playing random crap teams but that is far from a forfeit strategy).

Raising the reqs for this seriously favors people who played at the end of the test. During the beginning of the test ratings were fairly normal. Maybe it's best to just throw this test out, though people spent a lot of time trying to get deviations low.

edit: I meant I don't see a fair way to handle this test. People thought the requirements were one thing and acted accordingly.

Last edited by Deluks917; Feb 20th, 2013 at 10:17:15 PM.
Old Feb 20th, 2013, 10:11:18 PM   #584
Curtains
 
 
Join Date: Jul 2009
Posts: 974
Dirtiest player on smogon
Default

One solution is to just have the top 30 battlers with a reasonable rating minimum. Also eliminate the extra ladder so it won't take an hour to do 4 battles. Besides I can rarely see an instance where not having the suspect on the ladder creates interesting and informed discussion. This is especially true with a pokemon whose most suspecting quality is the ability to use it easily.
__________________
Curtains Youtube Channel
Old Feb 20th, 2013, 10:31:09 PM   #585
Cyrrona
in color
is a Tiering Contributor
 
 
Join Date: Sep 2008
Posts: 169
Default

The ratings obviously got really screwed up towards the end, but I think trashing the entire round would be a huge mistake. Qualifying was a significant time investment for most people, and it'd be massively discouraging and wasteful to toss all of that work out. If people are concerned about this crop of voters' metagame credentials, the council could consider requiring short justification paragraphs from those teetering on the W/L fence (which shouldn't take too long to review if we limit the requirement to those suggested). I'd much prefer any of these sorts of compromises to the drastic alternatives referenced above. The last portion of this test was far from ideal, and I'm sure we'll be taking steps to prevent similar problems in the future...but the current Round 10 is definitely salvageable.

EDIT:
Quote:
Originally Posted by Fat Rhys DeAnno
I think requiring any sort of paragraphs would be a huge mistake. In the end evaluating paragraphs is always going to be a judgement call, so if we're basing quals on paragraphs we might as well just have the council making fiat decisions. I think what's done is done for this test, and we should lie in the bed we made and focus on improving results of subsequent tests.
The suspect process is filled with elements of subjectivity as it stands... I'm not advocating regular paragraph requirements, but I think it'd be a decent one-time option in this situation for verifying some of the users and clearing doubts about the vote's legitimacy. Although I understand others might be more skeptical, I trust the council would review the submissions as impartially as possible and determine (to the best of their ability) whether someone's actually an idiot or a reasonably knowledgeable player. I get the anxiety about unintentional bias, even if I think we could probably minimize it to acceptable levels in this case...we could, though, potentially safeguard against that sort of thing by adding another layer of transparency and posting the submissions publicly. Whether we use this idea, work out another solution, or leave what we've collected untouched, I'm just primarily focused on preserving what we've accomplished so far.

EDIT EDIT: Agreeing with the anti-W/L-ratio sentiments below.
__________________
ARTBY ICONIC

Last edited by Cyrrona; Feb 21st, 2013 at 12:30:29 AM.
Old Feb 20th, 2013, 10:47:11 PM   #586
Rhys DeAnno
Slacking Off
is a Tiering Contributor
 
 
Join Date: Dec 2009
Posts: 144
The Ladder
Default

Quote:
Originally Posted by Fat Cyrrona View Post
If people are concerned about this crop of voters' metagame credentials, the council could consider requiring short justification paragraphs from those teetering on the W/L fence (which shouldn't take too long to review if we limit the requirement to those suggested).
I think requiring any sort of paragraphs would be a huge mistake. In the end evaluating paragraphs is always going to be a judgement call, so if we're basing quals on paragraphs we might as well just have the council making fiat decisions. I think what's done is done for this test, and we should lie in the bed we made and focus on improving results of subsequent tests.

EDIT: More thoughts

Another big mistake is to trust W/L percentages as some kind of gospel just because Glicko2 was goofy this test. Someone could easily have faced much more difficult competition and have a justly high Glicko2 rating for a meh win percentage, or have faced lots of weak opponents and have a great win percentage compared to their Glicko2. Additionally, lots of people operated under the assumption that 2000 +/- 55 was sufficient and ended with a string of forfeits, which would obviously impact their win percentage in a very negative fashion. Really, W/L percentage is an even shittier metric of measurement than corrupted Glicko2 is, since at least Glicko2 is attempting to be intelligent and W/L percentage doesn't even try.

Response to above Edit:
Quote:
but I think it'd be a decent one-time option in this situation for verifying some of the users and clearing doubts about the vote's legitimacy
It would actually make me completely doubt the validity of the process, since there has been tons of bad blood about Deoxys-D and a vast disconnect between different elements of the playerbase concerning if it should have been tested at all compared to other things like Drizzle. It'd be easy even for somebody trying to remain impartial to have their judgement of the paragraphs twisted by their position, especially in the borderline competent cases. If the reviewer is even slightly less harsh on one side or the other that could completely skew the vote.

Last edited by Rhys DeAnno; Feb 20th, 2013 at 11:19:10 PM.
Old Feb 21st, 2013, 6:16:09 AM   #587
Remedy
 
 
Join Date: Oct 2012
Posts: 464
I tell you I'm a Tensai
Default

Quote:
Originally Posted by Fat Windsong View Post
I'd just like to say that I'm really disappointed with this ladder round and I feel that whatever the result, be it ban or unban for deo-d, it should be pretty much completely discounted. The system is great when we're letting players vote who have solid win ratios and rankings and clearly know the tier, but when players with 2:3 (and worse) win:loss ratios are qualifying it just completely makes the whole system kind of worthless. This was probably some error with the ladder of course, but if in the future making reqs could be based off w:l rate in addition to score somehow I'd have a lot more faith in the voting system.
I don't understand it.

People who have a shitty W/L ratio in the end met the reqs for a simple reason : they had an almost perfect W/L ratio at the start.
So it's too easy to trash them and say that they suck in the metagame for this reason.

Want an example ? I have a terribad W/L ratio this suspect and you know what ? I don't care the slightest and I think I know this metagame enough to vote. I had so much glicko2 at one point and the deviation was lowering so slowly that I did not give a single duck to my games. It was because I was flawless in the start that I could afford that.
I think it's unfair to look at my W/L in the end and say "okay he has a bad opinion on this metagame, he's probably bad". I played 110 games seriously, and my W/L is horrible because I had 2.5K glicko2 at 90 deviation and I was like "Okay screw this, I don't care anymore, let's play gimmicks or whatever because anyway I just need to lower my deviation now".

So yes, too easy to stare at us from your "good ratio". One could say that your ratio was good because you faced bad players, and could argue that his shitty W/L is due to the level of the players he played against himself.
Improve the ladder ? Force people to not run dozen alts, is the only possible solution. But this will never be done.
No "improve W/L, glicko2 etc.." nonsense please, else I'm just gonna make another alt, with a perfect W/L ratio and everybody will do the same. In the end we'll play between people at 3K Acre or just wait the perfect win succession.
And W/L means nothing, imagine that on the ladder, there are only 10 of the best OU BW players on smogon right now. They all "deserve" to vote, but what will be their W/L ratio ?
You see what I mean I guess, W/L depends on who you meet. And in every game, there is a loser, does that mean this loser wasn't good ? No, you can't say that because you don't know his level just by watching his W/L and I don't even understand how you came to that shortcut.

Last suspect I had a good W/L ratio, one of the best iirc I did cry about the number of games I had to play, I did not trash the other people doing the suspect because they had a less good ratio because I have no clue who they had to play against.

TL;DR : Don't dare take away the vote from me after 110 games or I go mad.

EDIT : I thought it was a ratio of 2 OR 3 W for a L that you were talking about.. oh well it's true that 2 wins for three losses is beyond my expectations of what is a low ratio.. whatever x)
__________________
We must not let daylight in upon the magic.

VM me for a rate in BW2 OU, my advices are free for now

*Check my last RMT -Friend's Prophecy- ! And give me your opinion !
http://www.smogon.com/forums/showthread.php?t=3482863

Last edited by Remedy; Feb 21st, 2013 at 7:24:04 AM.
Old Feb 21st, 2013, 7:39:08 AM   #588
Sacaen
 
 
Join Date: Oct 2012
Posts: 75
Default

One issue you're forgetting about having a lot of people string forfeits at the end of their run to get requirements is that it boosts other people up flawlessly. (and I'm not saying you or anyone in particular did this Remedy)

Because the ladder was so empty it really was not hard at all to simply view who was currently doing matches on the ladder and if they were chain forfeiting you could queue up with them, and get a chain of essentially guaranteed wins. I didn't have the time (or care, I'm pretty neutral on Deo-D's situation) to actually get reqs myself, but I know for a fact how manipulable the ladder was, as I was able to do this just by watching some suspect matches, easily figure out that they were chain forfeiting every time they got a match, and then queue for a suspect match and get a free win. It should be put into question how many people who got requirements got there with the (even unintended) help of other people being on at the same time forfeiting to lower their deviation. 'Broken' describes it quite nicely.

The situation above really should not be allowed to exist. Ladderers shouldn't be put into a position where chain forfeiting is what they need to do to get reqs in a timely fashion, as it can corrupt others' ratings.
Old Feb 21st, 2013, 12:06:58 PM   #589
Iconic*
is a Tutor, is a Tournament Director, is a member of the Smogon Site Staff, is a Super Moderator, is a Smogon IRC AOp, is a Community Contributor, is a Contributor to Smogon Media, is a Tiering Contributor, is a Contributor to Smogon, is a Team Rater Alumnus, is a Battle Server Moderator Alumnus
 
 
Super Moderator
Join Date: Jan 2009
Posts: 1,784
Canadaland
Default

fyi people with solid win ratios but whose deviations can't be lowered enough due to inflated ratings are almost always accepted as special applications if they have played enough games, and nearly all people who qualify with sub 50% win ratios are due to throwing games because they're too lazy to lower deviation legitimately, so i'm not sure where this 'disappointment' is coming from lol

aldaron and i were talking a couple of days ago about implementing a minimum 50% win percentage to discourage people from throwing games to lower deviation, because as heist pointed out there exists this culture of losing on the ladder for the sake of meeting reqs. i think jabba actually brought this up last round but it completely slipped my mind until a few days ago. ratings have the tendency to get really inflated on the suspect ladder towards the end of the test, making it hard to lower deviation, but as i said before that's why we have special apps. glicko is not perfect but it's certainly not as bad as some of you think (i'd like to see you guys devise an algorithm for measuring skill in such a luck-based game!!). these details will certainly be hammered out before the end of the next test
__________________
◠‿◠

Last edited by Iconic; Feb 21st, 2013 at 12:19:16 PM.
Old Feb 21st, 2013, 1:25:28 PM   #590
Jayde
SOLIDARITY
is a Tiering Contributor
 
 
Join Date: Jul 2011
Posts: 321
Default

What I don't like about the ladder system is how easy it is to hit reqs. You honestly just have to get lucky with ratings for the first ~10 matches, and after that you can make reqs by forfeiting over half of your remaining matches. With this ladder, almost anyone can get reqs if they put some time in. Excessive forfeiting also gives ladderers many undeserved points. During my forfeit spree, I gave at least 10 wins to at least 2 people, one of whom ended up barely hitting reqs. I understand that this issue is the council's to deal with, but I doubt that this is what they intend.

I'd honestly set a win ratio for reqs, maybe something around 2:1 or 5:3 (with a set number of minimum battles). This would put a limit on the number of forfeits as well as the number of undeserved voters. I know that the difficulty of reqs is the council's to decide, but this is just my take on the matter, and I doubt that they want reqs to be this much of a joke either.
__________________
Super
Willpowered
Energetic
Loving
Loquacious
Outstanding
Washes himself daily
Old Feb 21st, 2013, 1:27:18 PM   #591
Princess Bri
COME FORTH
is a Tiering Contributor
 
 
Join Date: Feb 2012
Posts: 1,062
Default

i still had over a 50% win ratio and i threw ~20 matches

the problem with the current ps system is the inflation toward the end of the round if we were going to move to a higher glicko2 score to get reqs. if you ladder at the beginning of the round, you're going to be facing lower kiddies on the ladder and getting a higher glicko2 is much harder than at the end of the round. when we laddered on PO the ladder score was much more stable than on PS, since it doesn't inflate nearly as badly when people can set a ranking variation. additionally, i understand deviation is there so you play a certain number of battles, but seriously it's just a hassle going from 65 --> 55. if we were going to try and improve the ladder score system, i'd say 21(50?) glicko2 and 65 dev.
__________________
Old Feb 21st, 2013, 1:27:52 PM   #592
Deluks917
 
 
Join Date: Nov 2012
Posts: 247
Default

Honestly, is there any evidence GLICKO is more stable than regular ELO with a reasonable K value?

I also support 2150+-65 GLICKO2 if we are going to stick with GLICKO2.

Though again this heavily favors people who play at the end of the suspect test.

This problem was much less severe during the garchomp and Tornadus Tests. Maybe our userbase for this test was too small for the system to handle.

I think ELO with a constant (potentially fairly large) K value should be considered. Theoretically this reduces the rate at which you attain your true skill. But it has the advantage that your first and last matches affect your rating fairly equally.

To be exact, if K is 25 and I go on a new account and beat EO I would get almost 25 points. However, if later after 70 battles I battle EO again and we have similar ratings, I would gain/lose 12.5 points. The actual number of games I have played does not affect anything. The only downside to this is that GLICKO cushions the rating loss when an established account loses to a new random person (they could be Heist or whatever). But in ELO with fixed K you can lose at most K points anyway. And it's not like losing to new players doesn't hurt in GLICKO. This is I suppose not exactly equal but it is nothing like GLICKO. I remember beating some high ranked guy (Volta?) on my 2nd game on one account and I instantly shot to like 2700 GLICKO2. This sort of thing does not happen with ELO.
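The arithmetic above can be reproduced with the standard Elo expected-score formula. This is only a sketch: the function names are made up for illustration, and the concrete ratings (a fresh account at 1500, an established opponent at 2500, an even match at 2000) are assumed values, not numbers from the ladder.

```python
def elo_expected(rating, opponent):
    """Standard Elo expected score (win probability) against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def elo_delta(rating, opponent, score, k=25.0):
    """Rating change for one game; score is 1 for a win, 0 for a loss."""
    return k * (score - elo_expected(rating, opponent))

# Assumed ratings, purely for illustration.
fresh_vs_top = elo_delta(1500, 2500, 1)   # new account upsets a top player
even_win = elo_delta(2000, 2000, 1)       # win between equally rated players
even_loss = elo_delta(2000, 2000, 0)      # loss between equally rated players

print(fresh_vs_top, even_win, even_loss)
```

With equal ratings the expected score is exactly 0.5, so a win or loss moves the rating by K/2 = 12.5, and an upset against a far higher-rated opponent approaches but never reaches K. Nothing in the formula depends on how many games the account has played.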

Another fairly ridiculous fact in GLICKO is that if you can manage to play high rated players in your first battles due to the server being depopulated you are way better off. Say you have a real skill of 2200. If you play people with real skill 1900, 1950, 2000, 1800, 2100 in your first 5 battles you are dramatically better off than if you play people with 1500, 1600, 1700, 1800, 1900 (assuming equal deviation).
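To see why early games carry so much weight, here is a minimal sketch of the original Glicko (Glicko-1) update for one rating period. It is not the ladder's Glicko2 code: Glicko2 adds a volatility term that is left out here, and the function names and example ratings are assumptions. The point it illustrates is that the size of a rating change scales with the player's own deviation, so a new account moves far more per game than a settled one.

```python
import math

Q = math.log(10) / 400.0

def g(rd):
    """Glicko-1 attenuation factor for an opponent with deviation `rd`."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)

def glicko1_update(rating, rd, results):
    """Apply one rating period of Glicko-1.

    results: list of (opp_rating, opp_rd, score) with score 1, 0.5 or 0.
    Returns (new_rating, new_rd).
    """
    inv_d2 = 0.0   # accumulates 1 / d^2
    total = 0.0    # accumulates g_j * (score_j - expected_j)
    for opp_rating, opp_rd, score in results:
        g_j = g(opp_rd)
        e_j = 1.0 / (1.0 + 10 ** (-g_j * (rating - opp_rating) / 400.0))
        inv_d2 += Q * Q * g_j * g_j * e_j * (1.0 - e_j)
        total += g_j * (score - e_j)
    denom = 1.0 / (rd * rd) + inv_d2
    return rating + (Q / denom) * total, math.sqrt(1.0 / denom)

# Same win over the same opponent (assumed 2000 rating, 55 deviation),
# once on a brand-new account (RD 350) and once on a settled one (RD 55).
new_rating, _ = glicko1_update(2000, 350, [(2000, 55, 1)])
settled_rating, _ = glicko1_update(2000, 55, [(2000, 55, 1)])
print(new_rating - 2000, settled_rating - 2000)
```

The gap between the two printed gains is the early-versus-late asymmetry discussed in this thread: the same result is worth many times more points while the deviation is still high.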

Last edited by Deluks917; Feb 21st, 2013 at 3:47:41 PM.
Old Feb 21st, 2013, 2:35:41 PM   #593
Rhys DeAnno
Slacking Off
is a Tiering Contributor
 
 
Join Date: Dec 2009
Posts: 144
The Ladder
Default

Quote:
Originally Posted by Fat Jayde View Post
I'd honestly set a win ratio for reqs, maybe something around 2:1 or 5:3 (with a set number of minimum battles). This would put a limit on the number of forfeits as well as the number of undeserved voters. I know that the difficulty of reqs is the council's to decide, but this is just my take on the matter, and I doubt that they want reqs to be this much of a joke either.
The problem with this is you wait until a bunch of idiots are on the ladder and play then to get a high win ratio. We're using Glicko2 for a reason: it's because using W/L record is completely stupid. Remedy makes an excellent point that if you happen to be laddering when all the usual suspects for OU are laddering some people are going to have a less than 50% win rate, but Glicko2 is supposed to account for this by judging the degree of difficulty.

Maybe the system could judge when ladder accounts have been abandoned and correct for it with some kind of negative bonus pool? It's the process of trashing accounts that is confusing Glicko2, so the fix is going to be something to do with recognizing the trashed accounts and compensating. Also during last round the situation seemed less severe, probably because regular OU was shut down and we had all the non-serious players behaving normally and grounding the system. I think the first precaution we should take if we're interested in fixing things is not to run two OU ladders at once anymore, to take advantage of the heat sink our casual players provide Glicko2.
Old Feb 21st, 2013, 3:06:41 PM   #594
Melee Mewtwo
Dat Lugiass
is a Community Contributor, is a Tiering Contributor
 
 
Join Date: Jan 2011
Posts: 657
France
Default

Except that the accounts are trashed because your first 15ish battles are the key ones, and if hax screws you over on any of them you are likely to have a harder time reaching a higher peak, as the first battles count far more than the last ones. I don't think we should further punish the ladderers who are already annoyed about having to start completely over from scratch and create a new alt name. I'm liking the sound of this ELO system, as it's this massive difference between the early and late games that is pushing people to create new alts and go on forfeit sprees at the end in the first place.
__________________
[01:47:47] <+Limi> gamefreak has confirmed the rumour
[01:47:53] <+Limi> that mewtwo now has a tumor
[01:47:59] <+Limi> but please man, chill out
[01:48:03] <+Limi> you don't need to pout
[01:48:08] <+Limi> just take it all in good humour!
Quote:
Originally Posted by Fat trickroom View Post
Blizzard is for the whole Dragon Slayer thing, it OHKOes almost any Dragon in the tier save Kyurem, Giratina, Dialga, Palkia, Reshiram, Zekrom, Latios, Latias and Giratina-O.
Old Feb 21st, 2013, 5:57:41 PM   #595
Woodchuck
<Feranfell> punbot irl aka virginity protector
is a Smogon IRC AOp, is a Battle Server Moderator
 
 
Join Date: May 2010
Posts: 1,648
us best
Default

Really, suggesting ELO? ELO was essentially the rating system we used on PO, and it was garbage. You could easily get haxed out of reqs because you would lose such tremendous amounts of points to poor luck against lower ranked players, wasting the efforts you had made for hours before. You'd spend many battles with a +1 -30 differential, and losing just one to hax sets you back 30 battles. ELO is alright for chess, but in a game with such a luck aspect as Pokemon, it is grossly inadequate for the job. There are other ways we could set reqs, but going to ELO is a step backwards.
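For reference, a "+1 -30" differential is roughly what fixed-K Elo produces across a large rating gap. The sketch below assumes K = 32 and a 600-point gap; both numbers are guesses chosen to reproduce the figures quoted, since PO's actual settings are not given in this thread, and the function name is made up for illustration.

```python
def elo_swing(rating, opponent, k=32.0):
    """Return (points gained on a win, points lost on a loss) under fixed-K Elo."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return k * (1.0 - expected), k * expected

# Assumed example: a 2000-rated player facing a 1400-rated opponent.
win_gain, loss_cost = elo_swing(2000, 1400)
print(win_gain, loss_cost, loss_cost / win_gain)
```

The last printed value is about how many wins against the same opponent it takes to recover one loss, which is where the "sets you back 30 battles" complaint comes from.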

I also have a problem with the win ratios idea. With an ideal rating system that perfectly assigned each person their skill level, everyone would have near 50% win ratios. The point of the rating system is to match you up against players of roughly equal skill, so in evenly matched games, you should be winning roughly half the time. The only people with disparate win ratios should be those at the very very top and the very very bottom -- but this clearly isn't happening. Either way, the fact that we have a rating system placing people on the ladder at all makes win ratios a horribly inaccurate way to evaluate player level.

Win ratios were never designed to be an adequate measure of skill in a laddering situation; attempting to use them for any meaningful purpose is a bad idea.
Old Feb 21st, 2013, 6:05:21 PM   #596
Cyrrona
in color
is a Tiering Contributor
 
Join Date: Sep 2008
Posts: 169
Default

For what it's worth, I think the simplest/least disruptive short-term solution is something like this:

1) Revert to our one-ladder system instead of splitting OU and Suspect
2) Set the qualifying benchmark at 2000 +/- 65

Like others have noted, the larger playerbase should be able to offset any ripples that certain laddering practices might cause. On that note, I'd expect the number of "strategic forfeits" to fall significantly with this modest (but much more manageable) deviation change. I don't see any real need to raise the actual rating requirement if this shift can mitigate the inflation problems...we've successfully run a handful of single-ladder tests with thresholds of 2000 in the past, and I personally don't think raising the minimum dev by 10 points would have any noticeable impact on voter quality.
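
As a sketch, the proposed benchmark could be checked like this. It reads "2000 +/- 65" as "rating of at least 2000 with a deviation of at most 65", which is an interpretation rather than something spelled out above; the `LadderEntry` type and the sample accounts are hypothetical.

```python
from dataclasses import dataclass

REQ_MIN_RATING = 2000    # from the proposal above
REQ_MAX_DEVIATION = 65   # from the proposal above


@dataclass
class LadderEntry:
    """Hypothetical record for one ladder account."""
    name: str
    rating: float
    deviation: float


def meets_reqs(entry, min_rating=REQ_MIN_RATING, max_deviation=REQ_MAX_DEVIATION):
    """True if the account is rated high enough and its rating is certain enough."""
    return entry.rating >= min_rating and entry.deviation <= max_deviation


# Made-up accounts to show the three interesting cases.
entries = [
    LadderEntry("alt_a", 2010, 60),   # qualifies
    LadderEntry("alt_b", 2150, 90),   # high rating, deviation still too large
    LadderEntry("alt_c", 1980, 50),   # low deviation, rating too low
]
qualified = [entry.name for entry in entries if meets_reqs(entry)]
print(qualified)
```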
__________________
ARTBY ICONIC
Old Feb 21st, 2013, 6:13:25 PM   #597
Deluks917
 
Join Date: Nov 2012
Posts: 247
Default

One ladder would basically eliminate most of the problems.

I would still advocate going to 2050 or 2100, though it seems there have been concerns when not enough people voted.
Old Feb 21st, 2013, 7:06:03 PM   #598
Jayde
SOLIDARITY
is a Tiering Contributor
 
Join Date: Jul 2011
Posts: 321
Default

Quote:
Originally Posted by Fat Rhys DeAnno View Post
The problem with this is you wait until a bunch of idiots are on the ladder and play then to get a high win ratio. We're using Glicko2 for a reason: it's because using W/L record is completely stupid. Remedy makes an excellent point that if you happen to be laddering when all the usual suspects for OU are laddering some people are going to have a less than 50% win rate, but Glicko2 is supposed to account for this by judging the degree of difficulty.
I don't get what you mean by "wait until a bunch of idiots are on the ladder". It's not like the skill level of the ladder really fluctuates throughout the round.

Also, I don't think you understood me correctly. I'm not saying that we should scratch Glicko2, I'm just saying that we should incorporate a win ratio as well. Using a rating system solely based on W/L wouldn't be ideal, but clearly, neither is the Glicko2 system when people with awful win ratios or 40+ forfeits are hitting reqs.
__________________
Super
Willpowered
Energetic
Loving
Loquacious
Outstanding
Washes himself daily
Old Feb 21st, 2013, 7:15:40 PM   #599
Rhys DeAnno
Slacking Off
is a Tiering Contributor
 
Join Date: Dec 2009
Posts: 144
The Ladder
Default

Quote:
Originally Posted by Fat Jayde View Post
I don't get what you mean by "wait until a bunch of idiots are on the ladder". It's not like the skill level of the ladder really fluctuates throughout the round.
The skill level of the ladder fluctuates massively depending on the time of day you're playing. I have seen so many ridiculous gimmicks from silly players in late/early morning EST on the OU ladder that I'm probably permanently traumatized. Win ratio is completely useless as a statistic in a metagame with such disparate skill levels; you might as well base qualifications entirely on chance.
Old Feb 21st, 2013, 11:15:41 PM   #600
Lavos Spawn
our state of zen
is a Community Contributor
is a Tiering Contributor
is a Smogon World Cup defending champion
 
Join Date: Mar 2012
Posts: 2,160
Default

i played somewhere like 90+ games with a glicko2 of above 3000 the whole time, and my deviation is still in the high 170s because of how broken the ladder system is on showdown. basically, if you have a really high rank, your deviation goes nowhere. it took like 10 battles to lower it from 190 to 189.
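
a rough sketch of why that can happen, using the simpler Glicko-1 deviation update as a stand-in. this is not showdown's actual Glicko2 code, it skips the step that inflates deviation between rating periods, and the opponent numbers (1700 and 3000, deviation 50) are made up for illustration.

```python
import math

Q = math.log(10) / 400.0  # Glicko-1 scaling constant


def g(rd):
    """Glicko-1 attenuation factor for an opponent's deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected_score(rating, opp_rating, opp_rd):
    """Glicko-1 expected score against a single opponent."""
    return 1.0 / (1.0 + 10 ** (-g(opp_rd) * (rating - opp_rating) / 400.0))


def updated_rd(rating, rd, opponents):
    """New deviation after one rating period.

    `opponents` is a list of (rating, deviation) pairs. In Glicko-1 the new
    deviation depends on who was played, not on whether the games were won.
    """
    information = 0.0
    for opp_rating, opp_rd in opponents:
        e = expected_score(rating, opp_rating, opp_rd)
        # e * (1 - e) shrinks toward zero when the outcome is a foregone conclusion
        information += g(opp_rd) ** 2 * e * (1.0 - e)
    return math.sqrt(1.0 / (1.0 / rd**2 + Q**2 * information))


# Ten games from rating 3000, deviation 190, against two hypothetical pools.
rd_vs_much_lower = updated_rd(3000, 190, [(1700, 50)] * 10)
rd_vs_equal = updated_rd(3000, 190, [(3000, 50)] * 10)
print(f"{rd_vs_much_lower:.1f} {rd_vs_equal:.1f}")
```

under these assumptions, ten games against far lower-rated opponents take less than a point off the deviation, while ten games against equally rated opponents cut it roughly in half. a game the system already expects you to win tells it almost nothing new.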

i don't know the solution, but i know the problem


All times are GMT -4. The time now is 4:05:32 AM.