An alternative to "special permission applications", and why it's needed

Rising_Dusk · Apr 29, 2011

khz said:
Perhaps someone more informed about the programming can answer this: is it possible to implement Shoddy's old rating system?

It can be implemented, but requires new clients and a new release of PO and stuff. It also shouldn't be terribly difficult, but getting any of the PO coders to give it any priority would probably be tough. That said, I am on the programming push list for PO, but what with stats and other massive undertakings I am working on, can't really invest much time into it at the moment. Maybe I can work something out with coyote for the next PO release. It would certainly not happen for awhile, in any case.

Firestorm · May 1, 2011

franky said:
the problem with the old paragraph system is that they will essentially echo each other's statement and it's too time consuming. with a rating of 1500+ we already assume that the player is well-versed enough to vote without justifying him/herself. with all the rating decay though, i think we can solve this by

I would say paragraphs should be used to justify the reason for your vote rather than to show that you are knowledgeable enough. Being a good player doesn't mean you're necessarily voting to ban things that keep the metagame from being competitively playable. Rather, you may be voting to ban things because you dislike them / they annoy you / banning them increases your chances to keep on winning.

Kevin Garrett · May 2, 2011

I have been in favor of having paragraphs for suspect tests since the beginning of BW. I talked to Philip about it at length and I understand he was trying to keep subjectivity out of the council, but it's quite clear that votes are subjective to begin with. It needs some kind of check. From what I understand, the question about having paragraphs is, "How long should they be?" If they are too short, it is easy to sugarcoat your opinion with facts. If they are too long, it is a stress on the council to read through them all. Nonetheless, I think there is a middle ground we can come up with to make everyone happy.

khz · May 3, 2011

Rising_Dusk said:
It can be implemented, but requires new clients and a new release of PO and stuff. It also shouldn't be terribly difficult, but getting any of the PO coders to give it any priority would probably be tough. That said, I am on the programming push list for PO, but what with stats and other massive undertakings I am working on, can't really invest much time into it at the moment. Maybe I can work something out with coyote for the next PO release. It would certainly not happen for awhile, in any case.

Just thought I'd post this:

Wiz said:
It is possible. The rating system is implemented in the PO server application, which is open source. It can even be changed without breaking compatibility with the current client application.

I've thought about implementing it myself around 8 months ago, but I talked to coyotte508 and he didn't like the idea.

And just so I stay on topic: Like I said before, I'm not totally convinced that high ranking = qualified to vote based on reasonable measures. But even if you have paragraphs justifying your vote how hard is it to want to vote "Blaziken because I cbb putting a counter on my team" (just an example) but then in a paragraph put reasons that you found in one of many discussion topics? To be perfectly honest I think most people vote for rational reasons, but given that paragraphs do very little I don't think it's worth the effort, unless someone can show me that there are some people who will put in bad reasoning in these paragraphs to warrant them being stripped of voting privileges, which I'm all for.

animenagai · May 3, 2011

Haunter said:
This. Bring back the old requirement of 1400. It's already hard enough to accomplish when factoring in hax and rating decay. We definitely need to enlarge our voting pool.

I agree, and I've been saying this for a while now. Tell me, what exactly is the 15/15 system supposed to achieve? What I've heard from the higher-ups is that it's supposed to reward effort and consistency throughout the tiering process (please correct me if I am wrong). However, in practice, it does no such thing. Put it this way, no one needs to ladder for the majority of the testing period. Since a snapshot is taken at the end of the tiering process and that alone (basically) tells you who qualified and who didn't, why wouldn't you just ladder at the end of the period? This system punishes good players who don't have a lot of time on their hands. Some of us are busy people in uni, at work or both. Not all of us are teenagers who can ladder for hours on end every day.

I've seen people argue that you can just reach a high ladder ranking and then win 1 game every day to stop your decay. The problem with this argument is that being in the 15/15 range early in the test and being qualified later in the test would require completely different scores. As the testing period goes on, the average score on the leaderboard will go up. What gets you in the 15/15 range in the first week probably won't be good enough by the time the test ends. Seriously, if you guys are just worried about people qualifying early (and hence not knowing what is broken etc.), just set a time frame. You could do something like "to qualify for voting, you need to achieve a rating of 1400 or higher after the 2nd week of the suspect test". That would still give us more time and flexibility than the current system. Heck, raise it to 1450 if you want to. Just establish a reasonable number that is concrete throughout the suspect test.

coyotte508 · May 4, 2011

PK Gaming said:
It allowed good players to ladder occasionally instead of religiously every single day.

I'm not going to argue about anything, just saying that you don't have to play every day (if the decay is set to be once every 24 hours). Just stop playing for a few days and then play the same number of battles as the number of days you were offline, and decay is erased.

jrrrrrrr · May 4, 2011

Just posting to say that I agree with the OP. The PO rating system is terrible and it's an embarrassment to the suspect testing process that we rely so heavily upon it. One bad luck streak over a small handful of battles and suddenly your 1400 "near qualification level" rating is now down to an 1100 "pathetic" rating. I haven't had a chance to run through the proposed system but SOMETHING really needs to change here if we want the tests to have any legitimacy. I have no problem with making the voting pool exclusive, but using PO's rating system boils it down to "who can avoid hax the most" instead of the original goal of the suspect test, which was to have voters based on their knowledge of the metagame.

Smith · May 18, 2011

I wasn't going to post in here because I didn't see what was wrong with the system- but I had never made an attempt at voting before. I've been laddering for quite a bit now to try and get voting reqs and the system is absolutely ridiculous. I'm really angry so can I just talk about the actual process of laddering? I hate it, it's terrible. I've gotten so many +7 - 24 battles or the like and I think I'm going to scream. I've dropped about 60 points today, because I got critted and stuff- and once I got angry at that, it just went downhill in a negative feedback loop. I'm running a stall team so I actually have a much higher chance of losing to noobs than people who actually know what they're doing (I actually got swept by an Electivire that had the exact four moves it needed to sweep, two of which weren't viable in the least) because of the random shit they pull. Once I get paired up against a batch of noobs and I lose, my rating drops, I get even angrier, I face more noobs because my rating is lower, and it's awful. Now I know this tirade is kind of off topic but the rating system is simply killing me, and I wish laddering were a bit easier. I've easily had over 100 battles this round and I clearly know enough about the metagame to vote (at least in my opinion), but there is just no way I am getting into that 15 + 15 range. I am not a bad Pokemon player, only an inconsistent one.

I have a couple of ideas- firstly, I'd like to bring the bar to 1400 like everybody is talking about. I don't get what's even wrong with people parking their accounts anyway, they clearly are qualified to vote if they can get that high. Not to mention that 15 + 15 sets the bar really really high- right now, it puts it at 1499, and it's only going to get higher. That would make this even WORSE than Round 1, when EVERYBODY complained that voting reqs were too high. 1400 WORKS. I don't care how lucky you are, you simply cannot hax your way up to 1400 without some knowledge of the metagame- and even if you did, you must've learned something in the massive amount of battles that would require.

The other thing I would like to see is MORE SUBJECTIVITY in special applications. Yeah, I know, that's exactly what we've been trying to avoid, but why? Getting high on the ladder is the most objective proof available, we really don't need any more "objectivity". I think that a high-ish ladder rank would be a great thing to include in your special application, but I don't think Iconic or Eo or somebody should have to explain why they got haxed out of voting reqs (despite the fact that that would never happen). You should just have to send a PM to reach with your experience, any evidence you can supply (like people you've beaten, ladder peaks, tournament wins or placings, etc.) and what you think of the metagame. Just impress upon reach that you know what you're talking about, and he shouldn't have to ask about your ladder peaks (although they obviously help in showing your competence, if they're high). I have faith in the fact that reach isn't a moron and that Phil wasn't a moron for putting him up top; I think I can leave it to him to decide who's worthy of voting.

In summary, we all hate hax, hax is everywhere, ratings aren't always so telling of skill, we expect higher ratings than we ought to, and I trust the people up top.

BKC · Jul 10, 2011

Echoing Smith. This rating system is fucking terrible. For instance, I was at 1446 or so last week, and I got a +7 -24 match...I was about to win, go up to 1453 and get my voting reqs, but my Landorus got crit flinched and then flinched again by Excadrill's Rock Slide. It's not just the hax, it's the decay...

<symphonyx64> I would have tiering contrib badge if I didnt get outright cheated out of LC voting
<symphonyx64> when the server went down over a week ago I had made reqs
<symphonyx64> then, decay
<symphonyx64> but one night before the deadline I got back up to reqs again
<symphonyx64> so the next day (day of the deadline) I go to the beach for the whole day
<symphonyx64> come back and saw my account decayed AGAIN to 1242
<symphonyx64> Me and eternal got to 1250 within minutes of each other
<symphonyx64> and he DIDNT decay

It really comes down to who can battle the most to avoid decay, or who can avoid hax the most. I think if Shoddy's old rating system were implemented, it would be a lot more beneficial because it means one or two hax losses vs. scrubs won't kill you. If we're not going to do that, the bar should definitely be set at 1400...if you can get 1400 you definitely have played enough of the metagame to vote. You can't simply hax your way to that high a rating without being somewhat knowledgeable of the metagame.

Moo · Jul 10, 2011

Rating system is annoying, and decay is a pain, no doubt about it, but I think it's up to the people that run PO to change the ladder system, not us.
One of my friends is a PO admin and helped make it, I could ask about the shitty rating system

Bologo · Jul 11, 2011

I absolutely agree with the sentiment that this rating system is absolute garbage. I mean, I don't like to badmouth these things because I know that a lot of work gets put into them, but this is a little extreme.

If we're not going to change the rating system, for god's sakes, at least make the required rating 1400. 1450 is way too high, because at that point, literally any battle you have has the potential to wreck all of your work for one day. I know I'm really just echoing the previous statements, but something really needs to be done, because it really is all about who can withstand the most hax and decay.

Also, I know that when the server's down the ratings aren't supposed to decay, but I'm sure that I saw quite a bit of decay in my rating after the server crash. I know I took like 2 or 3 days off from pokemon, but I don't think that would account for a 43 point loss that I had (1424 to 1381). Are you guys sure that it doesn't decay while the server's down? Because if it does turn out to decay, then I feel like the requirements should really be set lower, at least for the current round.

Sorry if this sounded like an off-topic rant, but it's just frustrating that there are either very clearly experienced players that are disguised as low ranked alts, or noobs that get extreme amounts of hax, and that one battle with them can result in an hour of laddering being a waste of time. :/

Is there at least a way to make it so that you don't have to battle people with a 100 rating difference? I know you used to be able to set it to lower than 100, but it feels like making the minimum difference 100 only caused problems, and didn't actually benefit anyone. At least if you could battle more people within your range you could have +15, -15 battles instead of +7, -24 all the time (or worse, I even had a +7, -25 earlier for some reason, though admittedly I thought the minimum was 200 :/).
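For anyone wondering where the lopsided +7 / -24 numbers come from, here is a minimal sketch of the textbook Elo update. It assumes a K-factor of 32 and the standard 400-point logistic scale; neither value is confirmed for PO in this thread, and the function names are made up for illustration.

```python
K = 32  # assumed K-factor; PO's actual value is not stated in this thread


def expected_score(rating, opponent):
    """Textbook Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_change(rating, opponent, k=K):
    """Return (points gained on a win, points lost on a loss)."""
    e = expected_score(rating, opponent)
    return k * (1.0 - e), k * e


if __name__ == "__main__":
    # Compare an even match with opponents rated 100 and 200 points lower.
    for gap in (0, 100, 200):
        win, loss = elo_change(1450, 1450 - gap)
        break_even = loss / (win + loss)  # win rate needed just to hold rating
        print(f"gap {gap:>3}: +{win:.1f} / -{loss:.1f}, "
              f"break-even win rate {break_even:.0%}")
```

Under these assumptions an even match is +16 / -16, while an opponent 200 points lower gives roughly +7.7 / -24.3, which is in the same ballpark as the swings reported above. It also means you would need to win about three out of four such battles just to stay where you are.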

capefeather · Jul 20, 2011

OK so, I probably should have been more aware of this sooner, but PO's rating system basically (tries to) implement the Elo rating system, which is used in FIDE ratings and similar places. While Elo itself is, I suppose, "legitimate", there are still a few problems with this that result in what I had demonstrated in the OP:

Shoddy had Glicko2. Elo is still worse than Glicko2 (the convergence complaint in the OP still applies). Thus, it's still a step backward. I'll happily admit that I (and most others) probably wouldn't be complaining at all if Shoddy never existed, but Shoddy has still set this standard as well as others. We should never accept steps backward.

There's also the fact that chess player ratings, at least at the top, come from many years of playing thousands of matches. Compare that to Pokémon, specifically our suspect tests. Our tests last a month; in my best attempt at laddering to the voting requirements, I played a bit over 200 battles within a month, and Jibaku apparently played around 300. IMO, more would have been unreasonable to anyone with a life. Elo, with its lack of a deviation, was simply never meant to be used for such "low" numbers for matches. We time our suspect tests to reflect the time that it supposedly takes to understand the metagame, but we pay no heed to the time that it may take to get a proper reading of true player ratings. Coyotte likes to argue that players will approach their true ratings "eventually", but in light of the chess comparisons, it's a pretty lazy argument.

There's a significant random element in Pokémon. Chess players may experience performance variation for other reasons, but the luck factor is still not nearly as prevalent (zomg I'm Black I'm slightly disadvantaged!!!). The results of randomness reflect in "wrong" rating changes and continue to impact the rating.

The matchmaker uses the ratings to find opponents. This may not seem like a big deal, but considering everything said before, it really works to make laddering more of a chore than it should be (or at least more than it was on Shoddy). This works to widen the gap between tournament performance and ladder performance. Also, I'm not sure how relevant this is atm but I'll mention it here anyway. I really do believe that the matchmaker really magnifies the issue (though it's hardly its fault).
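To make the points about randomness and short testing windows a bit more concrete, here is a rough simulation sketch. It is a toy model, not PO's actual code: it assumes textbook Elo with K = 32, a starting rating of 1000, a matchmaker that always finds an opponent at your current rating, and a "true rating" that fully determines your win chance. All names and numbers below are illustrative assumptions.

```python
import random
import statistics

K = 32  # assumed K-factor, not a confirmed PO value


def expected_score(rating, opponent):
    """Textbook Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def simulate_ladder(true_rating, games, rng, start=1000.0, k=K):
    """Ladder for `games` battles and return the final displayed rating.

    The actual win chance comes from the player's true rating, while the
    rating update only sees the displayed rating, as in plain Elo.
    """
    rating = start
    for _ in range(games):
        opponent = rating  # simplification: opponent matched at equal rating
        p_win = expected_score(true_rating, opponent)
        score = 1.0 if rng.random() < p_win else 0.0
        rating += k * (score - expected_score(rating, opponent))
    return rating


def final_rating_spread(true_rating=1450, games=200, players=500, seed=0):
    """Mean and standard deviation of final ratings for identical players."""
    rng = random.Random(seed)
    finals = [simulate_ladder(true_rating, games, rng) for _ in range(players)]
    return statistics.mean(finals), statistics.stdev(finals)


if __name__ == "__main__":
    mean, spread = final_rating_spread()
    print(f"mean final rating {mean:.0f}, standard deviation {spread:.0f}")
```

Every simulated player here has exactly the same skill, yet in this toy model their ratings after 200 battles still end up scattered by tens of points around the true value. A fixed cutoff near a player's true rating therefore depends partly on where the noise happens to leave them when the snapshot is taken.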

I know that this post may be a bit irrelevant right now considering the server problems (lol PO strikes again!), but it came up in chat, so it's here.
