#1 |
|
hey, even pirates need attorneys
Moderator
Join Date: Apr 2009
Posts: 2,603
especially internet pirates
|
This post has been quite a few weeks coming. Over the months, I've seen certain complaints against the suspect test and the rating system that powers it. The trouble is, much of it isn't far from the truth. Proven and/or dedicated battlers aren't making reqs while some... questionable characters are. People are robbed of their req status after a string of hax-heavy battles. Finally, the current method of dealing with all this is terribly inadequate. I'm not here to scold anyone or blame anything; I'm just stating certain facts and clear trends, and I'm hoping that some change comes out of this.
A note: I do get that a lot of this isn't as clear as I thought it would be! I've edited this OP quite a few times already based on people asking me to clarify one thing or another, and I'm willing to edit it more for this purpose.
Ratings
(By the way, lines 78-92 of this part of PO's code calculate what rating each player in a battle gains or loses. Basically, it depends on the difference between the ratings and on the number of times that one or the other player (that part isn't all that clear to me) has battled, though that only factors in for the first five battles on an account.)
Pokémon Online has taken several steps backward from Shoddy Battle / Pokémon Lab, and its rating system is a big part of that. Now, I'm not going to shoot my mouth off about this without some kind of evidence, but this much can be seen right from the fact that we've gone from a convergent rating system to an at best oscillatory one. What I mean is, on Shoddy Battle and PL, a player's rating fluctuates less and less as that player keeps battling. However, on PO, one can gain or lose 7-24 points no matter what, never really settling on a "true" rating but going back and forth. This fact alone means that PO's rating system doesn't even try to measure a player's skill level in a precise way. Better players will tend to have higher ratings, but that just shouldn't be good enough for something as important as suspect voting. The fact that it seems to do what it's supposed to do provides a convenient illusion in which qualified voters can't help but see complaints like this as excuses for inadequacy, so the problem is never solved.
Exclusivity
A huge problem that I find with the current suspect testing system is that it's very, very exclusive. The DPP Stage 3 Round 2 test had around 90 voters drawn from a much smaller pool. The misconception that I find people having here is that they make comparisons to other tests that notably involved writing arguments to the tiering organizers.
This makes them think that the current problems can be solved by upping the requirement, making the test even more exclusive. I hope that by the end of this post, people will see that this will not actually solve the problem without shutting out even more deserving users and making the voter selection more arbitrary.
A demonstration
For the purposes of this post, I wrote a ~100-line program in Python using the SciPy module (it's no MATLAB but eh) that attempts to simulate the probability distributions of the ratings of players of different skill levels. I've made the following assumptions:
1. "New" users come in at a constant rate.
2. Players battle at the same rate for 201 battles. (Recently, the #1 battler was determined to have played 95 battles in about half the time of a suspect test. I'm going by this figure.)
3. If a player's rating drops below 1000, that player quits (and presumably makes an alt as part of a "wave" of "new" users).
I may have forgotten an assumption that I have made. Anyways:
the code
Included is a commented introduction that reads: "This program builds a matrix (called "rating") that looks at the "real" skill levels of each hypothetical player (rows) and gives a probability distribution of the rating that PO gives it (columns). At each "round", rating changes are recorded in a different matrix (called "new_rating"), which is then copied over onto the rating matrix to start the cycle anew. Each entry goes through a process of "searching" for other battlers; each possible matchup with a different entry has a weighted probability associated with it, and then the possible new rating probabilities are added based on the probability that the better player will beat the worse player and vice versa (a figure controlled by the constant "hax"). Additionally, the end of each "round" introduces a "wave" of new user accounts starting at rating 1000.
A separate list is also made to analyze the probabilities for the best players in the "original wave", henceforth known as "star battlers"."
The simulation applies the "kfactor" (the multiplier that controls the rating variations) properly only for the first "wave" of battlers, but it shouldn't make a terribly large difference. This simulation obtained the following results:
the results
some sums that I calculated from the results
The numbers are a bit lower than the actual results that we have seen, but please bear with me. The main thing to notice is that the average rating for each skill level is actually pretty low. The rating cutoff only catches the "tail ends" of these roughly bell-curve rating distributions. I don't think that it's a stretch to think that this probably happens in reality. The implications below mostly result from this.
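The core idea of the simulation, tracking a probability distribution over ratings instead of following individual players, can be sketched in a much reduced form. The snippet below is not the program from this post: it follows a single skill level, assumes a fixed win probability (`win_prob`) and a fixed gain or loss per battle (`step`, a made-up value), and ignores matchmaking and new waves of accounts. It only keeps the 201-battle length and the rule that a player who drops below 1000 quits.

```python
def rating_distribution(win_prob, battles=201, start=1000, step=16, floor=1000):
    """Propagate a probability distribution over ratings.

    Returns ({rating: probability}, quit_probability) after `battles` games.
    `win_prob` and `step` are illustrative assumptions, not PO's values.
    """
    dist = {start: 1.0}   # all probability mass starts at the entry rating
    quit_mass = 0.0       # mass of players who dropped below `floor` and quit
    for _ in range(battles):
        new_dist = {}
        for rating, mass in dist.items():
            # Each battle is either a win (+step) or a loss (-step).
            for delta, prob in ((step, win_prob), (-step, 1.0 - win_prob)):
                new_rating = rating + delta
                if new_rating < floor:
                    quit_mass += mass * prob
                else:
                    new_dist[new_rating] = new_dist.get(new_rating, 0.0) + mass * prob
        dist = new_dist
    return dist, quit_mass


def share_above(dist, cutoff):
    """Probability of still playing with a rating at or above `cutoff`."""
    return sum(mass for rating, mass in dist.items() if rating >= cutoff)
```

Comparing `share_above(dist, 1400)` for two values of `win_prob` gives a feel for how much of each skill level's upper tail a fixed cutoff catches under these simplified assumptions.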
the implications (put in hide tags so as to make the post look less bloated/intimidating)
"Special Permission"
Where does this leave the "special permission applications" that those who somehow don't make it are tasked to write? As I see it, these applications are supposed to make up for the flaws inherent in the rating system. The main problem is that the instructions for these applications are really vague, and I suspect that ratings weigh heavily on who gets through here, which really just defeats the whole point of the applications.
Other thoughts
I'm not saying that most of the test was illegitimate or that the voters are a sham or anything like that. I'm just saying that this matters a lot. I may not have achieved the most realistic results on my little program, and maybe a "hax" factor of 1/8 isn't completely realistic, either, but what it says is pretty concerning to me.
The worst part for me is that all this largely stems from the 200 rating difference minimum in ladder matchmaking. The ladder behaves like a sort of gravity well, or I guess quicksand to an extent, because, until you hit 1200, you could be facing an opponent of ANY sort of skill level, and that effect never truly fades. One battles an opponent with an incorrect rating, and the points gained or lost from that are also incorrect, resulting in incorrect rating fluctuations.
The other thing is the encouragement of alts. The fact that making an alt is optimal in certain obvious cases (e.g. you lose one of your first five battles, or you go under 1000 rating) makes this painfully clear. The thing is, people don't seem to understand that this matters a lot. No legitimate system of determining skill level would allow players to throw out win/loss records because it's "optimal". Could you imagine if they could do that in football or soccer or tennis or any other sport? So why are we so reliant on such a system here?
I don't make alts. I don't believe in gaming the ladder, even now when it practically begs me to do so.
At the same time, I wish I could be higher on the ladder more often so that I could battle better players more often. But when I get haxed out of several battles to the point where I run into low-rated, boring battles, that's simply not a fun or motivating situation. Here, I'm forced to choose between honesty and a good game. I'm not willing to choose.
Hell, I don't even expect the rating system to be "fixed" any time soon (though abolishing the 200 minimum would help a lot). It would take quite a bit of effort to redo the rating system to make it as reasonable as what Shoddy Battle used to use. Despite everything, coyotte508 can do whatever he wants with his program. That is why I am going after the suspect test system instead (though obviously I can't blame the suspect test leaders, either).
Summary
So what have we found from all of this?
1. The rating system is unreliable for gathering informed voters, largely because it doesn't actually try to arrive at a single "true" rating for a player.
2. The rating system gives a convincing illusion that it does what the suspect test wants it to do. After all, better players TEND to do better. *thumbs up*
3. The suspect test voting privileges are extremely exclusive - more so than those from DPP Stage 3 Round 2 - but they have to be, for all the wrong reasons.
4. The voters and even the tiering leaders have little choice but to perpetuate the lie unwittingly, even through the "special permission applications", despite knowing or at least suspecting that something is wrong here.
Well, then give us a solution, you crybaby!
Well, I've actually proposed the solution that I'm about to present before, notably last September when Cathy attempted to take over the tiering process, but I guess there were other things on people's minds back then, and it's understandable. Paragraphs have been seen as the ideal, but they've been rejected for taking way too much time.
However, the Smogon Council system introduced an interesting alternative: an IRC conversation between the "council" members and only between them. Now, it would be quite a bit harder to organize the same system for 50+ voters and get everybody into the same conversation, but ultimately I don't see that as completely necessary. Voters should engage in conversation not only to prove that they're actually competent but also to demonstrate that they care about more than just their preferences. Of course, we'd also need to consider lowering the rating requirement so that more of the "right" people get in. What I would personally also like to see (especially if the rating requirement isn't adjusted) is a way of letting people in. What I have in mind is not a private channel but a public one with mute on and current voters voiced, so that interested people can at least see what's going on. I see people getting "temp voice" status through alternative credentials like successful contributions to C&C/CAP or doing well in a relevant tournament. Bad apples out, good apples in. I'm not posting all this just for my own sake. I don't fully expect to make voting privileges even if these measures were taken, though it would have been nice if that was because I had 1600 mean rating on Shoddy/PLab and it still wasn't enough, rather than what is going on currently. I don't think that I would have bothered with this if it weren't for the people that I watched experience completely unfair situations, or if a team based on Cynthia couldn't make a solid 1200 rating. This is a real problem that affects many people, and I know that a lot of people get that there really is something wrong - even if they don't want to admit it.
__________________
If we cannot take joy in things that are merely real, our lives will always be empty. <+joshe> im a registered sex offender for up to calc 3 <+Reflect_Suicune> i was thining of fucking jellicent for some reason <DetroitLolcat> I AM AROUSED BY BIMETALLIC CURRENCY! Last edited by capefeather; Apr 26th, 2011 at 9:57:15 AM. |
|
|
|
|
|
#2 |
|
indulges in unsavory behavior
Moderator
Join Date: Aug 2007
Posts: 2,744
|
didnt read this but i think its about voting so im gonna use this thread to ask if we can bring back the council!! who agrees?
__________________
|
|
|
|
|
|
#3 |
|
Quiet Thunder God
Moderator
Join Date: Aug 2009
Posts: 4,524
Izanagi
|
I gave it a good read and I mostly agree with Capefeather. Voters are decided on who can play for extended periods of time on a regular basis. I mean I gotta give credit where credit is due, there are some pretty awesome players who can achieve voting rights in one sitting, even with some serious hax, but the rest of us have to ladder for like +4 hours (on a regular basis too due to decay) to get anywhere. Even if his ideas don't get implemented a simple change in PO's rating system like deviation would suffice.
V We cool now♪
__________________
Last edited by PK Gaming; Jul 18th, 2011 at 7:57:08 PM. |
|
|
|
|
|
#4 |
|
im kinda the shit
Join Date: Aug 2009
Posts: 746
|
damn 4+ hours with 10 minutes each day due to decay pokemon sucks
__________________
![]() BKC: i'm sorry lol i know i bs'd you Lavos Spawn: youre seriously going to bs me like this Lavos Spawn: damn Lavos Spawn: guess the game is more important than the friendship |
|
|
|
|
|
#5 |
|
the Hero
Join Date: Nov 2009
Posts: 1,626
|
Why don't we bring back the paragraph system and see if people actually know what they're talking about? I think that would be a productive way to keep uninformed voters from messing with the process. I understand it has been rejected before, but I think it should be reconsidered.
Unless the tiering staff doesn't want that on their plate, of course.
edit: If the paragraph system were implemented, Reach could simply assign a few more assistants or advisors to help him read through the paragraphs to lighten the load. Even if the paragraphs slightly slow down the process, I believe what they provide will be much more beneficial in the long run.
|
|
|
|
|
#6 |
|
✓ Just Doug It
Join Date: Oct 2008
Posts: 1,657
Never never land
|
The problem with the paragraph system is that it injects a LOT of subjectivity into the system. Now, I KNOW for a fact that reachzero is an awesome person and is unlikely to be biased and whatnot, but there is the (legitimate) complaint that "I can't vote despite trying because he said I can't". Not to mention that English isn't everyone's forte.
__________________
Credit to Legacy Raider for the awesome Avatar. Check out my Archived Warstory between me and Lemmiwinks MkII |
|
|
|
|
|
#7 |
|
The greatest oak was once a little nut that held its ground
Join Date: Aug 2009
Posts: 2,030
Came by a fork in the road, and went straight
|
You don't have to be an English major to articulate your thoughts about the metagame. Also I believe voters tend to know who among them is good, and if some people were unreasonably taken out we would know. We aren't morons.
Also subjectivity isnt really an issue imo. I think it's pretty easy to tell when someone really understands a subject because of the amount of things they can apply it to. Others just can't. Paras don't even have to be long, the standard should just be "enough to show you know your stuff"
__________________
![]() |
|
|
|
|
|
#8 |
|
Join Date: May 2010
Posts: 264
|
An issue I see with paragraphs is that it is possible for someone to form and develop an opinion based on just reading stuff and playing very little. I think that's particularly a problem given that people put thoughts/opinions/calculations in various suspect threads or analyses, meaning that it's not as if the only way to learn this stuff is by playing the game. Those people then develop "their opinion" based on what they've read, not what they really think (because they don't know anything outside of theorymon to make an informed decision). Let's be honest here, it's not rocket science, I could probably make a decent case for banning Garchomp from gen 4 OU without ever playing that metagame.
|
|
|
|
|
|
#9 | |
|
the Hero
Join Date: Nov 2009
Posts: 1,626
|
Quote:
|
|
|
|
|
|
|
#10 |
|
I crashed my car into the bridge, I don't care
Super Moderator
Join Date: Oct 2009
Posts: 5,523
~(^.^)~
|
Instead of writing paragraphs to prove that you're good enough to vote on the Suspects, why not use that time to write those same paragraphs to the people who meet the rating reqs but who you feel are not "mature" enough or "knowledgeable" enough about the metagame?
Seriously, if you feel too many stupid people are the ones making the decisions for the tiers, then convince your smart friends to get better at battling (or be more lucky); or, you can take the time to get to know those people who are good at battling, but whom you know nothing about personally, to see where they stand on the Pokemon of the metagame. In short, I like our system how it is, and in no way should we bring back the paragraph or special permission system.
|
|
|
|
|
#11 | |
|
Join Date: Jan 2010
Posts: 1,674
Minnesota
|
Quote:
With the change from a set 1400 (after Phil changed it from 1500), the voting reqs have only gotten higher with the implementation of the 15+15. Even though there has only been a single round of the 15+15, the minimum point requirement (excluding special permission applications) went from 1400 to 1427, and the total number of voters went from 53 (Round 1) and 51 (Round 2) to 43. It might not be a significant drop, but it did make it harder for people to qualify, making the suspect test more exclusive. Now, I might get the argument that "paragraphs will disqualify some voters, making your proposal counter-productive." In that case, why not lower the voting reqs to 1350? It's still a challenge for most users to hit that, and the paragraphs will weed out random users who somehow got up there. I don't think it's bad if there are 70-80 voters. If they all break the ladder requirement, they obviously know the metagame (excluding the few lucky users) and should have no trouble writing a paragraph or two per suspect expressing their thoughts on why something is or isn't Uber.
Just my thoughts; I'd be more than happy to talk on IRC if anybody has questions. Long post short, I agree with the OP, but I also think that paragraphs should be added to the qualification for suspect test voting. I hope this post makes sense and that I wasn't just rambling on in a confusing manner.
__________________
ASB Profile [19:35:22] <@Charmander> alphajolt just has the worst luck [19:35:22] <@Charmander> like [19:35:35] <@Charmander> his luck is worse than prem's lc abilities [03:49:23]<toshimelonhead> jolteon's shiny looks like gf pissed on the normal one Last edited by AlphaJolt; Apr 26th, 2011 at 8:30:22 PM. Reason: fixing my misquote of capefeather |
|
|
|
|
|
|
#12 | |||
|
hey, even pirates need attorneys
Moderator
Join Date: Apr 2009
Posts: 2,603
especially internet pirates
|
Quote:
For posterity, here's the post that I alluded to earlier that suggested something like this: Quote:
Quote:
EDIT @ Oglemi: Duly noted.
__________________
Last edited by capefeather; Apr 26th, 2011 at 10:38:40 PM. |
|||
|
|
|
|
|
#13 |
|
I crashed my car into the bridge, I don't care
Super Moderator
Join Date: Oct 2009
Posts: 5,523
~(^.^)~
|
OK, my bad, I didn't mean to put down the special permission part, I meant to say "the paragraphs or Smogon council." I'm not going into detail as to why I don't want to see the Smogon council come back, but yeah.
Also, I was not derailing the thread. I was merely addressing Yondie, who suggested bringing back the paragraph system. I did read your fucking long ass post, and I apologize for not quoting Yondie when making my post. I also stand by stating that I don't see anything wrong with our current system, even though you pointed out a lot of stuff, but that's just me. |
|
|
|
|
|
#14 | ||
|
Suspect process: users edition
Administrator
Join Date: Aug 2007
Posts: 4,388
Italy
|
Quote:
__________________
Quote:
|
||
|
|
|
|
|
#15 |
![]() ![]() Join Date: Apr 2009
Posts: 2,419
|
the problem with the old paragraph system is that they will essentially echo each other's statement and it's too time consuming. with a rating of 1500+ we already assume that the player is well-versed enough to vote without justifying him/herself. with all the rating decay though, i think we can solve this by:
- taking a screenshot of the night you got 1500+ and should you have difficulty maintaining the rating due to real life issues and rating decay, you'll have viable proof to showcase for the special permission applications. the rating is a good measure of what you've accomplished in the ladder and if you're hellbent on laddering consistently, you can have this proof. however, to make things fair, you must also take a screenshot of your current ladder ranking the day the identification thread gets posted to show that you aren't really camping and have completely disregarded maintaining your ladder points throughout the entire suspect round. the system itself is actually fine in my opinion but i'm sure a lot of people are complaining. the best way to get your requirements is to play the last week (or two weeks if you're really prolific as a player) so its easier to manage and maintain. if you're laddering now to attain 1500- fine by all means, but if you're faced with hax and rating decay that's your error to ladder this early and you shouldn't really complain because it's your personal choice to get your high rating now and maintain it for x amount of days. |
|
|
|
|
|
#16 | |
|
hey, even pirates need attorneys
Moderator
Join Date: Apr 2009
Posts: 2,603
especially internet pirates
|
Quote:
|
|
|
|
|
|
#17 |
|
I wanna be a red panda when I grow up
Join Date: May 2010
Posts: 1,253
|
The PO rating system sucks. We can all agree on this.
The irc discussion channel would be a great idea. Paragraphs suck; however, if you can't justify your votes, that is worse. So, simply make your votes and then write a couple sentences like "latios doesn't have enough counters in OU. Nothing is a safe switch into it." A larger pool of voters would help a lot, because it'd be a bigger part of the community etc. However, since it has more "bad" battlers, it's more vulnerable to bandwagon bans.
__________________
reyscarface: nails unluckiest man alive Eternal: no ghosting Das |
|
|
|
|
|
#18 |
|
Instead of complaining about the PO rating system, you may as well just ask for the voting requirements to be reduced, as other people have already mentioned. I agree completely that it can be incredibly frustrating to ladder at times, but at the end of the day, it does exactly what a rating system is supposed to do: reward the players who win frequently against good opponents.
Also, this underlying assumption that increasing the voter pool is inherently beneficial bothers me. I'm all for allowing more people to vote, but in all honesty I trust the judgement of someone who maintains a 1500+ rating more than that of someone who hovers around 1350 or even 1400 in virtually all cases. Allowing more people to vote will indeed change the way in which things are banned, but I'm just saying that you need to take a step back if you think it'll undoubtedly be for the best.
__________________
◠‿◠ |
|
|
|
|
|
#19 | |
|
Suspect process: users edition
Administrator
Join Date: Aug 2007
Posts: 4,388
Italy
|
If you manage to get a rating of 1400 then you definitely know how to play the game. Such a rating is not easy at all to achieve and maintain.
As for increasing the voting pool: we've never had as many complaints about the suspect process as we've been having after the recent drizzle+swift swim and Blaziken bannings. Increasing the voting pool will, in my opinion, enlarge the consensus around Smogon's current tiering process.
__________________
Quote:
|
|
|
|
|
|
|
#20 | |
|
✓ Just Doug It
Join Date: Oct 2008
Posts: 1,657
Never never land
|
Quote:
|
|
|
|
|
|
#21 |
|
hey, even pirates need attorneys
Moderator
Join Date: Apr 2009
Posts: 2,603
especially internet pirates
|
The rating system rewards players who win frequently against opponents who win frequently against etc. By appealing to the system's tendencies, you attempt to justify it with circular logic, using its own apparent integrity to prove... its own integrity. It's like a defense attorney playing judge. Yes, high-rated opponents TEND to be good, and sure, good players TEND to have higher ratings than worse players. Any functional rating system can do that, and "tendencies" are a terrible benchmark of quality, especially for a game that involves luck. All we really know is that the system rewards at least SOME good players, but we see here that it's far from a foregone conclusion that the system rewards more than a select few on the tail end of a bell-curve distribution.
I would compare it to a poorly written exam. Not only is some of it unclear (luck factor), but there are a lot of questions that require the answers to a previous question (matchmaking). You must, of course, fight luck compounded with psychological factors to get to a high rating first before you can prove yourself at all by beating high-rated people. What we're doing currently is comparable to bell-curving to fix the effects of the poorly written exam... which is a terrible solution. Fortunately, IRL, you have the right to have a question clarified. There's no equivalent here. I just don't like the attitude of some people with so much power over the banlist where they claim that something is "good enough". If the space program stopped at "good enough", it would have risked lives and huge monetary investments. So why are we risking the tiering system like this?
|
|
|
|
|
#22 |
|
Quiet Thunder God
Moderator
Join Date: Aug 2009
Posts: 4,524
Izanagi
|
I just wish we could use Shoddy's old rating system.
Look at Doug's post: http://www.smogon.com/forums/showpos...98&postcount=3
A CRE minimum & deviation helped a ton when laddering for reqs. It allowed good players to ladder occasionally instead of religiously every single day. It also allowed you to make mistakes and lose against someone with a lower rank without terribly affecting your rating, instead of mercilessly punishing you like the PO system does. Very few people complained about Shoddy's ranking system. If it's not possible to change the rating system at the moment, then lowering the rating requirement would be fine I guess.
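The effect that a deviation term has on rating swings can be illustrated with a toy comparison. The sketch below is neither PO's nor Shoddy Battle's actual formula: it uses a generic Elo-style update, a hypothetical constant step of 32 points, and a step that shrinks with the number of games played as a crude stand-in for a deviation that narrows over time. The opponent strength, the starting rating and every constant are assumptions chosen purely for illustration.

```python
import random


def expected_score(rating, opponent):
    """Elo-style estimate of the chance that `rating` beats `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def constant_step(n):
    """Same step size for every game (hypothetical value)."""
    return 32.0


def shrinking_step(n):
    """Step size that decays with games played, floored at 4 (hypothetical)."""
    return max(4.0, 32.0 * 50.0 / (50.0 + n))


def simulate(step, true_skill=1500.0, games=2000, seed=0):
    """Rating history of one player facing a fixed 1500-strength opponent."""
    rng = random.Random(seed)
    opponent = 1500.0
    rating = 1000.0
    history = []
    for n in range(1, games + 1):
        # The outcome depends on true skill, not on the displayed rating.
        won = rng.random() < expected_score(true_skill, opponent)
        score = 1.0 if won else 0.0
        rating += step(n) * (score - expected_score(rating, opponent))
        history.append(rating)
    return history


def late_swing(history, window=500):
    """Average absolute game-to-game rating change over the last `window` games."""
    recent = history[-window:]
    changes = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return sum(changes) / len(changes)
```

Calling `late_swing(simulate(constant_step))` and `late_swing(simulate(shrinking_step))` side by side shows the constant step moving the rating by a similar amount every game, while the shrinking step settles down, which is the kind of behaviour credited to the deviation-based system here.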
__________________
Last edited by PK Gaming; Apr 28th, 2011 at 11:27:58 AM. |
|
|
|
|
|
#23 | |
|
Midlife Crisis
Join Date: Apr 2008
Posts: 1,292
England
|
I think personally we should just up it back to 1500 for voting; that decreases the chance of scrubs voting and (hopefully) increases the trust in the voters! While I don't think any wrong bans were made besides Brightpowder, which I don't care about at all, the potential is definitely there to have that problem.
__________________
Quote:
|
|
|
|
|
|
|
#24 | |
|
The north wind
Moderator
Join Date: Jun 2008
Posts: 1,137
|
Quote:
I think 1400 is a better number (most scrubs will never get it). Also, I agree completely with Haunter's post.
__________________
|
|
|
|
|
|
|
#25 | |
|
Join Date: May 2010
Posts: 264
|
Quote:
|
|
|
|
|