Proceeding with the Suspect Process

Aeolus

Ok, after talking with Jump, chaos, and X-Act... it seems that the only way to proceed with the suspect process is to ban Shaymin-S from the standard ladder at this time. The current vote, obviously tarnished as it is, is not salvageable and Uber won by 2 votes.

We propose, however, that Shaymin-S be re-queued for testing in 1 month's time after the Lati@s test and vote are complete. The rationale for waiting a month to begin the new test is twofold:

1) We want to give the metagame a chance to evolve slightly, perhaps gaining two new pokemon in the form of Latios and Latias, before tossing Skymin back into the fray. If we retest it now, nothing new will be learned since it will be the same game as the one we just played for 30 days.

2) It would be preferable if the hubbub and emotion surrounding this vote were given a chance to subside somewhat. That would not happen if the people who JUST won their victory by voting it uber had the rug pulled out from underneath them with a new testing period that they would surely perceive as one trying to reverse the result of the poll.

Further:

As was decided before, all suspects that were previously tiered uber will be played on the suspect test ladder... and it is from there that people will have to "earn their stripes" to vote. Additionally, a membership duration requirement will be added to prevent users who have only registered in the very recent past from participating.

In this thread, please comment on the proposal and suggest any other items that you think would be beneficial to the voting process. This is uncharted territory for us, and Jumpman16 and I have been doing our best to better the game... we've made mistakes; that is clear in hindsight (which everyone knows is 20/20). Let's not pretend that anyone foresaw the current problems that have arisen because of the closeness of the Shaymin-S poll. Obviously such a person would be negligent if they hadn't spoken up to begin with and everyone is doing their best here. Hopefully this thread will help us avoid similar problems going forward.
 
I have to say I liked the checkpoint idea someone proposed, since it rewards consistent play. Also, matching their IPs when voting was a good idea, but that's the least of our problems.
 
I also like the checkpoint idea too... except for a few things. One, it is not terribly practical. Two, it is logistically difficult. Everyone should recognize an additional tradeoff with it as well. If we choose to implement a checkpoint system, the number of voters will likely decrease drastically and (I predict) will put us back into the neighborhood of 25-35 voters rather than the 75-100 that people have expressed they wanted.

To learn about the checkpoint idea, see The Policy Review forum.
 
2) is a good point, as much as I deep down don't really care about some of the community (the part that whines a lot and/or has shitty arguments), we are even doing a suspect test process to give the community input and really "final say" on this, rather than have PR members vote on Suspects themselves
 
I also like the checkpoint idea too... except for a few things. One, it is not terribly practical. Two, it is logistically difficult. Everyone should recognize an additional tradeoff with it as well. If we choose to implement a checkpoint system, the number of voters will likely decrease drastically and (I predict) will put us back into the neighborhood of 25-35 voters rather than the 75-100 that people have expressed they wanted.

To learn about the checkpoint idea, see The Policy Review forum.

i definitely agree that practicality is an issue but unless i am misunderstanding how rankings work, wouldn't we stand to get more than the 32 votes we got on dx-s with "easy requirement of just 95 deviation|slightly tougher requirement of just 73 deviation|'only' 10 days of 1665/65"
 
I see it more as a quality filter than as something that constrains good and consistent players. If someone can't make one checkpoint there's always special permission to consider.

I'd rather have 25-30 quality voters than 100 "my jirachi was flinched 19 times in a row" and "it's not allowed in Nintendo tournaments".
 
yeah, and also...we seriously can allow as many voters as we want...but we probably don't want to publicize that we are looking for 75-100 because that will probably give people incentive to half-ass
 
My primary concern with the checkpoint system is that people do not have attention spans that last 30 days. They simply don't... and that puts us into a boat of granting 'special permission' so much that it becomes the standard way of getting voting rights. Every time someone goes on vacation, has a busy week, or "anything else that keeps them from playing lots of pokemon at a particular time" they will need special permission. That said, if everyone else wants it, and Doug is willing to gather the data for it every time... I'll administer it and sift through the "special permission" requests.

I just don't think it adds a whole lot beyond what we have already. It would fix a problem that was unique to this past test. That problem of people already having established ratings will not be an issue going forward and that is primarily what the checkpoint system addresses imo.
 
I think we need to have a better idea of exactly how stage 2 and stage 3 are going to be organised. Stage 2 should be easy enough, it will just be the standard ladder at the end of testing. So would that be run while at the same time we would have the all suspects unbanned ladder? And then at the end, how do we come to some sort of decision? I am guessing we will do another round of voting for every suspect to decide whether it should be retested as a suspect or something? Though that doesn't seem to deal with the issue that they may only be broken when added by themselves, and not when added together.

But our process for revisiting decisions I think should be clearly established before any more decisions are made, because otherwise we are going to run the risk of appearing as though we are intentionally trying to manipulate the results.

I think our revisiting process is important to sort out because it sorta affects how our current process should work. If it is easy to overturn a decision if it is determined that it would be a good idea to do so, then winning a vote with only a 50% majority isn't a big deal. If it turns out that overturning things is going to be very hard, then I think we should definitely look at requiring a larger margin.

Also we should work out a minimum number of voters required for a vote to count.

And I still think we need to make it very clear that when they vote they are voting on whether or not the pokemon is negatively affecting balance. If we can't come up with a viable definition of balance then we can defer to Sirlin:
A multiplayer game is balanced if a reasonably large number of options available to the player are viable--especially, but not limited to, during high-level play by expert players.
TBH I think we should just run with his definition, rather than coming up with our own. We haven't made any progress with our own, Sirlin is pretty well universally respected, and he actually has a definition. We leave the subjective aspects of this definition up to the individual voter (like what does reasonably large mean).

I mean once we say that much, then anyone voting because of Nintendo tournaments is just blatantly defying the community. I would hope that people aren't so disrespectful as to do that.

Having the suspect ladder back should prevent the people from voting who haven't made any attempt to investigate the usefulness of Shaymin.

Have a nice day.
 
My primary concern with the checkpoint system is that people do not have attention spans that last 30 days. They simply don't... and that puts us into a boat of granting 'special permission' so much that it becomes the standard way of getting voting rights. Every time someone goes on vacation, has a busy week, or "anything else that keeps them from playing lots of pokemon at a particular time" they will need special permission. That said, if everyone else wants it, and Doug is willing to gather the data for it every time... I'll administer it and sift through the "special permission" requests.

I just don't think it adds a whole lot beyond what we have already. It would fix a problem that was unique to this past test. That problem of people already having established ratings will not be an issue going forward and that is primarily what the checkpoint system addresses imo.

I actually suggested a checkpoint system before Skymin, because of the fear that people were coming in two days before the test ended and passing that way, even though that clearly does not indicate that they have experienced the suspect and how it affected the metagame. It serves to further purify the pool, even if that costs us some voters (who we didn't really want voting anyway as far as I'm concerned because they clearly didn't care to really participate). As I said in the PR thread, I honestly don't think it's asking too much to ask people to play 20 battles in 10 days without regard for rating, as the system X-Act has suggested states (Days 1-10 = 95 Dev only, Days 11-20 = 73 Dev only).
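For anyone trying to picture how the checkpoints would be checked in practice, here is a rough sketch in Python. The thresholds are the ones quoted in this thread (95 deviation by day 10, 73 deviation by day 20, and 1665/65 at the end); reading "1665/65" as rating/deviation is an assumption, and the `Checkpoint` class, the `passes_checkpoints` function, and the snapshot layout are all hypothetical names made up for illustration, not anything the ladder actually exposes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Checkpoint:
    day: int                            # last day of the checkpoint window
    max_deviation: float                # deviation must be at or below this
    min_rating: Optional[float] = None  # None means rating is ignored


# Thresholds as quoted in the thread; the structure itself is hypothetical.
CHECKPOINTS = [
    Checkpoint(day=10, max_deviation=95),
    Checkpoint(day=20, max_deviation=73),
    Checkpoint(day=30, max_deviation=65, min_rating=1665),
]


def passes_checkpoints(snapshots: Dict[int, Tuple[float, float]]) -> bool:
    """Return True if every checkpoint is met.

    snapshots maps a checkpoint day to the (rating, deviation) recorded
    for the player on that day. A missing day counts as a failure.
    """
    for cp in CHECKPOINTS:
        if cp.day not in snapshots:
            return False
        rating, deviation = snapshots[cp.day]
        if deviation > cp.max_deviation:
            return False
        if cp.min_rating is not None and rating < cp.min_rating:
            return False
    return True


# A player who battles steadily versus one who only shows up at the end.
steady = {10: (1550, 90), 20: (1600, 70), 30: (1700, 60)}
crammer = {30: (1700, 60)}
print(passes_checkpoints(steady), passes_checkpoints(crammer))
```

Under this sketch the last-minute player fails simply because no day 10 or day 20 snapshot exists, which is exactly the case that "special permission" would have to cover.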
 
We can give it a try... I don't disagree with you guys that it would be nice if it worked. I'm just concerned that it won't.
 
I think we need to have a better idea of exactly how stage 2 and stage 3 are going to be organised. Stage 2 should be easy enough, it will just be the standard ladder at the end of testing. So would that be run while at the same time we would have the all suspects unbanned ladder? And then at the end, how do we come to some sort of decision? I am guessing we will do another round of voting for every suspect to decide whether it should be retested as a suspect or something? Though that doesn't seem to deal with the issue that they may only be broken when added by themselves, and not when added together.

stage 2 is actually happening right now for each suspect, i think you're mixed up. stage 1 is the playing, stage 2 gives a "tag" to each suspect at the end of its test. and stage 3 is what accounts for the high possibility that any given suspect may not be broken when in a metagame with other suspects (or broken when previously believed not to be, if that's possible without the notion of an "ou bp mew").

But our process for revisiting decisions I think should be clearly established before any more decisions are made, because otherwise we are going to run the risk of appearing as though we are intentionally trying to manipulate the results.
the only reason we're revisiting this is because there wasn't a suspect ladder. there will never be another reason to revisit decisions, and that includes the issue of "majority" as far as i'm concerned, because i don't see a reason to not honor 51% especially when that 51% wouldn't be final because of stage 3

Also we should work out a minimum number of voters required for a vote to count.
this seems like a good idea in theory but who knows how badly the community will disappoint us. that's their fault though, we can literally "only do so much"

reasonably large number of options available
i don't think anyone in their right mind can argue that having skymin in the metagame lowered the number of viable options, since all the top pokemon check it and are awesome in their own regard. "large number of options available" (we can assume "viable options" is a fair modifier) can get pretty vague and arbitrary
 
I really don't see a reason for the checkpoint system except making sure people don't have the requirements from the start and not participate in the test at all, which could very simply be remedied by just forcing everyone to use a new account. I like having the freedom to schedule my time to get up to the required voting threshold (read: I'm lazy and like to do it at the last minute), and the checkpoint system would cut into that.

Making everyone participate in X battles per week is something that would turn a lot of people off from voting, I'm sure. I don't really see a checkpoint system helping to remove aforesaid "My Jirachi was flinched 19 times"/"Nintendo bans it so we ban it" arguments - unless we're going and checking on an individual basis, there isn't any real sign that the people with bad vote reasons play more/less consistently than those with sound logic. If we REALLY want to see only sound arguments, there's really nothing that can be done but going back to bold voting (which would be better done in PM, where you can't copy/paste someone else's argument as easily) where you're required to provide reasoning, because forcing someone to play consistently doesn't change their mind any more than playing a whole lot will. Of course, that has a lot of flaws too, as everyone knows. Checkpoint system doesn't help with the problem at hand at all.



Also, I might be biased since I voted it Uber, but I'm assuming/hoping that the logic of some of the people voting Uber is sounder than it might look like based on their post in the vote thread (with the exception of Nintendo-based crap) ie
Stonecold said:
Okay, so I really was going to vote OU, but after seeing Skymin take out my 6 cm rachi more then once with pure luck, I decided Uber. I also wanted to vote the opposite of TAY.
then, later
Stonecold said:
Also, id debate you guys and write my story on why I believe it should be uber, but quite frankly....my opinion is my own it is not yours.

We're not requiring any logic at all for votes, so a post like the second one is absolutely fine, which implies something more intelligent than "i was flinch-haxed to death fuck that". There's a good chance that a lot of people who would say something like "I was haxed, it's uber" actually have a real reason behind their votes like "I don't like the fact that it greatly influences the impact of luck on the game".
 
As someone who is decent friends with Stone, I have talked with him about Skymin numerous times and he had been planning to vote uber for a while. It had nothing to do with some battle, or with me. IIRC he just didn't like that most of its counters have to rely on luck to stop it, which is a valid concern.

So yeah, I think he was just poking fun at me and being a subtle troll. There is no need to keep quoting him as a poster child for crap logic (he's actually quite a bit smarter than most of the voters).
 
Ok, after talking with Jump, chaos, and X-Act... it seems that the only way to proceed with the suspect process is to ban Shaymin-S from the standard ladder at this time. The current vote, obviously tarnished as it is, is not salvageable and Uber won by 2 votes.

We propose, however, that Shaymin-S be re-queued for testing in 1 month's time after the Lati@s test and vote are complete. The rationale for waiting a month to begin the new test is twofold:

1) We want to give the metagame a chance to evolve slightly, perhaps gaining two new pokemon in the form of Latios and Latias, before tossing Skymin back into the fray. If we retest it now, nothing new will be learned since it will be the same game as the one we just played for 30 days.

Sorry, but I honestly don't see any point in retesting Shaymin again after Lati@s or at any point until Stage 3. The introduction of Lati@s will drastically affect the way the game is played and if they are allowed to remain OU then when Shaymin is retested it will be an entirely new metagame and we would have basically "wasted" this month testing it.

If Lati@s were to be voted Uber, then we are basically putting Shaymin back into the same metagame that players were somewhat indecisive to begin with. Also, with Lati@s being tested people are going to adapt more to those two being in the metagame and are not going to be any bit more prepared for Shaymin than they were when it was still allowed.

2) It would be preferable if the hubbub and emotion surrounding this vote were given a chance to subside somewhat. That would not happen if the people who JUST won their victory by voting it uber had the rug pulled out from underneath them with a new testing period that they would surely perceive as one trying to reverse the result of the poll.

It is still going to feel that way a month later, only now people will have additional reasons to complain, thanks to the addition of Lati@s, and even then there is no real guarantee that the Lati@s vote will be a landslide either, which will only provide us with further problems.

We can't simply keep retesting individual suspects if we don't like the outcome of a vote. They all have to be retested again together in Stage 3, so why waste extra time doing a retest?
 
I also think that there have been some rather crass generalizations regarding the logic behind some of the uber voters... If you look at some of the posts in the discussion thread in Stark, some users such as skiddle even went so far as to say that the uber voters were "younger" than the OU voters. Some posters came up with some other theories behind the small margin of victory for ubers like "people being afraid of change" or "people getting angry about getting haxed". But nobody can really prove any of these thoughts to be true.

Thus I agree with Fishin about the checkpoint system. Having to maintain a certain rating on such a tight schedule can be quite cumbersome for many people, so I don't think it will weed out more qualified voters, just those with more spare time on their hands.
 
Well I am still unclear on how the stage three testing is going to be done?

My understanding came from this:
As far as the other suspects I feel, rather than lumping them all together, we should test them individually starting with the most likely to end up OU (which happens to be the order they landed in on the list). I'm using the same reasoning here as I did for Garchomp. None of these Pokemon directly counter each other so therefore testing them together just creates an indiscernible mess. Say in this situation we test Lati@s first and find that it isn't too overpowering for the metagame, well then Lati@s moves on to what I'll call "Stage 2" of testing. However if the general consensus is that Lati@s is completely broken, Lati@s is kept in ubers. When Manaphy is tested next, we test the metagame without Lati@s even if Lati@s passed Stage 1. We do this for all Uber suspects. When we've completed all 4 tests, we then test a metagame with all suspects that passed Stage 1. This is to see if a combination of Pokemon, while not individually overwhelming to the metagame, places too large of a burden on the game. For instance, Lati@s and Manaphy both may fit into OU just fine, but when they're combined they might break the metagame (highly unlikely, but it's a necessary step). Then of course we bold vote :D.
Stage 2 I had thought was the bit where we test everything that passes its suspect test vote. Though, by our current method that would be the standard ruleset at the end of testing.

Also I am pretty sure someone told me that if something passes its test then it ceases to be a suspect, which seems to contradict with this original plan. I am guessing that the original plan has been modified in that regard, but I guess I'm not entirely sure that that is the case.

then stage 3 is:
Finally, if I were to offer an addendum to Jabba's proposal, it would be a Stage 3 where we add the suspect(s) banned in Stage 1 to the successful, suspect-free metagame we arrive at following the completion of Stage 2. We'd do this just to confirm that this suspect does indeed break the true metagame, where "true" means one without suspects, which we will have determined through the successful completion of Stages 1 and 2. I actually think this is a necessary step, and it addresses the very valid possibility that, should Garchomp fail Stage 1 but Lati@s pass Stage 2, Garchomp may indeed not overpower the true metagame which again is one where there are no suspects, which means Lati@s don't break it by themselves so it would stand to reason that they would probably not be more powerful with Garchomp back in the mix.
But I still don't really understand just how we are planning on going about this. Like, once we do this test, how are we planning on using the experience of it to make changes to the ruleset?

Well I guess the thing I am thinking is that we really want to make a decision that, even if they don't agree with it, will be accepted by as many people as possible. Things like small voter numbers and tiny margins of victory are better avoided if at all possible. But revisiting this in stage 3 could be a good way of dealing with this I think, just not knowing the details, I can't really be sure of that.

Also I think the argument that Skymin reduces variety would be something like "it is so strong I feel I have to use it every battle" or "I have to use Zapdos and scarf Heatran to have even a chance at beating it" or something like that. I don't know the specifics, I haven't been battling with it.

Have a nice day.
 
TAY, I believe that anyone can be intelligent. But when you post something such as that, you kind of get the wrong idea straight away. Though the "getting haxed with Jirachi by Skymin 12 times" argument didn't flow well in my head either.

Overall, I wish "Opinions" were forced to be posted, to give us a clear idea why these people voted for the Pokemon to be Uber / OU. Backing up their vote should be mandatory IMO because otherwise I don't know why they voted it, especially if I hear "for obvious reasons".
 
I'd rather have 25-30 quality voters than 100 "my jirachi was flinched 19 times in a row" and "it's not allowed in Nintendo tournaments".
This.

We are actually providing the community this privilege to have a say on how they want the metagame to be. I'm calling it a 'privilege' because, frankly, that's what it is. However, if you want to be allowed this privilege, you'll need to know what you're talking about.

About the checkpoints... I don't know. I'm neutral about them, on second thoughts. What the checkpoints will do is to ensure that you play 2 games every day for 30 days and not play 30 games per day in the final 2 days. However, since both players got 60 matches and a favourable rating by the end, wouldn't they both be relatively knowledgeable about the suspect Pokemon being tested? The most important thing to be done is to ensure that every player starts with his or her rating at 1500/350 in every suspect test.
 
Well I personally assign a lot of importance to actually conducting the test. I think there is a lot to be said for actually experiencing the suspect throughout the entire period of the test. There is no way the suspect is going to be handled the same way during the initial hype of the first days as during the end of the test.

And think of it this way. I, personally, don't feel that I need to play with or against any suspect at all to be able to give a qualified opinion about whether it is uber or not. This is in part because I watch quite a few battles, but also because let's face it, ego or not, my past experiences and my general/pokemon intelligence qualifies me enough to sound off on any Suspect. But instead of debating the smarts of the mighty Jumpman16, realize that I purposely don't apply to Aeolus for special permission to vote on suspects (which he would probably grant). I don't do this because, even though I actually do not think that you need to have actually played battles to have a valid idea of whether or not a suspect is uber, this is the requirement we are going with now. (Besides, I'm sure I'm not the only person capable of forming such valid opinions without actually playing with the suspect, unless we want to argue that I am just that smart and good and awesome.)

Therefore, I do not think it is asking much for people to actually pass three 10-day checkpoints (especially considering the first two aren't all that hard at all), because otherwise, I will just argue that people like me should get to vote, since we showed just about the same willingness to pass the actual battling checkpoints as those who, for whatever reason, didn't want to pass the first two.
 
I see. In that case, I see what the point of the checkpoints is now, and agree with their implementation.
 
Well I personally assign a lot of importance to actually conducting the test. I think there is a lot to be said for actually experiencing the suspect throughout the entire period of the test. There is no way the suspect is going to be handled the same way during the initial hype of the first days as during the end of the test.

This is one of the few areas where Jumpman and I agree on the facts, but have a slight difference of opinion. It is true that there is no way the suspect is handled the same way during the initial hype of the first days than during the end of the test.

I contend that by the final days of testing, the suspect has reached something of an equilibrium... or rightful place in the metagame. For that reason, it seems clear to me that the most valuable information that can be learned about a suspect is that which is learned during the latter days. This is why I am opposed to denying the right to vote to people who engage in intense participation at the end of the test. It is my belief that those people still benefit from the sum total experience of the entire test because the metagame they play at the end is the manifest incarnation of the previous month's collective experience. I think we use phrases like "whore the ladder at the very end" to diminish and discount these people, and I wonder if that is a mistake.

Or take it from the other side. Participating in the test during the initial hype gives a distorted view of how the pokemon will affect the game. If opinions are formed then, could that not also degrade the quality of the vote? I do not agree that forcing people to participate periodically throughout the month necessarily yields a voter pool of higher quality. In fact, I think the highest quality voter pool would be one drawn from people who had zero experience during the initial hype of the test and only began to play once the suspect reached its equilibrium position. I think dangerous and erroneous assumptions have been made about this checkpoint thing.

Initially, my gut reaction to it was very positive in all respects except logistical ones... but the more I think about it, the less I think it takes us in the direction we want to go.

Maniaclyrasist said:
Sorry, but I honestly don't see any point in retesting Shaymin again after Lati@s or at any point until Stage 3.

Think. The point is that the vote itself was tarnished and inordinately close. Putting much, if any, stock into that for the time (probably several months if not more!) until stage three is ridiculous.
 
Ideally, everyone that ends up voting would battle consistently throughout the suspect testing period and we would have a poll amongst a wide pool of experienced, knowledgeable voters -- all of whom came into the suspect voting period with an open mind, all of whom carefully evaluated the performance of the suspect when playing with and against it, and all of whom engaged in a period of deep reflection afterwards and formulated an opinion with logical reasoning to support that opinion.

That ain't gonna happen, folks.

The fact is, many people have preconceived and/or ill-informed opinions on the suspects we test -- and the hoops that we make them jump through to earn the right to vote will be viewed as nothing more than that -- hoops (ie. barriers to doing what they damn well intend to do anyway). This is an inevitable consequence of making people qualify for something they want. While the qualifications may be intended to prove something or make them learn something, it won't necessarily do that. It will only prove that they know how to meet the qualification requirements. It will only teach them to meet the qualification requirements.

It's just like any other test. If you score well on a test after taking a class, it doesn't guarantee you know the underlying material or that you learned anything in that class. It just means you scored well on the test. For all we know, you may have cheated on the test. Or maybe you crammed the night before, scored well on the test and then promptly forgot everything you crammed. It may mean you knew all the material beforehand and never studied or learned anything new in the class. Who knows? We can't say for sure. The test score is an indicator and nothing more.

With our suspect test, I think it is futile to make more convoluted processes in an attempt to eliminate the fact that many people have no intention of participating in a true suspect test. Many people simply want to cast a certain vote. Anything they have to do to get there, they will do it. But, that doesn't mean that more complicated requirements will make the test any more "pure".

If people can "cram" in the suspect test by battling for a few hours at the very end -- then making checkpoints just means people will have to cram three times during the month, rather than once. But, if a person is cramming for the explicit purpose of achieving a certain rating so they can cast their predetermined vote, it doesn't matter if they do it once, or three times. Those people have their minds made up anyway. The number of checkpoints is not going to make them participate in the test any more seriously.

I think we need to put up sufficient requirements to prevent the noobs and total idiots from voting. I think the current rating system, paired with a join date provision, is sufficient to ensure that noobs and total idiots will not vote. I think that's about the best we can hope for. Any more than that is chasing an ideal that I believe is, unfortunately, unattainable.
 
Yeah, I'm inclined to agree with all that. For about a week now I've been hinting here in IS that I actually really don't care if we go the elitist route and only let PR people vote, because I really do think that a lot of our community is full of not "idiots" but stubborn people who are unwilling to grasp that Latios may NOT be uber or that Deoxys-S is. The caveat with using this knowledge to the extent we should is that it's not fair to the community to censor their votes because they aren't objective enough to be open to the possibility that a Suspect may not be as uber/ou as they thought.

It's just kind of sad that we want to try to involve the community with this and a lot of them let us down, where "the other half" would sneer at us for being elitist and "mandating" and making these unfair tiers ourselves in PR or whatever, and it seems kind of like a no-win situation. Maybe the weighting we should do is make the community's decision on a Suspect have 80% of the weight, while PR members' thoughts carry 20%, lol.

But seriously, Aeolus, I agree with you that checkpoints are probably not going to realize their intended results, partly because those who "cram" are probably seeing the suspect in its truest sense, because if it's really uber it still will be on Day 30, and if people have been able to counter it by then without "overcentralizing" then that's its fate. I say "probably" because there's no substitute for actually seeing what goes on throughout a Suspect's entire test, like maybe people are so tired of dealing with Suspect X that they aren't using it anymore by the end of the test and are using teams geared towards stopping it, but neither will ladder whore x, so he wouldn't have any real information about the suspect itself. However, it is probably just more trouble than it is worth to implement checkpoints given what Doug said about biased people willing to be stubborn regardless.
 
Something somewhat off-topic that I still feel I ought to say is that the specific requirements for voting are arbitrary. There is no real reason to use, say, 73 deviation instead of 76 or 72. With how close the vote is, such a change may have either left out or let in someone who was on the margin, and that small change in the voter pool could be enough to change the outcome of the ballot, which is why a small margin should not be seen as some mandate.
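To make the point about arbitrary cutoffs concrete, here is a small Python sketch. The ballots are entirely made up for illustration (they are not the Shaymin-S numbers), and `tally` is a hypothetical helper; the only thing taken from the thread is the idea of comparing a 73 deviation cutoff against nearby values like 72 or 76.

```python
from collections import Counter


def tally(ballots, max_deviation):
    """Count votes from ballots whose deviation is at or below the cutoff."""
    return Counter(
        vote for deviation, vote in ballots if deviation <= max_deviation
    )


# Made-up ballots as (deviation, vote) pairs; several voters sit near the margin.
ballots = [(60, "Uber"), (70, "OU"), (72.5, "Uber"), (74, "OU"), (75, "OU")]

for cutoff in (72, 73, 76):
    print(cutoff, dict(tally(ballots, cutoff)))
```

With these invented ballots the leading side changes as the cutoff moves by a few points, which is the sense in which a two-vote margin says little on its own.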
 