Fixing UU

Jumpman16 · Jul 23, 2009

Chris is me said:
Just wanted to point out that this was one of several reasons the UU test is different, and it's not really fair to cherry pick one of the times I misspeak in an IRC chat about the test when I have a long history of not debating well in a live format in order to parade around a "win" you had on me to prove a point to Hipmonlee. Pointing out one of the admittedly numerous times I spoke hastily or did not think words through in order to try and convey that I'm obviously a fool with less understanding of basic logic than you (out of context at that) is at best unfair.

Aside from the fact that citing "live format" debating as something you're not strong with is a cheap way of absolving yourself from having inconsistencies picked on, I gave you the chance to "debate" me directly by responding to your post in full. But you're also content to let Hip do your arguing for you in a non-live format. Why should I even pay attention to you if you not only are going to claim that you are prone to numerous bouts of haste live but indicate that you are ok with having someone else do your own work for you (the discussion was between me and you initially, after all)?

Yeah, except that's natural influence, not artificial. There's a difference between "hey man, why don't you try out babiritar it totally takes people by surprise" and "Oh no, everyone hates Tyranitar, it's going to be a Suspect, I better use it so I can vote OU / Uber since I've never / always had a problem with it". If you really want to claim that the natural progression of a metagame is just as unhealthy for it as mandating that Pokémon be used, a lot, in it, then go ahead, but I doubt anyone would buy that.

That's my point, Chris, I specifically said "or whatever you want to call it". It's natural and occurs in both OU and UU, so it doesn't matter if anyone thinks it's unhealthy. I certainly wouldn't back that rather pessimistic argument.

cim · Jul 23, 2009

I still don't think you get my point with the artificial adjustment of a metagame. Under normal metagame trends, battlers have one goal: consistent winning. Thus they will use Pokémon that help them achieve that, and those Pokémon's usage will rise. People will talk about the specific good Pokémon, countertrends develop, etc.

However, when you not only add an objective other than winning (increasing SEXP), but mandate the use of Pokémon in order to fulfill said objective, that is when the metagame, naturally or not, immediately focuses on those Pokémon, and those Pokémon only. They're not allowed to fall out of favor or rise to power on merit. They're just there.

Those aren't the same thing, and it's not "natural". People will use things that are good, that's natural. But mandating their use and saying "oh they probably would have anyway" is a problem which you don't think exists or happens more than naturally (not sure how you can explain the inflated use of suspects on the suspect ladders in OU then).

Seven Deadly Sins · Jul 23, 2009

Chris brings up an extremely interesting point here. You're seeming to miss the difference between an artificially influenced metagame and a naturally influenced metagame. The tiering of Pokemon has always been based on the idea that the best battlers will use the best Pokemon to make the best teams, and this works because the sole motivation of the majority of battlers is to win. Problems arise when you create a second objective irrelevant to actual competitive battling. This has been the issue since the beginning of the Suspect Test, where one of the major complaints has been how the Suspect Test has influenced the metagame to add a second objective: the use of the suspect. Now here's where it gets interesting.

http://www.smogon.com/forums/showthread.php?p=1677628#post1677628

http://www.smogon.com/forums/showpost.php?p=1679526&postcount=90

In these two posts, you make the case that the metagame will "stabilize", and that usage of the suspect will wane as people realize that it has issues and the metagame is flooded with counters. However, the existence of SEXP actually completely neutralizes this possibility, as people realize that if the suspect's use goes down, they won't get required SEXP, and so they use it themselves in order to ensure they get necessary SEXP. In fact, I'm guessing there's a spike in end-of-testing usage, as people want to cram in as much SEXP as possible. Combine this with the fact that the Suspect Ladder doesn't see nearly enough usage, and it actually completely overrides the natural progression of the metagame that you outline here. So do you think that this is still true, and that metagame progression continues to naturally occur, or do you think that metagame progression is not necessary to the testing period, and that you're willing to sacrifice the natural balancing of a metagame for whatever it is that your Suspect EXP gives you?

----------------------------------------------------------------------

Now onto another matter entirely: the way that Suspect EXP can affect UU. First, let it be said that it WILL affect UU. There was a little bandying about that the use of SEXP could be secret, and that would be the only way that SEXP would fail to have an impact on the metagame, but considering that this discussion is now in PR, the cat's kinda out of the bag on that one. As far as I'm concerned here, Chris is right for the wrong reasons. (By the way, while we're on that subject, I'd like it if you didn't use me as some kind of proof that Chris is wrong. Sure, he was guilty of trying to "have it both ways" there, but I never said that he's wrong, just that he needs to watch out for contradicting himself.) He's concerned about the impact on the stats (which I believe will be minimal), whereas I am concerned about the impact on the voting pool. There have been some rather borderline suspects in UU, notably Crobat, and considering the original overwhelming 0-20 vote to keep it UU, I'm willing to say that I'm not the only one that thought Crobat to be rather mediocre. The only reason I used it at all was for my Stall team, and it was the least useful member of my team. I ran LonelyBalance on an alt for a while, and once again, Crobat was the least valuable member of my team. On teams other than my Stall team, I simply didn't use it. Now, I'm willing to bet that I'm not the only person that thought this, and at that point, it becomes luck of the draw as to whether or not I end up with enough SEXP to vote, since if I don't use it on my team, I have to encounter it on the opposing team.

This is my issue. Suspect EXP requires the person to use it in order to guarantee decent EXP amounts. The people that are more likely to use the Suspect are people that think it is broken and want to get it banned, and thus will use it on 100% of their teams and rack up huge amounts of SEXP. Meanwhile, people like me that honestly don't think that the suspect is actually that good, and thus don't want to shoehorn a "mediocre" Pokemon into their team, have to roll the dice and hope they end up with enough SEXP to vote. The end result is that the voting pool is skewed towards BL voters, since they are guaranteed to have the necessary SEXP to vote. Meanwhile, people that don't think a Pokemon is all that great, or simply don't run teams that are conducive to using the Pokemon, have to gamble and hope that they meet the suspect enough to vote. I believe this was an issue in the Latios vote with FiveKRunner, where his SEXP was extremely low, and his record with Latios was heavily oriented towards the losing side.

I honestly don't think that this is at all fair to the people voting. If a person builds teams with the suspect, loses a whole bunch, takes it out, and then proceeds to start winning a huge amount of his matches (thus meeting the rating requirements), I would assume that person is capable of saying that the Suspect is not all that good, since in his experience, the Suspect was simply not good enough to earn a spot on his team, and teams without the suspect were more efficient. Sure he may not have much SEXP, but that doesn't change the fact that he has very real experience with the suspect that has told him that the suspect is simply "not good", and therefore not broken.

To summarize, SEXP use in UU with non-predetermined suspects slants the voter pool by allowing 100% of rating-qualified BL voters through with the very real possibility of excluding otherwise qualified UU voters because of an intangible system based on the luck of "encountering a Pokemon enough."
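To put rough numbers on that skew, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the thread gives no real figures, and `p_meet` (the chance that a non-user runs into the suspect often enough to clear the SEXP bar) is a hypothetical parameter, not part of the actual SEXP formula, which is not public.

```python
def voter_pool(bl_believers, uu_believers, p_meet):
    """Expected voters on each side under the argument above.

    Assumption: every rating-qualified BL believer uses the suspect and
    so always clears the SEXP bar, while a rating-qualified UU believer
    clears it only with probability p_meet (hypothetical parameter).
    """
    bl_voters = bl_believers * 1.0
    uu_voters = uu_believers * p_meet
    bl_share = bl_voters / (bl_voters + uu_voters)
    return bl_voters, uu_voters, bl_share


# Made-up example: 40 rating-qualified players on each side, and a 60%
# chance that a non-user meets the suspect often enough.
bl, uu, share = voter_pool(40, 40, 0.6)
print(bl, uu, round(share, 3))  # 40.0 24.0 0.625
```

With an evenly split set of qualified players, the pool that actually votes ends up 62.5% BL in this made-up case, purely because of who cleared the SEXP bar. Setting `p_meet` to 1.0 restores the even split.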

----------------------------------------------------------------------

Finally, I want to bring up one last point about SEXP that I have never been fully certain about. The reason that the SEXP formula is not publicly available is because there are apparently some "breaking flaws" in it that allow people to disproportionately gain Suspect EXP through specific flaws in the system. However, with the amount of time that it has been around, and the amount of time spent defending this secrecy to anyone who pokes, wouldn't that time have been better spent either trying to fix whatever exploitable flaw there is, or at the very least, implementing a way to make it blindingly obvious who's intentionally gaming the system? I generally have doubts about any system that is so flawed that the only way to keep it secure is to keep it hidden, because that doesn't exactly assure me that the system is even working as intended, since obviously there are variables inside it that can skew the output data in unintended or unknown ways. Also, considering how Doug has made the point on IRC that even if the source code was available and the formula made known, it would still be extremely difficult to decipher it at all, do you really think that someone is going to go to all the trouble to decipher the system and cheat when it's likely easier to just make the damn qualifications naturally? If it's that hard to just get a basic and cursory understanding of the system, are you really scared of someone breaking it? If anything, people breaking it would give you a chance to "unbreak" it, thus resulting in a stronger formula in general. As far as I'm concerned, it's a win-win situation. On top of that, the actual current Suspect Test in OU is pretty much done so far, and since as far as it seems, use of SEXP in UU would be extremely limited, the effects of releasing the formula would simply be a better understanding of the formula in general.

Hipmonlee · Jul 23, 2009

So is it true then that if you have two battlers who have similar records and one used the suspect on all of his teams and one didn't, then that second one is not more likely to qualify to vote?

Because, ok, in that case suspect experience is trivial. What the hell is it for? I have said many times that if the purpose is to find people who are misleading you in their submittals then I would be fine with it.

Otherwise "the lie" that "You are adding a group of players to the voter pool who weren't able to make the rating/deviation requirements based on the fact that they used the suspect in all of their battles" would actually be true. I never suggested that this is all sexp does (which is what SEXP = using a pokemon implies, hence I never said it), but I am suggesting that this is a necessary result of rewarding people for using the suspect.

This is also very different to saying SEXP = using a pokemon. I am well aware people can gain suspect experience by battling others, but if you also gain SEXP by using the suspect then you are sure to have a hell of a lot more of it if you used the suspect in every battle than if you didn't.

If you do not gain suspect experience by using the suspect then it seems like it is quite trivial. If the suspect isn't being used, and people want to vote it uber, then they would probably need some powerful reasoning to pass your paragraph test (which I don't support but I had up until recently given up arguing about). If the suspect isn't used and people vote OU then that seems like it was probably the logical result of the test.

And my constructive solution to these problems has always been to scrap bold votes and also suspect experience. And just to ask people to vote in good faith and let anybody who can make the qualification requirements do it.

Have a nice day.

DougJustDoug · Jul 23, 2009

When did it get written in stone that people who use a suspect are going to vote to ban it? I don't recall seeing that in the rules, and it's not consistent with my estimation of voter psychology. Yet many people in this thread are throwing around that assumption like it is a proven fact.

It is not a fact. The FACT is that none of you can read minds, and you certainly can't read the future. So you have no idea how people will vote when the time comes. Players with a lot of SEXP should be voting based on the same premise as players without as much SEXP -- based on their intelligent, informed reasoning of whether the pokemon is broken. Using a pokemon does not mean that you want to ban it.

If I want to play dime-store mind-reader -- I could argue the EXACT OPPOSITE. I could argue that people who play with a pokemon are more likely to vote to KEEP it in the metagame. Why? Because it's obviously one of their favorite pokemon. They use it, they are familiar with it, and they have learned to win with it. Don't you think those players are more likely to vote to keep such a thing in the metagame? It seems perfectly reasonable to me that they would. For all this talk about SEXP biasing the vote towards banning, it could be the exact opposite. SEXP could be influencing the metagame towards NOT banning.

Do I know this to be true? No. But it's certainly possible. And it's possible that you are right about suspect abusers tending to vote BL. I don't really know, and neither do you.

I do agree that suspect testing inflates usage of a pokemon, and the presence of a mysterious SEXP report being run behind closed doors only makes people use it all the more. And increased usage does distort the metagame. But the minute a pokemon is rumored to be suspect, there will be a spike in usage. If we spread a rumor that a newly-discovered Beedrill set was dominating the metagame, and that it was likely to be a suspect in the next round -- people would rush to put Beedrill on their team and see what all the fuss is about.

With suspect testing, there will be a skew towards using the pokemon. Does the existence of SEXP inflate that even further? Yes, I'm sure it does. But that doesn't mean it skews anything towards a BL conclusion. I hope it means it skews the vote towards a voter pool with a deeper understanding of the pokemon being voted upon. And, IMO, that's certainly not a bad thing.

Hipmonlee · Jul 23, 2009

When did it get written in stone that people who use a suspect are going to vote to ban it?

It doesn't have to be written in stone.

reachzero said:
2. If you find a "broken" set, abuse it as much as you can. Don't be shy about using an obviously overpowered set. Don't be discouraged if your opponents flame you for using it. This is what the Suspect process is all about. Play to win.

In the topic "How to be an Intelligent and Responsible Suspect Voter" we specifically ask people to use pokemon if they think they are broken. Based on that fact, it isn't unreasonable to assume that people who are going to vote a pokemon uber are more likely to have used the pokemon on their teams.

And it wouldn't have to be the case that every person who uses the pokemon is going to vote it uber. It only has to be an increase in the likelihood they would do so.

I could argue the EXACT OPPOSITE. I could argue that people who play with a pokemon are more likely to vote to KEEP it in the metagame. Why? Because it's obviously one of their favorite pokemon.

These sorts of people though are voting in a manner that is contrary to the philosophy of smogon. They ideally would be disallowed to vote because of their paragraphs.

Have a nice day.

Seven Deadly Sins · Jul 23, 2009

Doug, that's a common reply to my claim, and it's exactly the opposite of what I'm implying. I'm not saying that 100% of people that used the suspect thought it was broken. I'm saying that 100% of people that thought the suspect was broken have or had sufficient SEXP, whereas less than 100% of UU voters that may have relevant experience will be able to vote because of Suspect EXP. In this way, Suspect EXP removes directly from the pool of UU voters without touching the BL voters, simply because the UU voters may find it underwhelming and not use it. I didn't think that Crobat was all that good, and many teams that I had Crobat on were mediocre until I replaced Crobat. Why should I have to use it on my team so that I can say that it sucks? It's the only way to guarantee that I can have my say, since otherwise I have to gamble on the chance that my opponent has it, when the overwhelming majority of players don't.

To clarify, I am not saying that 100% of users vote BL. I'm saying that 100% of people who vote BL are users, whereas less than 100% of people who vote UU are users.

Oh, and also:

If we spread a rumor that a newly-discovered Beedrill set was dominating the metagame, and that it was likely to be a suspect in the next round -- people would rush to put Beedrill on their team and see what all the fuss is about.

This is the same problem that Jump was having (and it's another example of the difference between OU and UU). This is what we call a natural metagame trend. People hear about Beedrill and "rush to put it on their team" because they are attempting to win. This works in UU, because in UU, Suspects are determined by the community in a bold vote, and only the things with the most negative impact get enough nominations to become a suspect. Better yet, this goes to my other point: People that think that Beedrill "still sucks" and take it off their team, or who don't run teams that would use that Beedrill set, because it doesn't help them win are now at the whim of the ladder to see if they do get to vote on the tiering of something they think is underwhelming and should remain UU.

cim · Jul 23, 2009

I do agree that suspect testing inflates usage of a pokemon, and the presence of a mysterious SEXP report being run behind closed doors only makes people use it all the more. And increased usage does distort the metagame. But the minute a pokemon is rumored to be suspect, there will be a spike in usage. If we spread a rumor that a newly-discovered Beedrill set was dominating the metagame, and that it was likely to be a suspect in the next round -- people would rush to put Beedrill on their team and see what all the fuss is about.

I think the point you're getting at is that Suspect use inflation is no different than the natural inflation of a "good" Pokémon in a metagame, a point I addressed in an earlier post. If you're saying a Suspect's use would be inflated even without SEXP because people want to use the Suspect, personally I doubt it's much different than the natural progression, if at all.

DougJustDoug · Jul 23, 2009

All this talk about a phantom bias in the Suspect voter pool is pure VOODOO. It has the appearance of a logical argument, but you are falling into a classic logical fallacy trap often referred to as "Causation Fallacy". In Latin, it is "cum hoc ergo propter hoc". In science and debate circles you may have heard it referred to by the phrase, "Correlation does not imply causation". This is a common specious argument, but it simply cannot be made.

A reduced form of improper causation is "Every time I have seen A, then B has occurred. When I haven't seen A, then B does not occur. Therefore, A causes B." Which is very similar to, "I have seen many people use the Suspect and vote BL. I have seen many people not use the Suspect, and they vote UU. Therefore, BL voting results will tend to happen when lots of people use the Suspect." This is Logical Fallacies 101, and it happens a lot. But, it is the very definition of a specious argument -- which is something that has the attractive appearance of good logic, but is really bad logic.

How people end up voting is not caused simply by their use of the pokemon -- whether they win or lose with it. Do not attempt to assign a causal link, because there is no provable causal link. You don't even have any hard statistical trends to back up your claims of CORRELATION. And even if you did prove correlation -- as I have clearly pointed out, that STILL has no logical relevance to the question of causation.

For any of you that don't get into the argumentation geek-speak I referred to above -- I'll state it simply:
Increasing the usage of a suspect DOES NOT cause voting results to bias towards BL.

Seven Deadly Sins · Jul 23, 2009

I am not talking about SEXP increasing the usage of the suspect. That's Chris' field. I am talking about SEXP decreasing the ability of anti-ban voters to gain voting rights. That is all.

cim · Jul 23, 2009

That would make sense if our arguments were based on correlations we noticed through empirical evidence rather than logical consequences of actions. If we said "since SEXP was implemented, everything has been voted Uber", then obviously we would be committing the fallacy. If there's something wrong with SDS's, Hipmonlee's, Obi's, and my arguments against SEXP, it is certainly not that fallacy. None of us have ever cited tests as evidence in this regard.

Specifically I'm talking about SEXP hindering natural metagame development. Though I also support SDS's points, you're mixing the two arguments into one, which is of course why it would make no sense.

jrrrrrrr · Jul 24, 2009

Ok guys I hate to bring this thread back on track but here is the plan:

We start with the tier as usual.

After 6 weeks of play, we will open up a thread for UU->BL nominations:

- Anybody can nominate, as long as they fulfill two requirements:
1) They must do the usual paragraph thing highlighting the offensive, defensive and support characteristics
2) If they make a nomination, they must have a minimum relative sexp score with that pokemon in order to have the nomination accepted. We don't want some weenie who has never used a pokemon to nominate it for banning, especially if people really feel that nominations do influence the bans. I don't really think anybody can argue against something this basic, if you haven't used or seen a pokemon then you shouldn't be nominating it for a ban.

- A pokemon that receives 10% of "valid" posts in the nominations thread will then become a suspect. It is also worth noting that "no nominations" is a perfectly legitimate vote, assuming you can explain why you feel that way. If 50% of the votes are for "no nominations", there will be no UU->BL suspects (although this "no noms" thing is still on the table, we haven't worked out how we want to do this yet).
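As a rough sketch of how the nomination tally above could work, here is a small Python function. It is only an illustration under stated assumptions: the function name and data layout are hypothetical, "10%" and "50%" are read as "at least" thresholds, a post may name more than one Pokemon, and the "no noms" rule is still undecided as noted above.

```python
from collections import Counter


def pick_suspects(valid_posts, nominate_share=0.10, no_noms_share=0.50):
    """Return the nominated Pokemon that become suspects.

    valid_posts: one list of nominated Pokemon per valid post; an empty
    list stands for a "no nominations" post. (Hypothetical layout.)
    """
    total = len(valid_posts)
    if total == 0:
        return []
    # If enough valid posts say "no nominations", nothing is tested.
    no_noms = sum(1 for post in valid_posts if not post)
    if no_noms / total >= no_noms_share:
        return []
    # Count each Pokemon at most once per post.
    counts = Counter(name for post in valid_posts for name in set(post))
    return sorted(name for name, n in counts.items()
                  if n / total >= nominate_share)


# Made-up thread of 20 valid posts.
posts = [["Crobat"]] * 9 + [["Honchkrow"]] * 4 + [["Beedrill"]] + [[]] * 6
print(pick_suspects(posts))  # ['Crobat', 'Honchkrow']
```

In this made-up thread Beedrill is named in only 1 of 20 valid posts (5%), so it misses the cut, and the 6 "no nominations" posts (30%) are not enough to cancel the round.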

After the nominations, all of the eligible voters will then be notified and will vote. To be eligible, you must meet a rating and deviation requirement, as well as a soft sexp requirement to vote. We aren't forcing people to use the suspect, but if you have never used or seen the suspect, then you won't qualify. You don't have to obsessively use the suspect, so long as you perform well with or against it. If you have been playing the tier enough to meet the rating and deviation requirements, you WILL have faced the suspect enough to meet the sexp requirements we set forth. Since sexp is relative, we will even be able to have a decently sized voting pool on something like Abomasnow, which would definitely spike in usage if we ran a month-long test for it, even though it wasn't even on 8% of teams before it was banned.

After voting, the things that are voted BL will go into BL. At that point, there will be another nomination thread opened, for things that are BL to be nominated back into UU. I'm pretty sure we will be just requiring a paragraph and rating/dev minimum before people can vote, but I need to talk to jabba/reach about it first. Those things that are nominated for UU will then be put into UU, and can then be voted back into BL at the end of the next 6-week period if people feel that they are still broken. Again, the vote of "no nominations" comes into play here. After the votes, everyone's deviations will be reset.

If something is used enough to become OU, it is immediately banned from UU.

If something was OU but falls out due to low usage, it will be placed in the tier it was in before it became OU. For example, if Raikou becomes OU on the next tier list and then falls out on the one after that, it will be placed back into BL. However, if Umbreon falls out of OU on the next tier list, it will be placed in UU.

If something that has never been UU or BL under the "new UU" system falls out of OU, it is automatically placed in UU. For example, Dugtrio and Donphan are currently UU because of the most recent tier list changes. BL is now only for things that are explicitly banned from UU.
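The placement rules for things dropping out of OU boil down to a single lookup. A minimal Python sketch, assuming we track for each Pokemon the last tier it held under the new system (the function name and the use of `None` for "never UU or BL" are hypothetical choices, not anything we have actually built):

```python
def tier_after_leaving_ou(previous_tier):
    """Where a Pokemon lands when its usage falls below the OU cutoff.

    previous_tier: "UU" or "BL" if it held that tier under the new UU
    system before becoming OU, or None if it never has.
    """
    if previous_tier is None:
        # Never UU or BL under the new system: defaults to UU.
        return "UU"
    if previous_tier not in ("UU", "BL"):
        raise ValueError("previous_tier must be 'UU', 'BL', or None")
    # Otherwise it goes back to wherever it was before it became OU.
    return previous_tier


print(tier_after_leaving_ou("BL"))   # Raikou case: back to BL
print(tier_after_leaving_ou("UU"))   # Umbreon case: back to UU
print(tier_after_leaving_ou(None))   # Dugtrio / Donphan case: UU
```

The point of the default is that BL only ever holds things that were explicitly voted out of UU.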

The most convenient thing about all of this is that it is much shorter than the current test and prevents the creation of an "artificial metagame".

That is just the basic outline of what jabba, reach and I have discussed to this point. I need to add more details and tweaks after talking to them a bit more, but this is the general idea of the process. It is fairly similar to the current process, but with fixes to the complaints that I outlined in the OP of this thread. If there are any huge issues that people have, I don't mind posting... but I just wanted to get something out in the open so that people know we are working on this. I still need to go over the fine print (for lack of a better term) with jabba and reach, I just figured that making this post would help get this thread back in the right direction. Most of the people arguing about sexp here are not even talking about something relevant to the UU testing process.

Also, are we being expected to read the paragraphs for the current round of voting? It's been like a week since the deadline and there is still no voting thread...

Seven Deadly Sins · Jul 24, 2009

Just going to say that I back this plan 100%.

Ironically, all my arguments AGAINST the use of SEXP in UU make me FOR the use of SEXP in Bold Voting. People that want to nominate someone for BL should have either a shitload of SEXP from using it and winning, or a disproportionate number of losses against opponents using the Suspect.

As long as the SEXP qualifications for voting aren't anywhere near strict, I'm all for this plan.

cim · Jul 24, 2009

Other than the principled objection to the use of anything based on secrecy, I can't see enough wrong with the plan above to have a giant objection about it. Gotta compromise, you know. That is, provided SEXP use in the voting phase is kept to a very soft level. A point for clarification, though.

You don't have to obsessively use the suspect, so long as you perform well with or against it. If you have been playing the tier enough to meet the rating and deviation requirements, you WILL have faced the suspect enough to meet the sexp requirements we set forth.

What if you use it and you don't play well with it, but do just fine against it? Will you be in the clear?

Also with this plan, if it's, say, 4 weeks, 1 week nom, 1 week vote, you could fit two tests in one UU cycle.

Hipmonlee · Jul 24, 2009

Sorry to go back to being off topic, but you seem to have misunderstood us, Doug.

"I have seen many people use the Suspect and vote BL. I have seen many people not use the Suspect, and they vote UU. Therefore, BL voting results will tend to happen when lots of people use the Suspect."
Nobody has made this argument.

How people end up voting is not caused simply by their use of the pokemon -- whether they win or lose with it. Do not attempt to assign a causal link, because there is no provable causal link. You don't even have any hard statistical trends to back up your claims of CORRELATION. And even if you did prove correlation -- as I have clearly pointed out, that STILL has no logical relevance to the question of causation.

There is a very easily provable causal link though. We specifically ask people to use things they believe are broken. Then you select voters based on what they have used.

Their beliefs affect what they use, and their use affects their likelihood to vote. That is a causal link.

If believing something is broken implies a person will use that thing (which is exactly what people are asked to do).

And using the suspect implies they are more likely to qualify to vote (which I am pretty sure we have established that sexp does this). Although, I will admit if everyone had the same sexp score, this would no longer be the case, but then suspect experience would be trivial.

Then believing the suspect is broken implies you are more likely to qualify to vote. By modus ponens.
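For anyone who wants the chain spelled out formally, here is a small Lean 4 sketch. It treats the two premises as strict implications, which is a simplification of "more likely", and the proposition names are just labels chosen for this example. Chaining two implications like this is usually called hypothetical syllogism, and it amounts to applying modus ponens twice.

```lean
-- Broken:    "this player believes the suspect is broken"
-- Uses:      "this player uses the suspect"
-- Qualifies: "this player clears the SEXP bar"
theorem belief_leads_to_qualifying (Broken Uses Qualifies : Prop)
    (h1 : Broken → Uses) (h2 : Uses → Qualifies) : Broken → Qualifies :=
  fun hb => h2 (h1 hb)
```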

We can't assume everyone will vote as we have asked them to, but anyone who doesn't is just noise. The problem we have is that among the people who do we have a bias towards pokemon being removed from whatever tier is being tested.

Have a nice day.

Seven Deadly Sins · Jul 24, 2009

Thank you, Hipmonlee. I couldn't have said it better myself.

jrrrrrrr · Jul 24, 2009

What Hipmonlee posted is the exact reason why we decided to scrap the month of isolated testing. I think his argument is against the way the test is designed, not against sexp, though. The way we are incorporating sexp is very different than the OU tests, so I don't think Hip's post really changes anything regarding the UU test although it is certainly an interesting point.

Can we make another thread if we are going to keep this sexp talk going? Although I think it is a very interesting debate, I don't want the point of this thread to get lost.

X-Act · Jul 24, 2009

I like the new outline overall. I have a question, though: when you say 'we start with the tier as usual', you mean we start with the tier as soon as it is updated every 3 months, right?

jrrrrrrr · Jul 24, 2009

Yes. We will be starting with the current UU tier list, including Duggy and Donny, excluding Umby, and including/excluding the Honch, Shayshay and Crobey depending on how their votes end up

Jumpman16 · Jul 24, 2009

Chris is me said:
Now that's a stretch and you know it. You can't be actually insulted by that or have gone "wow Chris, what nerve he has to disrespect us to assume we wouldn't do that" when the post was worded more or less to ask of the UU judges how they can have an opinion on using the formula rather than in any way calling you guys inept, incompetent, or whatever. You're just trying to cop out.
I really don't want to be a part of a community or discussion or whatever where it's okay for staff to berate and insult someone because they're "wrong". It's that simple.

If your actual intent was to be helpful, you would have phrased that post in the form of a suggestion or even a question rather than an assumption. You are hard-wired to find fault with SEXP and it is obvious from your posts. If you are insulted by me calling you out on shitty assumptions designed to undermine Doug's and my efforts with the SEXP formula I don't know what to tell you.

Seven Deadly Sins said:
Chris brings up an extremely interesting point here. You're seeming to miss the difference between an artificially influenced metagame and a naturally influenced metagame. The tiering of Pokemon has always been based on the idea that the best battlers will use the best Pokemon to make the best teams, and this works because the sole motivation of the majority of battlers is to win. Problems arise when you create a second objective irrelevant to actual competitive battling. This has been the issue since the beginning of the Suspect Test, where one of the major complaints has been how the Suspect Test has influenced the metagame to add a second objective: the use of the suspect. Now here's where it gets interesting.

At once this is untrue, because the Tiering Contributors have sought to change the general motivation of players from winning to gaining experience with the Suspects. This is why we don't just use Rating/Deviation marks to determine who will vote after a month on the Suspect Ladder. It is a subtle but crucial difference I will point out before even reading the rest of your post.

http://www.smogon.com/forums/showthread.php?p=1677628#post1677628

http://www.smogon.com/forums/showpost.php?p=1679526&postcount=90

In these two posts, you make the case that the metagame will "stabilize", and that usage of the suspect will wane as people realize that it has issues and the metagame is flooded with counters. However, the existence of SEXP actually completely neutralizes this possibility, as people realize that if the suspect's use goes down, they won't get required SEXP, and so they use it themselves in order to ensure they get necessary SEXP. In fact, I'm guessing there's a spike in end of testing usage, as people want to cram in as much SEXP in as possible. Combine this with the fact that the Suspect Ladder doesn't see nearly enough usage, and it actually completely overrides the natural progression of the metagame that you outline here. So do you think that this is still true, and that metagame procession continues to naturally occur, or do you think that metagame procession is not necessary to the testing period, and that you're willing to sacrifice the natural balancing of a metagame for whatever it is that your Suspect EXP gives you?

The latter. First, I'm sure you now realize that when this post was written in December 2008, the rest of you didn't know that I had SEXP waiting in the wings, which means that as far as the process then was concerned I was being truthful. From the first post you linked to:

"either way, you will have won and won consistently, which is pretty much the only objective barometer in the test thus far. (i have an idea for another objective barometer and have shared it with the other main suspect test facilitators, but that's for another post.)"

I realized then that we were relying solely on that...winning. Aeolus and I knew this was a problem, and this is precisely the reason the idea of SEXP struck me when we spoke a month earlier. (If you haven't already, go find Doug's thread in IS now that you can see what I'm talking about.) So second, that's simply not how the Suspect Test works now. The focus is on gaining experience with the Suspects, as I stated at the beginning of this post and as I have been saying for months.

Now onto another matter entirely- the way that Suspect EXP can affect UU. First, let it be said that it WILL affect UU. There was a little bandying about that the use of SEXP could be secret, and that would be the only way that SEXP would fail to have an impact on the metagame, but considering that this discussion is now in PR, the cat's kinda out of the bag on that one. As far as I'm concerned here, Chris is right for the wrong reasons. (By the way, while we're on that subject, I'd like it if you didn't use me as some kind of proof that Chris is wrong. Sure, he was guilty of trying to "have it both ways" there, but I never said that he's wrong, just that he needs to watch out for contradicting himself.) He's concerned about the impact on the stats (which I believe will be minimal), whereas I am concerned about the impact on the voting pool. There have been some rather borderline suspects in UU, notably Crobat, and considering the original overwhelming 0-20 vote to keep it UU, I'm willing to say that I'm not the only one that thought Crobat to be rather mediocre. The only reason I used it at all was for my Stall team, and it was the least useful member of my team. I ran LonelyBalance on an alt for a while, and once again, Crobat was the least valuable member of my team. On teams other than my Stall team, I simply didn't use it. Now, I'm willing to bet that I'm not the only person that thought this, and at that point, it becomes luck of the draw as to whether or not I end up with enough SEXP to vote, since if I don't use it on my team, I have to encounter it on the opposing team.

That's fine. But shouldn't your issue with this, though, have much, much more to do with the nominating process than with how SEXP will or will not be used? They are completely separate—SEXP of course had nothing to do with how the "Six Deadly Suspects" were decided on. When I made the Order of Operations thread last June, I made sure that we were all in concord on which pokemon should actually be considered Suspects, and was flexible enough to remove those that other PR members, as the months went on, decided were not Suspects for whatever reason: Mew, Darkrai, and Latios with Soul Dew. This also includes Salamence and Scizor, both of which were whispered should be possibly made Suspects after the Platinum upgrades, and both of which I held off on.

And you can't really blame me for pointing out that you also called Chris on his contradiction. I can call Chris wrong as much as I want to, I probably haven't for the last time. It has a much greater impact when it comes from someone else.

This is my issue. Suspect EXP requires the person to use it in order to guarantee decent EXP amounts.

No matter how any of you try to phrase this it will still be wrong. Limitless and Stathakis are literally perfect examples of this.

The people that are more likely to use the Suspect are people that think it is broken and want to get it banned, and thus will use it on 100% of their teams and rack up huge amounts of SEXP. Meanwhile, people like me that honestly don't think that the suspect is actually that good, and thus don't want to shoehorn a "mediocre" Pokemon into their team, have to roll the dice and hope they end up with enough SEXP to vote. The end result is that the voting pool is skewed towards BL voters, since they are guaranteed to have the necessary SEXP to vote. Meanwhile, people that don't think a Pokemon is all that great, or simply don't run teams that are conducive to using the Pokemon, have to gamble and hope that they meet the suspect enough to vote. I believe this was an issue in the Latios vote with FiveKRunner, where his SEXP was extremely low, and his record with Latios was heavily oriented towards the losing side.

He didn't qualify for voting because of his paragraph, as I made clear in the Latios Voters thread when he insisted that his submission was better than everyone else's. When I looked at his SEXP, I was actually surprised it was that low even though Aeolus and I had made it pretty obvious that using a Suspect was a good way to fulfill the Hidden Requirement. Don't you think 5KR would have at least been smart enough to use Latios a bit more (he used it in less than a quarter of his battles over two accounts)?

More importantly though, your entire post has kind of had this hint to it as have others before yours, and I finally see why, so I will repeat myself. You say "enough SEXP to vote". Pardon? Tell me one voter I have ever forbidden to vote because they didn't have enough SEXP. You guys are forgetting that paragraph submissions are the actual requirement along with Rating/Deviation thresholds, and forgetting even more that SEXP was devised to increase the pool of knowledgeable voters, not to keep voters out. Hip can call SEXP therefore trivial all he wants since he thinks rating/dev is good enough but the rest of you have not yet posted a reason for this, and that seems to be at the crux of this entire debate.

I honestly don't think that this is at all fair to the people voting. If a person builds teams with the suspect, loses a whole bunch, takes it out, and then proceeds to start winning a huge amount of his matches (thus meeting the rating requirements), I would assume that person is capable of saying that the Suspect is not all that good, since in his experience, the Suspect was simply not good enough to earn a spot on his team, and teams without the suspect were more efficient. Sure he may not have much SEXP, but that doesn't change the fact that he has very real experience with the suspect that has told him that the suspect is simply "not good", and therefore not broken.

Bingo. And if his voter paragraph is decent, he'll be able to vote. I approved many of these cases in the tests where we've used SEXP.

To summarize, SEXP use in UU with non-predetermined suspects slants the voter pool by allowing 100% of rating-qualified BL voters through with the very real possibility of excluding otherwise qualified UU voters because of an intangible system based on the luck of "encountering a Pokemon enough."

Wrong. "100%" is wrong, "requires the person to use" is wrong, and "excluding otherwise qualified UU voters" is wrong because I've never rejected a submission because of SEXP nor will I instruct the UU heads to. I don't know why you seem to assume that SEXP has completely supplanted the need for paragraph submissions, but your summation here clearly indicates that you're working under that assumption.

Finally, I want to bring up one last point about SEXP that I have never been fully certain about. The reason that the SEXP formula is not publicly available is because there are apparently some "breaking flaws" in it that allow people to disproportionately gain Suspect EXP through specific flaws in the system. However, with the amount of time that it has been around, and the amount of time spent defending this secrecy to anyone who pokes, wouldn't that time have been better spent either trying to fix whatever exploitable flaw there is, or at the very least, implementing a way to make it blindingly obvious who's intentionally gaming the system? I generally have doubts about any system that is so flawed that the only way to keep it secure is to keep it hidden, because that doesn't exactly assure me that the system is even working as intended, since obviously there are variables inside it that can skew the output data in unintended or unknown ways.

The flaw is a social flaw, as Doug and I have stated on IRC as well. Quantitatively, it does exactly what we want it to. And trust me, it's a lot easier to type some words on the internet than it was to come up with the formula, and address the flaws before and with Doug prior to using it for the Suspect tests.

Also, considering how Doug has made the point on IRC that even if the source code was available and the formula made known, it would still be extremely difficult to decipher it at all, do you really think that someone is going to go to all the trouble to decipher the system and cheat when it's likely easier to just make the damn qualifications naturally? If it's that hard to just get a basic and cursory understanding of the system, are you really scared of someone breaking it? If anything, people breaking it would give you a chance to "unbreak" it, thus resulting in a stronger formula in general. As far as I'm concerned, it's a win-win situation. On top of that, the actual current Suspect Test in OU is pretty much done so far, and since as far as it seems, use of SEXP in UU would be extremely limited, the effects of releasing the formula would simply be a better understanding of the formula in general.

You'd be surprised to see just the kind of trouble people have gone through to cheat on the server. I'm sure I don't even know the half of it, Doug can fill you in. There have been cheaters we've caught because of the formula, as stated above. And I'm pretty sure we have future generations where the formula will be applicable.

Caelum · Jul 24, 2009

jrrrrrrr said:
After the nominations, all of the eligible voters will then be notified and will vote. To be eligible, you must meet a rating and deviation requirement, as well as a soft sexp requirement to vote. We aren't forcing people to use the suspect, but if you have never used or seen the suspect, then you won't qualify. You don't have to obsessively use the suspect, so long as you perform well with or against it. If you have been playing the tier enough to meet the rating and deviation requirements, you WILL have faced the suspect enough to meet the sexp requirements we set forth. Since sexp is relative, we will even be able to have a decently sized voting pool on something like Abomasnow, who if we included a month-long test for, it would definitely spike in usage even though it wasn't even on 8% of teams before it was banned.

I have no idea what that means. If you are voting right after suspects, why even bother with a vote and just combine the vote and nominations in the thread since they should be identical? Seems like a waste of time.

After voting, the things that are voted BL will go into BL. At that point, there will be another nomination thread opened, for things that are BL to be nominated back into UU. I'm pretty sure we will be just requiring a paragraph and rating/dev minimum before people can vote, but I need to talk to jabba/reach about it first. Those things that are nominated for UU will then be put into UU, and can then be voted back into BL at the end of the next 6-week period if people feel that they are still broken. Again, the vote of "no nominations" comes into play here. After the votes, everyone's deviations will be reset

So, right after something is voted to BL; you are going to open up a thread to re-nominate them back into UU? That's what your paragraph implies. That's completely pointless, why bother voting for BL if "at that point" a thread is going to be opened to renominate them back to UU?

6 weeks is entirely unnecessary anyway. You've already played with it for weeks, it shouldn't take you 6 weeks to figure it out again.

Also, I guarantee you, this method will create a hectic tier with no finality. If you allow that all the time and the metagame hasn't even changed significantly you are telling the initial voters that their vote is hugely irrelevant and any mildly controversial Pokemon will keep bouncing back and forth due to the nature of constantly re-testing it because of who is nominating that time will solely determine it. There isn't always going to be an agreement on a Pokemon, that's too bad; but there needs to be some finality and that system doesn't allow it under any circumstance.

The most convenient thing about all of this is that it is much shorter than the current test and prevents the creation of an "artificial metagame"

I've explained countless times why it was necessary to create an artificial metagame due to the nature of combining a tier into one, but I conveniently noted my elaborated explanations saw no rebuttal; but "whatever", I don't feel like arguing about it anyway since I said 5 times it was already being eliminated for different reasons anyway.

Half of this didn't make any sense because the language was confusing for illustrating the timeline, could you re-post this some time with a properly fleshed out timeline since I have no idea when half of the things are occurring.

Also, are we being expected to read the paragraphs for the current round of voting? It's been like a week since the deadline and there is still no voting thread......

Read the last post on the last page. It'll be done once it's set-up, hopefully tonight; but at worst tomorrow morning.

cim · Jul 24, 2009

If you are insulted by me calling you out on shitty assumptions designed to undermine Doug and my efforts with the SEXP formula I don't know what to tell you.

First, it actually was phrased as a question so I don't know where you're getting that from.

I'm not exactly sure why your first conclusion when dealing with a user known for lacking any tact whatsoever actually makes a series of posts in an attempt to help that specifically avoids targeting, implicitly or explicitly, any person or group is that he's out to get you all. Please tell me how this assumption that you wouldn't tell people the details of the SEXP formula after a good 8 or 10 hours of chat of you saying exactly that, then a quick check with the UU heads hours before that they do not know the formula is somehow an assumption so utterly stupid that only an idiot would even think you would "have the nerve" to actually do what I assumed? I just don't get it.

And why are you still asserting that I intended to offend and undermine your efforts when not only did I not, but I've explicitly said I haven't? At this point it just seems you're trying to find something to be offended by in my words so you can maintain high ground.

----

Something to note:

Jumpman16 said:
I've never rejected a submission because of SEXP

Jumpman16 said:
so this is part of the reason i didnt accept your submission. the other part is your very poor Suspect EXP.

Jumpman16 said:
oh, for our more observant readers, it should be pretty obvious now that usage of the suspect has something to do with how much stock i or aeolus are willing to put into your paragraphs. that should be common sense, though.

----

One more thing.

You are hard-wired to find fault with SEXP and it is obvious from your posts.

I do have a WIDE variety of problems with SEXP, from the very concept of it to how it is being used, yes, but I am not determined to come up with whatever reason I can for it being gone. Under that logic one could say you're coming up with whatever reason you can to defend "your" Suspect process and changing mindsets to fight for it.

Seven Deadly Sins · Jul 24, 2009

Jump, there's only one real problem I have with your explanation of the whole "sexp in OU" thing. You mentioned that you wanted to shift things away from "you just have to win" by using SEXP as a more "objective" gauge of how much people know about the suspects. Yet by all the emphasis you've made, it's pretty clear that the best way to get SEXP is to win. So you've shifted it from "win a lot" to "win a lot in battles containing the suspect", which has effectively no real difference.

Also, regarding FiveKRunner:

Jumpman16 said:
so this is part of the reason i didnt accept your submission. the other part is your very poor Suspect EXP. you can go ahead and assume that the Hidden Requirement is some ""special" criteria that may have been applied to me and no one else", or you can stop wasting your time and mine, and either take the Suspect Test Process seriously or not even bother voting in the future.

Not only did your *entire* evaluation of his paragraphs consist of digs at his Suspect EXP, but you come out and say that Suspect EXP or lack thereof is a specific reason that you barred his submission. That's all I'm going to say on the matter, but I'm willing to bet that there are people who didn't post their paragraphs and got rejected through SEXP one way or another.

In fact, I'd like to know why we abandoned the practice from the old Deoxys-S and Wobbuffet tests where the reasoning behind accepting/rejecting votes was posted. It gave a lot more insight into how the whole thing worked, and would make for generally better submissions IMO.

-----------------------------------------

The whole UU thing seems kind of irrelevant now that j7r has posted his plan and I back it, but I still have an issue. You mention that you are "against SEXP being used to exclude any votes", when actually, I'm in favor of it for the Bold Nominations. There are a lot of bold votes for shit like Ambipom that are basically absurd, and I'm willing to bet that SEXP can help weed out a lot of those bold votes before it even matters. So yes, I support SEXP being used to eliminate nominees, but ONLY for Bold Nomination, and only because Bold Nomination is done purely on the merit of one's experience with the potential suspect.

-----------------------------------------

Finally, I'd like to ask something. You say that you've "caught cheaters" with the formula, which makes me wonder how revealing it would make it any less effective. If you know how the formula can be gamed and what its flaws are, then why not reveal it? It's not like it's some kind of magic where it only works when nobody knows what it is. None of that post addresses why the formula isn't public, which is my only concern.

Jumpman16 · Jul 24, 2009

Chris is me said:
First, it actually was phrased as a question so I don't know where you're getting that from.

Ugh, I can't believe you're actually going to split that hair with me. You stated none of the heads know what the formula is, which is based on an assumption. The only thing you "questioned" was "how are they supposed to..." which, given your assumption of the absolute, is a rhetorical question, and therefore not a question you need an answer to. Just admit that you did a poor job of conveying helpfulness and we can move on.

I'm not exactly sure why your first conclusion when dealing with a user known for lacking any tact whatsoever actually makes a series of posts in an attempt to help that specifically avoids targeting, implicitly or explicitly, any person or group is that he's out to get you all. Please tell me how this assumption that you wouldn't tell people the details of the SEXP formula after a good 8 or 10 hours of chat of you saying exactly that, then a quick check with the UU heads hours before that they do not know the formula is somehow an assumption so utterly stupid that only an idiot would even think you would "have the nerve" to actually do what I assumed? I just don't get it.

Huh? The UU heads stopped being "people" as soon as Doug and I agreed, at the same time that chat was taking place, on who they should be, at which point we shared with them the formula. I don't know when or if you asked them and I don't really care, it doesn't matter.

And it's exactly your admitted lack of tact that would lead me to this conclusion even if I hadn't told them. I know your first posts in this thread were helpful, but this only further contrasts with your insistence upon fault-finding the second I post about SEXP.

And why are you still asserting that I intended to offend and undermine your efforts when not only did I not, but I've explicitly said I haven't? At this point it just seems you're trying to find something to be offended by in my words so you can maintain high ground.

I'm sure you never intend to offend anyone, Chris. Regardless, I'm tired of your bad assumptions and your unending defenses of them, as anyone in my position would be.

Something to note:

What's your point? How could it not be more clear from this that I first judged his paragraph, which I found very shaky, then referenced his SEXP which confirmed my suspicions? This is 100% consistent with how I have stated I use SEXP. And "usage of the Suspect" is different from "using the Suspect" whether or not you want to believe I'm not being semantic.

I do have a WIDE variety of problems with SEXP, from the very concept of it to how it is being used, yes, but I am not determined to come up with whatever reason I can for it being gone. Under that logic one could say you're coming up with whatever reason you can to defend "your" Suspect process and changing mindsets to fight for it.

Yes, I'll do anything to defend it because I know it's right. But I already explained what would prompt anyone to think I "changed mindsets", so I won't even dignify your implication.

jrrrrrrr · Jul 24, 2009

Caelum said:
I have no idea what that means. If you are voting right after suspects, why even bother with a vote and just combine the vote and nominations in the thread since they should be identical? Seems like a waste of time.

Because if we combine the vote and the nominations, we will have no idea who the eligible voters are. We have to figure out what the suspects are before people vote on them. If we combine votes and noms, we'll have a situation where not everyone who is qualified to vote on a suspect knows that they can.

Caelum said:
So, right after something is voted to BL; you are going to open up a thread to re-nominate them back into UU? That's what your paragraph implies. That's completely pointless, why bother voting for BL if "at that point" a thread is going to be opened to renominate them back to UU?

No. The pokemon that get voted BL will not be put up immediately for a vote into UU. They will have to wait until at least another test before they can be nominated for UU again. That is a legitimate concern that I admittedly should have clarified, but I assure you that we thought of that!

Caelum said:
Also, I guarantee you, this method will create a hectic tier with no finality. If you allow that all the time and the metagame hasn't even changed significantly you are telling the initial voters that their vote is hugely irrelevant and any mildly controversial Pokemon will keep bouncing back and forth due to the nature of constantly re-testing it because of who is nominating that time will solely determine it. There isn't always going to be an agreement on a Pokemon, that's too bad; but there needs to be some finality and that system doesn't allow it under any circumstance.

Although I seriously doubt that the problem would be as big and hectic as the tone of this paragraph would imply, I understand what you are saying. I just think you are blowing this small facet of the system way out of proportion. I would much rather have a fluid UU tier than a ban list full of things that aren't broken. If we continue with the current system, we will be exactly where we started: with a BL tier full of things that aren't broken in UU.

At least the new system addresses the controversy of something like Raikou, who was voted BL by a vote of 10 BL - 9 UU - 1 Abstain, and that was BEFORE Chansey, Registeel and Dugtrio became popular in UU. If it takes a couple of votes over the course of months to settle a tier placement, that is a good thing imo. It will focus people on that suspect and make the opinions less divided as people either find ways to abuse it or get accustomed to playing it.

The current system does have finality, but only in one direction. Crobat was unanimously voted UU (20-0-0) last time, but it was immediately put back up for nomination. How is allowing Crobat to be revoted on after the landslide decision more fair than not allowing Raikou to be revoted on, after arguably the most controversial vote in suspect test history?

Caelum said:
I've explained countless times why it was necessary to create an artificial metagame due to the nature of combining a tier into one, but I conveniently noted my elaborated explanations saw no rebuttal; but "whatever", I don't feel like arguing about it anyway since I said 5 times it was already being eliminated for different reasons anyway.

The only times I've ever seen you talk about an artificial metagame is when you describe the alternatives as "hectic" without any explanation as to why. The artificial metagame tells us nothing about the suspects themselves, and since we are attempting to formulate opinions on the suspects and not the metagame it makes no sense to take them out. To test Garchomp in OU, we put it in OU. To test Honchkrow in UU, we took it out of UU. One of those two tests has it backwards....

Caelum said:
Half of this didn't make any sense because the language was confusing for illustrating the timeline, could you re-post this some time with a properly fleshed out timeline since I have no idea when half of the things are occurring.

As soon as we get a starting point for the new system, this will be done in full.
