Fixing UU

JabbaTheGriffin

Stormblessed
is a Top Tutor Alumnus, a Senior Staff Member Alumnus, a Top Tiering Contributor Alumnus, a Top Contributor Alumnus, a Smogon Media Contributor Alumnus, a Battle Simulator Moderator Alumnus
Yeah, me and jrrrr were discussing the whole Suspect EXP thing. What I was thinking was that, instead of the month where the suspects are removed as it is currently, we have a month of the exact same metagame, the obvious difference being that the suspects are now defined. This not only lets people really pay attention to the suspects, but also lets us add SEXP to the voting criteria.
 

cim

happiness is such hard work
is a Contributor Alumnus, a Smogon Media Contributor Alumnus
Suspect Experience is something I really don't want to see applied to UU, because unlike OU testing, UU testing is based on the metagame rather than the Pokémon. Adding Suspect Exp. would artificially modify the metagame and thus defeat the entire purpose of the UU test. It works alright in OU, but the UU test is structured around a fundamentally different set of premises. I really can't stress enough how much I would oppose the Suspect Exp. system in UU in particular.

No matter what tier you're talking about, there is now no longer any reason to rely solely on Rating and Deviation to determine a Suspect's tiering.
Suspect Experience is far from perfect. Despite that, the point of the UU test is not to determine the Uberness of a Pokémon alone, as whether or not a Pokémon breaks a metagame is definitely not an intrinsic property of the Pokémon itself in UU (where in OU it arguably is). Using the Suspect Experience system assumes a few things, mainly that consistent use of a Pokémon is necessary to evaluate its tiering and that the metagame the Pokémon is in doesn't matter (as any Suspect Experience based system utterly destroys the possibility of testing within a real metagame). Neither of these applies to UU, in my opinion.

On a Suspect Ladder, you at least still have the "true metagame" to look at if you want.

Then again, I don't like Suspect Experience in general, so perhaps I have a bias. Regardless, I think the potential advantages of such a system don't apply to UU.
 

JabbaTheGriffin

Stormblessed
after talking to chrisisme in #is i realized that what he's saying is right. sexp works fine on a suspect ladder. but on the uu ladder it doesn't at all. now if uu were to have a suspect ladder it'd work there, but that's not a realistic possibility at all.
 

cim

happiness is such hard work
Here's a proposal being heavily debated among other things in IS right now: Suspect Experience is not used as a hard requirement, but rather a tool for analyzing qualification to vote. The play phase goes on, then suspects are nominated, then SEXP is run on the stats from BEFORE the nomination, and then those people vote on whether the Suspect is BL or not. No testing period.

This would effectively eliminate any metagame skewing, as people don't know the Suspects. As long as voting isn't seen as an honor and prize to strive for (in which case people will be offended if they only qualified for suspect x), then this system sounds like a good compromise that maximizes the advantages of the Sexp system.
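
The "stats BEFORE the nomination" step in the proposal above can be sketched as a simple filter over battle records. This is only an illustration: the record layout (`day`, `player`, `pokemon_seen`), the player names, and the dates are all made up, and it counts raw exposure rather than computing SEXP, whose actual formula is not given in this thread.

```python
from collections import Counter
from datetime import date

def exposure_before(battles, suspect, nomination_day):
    """Count, per player, battles involving `suspect` played before nomination.

    Each battle is a dict with hypothetical keys: 'day' (date), 'player' (str),
    and 'pokemon_seen' (set of names used by either side).
    """
    counts = Counter()
    for battle in battles:
        # Only battles played before the suspects were announced are counted,
        # so nobody could have changed their teams to chase experience.
        if battle["day"] < nomination_day and suspect in battle["pokemon_seen"]:
            counts[battle["player"]] += 1
    return counts

# Made-up battle log for illustration.
battles = [
    {"day": date(2000, 1, 1), "player": "alice", "pokemon_seen": {"Crobat", "Steelix"}},
    {"day": date(2000, 1, 9), "player": "alice", "pokemon_seen": {"Crobat"}},
    {"day": date(2000, 1, 5), "player": "bob", "pokemon_seen": {"Venusaur"}},
    {"day": date(2000, 2, 2), "player": "bob", "pokemon_seen": {"Crobat"}},  # after nomination
]

result = exposure_before(battles, "Crobat", nomination_day=date(2000, 2, 1))
print(result)
```

Because the cutoff is applied to the battle date, bob's only Crobat battle is ignored here: it happened after the hypothetical nomination day.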
 

obi

formerly david stone
is a Site Content Manager Alumnus, a Programmer Alumnus, a Senior Staff Member Alumnus, a Smogon Discord Contributor Alumnus, a Researcher Alumnus, a Top Contributor Alumnus, a Battle Simulator Moderator Alumnus
Any use of suspect exp (especially in that manner) will bias the vote toward banned. People who use the Pokemon a little and decide it sucks will stop using it, thus not getting enough exp (and this time it's worse, because you don't even know what you're supposed to be getting exp with). Those who think it's really good will use it a lot, thus getting more suspect exp. This will tilt the pool of voters toward those who think it's stronger no matter what you do.
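
A minimal sketch of the selection effect described above, using made-up numbers. It models only usage-based experience, as in this argument; the cutoff `min_uses` and the player counts are hypothetical and are not taken from the real SEXP formula.

```python
from fractions import Fraction

def voter_pool_share(players, min_uses):
    """Return the fraction of qualifying voters who think the suspect is strong.

    `players` is a list of (thinks_strong, times_used) pairs; `min_uses` is a
    hypothetical experience cutoff, not a value from any real formula.
    Returns None when nobody qualifies.
    """
    qualified = [strong for strong, uses in players if uses >= min_uses]
    if not qualified:
        return None
    return Fraction(sum(qualified), len(qualified))

# Made-up ladder: half the players think the suspect is strong and keep using
# it; the other half try it a few times, decide it is weak, and drop it.
players = [(True, 40)] * 50 + [(False, 3)] * 50

print(voter_pool_share(players, min_uses=0))   # 1/2: everyone counts
print(voter_pool_share(players, min_uses=10))  # 1: only heavy users remain
```

With no cutoff the pool is split evenly, but any usage threshold between the two groups leaves only players who already rated the suspect highly.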

This also belongs in Policy Review. There is no reason to have policy discussion private.
 

cim

happiness is such hard work
Thank you Obi.

My main fear with use of Sexp is illustrated above by Obi. The fact that it's not a hard and fast requirement is what comforted my fears somewhat (that and its use is kinda inevitable). This way, someone can read a UU vote stating that "I used it and it sucked" in an eloquent form and still count it. I figured that the main votes influenced by Sexp are those calling for its ban that never used the Pokémon themselves.

Let me make clear, though, that I would prefer as little SEXP influence as possible. I'm trying to compromise, but Obi is right.
 

obi

formerly david stone
Even if it's not a hard-and-fast requirement, it will still bias the voter pool.
 

DougJustDoug

Knows the great enthusiasms
is a Site Content Manager, a Top Artist, a Programmer, a Forum Moderator, a Top CAP Contributor, a Battle Simulator Admin Alumnus, a Smogon Discord Contributor Alumnus, a Top Tiering Contributor Alumnus, an Administrator Alumnus
All the people leading the UU Suspect testing process have other commitments that are preventing them from devoting much time and effort to this project right now. Therefore, we are going to change the leadership team for the UU testing project.

I am currently discussing potential candidates with other admins. I will make an announcement here, when we decide the makeup of the new leadership team.

For anyone out there that is interested in taking on this responsibility -- we are looking for existing badgeholders (preferably forum staff) with a good working knowledge of the UU tier, who are capable of being open-minded, level-headed leaders of a major Smogon community project. A regular presence on IRC and/or the Shoddy server is strongly preferred. If anyone qualified wants to discuss this with me further -- you know where to find me.
 

Caelum

qibz official stalker
is a Site Content Manager Alumnus, a Community Leader Alumnus, a Smogon Discord Contributor Alumnus, a Tiering Contributor Alumnus, a Top Contributor Alumnus, a Smogon Media Contributor Alumnus, a Battle Simulator Moderator Alumnus
Well, apparently I guess I'm no longer on this. I don't know why considering I was only gone for two days because my brother was in the hospital, but whatever. I suppose I'm not needed anymore so I don't know why I'm trying to explain this but "whatever". I actually originally came back to talk on IRC about UU, but since I'm off this it doesn't matter now.

I'm not responding to this individually since the criticisms are clearly of philosophical difference and I find amusing the inability of certain parties to accept that and instead babbling absolutist rhetoric that leads nowhere of any merit.

If you think the idea of removing suspects for a month is my idea, you are quite mistaken. I was using a Popperian ideal of correcting a problem which can be likened to a suitably similar reference class problem in the design. I find the inability of the opposing side to suggest a solution to this obvious problem that has been established countless times in similar experiments (obviously not with Pokemon) quite astonishing. It was a necessary initial factor as a result of group interaction as a safe-guard. It wasn't even remotely designed to learn about the suspects in the slightest, it was purely an experimental safeguard. So far your objections to it fail to address how when you are banning multiple Pokemon you correct the issues I enlisted above without removing them for a period. If you just plan to ignore that philosophical flaw, then so be it; but I'm not going to engage in a project where the flaw has been clearly established throughout the critical rationalist thought movement. Sorry that you think the voters lack a memory, if you desire to think they should be memory-fresh at the cost of established methodological flaws - be my guest.

I also have no idea why anyone is even remotely concerned about Dugtrio and Donphan. I could care less if they are dropped down right now, it was merely a convenience thing for starting fresh.

The assumption that I have no framework for re-testing is also amusing. I don't announce things before they are finished, and I've consulted with RB and GS; but to assume there is no framework is quite presumptuous. If you wanted to know what I have done already, consider asking me. You never bothered; you assumed there was nothing.

In short, the criticisms are short-sighted in that the "corrections" fail to address established methodological flaws. I never claimed any method was "best", but to ignore clear methodology errors in an attempt to make it "more direct" is rather foolish; but if people want to go down that route - so be it.
 

reachzero

the pastor of disaster
is a Senior Staff Member Alumnus, a Top CAP Contributor Alumnus, a Tiering Contributor Alumnus, a Battle Simulator Moderator Alumnus
Oh dear, where to begin with this?

I'm not responding to this individually, since the criticisms are clearly of philosophical difference and I find the inability of certain parties to accept that while instead babbling absolutist rhetoric that leads no where of any merit amusing.
Precisely what philosophical difference are we speaking of, here? The difference between knowing as much as possible about the actual power of the Suspects relative to the metagame and....? Your philosophy of testing is not very clear, to say the least. Also, your disrespect for respected members of our community is disconcerting, to say the least.

If you think the idea of removing suspects for a month is my idea, you are quite mistaken.
Whose idea was it, then? I doubt Popper played UU much.

I was using a Popperian ideal of correcting a problem, which can be likened to a suitably similar reference class problem in the design. I find the inability of the opposing side to suggest a solution to this obvious problem that has been established a countless number of times in similar experiments (obviously not with Pokemon) quite astonishing. It was a necessary initial factor as a result of group interaction as a safe-guard.
So what is this "obvious problem that has been established a countless number of times in similar experiments"? You left that rather vague, to say the least. Also, you possess the unfortunate philosophical trait of attempting to obfuscate the meaning of your argumentation by using abstruse language (and bad grammar) in an obvious attempt to make yourself more difficult to criticize.

It wasn't even remotely designed to learn about the suspects in the slightest, it was purely an experimental safeguard.
The whole point of a Suspect test is to evaluate the power of the Suspects relative to the metagame, then vote accordingly. If we don't learn anything useful about the Suspects, who cares?

So far your objections to it fail to address how when you are banning multiple Pokemon you ought to correct the issues I enlisted above without removing them for a period.
First of all, what is the referent for this "it" we object to? Secondly, kindly please repeat and clarify the "issues you enlisted above", since finding them and identifying them at all is rather a chore.

If you just plan to ignore that philosophical flaw, than so be it; but I'm not going to engage in a project where the flaw has been clearly established throughout the critical rationalist thought movement. Sorry that you think the voters lack a memory, if you desire to think they should be memory-fresh at the cost of established methodological flaws
Please enlighten us as to this philosophical/methodological flaw, particularly as it relates to assessing the power of specific Pokemon relative to the metagame.

I am glad you at least support Donphan/Dugtrio entering the UU metagame, where they belong, ASAP.
 
When we originally talked about this, it seemed that people were in some sort of agreement that the one month period without the suspects would not be a problem, or at least, if there was opposition, I don't remember too much of it. After seeing how many potentially eligible people aren't able to vote the second round through because they missed the first month, it is evident that this method is not the best at all.

If we just cut out the one month, and have the vote right after the nominations, I believe that would go a good way to solving a majority of the problems. Then we would be following (almost) the same system that is being used for stage 3, but in reality, is there any reason why that would be a bad thing? It would speed up the rate of the current process and cause less confusion during the time that exists when a pokemon is in limbo.

Also, I apologize for my lack of activity. I was going to get back into the swing of things once I got home from camp, but I will not object to any changes being made since I was not proactive enough to force the changes to happen soon enough or even suggest them myself.
 

LonelyNess

Makin' PK Love
is a Tournament Director Alumnus, a Forum Moderator Alumnus, a Tiering Contributor Alumnus, a Smogon Media Contributor Alumnus, a Battle Simulator Moderator Alumnus
Is it too much to ask that we retest things by nominating them during the suspect nomination phase? I think that retesting banned Pokemon shouldn't be something we are doing often, and that if you think the metagame has changed enough to where a previously seen BL Pokemon is no longer BL, you should have to prove it in some way. You think that Raikou isn't BL because when it was allowed we never knew how awesome Camerupt was in UU and now it's the 3rd highest used Pokemon in the tier, alongside Steelix at #2 and Venusaur at #4? Well then fine, we'll test Raikou because now one of the main arguments for banning (that its best checks and counters aren't viable otherwise) no longer holds true.

I am just against retesting for the sake of retesting. If someone has logical reasoning for why they think a Pokemon is more acceptable now than it was back when it was banned, then they can voice it when suspect nominations are held. But they should know that it would take a large change from when it was banned in order to be brought down. Their suspect status should also not rely on any other suspect's status (We don't want Raikou brought down because Dugtrio can revenge it only to see that Dugtrio becomes BL while Raikou becomes UU in the same round of voting).

On everything else in this thread, however, I agree completely with Gay Dolphin and Jabba.
 

Jumpman16

np: Michael Jackson - "Mon in the Mirror" (DW mix)
is a Site Content Manager Alumnus, a Top Team Rater Alumnus, a Battle Simulator Admin Alumnus, a Smogon Discord Contributor Alumnus, a Researcher Alumnus, a Top Tiering Contributor Alumnus, a Top Contributor Alumnus, an Administrator Alumnus
Suspect Experience is far from perfect. Despite that, the point of the UU test is not to determine the Uberness of a Pokémon alone, as whether or not a Pokémon breaks a metagame is definitely not an intrinsic property of the Pokémon itself in UU (where in OU it arguably is). Using the Suspect Experience system assumes a few things, mainly that consistent use of a Pokémon is necessary to evaluate its tiering and that the metagame the Pokémon is in doesn't matter (as any Suspect Experience based system utterly destroys the possibility of testing within a real metagame). Neither of these applies to UU, in my opinion.
First, I will reiterate that there are many, many facets of SEXP aside from "consistent use of a Pokémon", the main one being facing the Suspect(s) in question. So second: how can you expect anyone who used Crobat twice in a month, lost both battles, and only faced it three other times in that month to be even remotely qualified to sound off on it, regardless of their record, deviation, or how eloquent they can be? Isn't that the entire reason there is so much opposition to this evident month-long period where UU is played without any of the Suspects?

Here's a proposal being heavily debated among other things in IS right now: Suspect Experience is not used as a hard requirement, but rather a tool for analyzing qualification to vote.
This was my intention from my initial suggestion. When SEXP was used in Stage 1 for Latias, Latios and Manaphy, the only way Aeolus and I used it was to analyze the qualification of the people who made the Rating/Dev marks we'd set. SEXP was never a hard requirement. I think that you and obi are getting ahead of yourselves, or thinking that the way I'm using SEXP now for the OU Stage 3 is the way I'm suggesting it should be used for UU right now.

Any use of suspect exp (especially in that manner) will bias the vote toward banned. People who use the Pokemon a little and decide it sucks will stop using it, thus not getting enough exp (and this time it's worse, because you don't even know what you're supposed to be getting exp with). Those who think it's really good will use it a lot, thus getting more suspect exp. This will tilt the pool of voters toward those who think it's stronger no matter what you do.
You can obtain SEXP on a given suspect without using it. I've been clear on this in multiple threads since I started using SEXP for the OU Suspect Tests. And even if you don't know what the Suspects are, if they are actually worthy of being banned then they should be used more, in order to obtain the actual hard requirements of Rating and Deviation so you actually stand a chance of having a say.

Here's a hint: you don't get as much Suspect EXP with a given Suspect if you don't actually win with it. If you think about it, though, this is also common sense, as are the other "hints" I've posted about the formula: why would we care what someone who is 8-23 with Crobat has to say about it even if he had the Rating/Dev to make the cut (especially if, when you think about it)?

Finally, here's something you all may not know. I still get about one PM a month every two months from people seeking RMT advice because I'm an "expert battler", when as far as they literally know (Rating and Deviation) I've never played one battle on Shoddy. One of the most recent requests was actually soliciting UU team advice, which definitely is not my area of expertise (when has anyone ever seen me post anything about UU?) no matter how many people are under the impression I'm an OU guru.

For the Suspect Test at large, I set out to reduce the reliance upon voting submissions and "eloquence", and add another objective barometer when I made the SEXP formula. Again, the only reason I'm being firm on this is because, aside from Rating and Deviation, there are no objective measures of anyone's experience in a given metagame. So when we're talking about determining voting qualification, why not use as much objective data as possible?
 

cim

happiness is such hard work
(grumble grumble caelum grumble)
First, I will reiterate that there are many, many facets of SEXP aside from "consistent use of a Pokémon", the main one being facing the Suspect(s) in question. So second: how can you expect anyone who used Crobat twice in a month, lost both battles, and only faced it three other times in that month to be even remotely qualified to sound off on it, regardless of their record, deviation, or how eloquent they can be? Isn't that the entire reason there is so much opposition to this evident month-long period where UU is played without any of the Suspects?
No, the objection is that no one even can use said Pokémon, and they forget why they didn't like it in the first place. Similar, yes, but seriously you underestimate the power of a good judge like yourself... It is very hard to bullshit an argument on a Pokémon.

This was my intention from my initial suggestion. When SEXP was used in Stage 1 for Latias, Latios and Manaphy, the only way Aeolus and I used it was to analyze the qualification of the people who made the Rating/Dev marks we'd set. SEXP was never a hard requirement. I think that you and obi are getting ahead of yourselves, or thinking that the way I'm using SEXP now for the OU Stage 3 is the way I'm suggesting it should be used for UU right now.
The key to the proposal you quoted was what came immediately after that line. I understood this. My and Obi's fears come entirely from data contamination, as the UU votes are very much based on the metagame at hand, contrasting with the OU suspect test.

You can obtain SEXP on a given suspect without using it. I've been clear on this in multiple threads since I started using SEXP for the OU Suspect Tests. And even if you don't know what the Suspects are, if they are actually worthy of being banned then they should be used more, in order to obtain the actual hard requirements of Rating and Deviation so you actually stand a chance of having a say.
The fact that you get more Suspect Experience by using it than by not using it is the issue. You can get by with just playing against it, but why would you want to if you well and truly believe it sucks, or it doesn't work on the same team as another Suspect? You would be motivated to build these teams because it increases the chances, it would be foolish not to. Then the metagame is affected.

(I'd make a "how are people supposed to know this" quip, but honestly that's not even the issue anymore, and if the system was open to begin with then we could make these criticisms in the first place in an attempt to make an objective barometer with no contamination)

Here's a hint: you don't get as much Suspect EXP with a given Suspect if you don't actually win with it. If you think about it, though, this is also common sense, as are the other "hints" I've posted about the formula: why would we care what someone who is 8-23 with Crobat has to say about it even if he had the Rating/Dev to make the cut (especially if, when you think about it)?
Isn't that evidence that Crobat is not a great Pokémon? Perhaps "he's using it incorrectly" but then his bold vote chronicling his experiences would come into play where he bashes how bad NP Crobat or whatever gimmick is. But if he loses when he tries to use it and provides a detailed analysis of his experimentation with the sets of the Suspect, and they match up with "the broken set" as proposed in the nomination... Why would that vote be bad?

It's not like if you have low Suspect Experience you can say "this set sucks lol" and get away with it in a vote.

Ideally the Suspect process would only nominate close to broken Pokémon, but strange things have happened.

Finally, here's something you all may not know. I still get about one PM a month every two months from people seeking RMT advice because I'm an "expert battler", when as far as they literally know (Rating and Deviation) I've never played one battle on Shoddy. One of the most recent requests was actually soliciting UU team advice, which definitely is not my area of expertise (when has anyone ever seen me post anything about UU?) no matter how many people are under the impression I'm an OU guru.
I don't see the point. You're an admin on a Pokémon site, an active participant in the community, that doesn't sound so far fetched. Are you trying to say that you're so experienced that random users who clearly don't know how to use Smogon.com can tell? That's not much of a qualification.

This isn't an attempt at saying these battlers have the wrong idea, but these are hardly qualified judges. I'm sure chaos gets just as many RMTs (maybe less since he doesn't have the badge).

For the Suspect Test at large, I set out to reduce the reliance upon voting submissions and "eloquence", and add another objective barometer when I made the SEXP formula. Again, the only reason I'm being firm on this is because, aside from Rating and Deviation, there are no objective measures of anyone's experience in a given metagame. So when we're talking about determining voting qualification, why not use as much objective data as possible?
When the objective measurement for said data contaminates it, we have a problem. Why would you pose that question when that is the complete objection everyone has been posting about in the entire thread? It's obvious that's "our" answer here, but this post has done nothing to quell our objections on the data contamination front. In OU, this isn't an issue since the entire premise of the test is different (not to mention the Suspect Ladder prevents contamination of the "real" metagame if you feel like comparing) but in UU it just won't work.
 

Jumpman16

np: Michael Jackson - "Mon in the Mirror" (DW mix)
No, the objection is that no one even can use said Pokémon, and they forget why they didn't like it in the first place. Similar, yes, but seriously you underestimate the power of a good judge like yourself... It is very hard to bullshit an argument on a Pokémon.
Actually, several people have specifically stated that the month-long period is faulty because it doesn't provide useful information on a given Suspect:

Finally, since they can't play with the suspect during this one month process, how are they supposed to know if it's broken or not? The very idea of taking the suspects out compromises people's opinions on the matter. Could you imagine if Jump and Aeolus made a post tomorrow saying "ok, to see whether or not you think Garchomp is broken, we are going to prevent you from using it"?
And the suspects and the metagame are entirely intertwined. You can't examine a suspect without examining the metagame and vice-versa. I can study an electron by observing how it interacts with other forces, objects - this is no different.
Playing a metagame without a pokemon does not tell you anything about the pokemon's performance in that tier. How are we supposed to tell if Froslass is capable of enabling a teammate to sweep easily if we can't even use it to find out?
Finally, those that played in both months have been given an alternate metagame that not only proves nothing about the suspects themselves, but also puts a skew on their vote by allowing them to get a feel for which metagame they "like better".
I agree with Gay Dolphin that the month-long testing period is rather ridiculous; it really does not help us to determine the power of the Suspects, or give us any helpful information regarding the Suspects themselves.
The concern expressed is the same throughout—you don't gain experience with a Suspect, so how can you be expected to have a qualified opinion? It's the same thing as listening to someone who played with and against Crobat three times.

Chris is me said:
The key to the proposal you quoted was what came immediately after that line. I understood this. My and Obi's fears come entirely from data contamination, as the UU votes are very much based on the metagame at hand, contrasting with the OU suspect test.
Explain in detail how the data can be contaminated.

The fact that you get more Suspect Experience by using it than by not using it is the issue. You can get by with just playing against it, but why would you want to if you well and truly believe it sucks, or it doesn't work on the same team as another Suspect? You would be motivated to build these teams because it increases the chances, it would be foolish not to. Then the metagame is affected.
If the Suspect actually sucks, then most people either won't use it very much or won't win as much as they would otherwise when they do. There are no two ways about this. A player can't maintain that a Suspect sucks if everyone else is not only using it, but doing well enough with it to get the Rating/Deviation needed to vote, and more importantly the SEXP to qualify their votes (since, as I stated, winning with a Suspect is very important to gaining SEXP for it). The best example of this is when FiveKRunner claimed that Latios was "dead weight most of the time", which is silly in theory but rendered even sillier when you contrast his SEXP with that of those who also had the Rating/Dev to potentially vote but had much better SEXP with Latios.

Besides, this doesn't even apply to UU like it does to OU where you know the Suspect(s) beforehand. Say for the sake of simplicity we go back to before Crobat and Froslass and Raikou and Abomasnow and Gallade and Staraptor were even nominated. There aren't any Suspects. So, if after a month, a player both thinks Crobat sucks so he didn't use it that much and also only faced it six or seven times (relatively much lower than most everyone else who played on the UU Ladder for that month, a real number isn't needed), why should we listen to his opinion on Crobat? Why? And why should he even want to sound off on Crobat? He literally and objectively did not experience it very much at all, so why should his opinion on Crobat bear any weight?

(I'd make a "how are people supposed to know this" quip, but honestly that's not even the issue anymore, and if the system was open to begin with then we could make these criticisms in the first place in an attempt to make an objective barometer with no contamination)
(it never has been an issue, doug, chaos and now aeolus all understand the formula, how objective it is, and how much work i put into making it objective)

Isn't that evidence that Crobat is not a great Pokémon? Perhaps "he's using it incorrectly" but then his bold vote chronicling his experiences would come into play where he bashes how bad NP Crobat or whatever gimmick is. But if he loses when he tries to use it and provides a detailed analysis of his experimentation with the sets of the Suspect, and they match up with "the broken set" as proposed in the nomination... Why would that vote be bad?

It's not like if you have low Suspect Experience you can say "this set sucks lol" and get away with it in a vote.

Ideally the Suspect process would only nominate close to broken Pokémon, but strange things have happened.
That vote would be bad if and only if he only used it a few times, which Doug and I can clearly and objectively determine at a glance. And yes, his bold vote would come into play—I actually make it a point to not cross-reference a player's SEXP with his or her submission before reading the submission first, as this would actually stand a chance of biasing me for or against the vote.

And of course you wouldn't be able to get away with "this set sucks lol", this is the entire reason I'm suggesting using SEXP for the UU Suspect Test.

I don't see the point. You're an admin on a Pokémon site, an active participant in the community, that doesn't sound so far fetched. Are you trying to say that you're so experienced that random users who clearly don't know how to use Smogon.com can tell? That's not much of a qualification.

This isn't an attempt at saying these battlers have the wrong idea, but these are hardly qualified judges. I'm sure chaos gets just as many RMTs (maybe less since he doesn't have the badge).
I brought it up as a nod to my supposed "eloquence" or whatever you want to call it. As far as they can see I don't actually play, yet I am still regarded as some kind of expert. How often I play or when I play is a non-issue—maybe I play a ton under alts, maybe I don't—as the point is that one can be eloquent and therefore give "judges" the impression they know what they're talking about when they really don't. The "really" as far as it relates to the Suspect Test process is SEXP.

And I very much doubt chaos gets very many RMT requests at all. As I'm trying to say, the only reason people have any reason to believe I'm an "expert" on OU is because of my posts and consistent activity in Uncharted Territory, Stark Mountain, and Policy Review, and maybe the knowledge that I wrote a handful of early DP Analyses. chaos will be the first to tell you he doesn't have much of a handle on DPPT and has not for two years.

When the objective measurement for said data contaminates it, we have a problem. Why would you pose that question when that is the complete objection everyone has been posting about in the entire thread? It's obvious that's "our" answer here, but this post has done nothing to quell our objections on the data contamination front. In OU, this isn't an issue since the entire premise of the test is different (not to mention the Suspect Ladder prevents contamination of the "real" metagame if you feel like comparing) but in UU it just won't work.
Again, you're going to have to clarify what you mean by "contaminated data". And the quotes I included above indicate that I'm addressing one of the objections posed in this thread (the only one that can be addressed with SEXP).
 

Caelum

qibz official stalker
is a Site Content Manager Alumnusis a Community Leader Alumnusis a Smogon Discord Contributor Alumnusis a Tiering Contributor Alumnusis a Top Contributor Alumnusis a Smogon Media Contributor Alumnusis a Battle Simulator Moderator Alumnus
This is too scattered to quote everyone, so I'll just give a general response.

@reachzero: I thought the philosophical flaw was obvious, I explained it twice; but I guess I didn't do it clearly enough.

Take a metagame consisting of Pokemon A, B, C, D, and E. In this whole metagame A and B are deemed "broken"; however, doesn't the possibility remain that a metagame of C, D, and E with B included is not broken, whereas with A included, both are? Given the opportunity to view a metagame of just C, D, and E, the voter can potentially realize that B isn't broken at all in this metagame and was only broken in conjunction with A, and so realize that B should be given another opportunity.

This notion that the designation of the suspect is somehow independent of the metagame is ridiculous. A suspect's status is entirely dependent on the current metagame otherwise we could just look at move pool, stats, typing, and a bit of common sense to figure out the tiering entirely. The definitions of an uber even promote this idea, both implicitly (in the phrase "common battle conditions") and explicitly (in the phrase "significant portion of teams in the metagame").
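To put the A/B idea in concrete terms, here is a rough Python sketch. The rule inside `broken_in` is completely made up for illustration (A is broken on its own, B only next to A); it is not a claim about any real Pokemon, just a toy showing why "broken" is a property of the Pokemon plus the metagame.

```python
# Toy model only: the "broken" rule below is invented for illustration.
def broken_in(metagame):
    """Return the set of Pokemon that count as broken in this metagame."""
    broken = set()
    if "A" in metagame:
        broken.add("A")      # A is broken regardless of what else is legal
        if "B" in metagame:
            broken.add("B")  # B is only broken when A is around
    return broken

full = {"A", "B", "C", "D", "E"}

print(broken_in(full))               # whole metagame: A and B both look broken
print(broken_in(full - {"A", "B"}))  # removal period: nothing is broken
print(broken_in(full - {"A"}))       # B re-examined without A: nothing is broken
print(broken_in(full - {"B"}))       # A re-examined without B: A is still broken
```

Looking only at the whole metagame, A and B are indistinguishable; it takes the reduced metagames to tell them apart.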

Want examples?
These were an incredibly small portion of the theorymon examples I thought of originally when I thought up this process.

A = Abomasnow, B = Walrein. Walrein was close to being a suspect initially with Abomasnow support; however, without Abomasnow, Walrein is next to worthless. If both were nominated as suspects, it should've been clear to anyone that Walrein would be acceptable in a metagame without Abomasnow, whereas Walrein's presence is irrelevant to Abomasnow.

A = Deoxys-S, B = Gliscor. Think of the DSDS + BP Gliscor combo that became so famous. The important distinction between the two tier testings is that Deoxys-S was designated a suspect. In UU, we are picking them from scratch, so which one is "broken" is not so clear at present and is open to the interpretation of the voter. If given free rein, it wouldn't be unreasonable for a voter to view Deoxys-S as a suspect; nor would it be unreasonable to view Gliscor as a suspect. Only because we had experience with a metagame without Deoxys-S was it clear that Deoxys-S was the culprit and not Gliscor. In a metagame where there is little to no experience, it wouldn't be unreasonable at all to view both of them as broken and ban them outright. With a removal period for both, you can view a metagame independent of either of them and, using a bit of common sense, determine whether A or B deserves to be re-examined. Had this method been used, most of us would have seen clearly that Gliscor wasn't the key.

I can literally do this all day long; I just picked the most obvious and simple examples to demonstrate my point. I can pick countless Pokemon and I can expand this logic to 3 or 4 or 5 Pokemon with ease, but I figure you get the point, so I won't waste people's time.

I still have no idea why this is even being debated since this is a moot, purely academic discussion since I said I was stopping it for reasons I already explained. It's like bickering for the sake of bickering. If we both get what we want for different reasons, then who cares.

------------------------------

Suspect EXP:

I'm planning to use it when we begin re-examining key Pokemon. The stage of re-examination, a pseudo-"stage 3", is significantly more important than whatever has happened here. With some slight disagreements here and there, I mostly agree with Jump's position so I'll just leave it at that.

On the note of the current voting process and suspect experience: voters had no way of knowing the suspects beforehand, so they couldn't know which Pokemon to use. Firstly, I'll admit it is somewhat unfair to people who want to vote. If you didn't play with them beforehand, you shouldn't vote, even if you wanted to. Sorry. Secondly, nobody was immediately denied solely because they didn't play with each and every suspect in every match.

Suspect experience will be a valuable tool once re-examination starts, for the reasons Jump stated. Suspect experience hasn't been used thus far because it would require a separate ladder, and I didn't feel there would be enough participation; nor did I believe it would be timely and practical.

-------------------------------

So ... what exactly are we even discussing here?

I was already planning to change the removal process for entirely different reasons, so the only purpose of this discussion now is to play "Monday morning quarterback" without really producing anything substantive for the process, since it's irrelevant at this point. I'm fine with defending my beliefs, but I don't see the point of pursuing a discussion that doesn't matter.

On the note of suspect experiences, I do think it's important to use them for a different perspective. Once the "revisitation stage" begins (hopefully soon), I was planning to use it similarly to the current way it is used.

-------------------------------

@Doug: Who is doing this now since I was kicked off and do I finish this round? I kind of need to know whether to wrap this up or what the deal is .... I would've liked to finish this, but it's alright I guess.
 

cim

happiness is such hard work
Explain in detail how the data can be contaminated.
As has been said several times in this thread, the simple act of saying "your vote is partially dependent on how much you see and use this Suspect" will automatically inflate the use of that Suspect in the metagame. As UU votes are relative to the metagame even more so than OU, placing SExp on the table causes a great deal of artificial external influence on the metagame.

It's been gone over at least 4 times now ._.

If the Suspect actually sucks, then most people either won't use it very much or won't win as much as they would otherwise when they do. There are no two ways about this. A player can't maintain that a Suspect sucks if everyone else is not only using it, but doing well enough with it to get the Rating/Deviation needed to vote, and more importantly the SEXP to qualify their votes (since, as I stated, winning with a Suspect is very important to gaining SEXP for it). The best example of this is when FiveKRunner claimed that Latios was "dead weight most of the time", which is silly in theory but rendered even sillier when you contrast his SEXP with that of those who also had the Rating/Dev to potentially vote but had much better SEXP with Latios.
What if the suspect actually and truly is not particularly good? This means only the players that can work around it can vote at all, which makes the bias inherently toward "ban it", even if just a little bit, as the only people that get to vote on a Suspect are the ones that use it successfully (and the people that use it successfully are obviously much more likely to vote it Uber).
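A quick sketch of the bias I mean, in Python. Every number here is invented, and `SEXP_CUTOFF` is a hypothetical qualification threshold rather than anything Smogon actually uses; the ballots are also constructed so that people who used the Suspect successfully lean toward banning it, which is the assumption I'm arguing from. The only point is the mechanism.

```python
# All ballots are made up. Each tuple is (suspect_experience, vote).
voters = [
    (90, "ban"), (80, "ban"), (75, "keep"), (70, "ban"),
    (30, "keep"), (20, "keep"), (15, "ban"), (10, "keep"),
]

def ban_share(ballots):
    """Fraction of the given ballots that vote to ban the Suspect."""
    return sum(1 for _, vote in ballots if vote == "ban") / len(ballots)

SEXP_CUTOFF = 50  # hypothetical threshold, chosen only for this example

# Keep only the ballots from players who cleared the experience cutoff.
qualified = [ballot for ballot in voters if ballot[0] >= SEXP_CUTOFF]

print(ban_share(voters))     # 0.5  (everyone who made Rating/Deviation)
print(ban_share(qualified))  # 0.75 (only those who also cleared the cutoff)
```

Same players, same opinions, but filtering on experience with the Suspect moves the result toward "ban" whenever experience and a pro-ban opinion go together.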

Yes, there are some cases when it is obvious a Pokémon is at best extremely good, but this will definitely not be the case for every UU suspect and we can't afford to fuck shit up.

Besides, this doesn't even apply to UU like it does to OU where you know the Suspect(s) beforehand. Say for the sake of simplicity we go back to before Crobat and Froslass and Raikou and Abomasnow and Gallade and Staraptor were even nominated. There aren't any Suspects. So, if after a month, a player both thinks Crobat sucks so he didn't use it that much and also only faced it six or seven times (relatively much lower than most everyone else who played on the UU Ladder for that month, a real number isn't needed), why should we listen to his opinion on Crobat? Why?
The same reason we should listen to someone who wins with Crobat. They've experienced the Pokémon. If he only faced Crobat 6 or 7 times and managed to not be fucked up by it to the point of being pissed enough to vote for its Suspect nomination, he obviously didn't have a problem with it. I'd prefer he faced it more and all, but if he honestly didn't see it as an asset to his team and enough other people didn't that he only faced it "6 or 7 times" then it either won't be a Suspect or deserves to be a landslide vote for no ban.

And why should he even want to sound off on Crobat? He literally and objectively did not experience it very much at all, so why should his opinion on Crobat bear any weight?
People want to sound off on a Suspect because voting is seen as a Good Thing on Smogon, as it should be. In UU, "experiencing the Pokémon" is as important as "experiencing the metagame", and you can only 100% ensure one or the other. But honestly, with ranking and deviation votes, if you can make rank and only run into one or two of a Suspect then it's clear everyone isn't completely overwhelmed by how good it is. If you manage to do well enough against it that your ranking isn't ruined then it's pretty obvious you know how to deal with it, which means you can decide how "good" it is.


@ Caelum: So I'm confused. Since a Suspect isn't independent of the metagame, why is Suspect Experience being used? What exactly is going forward?
 

Caelum

qibz official stalker
@ Caelum: So I'm confused. Since a Suspect isn't independent of the metagame, why is Suspect Experience being used? What exactly is going forward?
I don't understand how my statement in my previous post suggesting they aren't independent is contradictory towards the use of Suspect EXP. I believe there is some role to be had in experiencing the Pokemon and so I think SEXP is a valuable supplement. I don't see how they are contradictory positions really.
 

X-Act

np: Biffy Clyro - Shock Shock
is a Site Content Manager Alumnusis a Programmer Alumnusis a Smogon Discord Contributor Alumnusis a Top Researcher Alumnusis a Top CAP Contributor Alumnusis a Top Tiering Contributor Alumnusis a Top Contributor Alumnusis a Smogon Media Contributor Alumnusis an Administrator Alumnus
I still believe the fact that potentially new Pokemon are added to UU and old Pokemon are subtracted from UU every three months is actually the _main_ issue that needs to be addressed. Adding and subtracting Pokemon from a metagame can potentially change the entire metagame, enough to warrant retesting the BL list every 3 months. Maybe stuff in BL isn't broken anymore in the new metagame. Maybe some Pokemon that was alright before is now broken in the new metagame.

Basically UU is always new every 3 months, which makes it potentially need testing every 3 months... which is something that is bad, in my opinion. I'm sure people would like to just play in a metagame rather than test it continuously.

I see only two solutions to this, both of which, unfortunately, are drastic:

1) Get rid of BL, and let UU change every 3 months. Basically nothing is banned in UU except Pokemon that are in Uber and OU. This will stop UU testing immediately. This is exactly what is happening to those people who play NU (even though it's not an official metagame, people do play it), and I haven't seen them complaining all that much.

2) Make the tier lists change less often, say every 6 months. This is a less drastic course of action, but it has two drawbacks: a) The OU list would be severely outdated after 3 months, and b) it won't stop UU testing anyway, but just alleviate it a bit.

I don't know if this is a shock to you people, but I would actually go with option 1. The BL list made a bit of sense in Advance because the OU tier didn't change much (or, if anything, not every 3 months), and hence the potential UU play was stable, and hence the BL list, whether it was right or wrong, could be made stable as well.

As things are right now in DPP, UU changes every 3 months, and hence so should the BL list. We end up wasting those 3 months testing to see what Pokemon are in BL, by which time the UU tier has already been changed! This doesn't make sense at all. The only solution to this is to either banish the BL tier, or to fix the OU tier forever (like we did with RBY, GSC and ADV if you think about it).

If you have some other solution I'm not aware of, please enlighten.
 

Hipmonlee

Have a nice day
is a Community Contributoris a Senior Staff Member Alumnusis a Smogon Discord Contributor Alumnusis a Tiering Contributor Alumnusis a Top Contributor Alumnusis a Battle Simulator Moderator Alumnusis a Four-Time Past WCoP Champion
An easier solution would be to make OU pokes that are not OU anymore go to borderline by default, and only introduce them into UU if they are believed to be of an appropriate strength level and they are seen as unlikely to return to OU.

Have a nice day.
 

X-Act

np: Biffy Clyro - Shock Shock
I thought of that solution, too, Hip, but it's only a partial one, since if some Pokemon goes from UU to OU, the UU metagame will still potentially change.
 

JabbaTheGriffin

Stormblessed
Yeah and of course the main difference is that UU is in fact an official metagame that is currently played in the Smogon Tour.

I really think you're underestimating how much UU is going to stabilize. There are really only a select few pokemon who are going to dance the line between UU and OU and after a while, these Pokemon are all going to be Pokemon that have done it in the past. So essentially the metagame produced by changes will usually be one that was already played.

I see no reason to allow Pokemon that actively break the tier into the metagame simply because of these changes. It really seems like the laziest approach on everyone's part. Anyone committed to making UU the best tier it can be should see that.

Edit -

Now I can see a problem arising due to several checks/counters for a Pokemon voted BL coming down to UU at the same time, but I guess that's just something we need to address in our process for nominating Pokemon to bring down.

But you can't say things like Abomasnow or Staraptor wouldn't ruin the tier no matter what Pokemon close to the bottom of OU were dropped down. And I think that's reason enough for this process.
 

Caelum

qibz official stalker
An easier solution would be to make OU pokes that are not OU anymore go to borderline by default, and only introduce them into UU if they are believed to be of an appropriate strength level and they are seen as unlikely to return to OU.

Have a nice day.
I thought that was "sort-of" implied. I mean, if Lucario suddenly becomes UU, I don't see a point in testing it. Most of the guys that fluctuate between OU and UU won't really affect the metagame too much, rather conveniently.

I've always maintained one Pokemon isn't enough to make another Pokemon broken or not-broken so I'd be skeptical if something dropping down or moving out would cause the BL list to be disturbed.
 

cim

happiness is such hard work
I think as long as the line of ridiculousness is kept reasonable then not testing Tyranitar in UU isn't implausible, but there shouldn't need to be a "burden of proof" or something to say that a test would be worthwhile. Like, no one should go "oh dugtrio would be a beast in UU don't bother testing it" and then require proof it's not broken (however you do that without testing). But if it's brain dead obvious, then go for it.

I don't anticipate a completely obvious BL to ever fall that low, though. I'd even be in favor of testing something almost on the order of Heracross (though really it's a better Gallade so yeah).

It's likely going to be the same Pokémon fluctuating (Yanmega, Roserade, Dugtrio will pretty much always be at the line) for a long time, barring a generation change (when all tests should restart IMO anyway, not like that's going to happen but i digress).
 

X-Act

np: Biffy Clyro - Shock Shock
I thought that was "sort-of" implied. I mean, if Lucario suddenly becomes UU, I don't see a point in testing it. Most of the guys that fluctuate between OU and UU won't really affect the metagame too much, rather conveniently.

I've always maintained one Pokemon isn't enough to make another Pokemon broken or not-broken so I'd be skeptical if something dropping down or moving out would cause the BL list to be disturbed.
What? By your same reasoning, Donphan and Dugtrio should go immediately to BL now. But this isn't what you people told me to do! (Read especially RB Golbat's post.)

EDIT: Well, what I mean is that, as I understood it back then, stuff goes to UU by default, unless it is extremely obvious that a Pokemon is not UU. If that's what you told me, then ignore this post.

EDIT 2: To Jabba and Chris is me, even if the same Pokemon end up going down and up over and over, the metagame still changes. If Dugtrio were to bounce from OU to UU and back repeatedly every 3 months, assuming it's not voted BL, wouldn't the UU metagame change every 3 months? With Dugtrio present, wouldn't a retest of Raikou (for instance) be warranted? I'm not an expert in UU battling, or any form of Pokemon battling for that matter, but I'm trying to provide an example; if my example isn't that good, try to formulate better examples yourselves, but you get the idea.

My point is that a single Pokemon introduced to UU can be a check for Pokemon that were voted BL previously. Or, a single Pokemon going out of UU can imbalance the UU metagame. In both cases, a retest of certain Pokemon would be needed every 3 months... which is tedious, and something that needs to be addressed.
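Here is a minimal Python sketch of the Dugtrio scenario, in case it helps. The rosters are hypothetical: "X", "Y" and "Z" are stand-ins for whatever part of UU stays put, and the rule "any roster change may warrant a retest" is my assumption, not an agreed policy.

```python
# Hypothetical quarterly UU rosters; "X", "Y", "Z" stand in for the stable core.
core = frozenset({"X", "Y", "Z"})
with_dugtrio = core | {"Dugtrio"}

# Dugtrio bounces between OU and UU every 3 months.
rosters = [core, with_dugtrio, core, with_dugtrio]

def quarters_needing_retest(rosters):
    """Indices of quarters whose roster differs from the previous quarter's."""
    return [i for i in range(1, len(rosters)) if rosters[i] != rosters[i - 1]]

print(quarters_needing_retest(rosters))  # [1, 2, 3]: the roster changes every quarter
print(len(set(rosters)))                 # 2: but only two distinct rosters ever occur
```

The first number is my worry: something changes every quarter. The second is the counterpoint that the same metagames keep recurring; whether a roster that has been played before still needs a fresh test is exactly what would have to be decided.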
 
