Suspect voting: Abstentions and no-shows

Stratos · May 25, 2020

In case you missed it, there was some drama with the NatDex Metagross suspect this morning. Basically, the outcome of the vote hinged on how we counted no-shows: were they a vote for the status quo, or discounted from the vote total entirely? You can read Hogg's ruling as tiering admin here. However, this has been applied inconsistently in the past, with (at least) the BW Sand Rush vote using the opposite methodology. It's worth noting, though, that in that case you didn't have to actively try for reqs. A lot of people who got reqs there probably never intended to vote, or had never even heard of the test, which is different from the Metagross test.

In the course of the discussion, the question of abstentions was also raised. Apparently current protocol is to count these as status quo votes, as opposed to throwing them out. This is also something that has been applied differently in the past.

I am not trying to overturn the Metagross test (or any other test) in this thread, but to propose policy for future tests. We argued about this for a while on Discord, but organized thoughts in forum posts are much better at getting points across, so let's take this to a more productive venue.

---

As for my opinions on these topics:

Abstentions: I think an abstain option, which removes your vote, is valuable. I agree that each voter should need to be more than 51% sure personally before voting against the status quo. However, no matter where you set this bar, like 80 or 90%, there's going to be some suspect where your feelings fall right on the line. I don't think it makes sense to force these people to choose. In fact, it might well be counterproductive if the goal is to bias toward the status quo. If I'm right around the 90% line on a suspect and I can't decide, I'll probably vote ban because, well, I was 90% sure... I had just about this exact thought process when I was debating my vote in the DOU Salamencite test, and I imagine that the majority of people who wish they could abstain would end up voting ban. Hogg said a few people have asked him about abstaining and he explained that it was a DNB vote. If you remember exactly when, we could follow up on these people and see what they ended up voting, to bring more data to this debate.

No-shows: I would rather no-shows be thrown out. Hogg wants to avoid "in an extreme example, you could end up with cases where so many people don't vote that only a handful of people, or even a single person, end up banning something." I am much more afraid of the opposite. If people really cared, they would find thirty seconds to post in a three-day window. I honestly have no idea what would possess someone to go through the effort of laddering and posting reqs but then fail to do the easy bit of voting, but I have to imagine that these are mostly drive-by voters and not the hardcore participants in the metagame. A spot check of this vote, for example, shows that the 4/69 no-shows were raf, Therazer456, mono117, and Red Pill PUA. Three of these I've never seen engage with the DOU community, and RPPUA... has his own problems. I really don't care about the opinion of someone who can't be bothered to vote, and I really don't want the laziness of these chucklefucks to affect the metagame for the people who do care enough to vote promptly. Extenuating circumstances exist, of course, but I very much doubt that 12/82 voters (in the Gross case) had very good reasons for failing to vote. Just as we don't care about all the people too bad to make reqs, I don't think we should care about these people. I'm fine with counting no-shows as status quo votes if we infract no-shows, I guess; it's not my preferred option, but I figured I'd bring it up.

ABR · May 25, 2020

Someone who qualifies for voting but doesn’t end up voting is the same as someone who never qualified to vote in the first place. They’re not part of the voter pool if they don’t do that thing voters do. No shows should be entirely ignored.

Regarding abstentions, they’re useless but also harmless. If you want to abstain, you can simply not vote. Abstaining is basically the same as no-showing, but maybe we could let it slide just so that it counts toward people's TC.

Jho · May 25, 2020

I think the main issue here is consistency and the lack of enforced standards. A brief look through the Blind Voting subforum and here in Policy Review shows there is no thread detailing the standards and rules of a suspect test, why different methods are used in different tiers, or why they’re set up the way they are.

Apparently this information is distributed to Community Leaders, but in my opinion there should definitely be a public source of information that clearly lays all of this out. It would greatly reduce ambiguity in situations like this one, where voters were telling us as the TLs that we had made a mistake in the result, and we didn't actually know whether we had until Hogg confirmed it, since smaller metas like ours have no access to such channels. Being more transparent with these rules and standards can only be a good thing.

In terms of whether or not abstains should count as a “preserve the status quo” vote, I can’t think of any other voting system that does this. While I understand that no vote is essentially “not a ban vote” and therefore a preserve-the-status-quo vote, I don’t particularly like forcing a stance on people who didn’t vote at all. I checked through the previous suspect tests but couldn’t find a single one where changing the method of counting voters would have changed the outcome, as it would have today in the Metagrossite suspect test, so precedent here is lacking. I wouldn’t mind continuing to use the total number of eligible voters going forward, as opposed to the number of users who actually voted, if the reasoning and surrounding standards were made publicly available, so that it is painfully obvious and voters and TLs cannot get confused about which precedent they should be following.

Bughouse · May 25, 2020

Including an active vote of Abstain as "keep the status quo" is wrong for reasons I hope are pretty obvious. It makes no sense to offer it as a distinct voting option if it's going to be counted the same as one of the ban/no ban options. So if that's what Abstain truly means, then just remove the option to abstain and make people actively vote one way. But since we've had Abstain as a distinct voting option for at least a decade, I find it hard to believe that that's what it means. It certainly isn't what it meant in the past.

Here's literally the oldest vote in the current Blind Voting forum with Abstain as an option, and it was used to remove people from the denominator entirely.
https://www.smogon.com/forums/threads/ou-round-1-ability-suspect-voting.84149/page-2#post-3212568
Drizzle had 24 Ban, 17 DNB, 9 Abstain and this was counted as 58.5% ban (i.e. 24/41) not 24/50 with the Abstains added in.
In this era, bans all required either a high super majority of > 2/3, or 2 consecutive rounds of simple majorities.
Because Drizzle got a simple majority vote, this put it "on notice" for the next round of voting. Had the vote been interpreted as 24/50 or 48% Drizzle would not have been on notice.

Here's another BW vote, this time on Excadrill (among others).
https://www.smogon.com/forums/threads/ou-round-5-pokemon-suspect-voting.3455842/page-4
It came out 49-22-3 (Ban, DNB, Abstain). This was interpreted as 69% ban (49/71). Had it been interpreted as 49/74, with the 3 abstentions counted as no ban, it would have been 66.2% ban. As with the Drizzle vote above, this happened in an era where a ban required a > 2/3 supermajority, so had the abstentions been counted as "do not bans", Excadrill would not have been banned.
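To make the two denominators concrete, here is a minimal Python sketch that recomputes both BW votes above. It encodes the era rules as described in this post (a ban on a share strictly above 2/3, and a simple majority strictly above 1/2 putting the suspect "on notice"). It looks at a single round only and ignores whether a suspect was already on notice. `ban_share`, `outcome`, and the `abstain_counts_as_dnb` flag are hypothetical names for illustration, not anything from Smogon's tooling.

```python
from fractions import Fraction

SUPERMAJORITY = Fraction(2, 3)    # ban outright: share must be strictly greater
SIMPLE_MAJORITY = Fraction(1, 2)  # "on notice": share must be strictly greater


def ban_share(ban: int, dnb: int, abstain: int, abstain_counts_as_dnb: bool) -> Fraction:
    """Ban votes as an exact fraction of the chosen denominator."""
    denominator = ban + dnb + (abstain if abstain_counts_as_dnb else 0)
    return Fraction(ban, denominator)


def outcome(share: Fraction) -> str:
    """Single-round outcome under the BW-era rules as described in the thread."""
    if share > SUPERMAJORITY:
        return "ban"
    if share > SIMPLE_MAJORITY:
        return "on notice"
    return "no action"


# (ban, dnb, abstain) tallies quoted above
votes = {"Drizzle": (24, 17, 9), "Excadrill": (49, 22, 3)}

for name, (ban, dnb, abstain) in votes.items():
    for counted in (False, True):
        share = ban_share(ban, dnb, abstain, counted)
        label = "abstain as DNB" if counted else "abstain excluded"
        print(f"{name:<9} {label:<16} {float(share):6.1%} -> {outcome(share)}")
```

Exact fractions avoid floating-point edge cases right at the 2/3 line. Only the printed percentages are rounded.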

As for no shows, imputing what missing pieces of data would have been (i.e. counting a no show as "keep the status quo") almost only ever results in a less accurate measurement. I don't see why this should be done either. The only valid denominator is the actual votes cast, excluding abstentions, which is just an active way of demonstrating that one is not voting.

Stoward · May 25, 2020

Slightly off topic, but Hogg literally said this in the post linked in the OP.

“This discussion also raised some legitimate concerns about voting Abstain, which I believe aren't super relevant here but deserve a more full discussion; I'll create a PR discussion on that shortly.”

Why did Stratos feel the need to go and make his own thread?

Stratos · May 25, 2020

MajorBowman · May 26, 2020

I see absolutely zero reason as to why an Abstain vote should count towards TC. The badge is called Tiering Contributor, and abstaining means you contribute nothing. I guess I'm not opposed to making Abstain an option as a vote again just because it makes it easier for TLs to count who has or hasn't voted, but counting it towards TC seems ridiculous.

As far as no-shows go, I don't feel super strongly either way. I think a non-negligible number of no-shows are active choices to abstain (which might be alleviated by including an abstain option), but some are definitely people who are either largely disconnected from the community of the tier they're voting for or just straight up forgot. I don't think there's anything particularly wrong with the current system for tiers with smaller player bases like DOU, and I think the issue with no-shows applies only to suspects orders of magnitude bigger, like the NatDex test in question. I'd probably lean towards sticking with the same system just for simplicity's sake, but I definitely see the merit in discarding no-shows past a certain deadline and wouldn't be mad if that was the decision made.

atomicllamas · May 26, 2020

MajorBowman said:
I see absolutely zero reason as to why an Abstain vote should count towards TC. The badge is called Tiering Contributor, and abstaining means you contribute nothing. I guess I'm not opposed to making Abstain an option as a vote again just because it makes it easier for TLs to count who has or hasn't voted, but counting it towards TC seems ridiculous.

I disagree with this. As Stratos pointed out, there will always be some line of uncertainty (whether it's 51% or 80% or whatever). Changing the rule doesn’t make people more sure of their choice, but it does motivate them to just pick one. If someone ladders for a test and they truly don’t know, or if someone ladders for TC and at the end they don’t know or don’t care, I don’t think forcing those people to choose is necessarily a good thing.

MajorBowman · May 26, 2020

atomicllamas said:
I disagree with this. As Stratos pointed out, there will always be some line of uncertainty (whether it's 51% or 80% or whatever). Changing the rule doesn’t make people more sure of their choice, but it does motivate them to just pick one. If someone ladders for a test and they truly don’t know, or if someone ladders for TC and at the end they don’t know or don’t care, I don’t think forcing those people to choose is necessarily a good thing.

"Laddering for TC" isn't what actually gets you TC though, the voting is. The TC standards state that TC is awarded to people with X number of votes, not X number of laddering sessions. Laddering and qualifying to vote but then abstaining is literally not voting and contributes nothing to the test itself. I'm not trying to shame people who are uncertain or who want to abstain because those are fully within their right, but all that counting abstentions accomplishes is watering down a badge that's already considered one of the easiest to get.

atomicllamas · May 26, 2020

MajorBowman said:
"Laddering for TC" isn't what actually gets you TC though, the voting is. The TC standards state that TC is awarded to people with X number of votes, not X number of laddering sessions. Laddering and qualifying to vote but then abstaining is literally not voting and contributes nothing to the test itself. I'm not trying to shame people who are uncertain or who want to abstain because those are fully within their right, but all that counting abstentions accomplishes is watering down a badge that's already considered one of the easiest to get.

Yes, and my point is that those people will still ladder for TC and then toss out a random vote when they would otherwise have abstained (if abstaining doesn’t count for TC). I don’t think that’s a good thing.

Shurtugal · May 26, 2020

I think "Abstain" is a good option; sometimes you truly do not know whether something should go or not. Think about a player who laddered for voting rights but maybe doesn't play the tier too much, so they vote to abstain. I still think their vote is valid, and I think they are still contributing. Being a strong presence on the ladder contributes to voting, because it makes it harder for others to also get requirements, so encouraging people to ladder is a good thing imo. Perhaps some people just ladder to get the badge and don't care about the voting. I personally wouldn't look down on that, because we still encouraged them to contribute to the ladder, but I would like to give these players the benefit of the doubt: perhaps they were laddering in hopes of learning whether something was broken, and at the end were still undecided. I once voted in RU despite never really playing it much; thankfully I was able to learn that their suspect was broken by playing the tier, but I'm sure there are many cases, for both active players and players trying a tier out, where people are still undecided at the end of their ladder run.

Forcing players to choose an option can be more harmful than allowing them to abstain. Removing Abstain votes from the denominator is the best solution imo.

I think that if a player doesn't vote at all, it shows a great deal of laziness or a lack of respect, unless there are circumstances justifying the no-show. These players should be taken out of the voting entirely and infracted for wasting the time of the people counting the votes, but that is just my opinion. Out of respect for everyone involved, if you posted your requirements in the identifier thread, you are promising to post a vote when the voting thread comes up. I don't think Abstaining is the same thing as a no-show, because it means that the player took the time to think about their choices and came to the conclusion that they didn't have enough info or feel for the tier to make a strong decision.

tldr: abstaining is valid, no-shows are not valid.

EDIT TO ADD: It could be an issue if there are so many no-shows that only a handful of votes are counted. Some tiers like OU will likely never have this issue because of the size of their playerbase, but I can see this happening for tiers with smaller playerbases. My solution would be to set a minimum number of voters required; if it isn't met, the voting could be redone(?) or extended(?), or you could have the Tier Leaders vote on a decision instead, which I think is the preferred method here.

Arhops · May 26, 2020

I agree with the other posts that we shouldn't count abstentions and no-shows, but I think a lot of them are kinda ignoring the reason why they were being counted in the first place. In particular, this post is in response to the argument made on Discord that a lack of ban votes relative to all other votes, including non-votes, shows the ban isn't necessary/wanted.

To me, the reason why we shouldn't take into account people who abstain or don't vote is that they don't come close to fully representing the group they are supposed to represent (people who aren't convinced that a test is necessary). If we really took into account the people who don't care, yet could have qualified to vote if they wanted, we would also have to take into account people who don't have Smogon accounts and/or had better uses for their time than spending hours laddering for reqs just to prove that they don't care. We don't do this, since it would be impossible to figure out who these people are and it would make bans basically impossible to push through. Using the people who did go through the process but didn't vote to represent this whole group seems arbitrary and not especially effective to me.

I think a better solution to this lack of representation for indifferent people would be gauging interest among the people who play the tier before a test is started. In this test, people who got reqs just for fun or for the badge had their indifference represented more than many of the people who are actual members of the tier's community. If the tier's players and leaders generally agree a test should happen, the turnout shouldn't matter, barring an extremely small number of people voting, which, as other people have suggested, could be solved by requiring a minimum number of votes.

I think something like a poll asking whether a test is necessary, shown at the start of Showdown matches in place of the suspect test message and/or run in the tier's Showdown room and Discord server, would be much more representative of the people who don't find a test necessary. Tier leaders should be able to figure out what works best for their tier.

TL;DR: Whether or not a suspect test is necessary and/or wanted should, barring an extreme lack of voters, only be decided before it starts.

Hogg · May 27, 2020

Thanks for posting this. I think it’s definitely something that warrants some discussion.

I think there are two separate topics here, abstentions and no-shows, and they should likely be handled differently. I’ll focus on no-shows for now, and come back to the abstention issue another time.

This is an issue that really hasn’t come up before, and in some ways it’s good that it came up in an unofficial metagame so that we can have this discussion now instead of after, say, a hotly contested and controversial OU vote. Luckily it’s a really rare event - for the question of how no-shows impact a final vote to be relevant, it requires both a very close vote and a significant number of people who failed to vote after completing reqs and identifying themselves as a voter.

I touched on this in the NatDex thread, but in general, banning something should always be an exceptional event, one that only occurs when a significant majority of the community believes that it is unhealthy or problematic. This means that we should always prioritize getting as large a sample size of active players as we can, and that the burden of proof should always rest on the pro-ban side. To some degree that’s represented in the fact that you need a majority (60% in OU tiers, 50% +1 in lower tiers) to ban something, but I also believe that it means that if someone participates in the suspect test but then decides not to cast a vote, they are essentially conceding that they do not strongly feel that something should be banned. Ambivalence in the case of a suspect test should never have the result of making something easier to ban, and “do not ban” should almost always be the default position. For that reason, I’ve always gone off of the total number of eligible voters when determining how many votes are required to ban something.

It was my understanding that this is how all recent suspect tests have been handled (and if you look at UU suspect test voting threads throughout my tenure as TL, you’ll see that I always put the number of votes required to ban something in the thread’s OP to make sure that this is explicit). I know that there are some counterexamples, with the most recent one to my knowledge being the BW Sand Rush decision from a year and a half ago. However, that was very much not a suspect test, which is what we are addressing here. Instead it was a polling of successful BW tournament players, many of whom had never expressed a desire to participate in such a vote. In a suspect test, the pool of eligible voters is specifically formed from people who have spent days laddering to meet the GXE and game minimum requirements, and then actively identified their intention to vote in a Voter ID thread.
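As a rough illustration of publishing "votes required to ban" up front, here is a hedged Python sketch that computes that number from the eligible pool. The threshold readings are assumptions: "60%" is taken to mean at least 60% of the pool, and "50% +1" a strict majority (half the pool rounded down, plus one). The function names are hypothetical.

```python
import math
from fractions import Fraction


def votes_required_supermajority(pool_size: int, threshold: Fraction = Fraction(3, 5)) -> int:
    """Smallest number of ban votes that is at least `threshold` of the pool (assumed reading)."""
    return math.ceil(threshold * pool_size)


def votes_required_majority(pool_size: int) -> int:
    """'50% + 1' read as a strict majority of the pool (assumed reading)."""
    return pool_size // 2 + 1


for pool in (82, 70):
    print(pool, "in pool ->", votes_required_supermajority(pool), "ban votes needed at 60%")
```

With the 82 eligible voters mentioned for the Metagross test, a 60% bar works out to 50 ban votes. If the denominator were only the 70 votes actually cast, it would be 42.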

I’d assumed that this was the standard that all tiers have been following for the past couple of years - again, it’s only a potential issue in the rare instance where you have both a close vote and a significant number of no-shows, so it simply hasn’t come up prior to this. However, talking to various tier leaders, it looks like there isn’t a consensus on how this should be handled. Several TLs were operating under the same assumption I was, while others assumed it was a percentage of votes cast, not of eligible voters. So I’m glad we’re having this discussion now, because whatever we decide should be explicitly made into policy moving forward.

Bouncing back to the sample size issue, it’s always been my opinion that suspect tests should be as robust as possible. The suspect test itself should theoretically filter out those not well-versed in the tier. Once we’ve done that and created a pool of active suspect testers, I strongly believe that we need a majority of the entire pool if we are going to ban something. If 50 people identify as suspect voters but 40 of them don’t vote, we could potentially see something banned because 6 people happened to feel strongly enough to vote ban. You could argue that that’s appropriate, since by not voting those 40 people essentially said they didn’t care what happens, but I really do not think we should encourage a situation where a vocal minority can ban something without properly making their case or convincing others. I very much disagree with the idea that a small group should be able to ban things on their own just because they REALLY REALLY feel strongly about it.

Anyhow I went on too long as usual, so I’ll give a quick tl;dr and head off:

Banning something should be hard, with the burden of proof on the pro-ban side, and should only happen if desired by a majority of active and successful players.
Removing no-shows from the total pool of voters potentially enables a small but vocal minority to push through a ban that others are at best ambivalent about.
There’s currently nothing in policy clarifying this one way or another and that’s absolutely something we need to fix ASAP.

Legitimate Username · May 27, 2020

Just going to reiterate a bunch of stuff I said over Discord because it's probably more useful here than there.

Hogg said:
Bouncing back to the sample size issue, it’s always been my opinion that suspect tests should be as robust as possible. The suspect test itself should theoretically filter out those not well-versed in the tier. Once we’ve done that and created a pool of active suspect testers, I strongly believe that we need a majority of the entire pool if we are going to ban something. If 50 people identify as suspect voters but 40 of them don’t vote, we could potentially see something banned because 6 people happened to feel strongly enough to vote ban. You could argue that that’s appropriate, since by not voting those 40 people essentially said they didn’t care what happens, but I really do not think we should encourage a situation where a vocal minority can ban something without properly making their case or convincing others. I very much disagree with the idea that a small group should be able to ban things on their own just because they REALLY REALLY feel strongly about it.

This is an argument I can see on a theoretical level, and it's the main reason why I think there's probably no simple answer and any policy moving forward should be able to handle the extreme cases. The concern about arbitrary outcomes from a small voter pool can be a pretty real one (to name a shitty and probably not-all-that-relevant example off the top of my head, I still think BW OU's weather tiering, split between complex bans and straight ability bans, is an inconsistent and selectively-applied mess), and now is certainly a good time to be future-proofing this whole thing to deal with extreme scenarios.

This is a case, though, where I don't feel like the concern really applies. We have a representative pool of 70 voters who cast a vote, and the 43 ban votes among them hit the 60% supermajority. That doesn't strike me as insufficient or arbitrary at all, but as a fairly (though not exhaustively) thorough representation of how the voters/playerbase feel on the matter. Introducing the 12 additional "unknown" votes into the equation only muddles things by throwing in extra junk data we know nothing about, and moves the sample of community opinions a lot further from being properly representative. 43/70 shows how people feel much more accurately than (43-55?)/82 does, and assigning those 12 votes to be an automatic "keep the status quo" to reach 43/82 only makes things worse.
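The readings of the Metagross numbers above (43 ban votes, 70 votes cast, 82 eligible, 60% bar) can be checked with a short Python sketch. It assumes the bar means "at least 60%" (the outcomes are the same under "more than 60%" for these numbers), and `passes` is a hypothetical helper.

```python
from fractions import Fraction

THRESHOLD = Fraction(3, 5)  # 60% supermajority, assumed to mean "at least 60%"

ban_votes, votes_cast, eligible = 43, 70, 82
no_shows = eligible - votes_cast


def passes(ban: int, denominator: int) -> bool:
    """True if `ban` votes reach the threshold out of `denominator`."""
    return Fraction(ban, denominator) >= THRESHOLD


print("no-shows excluded:", passes(ban_votes, votes_cast))      # 43/70
print("no-shows as status quo:", passes(ban_votes, eligible))   # 43/82

# Had the no-shows actually voted, the true tally would be 43..55 ban out of 82.
# Find the smallest number of them who would have had to vote ban for it to pass.
needed = next(k for k in range(no_shows + 1) if passes(ban_votes + k, eligible))
print("no-shows needed to vote ban:", needed)
```

Under the eligible-pool denominator, the ban would only have passed if at least 7 of the 12 absent voters had shown up and voted ban. Excluding no-shows gives the same result as assuming they would have split in the same proportion as those who did vote.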

I don't really have any strong feelings on the specifics of an implementation. A proposal like Shurtugal's, with a failsafe minimum number of votes required for a test to be valid, could be a reasonable way to get the best of both worlds and address both extremes of potential concern; so could infracting those who don't show up in the voting thread after qualifying, or whatever else. The point is, there probably isn't a decent one-size-fits-all solution to this problem, and I think the Metagross results demonstrate the problems that come with applying concerns about insufficient voters to a case where the voters do seem genuinely sufficient, and where turnout is already past the point where we have to pick the lesser of two evils.

I totally get the concerns about no-shows leaving a smaller sample to rely on, and about wanting to avoid cases where an excessively small sample determines tiering outcomes for everyone, but I think this case shows the flaws in treating every case as that extreme. We already had a fairly representative sample among the 70 votes cast, and we only muddle things further by treating it as insufficient when I honestly feel that it isn't. Requiring the full 82 to be counted, when that means assuming a side for the people who didn't vote, is the more flawed choice overall. People have already made plenty of points on why "no vote is a vote for the status quo" isn't great, and I don't think I need to reiterate them, but I think the arguments about sample size don't really support counting the full 82 eligible votes over just the 70 votes cast either.

I'd also be curious whether, once this discussion runs its course and if the resulting policy goes in that direction, the Metagross suspect results could be overturned. "We can just suspect it again" seems a bit silly when suspect tests are such a huge community time investment and the 60% supermajority of votes cast already voted for it to be banned, but for the time being I recognize that it's too early to be having that conversation.

Shurtugal · May 27, 2020

Hogg said:
I also believe that it means that if someone participates in the suspect test but then decides not to cast a vote, they are essentially conceding that they do not strongly feel that something should be banned. Ambivalence in the case of a suspect test should never have the result of making something easier to ban, and “do not ban” should almost always be the default position. For that reason, I’ve always gone off of the total number of eligible voters when determining how many votes are required to ban something.

I'm not sure I 100% agree with this.

I remember that in SM OU, when Dugtrio was first suspected, a lot of people leaned towards not banning or Abstaining because they felt the council suspected it way too soon. There was a lack of metagame development and tournament results, which is why it didn't get banned the first time. It got re-suspected and banned the second time, so clearly it was a ban-worthy Pokemon.

The point is that without context, it is hard to tell what the status quo should be; I don't think it can always be defined as "do not ban." Sometimes there are reasons outside the metagame itself, such as how a Tier Leader is running the tier or handling the suspect test, that might drive people to vote Abstain. [Also, a lot of Tier Leaders try to rush a Suspect Test before a major tournament, so this is actually a common issue I have seen over the years. No offense to anyone, but it's just the truth.]

In the case of the first Dugtrio test, imagine if a bunch of people had voted to Abstain. I think this would show that Dugtrio was not ready to be banned at that point in time, but the Abstains would also be evidence of a potential issue, so it should be re-suspected at a later date.

Obviously OU re-suspected it, but in order to do so, they actually had to bend policy by suspecting Arena Trap instead of Dugtrio. If they hadn't been able to use this loophole, it would have taken longer before Dugtrio could be re-suspected.

If Abstain could be used as an option that allows the tier to re-suspect something later, that could make things more efficient moving forward. I'm not sure exactly what numbers you could come up with, but I think an Abstain is sort of saying that the suspect could be ban-worthy, but more time is needed to make a choice.

TLDR: Abstain is a valid vote, and it is hard to identify which side, ban or do not ban, it should lean towards. I don't think it should be counted as either; if anything, a certain number or percentage of Abstains could be an indicator that the suspect needs to be suspected again later on.

Bughouse · May 27, 2020

Apologies for the long post, but I think Hogg is coming at this from the wrong angle and there's just a lot to respond to here... It got long enough that I spun off the second half that's less centrally related as part 2 lmao...

There are real world examples of using the full list of possible voters as the denominator rather than actual votes cast. One example off the top of my head is the US Senate rules around many kinds of votes that require a 3/5 (aka 60%) majority.

The Senate rule is about "three-fifths of the Senators duly chosen and sworn", not 3/5 of those present. That means 3/5 of all sworn-in senators, which includes basically anyone who hasn't died or resigned. Using 100 as a fixed denominator, and therefore always requiring 60 votes in favor of things that require cloture, produces strange results, including a rather important recent one where an amendment failed 59-37. 59/96 is in fact 61.5%, but since the 4 no-shows still count in the denominator, the measure failed. This is despite the fact that we know at least 1 of these 4 missing senators would have voted yes but was still on a flight back to DC, and at least 1 of the 4 couldn't be present for the vote because they were quarantined due to COVID-19. The rule is a rigid 60/100, and so it failed.
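Here is a small Python sketch of the arithmetic in the Senate example. It models the rule as described above (yeas must reach 3/5 of all 100 seats, assuming every seat is filled and sworn) alongside a votes-cast denominator. The variable names are illustrative.

```python
from fractions import Fraction

SENATE_SEATS = 100         # assumes every seat is filled and sworn
CLOTURE = Fraction(3, 5)   # three-fifths

yeas, nays = 59, 37
cast = yeas + nays

share_of_cast = Fraction(yeas, cast)
passes_on_votes_cast = share_of_cast >= CLOTURE
passes_on_full_chamber = yeas >= CLOTURE * SENATE_SEATS  # i.e. 60 yeas, however many show up

print(f"{float(share_of_cast):.1%} of votes cast")
print("passes if denominator = votes cast:", passes_on_votes_cast)
print("passes if denominator = all seats:", passes_on_full_chamber)
```

The only difference between the two checks is the denominator, which is exactly the question this thread is about.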

4% of suspect test voters not showing up or abstaining is not an unreasonable thing to imagine. If US Senators do this when it's literally their job to be present, make decisions, and vote on things (putting aside all cynicism lol), it hardly seems strange to me to imagine that 4% of our qualified suspect voters would fail to turn out. Indeed, for the Mega Metagross test that spawned this thread, it was higher.

And so I understand that there's a concern about what to do if too many people don't vote. But so long as you still have a sufficient quorum of voters among those present, a vote can proceed among those present. Throwing out a total strawman of 40 of 50 voters not voting is not productive, as I'm not aware of any such situation ever arising. Someone could crunch the numbers, but I'd be surprised if absenteeism has ever been higher than 20%, which would still leave, at worst, around 80% of voters voting. And it's probably usually higher than that. That's certainly enough for a quorum, which is usually closer to 50% in most voting systems and sometimes even lower.

I can make a strawman of my own to show the ridiculousness of Hogg's preferred vote counting method, too. In the US, voter turnout for President is around 55% among eligible voters. Clearly that means that 45% of voters believe the status quo is correct and would have voted for the incumbent, so we should just count them that way. We could even argue that they'd also have voted for the candidate of the same party as the incumbent, in cases where the incumbent is term-limited. That's the closest option to status quo, after all.

Both my strawman and Hogg's strawman obviously lead to ridiculous outcomes in situations where a large number of people do not vote.
But while mine is (sadly) a real-world situation, neither mine nor Hogg's actually applies to how suspect tests go down. I can't imagine we've ever had a real suspect vote with turnout as low as 50%, much less 20%. If such votes did happen, I could understand the concern about what to do with them. But they don't.

And in fact, if we want to create a clear rule about what to do in low-turnout cases, I don't think the answer is to count no-shows any one way. If I as a TL had 50 eligible voters but only 10 voted, I'd 1) tag all voters and extend the vote by at least a few days (this is Pokemon, not a sovereign state lol... decisions can wait), and then 2) if there's still no quorum after the additional tag, I'd frankly throw out the suspect test entirely for lack of a quorum. IMO it is easy enough to just add a rule that at least 50% (or whatever % is agreed upon) of qualified voters have to vote for a suspect test to be valid.

So I don't think low-turnout votes actually happen or are really the problem. The kind of vote that we care about, and that does happen, is a vote with a close margin, and such close votes can happen even at 96% turnout. How we treat the missing 4% is enough on its own to tip the scale in those cases.

Bughouse · May 27, 2020

I also understand wanting to make it hard to ban things, but I fail to see how counting abstentions and/or no-shows as votes for the status quo does that effectively or fairly, compared to other, more logical changes such as using a higher ban threshold or requiring 2 consecutive rounds of ban votes, etc.

Suppose a voter pool of 100 people. 66% of them want to ban the suspect, 34% do not. First we're going to consider what happens with 6% absenteeism and then with 12% absenteeism under 2 different voting regimes.

1) 60% ban threshold. Any no-shows will be counted as no ban.
6% absentee case - Even in the worst case, where all 6% who fail to show were pro-ban and their votes get converted to no-ban, the vote still reaches 60-40 and meets the 60% threshold.
12% absentee case - On average, 7.92 of the 66 pro-ban voters (12% of 66) will no-show and have their votes converted to no-ban. So on average, the result will be no ban. There are still possible outcomes where a ban occurs, since the full range runs from 54-46 up to the true value of 66-34, but no ban is more likely than not, with an expected outcome of 58.1-41.9, below the 60% threshold.
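The first regime can be checked exactly rather than just on average. This sketch assumes, as the post does, that the no-shows are a uniformly random subset of the hypothetical 100-person pool, so the number of pro-ban no-shows follows a hypergeometric distribution. The names `pmf` and `regime1` are illustrative.

```python
from fractions import Fraction
from math import comb

POOL, PRO_BAN = 100, 66  # the post's hypothetical pool: 66 pro-ban, 34 no-ban

def pmf(k: int, no_shows: int) -> Fraction:
    """Hypergeometric P(exactly k of the random no-shows were pro-ban)."""
    return Fraction(comb(PRO_BAN, k) * comb(POOL - PRO_BAN, no_shows - k),
                    comb(POOL, no_shows))

def regime1(no_shows: int, threshold: Fraction = Fraction(60, 100)):
    """No-shows counted as no-ban; the denominator is the full pool of 100."""
    # Ban share for each possible number k of pro-ban voters among the no-shows.
    shares = {k: Fraction(PRO_BAN - k, POOL) for k in range(no_shows + 1)}
    expected = sum(pmf(k, no_shows) * s for k, s in shares.items())
    p_ban = sum(pmf(k, no_shows) for k, s in shares.items() if s >= threshold)
    # Worst case: every no-show was pro-ban. Best case: none of them were.
    return shares[no_shows], expected, shares[0], p_ban

for n in (6, 12):
    worst, expected, best, p_ban = regime1(n)
    print(f"{n}% absent: {float(worst):.1%} to {float(best):.1%}, "
          f"expected {float(expected):.1%}, P(ban) {float(p_ban):.2f}")
```

Using `Fraction` keeps the arithmetic exact, so the expected share comes out at exactly 58.08% for 12 no-shows, matching the 7.92 figure above.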

2) 65% ban threshold. Any no-shows are ignored. In both regimes, the people who fail to show up are assumed to be a random sample of the pool.
6% absentee case - The expected value here is still a percentage margin of 66-34, since the 6% who fail to show are random and ignored.
But the worst case, if all 6% were pro-ban, is a percentage margin of 63.8-36.2 (60/94). Essentially, excluding a random sample of no-shows opens up the slim chance that, despite a true population mean of 66%, we'd get a sample mean of 63.8%. Conversely, the upper bound is 70.2-29.8 (66/94) if all the no-shows would have been no-ban voters. But on average, the sample mean stays 66%, which is above the 65% ban threshold.
12% absentee case - The expected value here is still a percentage margin of 66-34, since the 12% who fail to show are random and ignored.
The range of possible margins runs from 61.4-38.6 (54/88) if all the missing voters would have been pro-ban up to 75-25 (66/88) if all the missing voters would have been no-ban. But, again, on average, the sample mean stays 66%, which is above the 65% ban threshold.
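The same random-no-show assumption can be applied to the second regime. The sketch below reproduces the worst and best cases above and shows that the expected ban share among actual voters stays exactly 66%. It defines its own hypergeometric `pmf` helper so it runs standalone; `regime2` is an illustrative name, and the `P(ban)` value goes a step beyond the post by computing the chance the ban clears 65% under that assumption, rather than only the average.

```python
from fractions import Fraction
from math import comb

POOL, PRO_BAN = 100, 66  # the post's hypothetical pool: 66 pro-ban, 34 no-ban

def pmf(k: int, no_shows: int) -> Fraction:
    """Hypergeometric P(exactly k of the random no-shows were pro-ban)."""
    return Fraction(comb(PRO_BAN, k) * comb(POOL - PRO_BAN, no_shows - k),
                    comb(POOL, no_shows))

def regime2(no_shows: int, threshold: Fraction = Fraction(65, 100)):
    """No-shows ignored; the ban share is measured only among actual voters."""
    voters = POOL - no_shows
    shares = {k: Fraction(PRO_BAN - k, voters) for k in range(no_shows + 1)}
    expected = sum(pmf(k, no_shows) * s for k, s in shares.items())
    p_ban = sum(pmf(k, no_shows) for k, s in shares.items() if s >= threshold)
    # Worst case: every no-show was pro-ban. Best case: none of them were.
    return shares[no_shows], expected, shares[0], p_ban

for n in (6, 12):
    worst, expected, best, p_ban = regime2(n)
    print(f"{n}% absent: {float(worst):.1%} to {float(best):.1%}, "
          f"expected {float(expected):.1%}, P(ban) {float(p_ban):.2f}")
```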

Essentially, what you see is that while Hogg is worried about vocal minorities banning Pokemon in extreme cases of absenteeism, in fact all it takes is pretty minor degrees of absenteeism to radically change the expected outcome of a vote.
Even though 66-34 isn't THAT close to the 60% threshold (indeed, it's a 10% relative difference, or 6 points absolute), the outcome of the vote essentially hinges on whether this is a suspect test with relatively little absenteeism or one with somewhat higher, but still pretty low, absenteeism.

In the situation where you throw out the no-shows, the expected outcome is still a ban, though there's a possibility of no ban if those who no-show were more likely to come from the pro-ban camp.

One voting regime assumes how no-shows would have voted, at the risk of those assumptions being wrong. The other still allows absenteeism to factor in, but only through who actually ends up absent; it makes no assumptions about what position those who fail to vote would hold.

Hogg · Jun 7, 2020

Discussion has unfortunately died down on this, and I think it's an important topic that deserves a resolution. It's something I'd like to make a final decision on before the DLC drops, since I anticipate a surge of suspect tests following that.

From discussing this with other tier leaders, it seems I wasn't alone in my assumption of how and why voting works with no-shows; most TLs follow the same assumption. However, I think a lot of good points have been brought up in this thread. I've been thinking of ways we could split the baby: keep minority votes from having an outsized influence due to no-shows without sacrificing the integrity of the voting process. What if we based our votes on a percentage of the total votes cast (as opposed to the total number of qualified voters), but instituted a quorum rule: in order for a vote to be considered valid, at least 75% of all qualified voters must vote? This way, if so many individuals don't care enough to vote that it could influence the voting process, the vote itself is nulled and TLs have the opportunity to re-run the vote.

This isn't being made into formal policy or anything yet; it's just my thoughts on a way we can move forward. I still think we need to have a discussion re: abstentions, but I'd like to at the very least resolve the no-shows issue before we begin running DLC-related tests.
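A minimal Python sketch of this proposal, assuming the 60% OU threshold and the proposed 75% quorum. The function name `hogg_proposal` is illustrative, and abstentions are left out because the post defers that question.

```python
from fractions import Fraction

def hogg_proposal(ban: int, dnb: int, eligible: int,
                  threshold: Fraction = Fraction(60, 100),
                  quorum: Fraction = Fraction(75, 100)) -> str:
    """Ban share of votes cast, valid only if enough qualified voters voted.

    Returns "ban", "no ban", or "void" (quorum missed; the vote can be re-run).
    """
    cast = ban + dnb
    # The quorum check uses all qualified voters as the denominator...
    if cast == 0 or Fraction(cast, eligible) < quorum:
        return "void"
    # ...but the ban decision uses only the votes actually cast.
    return "ban" if Fraction(ban, cast) >= threshold else "no ban"

print(hogg_proposal(42, 23, 80))   # ban: 65/80 voted, 42/65 is about 64.6%
print(hogg_proposal(35, 30, 80))   # no ban: 35/65 is about 53.8%
print(hogg_proposal(30, 20, 100))  # void: only 50% turnout
```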

power · Jun 7, 2020

Hogg said:
What if we based our votes as a percentage of the total votes cast (as opposed to the total number of qualified voters), but instituted a quorum rule: that in order for a vote to be considered valid, at least 75% of all qualified voters must vote? This way, if so many individuals don't care enough to vote that it could have an influence on the voting process, the vote itself is nulled and TLs have the opportunity to re-run the vote.

This seems like a reasonable idea at first, but I disagree with this implementation. People who don't vote shouldn't have an influence on the tiering process, period. If you cannot be bothered to vote, your qualification should have no effect on the tiering process; you should not be counted in the denominator of the percentage, and no criteria should depend on you. The fundamental principle behind voting in general is that those who vote have a say, and that's that. If you have a problem with the way a suspect test is being conducted, you can always vote to preserve the status quo.

The argument that there will be cases where people fail to vote, and thus tests will be decided by only a portion of eligible voters, seems rather unfounded. There is little to no evidence that this happens at all, let alone frequently. I cannot find a single suspect test in the Blind Voting Forum in recent memory where less than 75% of voters voted. The exception seems to be old-generation suspect tests, where voters are selected by other criteria, but even then, 60%+ of eligible voters vote. The people who do not vote should not have any influence on the suspect test just by virtue of qualifying to vote; if you cannot be bothered to vote, your not voting should not affect the outcome any more than that of someone who did not even qualify.

Furthermore, I have issues with the idea of a quorum. It creates perverse mathematical and game-theoretic incentives for people who want to preserve the status quo not to vote. You should not have to think through the game theory implications of casting a vote before you do so, as this undermines the suspect process.

For example, consider the highly simplistic case where 3 people qualify to vote and two of them are staunchly pro-ban in the suspect thread. If you are the third voter and you are anti-ban, a quorum rule would incentivize you not to cast a vote, as casting a do-not-ban vote would lead to the suspect being banned.

This can easily be extended to more complicated cases, such as when 35% of the voter population is do-not-ban and 65% is pro-ban; in that case, anyone who is anti-ban will just not cast a vote and the suspect will fail. In fact, as long as the quorum percentage is higher than the minimum ban percentage, game theory implies that anyone who intends to vote do not ban should instead not cast a vote.
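The incentive can be checked mechanically. This sketch assumes the 60% threshold and 75% quorum from Hogg's proposal, that every pro-ban voter turns out, and that a void vote counts as the suspect failing; `outcome` and `dnb_side_choice` are illustrative names.

```python
from fractions import Fraction

THRESHOLD, QUORUM = Fraction(60, 100), Fraction(75, 100)

def outcome(ban: int, dnb: int, eligible: int) -> str:
    """Votes-cast percentage with a quorum; "void" means the suspect fails."""
    cast = ban + dnb
    if cast == 0 or Fraction(cast, eligible) < QUORUM:
        return "void"
    return "ban" if Fraction(ban, cast) >= THRESHOLD else "no ban"

def dnb_side_choice(ban: int, dnb: int, eligible: int) -> dict:
    """Result if the do-not-ban voters vote, versus if they all stay home."""
    return {"vote": outcome(ban, dnb, eligible),
            "stay home": outcome(ban, 0, eligible)}

print(dnb_side_choice(2, 1, 3))      # {'vote': 'ban', 'stay home': 'void'}
print(dnb_side_choice(65, 35, 100))  # {'vote': 'ban', 'stay home': 'void'}
```

In both examples, showing up to vote do not ban produces a ban, while staying home voids the vote.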

The "problem" that this thread is trying to solve is, in my honest opinion, quite ridiculous. There really isn't a reason for voting to operate any differently than it does in most voting systems. If 65 people cast votes and 42 vote ban, the percentage used should be 42/65 = 64.6%. The reasons given for making it any more complicated than this are rather flimsy.

HoeenHero · Jun 7, 2020

Hogg said:
What if we based our votes as a percentage of the total votes cast (as opposed to the total number of qualified voters), but instituted a quorum rule: that in order for a vote to be considered valid, at least 75% of all qualified voters must vote? This way, if so many individuals don't care enough to vote that it could have an influence on the voting process, the vote itself is nulled and TLs have the opportunity to re-run the vote.

power said:
Furthermore, I have issues with the idea of a quorum. It creates perverse mathematical and game-theoretic incentives for people who want to preserve the status quo not to vote. You should not have to think through the game theory implications of casting a vote before you do so, as this undermines the suspect process.

For example, consider the highly simplistic case where 3 people qualify to vote and two of them are staunchly pro-ban in the suspect thread. If you are the third voter and you are anti-ban, a quorum rule would incentivize you not to cast a vote, as casting a do-not-ban vote would lead to the suspect being banned.

This can easily be extended to more complicated cases, such as when 35% of the voter population is do-not-ban and 65% is pro-ban; in that case, anyone who is anti-ban will just not cast a vote and the suspect will fail. In fact, as long as the quorum percentage is higher than the minimum ban percentage, game theory implies that anyone who intends to vote do not ban should instead not cast a vote.

A solution to this that maintains Hogg's idea would be to simply remove the quorum, which prevents abstain votes from being able to overthrow a pro-ban majority. It's very unlikely the quorum would come into play in any case that wasn't intentionally organized by one side of the vote in an attempt to force the result to maintain the status quo.

Bughouse · Jun 9, 2020

I mean sure, there's a potential problem with the incentives of having a quorum being established at the same time as voting. That's why generally the way it works is you first check if there's a quorum (i.e. checking that 50 people qualified for voting, not 3) and then you vote and majority wins. And trying to shoehorn the establishment of a quorum together with the vote can cause some perverse incentives that are best avoided.

So I recognize that the quorum-at-the-time-of-voting requirement has drawbacks (and I don't actually think it's necessary), but it's the best option I see for addressing the concern that you could theoretically have 20% turnout and ban something 11-9. But yes, high quorum requirements can effectively lead to minority rule. If the US Senate required a 75% quorum to pass legislation, then obviously one party would just not show up to a vote they don't want to lose, and the same could happen in a Smogon suspect test. That's why, in reality, the quorum the US Senate needs to do any business at all is 51 of 100, or 51%. Essentially, the threshold is set so that for most kinds of business (votes that require a supermajority excepted), the minority has no incentive to stay away: the majority party has enough votes both to establish a quorum on its own and to win a majority in the face of opposition.

So this demonstrates how the risk of minority rule is substantially lowered by having a lower quorum. 60%, aligning with the supermajority ban threshold, probably makes the most sense, but even just lowering the quorum threshold to 66% would significantly decrease the likelihood of a scenario where a minority of no-ban voters games the system by no-showing.
60% is the logical "maximum" for the quorum threshold, for reasons power put well: "as long as the quorum percentage is higher than the minimum ban percentage, game theory implies that any one who intends to vote do not ban should instead not cast votes". In practice, I doubt this holds, since we're not all machines and many people laddering are doing so explicitly for TC, but the point still stands that lower quorum thresholds minimize the bad incentives.
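The argument above can be summarized as a window of pro-ban support in which staying home pays off for the no-ban side. This sketch assumes every qualified voter is either pro-ban (and always votes) or no-ban, with a 60% ban threshold; `exploitable_window` is an illustrative name.

```python
from fractions import Fraction

def exploitable_window(threshold: Fraction, quorum: Fraction):
    """Pro-ban support levels (share of all qualified voters) at which the
    no-ban side turns a ban into a void vote by staying home.

    With full turnout, a ban passes once support >= threshold; if only pro-ban
    voters show up, the vote is void while support < quorum. Staying home
    therefore pays off exactly on [threshold, quorum), which is empty when
    quorum <= threshold.
    """
    return (threshold, quorum) if quorum > threshold else None

ban_threshold = Fraction(60, 100)
for q in (Fraction(75, 100), Fraction(66, 100), Fraction(60, 100)):
    window = exploitable_window(ban_threshold, q)
    span = f"{float(window[0]):.0%} to {float(window[1]):.0%}" if window else "none"
    print(f"quorum {float(q):.0%}: exploitable support {span}")
```

Under these assumptions, a 75% quorum leaves a 15-point window, 66% shrinks it to 6 points, and 60% closes it entirely.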

Essentially, the lower the quorum threshold is, the more coordination is needed by someone to lead the "don't vote" campaign and the more obvious it would become that manipulation is occurring, and this sort of manipulation could be punished. No quorum requirement is the logical result, but if that's not an option, then I'd still say 75% is too high and something in the ballpark of 60% would be more reasonable.

cityscapes · Jun 20, 2020

power said:
Furthermore, I have issues with the idea of a quorum. It creates perverse mathematical and game-theoretic incentives for people who want to preserve the status quo not to vote. You should not have to think through the game theory implications of casting a vote before you do so, as this undermines the suspect process.

For example, consider the highly simplistic case where 3 people qualify to vote and two of them are staunchly pro-ban in the suspect thread. If you are the third voter and you are anti-ban, a quorum rule would incentivize you not to cast a vote, as casting a do-not-ban vote would lead to the suspect being banned.

This can easily be extended to more complicated cases, such as when 35% of the voter population is do-not-ban and 65% is pro-ban; in that case, anyone who is anti-ban will just not cast a vote and the suspect will fail. In fact, as long as the quorum percentage is higher than the minimum ban percentage, game theory implies that anyone who intends to vote do not ban should instead not cast a vote.

solving this issue doesn't seem terribly complicated to me. you can just give priority to a rule that ends the vote once the pro-ban votes reach a high enough share of all qualified voters. this would mean that in your examples, the suspected element would get banned once the required number of pro-ban votes went through, regardless of what the dnb people did.

the only remaining thing to address is how to handle dnb option (say, 41% dnb when the suspect needs 60% for a ban) compared to indecisive without quorum (say, 30 ban 20 dnb with 100 voters). i absolutely agree that if the dnb option is chosen by the voters, there should be more "protection" for the suspected element than if the vote is indecisive.

i think we should open up the window of resuspecting the element earlier if the vote is indecisive compared to if voters choose dnb, so intentionally not voting is suboptimal whether or not you want to ban the element. for example, a dnb vote might force 6 months before another suspect on that element, but an indecisive vote might have only 2-3 months before the element gets suspected again. kind of like what others were talking about with the abstain vote.

pretty sure this addresses everything
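cityscapes' combined rule can be sketched as follows, assuming the 60% threshold and 75% quorum discussed above. The ordering of checks is the key design choice: the early-ban check runs before the quorum check, so staying home can no longer block a ban. The function and constant names, and the cooldown values (6 months after DNB, the upper end of "2-3 months" if indecisive), are illustrative.

```python
from fractions import Fraction

THRESHOLD, QUORUM = Fraction(60, 100), Fraction(75, 100)
# Hypothetical resuspect cooldowns in months, from the post's example numbers.
COOLDOWN_MONTHS = {"ban": None, "dnb": 6, "indecisive": 3}

def cityscapes_outcome(ban: int, dnb: int, eligible: int) -> str:
    # Checked first: once ban votes alone reach the threshold of *all* qualified
    # voters, the ban stands no matter what the remaining voters do.
    if Fraction(ban, eligible) >= THRESHOLD:
        return "ban"
    cast = ban + dnb
    # Missing the quorum gives an "indecisive" result rather than a DNB one.
    if Fraction(cast, eligible) < QUORUM:
        return "indecisive"
    return "ban" if Fraction(ban, cast) >= THRESHOLD else "dnb"

for votes in [(2, 1, 3), (2, 0, 3), (30, 20, 100), (59, 41, 100)]:
    result = cityscapes_outcome(*votes)
    print(votes, result, COOLDOWN_MONTHS[result])
```

The `(2, 0, 3)` case shows the point of the ordering: the anti-ban voter staying home no longer changes the result.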

Stratos · Jun 27, 2020

A thought I just had—if Hogg's goal is to prevent a motivated minority from shoving change down the throat of an apathetic majority, there are better ways to measure the size of that apathetic majority than the people who qualified to vote. If you qualified, you are probably closer to the motivated minority: you bothered making a suspect alt, for starters.

We have two pretty solid metrics for measuring the size of a tier's overall dedicated playerbase: entrants in a seasonal, and ladder games. I prefer the former just because 1v1 is going to have a very different ratio of ladder games:players than GSC OU, but either probably works. How about we do something like: If the voter pool for a suspect is smaller than [25%] (this number can be whatever) of the latest seasonal, the vote requires special tiering admin approval to be held? And then remove no-shows from the denominator. I think this has the potential to make everyone happy.
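A sketch of this check, assuming "voter pool" means the number of people who qualified to vote and using the placeholder 25% from the post. The function names are illustrative, and the entrant counts in the example are made up.

```python
from fractions import Fraction

def needs_admin_approval(qualified_voters: int, seasonal_entrants: int,
                         min_share: Fraction = Fraction(25, 100)) -> bool:
    """True if the voter pool is smaller than min_share of the latest seasonal."""
    return Fraction(qualified_voters, seasonal_entrants) < min_share

def ban_share(ban: int, dnb: int) -> Fraction:
    """No-shows are simply absent from the denominator: only cast votes count."""
    return Fraction(ban, ban + dnb)

print(needs_admin_approval(40, 200))      # True: 40 voters is 20% of 200 entrants
print(needs_admin_approval(60, 200))      # False: 30%
print(f"{float(ban_share(42, 23)):.1%}")  # 64.6%
```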

The Official Glyx · Jun 29, 2020

In general, I think there is a discussion to be had about why there needs to be an implicit bias toward maintaining the metagame status quo, when the main point of a suspect test is that the suspected aspect has shown the potential to greatly disrupt the status quo, as demonstrated by the respective metagame's council/tier leaders starting the suspect in the first place. You especially have to consider that ban tests result in a ban more than twice as often as they maintain the status quo. Does it really make sense to presume that no-shows hold the minority preference when the majority preference wins a supermajority of outcomes?

This bias manifests in two forms: the 60% supermajority for tiers other than UU and below, and the counting of no-shows as DNB votes.

The supermajority makes sense, since suspects should ideally happen for things that have a realistic, community-backed shot at getting banned, particularly those that have already been discussed in advance. If a vote is divisive enough to land in the 50-59% range and result in no ban, then it stands to reason that a potentially banworthy (or fine) aspect may need more time to develop and/or fresh perspectives to get involved before a more confident suspect result can occur. The added 10% simply accounts for people who might not have known enough when voting.
Counting no-shows as DNB votes makes considerably less sense; you know how the saying about assuming goes. Treating votes this way creates an incentive for anti-status-quo people who know there is a realistic chance they won't get to vote in time to simply not post reqs at all, and it arbitrarily turns people who aren't as mindful of the voting deadline into an obstacle for the side looking to change the status quo. If the mere possibility that people might not vote benefits one particular side of a decision, then I believe that shows the system itself is skewed.

Next, let's unpack Hogg's reasoning for why no-shows are treated this way:

Hogg said:
I touched on this in the NatDex thread, but in general, banning something should always be an exceptional event, one that only occurs when a significant majority of the community believes that it is unhealthy or problematic. This means that we should always prioritize getting as large a sample size of active players as we can, and that the burden of proof should always rest on the pro-ban side¹. To some degree that’s represented in the fact that you need a majority (60% in OU tiers, 50% +1 in lower tiers) to ban something, but I also believe that it means that if someone participates in the suspect test but then decides not to cast a vote, they are essentially conceding that they do not strongly feel that something should be banned². Ambivalence in the case of a suspect test should never have the result of making something easier to ban, and “do not ban” should almost always be the default position³. For that reason, I’ve always gone off of the total number of eligible voters when determining how many votes are required to ban something.

¹ I disagree. If something has gotten to the point where it's being suspected, this should be the time for the pro-status-quo side to make a case for why the status quo should be maintained. Naturally, there will have been discussion leading into the suspect, which means that convincing enough anti-status-quo arguments would have had to be made to warrant a suspect in the first place.
² They are simultaneously conceding that they do not strongly feel that something shouldn't be banned. Otherwise, they would simply have voted not to ban.
³ The "default" position in decision-making is the lack of a position. You don't simply "default" to knowing exactly what you're going to vote and why right from the start. You need time, experience, and/or research to form an opinion, none of which are characteristic of any kind of "default" setting. Ambivalence in a balanced decision-making process only serves to advantage the side that needs fewer votes to succeed (the status quo), and provides no advantage at all if both sides need the same number of votes to succeed.

In general, the rhetoric used here seems to be anti-abstain as a whole. While I understand that you'd prefer to keep abstains and no-shows as separate discussion points, I simply don't think that is feasible, especially when using this kind of wording to make it seem like the lack of a preference towards either side at all is a problem. Additionally, while I understand that banning something should be "hard", why do no-shows in particular need to be arbitrarily weaponized into a hurdle for changing the status quo as opposed to simpler options like forcing a higher supermajority percentage?

Abstentions, especially in this environment where voting in suspects is incentivized by the promise of a badge and/or smogon clout™, are mostly useful as a safeguard for people who don't care all that strongly about the result but got reqs anyway, either for the sake of getting TC or just for flexing purposes: people who might otherwise skew the vote even worse by using !pick or by voting however their friend tells them to without doing any research of their own. By no means does having abstain as an option solve these problems, but at the very least it serves to mitigate them.

I've got a few ideas that I think would help steer things in the right direction:

- Make Abstain a proper and legitimate vote option that does not add to the denominator when the results are tallied
- Make no-shows default to Abstain
- Infract no-shows*
- Impose a minimum voter % quorum, but make it a lower value to keep people from gaming the system with no-shows
- Force a re-vote if there are enough no-shows to affect the result, or otherwise extend the deadline to like a week or something
- Require a set minimum vote count in order for a suspect to be viable*
- Add a feature to the Blind Voting subforum that explains why people voting in their first suspect aren't able to post immediately
- Make a method for notifying users that they have to vote that is more reliable than just tagging them in the BV thread OP
- Post the BV thread when the suspect starts and add people as they post reqs so they can vote (or at least reserve a vote) immediately
  - Vote "reservations" could also be handled separately in a private thread within a private subforum of Blind Voting, to keep the public thread from being flooded with spam, if the above suggestion (or something like it) is approved

* These marked options are capable of being applied differently for official tiers vs non-official
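The proposal to force a re-vote when there are enough no-shows to affect the result can be made precise. This sketch assumes the first item on the list (abstentions and no-shows excluded from the denominator) and a 60% threshold; `result` and `no_shows_could_flip` are hypothetical names.

```python
from fractions import Fraction

def result(ban: int, dnb: int, threshold: Fraction) -> str:
    """Abstentions and no-shows never enter the denominator."""
    cast = ban + dnb
    if cast == 0:
        return "no ban"
    return "ban" if Fraction(ban, cast) >= threshold else "no ban"

def no_shows_could_flip(ban: int, dnb: int, no_shows: int,
                        threshold: Fraction = Fraction(60, 100)) -> bool:
    """True if the no-shows, had they voted, could have changed the result.

    Adding ban votes never lowers the ban share and adding DNB votes never
    raises it, so checking the two all-one-way extremes covers every split.
    """
    actual = result(ban, dnb, threshold)
    return (result(ban + no_shows, dnb, threshold) != actual
            or result(ban, dnb + no_shows, threshold) != actual)

print(no_shows_could_flip(58, 38, 4))  # True: 58/96 passes, 58/100 would not
print(no_shows_could_flip(70, 26, 4))  # False: 70/100 still clears 60%
```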

Doesn't have to be all of these (some conflict with one another), but I do believe these all serve to improve the suspect voting experience for everyone, in addition to avoiding confusion in the results of a suspect. There are still discussions to be had over things like generally smaller suspects (OMs/Oldgens), whether and to what extent the concepts discussed here should also apply to unban tests, how these and other methods could be scaled based on voter count, etc., so hopefully a decision on this much can be reached sooner rather than later, before the current system keeps another (potentially official) suspect from getting accurate results.

Hogg · Jul 4, 2020

Thanks to all for your input. There were a lot of very reasonable concerns raised, and we will be formally and officially adjusting our policy regarding abstentions and no-show votes to accommodate them.

Moving forward, the following clarifications apply to all tiering votes:

- "Majority" or "supermajority" votes will be determined as a percentage of the votes cast, not of the total potential number of voters. This aligns with standard parliamentary voting procedure, and means that failing to vote will not effectively count as a "do not ban" vote.
- Abstentions will be excluded when calculating votes. This means that abstaining will be functionally the same as not voting. As of now, abstain votes will still count toward Tiering Contributor requirements, though this is definitely something I'd like to keep discussing, since there was a lot of disagreement about whether this should continue to be the case.
- All tiering votes must be open for a minimum of 72 hours, and must have a designated end point after which cast votes will no longer be counted. Tier leaders can announce a vote result early if it reaches a threshold where subsequent voters can have no possible impact on the final outcome, but the vote should still remain open until the final end point so that all have a chance to vote. Votes cast after the designated end point will not be counted, without exception, even for TC purposes. Tier leaders may wish to consider extending the voting period to 120 hours (five days) or a full week for tests with a particularly large number of voters or in tiers with less active playerbases.
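The early-announcement rule (a result can be called once later voters can't change it) can be expressed as a small check. This sketch assumes a 60% threshold (the OU figure from earlier in the thread; other tiers use other thresholds) and that the number of eligible voters who haven't voted yet is known; `tally` and `early_result` are illustrative names.

```python
from fractions import Fraction

def tally(ban: int, dnb: int, threshold: Fraction = Fraction(60, 100)) -> str:
    """Result as a share of votes cast; abstentions and no-shows are excluded."""
    cast = ban + dnb
    if cast == 0:
        return "no ban"
    return "ban" if Fraction(ban, cast) >= threshold else "no ban"

def early_result(ban: int, dnb: int, yet_to_vote: int,
                 threshold: Fraction = Fraction(60, 100)):
    """Return the outcome if no remaining voter can change it, else None.

    A remaining voter can vote ban, vote DNB, or abstain (excluded). The ban
    share is lowest if all of them vote DNB and highest if all vote ban, so
    the result is settled when those two extremes agree.
    """
    low = tally(ban, dnb + yet_to_vote, threshold)
    high = tally(ban + yet_to_vote, dnb, threshold)
    return low if low == high else None

print(early_result(50, 10, 20))  # ban: even 50/80 = 62.5% clears 60%
print(early_result(40, 30, 10))  # None: between 50% and 62.5%, still open
print(early_result(10, 40, 5))   # no ban: at most 15/55, about 27%
```

Per the policy, a settled result may be announced early, but voting itself still stays open until the designated end point.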

I would still like to look into potentially setting a minimum threshold for voter participation (e.g. if greater than X% of eligible voters do not participate, then the vote must be recast), but further discussion is required for a policy of that nature.

Thanks to all who have contributed to this discussion, both here and in Discord.
