Cleaning up the oldgen teambuilders

Stratos

- OTL
is a Tiering Contributor
As metas get old, they continue to change, but their tiering stays frozen from the last time they were current gen. In some ways this is what we want, because it keeps oldgen lower tiers intact. In other ways it isn't: namely, it makes the teambuilder's sorting algorithm kind of suck sometimes. Here is a picture of the BW DOU teambuilder as it currently appears, with every Pokemon that did not get a single use in the last DPL crossed out.

[Screenshot: the BW DOU teambuilder, with every Pokemon unused in the last DPL crossed out]


BW DOU is probably the single most affected oldgen on the entire site. The ladder only existed for like 9 months, and most players were previously singles players who had no idea what they were doing, so this is a bit of a cherry-picked example. However, all oldgens are affected to some extent: something like 8 mons that are currently "DPP OU" are no longer OU by usage on the current DPP ladder. This is not a catastrophic problem, but I think it's worth looking at solutions.


I've been told that without running a relatively big coding project, there are only two real ways to clean up the teambuilder:

1. Actually just re-tier old gens.
2. Expand the definition of "OU by Technicality" in old gens to include Pokemon that were frozen into OU but aren't popular anymore.

While the former would work just fine for DOU (since oldgen lower tiers aren't played), it wouldn't work for singles oldgens. The latter basically has no drawback as far as I can see, though, and I encourage us to adopt it.


Then the question becomes, how do we determine which mons to place in "OU by Technicality?"

1. Current ladder usage stats
2. Recent tournament usage stats
3. Subjectively by a council, like how VRs are handled

For tiers which have a ladder, option 1 is probably the obvious choice. It's aligned with how we handle current gen tiering and should give a relatively good picture of the current state of the meta. After I brought this up, I learned that we were apparently already planning on implementing at least this much, which is cool.

For tiers which do not have a ladder, it's kinda rough. I don't like tournament usage stats: they're heavily skewed in the early rounds by which samples are available, and in the late rounds simply by who makes a deep run. We do weight ladder stats by skill, but I think that's different from "Tornadus has 4% usage because Stratos won the tour and has a team with Tornadus on it." On the flip side, subjective metrics are, well, subjective, which isn't great either.

I think the best path forward is to use recent tournament stats, but to be very conservative about it: don't move a Pokemon to "OU by Technicality" unless it has <1% usage. Since we're only using this for "drops" and not rises, this should be safe, and it would clear out the worst offenders, which is all I'm really looking to do here.
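Here is a minimal sketch of that conservative rule. It assumes usage is counted per team, so each game contributes two teams and a species counts at most once per team. It also assumes a Pokemon can only ever be dropped from its current tier, never promoted. The function names and the sample teams are hypothetical and made up for illustration. The toy example uses a 30% cutoff only because four teams is a tiny sample; the proposal itself is 1%.

```python
from collections import Counter


def usage_percentages(teams):
    """Share of teams (in %) that include each Pokemon; each team counts once."""
    if not teams:
        return {}
    counts = Counter()
    for team in teams:
        counts.update(set(team))  # a species is counted at most once per team
    return {mon: 100.0 * n / len(teams) for mon, n in counts.items()}


def technicality_drops(tier_members, usage, cutoff=1.0):
    """Current tier members below the cutoff: only drops are produced, never rises."""
    return sorted(mon for mon in tier_members if usage.get(mon, 0.0) < cutoff)


# Made-up example: four teams from two games.
teams = [
    ["Tyranitar", "Latios", "Tornadus"],
    ["Tyranitar", "Scizor"],
    ["Tyranitar", "Latios"],
    ["Tyranitar", "Heatran"],
]
usage = usage_percentages(teams)
print(technicality_drops({"Tyranitar", "Latios", "Scizor", "Smeargle"}, usage, cutoff=30.0))
```

A Pokemon that never appears in the sample (Smeargle here) is treated as 0% usage, so it drops too.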

I haven't answered every question: what tournaments should we use? How often should we run usage updates? Who is responsible for gathering tournament data? But I think these minor implementation details can be hammered out after we decide on a general course.
 

Hogg

grubbing in the ashes
is a member of the Site Staff, is a Smogon Social Media Contributor, is a Community Contributor, is a Tiering Contributor, is an Administrator, is a Tournament Director Alumnus
UU Leader
I mentioned this on Discord, but I'd actually been working with Marty to implement something like this for old gen OU tiers. The basic idea was to take usage-based old gens (i.e. DPP and later) and run extended ladder stats (we agreed on 6-month unweighted periods, with 2.28% as the drop cutoff). Everything below the threshold would then get an "OU by technicality" status for the builder and Dex. This way we still capture a picture of what usage looked like at the close of the gen. We also separate out things that are trapped in OU only because we no longer shift things based on usage, and we do it without screwing up old gen lower tiers.
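A rough sketch of the averaging step, assuming "unweighted" here means each month's usage percentage counts equally in the six-month average. The function names and the monthly numbers are hypothetical and made up for illustration. This is not the actual stats script.

```python
def average_usage(monthly_stats):
    """Equal-weight mean of monthly usage %; a month without a Pokemon counts as 0%."""
    if not monthly_stats:
        return {}
    mons = set().union(*monthly_stats)  # every Pokemon seen in any month
    return {
        mon: sum(month.get(mon, 0.0) for month in monthly_stats) / len(monthly_stats)
        for mon in mons
    }


def ou_by_technicality(ou_members, monthly_stats, cutoff=2.28):
    """OU members whose averaged usage falls below the drop cutoff."""
    averages = average_usage(monthly_stats)
    return sorted(mon for mon in ou_members if averages.get(mon, 0.0) < cutoff)


# Made-up numbers: six monthly snapshots.
months = [{"Snorlax": 3.0, "Togekiss": 5.0}] + [{"Snorlax": 2.0, "Togekiss": 5.0}] * 5
print(ou_by_technicality({"Snorlax", "Togekiss", "Ninjask"}, months))
```

In this toy data, Snorlax clears the cutoff in one month but averages about 2.17% over the period, so it still drops. Averaging over a long window is what keeps a single good or bad month from deciding the outcome.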

So, for example, this is what it looked like when we ran those stats last year for DPP OU:

Code:
Electivire moved from OU to "OU by technicality" 
Snorlax moved from OU to "OU by technicality" 
Umbreon moved from OU to "OU by technicality" 
Ninjask moved from OU to "OU by technicality" 
Togekiss moved from OU to "OU by technicality" 
Dusknoir moved from OU to "OU by technicality" 
Vaporeon moved from OU to "OU by technicality" 
Smeargle moved from OU to "OU by technicality"
(Disclaimer: the current stats may not still look like this; these were how they looked last year.)

Anyhow, we'd actually planned to implement this in January... and then Sword and Shield tiering turned out to be more complicated than I'd initially expected and I completely forgot about it until you and others mentioned a similar move on Discord :psynervous:. The original plan was to run these stats every January and July, so I've followed up with Marty and we plan to get the ball rolling on this after July's tier shifts are completed.

Obviously this only works for old gens that have an active ladder that runs year-round, so it would not help for things like BW DOU, but it's still relevant here. I'd love to see some discussion on how we can do something similar for tiers without a year-round ladder as well. (I personally prefer objective methods like tournament usage stats to subjective methods like VRs or council fiat, but again open to seeing what others think.)
 

Molk

Godlike Usmash
is a Top Tutor Alumnus, is a Site Staff Alumnus, is a Team Rater Alumnus, is a Super Moderator Alumnus, is a Community Contributor Alumnus, is a Live Chat Contributor Alumnus, is a Tiering Contributor Alumnus, is a Top Contributor Alumnus
Wanted to just pop in and say that I strongly support an "in tier by technicality" section on the teambuilder for old gens. Recently I've been working with a few friends on a project to document the history of (and, to an extent, revive) BW RU, and I believe BW RU is a perfect example of this. Compare the recent RUPL usage stats with what's listed as RU on the strategydex and the PS! teambuilder: the gap between what those two show and what's actually being used is almost comical. To put this into perspective, my friends and I took a look at the cumulative usage stats from the past two RUPLs. We applied the 3.41% cutoff that was being used for usage stats at the time and checked what actually cleared it in each RUPL. As expected, a ridiculous number of Pokemon listed as RU on the strategydex/teambuilder didn't see enough usage in RUPL to clear this cutoff. Here are the RU Pokemon that fell short, one list per RUPL:

Accelgor, Archeops, Bouffalant, Cinccino, Crawdaunt, Crustle, Dusknoir, Electivire, Feraligatr, Ferroseed, Gallade, Galvantula, Hariyama, Hitmonchan, Klinklang, Magmortar, Medicham, Poliwrath, Quagsire, Sandslash, Scyther, Typhlosion, Whimsicott


Absol, Accelgor, Archeops, Bouffalant, Cinccino, Clefable, Crawdaunt, Crustle, Dusknoir, Electivire, Feraligatr, Galvantula, Hariyama, Hitmonlee, Hitmonchan, Klinklang, Magmortar, Manectric, Mesprit, Omastar, Poliwrath, Quagsire, Sandslash, Scyther, Typhlosion, Whimsicott


Of course, each of these covers just one year, and the sample size for a tournament is small. So I cross-referenced the two lists to see which Pokemon didn't make the cutoff in either year. Just as I expected, there's still a large number of Pokemon listed as RU on the teambuilder/strategydex that currently aren't being used very much in RUPL:

Accelgor, Archeops, Bouffalant, Cinccino, Crawdaunt, Crustle, Dusknoir, Electivire, Feraligatr, Galvantula, Hariyama, Hitmonchan, Klinklang, Magmortar, Poliwrath, Quagsire, Sandslash, Scyther, Typhlosion, Whimsicott
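Cross-referencing the two per-RUPL lists is just a set intersection. The sketch below copies the two lists above in the order they were posted. Which RUPL each list belongs to isn't stated, so the variable names are neutral. The intersection reproduces the combined list, and the symmetric difference shows which Pokemon missed the cutoff in only one of the two years.

```python
# The two per-RUPL lists above, in the order they were posted.
first_list = {
    "Accelgor", "Archeops", "Bouffalant", "Cinccino", "Crawdaunt", "Crustle",
    "Dusknoir", "Electivire", "Feraligatr", "Ferroseed", "Gallade", "Galvantula",
    "Hariyama", "Hitmonchan", "Klinklang", "Magmortar", "Medicham", "Poliwrath",
    "Quagsire", "Sandslash", "Scyther", "Typhlosion", "Whimsicott",
}
second_list = {
    "Absol", "Accelgor", "Archeops", "Bouffalant", "Cinccino", "Clefable",
    "Crawdaunt", "Crustle", "Dusknoir", "Electivire", "Feraligatr", "Galvantula",
    "Hariyama", "Hitmonlee", "Hitmonchan", "Klinklang", "Magmortar", "Manectric",
    "Mesprit", "Omastar", "Poliwrath", "Quagsire", "Sandslash", "Scyther",
    "Typhlosion", "Whimsicott",
}

# Missed the cutoff in both RUPLs: present in both sets.
missed_both = sorted(first_list & second_list)
# Missed it in only one of the two RUPLs: present in exactly one set.
missed_once = sorted(first_list ^ second_list)

print(", ".join(missed_both))
```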


I used the same process but looked through the entirety of the usage stats compiled, and there were plenty of Pokemon on this list that didn't see a single use in at least one of the years of RUPL, and three (Dusknoir, Hariyama, Quagsire) that didn't see any usage at all over a full two years of RUPL, yet are still listed as RU in the dex and teambuilder.

Then of course there are Pokemon that are listed as NU on the teambuilder that are seeing significantly more usage and are considered way more viable than these Pokemon, yet are not considered RU on the teambuilder (Golurk and Alomomola being the most egregious in my opinion). There's really not much that can be done about this without tiers getting unfrozen, which we obviously don't want, but it's funny and kind of sad that there are like 10+ basically irrelevant Pokemon in this metagame that are being shown to the player before Pokemon that never made the cutoff on the ladder when the tier was current but are common and strong picks in modern tourney play.

I think it's important for past gens to have what's listed in the builder reflect what's actually being used in tournament play, because of something known as no-context usage. A new player looking to teambuild or play in these past generation metagames often won't know where to find resources outside of the Showdown teambuilder. As a result, they're strongly biased toward picking the Pokemon the teambuilder or strategydex shows them, since they'd assume those are relevant to the metagame, even if they've fallen out of favor. If what's shown on the teambuilder/strategydex doesn't match the current state of the metagame, that new player will be at a massive disadvantage. They'll struggle to learn the meta and/or to build a proper team that's adequately prepared for the threats they might come across.

For example, consider the following:

https://pokepast.es/abc3cf7654c624fe

My friends and I are going to be running a BW RU tournament for fun. This is an actual team that one interested player made by going through the teambuilder himself, without asking us for any advice on what to watch out for. He was shown Whimsicott and Galvantula on the teambuilder, even though they're not considered very relevant compared to threats such as Lilligant and Moltres (his team is extremely weak to both). He gravitated toward those two even though more consistent options exist. This isn't to say they're completely unviable, just that other options should be considered first. A new player might not know to go on the Smogon forums or to look for resources at all, let alone where to find them. For a past gen format such as BW RU or BW DOU especially, the teambuilder and the strategydex might be the only things they have. For what it's worth, this player actually played quite a bit of BW OU when it was current, so he's not new to Pokemon or anything. The teambuilder is just so out of touch with what's relevant in the BW RU metagame that he was left completely lost when trying to build a team.

You can see evidence of no-context usage in basically any metagame, past or present. It's a big part of the reason why people on the lower end of the ladder tend to use Pokemon that the higher-level playerbase considers unviable: it's in the teambuilder, so it might be viable in their eyes. Consider the RoA spotlight ladders for BW RU (small sample size, I know, but it's the best you're going to get for an old meta like this). A lot of the Pokemon that saw little if any usage in RUPL have vastly inflated usage there, often with very negative weighting attached to them. Making the teambuilder reflect what's used in tournament play for these past generation metagames makes it much easier for newer players who want to try them out of curiosity to build teams and start playing at a decent level. There really isn't any drawback to doing it either.




It's quite possible that I missed something, so just point it out if you see it.

TLDR:
[TLDR image]
 

Marty

Always more to find
is a member of the Site Staff, is a Battle Simulator Administrator, is a Programmer, is a Super Moderator, is a Top Researcher
Research Leader
Hogg explained this above, so here's what the "drops" to OU by technicality (written as "(OU)" below) would look like for the past 6 months, averaged equally with a 2.28% "drop" cutoff.
Gen 7
No changes!

Gen 6
Dugtrio moved from OU to (OU)
Heracross-Mega moved from OU to (OU)

Gen 5
Blissey moved from OU to (OU)
Donphan moved from OU to (OU)
Dugtrio moved from OU to (OU)
Jolteon moved from OU to (OU)
Metagross moved from OU to (OU)
Venusaur moved from OU to (OU)

Gen 4
Dusknoir moved from OU to (OU)
Electivire moved from OU to (OU)
Jolteon moved from OU to (OU)
Ninjask moved from OU to (OU)
Shaymin moved from OU to (OU)
Smeargle moved from OU to (OU)
Snorlax moved from OU to (OU)
Tentacruel moved from OU to (OU)
Togekiss moved from OU to (OU)
Umbreon moved from OU to (OU)
Vaporeon moved from OU to (OU)

I ran July to December of last year as well, just to show that this is actually quite consistent as a method to remove Pokemon that no longer see usage from the top part of the teambuilder.
Gen 7
No changes!

Gen 6
Dugtrio moved from OU to (OU)
Heracross-Mega moved from OU to (OU)

Gen 5
Dugtrio moved from OU to (OU)
Haxorus moved from OU to (OU)
Metagross moved from OU to (OU)
Vaporeon moved from OU to (OU)
Venusaur moved from OU to (OU)

Gen 4
Dusknoir moved from OU to (OU)
Electivire moved from OU to (OU)
Jolteon moved from OU to (OU)
Ninjask moved from OU to (OU)
Smeargle moved from OU to (OU)
Snorlax moved from OU to (OU)
Togekiss moved from OU to (OU)
Umbreon moved from OU to (OU)
Vaporeon moved from OU to (OU)
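One way to put a number on how consistent the two runs are, as a sketch: treat each period's drops as a set and compare the sets per gen. The overlap measure (Jaccard similarity: shared drops divided by all drops) and the names below are an illustrative choice, not part of the method above. The data is copied from the two sets of results in this post.

```python
def overlap(a, b):
    """Jaccard similarity: shared drops divided by all drops (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Drops from the past six months, then from July to December of last year.
recent = {
    "Gen 7": set(),
    "Gen 6": {"Dugtrio", "Heracross-Mega"},
    "Gen 5": {"Blissey", "Donphan", "Dugtrio", "Jolteon", "Metagross", "Venusaur"},
    "Gen 4": {"Dusknoir", "Electivire", "Jolteon", "Ninjask", "Shaymin", "Smeargle",
              "Snorlax", "Tentacruel", "Togekiss", "Umbreon", "Vaporeon"},
}
previous = {
    "Gen 7": set(),
    "Gen 6": {"Dugtrio", "Heracross-Mega"},
    "Gen 5": {"Dugtrio", "Haxorus", "Metagross", "Vaporeon", "Venusaur"},
    "Gen 4": {"Dusknoir", "Electivire", "Jolteon", "Ninjask", "Smeargle", "Snorlax",
              "Togekiss", "Umbreon", "Vaporeon"},
}

for gen in recent:
    a, b = recent[gen], previous[gen]
    print(f"{gen}: overlap {overlap(a, b):.2f}, "
          f"only recent {sorted(a - b)}, only previous {sorted(b - a)}")
```

On this data, every Gen 4 drop from the earlier period shows up again in the recent one, with only Shaymin and Tentacruel added. Gen 5 has more turnover between the two periods.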

The best way to handle formats without a constant ladder remains to be seen, but there are already some good ideas in this thread to lay the groundwork.
 
Thank you Marty. These results for Gen 4 seem to show that this is a very good way of indicating true usage, as all of these mons have kind of dropped off the radar in the post-DPP years. Will the opposite idea be implemented as well, a "UU by technicality"? For example, Clefable is very clearly OU going by today's usage, but of course we cannot remove it from the DPP UU tier. This would help clarify the most used mons even more.
 

Stratos

- OTL
is a Tiering Contributor
Understandably, considering the shit going down in IS lately, this hasn't pulled many replies. Also, controversial threads tend to pull more replies, and we seem to be in pretty unanimous agreement that we should implement something like this. However, I don't want this to die without action being taken, so after thinking about it some more, here's my proposal for a full policy (for non-laddered oldgens):

--

Once per year, prior to the July tier shift, tier leaders can provide oldgen tournament stats for their respective oldgens to uh... whoever is in charge of doing tier shift stuff. The tier leaders are allowed to pick whatever tournaments they feel are representative of the state of the metagame, following three criteria:
1. If a tour game is included in the stats, every game from that round and later rounds of the tour must also be included
2. The tours included must be hosted on Smogon
3. The section leader must say which tours/rounds were included in their stats

According to these stats, Pokemon will move (downward only) into the (XU by technicality) tier with the following cutoffs, based on sample size:
100+ games (i.e. 200+ teams): 2%
50+ games (i.e. 100+ teams): 1%
25+ games (i.e. 50+ teams): 0%
<25 games (i.e. <50 teams): no shifts made

Tier leaders are not required to submit these stats. If they don't, no shifts will happen.

--

to explain my reasoning behind each piece:

> Once per year, prior to the July tier shift

Non-laddered oldgens don't shift very fast. Once per year I think strikes the right balance of not requiring people to compile stats too often while staying on top of things. I chose July just because it's currently July. Ideally I would like for a special shift to occur shortly after this policy is adopted, and then be regularized into July.

> tier leaders can provide oldgen tournament stats for their respective oldgens

Mainers of each tier are going to be the right people to pick here, as they are both the only ones who are gonna care enough to compile these stats and the ones who know their community well enough to know which tours to use. Tier leaders don't have to do the grindy work themselves, they can and probably will delegate (or be bothered by oldgen crusts like myself to please take the stats I already compiled), but they're the natural choice to own the final product.

Compiling the stats should not be too hard thanks to the valiant work of eo making this app https://replaystats-eo.herokuapp.com/. Really all that is required is picking the replays to be used.

> 1. If a tour game is included in the stats, every game from that round and later rounds of the tour must also be included

I trust people not to be goons about which tours they choose, but this is just an anti-cherry-picking provision to be safe. If the wording is unclear, it basically means that if I use a replay from top 32 for my stats, I have to use every replay from top 32 onward. So I can't just pick the replays of the players I think are good, or leave out specific replays I don't personally think are "representative."
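To make the rule concrete, here is a small sketch of a check that a replay selection follows it. For each tour used, it finds the earliest round included and requires every replay from that round onward. The `Replay` record, its fields, and the function names are hypothetical. Rounds are assumed to be numbered so that later rounds have larger numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replay:
    tour: str
    round_no: int  # 1 = first round; larger numbers are later rounds
    game_id: str


def required_replays(all_replays, tour, start_round):
    """Every replay of `tour` from `start_round` onward, i.e. what the rule requires."""
    return {r for r in all_replays if r.tour == tour and r.round_no >= start_round}


def follows_rule(selected, all_replays):
    """True if, for every tour used, nothing from its earliest included round onward is left out."""
    chosen = set(selected)
    for tour in {r.tour for r in chosen}:
        start = min(r.round_no for r in chosen if r.tour == tour)
        if not required_replays(all_replays, tour, start) <= chosen:
            return False
    return True


# Made-up tour: three rounds with two games each.
replays = [Replay("Tour A", rnd, f"a{rnd}-{i}") for rnd in (1, 2, 3) for i in (1, 2)]
print(follows_rule(required_replays(replays, "Tour A", 2), replays))
```

Dropping a single later-round game from an otherwise valid selection makes the check fail. Starting from a later round, such as using only the final, is still allowed.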

> 2. The tours included must be hosted on Smogon

Basically just following the precedent from the RBY situation. I doubt this is a real problem for any usage-tiered oldgen, though; that was pretty RBY-specific.

> 3. The section leader must say which tours/rounds were included in their stats

Just for transparency. I don't think anyone's stats are ever going to be contentious, but in case they are...

> downward only

While I would love Tomahawk's suggestion for lower tier mons that have become popular to move into some (Not OU by Technicality) bracket, I'd like to focus the policy on things that don't require any programming for now, so that may have to wait.

> cutoffs, based on sample size

I absolutely pulled these numbers out of my ass, and I'm open to them being tuned, but I think some kind of gradation is definitely a good idea. This allows less active metas to still address the absolute worst cases, while refining our tiering more toward accuracy when we have a larger sample size to be confident with.
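A minimal sketch of the graduated cutoffs as proposed (100+ games: 2%, 50+: 1%, 25+: 0%, fewer than 25: no shifts). One assumption is needed to make the 0% level meaningful: it is read as "zero recorded uses". At the other levels, a Pokemon drops when its usage is strictly below the cutoff. The function names and usage numbers are hypothetical and made up for illustration.

```python
def drop_cutoff(games):
    """Drop cutoff (in %) for a sample of `games` games, or None if too small to shift."""
    if games >= 100:  # 200+ teams
        return 2.0
    if games >= 50:  # 100+ teams
        return 1.0
    if games >= 25:  # 50+ teams
        return 0.0
    return None  # <25 games: no shifts made


def should_drop(pct, cutoff):
    # Assumption: the 0% level means "zero recorded uses"; otherwise strictly below the cutoff.
    return pct == 0.0 or pct < cutoff


def tournament_drops(tier_members, usage, games):
    """Tier members that move (downward only) into the by-technicality tier."""
    cutoff = drop_cutoff(games)
    if cutoff is None:
        return []
    return sorted(mon for mon in tier_members if should_drop(usage.get(mon, 0.0), cutoff))


# Made-up usage numbers; Quagsire is absent, i.e. it was never used.
usage = {"Whimsicott": 1.5, "Galvantula": 0.5}
members = {"Whimsicott", "Galvantula", "Quagsire"}
for games in (120, 60, 30, 10):
    print(games, tournament_drops(members, usage, games))
```

As the sample grows, the same usage numbers produce more drops. That is the point of the gradation: small metas only lose Pokemon nobody used at all.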

> Tier leaders are not required to submit these stats. If they don't, no shifts will happen.

I don't want to put much more work on people's plates than they already have. If nobody cares enough to compile these stats, we can assume the meta hasn't shifted enough for people to feel like the teambuilder is badly out of date. And even if that assumption is false, apparently nobody cares.

--

Thoughts?
 
Has there been any progress with regard to implementing this proposal? We have stats readily available, as prior posts show, so there seem to be no roadblocks to fixing this issue. It would be a shame to see nothing happen with this topic.
 

Plague von Karma

Rhamantaidd
is a Pokemon Researcher
My first post in Policy Review, if I say anything iffy please don't be afraid to let me know! I followed this back before I got a Researcher badge, and helped Molk out with his look-see on the BW RU usage stats. While I don't want to go out and parrot what he said, there are some small points I want to bring up.

This proposal would solve some things I noticed with BKC's attempts to fix the DPP OU tiering. Last year, he tried to move some Pokemon to BL, but Hogg rightfully stopped it. This was due to it being viability-based, contrary to the usage-based tiering that was run at the time. An implementation of this system would, theoretically, solve the conflict there, allowing the changes to go ahead in this fashion instead.

Marty mentioned that formats without a constant ladder would be difficult to tier, and while I agree, I thought I'd give some thoughts on the matter. Ruins of Alph Spotlight ladders could be a decent place to start, as even formats like BW RU have had ladders before (Jan 2018, Nov 2019 1500; I believe Molk didn't note one of these). It's definitely not the best data, since it suffers from "no-context usage", but it's something. I think Molk's suggestion to use tournament usage stats would work excellently alongside this, possibly painting a slightly better picture of the format. A yearly review using data like this could work well, possibly alongside some kind of renovation of how Staff Picks are handled. It's still quite cloudy. Perhaps the two sets of stats could be merged, but that may create some inaccuracies.

In fact, given how the format is presented in the teambuilder in the first place, I think that could be a larger problem than one would think. One of the biggest things holding Past Gen players back on PS is the way the Pokemon are represented. They're currently listed in alphabetical order (which I have my own grievances with, but this isn't the thread for that), and I genuinely think this causes Pokemon like Ambipom to have inflated usage on the low ladder. It sounds comical, I know, but it's a thing. There's definitely more that goes into this, and you can bring up counterpoints: Durant, a big threat in BW RU, was almost consistently below 8% usage for the longest time. You can also see the issue with no-context usage in those rotational ladder statistics. Lilligant, a threat that was almost banned once, was just barely over the hypothetical usage cutoff, despite being used significantly more in RUPLs.

The ideas Stratos had were great. I think with this implemented, RoA Rotational statistics would improve dramatically and gain a lot more objectivity. Without Pokemon that have been awful for years potentially serving as bad influences, this "Soft-BL" would at least make the ladders better by giving a proper representation of the format.

A while ago I worked on a project with Molk and some RU heads, compiling a bunch of BW RU data, including usage graphs for some of the bigger threats. Since the person helming the project and I parted ways, it's largely dead now. I'll leave it here in case it helps the discussion at all.

I really want to see this proposal implemented. It would be so, so helpful. Thanks for hearing me out!
 

