Metagame Analyses: Gen VI changes

alkinesthetase · Aug 31, 2012

crap this is super cool. good stuff antar, statistics is the shit. will you be providing analysis of both bias and stalliness, or will you try to focus on just one?

Antar · Sep 1, 2012

alkinesthetase said:
will you be providing analysis of both bias and stalliness, or will you try to focus on just one?

Nah, I'm not reporting bias.

Antar · Sep 7, 2012

So based on the fact that the distributions of "stalliness" in most of the metagames basically look like bell curves (instead of showing peaks around certain values), it looks like the only good way to come up with stall-score-based cutoffs for "Hyper Offense" vs. "Offense" vs. "Balance" vs. "Semi-Stall" vs. "Full Stall" will be for people to give me some teams to score.

So here's my request: post a team in this thread in PO/PS-importable plaintext, tell me how YOU would classify the team, and I'll post the stall score. Please use the CODE tags so that this thread doesn't turn into a massive block of text.

Here's an example from one of my teams:

Code:

Tachikoma (Rotom-Mow) @ Choice Scarf
Trait: Levitate
Shiny: Yes
EVs: 4 HP / 252 SAtk / 252 Spd
Modest Nature
IVs: 0 Atk
- Volt Switch
- Will-O-Wisp
- Leaf Storm
- Trick

George III (Slowking) (M) @ Leftovers
Trait: Regenerator
EVs: 252 HP / 4 SAtk / 252 SDef
Calm Nature
IVs: 0 Atk
- Slack Off
- Fire Blast
- Scald
- Thunder Wave

Tuesday (Qwilfish) (F) @ Leftovers
Trait: Intimidate
EVs: 252 HP / 252 Def / 4 Spd
Impish Nature
- Spikes
- Waterfall
- Poison Jab
- Aqua Jet

Haderach (Steelix) (M) @ Leftovers
Trait: Sturdy
EVs: 52 HP / 204 Atk / 252 SDef
Sassy Nature
IVs: 0 Spd
- Dragon Tail
- Gyro Ball
- Earthquake
- Stealth Rock

Maeby (Audino) (F) @ Leftovers
Trait: Regenerator
EVs: 252 HP / 4 Def / 252 SDef
Calm Nature
IVs: 0 Atk
- Toxic
- Protect
- Wish
- Heal Bell

Magna (Amoonguss) (F) @ Black Sludge
Trait: Regenerator
EVs: 252 HP / 252 Def / 4 SAtk
Bold Nature
IVs: 0 Atk
- Spore
- Sludge Bomb
- Giga Drain
- Synthesis

This team is definitely full stall.

Now I run my algorithm over it and find...

Bias: -1864
Stalliness: 2.17296992568 (13.53 T/KO)
Tags: ['weatherless']

"T/KO" means "turns per KO," as in the number of turns the stalliness metric predicts should elapse in your average battle between KOs. Keeping in mind that what you'll more likely see in a battle is something based on the average stalliness of the two teams, this actually matches my experience decently well, as when this team battles other stall teams, the battles will regularly stretch into the 70-80 turn range and will end 5-0 or 4-0.

alkinesthetase · Sep 7, 2012

i'd look at the rmt archive for this; it's got tons of great stuff and it even has the team style/importables written down for convenience. i'll just pluck out two to start with and speculate on what i think their stalliness should look like.

for example: http://www.smogon.com/forums/showthread.php?t=3463417. the archive deems this balanced. it consists of utility physdef jellicent, tspike/spin forry, sdef heatran, sd virizion, np mew and subdd nite. i offer this example mainly because it's a balanced team - 3 bulky mons and 3 boosting sweepers - and could land on any side of the stalliness metric, so it's interesting to see where teams like this actually end up.

http://www.smogon.com/forums/showthread.php?t=3458118 textbook heavy offense in the deoxys-S era. screen/sr deoxys-S, sd viriz, all-out-offensive ddnite, all-out-offensive dd gyara, taunt/dd wallbreaking haxorus and offensive sd scizor. this should be really easy to categorize as extreme offense, every single mon has a 252/252 spread and a boosting move except for an obvious dual screen deoxys.

Antar · Sep 7, 2012

alkinesthetase said:
i'd look at the rmt archive for this; it's got tons of great stuff and it even has the team style/importables written down for convenience. i'll just pluck out two to start with and speculate on what i think their stalliness should look like.

I thought about that, but I'm lazy.

for example: http://www.smogon.com/forums/showthread.php?t=3463417. the archive deems this balanced. it consists of utility physdef jellicent, tspike/spin forry, sdef heatran, sd virizion, np mew and subdd nite. i offer this example mainly because it's a balanced team - 3 bulky mons and 3 boosting sweepers - and could land on any side of the stalliness metric, so it's interesting to see where teams like this actually end up.

Bias: -1340
Stalliness: 0.753748735461 (5.06 T/KO)
Tags: ['weatherless']

A little stallier than I would have expected for balanced (I was thinking balanced was going to be -0.5 to +0.5). Note that the "bias" score falls into the "balanced" range (-1500<bias<600). Really this just appears to be a reflection that the distribution is a bit skewed (that is, you see a larger range on the stall side than on the offense side).

http://www.smogon.com/forums/showthread.php?t=3458118 textbook heavy offense in the deoxys-S era. screen/sr deoxys-S, sd viriz, all-out-offensive ddnite, all-out-offensive dd gyara, taunt/dd wallbreaking haxorus and offensive sd scizor. this should be really easy to categorize as extreme offense, every single mon has a 252/252 spread and a boosting move except for an obvious dual screen deoxys.

Code:

Deoxys-Speed @ Light Clay
Trait: Pressure
EVs: 252 Spd / 252 HP / 4 Def
Timid Nature
- Safeguard
- Taunt
- Reflect
- Light Screen

Virizion @ Life Orb
Trait: Justified
EVs: 252 Spd / 252 Atk / 4 SDef
Jolly Nature
- Swords Dance
- Close Combat
- Leaf Blade
- Stone Edge

Dragonite @ Lum Berry
Trait: Multiscale
EVs: 6 HP / 252 Atk / 252 Spd
Adamant Nature
- Dragon Dance
- Outrage
- Fire Punch
- ExtremeSpeed

Gyarados @ Life Orb
Trait: Intimidate
EVs: 252 Spd / 252 Atk / 4 Def
Adamant Nature
- Dragon Dance
- Waterfall
- Double-Edge
- Bounce

Haxorus @ Lum Berry
Trait: Mold Breaker
EVs: 126 HP / 252 Atk / 132 Spd
Adamant Nature
- Dragon Dance
- Outrage
- Earthquake
- Taunt

Scizor @ Life Orb
Trait: Technician
EVs: 224 HP / 252 Atk / 32 Spd
Adamant Nature
- Swords Dance
- Bullet Punch
- U-turn
- Superpower

Bias: 640
Stalliness: -2.09125482297 (0.70 T/KO)
Tags: ['weatherless']

Note that going off bias alone, Innocent Criminal would have classified this team as merely "offense," due to the mixed EVs on Scizor and Haxorus and the HP investment on Deo-S. Of course, he had the exception to the rule that a screener makes it HO. Meanwhile, take the Light Clay off Deo-S, and the team still has a stall score of -1.92, which is most definitely HO.

Let me do one more that I've worked with quite a bit recently: august's The Running of the Bulls

Code:

TURRETBOT (Smeargle) @ Focus Sash
Trait: Own Tempo
EVs: 4 HP / 252 Atk / 252 Spd
Jolly Nature
- Spikes
- Explosion
- Spore
- Stealth Rock

HENRYVIIIBOT (Slowking) (M) @ Choice Specs
Trait: Regenerator
EVs: 248 HP / 252 SAtk / 8 Spd
Modest Nature
IVs: 0 Atk
- Surf
- Psyshock
- Trick
- Fire Blast

Kabutops @ Life Orb
Trait: Weak Armor
EVs: 252 Atk / 4 Def / 252 Spd
Adamant Nature
- Rapid Spin
- Swords Dance
- Stone Edge
- Aqua Jet

DOGBOT (Entei) @ Choice Band
Trait: Pressure
EVs: 252 Atk / 4 Def / 252 Spd
Adamant Nature
- Flare Blitz
- ExtremeSpeed
- Stone Edge
- Sleep Talk

ANGRYBOT (Tauros) @ Life Orb
Trait: Sheer Force
EVs: 252 Atk / 252 Spd / 4 Def
Naive Nature
- Rock Climb
- Earthquake
- Zen Headbutt
- Fire Blast

GEICOBOT (Sceptile) @ Flying Gem
Trait: Unburden
EVs: 32 Spd / 224 SAtk / 252 Atk
Naive Nature
- Acrobatics
- Leaf Storm
- Hidden Power [Fire]
- Rock Slide

Bias: 1472
Stalliness: -1.22751600486 (1.28 T/KO)
Tags: ['weatherless']

Bias is more than 1200, so Innocent Criminal says it's HO. Based on how I've seen it play, I would agree.

Antar · Sep 7, 2012

Oh shit. I actually clicked the link to the "rmt archive." That's actually a link to the INDEX, which I didn't realize was complete with importables and classification.

So let's go through these.

Shrang's I used to hate Rain Dance (Offense)

Bias: 8
Stalliness: -0.13358472882 (2.73 T/KO)
Tags: ['rain']

IC would have called that "balance," and my first instinct with a stall score like that would be to agree.

Layla's Iconic (Offense)

Bias: -528
Stalliness: -0.371123101207 (2.32 T/KO)
Tags: ['sand']

IC would have called this "bulky offense" I guess.

More when I get the chance, but this is a really awesome resource. Thanks for pointing it out to me, alkinesthetase.

Princess Bubblegum · Sep 7, 2012

I just wasted a ton of my life but yay, my ultra stall team gets 26.76 and a -2498 bias B).

One point I want to bring up is Chansey and other low-attack pokemon: Chansey, all by itself without eviolite, gets 4.27! I think that is a bit off to be honest. As always though, great work.

alkinesthetase · Sep 7, 2012

^ EDIT: i think i corrected my math now
anyway the stalliness metric is the base-2 logarithm of how many turns it would take for a mon to kill itself. in that case log2 X = 4.27, X = 2^4.27 which is over 19. it does sound pretty legit when chansey would be softboiling while throwing seismic tosses at its mirror image; it'd probably take a lot of turns for it to kill itself.
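a quick sketch of that arithmetic, taking the definition in this post at face value (score = log2 of the number of hits a mon needs to KO its mirror). the function names are made up for illustration and this is not the actual stats code:

```python
import math


def hits_from_score(score: float) -> float:
    """Invert the metric: hits to self-KO, assuming score = log2(hits)."""
    return 2 ** score


def score_from_hits(hits: float) -> float:
    """Forward direction under the same assumption."""
    return math.log2(hits)


# Chansey's posted score of 4.27 works out to a bit over 19 hits.
print(hits_from_score(4.27))
# Going the other way, 16 hits would score exactly 4.
print(score_from_hits(16))
```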

moreover, mons with low attack and big defenses are pretty much only useful for their defensive utility. when you have that much defense and that little attack, the number is unlikely to be an accurate reflection of the turns taken to KO yourself, but i still think such mons should significantly swing the team in favor of stall because of their massive lean to defense. just including one of them on a team should dramatically alter it. i could call a team bulky offense if it had a cb ferrothorn on it, but i'd be rather dubious if it had a chansey

oh and antar, np

Princess Bubblegum · Sep 7, 2012

If this is purely about how long it would take for a pokemon to kill itself in a mirror, then Chansey should be 2.81, considering it realistically takes 7 turns to kill itself with seismic toss. I guess if Chansey was using ice beam it would take 19 turns, but that's pretty unrealistic (I think you run out of PP before then lol). Blissey could be 3, but I guess in theory 4 does apply if Blissey decides to go with flamethrower (it has just enough PP for 16 turns...).

Not saying there's anything wrong, but Chansey with such a high number is admittedly an interesting quirk of the system.

Edit: gotcha antar :)

Antar · Sep 7, 2012

Scarfwynaut said:
I just wasted a ton of my life but yay, my ultra stall team gets 26.76 and a -2498 bias B).

For stalliness, don't forget to divide by six.
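In other words, the 26.76 is presumably the sum over all six team members, and the team score is the average. A minimal sketch of that step, assuming a plain mean (the function name `team_stalliness` is hypothetical):

```python
def team_stalliness(total_score: float, team_size: int = 6) -> float:
    """Average per-Pokemon stall score, assuming the team score is a plain mean."""
    return total_score / team_size


# The summed figure quoted above, averaged over a full team of six.
print(round(team_stalliness(26.76), 2))
```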

GatoDelFuego · Sep 7, 2012

I'm curious to see how stally full-stall is.

Tyranitar (F) @ Choice Scarf
Trait: Sand Stream
EVs: 4 HP / 252 Atk / 252 Spd
Jolly Nature
- Crunch
- Pursuit
- Superpower
- Stone Edge

Heatran (F) @ Leftovers
Trait: Flash Fire
EVs: 248 HP / 252 SDef / 8 Spd
Calm Nature
IVs: 0 Atk
- Stealth Rock
- Lava Plume
- Toxic
- Roar

Slowbro (F) @ Leftovers
Trait: Regenerator
EVs: 252 HP / 252 Def / 4 SAtk
Bold Nature
IVs: 0 Atk
- Scald
- Psychic
- Slack Off
- Toxic

Amoonguss (M) @ Leftovers
Trait: Regenerator
EVs: 252 HP / 4 SAtk / 252 SDef
Calm Nature
- Spore
- Clear Smog
- Hidden Power [Ice]
- Giga Drain

Gliscor (F) @ Toxic Orb
Trait: Poison Heal
EVs: 252 HP / 184 Def / 72 Spd
Impish Nature
- Earthquake
- Substitute
- Toxic
- Protect

Forretress (F) @ Leftovers
Trait: Sturdy
EVs: 252 HP / 4 Atk / 252 Def
Relaxed Nature
IVs: 2 Spd
- Hidden Power [Ice]
- Volt Switch
- Rapid Spin
- Spikes

Here's a little rain offense-balance team. I'd classify it offense with some defensive pivots, mainly that the strategy is to have offense with things to take hits, not stally mons with things to clean up. It'll be interesting to see how this gets classified.

Politoed (M) @ Leftovers
Trait: Drizzle
EVs: 252 HP / 252 Def / 4 SAtk
Bold Nature
- Scald
- Toxic
- Protect
- Perish Song

Tornadus-Therian (M) @ Life Orb
Trait: Regenerator
EVs: 252 SAtk / 4 SDef / 252 Spd
Hasty Nature
- Hurricane
- Superpower
- U-turn
- Rain Dance

Keldeo @ Leftovers
Trait: Justified
EVs: 4 HP / 252 SAtk / 252 Spd
Timid Nature
- Calm Mind
- Hydro Pump
- Secret Sword
- Hidden Power [Ghost]

Dugtrio (M) @ Focus Sash
Trait: Arena Trap
EVs: 4 HP / 252 Atk / 252 Spd
Jolly Nature
IVs: 20 HP
- Stealth Rock
- Earthquake
- Memento
- Reversal

Amoonguss (M) @ Leftovers
Trait: Regenerator
EVs: 252 HP / 4 SAtk / 252 SDef
Calm Nature
- Spore
- Giga Drain
- Stun Spore
- Hidden Power [Ice]

Jirachi @ Leftovers
Trait: Serene Grace
EVs: 252 HP / 32 Def / 224 SDef
Sassy Nature
- Wish
- Thunder
- Iron Head
- Protect

Also, antar, this looks amazing.

Antar · Sep 7, 2012

GatoDelFuego said:
I'm curious to see how stally full-stall is.

Bias: -2188
Stalliness: 1.94288779003 (11.53 T/KO)
Tags: ['sand']

Compare to my RU full-stall team: bias -1864 / stalliness 2.173

Here's a little rain offense-balance team. I'd classify it offense with some defensive pivots, mainly that the strategy is to have offense with things to take hits, not stally mons with things to clean up. It'll be interesting to see how this gets classified.

Bias: -764
Stalliness: 0.195072164949 (3.43 T/KO)
Tags: ['rain']

I think this is going to end up being "balance."

Princess Bubblegum · Sep 7, 2012

I don't know if this has been brought up, but this is similar to a thread made a while back:
http://www.smogon.com/forums/showthread.php?t=24931

I'm actually wondering if you can use this method to make a sort of mixed bulk defense tiers. I am sure it can be done, but I am unaware of the logarithms involved to do this. Maybe you can give some insight, antar.

Antar · Sep 9, 2012

Scarfwynaut, that's pretty cool.

So I ran my algorithm over the 54 non-LC, non-VGC teams in the RMT archive and plotted bias and stalliness vs. how the RMT archive classifies the teams. The results are in this blog post.

As you can see, neither metric does a particularly good job of suggesting defined cutoffs for the playstyles. As I say in the post,

That’s really disappointing, because what it says is that I’m missing something. What makes a team heavy offense vs. offense? balance vs. semi-stall? Is it just a judgement call, or is there something concrete that I can try to incorporate?

So I'm asking: what am I missing? What makes "Hawaiian Air" offense and "Brute Force" heavy offense? Why is "5 Minute Excadrill Sand Team" full- and not semi-stall?

SpecsX · Sep 9, 2012

Well, "Hawaiian Air" uses ChestoRest Kingdra and SubCharge Magnezone, which may have changed their rating. "Brute Force" utilizes a Dual Screener, which changes the bias for that team.

alkinesthetase · Sep 9, 2012

brute force is really a textbook example of hyper offense: dual screen lead + 5 setup sweepers. you can't really get any more hyper than that. however, i would say hawaiian air is ALSO a hyper offense team (it might not have screens, but it's all about attacking attacking attacking and establishing momentum by sheer force and revenge potential), but the line between "offense" and "hyper offense" is one that needs a bit more defining, just in general.

jimera's definition of hyper offense, which i believe revolutionized the term for me, is that a) the team maintains momentum by quickly revenge killing anything that attempts to exert offensive pressure on it, so that it can resume pushing back, and b) the team's attackers all take out one another's checks, counters and walls so that no defensive team can hope to wall any mon safely, lest it give a free switch to something else that will proceed to set up and end the game. twash's team totally looks like that to me: several strong fast attackers with revenge potential (eg mamo), plus a setup sweeper or two that closes out the show. offense might follow a similar concept, but for me, i'd guess that the line is between a team whose supporters drive towards that goal, and a team that has no supporters and simply uses every single pokemon for it (bar a suicide lead that might establish favorable conditions to develop play). please feel free to argue because i think the terms *are* a bit loose and could use discussion.

when i look at yee's sand team though, it reeks of full stall. i don't think it would be considered semistall at all, although perhaps that's what the archive labels it? sdef jirachi and standard slowbro tank stuff and serve as the team's walls, rade/hippo/dtail all provide a mixture of respectable bulk and residual damage (hazards and phazing), and excadrill is the cleaner. some will disagree with me, but i think one fast, aggressive pokemon cannot take away from the full-stall-ness of a stall team, since that mon often ends up in the dual role of killing mons that the stall team has allowed to set up too much, and cleaning up worn-down teams in the mid-late game. such a pokemon does not prevent a team from being full stall... in my opinion, at least.

i think the real problem is that there is a ton of overlap between the team types that we're trying to define, and i think establishing the boundaries is really difficult, unless we start looking at the roles played by a mon, and how the team actually functions in the hands of its creator. those are things that are really difficult to look at when breaking the team down into mons, moves or EVs but they make all the difference for those subtle lines between bulky offense, balance, and semistall, or that kind of stuff. perhaps it is time to delve even deeper and come up with ways of defining a pokemon's roles: does it generate momentum? set up residual damage? serve as an all-out wall? sweep? offensive pivot? defensive pivot? where are the lines in between? is it even possible to define them? those are the questions we need to answer, i think. if we can put together the data we have about each mon and convert it into a list of roles that the mon plays on a team, we can see the roles that the team as a whole puts the most stress on, and those most important roles define what the team needs to function properly - ie how it works. that is the core of playstyle, imo.

Antar · Sep 10, 2012

Defining playstyle as a combination of roles rather than through a metric like stalliness is definitely one route we could go down, but I'm not quite ready yet to give up on stalliness.

It's looking like I can improve on agreement by adjusting my moveset modifications, raising and lowering some weights while adding or removing other modifications.

For example, hail and sand should count towards stall, rapid spin is really a stall move (magic bounce, though, I think plays better for offense, so I'm leaving it off for now), I was obviously weighting will-o-wisp WAY too strongly in favor of stall, and setup moves are not all created equal (Shell Smash is far more offensive than Calm Mind).

So that's what I'm working on now. We'll see how far this can get me.

BurningMan · Sep 10, 2012

Antar said:
Defining playstyle as a combination of roles rather than through a metric like stalliness is definitely one route we could go down, but I'm not quite ready yet to give up on stalliness.

i don't think that defining it through the combination of roles would be wise, simply because it would get so much more complicated due to the absurd amount of possible (and valid) combinations.

Antar said:
For example, hail and sand should count towards stall, rapid spin is really a stall move (magic bounce, though, I think plays better for offense, so I'm leaving it off for now), I was obviously weighting will-o-wisp WAY too strongly in favor of stall, and setup moves are not all created equal (Shell Smash is far more offensive than Calm Mind).

I don't think that Sand should count towards stall; a lot of people use TTar simply to fight rain and sun teams (not to mention that TTar is an excellent pokemon all by itself). Sun, on the other hand, should clearly count towards offense because the number of sun stall teams is completely negligible, which brings me to the next point: Sun teams almost always carry a spinner, often even a defensive one like Forretress or Donphan, without being defensive in any way outside of this pokemon (and often Ninetales). Maybe the easiest solution would be to always categorize Ninetales as an ultra-offensive Pokemon to make up for this flaw of the metric.
Counting Rapid Spin as a stall move is a bit one-sided, as Starmie is one of the most useful partners for Pokemon such as Cloyster, Volcarona, Dragonite and many other heavily offensive but SR-weak Pokemon. You are absolutely right that many offensive teams prefer Magic Bounce, but Espeon/Xatu are not that easy to fit on any team and require some prediction, so maybe you shouldn't count Rapid Spin too much towards stall.
I am not too sure about set-up moves. Sure, Bulk Up and Calm Mind are not as offensive as the other boosts (and could be treated more as defensive boosts), but I wouldn't say that SD or DD are less offensive than Shell Smash (which is also a special case when you look at how often it also involves Baton Pass), and I wouldn't rank Cloyster as a more offensive mon than SD Terrakion or DD Salamence.

Overall big thanks to you for doing all the work to get even better statistics

alkinesthetase · Sep 10, 2012

For example, hail and sand should count towards stall, rapid spin is really a stall move (magic bounce, though, I think plays better for offense, so I'm leaving it off for now), I was obviously weighting will-o-wisp WAY too strongly in favor of stall, and setup moves are not all created equal (Shell Smash is far more offensive than Calm Mind).

it remains to be seen if these adjustments really do make the metric more accurate, so it's worth trying, but i wouldn't agree that sand should be stally, and rapid spin is not always stally either (although it's definitely more defensive than magic bounce). both of those things are all-around useful and even offensive teams will often incorporate a spinner if it doesn't hurt their momentum excessively to do so. perhaps make magic bounce offensive, because if anything it's an ability that's really all about momentum and offensive teams tend to appreciate it much more than defensive ones. just take a look at lavos's archetypal sun team lol

i also think that if anything cm and bulk up should be rated down in offensiveness, rather than shell smash being rated up. shell smash's effect is way more aggressive than say sd or nasty plot, but in the end it tends to wind up on the same kind of team as those other boosters would. on the other hand, things like cm and bulk up can just as easily work on a slow and steady late game finisher as they would on an all-out offensive mon - in fact, now that i think about it, i'd say cm and bulk up actually work BETTER as late game bulky setup moves than they do as offensive ones. perhaps they should actually be rated as stall moves! just think about how much rarer CM latios is than specs/recover+3/4 attacks. only 10% of latios run CM, whereas a whopping 65% of latias do. the difference between the two mons, and the reason they do or do not run CM, is obvious.

Antar · Sep 12, 2012

I finished my modifications to my stalliness metric, and I think I'm ready to call it done.

With my revisions complete, I also feel confident in defining the cutoffs between the various stall-related playstyles. First, a graph, showing how my original and newly revised stall scores score the 54 non-LC, non-VGC teams in the RMT archive:

And now the cutoffs:

Hyper-Offense: stall score <= -1 (corresponding to 1.5 turns/KO)
Offense (including Bulky Offense): -1 < stalliness <= 0 (between 1.5 and 3 turns/KO)
Balance: 0 < stalliness <= 1 (3-6 turns/KO)
Semi-Stall: 1 < stalliness <= ~1.58 (6-9 turns/KO)
Full Stall (a.k.a. Stall): stalliness > ~1.58
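As a rough sketch, the cutoffs above can be written as a simple lookup. The function name `classify_playstyle` is made up for illustration, and treating ~1.58 as exactly log2(3) is an assumption: it is the value that yields 9 turns/KO if turns/KO = 3 × 2^stalliness, which is consistent with the turn counts listed next to each cutoff.

```python
import math

# Assumption: "~1.58" is log2(3), the score that gives 9 turns/KO
# when turns/KO = 3 * 2**stalliness.
SEMI_STALL_CEILING = math.log2(3)


def classify_playstyle(stalliness: float) -> str:
    """Map a team's stall score to a playstyle label using the cutoffs above."""
    if stalliness <= -1:
        return "Hyper-Offense"
    if stalliness <= 0:
        return "Offense"  # includes Bulky Offense; the metric cannot separate them
    if stalliness <= 1:
        return "Balance"
    if stalliness <= SEMI_STALL_CEILING:
        return "Semi-Stall"
    return "Full Stall"


for score in (-1.5, -0.4, 0.4, 1.3, 2.0):
    print(f"{score:+.2f} -> {classify_playstyle(score)}")
```

Each boundary value belongs to the more offensive bucket, matching the `<=` signs in the list.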

The limitation of this would be that my system has no way of differentiating between offense and bulky offense, which is fine with me.

Also, to quote the blog post,

As you can see, it’s not perfect, but a lot of the teams that are incorrectly classified are pathological cases (for example, the Balance team that almost scores high enough to be Full Stall was designed by Molk and is built around a Scraggy). Frankly, no one has adequately explained to me the difference between Offense and Heavy Offense (Hawaiian Air is the Offense team with the lowest stall score, and it features two exploders and two more Pokemon that set up and, frankly, seems much more offensive than the "Hyper Offense" team Reflections).

Any final comments?

alkinesthetase · Sep 12, 2012

results are looking pretty legit. curious to know which teams landed where in the metric vs what they were classified as in the archive index. perhaps it could give us some insight as to how we classify teams, vs how a computer would like to classify teams. otherwise the results seem to speak for themselves; i don't think there are any arguments left to make. let's see what happens if this is run on the official ladders!

Antar · Sep 13, 2012

alkinesthetase said:
curious to know which teams landed where in the metric vs what they were classified as in the archive index

The following teams had discrepancies between archive classification and "stalliness" classification:

Offense

Apocalypse @ stall of -1.04 (HO). Close enough.
Choice team etc @ -1.09. Ditto.
Final Attack Orders @ -1.74. The comments actually classify this as Heavy Offense! So consider this a stealth success!
Hawaiian Air @ -1.76
Iron Maiden Can't Be Fought! @ -1.77. Description sounds like classic HO to me. I note that no NU teams in the archive are classified as Hyper Offense.
Too Many Mices @ -1.80. Ditto. Also kind of an idiosyncratic team.
Eriatarka @ 0.23
I used to hate Rain Dance @ 0.20
Lysergic Acid Diethylamide @ 0.15

Bulky Offense

Won't You Stay @ -1.55. Another idiosyncratic team.
Dancing Free @ 0.05. Close enough.
Download Destruction @ 0.004. Um... Ditto.
Ultimate Balance @ 0.38. Dude. It's CALLED Ultimate Balance
Negative 3 @ 1.26. Wish/Protect Jirachi + no-attack Skarm + Slowbro throws everything off.

Balanced

Team WOLF GANG @ -0.75. Two SDers and a Nasty Plot Celebi. How is this not offense?
Drown All @ 1.21. I blame Lugia.
The Little Lizard that could @ 1.45. Molk be trollin'.

Semi-Stall

Subfreeze @ 0.97. Close enough.

All Full Stall teams were classified correctly.

alkinesthetase · Sep 13, 2012

Iron Maiden Can't Be Fought! @ -1.77. Description sounds like classic HO to me. I note that no NU teams in the archive are classified as Hyper Offense.

Too Many Mices @ -1.80. Ditto. Also kind of an idiosyncratic team.

i agree, these both look like HO

Eriatarka @ 0.23

I used to hate Rain Dance @ 0.20

blame these both on the ferrothorn, and CM users (wishcm rachi in the first, manaphy in the second are most responsible, i'd suspect). the stalliness looks reasonable imo

Lysergic Acid Diethylamide @ 0.15

it's the toxic staller gliscor and the sdef heatran here. with mons as offensive as hydreigon and cb terrakion, this team definitely plays offensive despite those two. the score looks pretty legit though so not much room to complain, it's PRACTICALLY on the offense side of the split.

Won't You Stay @ -1.55. Another idiosyncratic team.

the fact that it's all choiced except a hitmontop makes me think HO right away so the score seems to agree with a surface analysis. i guess it was mainly called bulky offense because the mons themselves (azumarill, zapdos in particular) are a bit bulkier and not as aggressive as setup-based HO would be.

Dancing Free @ 0.05. Close enough.

Download Destruction @ 0.004. Um... Ditto.

Ultimate Balance @ 0.38. Dude. It's CALLED Ultimate Balance

i had individual things to say for each of these but they all sounded the same.. basically i agree. they all have enough bulky mons that they might as well be balance, and really the line between bulky offense and balance is a fiiiine one

Negative 3 @ 1.26. Wish/Protect Jirachi + no-attack Skarm + Slowbro throws everything off.

yeah this team is actually really stally even though the objective is obviously to set up for a garchomp sweep. i would rather say that it's semistall with garchomp as the cleaner, than offense with the rest of the team as support lol

Team WOLF GANG @ -0.75. Two SDers and a Nasty Plot Celebi. How is this not offense?

no argument, i agree completely

Drown All @ 1.21. I blame Lugia.

intriguing because i would definitely call this balance. i kind of wanna break this down more finely to see what it looks like but in any case i agree, lugia is the most obvious culprit

The Little Lizard that could @ 1.45. Molk be trollin'.

this team is kinda on the defensive side what with flareon/tangela/quag FWG and a very bulky variant of scraggy that probably looks like a stallmon. however it's obviously not THAT stally. i have an interesting suggestion here because i don't think this team deserves that high of a stalliness: perhaps eviolite should count for less in lower tiers? as the higher evolutions get caught in higher tiers, lower tiers can often make use of eviolite to turn their pre-evos into viable competitors, especially if those mons have either boosting moves with which to turn their bulk into a sweep, or some natural bulk to begin with. the obvious culprit here is the scraggy whose moveset, EVs and item look stall-minded at first glance, but are actually meant to play as an offensive late-game cleaner. this could have significant effect in LC as well because even the most aggressive of LC mons can run eviolite and suddenly go from all-or-nothing suicide sweeper to bulky setup mon.

Subfreeze @ 0.97. Close enough.

i'd categorize this team as bordering on full stall actually if you ask me. the victini is probably what makes all the difference. interspersing even ONE offensive uturner into a team swings it towards offense quite significantly, in terms of actual play. score looks reasonable.

overall looks pretty legit. also, lol at how all full stall teams were classified correctly. full stall is just THAT obvious lol

ssbbm · Sep 13, 2012

drown all is actually pretty stallish in the sense that you mainly get damage from spikes + lugia with a check-all in kyogre

i guess you could call it balance because it's not exactly as stally as groudon / blissey / forretress / giratina / latias / filler, along with the fact that tr is used to full stall so he probably rates it as a more offensive team

i would call it semi-stall though, as it's not really spike stacking offense as much as spikes + sr + phazer + kyogre + kyogre check + spinblocker

eric the espeon · Sep 18, 2012

Read through all the blog posts and this thread. Interesting project, and cool scatter plots. Mostly looks very good, though there were a few things which seem like they may be questionable. If you're not keen on revisiting things that's fair enough, but my thoughts:

Leftovers do nothing to the metric. I based this decision on my observations of the differences in metric between bulky- and fully-offensive sets. In some ways, Leftovers are the “anti-Life Orb” in that it adds health where Life Orb takes it away, but the difference between a Life Orb and Leftovers set shouldn’t be a whopping 1.0. Fine then, you might suggest, split the difference and have it be Life Orb -0.25, Leftovers +0.25. The two problems with this are that (1) I truly believe that Life Orb should have the same effect as Choice items, and (2) in my experience, Leftovers is the item you throw on your Pokemon when you don’t have anything better to give it. I see plenty of Leftovers Pokemon who run offensive (even heavily offensive) sets. On the other hand, you rarely see a bulky Pokemon go with Life Orb.

I don't think Leftovers should have an equal effect to Life Orb since Leftovers both has a smaller per-turn effect on HP and does not change damage output, however giving it no effect (equal to no item, Quick Claw, BrightPowder, or even the 20% type boosting items in the initial blog post) seems.. not right. Leftovers may be used as a default item in a few cases, but even in those cases it is very clearly increasing the defensive ability of the holder, and it seems like this should be represented by a + on the metric, albeit one smaller than Life Orb's -.

Also, why exactly do you feel LO should have exactly the same effect as Choice items? From my experience Choice users tend to be more wallbreakers than full sweepers, and unlike Choice items, Life Orb does significant direct harm to the holder's defensive ability. My gut feeling is to give LO a higher rating, though I'm not entirely sure about that.

Second and much more major point, you seem to be discarding the lower offensive and defensive stats entirely.

I propose to measure “stalliness” based on the number of hits of a (non-STAB) base-120* neutrally effective move it would take for a Pokemon to KO itself, or, more precisely, its mirror (ignoring items, abilities, status and actual movesets, and assuming the Pokemon is using its stronger attack stat against its stronger defense stat).

This will mean your formula cannot take into account the advantages of being a mixed sweeper, or, more importantly, the fact that some Pokemon may have one decent defensive stat but be extremely frail to the other kind of attacks (Cloyster, Aggron, Blissey, and Mantine are excellent examples, but even more mildly unbalanced defenses will cause a Pokémon's stallishness to be overestimated to a lesser extent). I can see why you'd want to make that simplification, dealing with both stats can get kind of messy, but this seems likely to be the biggest issue with your formula correctly assigning stallishness from stats. For attacks perhaps raising both to a power, adding them, then taking that power's root of the result would be effective? A larger power would mean a smaller boost for mixed attackers, and vice versa. Ideally this would only be applied if the set used both physical and special moves. A similar method (perhaps with a different power) could be used for defenses.
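To make the attack-side suggestion concrete, here is a rough sketch in Python. The function name `effective_attack`, the `is_mixed` flag, and the default exponent `p=4.0` are all made up for illustration; none of them come from the blog posts or the actual stats scripts. It only covers attacks, since the defensive version would likely need a different exponent and more thought.

```python
def effective_attack(atk: float, spa: float, is_mixed: bool, p: float = 4.0) -> float:
    """Combine Attack and Special Attack into one number.

    Hypothetical helper, not part of the existing metric.
    - Non-mixed sets just use the stronger stat, as the current formula does.
    - Mixed sets use (atk**p + spa**p) ** (1/p): raise both to a power,
      add them, then take that power's root.
    The exponent p is a tuning knob (assumed value); larger p pulls the
    result toward max(atk, spa), i.e. a smaller boost for mixed attackers.
    """
    if not is_mixed:
        return float(max(atk, spa))
    return (atk ** p + spa ** p) ** (1.0 / p)


# A mixed attacker with 120 / 110 attacking stats under two exponents.
small_power = effective_attack(120, 110, is_mixed=True, p=2.0)
large_power = effective_attack(120, 110, is_mixed=True, p=8.0)

# The larger exponent gives the smaller bonus, but both stay above 120.
print(small_power > large_power > 120)
```

With `p=1` this is just the sum of the two stats, and as `p` grows it approaches the stronger stat alone, so the exponent directly controls how much being mixed is worth.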

Doing this may complicate the effects of certain items. In particular, Eviolite and the Choice items could no longer reasonably be said to grant exactly the same boosts. Applying the item boosts in the initial calculation would solve this. Doing the same with Life Orb also changes the previous point: applying the boost to both stats, then having a smaller modifier simply for the HP loss which is equal or near equal, seems sane.

One-time use items subtract 0.5 from the metric. The idea here is that consumption is antithetical to stall. Stall teams are often in pretty much the exact same position 50 turns in as they are 25 turns in. It’s what makes stall so annoying. There is an exception to this reasoning: Harvest and Recycle. See below. Note that this negates the effect of Red Card, which I believe is well and good.

Generally a good idea, but I'd suggest some change to how healing berries and Berry Juice are handled. In LC holding an item like that gives a massive boost to endurance, even though Eviolite seems much more popular in 5th gen and Berry Juice is banned from both. Making these items have the same effect as Salac or a Gem seems backwards. I'd suggest making one-time use items which heal health either have +0.5 or at least be neutral (this also helps with VGC/doubles/triples, where Sitrus is somewhat viable, and clearly more defensive than other one-time use items). Status healing berries are more debatable. They're used with Rest for one-time healing, but of course that's still just a one-time thing: not full stall's style, but also not hyper offense style.

The move Protect (and variants) adds 1 to the metric. From a mathematical standpoint, it’ll take you at least twice as many turns to KO this Pokemon.

This is assuming Protect is used purely as a stall tactic, rather than to activate a status orb, delay for more Speed Boosts, or for scouting dangerous moves as a frail sweeper. Also, while Protect is used to stall along with Wish/Leech Seed/Toxic/other passive damage, those forms of passive damage already give a significant + score. Adding a whole +1 to that seems too much. Many offensive sets will use Protect only occasionally, and even defensive seed/toxic stallers risk giving free turns by using it predictably, so the mathematical doubling of turns is not generally applicable (except for Stallrien and friends).

The ability Regenerator adds 0.5 to the metric. It’s less simply because it recovers less health.

Halving the change to the metric because of a fairly small difference in health gained, when it can be activated on the switch rather than needing a turn to just heal.. hm, maybe it's not quite as stally as others, but 0.5 does seem slightly low.

Also missing items which seem possibly worth considering:
Expert Belt
20% type boosting items and plates
Wise Glasses and Muscle Band
Most species specific boosting items (Soul Dew, DeepSeaTooth, DeepSeaScale, Light Ball, Thick Club, Adamant Orb, Lustrous Orb, Griseous Orb, and maybe Ditto's two, Lucky Punch, and Stick?) when held by the correct species
Shell Bell
And to generalise this to all generations, maybe Berserk Gene?
