Metagame Analyses: Gen VI changes

SpecsX · Sep 18, 2012

eric the espeon said:
Read through all the blog posts and this thread. Interesting project, and cool scatter plots. Mostly looks very good, though there were a few things which seem like they may be questionable. If you're not keen on revisiting things that's fair enough, but my thoughts:

Why exactly do you feel LO should have exactly the same effect as Choice items? From my experience Choice users tend to be more wallbreakers than full sweepers, and unlike choice items Life Orb has a significant direct harm to the holder's defensive ability. My gut feeling is to give LO a higher rating, though I'm not entirely sure about that.

Second and much more major point, you seem to be discarding the lower offensive and defensive stats entirely.

This will mean your formula cannot take into account the advantages of being a mixed sweeper, or, more importantly, the fact that some Pokemon may have one decent defensive stat but be extremely frail to the other kind of attacks (Cloyster, Aggron, Blissey, and Mantine are excellent examples, but even more mildly unbalanced defenses will cause a Pokémon's stallishness to be overestimated to a lesser extent). I can see why you'd want to make that simplification, dealing with both stats can get kind of messy, but this seems likely to be the biggest issue with your formula correctly assigning stallishness from stats. For attacks perhaps raising both to a power, adding them, then taking that power's root of the result would be effective? A larger power would mean a smaller boost for mixed attackers, and vice versa. Ideally this would only be applied if the set used both physical and special moves. A similar method (perhaps with a different power) could be used for defenses.

Doing this may complicate the effects of certain items. In particular, Eviolite and the Choice items could no longer reasonably be said to grant exactly the same boosts. Applying the item boosts in the initial calculation would solve this. And doing the same with Life Orb changes the previous point: applying the boost to both stats, then having a smaller modifier simply from HP loss which is near equal or equal, seems sane.

Generally a good idea, but I'd suggest some change to how healing berries and berry juice are handled. In LC holding an item like that gives a massive boost to endurance, even though Eviolite seems much more popular in 5th gen and Berry Juice is banned from both. Making these items have the same effect as Salac or a Gem seems backwards. I'd suggest making one time use items which heal health either have +0.5 or at least be neutral (also helps with VGC/doubles/triples, where Sitrus is somewhat viable, and clearly more defensive than other one-time use items). Status healing berries are more debatable. They're used with Rest for one time healing, but of course that's still just a one time thing, not full stall's style, but also not hyper offense style.

Halving the change to the metric because of a fairly small difference in health gained, when it can be activated on the switch rather than needing a turn to just heal.. hm, maybe it's not quite as stally as others, but 0.5 does seem slightly low.

Also missing items which seem possibly worth considering:
Expert Belt
20% type boosting items and plates
Wise Glasses and Muscle Band
Most species specific boosting items (Soul Dew, DeepSeaTooth, DeepSeaScale, Light Ball, Thick Club, Adamant Orb, Lustrous Orb, Griseous Orb, and maybe Ditto's two, Lucky Punch, and Stick?) when held by the correct species
Shell Bell
And to generalise this to all generations, maybe Berserk Gene?

First of all, VGC and LC aren't calculated in his blog posts. They are so radically different from other metas that they should be formula'd differently.
Eh... I think Life Orb is less offensive than Choice Band/Specs, and more offensive than Scarf. This is going on the power output alone, but it's certainly debatable.
Expert Belt and Species Specific Boosting Items (SSBI): these really should be in there. EBelt is a very viable item that shouldn't be ignored (especially with Genesect just being released), and the SSBI make HUGE differences to a Pokemon's playstyle. The best thing to do here is to calculate them at the same level as the moves they imitate (i.e. Soul Dew=Calm Mind, Thick Club=Swords Dance). Certainly the easiest thing to do.

Antar · Sep 25, 2012

@SpecsX--your comments will be addressed below as well.

Okiedoke. First the missing items.

eric the espeon said:
Most species specific boosting items (Soul Dew, DeepSeaTooth, DeepSeaScale, Light Ball, Thick Club, Adamant Orb, Lustrous Orb, Griseous Orb, and maybe Ditto's two, Lucky Punch, and Stick?) when held by the correct species

Most of these should definitely be added, with species enforcement.

Soul Dew should subtract 0.5 from the metric--as much as specs
Light Ball, Thick Club and DeepSeaTooth should subtract 1.0 from the metric
DeepSeaScale should add 1.0 to the metric
Metal and Quick Powder do not work after Ditto has transformed, so they should be ignored.
The crit items (Stick, Lucky Punch, but also Scope Lens) are too inconsistent to be factored in.

Also missing items which seem possibly worth considering:
Expert Belt
20% type boosting items and plates
Wise Glasses and Muscle Band

Turns out that log_2(1.2) = 0.263, or roughly 0.25. I'll consider adding them in.
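The arithmetic behind that number can be sketched as follows. This is only an illustration of the log_2 conversion mentioned in the post; the function name `boost_to_modifier` is made up here, and the rounding of the results to tidy values like 0.25 or 0.5 is left out.

```python
import math


def boost_to_modifier(multiplier: float) -> float:
    """Convert an offensive damage multiplier into a change to the metric.

    Assumption: an offensive boost lowers the metric by log2(multiplier),
    so a x2 boost corresponds to -1.
    """
    return -math.log2(multiplier)


for multiplier in (1.2, 1.5, 2.0):
    print(multiplier, round(boost_to_modifier(multiplier), 3))
# 1.2 -0.263
# 1.5 -0.585
# 2.0 -1.0
```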

Shell Bell

Shell Bell really has only one viable use in Singles: Sturdy-FEAR. But I already account for FEAR, and I don't want to give too much weighting for what is really a gimmicky strategy.

And to generalise this to all generations, maybe Berserk Gene?

I had to look this one up. Raises attack one stage, then confuses. If/when we implement Gen II on PS, this will fall into the category of "one-time use" items, which puts it on the same footing as Liechi Berry, which I think makes sense.

Also, why exactly do you feel LO should have exactly the same effect as Choice items? From my experience Choice users tend to be more wallbreakers than full sweepers, and unlike choice items Life Orb has a significant direct harm to the holder's defensive ability. My gut feeling is to give LO a higher rating, though I'm not entirely sure about that.

You'll note that I named my metric "stalliness" rather than "offensiveness." That's because it really is more about stall vs. "anti-stall" than stall vs. offense. There are two ways to combat stall. The first is to "wallbreak," the second is to sweep. You can't (or rather, it's hard to) sweep with a Band or Specs, but as you say, Choice items are potent for destroying stall. Weighting one over the other is a tricky business.

Leftovers

Several times as I played with the metric I was confronted with the problem of "intent," which ideally would not be an issue at all. I had to ignore modifications from many abilities simply because they tend to have no practical use on most sets, and if I accounted for them, the metric would get thrown off. Not accounting for Leftovers at all was thus more of a "fitting" decision than anything else. However, when I add in Expert Belt et al., I'll consider throwing in a +0.25 for Lefties as well.

discarding the lower offensive and defensive stats entirely.

I stand by this decision. Basically, my reasoning assumes that if a matchup is unfavorable, one can always make a switch. You aren't going to leave Blissey in against Machamp. You aren't going to try to take out Steelix with Druddigon. Are mixed sweepers more potent than single-side sweepers? I've wrecked enough teams with mix Deo-A (Espeed / Superpower / Ice Beam / Psycho Boost) to know that walling such sets is much more difficult. But it's been my experience that truly effective mixed sets are few and far between and don't really need the extra weight to be classified as the deadly beasts they are.

As for "double walls," a well-constructed stall team is dependent not on one Pokemon being able to slow all attacks but on several Pokemon having the synergy to completely block each other's weaknesses. In other words, it's been my experience that a wall really is as good as its strongest defensive side. On my super-stally RU team, Audino NEVER takes a Close Combat, and Steelix NEVER takes a Flamethrower.

In LC holding an item like that gives a massive boost to endurance, even though Eviolite seems much more popular in 5th gen and Berry Juice is banned from both. Making these items have the same effect as Salac or a Gem seems backwards.

SpecsX correctly pointed out that I'm not touching doubles/triples with a ten-foot pole (Item Clause alone completely ruins many of my assumptions). But this metric *should* still be somewhat applicable to LC.

The bottom line is that one-use items are the antithesis of stall. In non-LC, the only reason to use one (namely Sitrus Berry) is to give the user a little more time to set up/execute a sweep (Belly Drum Linoone, I'm looking at you). Oran Berry has very much fallen out of favor in the current LC metagame, due to Eviolite netting more than 10 "effective hit points" on any Pokemon whose HP is greater than 20 (and most Pokemon with less than 20 hit points are too frail to benefit from Oran anyway). The only time you really see Oran is with Sturdy Pokemon, who often use it the same way that BD Linoone uses Sitrus.

If Berry Juice is ever unbanned or Gen IV Little Cup ever goes live on PS, I will reconsider this decision (will probably play it safe and make them neutral).

This is assuming Protect is used purely as a stall tactic, rather than to activate a status orb, delay for more Speed Boosts, or for scouting dangerous moves as a frail sweeper.

This is true, but Guts and Speed Boost subtract 0.5 from the metric to counteract this effect somewhat. What I *need* to do is add a rule that says "if Guts / Flare Boost / Toxic Boost / Quick Feet AND Toxic/Flame Orb, subtract 1," which would cancel out Protect.
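The rule in quotes could be written roughly like this. It is a sketch only: the function name and the plain-string ability/item arguments are hypothetical, and how this -1 would combine with the existing -0.5 for Guts is not specified in the post, so the sketch leaves that out.

```python
# Abilities and items named in the proposed rule.
ORB_ABILITIES = {"Guts", "Flare Boost", "Toxic Boost", "Quick Feet"}
STATUS_ORBS = {"Toxic Orb", "Flame Orb"}


def orb_combo_modifier(ability: str, item: str) -> float:
    """Return -1 when a status-orb ability is paired with a status orb.

    The -1 is meant to cancel the +1 given for Protect, since Protect on
    such a set is most likely there to activate the orb safely.
    """
    if ability in ORB_ABILITIES and item in STATUS_ORBS:
        return -1.0
    return 0.0
```

For example, `orb_combo_modifier("Guts", "Flame Orb")` gives -1.0, while Guts with Leftovers gives 0.0.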

Speed Boost should also probably have a more significant weight (Moody as well).

Beyond that, I've found my Protect: +1 (which is based on solid mathematical footing, after all) to work pretty well.

Re: Regenerator... Halving the change to the metric because of a fairly small difference in health gained, when it can be activated on the switch rather than needing a turn to just heal.. hm, maybe it's not quite as stally as others, but 0.5 does seem slightly low.

It's simply due to the fact that less health is gained, and, in the presence of entry hazards, this will be even more true. Again, it seems to work out okay.

Okay, so making the modifications I talk about above (with and without Leftovers: +0.25), and running the new revised metric against the RMT archive, here's what I get:

As I expected, Blue and Red are pretty much identical (most of these new modifications weren't applicable to the teams in the archive--if they had been, I would have probably implemented them earlier). Meanwhile, adding the Leftovers modification seems to have a negligible effect on HO, but as the metagames get stallier, the difference becomes more and more pronounced.

It's hard to decide which version (red or magenta) is "better." Certainly, I could fix the SS-FS cutoff simply by bumping it from 9 turns/KO to 10. Getting that stalliest magenta team in line with the others would necessitate bumping it to the very not-round number of 10.4; I might prefer to move it up to 11 (itself an ugly number) and lose the least stally full stall team.

I'll have to think on this. I would definitely welcome input.

marilli · Sep 25, 2012

First, one thing I thought was making a cut-off for the ratings of people to actually count in this calculation. As much as I don't like this hate on 'noobs', in spirit of pure statistics they ruin the curve. They let their 'stally' team die off quickly. They don't get enough kills with their offensive teams. But unlike the '1337' stats, you want to lower the cutoff significantly - you will want 2 people who know what they're doing battling each other. If the cutoff is too high, then your sample size will be way too small.

Secondly, how about making broader categories than HO, Offense, BO, Balance, Semi-Stall, Full Stall? I'm sure you've heard people knock terms such as 'Semi-Stall.' Offense, Balance, Stall might be a relatively more 'objective' comparison. Going along the lines of Archived teams, note there is an 'All-Out Offense' team which makes you wonder if all these can be mutually exclusive / can be even differentiated to begin with. Yes, you can see that teams classified as 'Heavy Offense' or 'Semi Stall' are really rare. You also see the stats for 'Offense' and 'Bulky Offense' are nearly identical. Something to think about - it really does not mean any decrease in the quality of this analysis.

If this idea doesn't interest you, though, and you're concerned with making the minute distinctions, go ahead. It's just that no one will 100% agree with whatever definitions you stick with.

Next up is the concept of typing / resistances as it affects the 'stalliness.' However stally moves you give your ice-type, chances are it will not survive very long, etc. Not exactly sure how to proceed with this, though. Probably as some attack types are more common in one tier as opposed to another. Dragon resistances are not quite a thing in lower tiers, for an obvious example.

Finally, some small nitpicks: I'd say Taunt counts as a good anti-stall measure and probably signifies that the team is rather offensive, especially considering its short duration. Taunt + WOW + Recovery is indeed a thing, but that leads to an overall increased stalliness. Probably a -0.5? But that is up to you, I guess! Also, I'm not sure if Sub should be negative, given the very nature of the move, it delays the kill.

I'm interested in where this is headed! (I'm a big math / GRAPHS person irl)

Antar · Sep 25, 2012

Thanks for the feedback!

Amarillo said:
cut-off for the ratings

First off, this metric is completely divorced from rating. If you're concerned about how fuzzy those scatterplots look, I completely agree, and I do strongly suspect the correlation will be stronger when I limit myself to only players with a certain rating.

broader categories

I make these distinctions purely for testing purposes to see how well predictions based on "stalliness" agree with RMT Archive classification. If you look at last month's metagame analyses, you'll see that instead of reporting the percentage of teams that were semi stall vs. full stall, I instead plot a histogram of stalliness values.

Next month, I plan to do BOTH.

Next up is the concept of typing / resistances as it affects the 'stalliness.' However stally moves you give your ice-type, chances are it will not survive very long, etc. Not exactly sure how to proceed with this, though. Probably as some attack types are more common in one tier as opposed to another. Dragon resistances are not quite a thing in lower tiers, for an obvious example.

I have considered this, yes. I may add it to future revisions, but for now I've ignored it.

I'd say Taunt counts as a good anti-stall measure and probably signifies that the team is rather offensive, especially considering its short duration.

Taunt also works really well at preventing sweepers from setting up and entry hazards from being put on the field--two roles that are quite important for stall.

Also, I'm not sure if Sub should be negative, given the very nature of the move, it delays the kill.

Sub-stall strategies (Sub+Roost, Sub+Leech Seed) work out to be net positives, but in general, subs are set-up moves. Stall relies on frequent switching. If you try to do that while setting up subs, you're going to wear down your health AWFULLY fast.

marilli · Sep 25, 2012

Well, I've never really seen a good stall team use Taunt for those purposes: they have phazers for stopping setup sweeps anyways and defensive Taunt users are often too slow to stop the setup right away. Same goes for stopping hazards: you usually stop them from coming up through good play and Rapid Spinners (which you should have on stall). Sub without recovery is a hallmark of an offensive Pokemon, like Gengar. But Sub isn't exactly an offensive measure on its own. Rather, Sub is arguably Gengar's (and other frail sub-users') sole defensive mechanism. I know that most Gengars do not run either set, but I don't see why Sub+3 Attacks should be considered less stally than an all-out attacking set.

/end rant

Now that's out of the way, I am curious about something. You mentioned Stall relying on frequent switching and all: how does your stalliness index correlate to how often a team switches in on the opponent? You mention how you can figure out turn / KO ratio. Can you do something similar like turn / Switch ratio?

Idk if you have time for all this, but you could go further. You could analyse the ratio of turns for the type of move you make. 'Offensive Moves' (Damage Dealing moves / Boosting setup moves for example), 'Defensive / Utility Moves' (recovery, hazard stack, toxic, burn), 'Switching' (when you switch something in and it doesn't die), 'Sacrifice' (switch something in and gets KOed, or you leave in something to die), and 'Others' (Double Switches, etc, I'm sure I covered most of my bases)

By definition, if you are sponging around by switching, you are making a defensive play. If you are sacking and attacking, you are making an offensive play. Instead of looking at the team you look at how they play. I hope this would bring up a very to-the-point correlation.

Unless, of course, if this is too hard to implement.

Princess Bubblegum · Sep 25, 2012

Yeah taunt can go either way, on many stall teams I have had I used taunt on Gliscor so it can beat bulk-up Conkeldurr. If you want to count it as offense go ahead, I mean sleep moves are considered offensive, and plenty of stall teams use spore Amoonguss, I doubt it would hurt the stalliness.

Antar · Sep 26, 2012

Amarillo said:
Now that's out of the way, I am curious about something. You mentioned Stall relying on frequent switching and all: how does your stalliness index correlate to how often a team switches in on the opponent? You mention how you can figure out turn / KO ratio. Can you do something similar like turn / Switch ratio?

Yeah, I can. Here's how my usage data scripts work.

First, I have a "Log Reader" that reads in the raw PS battle logs (which are in a format called JSON and are MUCH MORE machine-readable than, say, PO HTML logs). This "Log Reader" takes all the events in the PS battle log and distills them down to a summary of the events of the battle, such as the following:

Code:

[I][REDACTED][/I] (bias:-280, stalliness:0.39415090447, tags:hail,balance)
Staryu (3,3)
Houndour (2,7)
Snover (1,4)
Misdreavus (0,3)
Mienfoo (1,2)
Bronzor (0,0)
***
[I][REDACTED][/I] (bias:616, stalliness:-1.57102191703, tags:weatherless,hyperoffense)
Staryu (0,3)
Cyndaquil (1,3)
Mienfoo (0,4)
Stunky (1,5)
Gastly (0,0)
Cacnea (1,4)
@@@
Mienfoo vs. Snover: Snover was switched out
Mienfoo vs. Misdreavus: Mienfoo was switched out
Misdreavus vs. Stunky: Misdreavus was KOed
Houndour vs. Stunky: Stunky was switched out
Houndour vs. Staryu: Houndour was switched out
Snover vs. Staryu: Staryu was switched out
Gastly vs. Snover: Gastly was KOed
Mienfoo vs. Snover: Snover was switched out
Mienfoo vs. Staryu: Mienfoo was KOed
Cacnea vs. Staryu: Staryu was switched out
Cacnea vs. Snover: Snover was KOed
Cacnea vs. Houndour: Cacnea was switched out
Houndour vs. Staryu: Staryu was KOed
Houndour vs. Stunky: Stunky was KOed
Cyndaquil vs. Houndour: Houndour was KOed
Cyndaquil vs. Mienfoo: Cyndaquil was switched out
Cacnea vs. Mienfoo: Cacnea was u-turn KOed
Cyndaquil vs. Staryu: Cyndaquil was KOed

Note that I don't record the moves, only the results of the individual matchups. The idea was that, at some point when I could figure out how to present the data, I'd publish a "matchup matrix" which told you what happened statistically when Pokemon A went up against Pokemon B (from that, you can get statistics for who the best counters are for each Pokemon, that sort of thing).

But since each switch is recorded, and the number of turns in the battle is easily distilled from the header (the two numbers in parentheses by each Pokemon's name are (KOs,turns in battle)), this would be trivial to look at. Some time after September 1, I'll give it a look.
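A rough sketch of how a turns/switch ratio could be pulled out of a summary in the layout shown above. It is not the actual log reader: the function name is made up, the `SAMPLE` text is a hypothetical miniature in the same layout (not real battle data), and it assumes that summing one team's "turns in battle" numbers gives the length of the battle. Because the same species can appear on both sides, the sketch counts switches for the battle as a whole rather than per team.

```python
import re

# Matches roster lines such as "Staryu (3,3)": name, KOs, turns in battle.
ROSTER_LINE = re.compile(r"^(.+) \((\d+),(\d+)\)$")

# Hypothetical miniature summary in the same layout as the real one.
SAMPLE = """\
PlayerA (bias:0, stalliness:0.5, tags:balance)
Staryu (1,4)
Snover (0,2)
***
PlayerB (bias:0, stalliness:-1.0, tags:hyperoffense)
Mienfoo (0,3)
Stunky (1,3)
@@@
Mienfoo vs. Staryu: Mienfoo was switched out
Staryu vs. Stunky: Staryu was switched out
Snover vs. Stunky: Snover was KOed
Staryu vs. Stunky: Stunky was KOed
Mienfoo vs. Staryu: Mienfoo was KOed
"""


def turns_per_switch(summary: str) -> float:
    """Return battle turns divided by the number of recorded switches."""
    header, _, events = summary.partition("@@@")
    first_team = header.split("***")[0]

    # Assumption: one team's turns-in-battle values add up to the battle length.
    turns = 0
    for line in first_team.splitlines():
        match = ROSTER_LINE.match(line.strip())
        if match:
            turns += int(match.group(3))

    # Every matchup that ended in a voluntary switch ends with this phrase.
    switches = sum(
        1 for line in events.splitlines()
        if line.strip().endswith("was switched out")
    )
    if switches == 0:
        return float("inf")
    return turns / switches


print(turns_per_switch(SAMPLE))  # 6 turns / 2 switches -> 3.0
```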

Idk if you have time for all this, but you could go further. You could analyse the ratio of turns for the type of move you make. 'Offensive Moves' (Damage Dealing moves / Boosting setup moves for example), 'Defensive / Utility Moves' (recovery, hazard stack, toxic, burn), 'Switching' (when you switch something in and it doesn't die), 'Sacrifice' (switch something in and gets KOed, or you leave in something to die), and 'Others' (Double Switches, etc, I'm sure I covered most of my bases)

Much of this would involve modifications to my log reader, but that's certainly doable.

eric the espeon · Sep 26, 2012

If I don't reply to something, either you've made the change I suggested, or I accept your reasoning for not making the change and don't have a counter argument. I've reordered a bit.

I stand by this decision. Basically, my reasoning assumes that if a matchup is unfavorable, one can always make a switch. You aren't going to leave Blissey in against Machamp. You aren't going to try to take out Steelix with Druddigon. Are mixed sweepers more potent than single-side sweepers? I've wrecked enough teams with mix Deo-A (Espeed / Superpower / Ice Beam / Psycho Boost) to know that walling such sets is much more difficult. But it's been my experience that truly effective mixed sets are few and far between and don't really need the extra weight to be classified as the deadly beasts they are.

As for "double walls," a well-constructed stall team is dependent not on one Pokemon being able to slow all attacks but on several Pokemon having the synergy to completely block each other's weaknesses. In other words, it's been my experience that a wall really is as good as its strongest defensive side. On my super-stally RU team, Audino NEVER takes a Close Combat, and Steelix NEVER takes a Flamethrower.

While you're unlikely to leave a one side only wall in against their weakness for long, a one side only wall seems massively less defensive than a wall which could take hits as well as that one side wall from both physical and special attackers. You can switch in an unfavorable matchup, but doing so shows that this set has been unable to stall out the foe, and had to bring in a team mate. Using a fairly small power for the multiplication you could easily enough give an appropriate boost. There being few or many deadly mixed sweepers seems irrelevant to the extra potency given by the ability to strike with physical and special attacks.

If Blissey had 130 base defense it would be rated as exactly the same stalliness as it currently is. And it would be staying in against pretty much any physical attacker. If Druddigon had 120 base special attack it would again have exactly the same rating as it does currently, and it really would be blasting right through Steelix with Flamethrower. These are of course thought experiments of extreme cases, but they help to clarify, and you're going to be having these same effects on a smaller scale for practically all sets (especially those which split EVs: for equal-attack Pokémon like Deo-A, if you're putting some EVs in both Atk and SpA you're going to be classed as more stally than if you just dump them into one attack). With the method I suggested previously ((Atk^x+SpA^x)^(1/x), same for defenses, with perhaps a different x value), you could weigh the lesser defensive and offensive stat as strongly or weakly as appropriate with relative ease. Pokémon which are very capable of taking one kind of attack would still be given a fairly high stalliness rating so long as x is not very small, which corresponds to them taking only the kind of attacks you want.
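To make the suggestion concrete, here is a small sketch of the "raise to a power, add, take that power's root" combination. The function name and the numbers fed to it are illustrative only and are not taken from the actual script.

```python
def combine_stats(a: float, b: float, x: float = 2.0) -> float:
    """Raise both stats to the power x, add them, then take the x-th root."""
    return (a ** x + b ** x) ** (1.0 / x)


print(combine_stats(3, 4, 2))  # 5.0

# With a large x the result approaches the larger stat alone, which is the
# current "discard the lower stat" behaviour; with a small x the lower stat
# contributes more.
for x in (1.0, 2.0, 8.0, 50.0):
    print(x, combine_stats(100, 60, x))
```

So x acts as a dial between "only the better stat matters" (very large x) and "both stats simply add" (x = 1).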

Your reasoning holds for not weighing both greater and lower stats equally, but it does not show that the lower stat should have zero weight.

You'll note that I named my metric "stalliness" rather than "offensiveness." That's because it really is more about stall vs. "anti-stall" than stall vs. offense.

Hm, I'm curious about the distinction between anti-stall and offense.

Several times as I played with the metric I was confronted with the problem of "intent," which ideally would not be an issue at all. I had to ignore modifications from many abilities simply because they tend to have no practical use on most sets, and if I accounted for them, the metric would get thrown off. Not accounting for Leftovers at all was thus more of a "fitting" decision than anything else. However, when I add in Expert Belt et al., I'll consider throwing in a +0.25 for Lefties as well.

Perhaps ignoring abilities was not necessarily the best way to handle it? As you say, intent would ideally not be an issue. I'd like to think it was possible to make a metric which can work simply off the stalliness of the set, taking as much into account as it can. Maybe the reason the metric was getting so thrown off was that abilities were given too high weights?

If Berry Juice is ever unbanned or Gen IV Little Cup ever goes live on PS, I will reconsider this decision (will probably play it safe and make them neutral).

hm, if you're likely to change it when gen 4 LC comes to PS.. is it not worth preparing the formula for tiers which are not yet live on PS, since it's being used not just by you but by UPC's team and set analyzer? Obviously lower priority, but still. Also, is Oran Berry not used on Sub sets in LC, like Drifloon? hm, actually, yea, even 5th gen LC has quite a few Pokémon which sometimes prefer HP to defenses for more Subs, Wynaut's countercoat, or various abilities.

Graphs

hm, I like how adding leftovers differentiates between the different styles more (other than the most stally bulky offense, which seems to be an anomaly anyway, perhaps classified incorrectly? or perhaps an example of something that's being missed by the metric? which team is that?). The balance/semi-stall division seems to benefit most from it, though with only three semi-stall teams it could be a fluke.

Also, something which you seem not to have replied to from my previous post was the idea of applying the stat modifying effects before doing the damage to self calc. This may be mathematically equivalent to the current implementation in many cases, but (especially if you take both stats into account) it seems a neater way to handle things. It prevents the need for rounding on 20% boost items to get a tidy number; with the split, it makes Life Orb, Light Ball, etc. much simpler; it makes Wise Glasses and Muscle Band give appropriate boosts (currently their 1.1x for both is a ~20% overall boost, but Light Ball's 2x boost for both is a 100% boost, which is inconsistent); and it makes the metric easier to understand for those not familiar with logs. The direct changes to the score from other effects are useful, but applying boosts directly when possible seems sane.

And:

blog post said:
Light Ball, Thick Club and DeepSeaTooth subtract 1.0 from the metric when held by the correct Pokemon
DeepSeaTooth adds 1.0 to the metric when held by Clamperl

Antar · Sep 26, 2012

eric the espeon said:
Re: Mixed sweepers / double-walls
Your reasoning holds for not weighing both greater and lower stats equally, but it does not show that the lower stat should have zero weight.

In physics terms, you've just acknowledged that adding the lower stats is a "higher order" correction. It's not something I'm interested in dealing with right now, but if you want to propose a specific change or test one yourself, my source code is readily available, and I'll provide you with a sample dataset if you'd like.

Hm, I'm curious about the distinction between anti-stall and offense.

Simply put, this post: higher degree of stall should correlate with longer battles, and thus team strategies that lead to shorter battles would be "anti-stall." Set-up sweeping--a hallmark of hyper-offense--is one way to get a low turns/KO ratio, but it's not the only way to skin this particular cat.

Perhaps ignoring abilities was not necessarily the best way to handle it? As you say, intent would ideally not be an issue. I'd like to think it was possible to make a metric which can work simply off the stalliness of the set, taking as much into account as it can. Maybe the reason the metric was getting so thrown off was that abilities were given too high weights?

Again, "higher order" terms. What definitely needs to be factored in is not so much intent but utility. Rivalry is hard to pull off unless you're on a default-gender simulator (old PO). Stat-dropping moves and abilities are too rare for Defiant to come into play very often. Same with stuff like Super Luck, Sniper and Anger Point.

Also, is Oran Berry not used on Sub sets in LC, like Drifloon?

You actually just made my case for me. Here, Oran Berry is very much assisting in a sweep, doubling speed with Unburden and working alongside Sub, which, as I argued a few posts ago, is an inherently offensive move.

If this sounds confusing, it's because you're thinking of it in terms of "how easy is this Pokemon to kill" rather than "how easy is this Pokemon to kill vs. how easy is it for this Pokemon to kill?"

Wynaut's countercoat

Wobbs and Wynaut are VERY offensive Pokemon. Shadow Tag guarantees either a dead Pokemon at the end of the matchup, or, at the very least, a free switch into a teammate who needs it to set up.

hm, I like how adding leftovers differentiates between the different styles more

It's true that Leftovers adding greater differences between the various teams is an argument in favor of applying the moveset modification, but I don't think I'll be doing it this month (before Sept. 1), as I either need to come up with a counterbalance to help widen the gap between stall, semi-stall and balance.

other than the most stally bulky offense, which seems to be an anomaly anyway, perhaps classified incorrectly? or perhaps an example of something that's being missed by the metric? which team is that?

Antar said:
The following teams had discrepancies between archive classification and "stalliness" classification:

Bulky Offense

Negative 3 @ 1.26. Wish/Protect Jirachi + no-attack Skarm + Slowbro throws everything off.

with only three semi-stall teams it could be a fluke.

I would LOVE to add some more semi-stall teams to my sample set.

Also, something which you seem not to have replied to from my previous post was the idea of applying the stat modifying effects before doing the damage to self calc.

It's essentially equivalent (SD = x2: -1, Expert Belt = x1.2: ~-.25), and there are issues where some of these abilities/items/moves don't deserve the full weight because they won't be applied consistently, but there may be some merit to this idea. For instance, Life Orb vs. Choice Band: would the mod for LO be greater than log_2(3) if I factored in recoil?

Again, ete, I urge you not just to argue with me but to try out some of your suggestions yourself--mod my code and come up with your own version of the metric. If you come up with some demonstrable improvements, I'd be delighted to use them.

eric the espeon · Sep 26, 2012

Antar said:
In physics terms, you've just acknowledged that adding the lower stats is a "higher order" correction. It's not something I'm interested in dealing with right now..

Again, "higher order" terms. What definitely needs to be factored in is not so much intent but utility. Rivalry is hard to pull off unless you're on a default-gender simulator (old PO). Stat-dropping moves and abilities are too rare for Defiant to come into play very often. Same with stuff like Super Luck, Sniper and Anger Point.

Alright. You are right, these are each going to be minor changes on their own, but I think the combination of all the little extra weights should add up to a better metric if they're all fairly weighted.

Suggestions that I change the code.

While I've read a lot of code, I am not a programmer. I may be able to make some math tweaks: I think stalliness=-math.log(((2.0*poke['level']+10)/250*(stats[1]**2+stats[3]**2)**0.5/(stats[2]**2+stats[4]**2)**0.5*120+2)*0.925/stats[0],2) with **2 and **0.5 adjustable *could* work as an implementation of splitting the stats as I meant, but I don't actually know the Python math syntax, and I have no experience of creating or running new programs. It's something I've been meaning to learn for many years, but have never got to. If you could give me the sample data and some idea of what to do, then I'll try to make a version with the changes I'd suggest. (I've got Python, have saved both TA.py and baseStats.json, and it does not give errors when I try to run it. But it also does nothing, probably because I don't know how to input a team.)
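For readers who want to try the expression above, here is a minimal runnable sketch of it. It assumes the stat list is ordered [HP, Atk, Def, SpA, SpD, Spe], which is inferred from how the indices are used in the expression rather than confirmed from TA.py. The function names and the adjustable exponent `p` are illustrative.

```python
import math


def combine(a, b, p=2.0):
    """Combine two stats as (a**p + b**p)**(1/p).

    A larger p gives the lower stat less weight; as p grows the
    result approaches max(a, b).
    """
    return (a ** p + b ** p) ** (1.0 / p)


def stalliness(level, stats, p=2.0):
    # Assumed order: [HP, Atk, Def, SpA, SpD, Spe]
    hp, atk, dfn, spa, spd = stats[:5]
    offense = combine(atk, spa, p)
    defense = combine(dfn, spd, p)
    damage = ((2.0 * level + 10) / 250 * offense / defense * 120 + 2) * 0.925
    return -math.log(damage / hp, 2)


# A Pokemon with one weak defensive stat scores as less stallish
# than one with two equal defenses (stat values are made up).
balanced = stalliness(100, [300, 200, 250, 200, 250, 200])
lopsided = stalliness(100, [300, 200, 250, 200, 100, 200])
print(balanced > lopsided)  # True
```

With p=2 this reproduces the **2 and **0.5 in the expression. Changing p is the knob for how much the lower stat counts.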

Simply put, this post: higher degree of stall should correlate with longer battles, and thus team strategies that lead to shorter battles would be "anti-stall." Set-up sweeping--a hallmark of hyper-offense--is one way to get a low turns/KO ratio, but it's not the only way to skin this particular cat.

Right, defining stall as how much a team extends the battle. But, how is having a low inclination to increase the number of turns in a battle different from being offensive (assuming you're aiming to win and not using six level 1 Shuckle or something, which would give a very short match and be more defensive than offensive)?

You actually just made my case for me. Here, Oran Berry is very much assisting in a sweep, doubling speed with Unburden and working alongside Sub, which, as I argued a few posts ago, is an inherently offensive move.

If this sounds confusing, it's because you're thinking of it in terms of "how easy is this Pokemon to kill" rather than "how easy is this Pokemon to kill vs. how easy is it for this Pokemon to kill?"

Wobbs and Wynaut are VERY offensive Pokemon. Shadow Tag guarantees either a dead Pokemon at the end of the matchup, or, at the very least, a free switch into a teammate who needs it to set up.

Hm, my point is not exactly that this is not massively stallish, but that it is more stallish than alternate berries (PetayaFloon or Custap Wynaut are going to die faster and often kill stuff faster than their Oran counterparts). It could be, as you put it for my other points, a second order correction. One-use items are inherently more short-term and so less stally than similar unconsumables, but weighing the more offensive and defensive one-use items equally when measuring stalliness seems questionable.

It's true that Leftovers adding greater differences between the various teams is an argument in favor of applying the moveset modification, but I don't think I'll be doing it this month (before Sept. 1), as I need to come up with a counterbalance to help widen the gap between stall, semi-stall and balance.

Ok, makes sense.

I would LOVE to add some more semi-stall teams to my sample set.

Perhaps corner a team rater/RMT staff member or two and task them with expanding your collection of exportables?

It's essentially equivalent (SD = x2: -1, Expert Belt = x1.2: ~-.25), and there are issues where some of these abilities/items/moves don't deserve the full weight because they won't be applied consistently, but there may be some merit to this idea. For instance, Life Orb vs. Choice Band: would the mod for LO be greater than log_2(3) if I factored in recoil?

Yes, for many it's the same. And it's true that for type-boosting items and the like you're not always getting the boost, so scaling it down slightly may be appropriate. Assuming that a 'mon will be using a Water move a large majority of the time if it holds Mystic Water is reasonable, but not all of the time, so perhaps drop it from a 20% to a 15-18% boost. For LO, my instinct is (assuming the stat split, which feels necessary to implement physical + special boosts accurately) to apply the boost directly to both stats before calculating base stalliness, then have a separate, smaller mod which deals with the recoil. That mod should be approximately equal to Leftovers, since LO will not activate every turn but Leftovers will unless at max HP. With recoil as an extra and sane values for the lower stat weighting, the mod should in basically all situations be greater than log_2(3).

Antar · Nov 13, 2013

Note: I'm going to be controlling this thread pretty tightly. Any post that I don't feel is constructive will be deleted.

With the new generation, we have some new moves, items and abilities that need to be incorporated.

Here are the changes I've decided to make:

The abilities Dark Aura, Fairy Aura, Infiltrator,* Parental Bond, Protean, Strong Jaws, Sweet Veil and Tough Claws subtract 0.5 from the metric.
The abilities Aroma Veil, Bulletproof, Cheek Pouch and Gooey add 0.5 to the metric.
The ability Fur Coat adds 1.0 to the metric.
The move Crafty Shield does not affect the metric (as it does not prevent damaging moves).
The moves King’s Shield, Mat Block and Spiky Shield get added to Protect and Detect in the list of moves that, if present on a moveset, add 1.0 to the metric.
The move Nuzzle gets added to the other paralysis moves for adding 0.5 to the metric.
The moves Power-Up Punch and Rototiller get added to the list of setup moves that subtract 0.5 from the metric (recall that multiple setup moves do not stack).
The move Geomancy gets added to the list of setup moves that subtract 1.0 from the metric.
The move Sticky Web subtracts 0.5 from the metric (since stall teams really won’t benefit from having the opponents’ speed lowered).
The item Assault Vest does not change the metric.
The items Kee Berry, Maranga Berry, Roseli Berry and Snowball get added to the list of “consumables” which subtract 0.5 from the metric.
The item Pixie Plate subtracts 0.25 from the metric.
The item Weakness Policy subtracts 1.0 from the metric.
The item Safety Goggles does not change the metric (“powder” moves are few and far between, and neutralizing weather is better accomplished with Leftovers)
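
The list above can be read as a set of lookup tables. The sketch below is illustrative only and is not taken from TA.py. It covers just the Gen VI additions, and it assumes that Protect-like and paralysis moves each count once per moveset and that "do not stack" means only the strongest setup modifier applies.

```python
# Gen VI additions only; values are copied from the list above.
ABILITY_MODS = {
    **dict.fromkeys(
        ["Dark Aura", "Fairy Aura", "Infiltrator", "Parental Bond",
         "Protean", "Strong Jaws", "Sweet Veil", "Tough Claws"], -0.5),
    **dict.fromkeys(["Aroma Veil", "Bulletproof", "Cheek Pouch", "Gooey"], 0.5),
    "Fur Coat": 1.0,
}
ITEM_MODS = {
    **dict.fromkeys(
        ["Kee Berry", "Maranga Berry", "Roseli Berry", "Snowball"], -0.5),
    "Pixie Plate": -0.25,
    "Weakness Policy": -1.0,
    "Assault Vest": 0.0,
    "Safety Goggles": 0.0,
}
PROTECT_MOVES = {"King's Shield", "Mat Block", "Spiky Shield"}  # +1.0
PARALYSIS_MOVES = {"Nuzzle"}                                    # +0.5
SETUP_MOVES = {"Power-Up Punch": -0.5, "Rototiller": -0.5, "Geomancy": -1.0}
OTHER_MOVES = {"Sticky Web": -0.5, "Crafty Shield": 0.0}


def gen6_modifier(ability, item, moves):
    """Sum the Gen VI modifiers for one set (hypothetical helper)."""
    total = ABILITY_MODS.get(ability, 0.0) + ITEM_MODS.get(item, 0.0)
    if PROTECT_MOVES & set(moves):
        total += 1.0   # assumption: counted once per moveset
    if PARALYSIS_MOVES & set(moves):
        total += 0.5   # assumption: counted once per moveset
    setup = [SETUP_MOVES[m] for m in moves if m in SETUP_MOVES]
    if setup:
        total += min(setup)  # assumption: only the strongest setup mod applies
    total += sum(OTHER_MOVES.get(m, 0.0) for m in moves)
    return total


print(gen6_modifier("Protean", None, ["Power-Up Punch", "Geomancy"]))  # -1.5
```

Keeping the values in tables rather than in branching code makes it easier to test alternative weights, such as the ones proposed later in the thread.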

Mega Stones, if held by the corresponding Pokemon, will result in stalliness being calculated as the AVERAGE of the metric under each form. That is, for Aerodactyl holding Aerodactylite, calculate stalliness once assuming it stays an Aerodactyl (old stats, old ability), then calculate again assuming Aerodactylite is used and it has the Mega forme’s stats and ability. Take those two values and average them (this is because Mega Evolution is not guaranteed and is in fact limited to one-per-team, even though a team may contain multiple Pokemon that can Mega evolve).
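
The averaging rule for Mega Stones can be sketched as follows. This is a minimal illustration, not code from TA.py: `metric` stands for whatever function computes stalliness for a single forme, and the dictionaries, values and toy metric below are hypothetical placeholders.

```python
def stalliness_with_mega(metric, base_forme, mega_forme=None):
    """Average the metric over both formes when a matching Mega Stone is held.

    Pass mega_forme=None when the Pokemon holds no matching Mega Stone.
    """
    base_value = metric(base_forme)
    if mega_forme is None:
        return base_value
    return (base_value + metric(mega_forme)) / 2.0


# Toy metric for demonstration only: reads a precomputed value.
def toy_metric(forme):
    return forme["stalliness"]


aerodactyl = {"name": "Aerodactyl", "stalliness": -1.0}            # made-up value
mega_aerodactyl = {"name": "Mega Aerodactyl", "stalliness": -2.0}  # made-up value
print(stalliness_with_mega(toy_metric, aerodactyl, mega_aerodactyl))  # -1.5
```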

Regarding team classification outside of stalliness, the only relevant change is the nerfing of auto-weather. One line of argument would be that auto-weather abilities should now be treated like weather moves, that is, having a Drizzler does not a rain team make. That, I think, is inappropriate for two reasons. First, having the weather happen automatically still trumps manual weather induction--all an auto-weather Poke has to do to keep the weather alive is to not faint and stay above a health level where they can switch back in. After all, switching your weather pokemon in and out to keep your weather alive is something that happened in Gen V as well as part of the "weather wars." Secondly, there are very few auto-weather Pokemon that can function as their own weather sweepers: the list, I would say, is Kyogre, Mega Charizard Y and Snover in LC. The rest are too slow or don't benefit enough from the weather to classify as "weather sweepers" (Groudon isn't a fire-type, Tyranitar gets no boost to any attacks). So for now, I'm leaving the "auto weather ability = weather team" rule in effect (and applying it to Mega Charizard Y, although I'm not 100% happy with the idea). That being said, I'm willing to be convinced otherwise.

-Nitro- · Dec 5, 2013

Antar said:
So for now, I'm leaving the "auto weather ability = weather team" rule in effect (and applying it to Mega Charizard Y, although I'm not 100% happy with the idea). That being said, I'm willing to be convinced otherwise.

I believe that, for now, Sand Stream should be looked at separately. There are two reasons for this.

The first reason is that teams can use Tyranitar without having to worry about permanent sand, something that they had to take into account beforehand. As a result of the lack of permanent sand, you have to worry about poor synergies and Sandstorm damage less than you did before, as it wouldn't be active for the entire battle. This opens the door for many, many types of teams to use Tyranitar, and not have to worry about permanent sand ruining the rest of their team.

The other factor is that, unlike other auto-inducers (except maybe ZardY), Tyranitar and (arguably) Hippowdon are both incredibly useful Pokemon in the OU tier on their own merit. Tyranitar is an omnipresent threat: between his new Assault Vest set, his new Mega, and (currently) being one of the few Stealth Rock setters pre-Pokebank, he can be used on a variety of teams.

It is the combination of these two factors that makes me feel that Sand Stream alone should not classify a team as a sand-based team, especially because of how popular Tyranitar is overall. Obviously, Sandstorm + Smooth Rock, or the presence of Sand Rush / Sand Force, means that a team is likely to be based around sand. But the fact is, many teams are running Tyranitar as just a regular mon, as opposed to a sandstorm setter.
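
One possible reading of this proposal, sketched as a classification rule. This is an interpretation, not anything from the actual team classifier: it assumes a team counts as sand-based only if it has a sand setter (Sand Stream or the move Sandstorm) and either a setter holds Smooth Rock or some member has Sand Rush or Sand Force. The team format and function names are hypothetical.

```python
SAND_ABUSER_ABILITIES = {"Sand Rush", "Sand Force"}


def sets_sand(member):
    """True if this team member can start a sandstorm."""
    return (member.get("ability") == "Sand Stream"
            or "Sandstorm" in member.get("moves", ()))


def is_sand_team(team):
    """Assumed rule: a setter alone is not enough to make a sand team."""
    setters = [m for m in team if sets_sand(m)]
    if not setters:
        return False
    extended = any(m.get("item") == "Smooth Rock" for m in setters)
    abusers = any(m.get("ability") in SAND_ABUSER_ABILITIES for m in team)
    return extended or abusers


av_ttar = {"species": "Tyranitar", "ability": "Sand Stream", "item": "Assault Vest"}
excadrill = {"species": "Excadrill", "ability": "Sand Rush", "item": "Life Orb"}
print(is_sand_team([av_ttar]))             # False
print(is_sand_team([av_ttar, excadrill]))  # True
```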

Arcticblast · Dec 8, 2013

-Nitro- said:
The first reason is that teams can use Tyranitar without having to worry about permanent sand, something that they had to take into account beforehand. As a result of the lack of permanent sand, you have to worry about poor synergies and Sandstorm damage less than you did before, as it wouldn't be active for the entire battle. This opens the door for many, many types of teams to use Tyranitar, and not have to worry about permanent sand ruining the rest of their team.

While I agree with your overall post, this particular point is entirely wrong. At higher levels of play, Tyranitar's Sand Stream was often used last generation not as the building block of a team but as a way to increase Tyranitar's bulk and block opposing weather, making a good Pokemon into a great Pokemon. Keldeo, Latios, Latias, Celebi, Jellicent, Technician Breloom, Gengar, Kyurem-B, Rotom-W, and Starmie were all fairly common Pokemon seen alongside Tyranitar, and some of these Pokemon plus Tyranitar contributed to the deadliest teams of the metagame.

loler · Dec 18, 2013

zpattack12 said:
I feel like sand should be a little different than the other weathers, as some people use it as anti-weather, rather than abusing it themselves.

I agree. So many people are using TTar/Mega TTar now, and most of them do not fully abuse the weather brought by Sand Stream (for example, setting up sand for TTar and TTar only, for the SpD boost). Counted this way, I think sand would look significantly more used than any other weather, and even teams that should not be counted as sand teams would be counted. I don't know if that was brought up before, though.

Eievui-Nymphia · Mar 3, 2014

I'm going through Antar's post on stalliness and there are some things that I want to discuss.
Leftovers is fine at 0. True, stall teams use it, but on its own it does not define "stall". Apart from some cases of specialized items, Mega Evolutions and some heavily offensive sets, Leftovers is the item that you throw on if you don't know what else to use.
I have no problem with entry hazards. Stealth Rock adds 0, while Spikes and Toxic Spikes add 0.5. That makes sense given their effects.
I have to test Chlorophyll, Sand Rush, Solar Power and Swift Swim (currently they add 0.5, but I think it should be 0 if the weather isn't on the team).
The abilities Hydration and Ice Body should be 0 if the weather is not present on the team. An offensive Tail Glow Manaphy doesn't need +0.5 of stalliness when it's not going to use the ability.
I think evasion (in Evasion-free* tiers) should be 0 in the metric. If Double Team is allowed, it will realistically be used on your Sylveon, Heatran or Mega Venusaur rather than on your Greninja, so your main problems are the former rather than the latter. It should be 0 because on its own it is not a defensive strategy.
Sleep moves should only subtract 0.25. Stall also benefits from sleep, at least partially.
I don't know what Sweet Veil does. Why does it subtract 0.5 from the metric?
Sticky Web should add nothing. Tanks and bulky offense benefit more from Sticky Web than hyper offense does.
Assault Vest should add 0.5. The Special Defense boost is too good to pass up. True, that doesn't help, but the absence of stalling moves in the moveset is fine.

The Mega Stone rule should be modified. If a team has only one Mega Stone, it's easy to assume that the holder is going to be used as a Mega.

Antar · Mar 3, 2014

Eievui-Nymphia--I'll take your opinions into consideration, but I'm unlikely to further revise the stalliness metric until there is a larger volume of teams in the Gen VI RMT collection for me to test against.
