Anti-Hax Ladder Scoring

Status
Not open for further replies.

Chou Toshio

Over9000
is an Artist Alumnus, is a Forum Moderator Alumnus, is a Community Contributor Alumnus, is a Contributor Alumnus, is a Top Smogon Media Contributor Alumnus, is a Battle Simulator Moderator Alumnus
So, four years ago, this very same forum hosted the most epic prank of Smogon History.

While it was a great joke for people to even think that wins or losses could be recalculated based on incidents of hax, the idea only got as far as it did because it sparked real discussion on the subject. Let's face it, we Smogonites hate hax.

Thinking back on that epic prank, and also about my own loathing of hax, I came upon this idea:

What about creating a rating system where rank-drops are reduced by hax? (crits against you prevent you from losing rank points when you lose) The winner would still win-- he'd still EARN the same # of points regardless


To put it simply, this would be a rating system where, if you got critical hit, frozen, fully paralyzed, missed focus blast, etc. etc., your rank would drop less than it normally would.

We could even set it such that if enough incidents of hax occurred, your rank would be completely unaffected.

I think that, unlike the "anti-hax formula" that tried to assign wins and losses, this would actually be a fair and implementable system. This would effectively reduce the effect of hax in competitive battling, but without re-assigning the winner, removing hax from the game, or altering the mechanics of Pokemon. The rating system is something totally our own; we control it as we see fit. The games themselves will be unaffected, but we can make it so people have less reason to care about hax. Granted, this does nothing to stop hax's influence on tourneys, but it would potentially improve laddering greatly.

The winner of the match couldn't care less, because he gets his points regardless-- while players in general could enjoy some "ease" laddering, knowing that their hard-earned ladder scores would be padded against the effect of excessive hax. This would make laddering less frustrating in general, especially during suspect tests.

While I'd like people to discuss specifics, I'd imagine that we could assign % value decreases to rank drop based on the frequency of an event.

Example (JUST an example, not a proposal):

F = Events with 10% or less chance to occur (Freeze from Blizzard, Critical hits, etc.)

H = Events with 15-20% chance to occur (Fire Blast Miss, Waterfall Flinch, etc.)

A = Events with 25-30% chance to occur (Full paralysis, Thunder miss, Burned by Scald, etc.)

X = Events with 55-60% chance involving Serene Grace (lol)


And then we make a formula like:

% Rank Drop Reduction = 33%F + 25%H + 20%A + 10%X

So, in this example, if I got crit 3 times in a match, I would lose essentially zero ladder points (a 99% reduction).

If I got fully paralyzed 5 times in a match, I'd lose zero ladder points.

If I missed 5 Focus Blasts, I'd lose zero ladder points.

And if Jirachi flinched me 10 times, I'd lose zero ladder points.

So, if I got crit once and fully paralyzed once, the drop reduction would be 53% (33% + 20%). If it was a loss that should have cost me 20 points, I'd only lose 9.4 points instead. Again though, this is just an example-- we'd have to talk about how we'd weight the various types of hax, and what types of hax should even be included.
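A minimal sketch of the example formula, purely for illustration (this is not something the simulator does). The function names, the whole-number percentages, and the cap at 100% are assumptions added here; the weights are the ones from the example.

```python
# Illustrative weights from the example formula, in percent per incident.
WEIGHTS = {"F": 33, "H": 25, "A": 20, "X": 10}

def drop_reduction_pct(counts):
    """Return the % rank drop reduction for a dict of hax counts per category."""
    total = sum(WEIGHTS[cat] * n for cat, n in counts.items())
    return min(100, total)  # assumption: never reduce by more than 100%

def points_lost(base_loss, counts):
    """Points the loser actually drops after the reduction is applied."""
    return base_loss * (100 - drop_reduction_pct(counts)) / 100

print(drop_reduction_pct({"F": 1, "H": 1}))  # 58
print(points_lost(20, {"F": 1, "H": 1}))     # 8.4
print(points_lost(20, {"A": 5}))             # 0.0
```

Here one F event plus one H event gives a 58% reduction, so a 20-point loss becomes 8.4 points, and five A events wipe out the drop entirely.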



We'd have to talk about exactly what kind of drop-reduction formula we want, and what types of hax should be accounted for (for instance, I didn't include Confusion in this model), but I'm sure we could work something out as a community-- if you guys like this idea.

In this example I'm just trying to illustrate the concept behind the overall idea.

What do you guys think? Is this something you guys would want? I know it's a totally bizarre proposition, and I don't even know the challenge of technical implementation, but I think we could potentially alleviate a lot of our community's frustration without compromising the game itself.


Please keep in mind: This is not a proposal that can completely eliminate the effect of hax on scoring-- but it would be a system that provides great insurance for battles that experience an inordinately high incidence of hax.
 

ginganinja

It's all coming back to me now
is a Community Leader Alumnus, is a Community Contributor Alumnus, is a CAP Contributor Alumnus, is a Contributor Alumnus, is a Battle Simulator Moderator Alumnus
So how do you distinguish between hax that "matters" and hax that doesn't? For example, if I had 3 mons that all died to critical hits that didn't actually matter (say a Rapid Spin CH against Ferrothorn, a CH on my 1% Infernape, etc.), then I would potentially lose 0 points when, in all honesty, I might have deserved to lose 20 points. How do you get code to understand the difference?

Heck, it could be potentially possible to "fish" for no points by sending in some paralysed mon vs Jirachi aiming to achieve a certain number of flinches / full para to lose 0 points when you lose.

The idea sounds nice I guess, but I see quite a few issues that limit how possible it actually is to implement.
 
What would you do depending on the length of the battle? Or about "useless" hax? Some critical hits don't matter because that doesn't affect the win condition. I feel as if we'd be needlessly adding subjectivity into a situation that there really isn't anything we can do about. I feel like "anti-hax" measures are just ways to polish a turd. It still sucks; I'd rather it suck consistently than subjectively like the proposal.

edit: dammit gingaNINJA
 

Chou Toshio

Over9000
Well giga-- as mentioned, part of the discussion's role would be deciding which incidents of hax deserve consideration.

I personally envision a system where we ignore things like % damage rolls, and also avoid subjectivity-- let's not quibble over "which hax mattered" and "which hax didn't" in situations of specific battles.

I'd prefer a system based solely on measurable events, for ease-- as you said, this wouldn't be something easy to implement technically.

Sticking purely to objectively measurable incidents:

-critical hits
-misses
-move %effect activations (blizzard freeze, scald burn, etc.)
-full paralysis
-hurt in confusion
etc.

Not adding a subjective element, and taking into account measurable and easily definable incidents of "hax", I think we'd still make everyone happier in the long run.

I mean-- the winner is still going to win. And he's still going to get his points for winning.

ginganinja said: "Heck, it could be potentially possible to "fish" for no points by sending in some paralysed mon vs Jirachi aiming to achieve a certain number of flinches / full para to lose 0 points when you lose."
People play to WIN-- they don't play to not lose points.
 

ginganinja

It's all coming back to me now
You're missing the point. If I play a poor game, and my opponent gets 3 (or 4, or whatever scale you want) critical hits that didn't matter in the scope of things, or I miss a Focus Blast, or I succeed in burning something that would have killed itself by LO, etc., then I deserve to lose the full amount of points. Under your examples, however, the amount of points lost would be minimised, or I would lose no points altogether. There is no real one "value" of hax; depending on the game, a critical hit or burn or whatever will be worth more or less depending on the match as well as the teams being used. For example, a critical hit LO DM from Latios on Ferrothorn is nowhere near as crucial as a critical hit LO DM from the same Pokemon hitting a Lum DD Dragonite. They have differing values, and this is what your code needs to be able to calculate in the interest of fairness.

Please don't sweep away whether "hax did or didn't matter", since it's actually a valid point. Failure to address it results in people losing no points for battles they lost, when the hax they suffered didn't matter at all. This is even ignoring the fact that whatever value we decide on is entirely subjective: 2 critical hits cause you to lose 10 points, while 3 might cause you to lose 0 points.

Chou Toshio said: "People play to WIN-- they don't play to not lose points."
That is 100% true, people do play to win. The problem is that people could still fish for a 0 point drop if they had realised they were, in fact, going to lose the game. For example, let's say I am going for suspect reqs, so losing here would be undesirable to me. I am facing a Specially Defensive Jirachi with a LO Refresh Latias, and a weakened CB Terrakion in the wings. My opponent also has a Scarf Mence, something I have no chance at beating, since my opponent will simply clean house when Jirachi has fainted. Under current battle conditions, my best option would be to click the forfeit button and move on to the next match, but under your proposal, I can play on and fish for as much hax as I can get, so that when I eventually lose, I lose no points. Can you blame my opponent for going for the simple and risk-free win by spamming Thunder Wave + Iron Head, despite this giving me an easy way to avoid losing ladder points? This is something your code (and proposal) would need to address, as it would be very easy to abuse.

For your proposal to succeed you would need to put a value on hax, something that's very difficult to do. Failure to do so results in people losing no points when they deserved to lose more, which potentially ruins the spirit of the game. Compare someone who received 3 meaningless crits and loses 0 points with someone who received 1 single critical hit that mattered and still loses 18 points. What separates them? What separates the guy who played his heart out in a clean game only to lose on a damage roll (and therefore loses the full 20 points) from the guy who had his Blissey burnt x number of times?

Please tell me how you plan to address this without using the excuse of "let's not quibble over 'which hax mattered' and 'which hax didn't' in situations of specific battles."
 

Pocket

be the upgraded version of me
is a Site Content Manager Alumnus, is a Team Rater Alumnus, is a Community Leader Alumnus, is a Community Contributor Alumnus, is a Tiering Contributor Alumnus, is a Top Contributor Alumnus
I like Chou Toshio's proposal - sure hax that may not have mattered may happen, but that's a rare occurrence compared to hax that DO matter in some degree. I don't know what's the big fuss anyways, since the victor will still earn the same amount of points as usual, whereas the loser may have a small chance of saving points from hax that "did not matter." It's a win-win scenario, really.

Whether it's feasible or not is up to Zarel and his development team to decide.
 

Chou Toshio

Over9000
Giga-- Hax is still hax-- chance events, even when having minimal effect, usually still have some effect on battle. Moreover, in Pokemon, critical hits and move misses (for example) rarely have "no impact" on a battle.

Of course, strategically, some hax incidents have a greater effect on the overall course of a battle; but that does not change the fact that objectively, you did suffer from a critical hit or a full paralysis that was outside your control.


I do not think the concerns you raised are outside of what could be accounted for by simply adjusting how much the rating drop is reduced.

For instance, if you really worry about people "not losing enough points" for matches "they should lose", we could just make the multipliers weaker-- so that it takes MORE overall hax to make a big difference in points. In the other extreme, if each critical hit were to only prevent a 5% drop for instance, and it took 20 critical hits to throw out a loser's score drop-- well, by the time you got crit 20 times in a match, those incidents of hax would definitely matter.

In other words, you're missing the main objective of doing this.

The goal is NOT to completely eliminate the effect of hax on scoring. The goal would be to limit the impact of extreme cases of hax.

In other words, we would want a scoring system that has less effect on battles with less hax, but a very notable effect on battles that have a LOT of hax. In fact, I'll add this point to the OP.




As for the "fishing to lose less badly" point-- it just seems pretty unrealistic strategy-wise.

When you know you are going to lose (and you would only "try" to get haxed after you know you're going to lose), how much control of the match do you have to try and force your opponent to hax you before you lose? Hax is hax because it is random-- Pokemon is a game of controlling probabilities, but trying to get the opponent to critical hit you is not a very realistic strategy.
 
I really think you are undermining the legitimate situations ginga is bringing up in order to advocate for your overly subjective and really entirely unfeasible proposal. Some hax matters, some does not. That happens. Critical hits and move misses do not rarely have no impact on a battle. In fact, they very frequently have no impact. Fuck I just crit your 5% Jirachi with my Landorus Earth Power...that definitely mattered. It legitimately includes arbitrary gaming into the system.

For example, in a losing match, you might use low accuracy moves in order to hope for misses in order to hopefully decrease your drop. That's not unrealistic or unlikely, so really stop undermining what people do or can do or what can happen in game. I may have no win condition, but I can unnecessarily make a game go longer by switching around and playing off resists to fish for a crit. It is possible, and if it's implemented, it's something that will be done.

I entirely disagree with this proposal. You cannot create a program that will properly evaluate the importance of the hax at hand (as it would have to analyze the battle conditions and if a win could have occurred otherwise), whether hax is useless, or anything. All something like this does is create an arbitrary measure to define an arbitrary evaluation system, and inflates rankings.

If you lose you lose. Easiest thing to do, and least subjective.
 

Chou Toshio

Over9000
Like I said, I haven't at all stated that I have a completed proposal, because I'd want the community's input in putting together a system that would actually make sense.


As another example, you could make a "cut off" system where there is no gradual effect-- but one where you'd need a very significant degree of hax for the system to kick in at all. So that "ordinary levels of hax" would be completely unaffected, and only truly ridiculous games would be targeted. How likely would people be to try to fish for a crit if they have to get crit 5 times for it to matter? How likely would it matter if they keep clicking Focus Blast when they have to miss 7 times for it to matter?
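A rough sketch of what such a "cut off" could look like, just to make the idea concrete. The thresholds come from the two questions above; the function names and the all-or-nothing behaviour once a threshold is reached are assumptions, not part of any worked-out proposal.

```python
# Hypothetical per-event thresholds taken from the questions above:
# crits only count from the 5th one, misses only from the 7th.
THRESHOLDS = {"crit": 5, "miss": 7}

def cutoff_applies(counts):
    """True if any tracked hax type reached its cut-off in this battle."""
    return any(counts.get(kind, 0) >= limit for kind, limit in THRESHOLDS.items())

def points_lost_with_cutoff(base_loss, counts):
    """Full loss for ordinary games; no loss once a cut-off is reached (an assumption)."""
    return 0 if cutoff_applies(counts) else base_loss

print(points_lost_with_cutoff(20, {"crit": 2, "miss": 3}))  # 20
print(points_lost_with_cutoff(20, {"crit": 5}))             # 0
```

Under these made-up thresholds, a game with two crits and three misses is scored exactly as it is today.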

I'm just bringing up the philosophical idea here, but pragmatically speaking--

depending on how you set up the system, there's a lot of flexibility in controlling the effect of such a system, even if you don't use any subjective measurement of hax incidents.


I think you guys are jumping too quick to judge without thinking about how the effects of implementation could be greatly honed by simply considering the ideal (or more conservative) measure.


Of course in this system, scores will be inflated-- but the best players will still come out on top. If anything, incidents of "luck" preventing scores from dropping will mean players whose losses are most associated with luck (players with higher skill) will have their scores even more inflated over the long-term.
 

Myzozoa

to find better ways to say what nobody says
is a Top Tiering Contributor Alumnus, is a Past WCoP Champion
The luck of the game is balanced as it is: if you sit with a Jirachi Cosmic Powering while I attack you, you should be critted after 9 turns and die. This is a check on the power of moves that boost defenses. Same thing goes for Flamethrower burns and other secondary effects of attacking moves.

Even if I were to ignore Critical Hits, I don't see how an anti-hax rating system can be implemented fairly. If I hit my Focus Blast three times in a row, is that counted as a 34.3% chance or as no hax at all? Inaccurate moves are supposed to have an opportunity cost. I might lose 10% of my ladder matches to Focus Blast missing, I might lose more, I might not win 20% of my matches if I don't choose to use Focus Blast. I made a choice during teambuilding and decided that in the long run having Focus Blast is the best choice. The ladder is about the long run, not about giving people A's for effort in the short run.

So I've thrown out these 'controversies' 'worries' whatever you'd like to call it, about this proposal, because I don't understand how you can make a proposal like this, when it is manifestly true that luck balances out given enough time (unless you're yee). The ladder is meant to be a long run affair where in order to be ranked highly, you must play enough games that you would experience good and bad luck as anyone else, and win. Pokemon is a game of percentages, crits, misses, fucking damage rolls. Unless you can find a way to incorporate all these things into a system (waste of time), you're just picking and choosing what is and is not hax.
 
I think that the closest thing we'll get to an objective measure of "hax" is to multiply the probabilities of what actually happened in each turn. For example, if Jirachi uses Iron Head on Bulbasaur, it doesn't crit, and Bulbasaur flinches, and then Bulbasaur uses Tackle or something and that doesn't crit either, then the turn had a 52.734375% probability of playing out exactly as it did. I don't think that this would actually impact ratings much, though. I mean, the turn I just simulated was pretty ordinary and it had a pretty low probability of actually occurring!
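As a quick check of that number, here is a small sketch that multiplies the per-event probabilities for the turn described. It assumes a 1/16 critical hit rate and a 60% Iron Head flinch chance with Serene Grace, which are the values that reproduce the figure quoted above; the variable names are made up for the example.

```python
from fractions import Fraction
from math import prod

NO_CRIT = Fraction(15, 16)  # assumes the standard 1/16 critical hit rate
FLINCH = Fraction(60, 100)  # Iron Head flinch chance with Serene Grace

# Iron Head does not crit, Bulbasaur flinches, Tackle does not crit.
events = [NO_CRIT, FLINCH, NO_CRIT]
turn_probability = prod(events)

print(turn_probability)         # 135/256
print(float(turn_probability))  # 0.52734375
```

Chaining this over every turn of a battle makes the product shrink quickly, which is the point being made: even an ordinary game has a low probability of playing out exactly as it did.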
 

Rhys DeAnno

Slacking Off
I see two massive problems with this:

1) It encourages passive play. Stall teams are more likely to have longer games, more likely to switch a lot, and more likely to lose due to a crit or a freeze or whatever. Stall teams would essentially get a long term bonus to their ladder ranking.

2) Ladders are a zero sum game. When we set suspect test qualifications, for instance, we do it with the ease of achieving them in mind. With this change the average ladder rating rises, so the requirement should as well. Therefore we have not only helped stall but hurt offense.

One might argue stall is weak now and helping it is fine, but helping it in such an artificial, biased way is detestable.
 

Woodchuck

actual cannibal
is a Battle Simulator Admin Alumnus, is a Forum Moderator Alumnus
I object to this proposal for a rather simple reason: there is no way to objectively and fairly evaluate the extent of hax that occurred in a battle. Our current ladder system is built on one ironclad objective measurement of player skill: the player who won. Every other measure of the outcome of a match can never compare, and I strongly disagree with moving away from this system.

I would also like to back the issue that myzozoa brought up: the decision to run moves with imperfect accuracy is supposed to have consequences. We should not mitigate them.
 

alexwolf

lurks in the shadows
is a Forum Moderator Alumnus, is a Community Contributor Alumnus, is a Tiering Contributor Alumnus, is a Top Contributor Alumnus
So with this new ladder system, the goal is to punish people who lose to hax less. OK, this is one side of the coin. What about the other side of it? What about the people that win due to hax? Will they get affected, and how? They will not be affected. So let's sum those up, and what do we have? Every player loses fewer points when losing a game due to hax than he gains when winning due to hax. This translates to "players that rely on luck are rewarded more in this new system". To make it easier to understand, let me give you an example...

Take a new player that after 20 battles with the current system has 15 losses and 5 wins. This player will have an X ranking. Now imagine the same player doing the same battles with the new system... He would still have the same win ratio, but he would have a higher ranking and a better position on the ladder, because his wins would get him the same points, but his losses would cost him less. And the more a player loses, the more the difference between his ranking on the new system and the old system will grow. Which means that the new system favors players who lose a lot, in comparison to the current system. And this happens for the reason Myzozoa already mentioned. Hax balances out in the long run. The good player will win the games he should have won in the long run, as long as he manages risk correctly.
 
This doesn't make any sense to me.

By laddering, you play a large volume of games to get a rating estimate. As the sample size increases, "hax" should matter less and less, eventually evening out to zero. Unlike a tournament match, I could go win some more ladder games to offset a hax loss. Everyone else can too, so ladder hax doesn't really matter.
 

jc104

Humblest person ever
is a Top Contributor Alumnus
I'm not saying I especially approve of this idea, but your ridiculous example system is causing people to raise objections that should not be relevant. The hax score should not be based on the absolute number of crits/misses/flinches in a match, but on the probability of having the same or greater number of crits/misses/flinches in the match. For example, if Jirachi uses Iron Head while faster 20 times in a match, and it flinches 12 times, then this should not count at all, because getting at least 12 flinches is probable (roughly 60% chance). However, if it were to flinch 16 times, this would only be a 5% chance. I also think that your opponent's hax score should be subtracted from yours before the calculation is made. It doesn't seem right that you should effectively gain points for being critted less than the opponent.
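The "same or greater number" probabilities above are binomial tail probabilities, and they are easy to check. This is only a sketch of that calculation, assuming each Iron Head flinches independently with a 60% chance; the function name is invented for the example, and nothing here covers the step of subtracting the opponent's hax score.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

FLINCH = 0.6  # Iron Head flinch chance with Serene Grace

print(round(prob_at_least(12, 20, FLINCH), 2))  # 0.6
print(round(prob_at_least(16, 20, FLINCH), 2))  # 0.05
```

So at least 12 flinches in 20 attempts is about a 60% event, while at least 16 is about a 5% event, matching the figures in the post.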

So basically, the fishing for hax/game length/cosmic power jirachi concerns are not really justified if we use a vaguely intelligent system for calculating hax.
 
Chou Toshio said: "What about creating a rating system where rank-drops are reduced by hax? (crits against you prevent you from losing rank points when you lose) The winner would still win-- he'd still EARN the same # of points regardless"
The fact that there are still prominent members of this community who talk about "ladder points" as if these are something we dish out arbitrarily makes me (and probably Zarel, though I hesitate to speak for him) very sad.

[Edit: Thank you, Rhys DeAnno and undisputed]

Here's a summary of how our rating system works, posted again since not all of you have access to the original post:

Okay, I think I finally understand all the maths.

The issue here is confusion between a player's performance and a player's rating. This post will basically work out to be a brief overview of chess rating systems. For a not-so-brief history, I highly recommend this reference (PDF warning).

So the theoretical underpinning behind both Elo and Glicko is a model of "pairwise-comparison" called Bradley-Terry (or Bradley-Terry-Luce). The model simplifies all games of skill down to the following scenario: You have two players, each of whom has a box containing slips of paper, and on each slip of paper is a number. Each player shuffles their box and pulls out a random slip of paper. The player with the higher number wins. The idea behind this model is that there's a range of performance levels at which each player can play on a given day, and the player with better performance on that given day will be the victor.

So what's in these players' boxes? The distribution of skill levels at which each player can play (the numbers in each player's box) follows the extreme value distribution.

In the Bradley-Terry model, these distributions are identical, except for their center (it's unclear if by center they mean the parameter "a," the mean or the median, but as you'll see in a minute, we don't care). Consequently, you can do a bit of fancy math, and you'll find that the difference in performance between the two players follows a logistic distribution.

Okay, so the Bradley-Terry model assumes that players' "performance distributions" are identical, except for their "centers." The centers of these distributions are given by the players' Elo ratings. And Elo, it turns out, is just Glicko if RD=0.

You see, Glicko was designed as an improvement on Elo to take into account the fact that we don't really know a player's rating (which, as I just said, is the center of the distribution of performance levels at which a player can play). So Glicko added the parameter RD which tells us that we have high confidence that a player's Elo rating (which I earlier called "true rating") falls within the range (R-n*RD,R+n*RD) where n=1 for 68% confidence, n=2 for 95% confidence and n=3 for 99.7% confidence. These "confidence" levels correspond to the premise that the probability distribution of a player's Elo (aka "true") rating is normal.
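To make the (R-n*RD, R+n*RD) range concrete, here is a small sketch that builds the interval and checks the quoted confidence levels against a normal distribution. The function names and the 1500/350 inputs are just example choices, not values taken from any particular ladder.

```python
from statistics import NormalDist

def rating_interval(r, rd, n=2):
    """Range (R - n*RD, R + n*RD) expected to contain the true rating."""
    return (r - n * rd, r + n * rd)

def coverage(n):
    """Probability that a normal variable lies within n standard deviations."""
    return 2 * NormalDist().cdf(n) - 1

print(rating_interval(1500, 350))                  # (800, 2200)
print([round(coverage(n), 3) for n in (1, 2, 3)])  # [0.683, 0.954, 0.997]
```

The three coverage values line up with the 68%, 95% and 99.7% figures given for n=1, 2 and 3.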

Now, for Cathy, how does GXE fit in? GXE is the probability of winning a match against a random player on the ladder (or a player with rating 1500±350). This comes from Glickman's formula for the chance of a player with rating R1±RD1 defeating a second player with rating R2±RD2, which, in the case where RD1=RD2=0, simplifies to the "Elo victory formula," which is based on the players' performances following extreme value distributions and thus having a performance difference distribution given by a logistic function.
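For anyone who wants to see those formulas side by side, here is a sketch based on Glickman's published expected-outcome formula. The function names are made up, and treating GXE as "win probability against 1500±350" is taken from the description above; the exact formula the ladder uses may differ in its details.

```python
from math import log, pi, sqrt

Q = log(10) / 400  # Glicko scaling constant

def g(rd):
    """Glicko attenuation factor; equals 1 when RD = 0."""
    return 1 / sqrt(1 + 3 * Q**2 * rd**2 / pi**2)

def win_probability(r1, rd1, r2, rd2):
    """Glickman's expected outcome for player 1 against player 2."""
    combined_rd = sqrt(rd1**2 + rd2**2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r1 - r2) / 400))

def elo_win_probability(r1, r2):
    """The Elo victory formula: the RD1 = RD2 = 0 special case."""
    return win_probability(r1, 0, r2, 0)

def gxe(r, rd):
    """Sketch of GXE: chance to beat a player rated 1500 +/- 350."""
    return win_probability(r, rd, 1500, 350)

print(round(elo_win_probability(1700, 1500), 3))  # 0.76
print(gxe(1500, 100))                             # 0.5
```

Note how a larger RD pulls the prediction toward 50%: the less certain the ratings are, the less the rating gap is trusted.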

Phew.


And with that, I hope I've tied everything together with a nice, neat, and mathematically dense bow.


Also,

TrueSkill is Glicko for multiplayer games.
I figure I should also throw in this bit about ACRE, since that's what most of you are referring to when you talk about "ladder points:"

CREs [(conservative rating estimates)] basically say, "I don't know for certain how good a player is, but I'm pretty sure he or she is better than this." Specifically, there's about a 92% chance that your true rating is better than your ACRE.
The bottom line is that unless you want to throw out all established modern rating systems, there is no way to allow players to gain "ladder points" while preventing others from losing them.
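A small sketch of why that is, using a plain Elo update for simplicity (the ladder itself is described above as Glicko-based, so this is only an analogy). The K-factor, the function names, and the `loser_reduction` parameter are all hypothetical.

```python
K = 32  # hypothetical K-factor, chosen only for illustration

def expected(r_a, r_b):
    """Elo expected score of A against B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(winner, loser, loser_reduction=0.0):
    """Return new (winner, loser) ratings; loser_reduction scales down the loss."""
    gain = K * (1 - expected(winner, loser))
    return winner + gain, loser - gain * (1 - loser_reduction)

w, l = update(1500, 1500)
print(w + l)  # 3000.0

w, l = update(1500, 1500, loser_reduction=0.5)
print(w + l)  # 3008.0
```

In the ordinary update the two ratings still sum to 3000, but once the loser's drop is reduced, rating points appear from nowhere and the pool inflates.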

Now, you could certainly simply invalidate the results of an entire game due to "bs hax," but please consider for a second that our rating systems take "hax" into account, in that it's built right into the theoretical underpinnings that, in a matchup between two players of unequal skill, the stronger player will not always win.

I'm closing this thread. If someone feels that they can reconcile this idea with the realities of how rating systems actually work, they can either reopen this thread or ask me (or another smod) to reopen it for them.
 