March Stats

#26
Hi, I hope this doesn't come across as aggressive, impatient, or otherwise, but I was wondering if there was any final decision on our plan here, either from Antar or SS in general?

Speaking from the perspective of an NU community member, a lot of us are coming to terms with the fact that we aren't getting the alpha ladder we were promised this month (whether based on March stats, sample stats, February stats, viability tiering, or anything else), but it would be really nice to hear that we are, in some capacity, prepared for the beta launch in May. As far as anyone outside the people working on it can tell, we are in the same spot we have been in since we were caught off guard at the beginning of the month. The idea to sample stats has been promising, but Antar still hasn't successfully run a sample, and we don't have any backup plan, just like this month. Waiting this long is having a very noticeable impact on users as well, at least in the NU room. Tonight, I've noticed the questions asked in our PS room shift from "when is NU coming out" to people believing that some sort of decision had killed off the tier.

I'm just concerned, as I know many users are, and some kind of official decision would go a long way to keeping users on board. We don't know when NU is coming out, when to plan for anything (NU had a year-long tournament schedule that will have to be reworked from scratch if our release is delayed a month), or what we can expect.

Regardless, I appreciate all of the work that everyone has put into helping find a solution so far. I know Antar and everyone else involved are trying their hardest to make this work which is why I'm confident this will end up as another bump in the history of our usage-based tiering development for Gen 7.
 

august

youre a voice that never sings
is a Tutor Alumnus
is a Team Rater Alumnus
is a Forum Moderator Alumnus
is a Live Chat Contributor Alumnus
is a Tiering Contributor Alumnus
is a Smogon Media Contributor Alumnus
won the 5th Official Smogon Tournament
is a defending World Cup of Pokemon Champion
#27
i don't mean to interject where i'm not welcome, but it's generally a terrible idea to compute symmetric wald confidence intervals (aka the ones that were created itt) when the proportions you're estimating sit near the boundary (close to 0 or 1), because your confidence intervals may suffer from empirical undercoverage. if you want a better estimator for small proportions you should use a bayesian estimate with a beta prior (probably beta(1/2, 1/2), i.e. the jeffreys prior, if you wanna perform well towards the boundary) and compute an acceptable sample size based on the bayesian credible intervals rather than the wald CIs

guess it might not matter much due to the clt, but the credible intervals will almost surely perform better than the CIs for inferential purposes anyway
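To make the Wald-vs-credible-interval point concrete, here is a minimal Python sketch (not august's R code, which lives in the pastebin). It treats a usage rate as a plain binomial proportion, x appearances out of n games, which is a simplification. The Jeffreys interval takes quantiles of the Beta(x + 1/2, n - x + 1/2) posterior; to stay in the standard library it approximates those quantiles by Monte Carlo with `random.betavariate` rather than an exact beta inverse CDF. The function names and defaults are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def wald_interval(x, n, conf=0.95):
    """Symmetric Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def jeffreys_interval(x, n, conf=0.95, draws=100_000, seed=0):
    """Equal-tailed credible interval under a Beta(1/2, 1/2) (Jeffreys) prior.

    The posterior is Beta(x + 1/2, n - x + 1/2). Its quantiles are
    approximated by sorting Monte Carlo draws from random.betavariate.
    """
    rng = random.Random(seed)
    a, b = x + 0.5, n - x + 0.5
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int((1 - conf) / 2 * draws)]
    hi = samples[int((1 + conf) / 2 * draws) - 1]
    return lo, hi


# Zero appearances in 200 games: Wald degenerates, the credible interval does not.
print(wald_interval(0, 200))
print(jeffreys_interval(0, 200))
```

With zero appearances in 200 games, the Wald interval collapses to the single point 0, while the credible interval still has a sensible upper end a little over 1%.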

anyway the sample sizes that were computed on page 1 were definitely too low (edit: not really, i just misread the question), so i ran a quick sim study to estimate the sample size required to make sure that the length of the intervals is below the cutoff. i'll throw it in a pastebin in case anyone is interested (it was written in R)

https://pastebin.com/n8XV0CPX

probably won't be easy to follow if you aren't familiar with the bayesian paradigm, but you'll have to trust the method for now if that's the case

anyway, i used the median of the N values rather than the mean (it just depends on what you'd rather optimize, L2 loss or L1 loss; in this case i was more interested in L1 loss) and got the following, to be safe:

90% confidence: N = 355,000
95% confidence: N = 504,000
99% confidence: N = 870,000

s.t. the length of our intervals is less than .001. note this corresponds to p_hat +/- .0005.
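The median/mean choice comes down to which loss you care about: the median minimizes total absolute error (L1) and the mean minimizes total squared error (L2). A quick check with a made-up list of simulated N values (hypothetical, not output from the actual simulation), where one large draw drags the mean upward:

```python
from statistics import mean, median


def l1_loss(values, c):
    """Total absolute error of the point estimate c."""
    return sum(abs(v - c) for v in values)


def l2_loss(values, c):
    """Total squared error of the point estimate c."""
    return sum((v - c) ** 2 for v in values)


# Hypothetical simulated N values; the single large draw pulls the mean up.
ns = [340_000, 350_000, 355_000, 360_000, 520_000]

print(median(ns), mean(ns))
print(l1_loss(ns, median(ns)) < l1_loss(ns, mean(ns)))  # median wins on L1
print(l2_loss(ns, mean(ns)) < l2_loss(ns, median(ns)))  # mean wins on L2
```

The median barely reacts to the outlying draw, which is why it gives a steadier "safe" N when a few simulation runs come out unusually large.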

for the sample size we were talking about on the first page (p_hat +/- .001), i get 87,250, 123,950, and 214,800, which is a bit less conservative
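For comparison, the normal approximation to the binomial gives a closed-form sample size, n = z^2 p(1 - p) / h^2, where h is the half-width and z the standard normal quantile for the confidence level. The posts don't say which p the numbers were computed at. The sketch below assumes p = 0.0341 (Smogon's usual 3.41% usage cutoff). That is an assumption, though it lands close to the posted figures, for example roughly 874,000 at 99% with h = .0005, the figure quoted later in the thread for the normal approximation.

```python
import math
from statistics import NormalDist

# Assumption: the usage rate of interest sits at Smogon's usual 3.41% cutoff.
P_CUTOFF = 0.0341


def normal_approx_n(p, half_width, conf):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= half_width."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil(z * z * p * (1 - p) / half_width ** 2)


for conf in (0.90, 0.95, 0.99):
    sizes = [normal_approx_n(P_CUTOFF, h, conf) for h in (0.0005, 0.001, 0.0025)]
    print(f"{conf:.0%}: {sizes}")
```

The three half-widths correspond to the interval lengths discussed in the thread (.001, .002, and .005).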

the code can be adjusted fairly easily to compute N values if you wanna change the cutoff at all. i threw in an example using the normal approximation to the binomial as well, but simulation studies at N = 100,000 still showed empirical undercoverage near the boundaries (https://pastebin.com/eKb9XS9q for those who are interested). that being said, the normal approximation gave similar values, at least for the 90% confidence case
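Empirical undercoverage can be checked exactly for a modest n by summing binomial probabilities over every outcome whose Wald interval actually contains the true p. This Python sketch (function names are illustrative, and it is not the simulation from the pastebin) does that in log space so larger n doesn't overflow:

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def wald_coverage(n, p, conf=0.95):
    """Exact probability that the Wald interval contains the true p."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    covered = 0.0
    for x in range(n + 1):
        p_hat = x / n
        half = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half <= p <= p_hat + half:
            covered += binom_pmf(x, n, p)
    return covered


print(wald_coverage(50, 0.02))  # near the boundary
print(wald_coverage(50, 0.5))   # middle of the range
```

At n = 50 and p = 0.02, the nominal 95% Wald interval covers the true value only about 64% of the time, mostly because zero appearances (probability about 0.36) yields the degenerate interval [0, 0]. At p = 0.5 the same n gives coverage above 90%.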

yea hope this helped maybe, sorry for ranting. if you're confused about what empirical undercoverage is (i.e. a nominal 95% interval that actually contains the true value less than 95% of the time), shoot me a pm and i'll send u a paper i wrote on this exact topic. also it's late for me and i misread some stuff and ended up giving a lot of unnecessary calculations

also, i saw people tossing around potentially being ok with an interval length of .005 (ie: p_hat +/- .0025), in which case the simulation yields the medians 13,700, 19,550, and 33,900 respectively, which is definitely MORE than enough to make good inferences about the data. a sample of 1/10th of the TOTAL population that is taken randomly in order to mitigate bias is pretty much a statistician's dream. i fully support using the sub populations

tl;dr: basing tiering decisions off of 10% of the total population chosen at random is a perfectly legitimate way to do things and there should be no problems. also, credible intervals are better when we're deciding whether to drop something (p relatively close to the boundary), and they require even smaller sample sizes
 

Bughouse

Like ships in the night, you're passing me by
is a member of the Site Staff
is a Forum Moderator Alumnus
is a CAP Contributor Alumnus
is a Tiering Contributor Alumnus
is a Contributor Alumnus
#28
I'm not an expert on Bayesian stuff whatsoever, and I know this is your thing august, but how were you reading the previous posts to get to a target of 870k matches for a 99% CI? That's 870k of the ~2 million games (extrapolating from 20k amounting to 1%, i.e. battles whose numbers end in 00) played per month. There's just no way that's the minimum amount necessary under any circumstance. You never need to pull 45% of the population to get that kind of precision. I'll trust your math though... Bayesian shit always looks wrong to me.
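The back-of-the-envelope arithmetic in that paragraph, written out (numbers taken from the post; the variable names are just for illustration):

```python
sample_games = 20_000               # games in the ~1% sample (battles ending in 00)
monthly_games = sample_games * 100  # scale a 1% sample up to the full month
share = 870_000 / monthly_games     # fraction of the month's games that 870k represents

print(monthly_games, f"{share:.1%}")
```

That works out to about 43.5% of roughly 2 million games, in the same ballpark as the 45% quoted.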

In any event, even at the lower numbers of matches in the 200-300k range, that's still obviously not practical in the short term, since Antar made it sound like even getting 20k out wasn't doable.

I think our best option at this point is to put NU into Alpha based on whatever RU stats can actually be obtained (like I'd settle for 5k games on RU ladder in March at this point just to get a vague idea) and to do no tier shifts for any other metas. NU will take a while to settle down whether it has an extra month of RU stats built into its tiering or not.
 

august

#29
my b, i'll explain. i misread a post on the first page and thought that people wanted the LENGTH of the intervals to be less than .001, but in reality it seems people wanted p_hat +/- .001, which corresponds to an interval length of .002. so the estimates that include 870k should realistically be about a quarter of what i posted (which they are, see edited post).
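The factor of four follows from how the required size scales with the half-width h. Under the normal approximation (the Bayesian simulation follows this pattern only approximately), with z the standard normal quantile for the confidence level:

```latex
n(h) \approx \frac{z^{2}\,\hat{p}\,(1-\hat{p})}{h^{2}}
\quad\Longrightarrow\quad
\frac{n(0.001)}{n(0.0005)} = \left(\frac{0.0005}{0.001}\right)^{2} = \frac{1}{4},
\qquad
\frac{870{,}000}{4} = 217{,}500
```

217,500 is close to the 214,800 the simulation gave for the 99% case.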

also, re:
"That's 870k of the ~2 million games (extrapolating from 20k amounting to 1% ie battles ending 00) played per month. There's just no way that's the minimum amount necessary under any circumstance. You never need to pull 45% of the population to get that kind of precision."
the normal approximation to the binomial requires a minimum of 874,000 to get that kind of precision, though, so it's not just a bayesian thing

if you guys wanna make inferences about the newer tiers on lower sample sizes (like 5k), then that's an even better reason to use the bayesian estimates. also, i can see why people aren't trusting of bayesian methods: there are people in my department who have devoted their lives to statistics and still don't bother with bayesian methods (mostly quality control guys), but it is certainly safer near the boundaries
 

quziel

I simulate Pottery
is a Pre-Contributor
#30
Do you know if it would be possible to make the NU alpha based on 10 days or so of usage from May now that the server's set up? It's not perfect, and there would be a fair bit of uncertainty in the usage, but having 15-20 days to play an Alpha would really help to develop the future of the tier.

Apologies if this isn't the right thread to post in.
 

erisia

i'M gIvInG iT aLl I'vE gOt
is a Site Staff Alumnus
is a Forum Moderator Alumnus
is a Community Contributor Alumnus
is a Top Contributor Alumnus
is a Smogon Media Contributor Alumnus
#35
"It's a bit too late for this suggestion; the impression I got from Antar is that the server issues are now fixed."
I think they meant in general, for future cases? Unless anyone actually uses the random battle stats for anything other than checking that they work properly, I don't really see a downside to removing these stats and potentially speeding up future stat processing by a day or two.
 

Antar

is a Battle Server Administrator
is a Programmer
is a Super Moderator
is a Community Contributor
Official Data Miner
#36
Update:

  • March is 80% done. I'm already skipping any format with "random" in the name. It's slow because the issues with the old server aren't 100% resolved: a hard disk might be failing, Zarel backing up the chat logs at the same time could be slowing things down, or it could be that the server's hard drive is 99% full. In retrospect, it probably would have been more useful for folks if I'd started with April... ¯\_(ツ)_/¯
  • Just got access to the new server (earlier I was still working out access with Zarel), so I doubt there are going to be any problems getting May stats out in a timely fashion.
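Skipping any format with "random" in the name amounts to a simple name filter. A minimal Python sketch; the format IDs below are hypothetical examples, and how the real stats scripts enumerate formats isn't shown in the thread:

```python
# Hypothetical format IDs, for illustration only.
formats = ["gen7ou", "gen7randombattle", "gen7nu", "gen7randomdoublesbattle", "gen7ru"]


def should_process(format_id: str) -> bool:
    """Skip any format with 'random' in its name (case-insensitive)."""
    return "random" not in format_id.lower()


to_process = [f for f in formats if should_process(f)]
print(to_process)
```

Filtering on the name before any log parsing is what lets the random formats cost nothing in processing time.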
 
