March Stats

Disjunction · Apr 17, 2017

Hi, I hope this doesn't come across as aggressive, impatient, or otherwise, but I was wondering if there was any final decision on our plan here, either from Antar or SS in general?

Speaking from the perspective of an NU community member, a lot of us are coming to terms with the fact that we aren't getting the alpha ladder we were promised this month (based on March stats, sample stats, Feb stats, viability tiering or otherwise), but it would be really nice to hear we are, in some capacity, prepared for the beta launch in May. As far as anyone outside the group working on it can tell, we are in the same spot we have been in since we were caught at the beginning of the month. The idea to sample stats has been promising, but Antar still hasn't successfully run a sample and we don't have any backup plan, just like this month. Waiting this long is having a very noticeable impact on users as well, at least in the NU room. Tonight, I've noticed that the questions asked in our PS room have shifted from "when is NU coming out" to people believing that there was some sort of decision that killed off the tier.

I'm just concerned, as I know many users are, and some kind of official decision would go a long way to keeping users on board. We don't know when NU is coming out, when to plan for anything (NU had a year-long tournament schedule that will have to be reworked from scratch if our release is delayed a month), or what we can expect.

Regardless, I appreciate all of the work that everyone has put into helping find a solution so far. I know Antar and everyone else involved are trying their hardest to make this work, which is why I'm confident this will end up as just another bump in the history of our usage-based tiering development for Gen 7.

august · Apr 17, 2017

i don't mean to interject where i'm not welcome, but it's generally a terrible idea to compute symmetric wald confidence intervals (aka the ones that were created itt) when you have values of p sitting near the boundary, because your confidence intervals may suffer from empirical undercoverage. if you want a better estimator for small values of p you should use a bayesian estimate with a beta prior (probably beta(1/2, 1/2) if you wanna perform well towards the boundary) and compute an acceptable sample size based on the bayesian credible intervals rather than the wald CIs

guess it might not matter due to the clt but the credible intervals will almost surely perform better than the CIs for inferential purposes anyway
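To make the comparison concrete, here is a minimal sketch of the two interval types for a single sample. The counts (2 successes in 200 trials) are hypothetical, and the credible interval is approximated by Monte Carlo draws from the Beta(x + 1/2, n - x + 1/2) posterior rather than computed exactly, so treat it as an illustration and not as the code from the pastebin.

```python
import math
import random
from statistics import NormalDist


def wald_interval(x, n, level=0.95):
    """Symmetric Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def jeffreys_interval(x, n, level=0.95, draws=100_000, seed=0):
    """Equal-tailed credible interval under a Beta(1/2, 1/2) prior.

    The posterior is Beta(x + 1/2, n - x + 1/2); its quantiles are
    approximated here by sorting Monte Carlo draws.
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(x + 0.5, n - x + 0.5) for _ in range(draws))
    tail = (1 - level) / 2
    lo = samples[int(tail * draws)]
    hi = samples[int((1 - tail) * draws) - 1]
    return lo, hi


# Hypothetical data: 2 successes in 200 trials (p_hat = 0.01, near the boundary).
wald_lo, wald_hi = wald_interval(2, 200)
jeff_lo, jeff_hi = jeffreys_interval(2, 200)
print(f"Wald:     ({wald_lo:.4f}, {wald_hi:.4f})")
print(f"Jeffreys: ({jeff_lo:.4f}, {jeff_hi:.4f})")
```

With counts this close to zero the Wald lower bound drops below 0, which is not a valid proportion, while the credible interval stays inside (0, 1).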

anyway the sample sizes that were computed on page 1 were definitely too low (edit: not really i just misread the question), so i just ran a quick sim study in order to get an estimate on the sample size required in order to make sure that the length of the intervals is below the cut off, which i'll throw in a pastebin in case anyone is interested (it was written in R)

https://pastebin.com/n8XV0CPX

probably won't be easy to follow if you aren't familiar with the bayesian paradigm, but you'll have to trust the method for now if that's true

anyway i used the median of the N values rather than the means (just depends what you'd rather optimize, L2 loss or L1 loss, and in this case i was more interested in L1 loss) and got that in order to be safe:

90% confidence: N = 355,000
95% confidence: N = 504,000
99% confidence: N = 870,000

s.t. the length of our confidence intervals is less than .001. note this corresponds to p_hat +/- .0005.
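For comparison with the simulated figures above, the normal-approximation version of the same calculation can be sketched as below. This is not the simulation from the pastebin. The proportion `P_ASSUMED = 0.0341` is an assumption, not a value stated in the thread; it was picked because it roughly reproduces the 874,000 figure quoted later for the normal approximation at 99%.

```python
import math
from statistics import NormalDist


def normal_approx_sample_size(p, length, level):
    """Smallest N for which the Wald interval's total length is at most `length`.

    Solves 2 * z * sqrt(p * (1 - p) / N) <= length for N.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = length / 2
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# ASSUMPTION: p = 0.0341 is an illustrative proportion, not a value from the thread.
P_ASSUMED = 0.0341

for level in (0.90, 0.95, 0.99):
    n_required = normal_approx_sample_size(P_ASSUMED, length=0.001, level=level)
    print(f"{level:.0%} confidence: N = {n_required:,}")
```

Changing `length` is the quickest way to see how sensitive N is to the cut off, since N grows with the inverse square of the interval length.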

for the target from the first page that we were talking about (p_hat +/- .001), i get N = 87,250, 123,950, and 214,800 respectively, which is a bit less conservative

code can be adjusted fairly easily in order to compute N values if you wanna change the cut off at all. i threw in an example of using the normal approximation to the binomial as well, but simulation studies at N = 100,000 still showed empirical undercoverage near the boundaries (https://pastebin.com/eKb9XS9q for those who are interested). that being said, the normal approximation gave similar values at least for the 90% confidence case
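As a rough illustration of what a coverage simulation looks like, the sketch below repeatedly draws binomial samples and counts how often the Wald interval contains the true proportion. The settings (p = 0.02, n = 100, 2000 replications) are hypothetical and far smaller than the N = 100,000 runs mentioned above, chosen only so the loop finishes quickly in plain Python.

```python
import math
import random
from statistics import NormalDist


def wald_covers(x, n, p_true, z):
    """True if the Wald interval built from x successes in n trials contains p_true."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width <= p_true <= p_hat + half_width


def empirical_coverage(p_true, n, level=0.95, reps=2000, seed=1):
    """Fraction of simulated samples whose Wald interval contains p_true."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p_true for _ in range(n))  # one Binomial(n, p_true) draw
        hits += wald_covers(x, n, p_true, z)
    return hits / reps


# Hypothetical settings, far smaller than the thread's N so the loop runs quickly.
coverage = empirical_coverage(p_true=0.02, n=100)
print(f"nominal 95%, empirical coverage: {coverage:.3f}")
```

Empirical undercoverage means the printed fraction falls below the nominal level. In this small setting a large part of the shortfall comes from samples with zero successes, where the Wald interval collapses to the single point 0.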

yea hope this helped maybe, sorry for ranting. if you're confused about what empirical undercoverage is shoot me a pm and ill send u a paper i wrote on this exact topic. also its late for me and i misread some stuff and ended up giving a lot of unnecessary calculations

also, i saw people tossin around potentially being ok with the cutoff being .005 (ie: p_hat +/- .0025), in which case the simulation yields the medians 13,700, 19,550, and 33,900 respectively, which is definitely MORE than enough in order to make good inferences about the data. a sample of 1/10th of the TOTAL population that is taken randomly in order to mitigate bias is p much a statistician's dream. i fully support using the sub populations

tl;dr: basing tiering decisions off of 10% of the total population chosen at random is a perfectly legitimate way to do things and there should be no problems. also credible intervals are better when we're dealing with deciding whether to drop something (p relatively close to the boundary) and they require even smaller sample sizes
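A small sketch of the 10% random sample idea, using an entirely made-up population: 200,000 battles with a 4% usage rate for some Pokemon. Neither number comes from the thread; the point is only to show how closely a simple random tenth tracks the full-population figure.

```python
import random

rng = random.Random(42)

# Hypothetical population: 200,000 battles, each flagged True if a given
# Pokemon appeared in it. The 4% usage rate is made up for illustration.
population = [rng.random() < 0.04 for _ in range(200_000)]
true_usage = sum(population) / len(population)

# Simple random sample of one tenth of the population, without replacement.
sample = rng.sample(population, k=len(population) // 10)
sample_usage = sum(sample) / len(sample)

print(f"full population: {true_usage:.4f}")
print(f"10% sample:      {sample_usage:.4f}")
```

Because the sample is drawn uniformly at random, the estimate is unbiased and its error shrinks with the square root of the sample size.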

Bughouse · Apr 18, 2017

I'm not an expert on Bayesian stuff whatsoever, and I know this is your thing august, but how were you reading the previous posts to get to a target of 870k matches for 99% CI? That's 870k of the ~2 million games (extrapolating from 20k amounting to 1% ie battles ending 00) played per month. There's just no way that's the minimum amount necessary under any circumstance. You never need to pull 45% of the population to get that kind of precision. I'll trust your math though... Bayesian shit always looks wrong to me.

In any event, even on the lower numbers of matches in the 2-300k range, that's still obviously not practical in the short term, since Antar made it sound like even getting 20k out wasn't doable.

I think our best option at this point is to put NU into Alpha based on whatever RU stats can actually be obtained (like I'd settle for 5k games on RU ladder in March at this point just to get a vague idea) and to do no tier shifts for any other metas. NU will take a while to settle down whether it has an extra month of RU stats built into its tiering or not.

august · Apr 18, 2017

Bughouse said:
I'm not an expert on Bayesian stuff whatsoever, and I know this is your thing august, but how were you reading the previous posts to get to a target of 870k matches for 99% CI? That's 870k of the ~2 million games (extrapolating from 20k amounting to 1% ie battles ending 00) played per month. There's just no way that's the minimum amount necessary under any circumstance. You never need to pull 45% of the population to get that kind of precision. I'll trust your math though... Bayesian shit always looks wrong to me.

In any event, even on the lower numbers of matches in the 2-300k range, that's still obviously not practical in the short term, since Antar made it sound like even getting 20k out wasn't doable.

I think our best option at this point is to put NU into Alpha based on whatever RU stats can actually be obtained (like I'd settle for 5k games on RU ladder in March at this point just to get a vague idea) and to do no tier shifts for any other metas. NU will take a while to settle down whether it has an extra month of RU stats built into its tiering or not.

my b, i'll explain. i misread a post on the first page and thought that people wanted the LENGTH of the intervals to be less than .001, but in reality it seems people wanted p_hat +/- .001, which corresponds to an interval length of .002. so the estimates that include 870k should realistically be about a quarter of what i posted (which they are, see edited post).
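The "about a quarter" claim can be checked against the numbers posted earlier in the thread. Interval length scales like 1 / sqrt(N), so doubling the allowed length should cut the required N by a factor of roughly 4. A quick sketch using only the posted figures (the variable names are just labels for this example):

```python
# Sample sizes posted earlier in the thread for 90%, 95% and 99% confidence.
n_length_001 = (355_000, 504_000, 870_000)   # total interval length below .001
n_length_002 = (87_250, 123_950, 214_800)    # p_hat +/- .001, i.e. length .002

# Interval length scales like 1 / sqrt(N), so doubling the allowed length
# should divide the required N by about 2 ** 2 = 4.
for wide, narrow in zip(n_length_002, n_length_001):
    print(f"{narrow:>7,} / {wide:>7,} = {narrow / wide:.2f}")
```

The ratios are not exactly 4 because both sets of figures are medians from a simulation rather than closed-form values.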

also

Bughouse said:
That's 870k of the ~2 million games (extrapolating from 20k amounting to 1% ie battles ending 00) played per month. There's just no way that's the minimum amount necessary under any circumstance. You never need to pull 45% of the population to get that kind of precision.

the normal approximation to the binomial requires 874,000 minimum to pull that kind of precision though so

if you guys wanna make inferences about the newer tiers on lower sample sizes (like 5k) then that's an even better reason to use the bayesian estimates. also i could see why people aren't trusting of bayesian methods - there are people in my department who have devoted their lives to statistics and still don't bother with bayesian methods (mostly quality control guys) but they are certainly safer near the boundaries

quziel · May 3, 2017

Do you know if it would be possible to make the NU alpha based on 10 days or so of usage from May now that the server's set up? It's not perfect, and there would be a fair bit of uncertainty in the usage, but having 15-20 days to play an Alpha would really help to develop the future of the tier.

Apologies if this isn't the right thread to post in.

Antar · May 4, 2017

If the new server is up, then presumably no one's putting a load on the old server, so I can just compile the last 2 months of stats. But I was waiting on Zarel to let me know when that happened...

Edit: Yep, compiling now. I'll keep everyone posted as I make progress.

Sugarbear · May 12, 2017

I don't want to come across as aggressive or insulting because that's not my intention at all, but do you have a rough idea of when it will be ready? Or is it one of those "it'll be done when it's done" things?

toshimelonhead · May 13, 2017

How much would it help to cut back on the Random Battle data? It's by far the most popular tier with ~40% of all battles, yet I don't see the utility of it because no one actually builds random teams.

Zarel · May 13, 2017

toshimelonhead said:
How much would it help to cut back on the Random Battle data? It's by far the most popular tier with ~40% of all battles, yet I don't see the utility of it because no one actually builds random teams.

It's a bit too late for this suggestion; the impression I got from Antar is that the server issues are now fixed.

erisia · May 14, 2017

Zarel said:
It's a bit too late for this suggestion; the impression I got from Antar is that the server issues are now fixed.

I think they meant in general for future cases? Unless anyone actually uses the random battle stats for anything other than checking it works properly, I don't really see a downside to removing these stats and potentially speeding up future stat processing by a day or two.

Antar · May 14, 2017

Update:

March is 80% done. I'm already skipping any format with "random" in the name. It's slow because the issues with the old server aren't 100% resolved. There might be some hard disk failing going on, it could be the fault of Zarel backing up the chat logs at the same time, or it could be that the server's hard drive is 99% full. In retrospect, it probably would have been more useful for folks if I'd started with April... ¯\_(ツ)_/¯
~~Still working out access to the new server with Zarel, but assuming we sort that out in the next two weeks,~~ Just got access to the new server. I doubt there are going to be any problems getting May stats out in a timely fashion.
