
March Stats

Discussion in 'Policy Review' started by Antar, Apr 9, 2017.

  1. Disjunction

    Disjunction It's So Good Now
    is a Forum Moderator, is a Community Contributor, is a Tiering Contributor, is a Contributor Alumnus

    Mar 22, 2014
    Hi, I hope this doesn't come across as aggressive, impatient, or otherwise, but I was wondering if there was any final decision on our plan here, either from Antar or SS in general?

    Speaking from the perspective of an NU community member, a lot of us are coming to terms with the fact that we aren't getting the alpha ladder we were promised this month (whether based on March stats, sampled stats, February stats, viability tiering, or otherwise), but it would be really nice to hear that we are, in some capacity, prepared for the beta launch in May. As far as anyone outside the people working on it can observe, we are in the same spot we have been in since we were caught out at the beginning of the month. The idea of sampling stats is promising, but Antar still hasn't successfully run a sample, and we don't have any backup plan, just like this month. Waiting this long is having a very noticeable impact on users as well, at least in the NU room. Tonight, I've noticed the questions asked in our PS room have shifted from "when is NU coming out" to people believing that there was some sort of decision that killed off the tier.

    I'm just concerned, as I know many users are, and some kind of official decision would go a long way to keeping users on board. We don't know when NU is coming out, when to plan for anything (NU had a year-long tournament schedule that will have to be reworked from scratch if our release is delayed a month), or what we can expect.

    Regardless, I appreciate all of the work that everyone has put into helping find a solution so far. I know Antar and everyone else involved are trying their hardest to make this work which is why I'm confident this will end up as another bump in the history of our usage-based tiering development for Gen 7.
  2. august

    august the worst season
    is a Tutor Alumnus, is a Team Rater Alumnus, is a Forum Moderator Alumnus, is a Live Chat Contributor Alumnus, is a Smogon Media Contributor Alumnus, won the 5th Official Smogon Tournament, is a defending World Cup of Pokemon champion

    Nov 25, 2007
    i don't mean to interject where i'm not welcome, but it's generally a terrible idea to compute symmetric wald confidence intervals (aka the ones that were computed itt) when you have estimates of p sitting near the boundary, because your confidence intervals may suffer from empirical undercoverage. if you want a better estimator for small values of p you should use a bayesian estimate with a beta prior (probably Beta(1/2, 1/2), the jeffreys prior, if you wanna perform well towards the boundary) and compute an acceptable sample size based on the bayesian credible intervals rather than the wald CIs

    guess it might not matter due to the clt but the credible intervals will almost surely perform better than the CIs for inferential purposes anyway
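    a minimal python sketch of the boundary problem (the original code was in R and isn't reproduced here, so treat this as my own illustration, not the pastebin script): with a small observed proportion, the symmetric wald interval can dip below zero, while the jeffreys Beta(x + 1/2, n - x + 1/2) equal-tailed credible interval stays inside (0, 1). the beta quantiles are approximated by monte carlo with `random.betavariate`, so the endpoints are approximate.

    ```python
    import math
    import random

    random.seed(42)

    def wald_ci(x, n, z=1.96):
        # symmetric wald interval: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)
        p_hat = x / n
        half = z * math.sqrt(p_hat * (1 - p_hat) / n)
        return p_hat - half, p_hat + half

    def jeffreys_ci(x, n, level=0.95, draws=200_000):
        # equal-tailed credible interval from the Beta(x + 1/2, n - x + 1/2)
        # posterior (jeffreys prior); quantiles approximated by monte carlo
        samples = sorted(random.betavariate(x + 0.5, n - x + 0.5)
                         for _ in range(draws))
        lo_q = (1 - level) / 2
        return samples[int(lo_q * draws)], samples[int((1 - lo_q) * draws)]

    # a mon seen 2 times in 1000 battles, i.e. p_hat = 0.002
    wald_lo, wald_hi = wald_ci(2, 1000)
    jeff_lo, jeff_hi = jeffreys_ci(2, 1000)
    print(f"wald:     ({wald_lo:.5f}, {wald_hi:.5f})")  # lower endpoint goes negative
    print(f"jeffreys: ({jeff_lo:.5f}, {jeff_hi:.5f})")  # stays strictly inside (0, 1)
    ```

    the negative wald endpoint isn't just cosmetic: an interval that spills outside the parameter space is a symptom of the same normal approximation that causes the undercoverage.
    
    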

    anyway the sample sizes that were computed on page 1 were definitely too low (edit: not really, i just misread the question), so i ran a quick sim study to estimate the sample size required to make sure the length of the intervals stays below the cutoff, which i'll throw in a pastebin in case anyone is interested (it was written in R)


    probably won't be easy to follow if you aren't familiar with the bayesian paradigm, but you'll have to trust the method for now if that's the case

    anyway i used the median of the N values rather than the means (it just depends whether you'd rather optimize L2 loss or L1 loss, and in this case i was more interested in L1 loss) and got that, in order to be safe:

    90% confidence: N = 355,000
    95% confidence: N = 504,000
    99% confidence: N = 870,000

    s.t. the length of our confidence intervals is less than .001. note this corresponds to p_hat +/- .0005.

    for estimating the sample sizes from the first page that we were actually talking about, i get (87250, 123950, and 214800), which is a bit less conservative

    the code can be adjusted fairly easily to compute N values if you wanna change the cutoff at all. i threw in an example of using the normal approximation to the binomial as well, but simulation studies at N = 100,000 still showed empirical undercoverage near the boundaries (https://pastebin.com/eKb9XS9q for those who are interested). that being said, the normal approximation gave similar values, at least for the 90% confidence case
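    to make "empirical undercoverage" concrete, here's a small python coverage simulation (my own illustration, with a much smaller n than the pastebin runs so it finishes quickly — the phenomenon is the same): with the true p near the boundary, the nominal 95% wald interval contains the truth well under 95% of the time.

    ```python
    import math
    import random

    random.seed(0)

    def wald_covers(x, n, p, z=1.96):
        # does the nominal 95% wald interval around x/n contain the true p?
        p_hat = x / n
        half = z * math.sqrt(p_hat * (1 - p_hat) / n)
        return p_hat - half <= p <= p_hat + half

    n, p, reps = 1000, 0.005, 2000  # true p near the boundary
    hits = 0
    for _ in range(reps):
        x = sum(random.random() < p for _ in range(n))  # one binomial(n, p) draw
        hits += wald_covers(x, n, p)
    coverage = hits / reps
    print(f"empirical coverage of the nominal 95% wald interval: {coverage:.3f}")
    ```

    in the same setup the jeffreys credible interval sits much closer to the nominal level, which is the sense in which it's "safer near the boundaries".
    
    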

    yea, hope this helped maybe, sorry for ranting. if you're confused about what empirical undercoverage is, shoot me a pm and i'll send you a paper i wrote on this exact topic. also it's late for me and i misread some stuff and ended up giving a lot of unnecessary calculations

    also, i saw people tossing around potentially being ok with the cutoff being .005 (ie: p_hat +/- .0025), in which case the simulation yields the medians 13700, 19550, and 33900 respectively, which is definitely MORE than enough to make good inferences about the data. a sample of 1/10th of the TOTAL population that is taken randomly to mitigate bias is pretty much a statistician's dream. i fully support using the sub-populations

    tl;dr: basing tiering decisions off of 10% of the total population chosen at random is a perfectly legitimate way to do things and there should be no problems. also, credible intervals are better when we're deciding whether to drop something (p relatively close to the boundary) and they require even smaller sample sizes
    Last edited: Apr 18, 2017 at 1:59 AM
    Josh, slurmz, ToF and 25 others like this.
  3. Bughouse

    Bughouse Like ships in the night, you're passing me by
    is a member of the Site Staff, is a Forum Moderator Alumnus, is a CAP Contributor Alumnus, is a Tiering Contributor Alumnus, is a Contributor Alumnus

    May 28, 2010
    I'm not an expert on Bayesian stuff whatsoever, and I know this is your thing august, but how were you reading the previous posts to get to a target of 870k matches for a 99% CI? That's 870k of the ~2 million games played per month (extrapolating from the 20k that amounted to 1%, i.e. battles with numbers ending in 00). There's just no way that's the minimum amount necessary under any circumstance. You never need to pull 45% of the population to get that kind of precision. I'll trust your math though... Bayesian shit always looks wrong to me.

    In any event, even at the lower numbers of matches in the 200-300k range, that's still obviously not practical in the short term, since Antar made it sound like even getting 20k out wasn't doable.

    I think our best option at this point is to put NU into Alpha based on whatever RU stats can actually be obtained (like I'd settle for 5k games on RU ladder in March at this point just to get a vague idea) and to do no tier shifts for any other metas. NU will take a while to settle down whether it has an extra month of RU stats built into its tiering or not.
  4. august

    august the worst season
    is a Tutor Alumnus, is a Team Rater Alumnus, is a Forum Moderator Alumnus, is a Live Chat Contributor Alumnus, is a Smogon Media Contributor Alumnus, won the 5th Official Smogon Tournament, is a defending World Cup of Pokemon champion

    Nov 25, 2007
    my b, i'll explain. i misread a post on the first page and thought that people wanted the LENGTH of the intervals to be less than .001, but in reality it seems people wanted p_hat +/- .001, which corresponds to an interval length of .002. so the estimates that include 870k should realistically be about a quarter of what i posted (which they are, see the edited post).

    the normal approximation to the binomial requires 874,000 minimum to pull that kind of precision though so

    if you guys wanna make inferences about the newer tiers with lower sample sizes (like 5k) then that's an even better reason to use the bayesian estimates. also, i can see why people aren't trusting of bayesian methods - there are people in my department who have devoted their lives to statistics and still don't bother with bayesian methods (mostly quality control guys), but it is certainly safer near the boundaries
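    a concrete illustration of why the bayesian estimate helps at small samples (my own made-up numbers, not from the sim): if a mon shows up 0 times in a 5k sample, the wald interval degenerates to the single point {0}, while the jeffreys posterior Beta(1/2, n + 1/2) still gives a sensible upper bound on its usage. quantile approximated by monte carlo again:

    ```python
    import random

    random.seed(1)

    def jeffreys_upper(x, n, level=0.975, draws=200_000):
        # upper credible bound from the Beta(x + 1/2, n - x + 1/2) posterior,
        # with the quantile approximated by monte carlo sampling
        samples = sorted(random.betavariate(x + 0.5, n - x + 0.5)
                         for _ in range(draws))
        return samples[int(level * draws)]

    # 0 appearances in a 5,000-battle sample: wald says exactly 0, while
    # jeffreys says "usage is probably below ~0.05%", the more honest answer
    print(f"jeffreys 97.5% upper bound: {jeffreys_upper(0, 5000):.6f}")
    ```
    
    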
    Josh, A, ToF and 6 others like this.
