
March Stats

Discussion in 'Policy Review' started by Antar, Apr 9, 2017.

  1. Disjunction

    Disjunction It's So Good Now
    is a Forum Moderator, is a Community Contributor, is a Tiering Contributor, is a Contributor Alumnus
    Moderator

    Joined:
    Mar 22, 2014
    Messages:
    1,129
    Hi, I hope this doesn't come across as aggressive, impatient, or otherwise, but I was wondering if there was any final decision on our plan here, either from Antar or SS in general?

    Speaking from the perspective of an NU community member, a lot of us are coming to terms with the fact that we aren't getting the alpha ladder we were promised this month (whether based on March stats, sample stats, February stats, viability tiering, or otherwise), but it would be really nice to hear that we are, in some capacity, prepared for the beta launch in May. As far as anyone not working on it can observe, we are in the same spot we have been in since we were caught off guard at the beginning of the month. The idea of sampling stats has been promising, but Antar still hasn't successfully run a sample, and we don't have any backup plan, just like this month. Waiting this long is having a very noticeable impact on users as well, at least in the NU room. Tonight, I've noticed the questions asked in our PS room shift from "when is NU coming out" to people believing that some sort of decision killed off the tier.

    I'm just concerned, as I know many users are, and some kind of official decision would go a long way to keeping users on board. We don't know when NU is coming out, when to plan for anything (NU had a year-long tournament schedule that will have to be reworked from scratch if our release is delayed a month), or what we can expect.

    Regardless, I appreciate all of the work that everyone has put into helping find a solution so far. I know Antar and everyone else involved are trying their hardest to make this work which is why I'm confident this will end up as another bump in the history of our usage-based tiering development for Gen 7.
  2. august

    august the worst season
    is a Tutor Alumnus, is a Team Rater Alumnus, is a Forum Moderator Alumnus, is a Live Chat Contributor Alumnus, is a Smogon Media Contributor Alumnus, won the 5th Official Smogon Tournament, is a defending World Cup of Pokemon champion

    Joined:
    Nov 25, 2007
    Messages:
    3,189
    i don't mean to interject where i'm not welcome, but it's generally a terrible idea to compute symmetric wald confidence intervals (aka the ones that were created itt) when the true proportions p are sitting near the boundary (i.e. small p close to 0), because your confidence intervals may suffer from empirical undercoverage. if you want a better estimator for small values of p, you should use a bayesian estimate with a beta prior (probably Beta(1/2, 1/2), which is the jeffreys prior, if you wanna perform well towards the boundary) and compute an acceptable sample size based on the bayesian credible intervals rather than the wald CIs
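
    To make "empirical undercoverage" concrete, here is a minimal Python sketch (not the R code from the pastebins) that computes the exact coverage of the symmetric Wald interval by summing binomial probabilities over every possible count k. The sample size n = 100 and the p values are illustrative choices only. Near p = 0 the interval misses the true p far more often than the nominal 95%, mostly because k = 0 produces the degenerate interval [0, 0].

```python
import math
from statistics import NormalDist


def wald_interval(k: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Symmetric Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = k / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability of k, computed in log space (requires 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def wald_coverage(n: int, p: float, conf: float = 0.95) -> float:
    """Exact probability that the Wald interval contains the true p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = wald_interval(k, n, conf)
        if lo <= p <= hi:
            total += binom_pmf(k, n, p)
    return total


for p in (0.5, 0.1, 0.01):
    print(f"n=100, p={p}: coverage {wald_coverage(100, p):.3f} (nominal 0.95)")
```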

    guess it might not matter due to the CLT, but the credible intervals will almost certainly perform better than the CIs for inferential purposes anyway
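
    As a rough illustration of the credible-interval alternative: under a Beta(1/2, 1/2) prior, k successes in n trials give a Beta(k + 1/2, n - k + 1/2) posterior, and an equal-tailed credible interval is just that posterior's quantiles. The sketch below approximates those quantiles with Monte Carlo draws from Python's random.betavariate (a swapped-in shortcut for illustration, not the method in the pastebin; the draw count and seed are arbitrary). With a hypothetical k = 3 out of n = 1,000, the Wald interval dips below 0 while the credible interval stays inside [0, 1].

```python
import math
import random
from statistics import NormalDist


def wald_interval(k, n, conf=0.95):
    """Symmetric Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = k / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def jeffreys_interval_mc(k, n, conf=0.95, draws=20_000, seed=0):
    """Equal-tailed credible interval from the Beta(k + 1/2, n - k + 1/2) posterior,
    approximated by sorting Monte Carlo draws and reading off empirical quantiles."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(k + 0.5, n - k + 0.5) for _ in range(draws))
    tail = (1 - conf) / 2
    lo = samples[round(tail * (draws - 1))]
    hi = samples[round((1 - tail) * (draws - 1))]
    return lo, hi


k, n = 3, 1_000  # hypothetical: a Pokemon seen on 3 of 1,000 teams
print("wald:    ", wald_interval(k, n))
print("jeffreys:", jeffreys_interval_mc(k, n))
```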

    anyway, the sample sizes that were computed on page 1 were definitely too low (edit: not really, i just misread the question), so i ran a quick sim study to estimate the sample size required to make sure that the length of the intervals is below the cutoff. i'll throw it in a pastebin in case anyone is interested (it was written in R):

    https://pastebin.com/n8XV0CPX

    probably won't be easy to follow if you aren't familiar with the bayesian paradigm, but you'll have to trust the method for now if that's true
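
    For readers who don't use R, here is a rough Python sketch of the same question: what is the smallest N whose Jeffreys (Beta(1/2, 1/2) prior) credible interval is shorter than a cutoff? Two simplifications are assumptions made here, not what the pastebin does: instead of simulating many datasets and taking the median N, it evaluates the interval at the "typical" count k = round(N * p), and it takes p = 0.0341 as the usage rate of interest. The Beta quantiles come from the standard continued-fraction evaluation of the regularized incomplete beta function plus bisection, so only the standard library is needed.

```python
import math


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny, eps = 1e-300, 1e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 100_000):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd step
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if front == 0.0:  # far out in a tail: the CDF is 0 or 1 to double precision
        return 0.0 if x < a / (a + b) else 1.0
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_ppf(q: float, a: float, b: float) -> float:
    """Beta(a, b) quantile, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if betainc(a, b, mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jeffreys_length(n: int, p: float, conf: float) -> float:
    """Length of the equal-tailed Jeffreys interval, evaluated at the typical count."""
    k = round(n * p)              # plug-in count instead of simulated counts (assumption)
    a, b = k + 0.5, n - k + 0.5   # Beta(1/2, 1/2) prior updated with k of n
    tail = (1.0 - conf) / 2.0
    return beta_ppf(1.0 - tail, a, b) - beta_ppf(tail, a, b)


def min_sample_size(p: float, conf: float, max_length: float,
                    n_max: int = 1_000_000) -> int:
    """Smallest n <= n_max whose interval length is <= max_length (bisection on n)."""
    lo, hi = 1, n_max
    while lo < hi:
        mid = (lo + hi) // 2
        if jeffreys_length(mid, p, conf) <= max_length:
            hi = mid
        else:
            lo = mid + 1
    return lo


# p = 0.0341 is an assumed usage rate; .002 is the length of p_hat +/- .001
print(min_sample_size(0.0341, 0.90, 0.002))
```

    Under these assumptions the 90% / p_hat +/- .001 case should come out around 89,000, in the same neighborhood as the 87,250 from the simulation. The bisection over N assumes the interval length shrinks as N grows, which holds up to the rounding of k.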

    anyway, i used the median of the N values rather than the mean (it just depends on what you'd rather optimize, L2 loss or L1 loss, and in this case i was more interested in L1 loss) and got that, in order to be safe:

    90% confidence: N = 355,000
    95% confidence: N = 504,000
    99% confidence: N = 870,000

    s.t. the length of our confidence intervals is less than .001. note this corresponds to p_hat +/- .0005.
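
    The mean-vs-median choice above is the usual loss-function fact: the median minimizes total absolute error (L1) and the mean minimizes total squared error (L2). A small check with made-up N values (one large outlier added on purpose) shows the difference; the outlier drags the mean much further than the median.

```python
import statistics


def l1_loss(c, xs):
    """Total absolute error of the point estimate c."""
    return sum(abs(x - c) for x in xs)


def l2_loss(c, xs):
    """Total squared error of the point estimate c."""
    return sum((x - c) ** 2 for x in xs)


# made-up N values from a hypothetical simulation, with one outlier
ns = [340_000, 350_000, 355_000, 360_000, 900_000]
med = statistics.median(ns)
mean = statistics.mean(ns)
grid = range(300_000, 1_000_001, 1_000)

print("median:", med, "mean:", mean)
print("best L1 on grid:", min(grid, key=lambda c: l1_loss(c, ns)))
print("best L2 on grid:", min(grid, key=lambda c: l2_loss(c, ns)))
```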

    for the target from the first page that we were talking about (p_hat +/- .001, i.e. an interval length of .002), i get 87,250, 123,950, and 214,800 for 90%, 95%, and 99% respectively, which is a bit less conservative

    the code can be adjusted fairly easily to compute N values if you wanna change the cutoff at all. i threw in an example using the normal approximation to the binomial as well, but simulation studies at N = 100,000 still showed empirical undercoverage near the boundaries (https://pastebin.com/eKb9XS9q for those who are interested). that being said, the normal approximation gave similar values, at least for the 90% confidence case

    yeah, hope this helped maybe, sorry for ranting. if you're confused about what empirical undercoverage is, shoot me a pm and i'll send you a paper i wrote on this exact topic. also, it's late for me and i misread some stuff and ended up giving a lot of unnecessary calculations

    also, i saw people tossing around potentially being ok with the cutoff being .005 (i.e. p_hat +/- .0025), in which case the simulation yields medians of 13,700, 19,550, and 33,900 respectively, which is definitely MORE than enough to make good inferences about the data. a sample of 1/10th of the TOTAL population that is taken randomly in order to mitigate bias is pretty much a statistician's dream. i fully support using the subpopulations

    tl;dr: basing tiering decisions off of 10% of the total population chosen at random is a perfectly legitimate way to do things, and there should be no problems. also, credible intervals are better when we're deciding whether to drop something (p relatively close to the boundary), and they require even smaller sample sizes
    Last edited: Apr 18, 2017
  3. Bughouse

    Bughouse Like ships in the night, you're passing me by
    is a member of the Site Staff, is a Forum Moderator Alumnus, is a CAP Contributor Alumnus, is a Tiering Contributor Alumnus, is a Contributor Alumnus

    Joined:
    May 28, 2010
    Messages:
    5,412
    I'm not an expert on Bayesian stuff whatsoever, and I know this is your thing august, but how were you reading the previous posts to get to a target of 870k matches for a 99% CI? That's 870k of the ~2 million games played per month (extrapolating from 20k being 1% of battles, i.e. the battles ending in 00). There's just no way that's the minimum amount necessary under any circumstance. You never need to pull 45% of the population to get that kind of precision. I'll trust your math though... Bayesian shit always looks wrong to me.
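
    The extrapolation in that post is simple arithmetic. Here it is spelled out, with the "battles ending in 00" rule written as a hypothetical in_one_percent_sample function on numeric battle IDs (the real ID format and sampling code aren't shown in the thread):

```python
def in_one_percent_sample(battle_id: int) -> bool:
    """Hypothetical sampling rule: keep battles whose numeric ID ends in 00."""
    return battle_id % 100 == 0


SAMPLED_BATTLES = 20_000                 # size of the 1% sample mentioned above
monthly_total = SAMPLED_BATTLES * 100    # extrapolated battles per month
share_needed = 870_000 / monthly_total   # fraction of all battles an 870k target needs

print(monthly_total, round(share_needed, 3))
```

    870,000 / 2,000,000 works out to 43.5%, which is the "45% of the population" in the post.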

    In any event, even the lower numbers of matches in the 200-300k range are still obviously not practical in the short term, since Antar made it sound like even getting 20k out wasn't doable.

    I think our best option at this point is to put NU into Alpha based on whatever RU stats can actually be obtained (like I'd settle for 5k games on RU ladder in March at this point just to get a vague idea) and to do no tier shifts for any other metas. NU will take a while to settle down whether it has an extra month of RU stats built into its tiering or not.
  4. august

    august the worst season
    is a Tutor Alumnus, is a Team Rater Alumnus, is a Forum Moderator Alumnus, is a Live Chat Contributor Alumnus, is a Smogon Media Contributor Alumnus, won the 5th Official Smogon Tournament, is a defending World Cup of Pokemon champion

    Joined:
    Nov 25, 2007
    Messages:
    3,189
    my b, i'll explain. i misread a post on the first page and thought that people wanted the LENGTH of the intervals to be less than .001, but in reality it seems people wanted p_hat +/- .001, which corresponds to an interval length of .002. since the required sample size scales like 1/(length)^2, doubling the length cuts it by a factor of four, so the estimates that include 870k should realistically be about a quarter of what i posted (which they are, see edited post).

    also, the normal approximation to the binomial requires a minimum of 874,000 to get that kind of precision anyway
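
    The 874,000 figure lines up with the textbook normal-approximation sample-size formula n = z^2 p (1 - p) / E^2 (E = half-width, here .0005 for an interval length of .001, at 99%) if p is taken to be about 0.0341. That p is an assumption, since the post doesn't say which p was used; it is roughly the usage cutoff relevant to tiering drops.

```python
import math
from statistics import NormalDist


def normal_approx_n(p: float, half_width: float, conf: float) -> int:
    """Smallest n with z * sqrt(p * (1 - p) / n) <= half_width (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil(z * z * p * (1 - p) / half_width ** 2)


P_ASSUMED = 0.0341  # assumption: a usage rate sitting near the tiering cutoff
n_length_001 = normal_approx_n(P_ASSUMED, 0.0005, 0.99)  # interval length .001
n_length_002 = normal_approx_n(P_ASSUMED, 0.001, 0.99)   # p_hat +/- .001
print(n_length_001, n_length_002, n_length_001 / n_length_002)
```

    Because n scales with 1/E^2, the p_hat +/- .001 target needs a quarter of the length-.001 figure, matching the correction in the post above. At 90% and 95% the same formula also lands within about 1% of the 355,000 and 504,000 quoted earlier.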

    if you guys wanna make inferences about the newer tiers on lower sample sizes (like 5k), then that's an even better reason to use the bayesian estimates. also, i can see why people aren't trusting of bayesian methods; there are people in my department who have devoted their lives to statistics and still don't bother with bayesian methods (mostly quality control guys), but bayesian estimates are certainly safer near the boundaries
  5. quziel

    quziel

    Joined:
    Nov 23, 2015
    Messages:
    125
    Do you know if it would be possible to base the NU alpha on 10 days or so of usage from May, now that the server's set up? It's not perfect, and there would be a fair bit of uncertainty in the usage, but having 15-20 days to play an Alpha would really help to develop the future of the tier.

    Apologies if this isn't the right thread to post in.
  6. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,855
    If the new server is up, then presumably no one's putting a load on the old server, so I can just compile the last 2 months of stats. But I was waiting on Zarel to let me know when that happened...

    Edit: Yep, compiling now. I'll keep everyone posted as I make progress.
    Last edited: May 4, 2017
  7. ChrystalFalchion

    ChrystalFalchion
    is a Pre-Contributor

    Joined:
    Jul 18, 2015
    Messages:
    1,464
    I don't want to come across as aggressive or insulting because that's not my intention at all, but do you have a rough idea of when it will be ready? Or is it one of those "it'll be done when it's done" things?
  8. toshimelonhead

    toshimelonhead Honey Badger don't care.
    is a Tiering Contributor

    Joined:
    Nov 24, 2009
    Messages:
    1,743
    How much would it help to cut back on the Random Battle data? It's by far the most popular tier with ~40% of all battles, yet I don't see the utility of it because no one actually builds random teams.
  9. Zarel

    Zarel Not a Yuyuko fan
    is a member of the Site Staff, is a Battle Server Administrator, is a Programmer, is a Pokemon Researcher, is an Administrator
    Creator of PS

    Joined:
    Aug 16, 2011
    Messages:
    3,478
    It's a bit too late for this suggestion; the impression I got from Antar is that the server issues are now fixed.
  10. erisia

    erisia this is fine
    is a member of the Site Staff, is a Forum Moderator, is a Community Contributor, is a Contributor to Smogon, is a Smogon Media Contributor
    Moderator

    Joined:
    Nov 29, 2011
    Messages:
    1,866
    I think they meant in general for future cases? Unless anyone actually uses the random battle stats for anything other than checking it works properly, I don't really see a downside to removing these stats and potentially speeding up future stat processing by a day or two.
  11. Antar

    Antar
    is a Battle Server Administrator, is a Programmer, is a Super Moderator, is a Community Contributor
    Official Data Miner

    Joined:
    Feb 17, 2010
    Messages:
    3,855
    Update:

    • March is 80% done. I'm already skipping any format with "random" in the name. It's slow because the issues with the old server aren't 100% resolved. There might be a failing hard disk, it could be the fault of Zarel backing up the chat logs at the same time, or it could be that the server's hard drive is 99% full. In retrospect, it probably would have been more useful for folks if I'd started with April... ¯\_(ツ)_/¯
    • Just got access to the new server. I doubt there are going to be any problems getting May stats out in a timely fashion.
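
    A minimal sketch of the "skip any format with 'random' in the name" rule from the update above; the function name and the format IDs are hypothetical examples, not taken from the actual stats scripts. A case-insensitive substring check like this only catches formats whose IDs literally contain "random".

```python
def keep_format(format_id: str) -> bool:
    """Skip any format whose name contains 'random' (case-insensitive)."""
    return "random" not in format_id.lower()


# hypothetical format IDs for illustration
formats = ["gen7ou", "gen7randombattle", "gen7nu", "gen7randomdoublesbattle", "gen7monotype"]
kept = [f for f in formats if keep_format(f)]
print(kept)
```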
    Last edited: May 14, 2017
