Usage Stats: What's the Minimum Viable Feature Set?

First off, if you use my usage stats, especially the detailed reports or the JSON files, and you have an opinion but don't have permission to post in this forum, PM me or post your thoughts to the GitHub issue.

I'm in the process of completely rewriting the scripts I use to generate usage stats. The system that we have now works, but just barely--it's horribly written, slow as molasses, about as sturdy as a piece of tissue paper and an absolute migraine to maintain.

I just finished a "proof of concept" release of the new system, which is able to generate the basic usage rankings for all non-mod (read: Gen VI) tiers. It's slow--not significantly faster than the old system--but it's decently resilient, and most of the configuration files get scraped straight from Pokemon Showdown, so maintenance is a piece of cake.

My original plan was not to switch over to the new system until I'd finished building out the full feature set and done significant work optimizing and performance-tuning. But more and more, I feel that keeping the old system up-to-date and running is a significant waste of my extremely limited time and energy.

So what I'm asking is this: what is the absolute minimum set of analyses that people can live with when it comes to the usage stats? Do people actually use the moveset stats or the stalliness analyses? Do people find checks & counters or teammate stats actually useful, to the point that they'd be seriously bummed if they went away for a while?

I'd also like input on performance benchmarks. That is, if it takes 30+ days to process a month's worth of logs, that's obviously a nonstarter, but what about 1 week? The current system can process a full month of logs and generate reports in about 4-5 days (reports end up delayed further when I have to fix bugs and rerun). Do I need to make sure I hit that target?
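To put rough numbers on that trade-off, here's a back-of-the-envelope sketch in Python. It assumes a 30-day month, and the function names are made up for illustration; 4.5 days is just the midpoint of the current 4-5 day range.

```python
def utilization(processing_days: float, period_days: float = 30.0) -> float:
    """Fraction of each period the machine spends on that period's logs."""
    return processing_days / period_days


def keeps_up(processing_days: float, period_days: float = 30.0) -> bool:
    """True if one period's logs are done before the next period's are complete."""
    return processing_days < period_days


# Compare the current run time, a one-week run, and the 30-day worst case.
for days in (4.5, 7, 30):
    print(f"{days:>4} days: {utilization(days):.0%} busy, keeps up: {keeps_up(days)}")
```

A one-week run still leaves plenty of slack for a bug-fix rerun; at 30 days there is none, which is why that case is a nonstarter.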

Zarel, I'd also like your input on resource consumption. The current system runs mostly single-threaded (and log-reading, which takes a while, doesn't use significant CPU), and a month's worth of processed logs takes up about 14GB of disk space. Under Onix v0.1, a month of processed data is a ~60GB SQLite database file, and the processing is probably a bit more CPU- and disk-intensive. I'm also about to throw in multiprocessing, so if this is a no-go for deployment on the server, let me know.
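For a sense of what the multiprocessing step might look like, here's a minimal sketch. It is not the actual Onix code: the log format (a JSON object with `p1team` and `p2team` lists of species names) and all of the function names are assumptions made for illustration.

```python
import json
from collections import Counter
from multiprocessing import Pool


def count_species(raw_log: str) -> Counter:
    """Count each species appearing in one battle log (hypothetical format)."""
    log = json.loads(raw_log)
    return Counter(log["p1team"] + log["p2team"])


def merge_counts(partials) -> Counter:
    """Sum per-log counters into a single usage table."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def usage_counts(raw_logs, workers: int = 1) -> Counter:
    """Count usage serially (workers <= 1) or across a pool of worker processes."""
    if workers <= 1:
        return merge_counts(map(count_species, raw_logs))
    with Pool(processes=workers) as pool:
        # chunksize batches logs so each worker gets more than one per dispatch.
        return merge_counts(pool.map(count_species, raw_logs, chunksize=64))
```

Something like `usage_counts(logs, workers=4)` would be called from under an `if __name__ == "__main__":` guard. Each worker is a separate process with its own memory, so CPU and RAM use on the server would scale roughly with `workers`.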
 

scpinion

Life > Monotype... unfortunately :)
is a Site Content Manager Alumnus, is a Community Leader Alumnus, is a Community Contributor Alumnus, is a Tiering Contributor Alumnus, is a Battle Simulator Moderator Alumnus
I routinely (a couple times a week?) find myself looking at the moveset stats (moves, items, natures, EVs). I'd notice if they weren't there, and I'm assuming I'd miss them.

I wouldn't miss stalliness, checks and counters, or teammates.

Side note that is particularly important to me and the community I lead (Monotype), but probably won't matter to anyone else: if possible, will you keep letting us know how often particular team types (rain, sun, monowater, monosteel, etc.) are used? We regularly use that information for our metagame.
 

Zarel

Not a Yuyuko fan
is a Site Content Manager, is a Battle Simulator Administrator, is a Programmer, is a Pokemon Researcher, is an Administrator
Creator of PS
I really have no clue, but log size probably needs to be fixed in some way...
 
Many people do look at checks and counters. It's been on my bot on IRC/PS ever since it was a thing. Teammates are much less used, in my experience.

Outside of Monotype I haven't seen team types utilized.
 
Moveset stats offer some pretty crucial information. Knowing that Mesprit is the most used Pokemon in NU is easy. Knowing which of its 40 sets is most-used is a different beast. You don't even necessarily get that information from movesets, but at least you get a picture/an idea based on the moves and items it uses most. It's also the most helpful of the non-standard stats for someone looking to get into a new metagame. If you know what's out there and what those things use, you can figure out the rest (checks and counters, especially).
 
How vital are usage stats for mod tiers? Obviously Gen VII is a mod right now, so I need to support that, but what about Gen I? Or Mix & Mega? Could those be uncounted for a few months?
 

Honko

he of many honks
is a Site Content Manager Alumnus, is a Programmer Alumnus, is a Top Contributor Alumnus
It would be kind of a bummer to lose those for the non-permanent ladders like OMotM and RoA Spotlight in particular. I looked forward to seeing the DPP NU stats after last month, for example, since it was the biggest sample of matches that tier had seen (and will see, most likely) in years. I would have been disappointed if they hadn't been counted. I imagine the same would be true for OMotM, especially if it's a meta that hasn't been featured before. It's not as important to me as the moveset stats, though.
 
Raseri, it's not super clear to me yet which mods will be "easy" to incorporate and which will be hard. I know Gen I-II is currently a problem because the IV/EV system is different. And I haven't yet figured out a way to completely auto-generate accessible forme rules (the stuff that says, e.g., that Gardevoir can Mega Evolve if it's holding a Gardevoirite), so M&M is going to be hard. But Gen III-VII should involve very little effort.
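To make the "accessible forme" idea concrete, a hand-written rule table could look something like the sketch below. The table layout and the `accessible_formes` name are illustrative assumptions, not how Onix or Pokemon Showdown actually store this; the hard part described above is generating such a table automatically instead of typing it out.

```python
from typing import List

# Hypothetical rule table: (base species, held item) -> forme reachable in battle.
MEGA_RULES = {
    ("Gardevoir", "Gardevoirite"): "Gardevoir-Mega",
}


def accessible_formes(species: str, item: str) -> List[str]:
    """Return every forme this species/item combination can take in battle."""
    formes = [species]
    mega = MEGA_RULES.get((species, item))
    if mega is not None:
        formes.append(mega)
    return formes


print(accessible_formes("Gardevoir", "Gardevoirite"))
print(accessible_formes("Gardevoir", "Leftovers"))
```

With the stone, usage would have to be credited to both the base forme and the Mega; with any other item, only the base forme counts.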

Again, the focus here is to prioritize my task list. Everything's going to get done eventually, it's just a matter of what gets done first.
 
Have you dropped stats for random formats? Personally, I'd like you to keep only moveset stats and only for "Random Battle". Everything else can be dropped imo.
 

Anty

let's drop
is a Site Content Manager Alumnus, is a Team Rater Alumnus, is a Community Leader Alumnus, is a Community Contributor Alumnus, is a Tiering Contributor Alumnus, is a Top Contributor Alumnus, is a Smogon Media Contributor Alumnus
Would dropping the 1500-level stats make a difference? The 1630 stats are used for tier shifts and 1760 is for high ladder, but 1500 doesn't serve much purpose (none that I can think of), and dropping it would cut 1/4 of the stats.
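The "1/4" figure only works if there are four rating cutoffs in total. Here's a quick sketch of that arithmetic; 1500, 1630 and 1760 are the cutoffs named above, while the fourth (an unfiltered 0 baseline) is an assumption, as is the function name.

```python
# 1500, 1630 and 1760 come from the post above; the 0 (unfiltered) baseline
# is an assumption made so that there are four cutoffs in total.
CUTOFFS = [0, 1500, 1630, 1760]


def fraction_saved(dropped, cutoffs=CUTOFFS):
    """Fraction of per-cutoff reports avoided by dropping the given cutoffs."""
    kept = [c for c in cutoffs if c not in dropped]
    return 1 - len(kept) / len(cutoffs)


print(fraction_saved({1500}))
```

This only counts reports. Whether the run time shrinks by the same fraction depends on how much of the work (log reading, for instance) is shared across cutoffs.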
 
