Programming p^2's thread statistics (feat: xPL signup sheets)

P Squared

a great unrecorded history
Site Content Manager, Community Contributor, Top Contributor, Top Social Media Contributor Alumnus, Senior Staff Member Alumnus, Top Smogon Media Contributor Alumnus
Hello friends. Here's some fun stuff I can do for forum threads. Unfortunately, learning how to make these into do-it-yourself applications is not a priority for me, but feel free to request stuff in this thread and I can probably do it fairly quickly for you.

Scraping is done with Python (the Scrapy library in particular) and everything else in R, because I like R. I guess you can check out my GitHub, but it is probably embarrassing and bad. In general I'm just a kid learning how to do interesting things, so if you have any suggestions for improvements, let me know.

Premier League signup sheets
Turns signup threads into nice spreadsheets for the PS auction bot. It is fairly dependent on people actually following the signup format, which about 40% of people don't, but even if they don't, I still get their username, post number, link, and timestamp. I wrote this one back in December, when I found out that the old method required the user to go through every page of a signup thread (that was shocking) and then just did a simple search for tier names. I think it's good practice to design search patterns so that, for example, a signup containing a word with "ou" in it (like "hey I just found this thread!") doesn't get counted as an OU signup and, more commonly, a signup that says "everything except DPP" doesn't count as a DPP signup. I got to play around with negative lookaround regular expressions for this, which was fun. Also, no shade at the previous signup-scraping people; they still did great work automating a very tedious process.
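A minimal Python sketch of that kind of matching, not the actual scraper: the tier list is hypothetical, and the sketch only shows how word boundaries (\b) stop "found" from matching "ou" and how a negative lookbehind skips a tier that comes right after "except ".

```python
import re

# Hypothetical tier list; a real sheet would use whatever tiers the
# signup thread offers.
TIERS = ["OU", "UU", "RU", "NU", "PU", "Ubers", "LC",
         "RBY", "GSC", "ADV", "DPP", "BW"]


def tiers_signed_up(post):
    """Return the set of tiers a signup post appears to opt into."""
    found = set()
    for tier in TIERS:
        # \b...\b only matches the tier as a whole word, so "found"
        # does not count as an "ou" signup.
        # (?<!except ) is a negative lookbehind: skip the tier when it is
        # immediately preceded by "except ", as in "everything except DPP".
        pattern = rf"(?<!except )\b{re.escape(tier)}\b"
        if re.search(pattern, post, flags=re.IGNORECASE):
            found.add(tier)
    return found
```

It only catches "except" directly before a tier, so "everything except BW and DPP" would still count DPP, and it doesn't expand "everything" into a full tier list; a real sheet needs more rules than this.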

Some recent examples:
http://spo.ink/oupl4signups
http://spo.ink/dpl4signups

This can also be done for non-Premier League threads, of course. If you just want a list of all the posters in a thread with their post numbers, links, timestamps, etc., that works too.
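For the poster-list case, here is a rough sketch using only Python's standard library (html.parser and csv) rather than Scrapy. The markup it expects (an article element with data-author and data-content attributes, plus a time element with a datetime attribute) is an assumption loosely modelled on XenForo 2 pages, so check the real page source before relying on it; it also leaves permalinks out.

```python
import csv
import io
from html.parser import HTMLParser


class PostListParser(HTMLParser):
    """Collect one row per post from thread pages.

    Assumes hypothetical markup loosely modelled on a XenForo 2 thread:
      <article data-author="NAME" data-content="post-123"> ...
        <time datetime="2018-04-10T12:00:00"> ... </article>
    """

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article" and "data-author" in attrs:
            self.rows.append({
                # Post numbers keep counting across pages as long as every
                # page is fed to the same parser in thread order.
                "post_number": len(self.rows) + 1,
                "username": attrs["data-author"],
                "post_id": (attrs.get("data-content") or "").removeprefix("post-"),
                "timestamp": None,
            })
        elif tag == "time" and self.rows and self.rows[-1]["timestamp"] is None:
            # The first <time> seen after a post starts is its timestamp.
            self.rows[-1]["timestamp"] = attrs.get("datetime")


def posts_to_csv(pages):
    """Parse a list of thread-page HTML strings and return CSV text."""
    parser = PostListParser()
    for html in pages:
        parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["post_number", "username", "post_id", "timestamp"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()
```

The CSV text can be pasted or imported straight into a Google spreadsheet.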

Graph posts over time
Can be a nice quick way to see how active a thread is, and compare it to other threads. I coded this back in January when I noticed how active the OST 14 signups thread was and wanted to see if it was actually different from previous years.
[Image: graph of posts over time]
If you're using this for signups, one caveat is that it only counts posts, because it's basically impossible to tell signups apart from non-signup posts in the same thread. I could probably do something about double posters, though.
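A small Python sketch of the counting side, assuming each post has already been scraped into a username and an ISO 8601 timestamp string (an assumed format). It counts posts per calendar day and shows one possible way to handle double posters: treat back-to-back posts by the same user as a single post. Plotting is left out.

```python
from collections import Counter
from datetime import datetime


def posts_per_day(timestamps):
    """Count posts per calendar day.

    `timestamps` are ISO 8601 strings such as "2018-01-10T09:00:00"
    (an assumed format). Returns [(date, count), ...] sorted by date.
    """
    days = Counter(datetime.fromisoformat(ts).date() for ts in timestamps)
    return sorted(days.items())


def collapse_double_posts(posts):
    """Keep only the first of any run of consecutive posts by the same user.

    `posts` is a list of (username, timestamp) pairs in thread order.
    """
    kept = []
    for user, ts in posts:
        if kept and kept[-1][0] == user:
            continue  # same user posted twice in a row: treat as one post
        kept.append((user, ts))
    return kept
```

Chaining them gives a double-post-adjusted count: posts_per_day([ts for _, ts in collapse_double_posts(posts)]).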

Most liked posts in a thread
More spreadsheets, yay. This was one of the first things I did after I learned how to scrape Smogon, so the sheets are outdated and not pretty. I did this for the SPL 7 Commencement thread and later for the old QDB when it was moved to a public forum (spiders can't access private subforums like Firebot and Inside Scoop without some modifications that admins would probably frown upon). The links in the QDB sheet are all broken now because firemods deleted a bunch of posts in that thread.
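Once likes are scraped, ranking the posts is the easy part. A minimal Python sketch, assuming each post is a dict with hypothetical keys post_number, username, and likes:

```python
def most_liked(posts, n=10):
    """Return the n most-liked posts.

    `posts` is a list of dicts with (hypothetical) keys "post_number",
    "username", and "likes". Ties go to the earlier post.
    """
    return sorted(posts, key=lambda p: (-p["likes"], p["post_number"]))[:n]
```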

I haven't rewritten the likes-scraping part of my code since we moved to XenForo 2, but I imagine it won't take too long. Also note that I used to be able to scrape each poster's like count and join date, but since those are no longer immediately visible on XenForo 2 (you have to hover to see more info on a user in a thread), I can't do that anymore without learning how to use Selenium or something. Maybe one day...




I won't do the following things for you by request, but feel free to adapt them for your own use.

GP check formatter
Clicking colors is definitely the worst and most time-wasting part of GP checking. I started using bold / italic / underline while GPing instead of colors, since those three have keyboard shortcuts (later, when we moved to XenForo 2, I replaced italics with strikethrough), and then find-and-replaced [b] with [b][color=blue] and so on. Then I learned how to write R scripts that run from the command line, so I automated the process: now I just type (for example) Rscript formatCheck.R hajime 2 into Git Bash to format a GP check and add my Hajime stamp and a GP 2/2 to the top. There are also optional arguments for colors (Rscript formatCheck.R hajime 2 dodgerblue tomato mediumorchid); the defaults are deepskyblue, red, and limegreen. This is also how I get around XenForo 2's disappointing default color picker colors. :)
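A rough Python sketch of the same find-and-replace idea, for anyone who would rather not use R. It is not a port of the actual R script: the pairing of bold, strikethrough, and underline with the first, second, and third colors, the "GP n/2" header layout, and treating the stamp as plain text on the first line are all assumptions.

```python
import re

DEFAULT_COLORS = ("deepskyblue", "red", "limegreen")
# Assumed pairing: bold, strikethrough, underline get the 1st, 2nd, 3rd color.
TAGS = ("b", "s", "u")


def format_check(text, stamp, gp_num, colors=DEFAULT_COLORS, gp_total=2):
    """Color bold/strikethrough/underline BBCode and prepend a GP header.

    `stamp` is placed verbatim on the first line (a hypothetical stand-in
    for a real stamp image or signature).
    """
    for tag, color in zip(TAGS, colors):
        # [b] -> [b][color=X] and [/b] -> [/color][/b]; matching the literal
        # closing bracket keeps [s] from matching [size=...] and [u] from [url].
        text = re.sub(rf"\[{tag}\]", f"[{tag}][color={color}]", text,
                      flags=re.IGNORECASE)
        text = re.sub(rf"\[/{tag}\]", f"[/color][/{tag}]", text,
                      flags=re.IGNORECASE)
    return f"{stamp}\nGP {gp_num}/{gp_total}\n\n{text}"


def main(argv, check_text):
    """Mimic `formatCheck.R hajime 2 [c1 c2 c3]`: STAMP GP_NUM [3 colors]."""
    stamp, gp_num, *custom = argv
    colors = tuple(custom) if len(custom) == 3 else DEFAULT_COLORS
    return format_check(check_text, stamp, gp_num, colors)
```

To run it from Git Bash like the R version, a small wrapper could call print(main(sys.argv[1:], sys.stdin.read())) after importing sys, with the check piped in on standard input.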

Anyway, this speeds up GPing immensely, and I definitely recommend it to GPers who are confident enough not to need to see the colors while they check. The R code is up on my GitHub; just replace my stamps, colors, and file paths with your own.

Markov chain text generation (post simulator)
Initially I had the ambitious goal of writing my own Markov chain text generator, but after weeks of laziness I just scraped posts and fed them into a website that did the work for me. Still kinda fun, though. Here's my old Smogon simulator thread about it if you want to see examples or learn more about Markov chains.
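For anyone curious what that kind of website does, a minimal word-level (order 1) Markov chain generator looks roughly like this in Python. It is a generic sketch, not the tool that was actually used: each word maps to every word that followed it in the scraped posts, so picking uniformly from that list samples the next word in proportion to how often it appeared.

```python
import random
from collections import defaultdict


def build_chain(posts):
    """Map each word to the list of words that follow it across all posts."""
    chain = defaultdict(list)
    for post in posts:
        words = post.split()
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
    return chain


def generate(chain, start, max_words=20, seed=None):
    """Walk the chain from `start` until a dead end or `max_words` words."""
    rng = random.Random(seed)
    word = start
    out = [word]
    # chain.get() avoids inserting empty entries into the defaultdict.
    while len(out) < max_words and chain.get(word):
        word = rng.choice(chain[word])  # duplicates make frequent words likelier
        out.append(word)
    return " ".join(out)
```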
 

Tournament growth has interested me lately, so here are some graphs of Smogon Classic signups. There are definitely prettier ways to show the data, but this is what I have for now.

[Image: graph of Smogon Classic signups]
I don't know much about... anything, but especially old gens, so it was surprising to see RBY always gets more signups than GSC. Also that DPP sometimes gets more than BW! Also, as is the case with OST, we're getting a lot more signups this year. I wonder if it's a one-time thing and we'll go back down next year or if this will continue.

[Image: Smogon Classic signups split by generation]
The same info, split by generation instead, if that's more interesting to you. Newer gens get more signups than older gens, as expected. RBY, GSC, and DPP follow the same pattern: Classic I had the fewest signups, followed by II, then III, and this year has the most. But BW Cup III had ~50 fewer signups than BW Cup II and actually needed a few extra hours just to reach BW Cup I's number of signups. Weird.
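Splitting the same numbers by generation is just a regrouping of the data. A hypothetical Python sketch, assuming each data point is a (generation, edition, signups) tuple with editions numbered 1, 2, 3, and so on; edition_changes gives the edition-to-edition differences, which is where a drop like BW Cup III's shows up as a negative number.

```python
from collections import defaultdict


def split_by_generation(records):
    """Regroup (generation, edition, signups) records by generation.

    Returns {generation: [(edition, signups), ...]} with editions in order,
    e.g. edition 1 for Classic I. The record layout is an assumption.
    """
    grouped = defaultdict(list)
    for gen, edition, signups in records:
        grouped[gen].append((edition, signups))
    return {gen: sorted(rows) for gen, rows in grouped.items()}


def edition_changes(series):
    """Signup change from each edition to the next: [(edition, delta), ...]."""
    return [(cur[0], cur[1] - prev[1]) for prev, cur in zip(series, series[1:])]
```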
 

Update update.
Quoting my first post: "Also note that I used to be able to scrape each poster's like count and join date, but since those are no longer immediately visible on Xenforo 2 (you have to hover to see more info on a user in a thread), I can't do that anymore without learning how to use Selenium or something. Maybe one day..."
I ended up learning Selenium. It sucks.

Classic
Last time I ran the plots before RBY and GSC signups closed, so here's an update.
[Images: updated Smogon Classic signup graphs]

Top usage changes
Eisenherz made a nice usage graphic for April with the biggest increases/decreases in usage, so I wrote a quick script to calculate those. It doesn't take into account whether a Pokemon actually left the tier (like Emboar in NU), but dealing with that sounds like a pain. If anyone wants to see more, feel free to ask.
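A minimal Python sketch of that kind of calculation, assuming each month's stats have already been loaded into a dict of Pokemon name to usage percentage (the loading step is left out). It knows nothing about tier changes either; it simply compares only the Pokemon present in both months' data.

```python
def usage_changes(prev, curr, n=5):
    """Biggest month-over-month usage increases and decreases.

    `prev` and `curr` map Pokemon names to usage percentages. Only Pokemon
    present in both dicts are compared; anything missing from either month
    is skipped. Ties are broken alphabetically.
    """
    deltas = {mon: curr[mon] - prev[mon] for mon in prev.keys() & curr.keys()}
    increases = sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    decreases = sorted(deltas.items(), key=lambda kv: (kv[1], kv[0]))[:n]
    return increases, decreases
```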
[Image: biggest usage increases and decreases for April]

Lemonade suggested I use the biggest rank increase/decrease instead of usage for the last column, which is a good idea. I'll implement that next month.
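Rank change can be sketched the same way in Python, again assuming dicts of Pokemon name to usage percentage for two months. Ranks are recomputed from usage here, with ties broken alphabetically (an arbitrary choice), and a positive value means the Pokemon moved up the usage list.

```python
def ranks(usage):
    """Rank 1 = highest usage; ties broken alphabetically."""
    order = sorted(usage, key=lambda mon: (-usage[mon], mon))
    return {mon: i for i, mon in enumerate(order, start=1)}


def rank_changes(prev, curr, n=5):
    """Biggest climbers and fallers (e.g. rank 12 -> 4 is +8)."""
    r_prev, r_curr = ranks(prev), ranks(curr)
    # Only Pokemon ranked in both months are compared.
    deltas = {mon: r_prev[mon] - r_curr[mon]
              for mon in r_prev.keys() & r_curr.keys()}
    climbers = sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    fallers = sorted(deltas.items(), key=lambda kv: (kv[1], kv[0]))[:n]
    return climbers, fallers
```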

Premier League signups
If you need me to make your sheet, please let me know a couple of days in advance. Following this would be ideal:
- PM me with a link to the signup thread a few days before the auction
- Close signups at least 12 hours before the auction
- PM me again after you close the thread

You can also post requests here I guess.

Making the sheet doesn't take 12 hours (dumping the data into a Google spreadsheet only takes about 2 minutes, plus another ~3 minutes to make the spreadsheet look pretty), but if you ask for a sheet an hour before the auction I might not be online, sooo. Just tell me in advance.
 
Seems like it's Premier League season again. I've gotten a couple of messages about signup sheets, so here's a reminder:

Premier League signups
If you need me to make your sheet, please let me know a couple of days in advance. Following this would be ideal:
- PM me with a link to the signup thread a few days before the auction
- Close signups at least 12 hours before the auction
- PM me again after you close the thread

You can also post requests here I guess.

Making the sheet doesn't take 12 hours (dumping the data into a Google spreadsheet only takes about 2 minutes, plus another ~3 minutes to make the spreadsheet look pretty), but if you ask for a sheet an hour before the auction I might not be online, sooo. Just tell me in advance.
A couple of other notes:

- My script is actually broken right now due to the forum update. I still get each post's username and text, but I'll have to fix it to get the permalink, timestamp, and number of likes. Not sure when I'll have time to do that, but either way the current output is still enough for the auction bot.

- People ask this a lot: do posts have to follow a signup format? Again, no matter what, I can get each post's username (which is enough for the auction bot) and text. But if your post doesn't include "Username: [...]" or "Tiers Played: [...]" or whatever the format asks for, my script won't be able to parse your post to populate the metagame columns. So if that matters to you, follow the format.
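A minimal Python sketch of why the format matters, assuming the field names "Username" and "Tiers Played" with one field per line (both assumptions; use whatever the thread's format asks for). The poster's account name always fills its column, while the parsed columns stay empty when the fields are missing.

```python
import re

# Matches lines like "Username: P Squared" or "Tiers Played: OU, DPP".
# The field names are assumptions; use whatever the signup format asks for.
FIELD_RE = re.compile(
    r"^[ \t]*(Username|Tiers Played)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_signup(poster, text):
    """Build one sheet row for a post.

    `poster` is the forum account name, which is always available from the
    post itself; the other columns stay None when the format isn't followed.
    """
    row = {"poster": poster, "username": None, "tiers": None}
    for field, value in FIELD_RE.findall(text):
        if field.lower() == "username":
            row["username"] = value
        else:
            row["tiers"] = [t.strip() for t in value.split(",") if t.strip()]
    return row
```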
 
