
Semi-automatic creation of a "replay bank" for gen7OU and any other gen

Edit : done, see here

Hi everyone, I recently started coding an idea I've had for some time: a publicly available website that displays most, if not all, of the tournament replays of one specific metagame. It is similar in principle to FullLifeGames' replay scouter and tournament team collector, but has the advantage of not requiring you to specify a tournament or player. Each replay would be tied to several pieces of information so that users can filter and sort for the type of battles they want.
For the moment, I have coded a way to import a single replay or a whole Smogon tournament thread (one that contains the replays), so you would "just" need to give it the links to all the tournament threads to fill the bank.
Since everything is stored in a Google Sheet, the code is written in Google Apps Script (which is apparently basically JavaScript).
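
To make the "import a single replay" step concrete, here is a minimal sketch of how the players, the team-preview teams and the winner could be read from a battle log. It is not the script used for this project: the function name `parseReplayLog` and the sample log are hypothetical, and the line layout (`|player|`, `|poke|`, `|win|`) is assumed from the Pokémon Showdown battle protocol.

```javascript
// Sketch only: extract players, preview teams and winner from a battle log.
// The line formats below are assumed from the Pokemon Showdown protocol.
function parseReplayLog(logText) {
  const result = { players: {}, teams: { p1: [], p2: [] }, winner: null };
  for (const line of logText.split("\n")) {
    const parts = line.split("|"); // parts[0] is empty: lines start with "|"
    switch (parts[1]) {
      case "player":
        // |player|p1|Alice|avatar|rating (the name may be missing on later lines)
        if (parts[3]) result.players[parts[2]] = parts[3];
        break;
      case "poke":
        // |poke|p1|Heatran, M| -> keep only the species before the comma
        if (result.teams[parts[2]] && parts[3]) {
          result.teams[parts[2]].push(parts[3].split(",")[0].trim());
        }
        break;
      case "win":
        // |win|Alice
        result.winner = parts[2];
        break;
    }
  }
  return result;
}

// Hypothetical, shortened log used only for illustration.
const sampleLog = [
  "|player|p1|Alice|1|",
  "|player|p2|Bob|2|",
  "|poke|p1|Heatran, M|",
  "|poke|p1|Pelipper, F|",
  "|poke|p2|Tyranitar, M|",
  "|poke|p2|Excadrill, F|",
  "|win|Alice",
].join("\n");

console.log(parseReplayLog(sampleLog));
```

Because it only reads the team-preview lines, a sketch like this would apply to any tier with team preview, which matches the scope described in this post.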

Here is a screenshot of what it currently looks like when I just give a link to a smogon thread :
1768086161778.png

For now, the search/sort features that I have thought of are:
  • Teams (winning and losing), with the ability to search by Pokémon or combinations of Pokémon
  • Players (winner and loser)
  • Date
  • Tournament (and potentially tournament stage, i.e. final, semifinal, etc.)
  • [Very experimental] Team style (the original idea behind this: being able to search for replays of offence beating defence, for example)
  • Teams that are sample teams
As it stands, the code is immediately applicable to all tiers with team preview.

I will be happy to share the code with anyone who might want such a resource for their tier.

I made a very primitive team style detector which (surprisingly) works for, I'd say, 80% of teams.
Similar to the SM OU sample teams, I chose 4 team styles: offence, hyper offence, weather and balance/defence.
Weather teams are simple in SM, so it's just a check for Pelipper or for Tyranitar + Excadrill.
For the other styles, the style is guessed from team preview: I gave every Pokémon 3 grades that indicate how well it fits each style.
Then for each team, I sum 2^[grade] over its Pokémon for each style, and see which style has the highest total.
For example, I gave Heatran 3 in offence and defence and 1 in hyper offence, which means it adds 8 points to the team's offence and defence totals, and only 2 to its hyper offence total.
So it's quite primitive, but I find that it works rather well.
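
The scoring described above could be sketched roughly as follows. This is an illustration, not the actual script: only the Heatran grades come from this post, while the other grades, the `GRADES` table layout, the function name `guessTeamStyle` and the tie-breaking rule are assumptions.

```javascript
// Grades per Pokemon for each non-weather style.
// Only the Heatran row is taken from the post; the other rows are placeholders.
const GRADES = {
  heatran: { offence: 3, hyperOffence: 1, defence: 3 },
  toxapex: { offence: 0, hyperOffence: 0, defence: 3 },
  chansey: { offence: 0, hyperOffence: 0, defence: 3 },
};

function guessTeamStyle(team) {
  const mons = team.map((mon) => mon.toLowerCase());

  // Weather is a plain membership check: Pelipper, or Tyranitar + Excadrill.
  if (
    mons.includes("pelipper") ||
    (mons.includes("tyranitar") && mons.includes("excadrill"))
  ) {
    return "weather";
  }

  // For every other style, sum 2^grade over the team.
  const totals = { offence: 0, hyperOffence: 0, defence: 0 };
  for (const mon of mons) {
    const grades = GRADES[mon];
    if (!grades) continue; // assumption: ungraded Pokemon contribute nothing
    for (const style of Object.keys(totals)) {
      totals[style] += 2 ** grades[style];
    }
  }

  // Highest total wins; on a tie the first listed style is kept (assumption).
  return Object.keys(totals).reduce((best, style) =>
    totals[style] > totals[best] ? style : best
  );
}

console.log(guessTeamStyle(["Heatran", "Toxapex", "Chansey"]));
console.log(guessTeamStyle(["Pelipper", "Heatran"]));
```

Using 2^grade rather than the grade itself means a single Pokémon with a high grade outweighs several with low grades, which is probably why the exponent helps separate styles on preview alone.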

If you have suggestions or remarks on this concept, do not hesitate to share them, I am still looking for cool ideas to add to this.

Thx for reading! and sorry about my English ;)
 
I did something similar. I have a PostgreSQL database containing:
- a table for all the public Showdown replays, all the Smogon replays and even some of the replays from the replay incidents, indexed by ID, with the log of each replay and the Smogon post thread ID + post number
- a Smogon post table, which gets auto-updated every week with new replays by scraping this endpoint: https://www.smogon.com/forums/search
- a tournament thread table
- a Smogon user table
- a PS user table
- a scouted teams table (two rows per replay)
- a pokepaste/pastebin/raw text table for teams shared on Smogon/PS/the web

I also have a Neo4j database to guess alts (WIP)

and I just do SQL queries that I visualise with an HTML page (client WIP)

The code is quite simple, as I'm using the pkmn/ps packages: https://github.com/pkmn/ps
I do have a lot of tests though, to make this as accurate as possible. So yeah, I advise you to write most of your code in TypeScript, as most of the PS!-related libraries are only written in that language. The only non-TS code I used was this bash XenForo scraper, https://github.com/TUVIMEN/xenforo-scraper, which is quite useful. Working with the PS libraries or PS itself lets you handle team data more safely, and lets you use the team validator, for example, since some teams might be outdated, or just for data consistency.

As for the team style detection, I haven't really started digging into it yet, but a while ago I saw stuff about "stalliness" measures in the official Showdown usage stats, and an article about it.
 
Yo Maxouille! Have you seen what time it is? What are you still doing up?
thx for the reply, I figured someone must have already made it in SQL, but I did it on Sheets bc idk where to code sql :(

- a table for all the public showdown replays and even some of the replay from the replay incidents indexed by ID with the logs of each replay and the smogon post thread ID + post number
- a smogon post table which basically get autoupdated every week adding new replays by scrapping this endpoint https://www.smogon.com/forums/search/search
- a tournament thread table
- a smogon user table
- a PS user table
- a scouted teams table (two row per replays)
- a pokepastes/pastebin/raw text table for team shared on smogon/ps/web
Are the replays linked with the other info (tournament, player, ...)? (I don't speak SQL in English, sorry :( ) Is there a way to separate "high level" replays from the rest? The auto-update is nice, I could add that...

i advise you to write most of your code in typescript
BRUH I made everything in Google Apps Script, I hope it's not too far off (should be OK, it's basically JS)

Had no idea this existed, so I just brute-forced with the html of the whole forum pages and it works fine (not like I know how to install it)

As for the team style detection I didnt really started to dig into it yet, but I saw stuff about "stalliness" measures on the official showdown usage stats a while ago and an article about it.
Wtf is this LMAO why is swagplay a style??? Do you remember where the article was?
 
UPDATE: I have found and tested an OK-ish way to let each user search and filter the replays independently. However, it comes with a drawback: users must be logged in to Google and accept a security prompt, since I need to execute a script.

If you are reading this and you do not want to trust a random script just to access a resource, please react to this post so I know whether it's a problem.

The prompt to accept will be this:
1768397980855.png

1768398096587.png

1768398136146.png

The "edit, ... your Google Sheets" permission is there because the method I found copies a sheet for every user who connects to the spreadsheet, just for them.
The "connect to external service" permission is for a non-public part of the script (the part that extracts data from replay URLs).

Obviously the script will be open-sourced, but it would be better if people didn't have to trust a random script just to access a resource...


Edit : currently re-learning html to bypass this issue, stay tuned for updates ...
 
Done. I ended up migrating entirely to an HTML page, currently hosted on Google.
Here is the link to the page: link (no longer loads slowly)
The main idea of this resource is to let players search for specific replays based on many criteria, to answer questions such as:
  • How are teams featuring [mon(s)] played at a high level?
  • How are offensive teams able to win against Pelipper teams?
  • How can Ferrothorn + Heatran teams win against Medicham?
  • How are sample teams played and tweaked at a high level?
  • I want to see Kommo-o teams played by ABR
  • How do top players beat Toxapex teams?

The bank currently contains all top 16 games from official circuit tournaments played this year in SM OU, plus every smog tour playoff game ever (since all of them are on the same page). The raw data is stored in an independent google sheet, and adding whole tournaments is very quick (~2mins/tour I'd say), so I'll soon add more tournaments.

Currently, it is possible to search by Pokémon, player, team style, tournament and presence of a sample team.
There are no sorting options yet; I may add them in the future.
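
As a rough idea of how such a search can work, here is a small sketch of filtering by Pokémon, player and tournament. It is not the rendering script linked below: the record shape (`tournament`, `sides`, `player`, `team`, `won`), the function names and the sample data are all hypothetical.

```javascript
// Normalise names to lowercase alphanumerics so "Kommo-o" matches "kommoo".
const toId = (text) => String(text).toLowerCase().replace(/[^a-z0-9]/g, "");

// Keep replays where at least one side has all the requested Pokemon and,
// if a player is given, that same side was played by that player.
function searchReplays(replays, { mons = [], player = "", tournament = "" } = {}) {
  const wantedMons = mons.map(toId);
  return replays.filter((replay) => {
    if (tournament && toId(replay.tournament) !== toId(tournament)) return false;
    return replay.sides.some((side) => {
      const team = side.team.map(toId);
      const monsOk = wantedMons.every((mon) => team.includes(mon));
      const playerOk = !player || toId(side.player) === toId(player);
      return monsOk && playerOk;
    });
  });
}

// Made-up records, only to show the call.
const replays = [
  {
    tournament: "Tour A",
    sides: [
      { player: "PlayerA", team: ["Kommo-o", "Heatran"], won: true },
      { player: "PlayerB", team: ["Toxapex", "Ferrothorn"], won: false },
    ],
  },
  {
    tournament: "Tour B",
    sides: [
      { player: "PlayerB", team: ["Kommo-o", "Pelipper"], won: true },
      { player: "PlayerA", team: ["Medicham", "Heatran"], won: false },
    ],
  },
];

console.log(searchReplays(replays, { mons: ["Kommo-o"], player: "PlayerA" }).length);
```

Tying the Pokémon filter and the player filter to the same side is a design choice of this sketch: it answers "Kommo-o teams played by PlayerA" rather than "games involving PlayerA where anyone used Kommo-o".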

If you want, here are the links to the scripts: the rendering and filtering, and the importing of replays and threads into the database (may not be up to date)

This whole thing is very easy to apply to any gen with team preview, and can be tweaked to work on non preview gens too, so if anyone is interested in such a resource for another tier, do not hesitate to contact me and I'll be glad to help.

27/01:
  • Added a quick description at the top of the page
  • Changed the data loading system, increasing efficiency (loading times with ~2k replays: ~3s to load the page and ~10s to load the table)
  • Added a better search-by-tournament feature, which is also more visually pleasing
  • Added a "hide old replays" option (picked 2022 as the cutoff, may go back to 2020), on by default
  • Added a "hide player column" button; ideally I would find a way to turn it on automatically when the screen is small
  • Added a loading screen (extremely important, I know)
31/01:
  • Added a "usage stats" tab, with one table showing the usage stats (and win rates) across every replay and one showing the usage stats within the search results. This lets users see usage stats in specific tournaments/years, by a given player, or alongside given Pokémon
  • Made searching highlight the teams that match the search, for convenience
03/02:
  • Added a row of usage stats to show teams/cores of 2 or more mons. A bit janky though, so I might rework the system
  • Added a filtering option for usage stats (by usage or by win rate)
  • Added a list of mons to make searching for mons more convenient
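
For anyone curious how the usage stats and win rates in that tab can be derived from the stored replays, here is a minimal sketch. It is an assumption-laden illustration rather than the site's code: the record shape, the function name `usageStats`, and the choice to divide usage by the number of teams (two per replay) are all hypothetical.

```javascript
// Compute, for each Pokemon, the share of teams using it and how often
// those teams won. Usage is counted per team, i.e. two teams per replay.
function usageStats(replays) {
  const counts = new Map(); // species -> { used, won }
  let teamCount = 0;
  for (const replay of replays) {
    for (const side of replay.sides) {
      teamCount += 1;
      for (const mon of new Set(side.team)) {
        const entry = counts.get(mon) || { used: 0, won: 0 };
        entry.used += 1;
        if (side.won) entry.won += 1;
        counts.set(mon, entry);
      }
    }
  }
  return [...counts]
    .map(([mon, { used, won }]) => ({
      mon,
      usage: used / teamCount,
      winRate: won / used,
    }))
    .sort((a, b) => b.usage - a.usage); // most used first
}

// Made-up records, only to show the call.
const replays = [
  {
    sides: [
      { team: ["Heatran", "Toxapex"], won: true },
      { team: ["Heatran", "Medicham"], won: false },
    ],
  },
  {
    sides: [
      { team: ["Pelipper", "Heatran"], won: true },
      { team: ["Toxapex", "Medicham"], won: false },
    ],
  },
];

console.table(usageStats(replays));
```

Running the same function on the output of a search gives the "usage stats within the search" table, and sorting by `winRate` instead of `usage` covers the other ordering mentioned above.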
 
Hello friend, I think your project is fantastic. I previously attempted something similar, but my methods were far less efficient compared to yours. I would like to ask you a few questions regarding your methodology:

  1. How do you filter the matches?
  2. How do you determine the specific tournament rounds? (Especially in cases where matches are delayed and their replays are posted in the thread for the following week/round.)
  3. How do you achieve automatic updates?
I used to manually update matches for Gens 6 through 9 and generated tables similar to yours, but keeping them updated in real-time was incredibly exhausting. I would really love to learn about the techniques you use. Thank you very much!

https://123xuwu123.github.io/PS-Replay-Summary.github.io/en/
https://123xuwu123.github.io/PS-Replay-Summary.github.io/en/formats/gen8-ou/2025/wcop/

Here is an example of my work. This is an older version; I've made significant updates to the UI since then. I've also added previews for Moves, Items, and Tera Types, as well as a more comprehensive search function. If you're interested, I can send you the latest version.
 
Hi, thank you very much for the kind words. The answer to all of your questions is that I didn't implement an automatic update system; that is what I mean by "semi-automatic". I import replays through the tournament threads on Smogon: I have a function that takes the URL of a Smogon thread, finds every replay link (https://replay.pokemonshowdown.com/) and then imports them individually.
The limitation of this system is that, to be efficient, the tournament thread must have "Replays required". However, since I only want to add "high level" replays, it works perfectly, since every tournament I've seen so far requires replays once it reaches top 16.
Adding the round of each replay is also manual: it relies on each thread featuring a different round, and I assign a round name to each thread. As a result, the rounds are by far the most unpolished part of this whole system, since many niche cases arise (a few examples: all rounds in the same thread, all the replays of a tournament, or multiple rounds in one thread (here there are both the winner's finals and the loser's prefinals)). This is why, if you look at all the replays, many have no round.
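
The thread import described above can be sketched like this. It is a simplified stand-in, not the real function: fetching the page is left out, and the regular expression, the function names and the sample HTML are assumptions.

```javascript
// Matches replay links; the allowed ID characters are an assumption.
const REPLAY_URL = /https?:\/\/replay\.pokemonshowdown\.com\/[A-Za-z0-9-]+/g;

// Find every replay link in the raw HTML of a thread page.
function extractReplayLinks(threadHtml) {
  const matches = threadHtml.match(REPLAY_URL) || [];
  return [...new Set(matches)]; // a link often appears twice (href + text)
}

// One round name per thread, assigned by hand, as described above.
function importThread(threadHtml, roundName) {
  return extractReplayLinks(threadHtml).map((url) => ({ url, round: roundName }));
}

// Made-up page fragment, only to show the call.
const sampleHtml = `
  <a href="https://replay.pokemonshowdown.com/smogtours-gen7ou-111111">
    https://replay.pokemonshowdown.com/smogtours-gen7ou-111111</a>
  <a href="https://replay.pokemonshowdown.com/smogtours-gen7ou-222222">game 2</a>
`;

console.log(importThread(sampleHtml, "Finals"));
```

Since the round is attached per thread, a thread that mixes several rounds gets a single label in this sketch, which is exactly the niche case mentioned above.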

In order to quickly import whole tournaments that span many threads, I made this interface:
1768912036301.png
Since the top 16 spans no more than 5-6 threads, it is quite quick to just search the name of the tournament and find all the threads where replays were posted.
In practice, if you know where to look, I found that adding a year's worth of replays takes 15-20 mins.

I hope I answered your questions, and thank you for your interest :)
 
OK, thank you very much.
 
UPDATE:
I finally got lazy enough to spend like 15 hours coding in 3 days.
In that time, I basically changed the entire architecture to increase efficiency and scalability.
The main thing I did was move the data storage from Google Sheets to an SQL database. The database is composed of the fetchedThreads table, which contains the most important thing: the non-automatic part of the upload, i.e. every Smogon thread link that was manually uploaded. This lets me store the hardest part of the whole system, as it is quite easy to refresh all the other data from that table (although it might take a few hours with my current optimisation).
And now that the loading speed of the database is no longer an issue, I can go all the way and import every single replay found in those threads.
Also, to make it look more professional, I moved the project to GitHub, and the web page is going to be hosted on GitHub Pages.

Here are the links to the github and to the new and improved web app (actually did not change that much since last time except for loading times... but now you can click on sprites in the usage tab!)
Anyway ty for tuning in :) have an excellent day
 
Cool, just a minor suggestion: it might be good to optimize the layout of the search bar and the tiers. I'm not sure if it's because of my monitor's aspect ratio, but they look a bit cluttered.
1772504033957.png
 
Thank you for pointing it out. I did notice it acting weird (though not to the scale of your screenshot). I've already started fixing it, so I hope to finish soon.
 