Hey folks, I am extremely new to both the world of Smogon and competitive Pokemon battling in general, so apologies if I'm asking questions in the wrong place.
I'm an analytics graduate student looking to start a new personal project to support my work in this program, and I am fascinated with competitive Pokemon for two reasons: 1) I want to start playing competitively in the future and want to better understand team composition and individual matchups, and 2) from a data scientist's standpoint, there seems to be a TON of data on both individual competitive Pokemon and battle replays available through Showdown. But that brings me to my primary concern: given this wealth of information, I have no idea where to start.
While I admit my project vision is vague at best right now, I want to merge a predictive logistic model (or a more advanced one, if necessary) with an interactive dashboard to allow users to build two teams and identify the odds of one beating the other. This would require multiple datasets, and would certainly require a table recording Showdown replays with both team compositions explicitly labeled and the winner identified. If this is too ambitious, I could downgrade to a dashboard that could identify a Pokemon's best and worst matchups given the data.
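To make the idea a bit more concrete, here is a rough sketch of the kind of model I have in mind. It's plain Python with no real data behind it: the replay rows and their outcomes are invented for illustration, and the function names (`build_index`, `featurize`, `fit`, `win_probability`) are just placeholders I made up. Each matchup is encoded as "+1 for every Pokemon on team A, -1 for every Pokemon on team B", and a logistic regression without an intercept is fit on that, so swapping the two teams flips the predicted odds.

```python
import math

# Hypothetical toy replays: (team_a, team_b, a_won).
# These rows and outcomes are invented, not real Showdown data.
REPLAYS = [
    (("Garchomp", "Rotom-Wash"), ("Ferrothorn", "Toxapex"), 1),
    (("Ferrothorn", "Toxapex"), ("Garchomp", "Rotom-Wash"), 0),
    (("Garchomp", "Toxapex"), ("Ferrothorn", "Rotom-Wash"), 1),
    (("Ferrothorn", "Rotom-Wash"), ("Garchomp", "Toxapex"), 0),
]

def build_index(replays):
    """Map every species seen in the replays to a feature column."""
    species = sorted({p for a, b, _ in replays for p in a + b})
    return {name: i for i, name in enumerate(species)}

def featurize(team_a, team_b, index):
    """+1 for each Pokemon on team A, -1 for each on team B."""
    x = [0.0] * len(index)
    for p in team_a:
        if p in index:
            x[index[p]] += 1.0
    for p in team_b:
        if p in index:
            x[index[p]] -= 1.0
    return x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(replays, index, lr=0.1, epochs=200):
    """Logistic regression (no intercept) trained by stochastic gradient ascent."""
    w = [0.0] * len(index)
    for _ in range(epochs):
        for team_a, team_b, a_won in replays:
            x = featurize(team_a, team_b, index)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] += lr * (a_won - p) * xi
    return w

def win_probability(team_a, team_b, w, index):
    """Estimated probability that team A beats team B."""
    x = featurize(team_a, team_b, index)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

index = build_index(REPLAYS)
weights = fit(REPLAYS, index)
print(win_probability(("Garchomp", "Rotom-Wash"), ("Ferrothorn", "Toxapex"), weights, index))
```

This treats each Pokemon as contributing independently, which I suspect is too simple for real matchups (no moves, items, or type interactions), but it's the baseline I'd start from before trying anything fancier.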
I can refine this scope as necessary; I should mention that I only have until January 2026 to carry this project out. So, with all that said: what recommendations do you all have, in terms of project scope and data sources? I recognize that this forum is flooded with projects that probably resemble mine quite a bit; I unfortunately just lack the time to scrutinize all of them. Any feedback or clarifying questions are eagerly welcomed.