Policy Review: Multi-Layered CAP Release Process

Birkal

We have the technology.
Multi-Layered CAP Release Process

This is the second of a series of three policy review threads about the future of the CAP process. CAP moderator Dogfish44 is leading the first on Stage Adjustments and Quality Control. This one will focus on the releasing of CAP Pokemon into the CAP metagame once we're done creating them.

--------------------------------------------------------------------

The Problem with a Single-Release Process

With our current CAP process, we release CAPs into the metagame once we're done creating them. This has been sensible for many years, since uploading a CAP onto the server used to be difficult labor. Back on Pokemon Online, it was agony to get anything uploaded. But thanks to the flexibility of Pokemon Showdown, it has become much easier to upload our own creations. It became easier still with the opening of PS's entire source code, which lets anyone host their own server, battles, and most importantly, custom creations.

This is relevant because for the past few CAPs, we've been actively testing our creations on test servers before their official launch. This has even come to change the actual process of creating our Pokemon: playing with Smokomodo in actual battles quickly revealed that Flame Charge put a very obnoxious win condition on battles, which was something we deemed uncompetitive and ended up disallowing from its movepool. And while there have been attempts to leverage this testing, the CAP process has not yet officially acknowledged this pre-testing.

In my opinion, this testing is great. It gives us hands-on experience with our creations before releasing them out into the public. It creates more cohesive Pokemon that are better balanced for the metagame. The problem is that our process has not updated itself for this sort of testing. This thread attempts to rectify that problem. I'm going to list a proposal for a new policy step that I think will account for this well.

--------------------------------------------------------------------

Proposal: Multi-Layered CAP Release Process
  • Implement a Release 1.0 with the following features:
    • Art, Sprites (optional, but preferred), Name
    • Typing, Competitive Abilities, Stats, Competitive Movepool
      • All of these competitive attributes are already baked into the process, meaning we won't have to create any new stages.
This first release would take place after we conclude the competitive movepool stage. When discussing this implementation, we'd call it CAP28.0 to indicate that it's not fully released yet. While we can share the creation of this Pokemon with the community, we won't make it a large Public Relations (PR) push until later on. This is basically formalizing the pre-testing phase that we've been informally doing already.
  • Implement a Release 2.0 with the following features:
    • Everything above
    • 3D Model, Pokedex Entry, Pre-Evolution (optional, but preferred)
    • A reassessed competitive movepool, based on the playtesting of Release 1.0 of that CAP. This would be a new stage, and it ties in to GMars’ earlier proposal that we revisit CAPs after some testing.
    • Full Public Relations (PR) release, including posts to social media, inaugural battle, and playtest tournaments
This 2.0 Release would look similar to what we do currently. It would have all the fanfare of a normal CAP release, despite it technically being several weeks after the 1.0 Release of the CAP. Note that a new step is being included here: we'll be using the Release 1.0 to gauge where the competitive movepool lies, and if we should make any final tweaks or adjustments before finally releasing it to the public. This could include banning moves, adding moves, or generally readjusting what needs to be fixed with regards to the competitive quality of the Pokemon.

--------------------------------------------------------------------

Closing Statements

So why implement this? It allows us to edit our Pokemon competitively if they need some rebalancing, which creates more discussions and opportunities to contribute. Having public discussions on the balance of a 1.0 Release is a lot better than us making anecdotes on Discord about random battles we've had on the test server. It gives us something concrete to look back on, and makes our Pokemon more competitively balanced.

It also has the advantage of giving 3D modelers, movepool creators, and pre-evo crews more time to perfect their efforts, without feeling the strain of trying to hit a release that’s holding up competitive contributors. 3D modelers in particular perform an absolutely Herculean effort to get their models done on time. There is no reason that we should rush them as much as we do, and a 2.0 Release alleviates that.

Why would we not implement this? Well, it's gonna take up more time, which is why I am proposing, in threads one and three, ways to make the CAP process more efficient. The final thread in this PR cycle will address our calendar. This proposal also lengthens the amount of discussion we have about a single CAP, which some could see as a negative that takes away from a more global perspective of looking at the CAP metagame as a whole. And finally, it could be confusing to the public at large to have a CAP released twice, especially at the start.

Overall, I still think the benefits outweigh the risks. Staggering our releases will free up the CAP process a lot more to further testing, while simultaneously granting more time to our contributors to make a more cohesive Pokemon. Let me know your feedback on this proposal below. Do you like it? Hate it? What would you do differently?
 
I don't have too much to say, but I'd be all for a gradual release on Showdown for 28 onwards, rather than trying to work everything out before the CAP fully releases and making sure it isn't broken or useless. We can also work with some broken-sounding moves just to make sure they are what we would think, even if we end up removing them later. That way, we don't need too long to nerf something like we did with Equilibra, which took part of Summer 2019 to get finished. But that CAP already lost those moves, so it's good to look forward to what we can do better.
 

Wulfanator

Clefable's wish came true!
This is a smart addition to the project as it opens testing to a larger pool of players and provides us with more concrete information to make decisions from. I'd be curious to see how this impacts steps like movepool. We usually try to err on the side of caution when it comes to questionable moves, so would we see more leniency when it comes to on-the-fence options, since we can address them on the second pass? Could this potentially lead to unforeseen power creep, since we have a built-in "save our ass" option instead of having to go through the nerfing process? I don't really have much to add beyond that.
 

Quanyails

CAP Co-Leader
Birkal said:
Proposal: Multi-Layered CAP Release Process
  • Implement a Release 1.0 with the following features:
    • Art, Sprites (optional, but preferred), Name
    • Typing, Competitive Abilities, Stats, Competitive Movepool
      • All of these competitive attributes are already baked into the process, meaning we won't have to create any new stages.
If the CAP test server serves as an "alpha" release, and this release is considered a "beta" release, I think this second release would have its benefits. If the CAP is on the main PS! server, it could see more use on the ladder and in tournaments, so we may get a wider perspective of how the CAP fares. In the case of Equilibra and Astrolotl, I've heard the sentiment that while individual sets are not "broken", we ended up giving them too many options, making them hard to play against. This is a case where "casting a larger net" after movesets are defined would help identify how enjoyable it is to play with a CAP creation and what needs to be culled back.

Also keep in mind that a two-stage release will require more coordination on behalf of the PS! devs. I'll tag Marty here, since he's our point of contact, in case he has any thoughts.

--

Birkal said:
Having public discussions on the balance of a 1.0 Release is a lot better than us making anecdotes on Discord about random battles we've had on the test server. It gives us something concrete to look back on, and makes our Pokemon more competitively balanced.
I just want to clarify that the test server has full replay saving. Even before it did, we had replays. Sure, saving replays was a bit clunky before the server was registered, but we were never limited to just anecdotes; we have had full evidence for or against the viability of a competitive moveset.

--

Birkal said:
This first release would take place after we conclude the competitive movepool stage. When discussing this implementation, we'd call it CAP28.0 to indicate that it's not fully released yet. While we can share the creation of this Pokemon with the community, we won't make it a large Public Relations (PR) push until later on. This is basically formalizing the pre-testing phase that we've been informally doing already.
This is a nit, but it's common in software versioning to use 0.x to indicate a beta build, while 1.x refers to a finished product. I'd imagine if it were for CAP 28, we should call it v. 0.28 or something to indicate the beta.
 

MrDollSteak

CAP 1v1 me IRL
Moderator
Overall, I think this feels like an intuitive change. I could arguably even see the testing happen without a sprite to truly indicate its beta status, as a means to not confuse the community at large too much. I think the additional time that is afforded to the modelers, and the greater emphasis it puts on finishing the pre-evolution process, is excellent. I think in combination with the previous adjustments to stage order, it may actually shorten the length of CAP creation as a whole by using time more efficiently, particularly in regards to the competitive moveset stage. I think if the competitive moveset stage is split into a shorter 'defining moves' stage, as Dogfish44 suggests, and a formal 'preliminary testing' stage where any issues are identified, the overall downtime will be less and momentum will continue, which strikes me as a benefit.
 

Quanyails

CAP Co-Leader
Posting this for Marty:

I don't think it should require more coordination than usual since there would still be one inaugural battle. My process for implementing CAPs generally involves not doing anything until the movepool is ready, coding up everything available at that point, then waiting for sprites.

However, I would ask that at minimum all of the parts essential to implementation on PS are ready for the first release. This includes all competitive stages, weight in kg, a Gen 5-style sprite (this can be just a front sprite and then I'd copy-paste it for back/shiny/back-shiny), and how many prevos the CAP will have. The prevo number isn't strictly necessary but the new sprite process for Smogon and PS relies on static ID numbers, and no one wants to be rearranging the IDs every time a prevo is implemented.
 
I support the idea of an official beta / release candidate version. In software development, it is virtually impossible to release a bug-free product even with extensive testing, and patches and updates are the norm, so I feel the idea of a locked final release is just naïve. Competitive-oriented games do tend to get balance patches / rereleases, after all.

This raises some deeper points, however:
- What can be changed in the transition from beta to final? Can moves be added? Removed? What about abilities?
- What reasons can make an element eligible to be changed? Because it ends up being considered overpowered? Underpowered? Not fulfilling the original stated concept of a CAP? How much is too much when altering the identity of a CAP?

I am mainly thinking about Aurumoth, which was supposed to be about risk management but ended up being much safer than expected. With a pre-release phase, would Illusion or part of Aurumoth's coverage have been eligible for replacement / removal in order to reduce its power level and/or to bring it more in line with its stated concept?

Quanyails said:
This is a nit, but it's common in software versioning to use 0.x to indicate a beta build, while 1.x refers to a finished product. I'd imagine if it were for CAP 28, we should call it v. 0.28 or something to indicate the beta.
"0.28" reads more as "pre-release version 28" of a new product, though. "CAP28 0.1" then "CAP28 1.0" is more in line with how actual versioning tends to work, and could be further used for updates, even generational ones (although full versioning feels too much for what CAP is).

(More on a common software versioning practice, for those interested.)
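To make the "CAP28 0.1" then "CAP28 1.0" scheme concrete, here is a minimal Python sketch. It assumes labels of the form "CAP<number> <major>.<minor>" and treats any major version of 0 as a beta; the `Release` type and `parse_release` helper are hypothetical names for illustration, not anything CAP or PS actually uses.

```python
from typing import NamedTuple


class Release(NamedTuple):
    """One release of a CAP, e.g. 'CAP28 0.1' (label format is an assumption)."""
    cap_number: int
    major: int
    minor: int

    @property
    def is_beta(self) -> bool:
        # Convention from the post: 0.x is a beta build, 1.x and up is finished.
        return self.major == 0


def parse_release(label: str) -> Release:
    """Parse a label such as 'CAP28 0.1' into a Release (hypothetical helper)."""
    name, version = label.split()
    if not name.startswith("CAP"):
        raise ValueError(f"expected a label like 'CAP28 0.1', got {label!r}")
    major, minor = version.split(".")
    return Release(int(name[3:]), int(major), int(minor))


# Tuples compare field by field, so releases sort in release order.
history = sorted(parse_release(s) for s in ["CAP28 1.0", "CAP28 0.1", "CAP28 1.1"])
```

Because the beta flag comes from the major version alone, later balance updates (1.1, 1.2, and so on) would slot in without any extra naming rules.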
 

jas61292

used substitute
I have never been a huge fan of the pre-testing that has been done in recent projects with the test server. I believe such testing to be highly flawed in its very nature, as it is basically a free-for-all with regard to what you can use. In my experience with such testing, either you tell your opponent what exactly you are testing, and thus they know your set from the outset and the battle is a poor representation of a normal battle, or you don't tell them, and it's just as bad but in reverse: they would have no idea what you are using, which is also unrepresentative, as in real play you always know all the possible options, just not the set itself. While I know a lot of people have used testing to form strong opinions, and those opinions have often had force or sway on the final outcome of the project, I personally have always viewed such testing as highly flawed at best, and often (but certainly not always) heavily tinged with confirmation bias.

Now, if we were to establish a formal testing stage, it would allow for battles that do not suffer from the information issues that current testing does. The Pokemon would have a defined set of moves, and so it would generally function just like normal, but with a more limited movepool. This would be far more valuable than any other kind of testing, and I would generally have no issue with it.

That being said, I think there are some potential issues with the idea. The first of which is a fundamental issue regarding the scope of the project. Honestly, I think it is fair to question whether we want any testing at all as a required part of the main process itself, which is fundamentally a forum project. In the past we have generally avoided anything that requires you to venture beyond the forums to participate. I know we have changed this some with how updates were concluded, but it has always been a CAP standard that the entire project is a forum only project, until the Pokemon is done. Also, perhaps even more fundamentally, CAP is a project about theorymon. I know it might sound absurd to argue that testing is inherently a bad thing, but if the project becomes less about speculative theory and debate and more about objective analysis, I feel something would be lost.

Furthermore, when you really think about it, movepool is the last competitive step of the project. Releasing a Pokemon with a preliminary movepool and then changing it is not really all that different than releasing something with our current process and then deciding we screwed up and it needs to be adjusted. The Pokemon will have been out there, and the public will have seen it and gotten to use it. Even if we don't highly publicize it, there are still plenty of people who hop on the CAP ladder without ever paying attention to the forum project itself. And outside the core CAP crowd, no one is going to care that the changes being made are now an official part of the process. A change is a change and will be viewed as such.

Beyond these worries, I also think that if we are to go down this road, there are multiple issues that would need to be addressed. As Menshay stated, any such process would need well defined rules. What can be removed? Can things be added? These obviously are potential issues, as adding a move means allowing something untested in the final product, but not allowing adding means the pre-testing phase could limit things regardless of what testing suggests is fine. And of course, you can't just say allow everything relevant for testing, as the number of options something has and the number of sets it can run is a huge influence on how good it is. I don't think there is any one right answer to this issue, but it is something that would need to be defined and set in stone before any such phase is implemented.

Other implementation issues would involve the format of the testing. How long would it last? Would we be having any organized testing events/tournaments in order to get good data? And if we are doing anything, how would we address the inherent pitfall of non-forum activity: timezones? I'm sure there are many other such questions, but I think I have typed enough for one post.

Suffice it to say that I am not totally sold on the idea of a testing phase, for a variety of reasons, but I think that if we want to do it, it certainly could be done decently. There would just be a lot we would need to address first.
 

MrDollSteak

CAP 1v1 me IRL
Moderator
So after thinking about this proposal more deeply, as a result of the conversations surrounding the cascading CAP process and the questions about the metagame, I started thinking generally about how these three threads can work in tandem to promote more people playing the CAP metagame itself, and to spend time there after processes are finished. I think that this thread, and specifically the idea of moving from version 1.0 of a CAP to 2.0, is the best place where this can happen.

I haven't fully thought through all of the logistics, and my suggestion will of course need further discussion to tighten it up, but I think we should have some kind of official 'Playtest' or 'Analysis' phase that occurs after the release of version 1.0 and before, or perhaps simultaneously with, the creation of the prevolution's moveset, where community members can discuss and debate whether the CAP is successful, undertuned, or overtuned. I suggest this because, in my anecdotal experience, the Clefable suspect test was the time in which I played against the most people on ladder. There was a specific purpose to laddering, through tiering up and hitting GXE, that encouraged more people to experiment, and of course, with a specific Pokemon. Because this isn't a suspect test, the actual voting requirements wouldn't need to be as high, but I think some nominal requirement for people to hit a certain number of games played and a certain win-loss ratio with the 1.0 CAP will incentivise people to play the CAP ladder more frequently. That would bring greater traffic and interest, and it would allow modifications to the CAP in question to be evidence-based.

As for the modifications, as Jas quite rightly points out, we will need to make clear what can be voted on. I think they should in general be less wide-ranging than the official nerfs, and should therefore focus primarily on movepool modifications rather than touching abilities and stats. I think this would be less invasive overall, and therefore ideal, as the movepool is already under discussion with the creation of the final movepools through the prevolution process.
 

Birkal

We have the technology.
Since we are not doing cascading CAPs as a result of the community consensus, this proposal runs into a bit more difficulty being fully realized. Without cascading, this proposal will add significantly more time to the end of the process, which is already difficult to time given how Game Freak manages their releases. Therefore, I recommend we continue the pattern we currently have, where we have unofficial testing with CAP participants on custom PS servers that is ancillary to the process. Having playtest tournaments will be discussed in another thread, and edits are something that we already do with the CAP metagame council.

I'm going to leave this thread open, if anyone has anything they'd like to add, or an idea that was presented here that we should really move forward with. But otherwise, let's keep on doing what we're doing and improve upon the pre-existing framework.
 

MrDollSteak

CAP 1v1 me IRL
Moderator
I'm a bit sad to see this proposal stall. I think that there is a way to make this proposal work without needing to cascade CAP projects. The playtest or analysis stage that I mentioned in the previous post could act as a replacement for the current 'full movepool' stage, for example, working in conjunction with the prevolution proposal. The final movepool could then be moved into the prevolution stage, and both be handled simultaneously to ensure consistency (while also allowing for greater freedoms in terms of what flavour movepools the prevolutions receive). In this regard I don't think we would take much more time, and as mentioned previously, the model could come out in the 2.0 stage once the prevolution is resolved, to give more time to modellers. I'd be curious to see what others think about the proposal for an official playtest stage, especially since the TLT nominations are coming up, but I would envision that running it like a suspect test would be the best way of identifying problematic moves. I think this is important because it gives the community greater power to determine any nerfing processes, especially with Astrolotl looking increasingly in need of one due to a few overbearing moves like Toxic and Knock Off that were only identified as a result of the replays and tournament matches of CAPPL.
 

quziel

I am the Scientist now
Moderator
I am wondering if we could possibly use a modified version of this proposal:

Implement the CAP on a server after a draft of movesets is done, run a live tournament with it, and use the results to refine the accepted movesets.

While a full 2-week long laddering period would provide a lot of relevant info to the project, there is the issue of significantly extending project timelines. A live tournament on a test server would still give us significant info, significant replays, and at least some metagame info we can use without incurring any time penalty at all. My proposal, to fully explain the above, is to come to some draft version of the accepted movesets, run a live tournament with them, and then to refine and remove/add after we get all the info we can get.

Edit:
The issue here is that fully exploring a mon takes a long time, and fully adjusting to it takes a long time as well. As we saw with Astrolotl, the full force of the Knock+Toxic+Spikes set wasn't really apparent until significant metagame developments occurred, making something that seemed fine on the test server turn out to actually be OP.

Ideally we'd have a multi-layered approach like: movesets draft => live tour => movesets finalize => release => forum tour => further editing.
 

Mx

Moderator
Given that CAP 28 is already underway, I would love to finalize this soon. I think out of all the proposals currently in PRC, this is the one I'd like to apply the most to our current project, as it would help us to achieve our goal of finishing the current project before the Crown Tundra releases, and I believe it would significantly improve the quality of our final product. Here's the basic idea I would like to implement for CAP 28:
  1. After finalizing moveset submissions, release CAP 28 with a placeholder movepool, marking the 1.0 release of CAP 28.
  2. Begin a CAP 28 Playtest Tournament to test this new product.
  3. After a month or so has passed, implement a Post-Play Lookback thread, in which we would discuss possible changes to CAP 28, mostly using the model that GMars proposed back in the thread linked.
  4. After finalizing this discussion, make all the formal announcements marking the 2.0 release of CAP 28.
A few notes about things that are not immediately obvious from above:

The proposed "placeholder movepool" should include most flavor moves that might be associated with the current project and that have not been explicitly banned by the TLT. I think the most elegant way to implement this would be to base it solely on our typing, making a pre-determined list of moves associated with each type, which would include all non-signature moves of that type, alongside a number of other moves that we might consider to be "associated" with that type (for example, Ice Beam for Water-types, Solar Beam for Fire-types, or Sunny Day for Grass-types). This list is meant to be as inclusive as possible, and once the 2.0 release brings an actual full movepool, many of these moves may be deleted for not really fitting with the flavor. The purpose of including these moves in the 1.0 release is to be able to catch any possible competitive interaction that might not be desirable, like Astrolotl's controversial Fire Spin set.
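As a rough illustration of how such a type-based placeholder list could be assembled, here is a minimal Python sketch. The move tables are tiny hand-picked samples rather than the real pre-determined lists, and `TYPE_MOVES`, `ASSOCIATED_MOVES`, and `placeholder_movepool` are hypothetical names used only for this example.

```python
# Sample data only: a real list would cover every non-signature move of each type.
TYPE_MOVES = {
    "Water": {"Surf", "Scald"},
    "Fire": {"Flamethrower", "Fire Spin"},
    "Grass": {"Giga Drain"},
}

# Off-type moves considered "associated" with a type (examples from the post).
ASSOCIATED_MOVES = {
    "Water": {"Ice Beam"},
    "Fire": {"Solar Beam"},
    "Grass": {"Sunny Day"},
}


def placeholder_movepool(typing, banned):
    """Union of same-type and associated moves for each type, minus banned moves."""
    pool = set()
    for type_name in typing:
        pool |= TYPE_MOVES.get(type_name, set())
        pool |= ASSOCIATED_MOVES.get(type_name, set())
    return pool - set(banned)


# A hypothetical Water/Fire CAP where Scald has been explicitly banned.
pool = placeholder_movepool(("Water", "Fire"), banned={"Scald"})
```

Keeping the banned list as a separate input means the same per-type tables could be reused across projects, with only the bans changing from one CAP to the next.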

The full movepool discussion would run right after the 1.0 release, alongside the playtest tournament. I don't think this should affect how this stage goes too much. Also, in case this wasn't clear, people would still be able to submit moves outside of the initial placeholder movepool as long as they don't have any competitive implications. These additional moves would be added in the 2.0 release.

As for the Playtest Tournament, its signups should start 2 weeks before the 1.0 release, so that we don't waste any time, and ideally they should be promoted as much as possible to maximize the number of players. I'm not completely sure what format we should use for this tournament, but I think at least for this time, we should go with a double elimination tournament, as this would host more games than a single elimination one. We could also go with the Swiss style tournament that quziel proposed, although I worry that in later rounds, people that can no longer win will drop out en masse with that format. EDIT: after a brief discussion with quziel on Discord, I now believe that a Swiss style tournament with playoffs would be the best way to go here. It's likely that this tournament will last beyond the end of the Post-Play Lookback, and in that case, the later rounds would be a great time to see any changes that might have been implemented in action.

Of course, as I imagine many of you have already noticed, with these new additions, it's likely that The Crown Tundra will be released by the time we finish with the 2.0 release, but I think at that point, the proposed Playtest Tournament should have given us some very good results about how CAP 28 performs in its intended metagame and how well we accomplished our concept. For these reasons, I think that the positives of implementing this proposal would greatly outweigh the negatives.
 

Birkal

We have the technology.
Sorry we didn’t post sooner, but we will move forward with Mx and his proposal. The gears have already been turning on this, so big thanks to him for getting things going. We will use this thread after CAP28 to reflect on this release schedule, and choose whether or not to implement it for future CAPs.
 
