OK, so, I meant to respond to a lot of this last night, but didn't get a chance to really sit down at a computer. Anyhow, here are my thoughts. I'll try to hit some of your points one by one as well, Mazz.
Let me start by saying that, overall, I like the idea of +2/-1 per battle and the potential for multiple challenges, instead of +3 for a single challenge. I'm not thrilled, though, about dropping effectiveness as a judged category entirely in favor of basing it on victories or W/L record. I also have a few general concerns that I'd want to consider before making the move:
1. Disadvantaging contestants who aren't on as much. I like the idea of having the option to challenge multiple Iron Chefs, but in the end I want someone who can only manage to schedule a game with a single Iron Chef to still have a chance of succeeding. With time zones and availability as factors, I don't want to complicate things in a way that prevents people from participating, or that makes people who can only schedule a single battle feel like there's no way they can win against people who have the time to schedule four. I tried to pick Iron Chefs who are on regularly, but it's still tough.
2. Overburdening the Iron Chefs. With three days still left in teambuilding, we've had 13 matches so far. I also have 3 people who have submitted teams to me but have not yet scheduled matches. Even if no one else submits a team this week, we should still expect a minimum of 16 participants. With four Iron Chefs that's not so bad: four matches apiece, or thereabouts depending on scheduling. It also means that a chef doesn't have to drop everything and battle any time a challenger is on, since usually someone else can. But if we're not just allowing but encouraging multiple battles, and every participant challenged all four chefs, we'd be talking about sixteen battles for each Iron Chef. That's... a lot to schedule in a week. I have a feeling it would be pretty hard to retain good, competitive Iron Chefs if they were expected to battle that much, especially since a lot of the best battlers are already involved in several other projects (for example, every single Iron Chef is currently involved in UUPL in at least some capacity, and most of them are working on multiple other projects as well).
3. Overcomplicating things. The point of this is to have fun, and to get people involved. On the one hand I like the idea of a really exact mechanism to determine which team is the best. On the other hand, things are pretty simple right now: PM me a team, battle an Iron Chef, post a replay. Any potential judges have it pretty simple as well: read the team and watch the replay, pick a number based on how well they did, add a sentence or two explaining their vote. It's not the end of the world to make things more complicated, but I'd like to keep it relatively simple so that people still have fun with it.
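To put rough numbers on concern 2, here's a hypothetical back-of-the-envelope sketch in Python. The function name is made up for illustration, and it assumes battles get spread evenly across the chefs, which real scheduling won't quite manage:

```python
# Hypothetical sketch: average number of battles each Iron Chef would face.
# Assumption: every participant challenges the same number of chefs and
# battles are spread evenly across chefs (real scheduling will be lumpier).

def battles_per_chef(participants: int, chefs: int, chefs_challenged_each: int) -> float:
    """Average battles per Iron Chef for a given challenge rule."""
    if not 1 <= chefs_challenged_each <= chefs:
        raise ValueError("each participant challenges between 1 and `chefs` chefs")
    total_battles = participants * chefs_challenged_each
    return total_battles / chefs


if __name__ == "__main__":
    # Figures from the post: at least 16 participants, four Iron Chefs.
    print(battles_per_chef(16, 4, 1))  # one battle per participant
    print(battles_per_chef(16, 4, 4))  # everyone challenges all four chefs
```

With 16 participants and four chefs, one battle each works out to 4 battles per chef, while everyone challenging all four chefs works out to 16 per chef, in a single week.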
Anyhow, onto some specific points...
> Winning can only snag you 3 points, and I (as well as some others) am confused by this. Winning, by direct extension, is evidence of effectiveness. If your team does not win, your team is not effective, or not effective enough. While I understand that players with more skill can get more mileage out of a less-effective team than a new user could with that same team, wins and losses are the easiest and most direct way to measure a team's effectiveness.
Honestly, my first thought had been to base 100% of the scoring on judging, since I was originally inspired by the Iron Chef show. The Iron Chef battles were just there to add a fun, competitive element and let the contestants show off their teams. From talking to the Iron Chefs and some other folks about the idea, though, it became clear that, with judging alone, there wasn't really any incentive to beat an Iron Chef. So after some back and forth, I settled on 3 points: high enough that it's a big advantage for anyone who wins their battle, but low enough that it's not a guarantee.
> "Effectiveness" is now worth a bonus of up to 5 points and will be based on the scores of matches. If the user is only winning matches by a hair, fewer or zero points are awarded. If the user blows the current Iron Chefs out of the water, then more or all 5 points are awarded.
I'm not sure I love this measure of effectiveness. For one, it puts hyper offense (HO) teams at a disadvantage, since they tend to play "closer" (i.e., saccing mons to avoid losing momentum instead of pivoting out), so the final score is usually closer even if they never lost control of the match. For another, there's no real way to give someone a bonus or penalty for something that didn't come up in their matches. For example, let's say a hypothetical team is mega-weak to Chandelure. If none of the Iron Chef battles feature a Chandelure, the competitor more or less lucks out and won't get penalized for it. By allowing judges to vote on effectiveness (and provide reasoning for their vote, like a brief RMT), though, things like that can be caught, and the team can be improved. Finally, as you mentioned, part of the goal of scored judging, instead of just having people pick their favorite team, is to improve teams and teambuilding. Taking away effectiveness as a scored category means taking away feedback on what is, to me, the most important thing to give feedback on.
> The new point system would look like this: a-5-5-b, with "a" being any integer value between -4 and 8 (based on W/L record), and "b" being any whole number between 0 and 5 (based on user effectiveness).
Makes sense, but see my concerns above about overcomplicating things and removing effectiveness as a scored category.
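For anyone who wants to sanity-check the proposed a-5-5-b numbers, here's a hypothetical Python sketch. It assumes "a" is +2 per win and -1 per loss against at most four Iron Chefs (which is where the -4 to 8 range comes from), that the two middle 5s are two judged categories scored 0 to 5 each, and that "b" is the 0 to 5 effectiveness bonus. The function names are invented for illustration:

```python
# Hypothetical sketch of the proposed "a-5-5-b" total.
# Assumptions (not spelled out in the thread): "a" is +2 per win and -1 per
# loss against at most four Iron Chefs, the two middle categories are judged
# 0-5 each, and "b" is the 0-5 effectiveness bonus.

MAX_CHEFS = 4


def win_loss_points(wins: int, losses: int) -> int:
    """Points from the W/L record: +2 per win, -1 per loss."""
    if wins < 0 or losses < 0 or wins + losses > MAX_CHEFS:
        raise ValueError("record must cover at most four Iron Chefs")
    return 2 * wins - losses


def total_score(wins: int, losses: int, judged_1: int, judged_2: int, bonus: int) -> int:
    """Sum of the W/L points, two judged categories, and the bonus."""
    for value in (judged_1, judged_2, bonus):
        if not 0 <= value <= 5:
            raise ValueError("judged categories and bonus range from 0 to 5")
    return win_loss_points(wins, losses) + judged_1 + judged_2 + bonus
```

Under these assumptions, `total_score(1, 0, 4, 4, 3)` gives 13 and `total_score(3, 1, 4, 4, 3)` gives 16, so a contestant who could only schedule one battle starts 3 points behind an otherwise identical team that went 3-1, which is the kind of gap concern 1 is about.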
> EDIT: Essentially, the current process is very arbitrary. How do you judge a team's effectiveness based on one match? You have to consider matchups, alternate threats, and even user effectiveness. That is mighty difficult to observe in one match, and allowing more than one fixes this.
True, it was arbitrary, in that I made it up :P And yeah, I agree that it's difficult to assess a team's effectiveness based solely on a single match. I'm hoping that judges will think about other matchups, common threats, etc., when scoring.
> In order to keep users from racking up points, the current Iron Chefs must be better. I don't think the burden of not racking up points should be placed on the entrant here; these four players were chosen because they are a cut above the rest. It's time for them to step up and prove it. To alleviate your concern about players racking up points, it should be up to the four Iron Chefs to beat the user. If certain users prove to be too good for the current batch of chefs, it should be up to you, the chairman, to cut the weakest link and offer the better user a place among the Iron Chefs. Such a change would of course require new criteria for becoming an Iron Chef, something you can iron out later.
As of right now I'm pretty happy with my Iron Chefs. They've provided some excellent battles, remained active on the thread, and stayed consistently available for match-ups. They're also available across a wide range of times of day, which is an important consideration. And they're all well known in one way or another, which makes challenging them fun for most users.
That said, there IS a mechanism built in already to keep the Iron Chefs competitive: anyone who makes it into the Hall of Fame twice can choose to become an Iron Chef.
> EDIT 2: Also, in terms of evaluation by the Iron Chefs (and Chairman?), what will the evaluation look like? If a user gets a near-perfect or perfect score, I can see praise being thrown around, but what about those who fare poorly? Will the four Chefs provide teambuilding tips for the future and "rate" the team? If this isn't already the plan, I'd like to see it happen, as it encourages better competition in later rounds.
My plan is to provide a few sentences on why I give each score, whether good, poor or mediocre. I don't plan on doing a full RMT or anything, but I do want to provide actual feedback.
Because I want to encourage as many judges as possible to participate, I didn't make it a requirement to provide a ton of detail with each score, and instead planned to lead by example. Hopefully that will work out, but it'll be tough to say until we actually get to the scoring portion of the contest.