Really interesting stuff there, obi. For the scoring system, you could try adopting the win probability format, similar to what baseball sabermetricians use (0% for a sure loss, 100% for a sure win, 50% for an even match). Percentages may get messy depending on how precise the numbers get, though. Still, the format should work well as a starting point, and it can be rescaled later for aesthetic purposes.
Looking at "endgame" scenarios like 1v1 or 2v1 is probably the easiest place to start IMO. At the very least, you avoid the mystery Pokemon factor that you pointed out in the first post. But even then, with the multitude of possible moves on some Pokemon, getting that initial evaluation will be tough. Perhaps weighting each possible move by its usage percentage from the Shoddy statistics is something to consider?
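To make the weighting idea a bit more concrete, here is a rough Python sketch of how it might work for a 1v1. Everything in it is an assumption on my part: the function name, the idea of lumping the opponent's options into a few "sets", and all the numbers are made up for illustration and are not taken from actual Shoddy data.

```python
def expected_win_probability(outcomes):
    """Usage-weighted average of win probabilities.

    outcomes: list of (usage_weight, win_probability) pairs, where
    usage_weight is how often the opponent runs that option (any
    non-negative scale, e.g. percent) and win_probability is our
    estimated chance of winning against it (0.0 to 1.0).
    """
    total_weight = sum(weight for weight, _ in outcomes)
    if total_weight <= 0:
        raise ValueError("usage weights must sum to a positive number")
    # Normalize by the total so the weights need not sum to exactly 100.
    return sum(weight * prob for weight, prob in outcomes) / total_weight


# Hypothetical 1v1: three options the opponent might be running,
# with invented usage percentages and invented win estimates.
opponent_options = [
    (60.0, 0.25),  # common option that we usually lose to
    (30.0, 0.50),  # less common option, roughly an even match
    (10.0, 1.00),  # rare option that we always beat
]

score = expected_win_probability(opponent_options)
print(f"Estimated win probability: {score:.0%}")
```

The result stays on the same 0% to 100% scale as the scoring idea above, so it could be rescaled later without changing how the weighting works.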