To explain how I came to the above results, I first publicly linked a survey form in
this post. There were a total of 14 respondents to the survey form:
Mr.378, CKW, me, FriendOfMrGolem120, Eeveeto, Beelzemon 2003, Eo Ut Mortus, Waveshaper, Isa, zf, M Dragon, Watchog, Jorgen, and
Blightbringer.
The form asked for the number of GSC Ubers games each respondent had played, as a way of gauging experience in the tier and deciding how to weight the responses in the final results. Initially I used the upper end of the number of games played as a raw weighting, i.e. my rankings counted as though 200 people had submitted them, while Isa's counted as 100, etc. I then received feedback that it would be better to use the square root of this number instead, to reduce the bias towards people who had played a huge number of games in the tier. I was also originally adding a flat 1 extra weighting to all submissions, meaning that even though Eo Ut Mortus had never played GSC Ubers before, his ratings would still influence the outcome. Upon receiving further feedback, this was removed.
I then compared the rankings with and without outliers with vapicuno. He showed me how outliers were affecting the outcome, due to the relatively small number of respondents and the wildly different ratings from Beelzemon 2003. With outliers removed, the results look a lot cleaner, and I am confident we have come to the best result possible from this data set.
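For anyone curious how the square-root weighting plays out, here is a minimal Python sketch of a weighted average. It is not the actual spreadsheet formula: the function names, the sample numbers, and the choice to skip "N/A" answers (represented as `None`) are all assumptions made for illustration.

```python
import math


def sqrt_weight(games_played):
    """Weight of one respondent: the square root of games played.

    No flat +1 is added, so someone with 0 games gets a weight of 0.
    """
    return math.sqrt(games_played)


def weighted_mean_rank(ratings):
    """Weighted average rating for one Pokemon.

    `ratings` is a list of (score, games_played) pairs, one per respondent.
    A score of None stands for an "N/A" answer and is skipped here
    (an assumption; the post does not say how N/A was handled).
    """
    total = 0.0
    total_weight = 0.0
    for score, games in ratings:
        if score is None:
            continue
        w = sqrt_weight(games)
        total += w * score
        total_weight += w
    if total_weight == 0:
        return None  # nobody with any weight rated this Pokemon
    return total / total_weight


# Hypothetical ratings for a single Pokemon (lower = better rank).
example = [(2, 100), (4, 400), (9, 0), (None, 50)]
print(weighted_mean_rank(example))
```

With raw weights, the 400-game respondent in this made-up example would count four times as much as the 100-game respondent. With square roots the weights are 20 and 10, so they count only twice as much, and the 0-game respondent has no influence at all.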
This time, there were material differences between the rankings with and without outliers. You can see what changed by comparing the Weighted_Result sheet with the Weighted_OR_Result sheet in the spreadsheet.
This form was a big improvement on the previous Google form used in the OU VR, but one thing I will clarify next time is when to use the "N/A" option: some people used it for Pokemon they thought didn't deserve to be ranked, while others used it when they had no experience using or facing the Pokemon in question.
With vapicuno's help yet again, here are some nice pictures to show everyone's rankings and how I split up the tiers above. Everything below uses the weighted results, with outliers removed. The charts have error bars based on weighted standard deviation.
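For reference, here is a minimal Python sketch of one common way to compute a weighted standard deviation. It assumes the simple "population" form (weighted squared deviations divided by the total weight); the exact formula behind the charts is not stated here, and the function name and inputs are hypothetical.

```python
import math


def weighted_std(values, weights):
    """Weighted standard deviation of `values`, population form.

    Assumption: variance = sum(w * (x - mean)^2) / sum(w), where mean is
    the weighted mean. Other conventions (e.g. with a bias correction)
    would give slightly larger numbers.
    """
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("at least one weight must be non-zero")
    mean = sum(w * x for x, w in zip(values, weights)) / total_w
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total_w
    return math.sqrt(variance)


# Hypothetical ratings of one Pokemon and the respondents' weights.
print(weighted_std([2, 4], [10, 20]))
```

A small error bar means the heavily weighted respondents broadly agreed on where a Pokemon belongs, while a large one means their ratings were spread out.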
Everyone's rankings compared with the result and the Old VR (which didn't rank Pokemon within sub-ranks so is a bit misleading)
Chart of each Pokemon's ranking against their weighted average ranks (i.e. if they were very close in ranking, they will be close together on the y axis)
Same chart as above but with coloured boxes showing where I drew the divisions between ranks in the OP
Please let me know if you have any questions and I will do my best to answer.