Here's what I learned from tiering in Gen 4:
Tiering decisions in Smogon are almost entirely subjective, and every "test" will be driven mostly by bandwagon thinking and popularity contests.
The first major tiering decision of the 4th generation, Garchomp, was a straight-up community vote. To prevent that vote from being a pure popularity contest, we instituted a ratings minimum for voters. Presumably, if we selected "good battlers", we would get mostly intelligent and logical voters who would make a decision based on good reasoning and competitive experience.
Almost immediately after Garchomp was banned, people started bitching about the quality of the vote and the factors driving individual decisions. People were concerned about cheaters inflating their ratings so they could vote. People complained that "voter recruiting drives" were occurring on other sites and within certain clans, and that the recruited voters were voting based on a mandate from others instead of their own personal judgment. Rumors flew around about voters intentionally voting one way or the other "just to fuck with the results". Some people openly argued that their vote was based on what Nintendo allowed in VGC play, or the BST of the pokemon in question. Complaints flew in from every direction, and they all pretty much said the same thing:
"This was supposed to be a vote of skilled, intelligent, competitive pokemon players and the results would reflect what is best for the metagame based on solid competitive principles. It wasn't. This was a big clusterfuck popularity contest with very little rigor and no meaningful controls to ensure a high-quality end result."
From there, we began instituting more controls and more process to attempt to make tiering decisions less of a big uncontrolled popularity contest and more of a scientific test or legal trial. All the stages of suspect testing, the Characteristics of an Ubers, paragraph submissions, ratings thresholds, deviation requirements, suspect EXP formulas, special testing ladders, voter checkmark badges, restricted-access suspect voter forums -- EVERYTHING... it was all an attempt to make a "high-quality decision" on tiering.
At the time it was all going on, I agreed with some aspects of the tiering process and disagreed with others. But, on the whole, I felt that every element of the tiering process was a reasonable way to achieve the result that it was intended to achieve.
I emphasize that for a reason, and I worded it carefully. Was each step executed to perfection? No. Did we get perfect results from each step? No. I'm saying I think that each decision regarding the process along the way was a reasonable decision for what was intended at the time it was done.
The big change of heart for me is that I no longer think we should attempt to make "high-quality" tiering decisions based on scientific rigor or legal procedures. Not because I disagree with it in principle, but because I do not think ANY process will ever prevent the result from boiling down to a big popularity contest anyway. Despite all the process and machinations of the Gen 4 tiering process, I think every vote was mostly influenced by populist thinking and a community bandwagon mentality. I can't say that for sure; that's just my read on the situation. And I suspect that the metagame we have right now is not much different from what we would have had if we had been conducting uncontrolled popularity votes all along.
That's what I learned from Gen 4 tiering.
(to answer the question posed in the title of this thread)
Going forward into Gen 5, I think we should stop trying to ensure that our tiering decisions are based on logic or specific competitive experience, or even that they are made by the "best" or "most qualified" people. We should just accept that this is a big popularity contest, driven by factors like tradition, spectacle, information and playstyles from past generations, and the publicity of opinions espoused by popular people in the most visible forums and media. If tiering is a science -- then it is political science, at best. Future tiering processes should just acknowledge that this is a big subjective popular opinion issue, and dispense with all the stuff we did in Gen 4 to try to make it a scientific test or trial.