I recently participated in a panel discussion at the annual meeting of the California Postsecondary Education Commission (CPEC) for recipients of Improving Teacher Quality Grants. We were discussing the practical challenges of conducting what has been dubbed scientifically-based research (SBR). While there is some debate over what types of research should fall under this heading, SBR almost always includes randomized trials (experiments) and quasi-experiments (close approximations to experiments) that are used to establish whether a program made a difference.
SBR is a hot topic because it has found favor with a number of influential funding organizations. Perhaps the most famous example is the US Department of Education, which vigorously advocates SBR and at times has made it a requirement for funding. The push for SBR is part of a larger, longer-term trend in which funders have been seeking greater certainty about the social utility of programs they fund.
However, SBR is not the only way to evaluate whether a program made a difference, and not all evaluations set out to do so (as is the case with needs assessment and formative evaluation). At the same time, not all evaluators want to or can conduct randomized trials. Consequently, the push for SBR has sparked considerable debate in the evaluation community.
In the course of this debate, proponents of SBR have frequently referred to randomized trials as the “gold standard” in research. As Love (2003) points out, this is an unfortunate metaphor: the gold standard is an obsolete and abandoned method of establishing the value of a nation’s currency, not a reference to high quality (like a gold medal) or intrinsic value (like something made of solid gold). Nonetheless, the metaphor has caught on, and its effect has been to set randomized trials on a pedestal and, to some degree, overstate their utility.
Perhaps a more useful approach would be to admit (with apologies to Winston Churchill) that randomized trials are the worst method for establishing that a program made a difference, except for every other method. This is not to say we should abandon randomized trials. On the contrary, we should use them whenever warranted to establish this sort of program-caused-outcome relationship, but in doing so we should admit that SBR has its limits and conduct randomized trials with our eyes open.
What are the limitations of SBR and do they completely undermine it? During our discussion at the CPEC annual meeting, panel member Robert Calfee posed the question of whether the gold standard is nothing more than fool’s gold. It is a good question, one that Pringle and Churchill (1995) addressed on a practical level and Kaptchuk (2001) investigated on a more technical level (in the latter case, raising the metaphorical stakes by asking if the gold standard was in fact a golden calf).
Unfortunately, I do not believe we fully appreciate the limitations of SBR in general and randomized trials in particular, especially when they are undertaken in situ, as most evaluations demand. At the same time, we do not undertake them often enough, nor do we get all that we can out of them.
When the discussion turns to how we can go about this, the emphasis is usually on the statistical (How can we fix problems with our evaluation using sophisticated statistical methods?) rather than the logistical (How do we prevent problems in the first place?). And the most difficult part of science is never discussed: accepting that we evaluate programs because their effectiveness is an open question, and that the possibility exists that they might not live up to their promise.
This stance can be difficult to embrace. After all, we want the programs we work with to be successful and the people they serve to be better off. Yet difficult as it may be, it is this stance, not a research method, that is the cornerstone of scientifically-based research and the root of our desire to conduct the most rigorous evaluations possible.