In the opening plenary of the Evaluation 2010 conference, AEA President Leslie Cooksy invited three leaders in the field—Eleanor Chelimsky, Laura Leviton, and Michael Patton—to speak on The Tensions Among Evaluation Perspectives in the Age of Obama: Influences on Evaluation Quality, Thinking and Values. They covered topics ranging from how government should use evaluation information to how Jon Stewart of The Daily Show outed himself as an evaluator during his Rally to Restore Sanity/Fear (“I think you know that the success or failure of a rally is judged by only two criteria: the intellectual coherence of the content, and its correlation to the engagement—I’m just kidding. It’s color and size. We all know it’s color and size.”).
One piece that resonated with me was Laura Leviton’s discussion of how the quality of an evaluation is related to our ability to apply its results to future programs—what is referred to as generalization. She presented a graphic that described a possible process for generalization that seemed right to me; it’s what should happen. But how it happens was not addressed, at least in the short time in which she spoke. It is no small task to gather prior research and evaluation results, translate them into a small theory of improvement (a program theory), and then adapt that theory to fit specific contexts, values, and resources. Who should be doing that work? What are the features that might make it more effective?
Stewart Donaldson and I recently co-authored a paper on that topic that will appear in New Directions for Evaluation in 2011. We argue that stakeholders are and should be doing this work, and we explore how the logic underlying traditional notions of external validity—considered by some to be outdated—can be built upon to create a relatively simple, collaborative process for predicting the future results of programs. The paper is a small step toward raising the discussion of external validity (how we judge whether a program will work in the future) to the same level as the discussion of internal validity (how we judge whether a program worked in the past), while trying to avoid the rancor that has been associated with the latter.
More from the conference later.
Evaluation in the Post-Data Age: What Evaluators Can Learn from the 2012 Presidential Election
Stop me if you’ve heard this one before. An evaluator uses data to assess the effectiveness of a program, arrives at a well-reasoned but disappointing conclusion, and finds that the conclusion is not embraced—perhaps ignored or even rejected—by those with a stake in the program.
People—even evaluators—have difficulty accepting new information if it contradicts their beliefs, desires, or interests. It’s unavoidable. When faced with empirical evidence, however, most people will open their minds. At least that has been my experience.
During the presidential election, reluctance to embrace empirical evidence was virtually universal. I began to wonder—had we entered the post-data age?
The human race creates an astonishing amount of data—some 2.5 quintillion bytes every day. Roughly 90% of all the data ever created was produced in the last two years alone.
In that time, I suspect that we also engaged in more denial and distortion of data than in all human history.
The election was a particularly bad time for data and the people who love them—but there was a bright spot.
On election day I boarded a plane for London (after voting, of course). Although I had no access to news reports during the flight, I already knew the result—President Obama had about an 84% chance of winning reelection. When I stepped off the plane, I learned he had indeed won. No surprise.
How could I be so certain of the result when the election was hailed as too close to call? I read the FiveThirtyEight blog, that’s how. By using data—every available, well-implemented poll—and a strong statistical model, Nate Silver produced a highly credible estimate of the likelihood that one or the other candidate would win.
Most importantly, the estimate did not depend on the analysts’—or anyone’s—desires regarding the outcome of the election.
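In spirit, the calculation is straightforward even if the real model is not: pool every credible poll, weight each one sensibly, and report the uncertainty as a probability rather than a single headline number. Here is a minimal sketch of that idea in Python, using invented poll numbers and simple sample-size weighting; it is emphatically not FiveThirtyEight’s actual model, which also handles state-level polling, pollster house effects, and change over time.

```python
# Toy poll aggregation: pool several hypothetical national polls and
# translate the pooled margin into a rough probability of winning.
# All poll numbers below are invented for illustration.
import math

# Each hypothetical poll: (candidate A share, candidate B share, sample size)
polls = [
    (0.50, 0.47, 1200),
    (0.49, 0.48, 900),
    (0.51, 0.46, 1500),
    (0.48, 0.48, 800),
]

# Pool the polls, weighting each by its sample size.
total_n = sum(n for _, _, n in polls)
pooled_margin = sum((a - b) * n for a, b, n in polls) / total_n

# Approximate standard error of the pooled margin (simple multinomial
# approximation; a serious model would also account for house effects,
# poll timing, and correlated errors).
se = math.sqrt(sum(((a + b) - (a - b) ** 2) * n for a, b, n in polls)) / total_n

# Probability that candidate A's true margin is above zero, treating the
# pooled margin as approximately normally distributed.
z = pooled_margin / se
prob_a_wins = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"Pooled margin for A: {pooled_margin:+.3f}")
print(f"Estimated P(A wins): {prob_a_wins:.0%}")
```

The output of a sketch like this depends on all of the polls fed into it, not on whichever single poll makes the most provocative headline; that is the contrast with the coverage described below.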
Although this first-rate work was available to all, television and print news was dominated by unsophisticated analysis of poll data. How often were the results of a single poll—one data point—presented in a provocative way and their implications debated for as long as breath and column inches could sustain?
Isn’t this the way that we interpret evaluations?
News agencies were looking for the story. The advocates for each candidate were telling their stories. Nothing wrong with that. But when stories shape the particular bits of data that are presented to the public, rather than all of the data being used to shape the story, I fear that the post-data age is already upon us.
Are evaluators expected to do the same when they are asked to tell a program’s story?
It has become acceptable to use data poorly or opportunistically while asserting that our conclusions are data-driven. All the while, much stronger conclusions, based on better data and better analysis, are all around us.
Do evaluators promote similar behavior when we insist that all forms of evaluation can improve data-driven decision making?
The New York Times reported that on election night one commentator, with a sizable stake in the outcome, was unable to accept that actual voting data were valid because they contradicted the story he wanted to tell.
He was already living in the post-data age. Are we?