EvalBlog

October 16, 2013 · 8:06 am

On the Ground at AEA #1: Tina and Rodney

Rodney Hopson, Professor, George Mason University (Past President of AEA)

I’m plotting. I’m always plotting. That’s how you make change in the world. You find the opportunities, great people to work with, and make things happen.

Tina Christie, Professor, UCLA

I’ve just finished three years on the AEA board with Rodney. The chance to connect with colleagues like Rodney–work with them, debate with them, laugh with them–is something I look forward to each year. It quickly starts to feel like family.

Filed under AEA Conference, Conference Blog, Evaluation, Program Evaluation

Tagged as AEA Conference, American Evaluation Association, evaluation, evaluations, Program Evaluation

October 16, 2013 · 4:06 am

Confessions of a Conference Junkie

It’s true—I am addicted to conferences. While I read about evaluation, write about evaluation, and do evaluations in my day-to-day professional life, it’s not enough. To truly connect to the field and its swelling ranks of practitioners, researchers, and supporters, I need to attend conferences. Compulsively. Enthusiastically. Constantly.

Over the past few months, I was honored to be the keynote speaker at the Canadian Evaluation Society conference in Toronto and the Danish Evaluation Society in Kolding. Over the past two years I have been from Helsinki to Honolulu to speak, present, and give workshops. The figure below shows some of that travel (conferences indicated with darker circles, upcoming travel with dashed lines).

But today is special—it’s the first day of the American Evaluation Association conference in Washington, DC. If conferences were cities, this one would be New York—big, vibrant, and international.

And this year, in addition to my presentations, receptions, and workshops (here and here), I will attempt to do something I have never done before—blog from the conference.

EvalBlog has been quiet this summer. Time to make a little digital noise.

Filed under AEA Conference, Conference Blog, Design, Evaluation, Program Design, Program Evaluation

Tagged as AEA Conference, American Evaluation Association, evaluation, evaluations, Program Design, Program Evaluation

May 7, 2013 · 1:59 pm

Evaluation Across Boundaries—Literally and Metaphorically

Cross-posted at the Canadian Evaluation Society Conference website.

A few weeks ago, the New York Times reported that the United States Department of Homeland Security was in the midst of an evaluation failure. Since 2010, the Department has been struggling to develop a measure of border security that would help Congress evaluate and improve immigration policies. Senior officials reported to Congress that the Department “had not completed the new measurements and were not likely to in coming months.” This could delay comprehensive immigration reform legislation, which would have vast political, legal, economic, and social consequences.

The State of Modern Evaluation Practice

This is a cautionary tale of the state of modern evaluation practice. It represents a situation in which stakeholders believe that evaluation can improve social change efforts—immigration policies that almost all stakeholders consider flawed—yet evaluation has not. The reasons are complex, touching on long-discussed themes of use, politics, stakeholder inclusion, and methods. However, an important consequence has not been widely discussed—whether in the face of evaluation failure stakeholders will continue to believe that evaluation can improve society for the better.

A Shared Belief

If there is one thing that holds evaluators together as a community it is our shared belief that our work matters. This is more than a belief in the importance of evaluation use. It is a belief about impact. Our impact. We are willing to believe in the impact of our work in the absence of evidence. Should we expect others to do the same? We should not. Nor should we stop believing. We should respond by adapting our practice in ways that are more likely to achieve impact and demonstrate that we have. I call this the new practice of evaluation, and it is emerging in exciting ways in unexpected places.

Shaping Evaluation for the Future

When I give my keynote at the Canadian Evaluation Society Conference (June 9-12), I will be discussing the new practice of evaluation. The conference theme is Evaluation Across Boundaries, a metaphorical hook that is literally what the new practice of evaluation is advancing—evaluators crossing boundaries to become change makers, program designers, and market engineers. I will describe:

how this new practice is taking form,
how it is disrupting evaluation practice today, and
how it may shape evaluation practice in the future.

These are principally undirected efforts. Should we, collectively as a profession and individually as practitioners, attempt to influence them? If so, how? To what end?

Be Part of the Discussion

I cannot claim to have the answers to these questions. But I want you to be a part of the discussion. Join us in Toronto, let your voice be heard, and help define what evaluation practice will be.

What, Where & When

My keynote address, “The New Practice of Evaluation: Crossing Boundaries, Creating Change,” takes place on Wednesday, June 12, 2013, at 8:30 am, directly following the Thematic Breakfast.

Filed under Conference Blog, Evaluation, Program Evaluation

February 20, 2013 · 12:47 pm

San Francisco Bay Area Evaluators (SFBAE) Celebrate “EVAL-entine’s” Day

It started a few years back when some evaluators jokingly suggested there should be a national holiday to honor evaluation. Well, that sounded like a pretty good idea to me, so last year I invented a holiday—EVALentine’s Day—that is celebrated throughout the month of February.

A day (February 15) celebrated for an entire month? Think of it like a jubilee year for the Queen.

EVALentine’s Day is an excuse to share your love of evaluation with colleagues, introduce evaluation to those who are unfamiliar with it, and connect with evaluators in your area.

That’s exactly what the members and friends of San Francisco Bay Area Evaluators (SFBAE) did at their recent EVALentine’s Day meet up in Berkeley, California. SFBAE is a local affiliate of the American Evaluation Association. Why not find a local affiliate near you and share the love?

If you live in San Francisco, the East Bay, the South Bay, Marin, Sacramento, or neighboring areas, visit www.sfbae.org or our new LinkedIn group page to learn about upcoming events.

Hope to see you there.

Filed under Evaluation

Tagged as AEA, AEA Local Affiliates, American Evaluation Association, EVALentine's Day, San Francisco Bay Area Evaluators, SFBAE

February 13, 2013 · 2:30 pm

Measurement Is Not the Answer

Bill Gates recently summarized his yearly letter in an article for the Wall Street Journal entitled My Plan to Fix the World’s Biggest Problems…Measure Them!

As an evaluator, I was thrilled. I thought, “Someone with clout is making the case for high-quality evaluation!” I was ready to love the article.

To my great surprise, I didn’t.

The premise of the piece was simple. Organizations working to change the world should set clear goals, choose an approach, measure results, and use those measures to continually refine the approach.

At this level of generality, who could disagree? Certainly not evaluators—we make arguments like this all the time.

Yet, I must—with great disappointment—conclude that Gates failed to make the case that measurement matters. In fact, I believe he undermined it by the way he used measurements.

Gates is not unique in this respect. His Wall Street Journal article is just one instance of a widespread problem in the social sector: confusing good measures with good inference.

Measures versus Inference

The difference between measures and inferences can be subtle. Measures quantify something that is observable. The number of students who graduate from high school or estimates of the calories people consume are measures. In order to draw conclusions from measures, we make inferences. Two types of inference are of particular interest to evaluators.

(1) Inferences from measures to constructs. Constructs—unobservable aspects of humans or the world that we seek to understand—and the measures that shed light on them are not interchangeable. For example, what construct does the high school graduation rate measure? That depends. Possibly education quality, student motivation, workforce readiness, or something else that we cannot directly observe. To make an inference from measure to construct, the construct of interest must be well defined and its measure selected on the basis of evidence.

Evidence is important because, among other things, it can suggest whether many, few, or only one measure is required to understand a construct well. By using the sole measure of calories consumed, for example, we gain a poor understanding of a broad construct like health. However, we can use that single measure to gain a critical understanding of a narrower construct like risk of obesity.

(2) Inferences from measures to impacts. If high school graduation rates go up, was it the result of new policies, parental support, another reason left unconsidered, or a combination of several reasons? This sort of inference represents one of the fundamental challenges of program evaluation, and we have developed a number of strategies to address it. None is perfect, but more often than not we can identify a strategy that is good enough for a specific context and purpose.

Why do I think Gates made weak inferences from good measures? Let’s look at the three examples he offered in support of his premise that measurement is the key to solving the world’s biggest problems.

Example 1: Ethiopia

Gates described how Ethiopia became more committed to providing healthcare services in 2000 as part of the Millennium Development Goals. After that time, the country began tracking the health services it provided in new ways. As evidence that the new measurement strategy had an impact, Gates reported that child mortality decreased 60% in Ethiopia since 1990.

In this example, the inference from measure to impact is not warranted. Based on the article, the sole reason to believe that the new health measurement strategy decreased child mortality is that the former happened before the latter. Inferring causality from the sequential timing of events alone has been recognized as an inferential misstep for so long that it is best known by its Latin name, post hoc ergo propter hoc.

Even if we were willing to make causal inferences based on sequential timing alone, it would not be possible in this case—the tracking system began sometime after 2000 while the reported decrease in child mortality was measured from 1990.

Example 2: Polio

The global effort to eradicate polio has come down to three countries—Nigeria, Pakistan, and Afghanistan—where immunizing children has proven especially difficult. Gates described how new measurement strategies, such as using technology to map villages and track health workers, are making it possible to reach remote, undocumented communities in these countries.

It makes sense that these measurement strategies should be a part of the solution. But do they represent, “Another story of success driven by better measurement,” as Gates suggests?

Maybe yes, maybe no—the inference from measure to impact is again not warranted, but for different reasons.

In the prior example, Gates was looking back, claiming that actions (in the past) made an impact (in the past) because the actions preceded the impact. In this example, he made the claim that ongoing actions will lead to a future impact because the actions precede the intended impact of eradicating polio. The former was a weak inference, the latter weaker still because it incorporates speculation about the future.

Even if we are willing to trust an inference about an unrealized future in which polio has been eradicated, there is another problem. The measures Gates described are implementation measures. Inferring impact from implementation may be warranted if we have strong faith in a causal mechanism, in this case that contact with remote communities leads to immunization which in turn leads to reduction in the transmission of the disease.

We should have strong faith in the second step of this causal mechanism: vaccines work. Unfortunately, we should have doubts about the first step because many who are contacted by health workers refuse immunization. The Bulletin of the World Health Organization reported that parental refusal in some areas around Karachi has been widespread, accounting for 74% of missed immunizations there. It is believed that the reasons for the refusals were fear related to safety and the religious implications of the vaccines. New strategies for mapping and tracking cannot, on the face of it, address these concerns.

So I find it difficult to accept that polio immunization is a story of success driven by measurement. It seems more like a story in which new measures are being used in a strategic manner. That’s laudable—but quite different from what was claimed.

Example 3: Education

The final example Gates provided came from the foundation’s $45 million Measures of Effective Teaching (MET) study. As described in the article, the MET study concluded that multiple measures of teacher effectiveness can be used to improve the way administrators manage school systems and teachers provide instruction. The three measures considered in the study were standardized test scores (transformed into controversial units called value-added scores), student surveys of teacher quality, and scores provided by trained observers of classroom instruction.

The first problem with this example is the inference from measures to construct. Everyone wants more effective teachers, but not everyone defines effectiveness the same way. There are many who disagree with how the construct of teacher effectiveness was defined in the MET study—that a more effective teacher is one who promotes student learning in ways that are reflected by standardized test scores.

Even if we accept the MET study’s narrow construct of teacher effectiveness, we should question whether multiple measures are required to understand it well. As reported by the foundation, all three measures in combination explain about 52% of the variation in teacher effectiveness in math and 26% in English-language arts. Test scores alone (transformed into value-added scores) explain about 48% and 20% of the variation in math and English-language arts, respectively. The difference is trivial, making the cost of gathering additional survey and observation data difficult to justify.
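The size of that gap is easy to check with a little arithmetic. The sketch below uses only the four percentages quoted above; the dictionary layout and the function name are my own illustration, not anything taken from the MET study.

```python
# Share of variation in teacher effectiveness explained (in percent),
# as quoted above from the foundation's report.
explained = {
    "math": {"all_three_measures": 52, "test_scores_only": 48},
    "english_language_arts": {"all_three_measures": 26, "test_scores_only": 20},
}


def added_by_extra_measures(subject):
    """Percentage points gained by adding surveys and observations to test scores."""
    row = explained[subject]
    return row["all_three_measures"] - row["test_scores_only"]


gains = {subject: added_by_extra_measures(subject) for subject in explained}

for subject, gain in gains.items():
    print(f"{subject}: +{gain} percentage points from surveys and observations")
```

On these figures, student surveys and classroom observations add 4 percentage points in math and 6 in English-language arts.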

The second problem is inference from measures to impact. Gates presented Eagle County’s experience as evidence that teacher evaluations improve education. He stated that Eagle County’s teacher evaluation system is “likely one reason why student test scores improved in Eagle County over the past five years.” Why does he believe this is likely? He doesn’t say. I can only respond post hoc ergo propter hoc.

So What?

The old chestnut that lack of evidence is not evidence of lacking applies here. Although Gates made inferences that were not well supported by logic and evidence, it doesn’t mean he arrived at the wrong conclusions. Or the right conclusions. All we can do is shrug our shoulders.

And it doesn’t mean we should not be measuring the performance and impact of social enterprises. I believe we should.

It does mean that Gates believes in the effectiveness of potential solutions for which there is little evidence. For someone who is arguing that measurement matters, he is setting a poor example. For someone who has the power to implement solutions on an unprecedented scale, it can also be dangerous.

Filed under Commentary, Evaluation, Evaluation Quality, Program Evaluation

Tagged as Bill & Melinda Gates Foundation, Bill Gates, bill gates annual letter, evaluation, inference, measurement, Program Evaluation, Wall Street Journal

December 23, 2012 · 3:10 pm

Above the Arctic Circle in Search of Evaluation

Earlier this year, I traveled to Helsinki to participate in the European Evaluation Society (EES) conference. It was fantastic…but more on that in future posts.

While in Finland, I traveled above the Arctic Circle to Rovaniemi, the capital of Lapland.

This is the region where many Sami people live, reindeer are herded, and (so they claim) Santa Claus lives.

I thought…How could I pass up the chance to visit Santa Claus? More importantly, how could I pass up the opportunity to interview Santa about evaluation?

Easily, you might respond.

First, you might find the image of Santa objectionable because of its secular, religious, commercial, and/or cultural connotations. But standing above the Arctic Circle, I must admit I had no interest in unpacking the semiotics of Santa; I was far more concerned with keeping warm. (NOTE: I suggest visiting here and here if you want culturally complex places to start a full-on anthropological investigation.)

Second, you might not believe in Santa. While there is more documentary evidence for Santa than there is for Bigfoot, the evidence has something of a credibility problem. But again, in the extreme cold, I was all too ready to believe, if only to gain access to the heated halls of Santa’s castle.

Third, you might assume that Santa would know little about evaluation. That, it turns out, would be a misconception. Take a look at this excerpt from the interview that I conducted (no joshing):

ME: Santa, how do you know if you and your elves are making the world a better place?

SANTA: It is my opinion that this is something in which we all invest our little part. When we realize we can do this together, we realize we can do anything.

ME: Tell me more.

SANTA: Every year, we have millions and millions of children waiting for presents. Sometimes my young elves wonder—How is this possible? How can we accomplish this huge work? When they realize they are not alone, that we are all working together, they understand the important role they play and that we can be successful.

ME: I would assume that you have sophisticated methods to evaluate your success.

SANTA: We do…but we have had hundreds of years of experience.

What does this tell us?

Santa is a proponent of contribution analysis—understanding how one effort among many helps to advance social change.

He holds a social mechanisms perspective—that there is an explainable process through which “micro-level” individual change (the self-efficacy of elves; children receiving presents) can be transformed into “macro-level” social change (an effective workforce; a better world).

He has successfully used evaluation to foster continuous improvement over centuries.

I was impressed. And a little confused. Why was Santa so thoughtful about evaluation?

While I waited for my return train to Helsinki, I went for a coffee at the café in the train station. Throughout the world, train station cafés are dismal affairs, reflecting the power of location over the virtues of quality. Yet this is what greeted me when I sat down:

An evaluation survey! Then it hit me. Santa was the product of a larger evaluation culture. An evaluation society. So, like the train station café, he gathered data and used it to improve his work.

So I leave you with this new year wish: that in this respect we can all be a bit more Santa Claus.

Filed under Evaluation

November 8, 2012 · 11:21 pm

Evaluation in the Post-Data Age: What Evaluators Can Learn from the 2012 Presidential Election

Stop me if you’ve heard this one before. An evaluator uses data to assess the effectiveness of a program, arrives at a well-reasoned but disappointing conclusion, and finds that the conclusion is not embraced—perhaps ignored or even rejected—by those with a stake in the program.

People—even evaluators—have difficulty accepting new information if it contradicts their beliefs, desires, or interests. It’s unavoidable. When faced with empirical evidence, however, most people will open their minds. At least that has been my experience.

During the presidential election, reluctance to embrace empirical evidence was virtually universal. I began to wonder—had we entered the post-data age?

The human race creates an astonishing amount of data—2.5 quintillion bytes of data per day. In the last two years, we created 90% of all data created throughout human history.

In that time, I suspect that we also engaged in more denial and distortion of data than in all human history.

The election was a particularly bad time for data and the people who love them—but there was a bright spot.

On election day I boarded a plane for London (after voting, of course). Although I had no access to news reports during the flight, I already knew the result—President Obama had about an 84% chance of winning reelection. When I stepped off the plane, I learned he had indeed won. No surprise.

How could I be so certain of the result when the election was hailed as too close to call? I read the FiveThirtyEight blog, that’s how. By using data (every available, well-implemented poll) and a strong statistical model, Nate Silver was able to produce a highly credible estimate of the likelihood that one or the other candidate would win.

Most importantly, the estimate did not depend on the analysts’—or anyone’s—desires regarding the outcome of the election.
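To make the contrast between one poll and all of the polls concrete, here is a minimal sketch. It is not Nate Silver's model, which I cannot reproduce here. It is a textbook inverse-variance average with a normal approximation, and the polls in it are hypothetical numbers I invented for illustration. The function names are mine as well.

```python
from math import sqrt
from statistics import NormalDist

# HYPOTHETICAL polls, invented for illustration: (candidate A's lead in
# percentage points, sample size). These are not real 2012 polls.
polls = [(2.0, 800), (-1.0, 600), (3.0, 1200), (1.0, 1000), (0.5, 900)]


def margin_standard_error(sample_size):
    """Rough standard error of a two-candidate lead, in percentage points.

    Assumes a near 50/50 race and simple random sampling, so the standard
    error of the lead is about 100 / sqrt(n).
    """
    return 100 / sqrt(sample_size)


def pooled_lead(poll_list):
    """Inverse-variance weighted average lead and its standard error."""
    weights = [1 / margin_standard_error(n) ** 2 for _, n in poll_list]
    total = sum(weights)
    lead = sum(w * m for w, (m, _) in zip(weights, poll_list)) / total
    return lead, sqrt(1 / total)


def chance_of_leading(lead, standard_error):
    """Probability that the true lead is above zero, under a normal approximation."""
    return 1 - NormalDist(mu=lead, sigma=standard_error).cdf(0)


# The one poll in the list that shows candidate A behind.
single_lead, single_n = polls[1]
single = chance_of_leading(single_lead, margin_standard_error(single_n))

# Every poll in the list, weighted by its precision.
pooled = chance_of_leading(*pooled_lead(polls))

print(f"One poll alone: {single:.0%} chance candidate A leads")
print(f"All polls pooled: {pooled:.0%} chance candidate A leads")
```

With these made-up numbers, the single poll suggests candidate A is more likely behind than ahead, while the pooled estimate points the other way. Nothing in the calculation depends on which answer the analyst would prefer.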

Although this first-rate work was available to all, television and print news was dominated by unsophisticated analysis of poll data. How often were the results of an individual poll—one data point—presented in a provocative way and its implications debated for as long as breath and column inches could sustain?

Isn’t this the way that we interpret evaluations?

News agencies were looking for the story. The advocates for each candidate were telling their stories. Nothing wrong with that. But when stories shape the particular bits of data that are presented to the public, rather than all of the data being used to shape the story, I fear that the post-data age is already upon us.

Are evaluators expected to do the same when they are asked to tell a program’s story?

It has become acceptable to use data poorly or opportunistically while asserting that our conclusions are data driven. All the while, much stronger conclusions based on better data and data analysis are all around us.

Do evaluators promote similar behavior when we insist that all forms of evaluation can improve data-driven decision making?

The New York Times reported that on election night one commentator, with a sizable stake in the outcome, was unable to accept that actual voting data were valid because they contradicted the story he wanted to tell.

He was already living in the post-data age. Are we?

Filed under Commentary, Evaluation, Evaluation Quality, Program Evaluation

Tagged as evaluation, evaluations, FiveThirtyEight Blog, How to Lie with Statistics, Nate Silver, obama, Program Evaluation

October 31, 2012 · 6:52 am

Conference Blog: Evaluation 2012 (Part 1)—Complexity

I have a great fondness for the American Evaluation Association and its Annual Conference. At this year’s conference—Evaluation 2012—roughly 3,000 evaluators from around the world came together to share their work, rekindle old friendships, and establish new ones. I was pleased and honored to be a part of it.

As I moved from session to session, I would ask those I met my favorite question—What have you learned that you will use in your practice?

Their answers—lists, connections, reflections—were filled with insights and surprises. They helped me understand the wide range of ideas being discussed at the conference and how those ideas are likely to emerge in practice.

In the spirit of that question, I would like to share some thoughts about a few ideas that were thick in the air, starting with this post on complexity.

Complexity: The Undefined Elephant in the Room

The theme of the conference was Evaluation in Complex Ecologies: Relationships, Responsibilities, Relevance. Not surprisingly, the concept of complexity received a great deal of attention.

Like many bits of evaluation jargon, it has a variety of legitimate formal and informal definitions. Consequently, evaluators use the term in different ways at different times, which led a number of presenters to make statements that I found difficult to parse.

Here are a few that I jotted down:

“That’s not complex, it’s complicated.”

“A few simple rules can give rise to tremendous complexity.”

“Complexity can lead to startling simplicity.”

“A system can be simple and complicated at the same time.”

“Complexity can lead to highly stable systems or highly unstable systems.”

“Much of the time people use the term complexity wrong.”

We are, indeed, a profession divided by a common language.

Why can’t we agree on a definition for complexity?

First, no other discipline has. Perhaps that is too strong a statement—small sub-disciplines have developed common understandings of the term, but across those small groups there is little agreement.

Second, we cannot decide if complexity, simplicity, and complicatedness, however defined, are:

(A) Mutually exclusive

(B) Distinct but associated

(D) All of the above

From what I can tell, the answer is (D). That doesn’t help much, does it?

Third, we conflate the entities that we label as complex, complicated, or simple. Over the past week, I heard the term complexity used to describe:

real-world structures such as social, environmental, and physical systems;
cognitive structures that we use to reason about real-world structures;
representations that we use to describe and communicate our cognitive structures;
computer models that we use to reveal the behavior of a system that is governed by a mathematically formal interpretation of our representations;
behaviors exhibited by real-world structures, cognitive structures, and computer models;
strategies that we develop to change the real world in a positive way;
human actions undertaken to implement change strategies; and
evaluations of our actions and strategies.

When we neglect to specify which entities we are discussing, or treat these entities as interchangeable, clarity is lost.

Where does this get us?

I hope it encourages us to do the following when we invoke the concept of complexity: define what we mean and identify what we are describing. If we do that, we don’t need to agree—and we will be better understood.

Filed under AEA Conference, Evaluation, Program Evaluation

Tagged as AEA Conference, complexity, evaluation, Evaluation 2012, Program Evaluation

October 24, 2012 · 5:39 am

Conference Blog: The American Evaluation Association Conference About to Kick Off

It’s been a busy few months for me. I have been leading workshops, making presentations, attending conferences, and working in Honolulu, Helsinki, London, Tallinn (Estonia), and Claremont. I met some amazing people and learned a great deal about how evaluation is being practiced around the world. More about this in later posts.

This morning, I am in Minneapolis for the Annual Conference of the American Evaluation Association, which begins today. While I am here, I will be reporting on the latest trends, techniques, and opportunities in evaluation.

Today will be interesting. I lead a half-day workshop on program design with Stewart Donaldson. Then I chair a panel discussion on the future of evaluation (a topic that, to my surprise, has mushroomed from a previous EvalBlog post into a number of conference presentations and a website).

Off to the conference–more later.

Filed under AEA Conference, Design, Evaluation, Program Design, Program Evaluation

Tagged as AEA Conference, evaluation, Program Design, Program Evaluation

May 25, 2012 · 12:01 am

Conference Blog: Catapult Labs 2012

Did you miss the Catapult Labs conference on May 19? Then you missed something extraordinary.

But don’t worry, you can get the recap here.

The event was sponsored by Catapult Design, a nonprofit firm in San Francisco that uses the process and products of design to alleviate poverty in marginalized communities. Their work spans the worlds of development, mechanical engineering, ethnography, product design, and evaluation.

That is really, really cool.

I find them remarkable and their approach refreshing. Even more so because they are not alone. The conference was very well attended by diverse professionals—from government, the nonprofit sector, the for-profit sector, and design—all doing similar work.

The day was divided into three sets of three concurrent sessions, each presented as hands-on labs. So, sadly, I could attend only one third of what was on offer. My apologies to those who presented and are not included here.

I started the day by attending Democratizing Design: Co-creating With Your Users presented by Catapult’s Heather Fleming. It provided an overview of techniques designers use to include stakeholders in the design process.

Evaluators go to great lengths to include stakeholders. We have broad, well-established approaches such as empowerment evaluation and participatory evaluation. But the techniques designers use are largely unknown to evaluators. I believe there is a great deal we can learn from designers in this area.

An example is games. Heather organized a game in which we used beans as money. Players chose which crops to plant, each with its own associated cost, risk profile, and potential return. The expected payoff varied by gender, which was arbitrarily assigned to players. After a few rounds the problem was clear—higher costs, lower returns, and greater risks for women increased their chances of financial ruin, and this had negative consequences for communities.

I believe that evaluators could put games to good use. Describing a social problem as a game requires stakeholders to express their cause-and-effect assumptions about the problem. Playing with a group allows others to understand those assumptions intimately, comment upon them, and offer suggestions about how to solve the problem within the rules of the game (or perhaps change the rules to make the problem solvable).

I have never met a group of people who were more sincere in their pursuit of positive change. And honest in their struggle to evaluate their impact. I believe that impact evaluation is an area where evaluators have something valuable to share with designers.

That was the purpose of my workshop Measuring Social Impact: How to Integrate Evaluation & Design. I presented a number of techniques and tools we use at Gargani + Company to design and evaluate programs. They are part of a more comprehensive program design approach that Stewart Donaldson and I will be sharing this summer and fall in workshops and publications (details to follow).

The hands-on format of the lab made for a great experience. I was able to watch participants work through the real-world design problems that I posed. And I was encouraged by how quickly they were able to use the tools and techniques I presented to find creative solutions.

That made my task of providing feedback on their designs a joy. We shared a common conceptual framework and were able to speak a common language. Given the abstract nature of social impact, I was very impressed with that—and their designs—after less than 90 minutes of interaction.

I wrapped up the conference by attending Three Cups, Rosa Parks, and the Polar Bear: Telling Stories that Work presented by Melanie Moore Kubo and Michaela Leslie-Rule from See Change. They use stories as a vehicle for conducting (primarily) qualitative evaluations. They call it story science. A nifty idea.

I liked this session for two reasons. First, Melanie and Michaela are expressive storytellers, so it was great fun listening to them speak. Second, they posed a simple question—Is this story true?—that turns out to be amazingly complex.

We summarize, simplify, and translate meaning all the time. Those of us who undertake (primarily) quantitative evaluations agonize over this because our standards for interpreting evidence are relatively clear but our standards for judging the quality of evidence are not.

For example, imagine that we perform a t-test to estimate a program’s impact. The t-test indicates that the impact is positive, meaningfully large, and statistically significant. We know how to interpret this result and what story we should tell—there is strong evidence that the program is effective.

But what if the outcome measure was not well aligned with the program’s activities? Or there were many cases with missing data? Would our story still be true? There is little consensus on where to draw the line between truth and fiction when quantitative evidence is flawed.
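Here is a minimal sketch of that situation. The scores are hypothetical, invented purely for illustration, and the function names are mine. I use Welch's version of the t statistic (one common choice, which does not assume equal variances) and leave out the p-value, which would need a t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, variance

# HYPOTHETICAL outcome scores, invented for illustration. None marks a
# participant whose outcome was never recorded.
program = [74, 81, None, 69, 88, 77, None, 85, 79, 83]
comparison = [70, 72, 68, None, 75, 71, 66, 73, 69, 74]


def observed(scores):
    """Keep only the cases that have an outcome score."""
    return [s for s in scores if s is not None]


def share_missing(scores):
    """Fraction of cases with no recorded outcome."""
    return scores.count(None) / len(scores)


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means (a minus b)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


program_seen = observed(program)
comparison_seen = observed(comparison)
t_stat = welch_t(program_seen, comparison_seen)

print(f"t statistic on observed cases: {t_stat:.2f}")
print(f"Missing in program group: {share_missing(program):.0%}")
print(f"Missing in comparison group: {share_missing(comparison):.0%}")
```

The t statistic is computed only from the cases that were observed. It says nothing about the cases that were dropped, which is why I print the share of missing data right next to it. Whether that share is small enough for the story to remain true is a judgment the test cannot make for us.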

As Melanie and Michaela pointed out, it is critical that we strive to tell stories that are true, but equally important to understand and communicate our standards for truth. Amen to that.

The icing on the cake was the conference evaluation. Perhaps the best conference evaluation I have come across.

Everyone received four post-it notes, each a different color. As a group, we were given a question to answer on a post-it of a particular color, and only a minute to answer the question. Immediately afterward, the post-its were collected and displayed for all to view, as one would view art in a gallery.

Evaluation as art—I like that. Immediate. Intimate. Transparent.

Gosh, I like designers.