Category Archives: Conference Blog

Curb Your “Malthusiasm”—How Evaluation Can Contribute to A Sustainable and Equitable Future

The theme of the upcoming 2014 annual conference of the American Evaluation Association (AEA) challenges participants to consider how evaluation can contribute to a sustainable and equitable future. It’s a fantastic challenge, one that cuts to the core of why evaluation matters—its potential to promote the public good locally and globally, today and in the future.

As I prepare my presentations, I want to share some of my thoughts and encourage others to take up the challenge.

The End is Nigh(ish)

The natural and social environments in which we live have limits. Exceed them, and society puts itself at risk.

It’s a simple idea, but one that did not enter the public’s thinking until Thomas Malthus wrote about it in the late 18th century. He famously predicted that, unless something changed, the British population would soon grow too large to feed itself. As it turns out, something did change—among other things, merchants imported food—and the crisis never came to pass.

Today, Malthus is strongly—and unjustly—associated with, as Lauren F. Landsburg put it, “a pessimistic prediction of the lock-step demise of a humanity doomed to starvation via overpopulation.” This jolly point of view is sometimes referred to as Malthusianism, and applied to all forms of catastrophic environmental and social decline.

The underlying concept Malthus articulated—there are real environmental and societal limits, and real consequences for exceeding them—is not controversial. There are, however, controversial perspectives related to it, including:

  • “Malthusiasm”: A passionate belief in—bordering on enthusiasm for—the inevitability of environmental and social collapse, especially in the short term.
  • Denialism: An equally passionate belief that predictions of environmental and social disaster, like those made by Malthus, never come to pass.
  • Self-correctionism: A belief that many small, undirected changes in individual and organizational behavior, related primarily to markets and other social structures, will naturally correct for problems in complex ways that may, at first, be difficult to notice.
  • Intentionalism: A belief that intentional action at the individual, organizational, and social levels—when well planned, executed, and evaluated—can not only help avoid disaster, but produce positive benefits that serve the public good.

I reject the first two. I hope for the third. I’ve spent my life working for the fourth—and this is where evaluation can play a significant role.

From Avoiding Disaster to Promoting Sustainability

I am as much for avoiding disaster as the next guy, but—rightly or wrongly—I expect more from organized human action. Like sustainability. It’s a concept that I and others strongly believe should guide the actions of every organization. It is also a slippery concept that we have not fully defined, making it a rough guide, at best.

So, connecting ideas from various sources (and a few of my own), I’ve developed a preliminary working definition based on a set of underlying principles (in parentheses):

Actions are sustainable when they do not affect future generations adversely (futurity), social groups differentially (equity), larger social and natural systems destructively (globality), or their own objectives negatively (complexity).

I’m not fully satisfied with the definition, but so far it has helped clarify my thinking.

Why Evaluation Matters

Unfortunately, action is only weakly linked to upholding these principles, in part because there is often a lack of information about how well the principles have been (or will be) met.

That is where evaluation comes in. If we use our skills to help design the actions of commercial and social enterprises in ways that uphold these principles, we serve the public good. If we evaluate programs in ways that shed light on these principles—which would require most of us to expand our field of view—we also serve the public good.

This is why evaluation matters—because it has the potential to serve the public good—and why we need to work together to make it matter more. That would truly be evaluation for a sustainable and equitable future.


Filed under AEA Conference, Conference Blog, Evaluation, Program Design, Program Evaluation

New European Standard for Social Impact Measurement



Evaluation has truly become a global movement. The number of evaluators and evaluation associations around the world is growing, and they are becoming more interconnected. What affects evaluation in one part of the world increasingly affects how it is practiced in another.

That is why the European standard for social impact measurement, announced just a few weeks ago, is important for evaluators in the US.

According to the published report and its accompanying press release, the immediate purpose of the standard is to help social enterprises access EU financial support, especially in relation to the European Social Entrepreneurship Funds (EuSEFs) and the Programme for Employment and Social Innovation (EaSI).

But as László Andor, EU Commissioner for Employment, Social Affairs and Inclusion, pointed out, there is a larger purpose:

The new standard…sets the groundwork for social impact measurement in Europe. It also contributes to the work of the Taskforce on Social Impact Investment set up by the G7 to develop a set of general guidelines for impact measurement to be used by social impact investors globally.

That is big, and it has the potential to affect evaluation around the world.

What is impact measurement?

For evaluators in the US, the term impact measurement may be unfamiliar. It has greater currency in Europe and, of late, in Canada. Defining the term precisely is difficult because, as an area of practice, impact measurement is evolving quickly.

Around the world, there is a growing demand for evaluations that incorporate information about impact, values, and value. It is coming from government agencies, philanthropic foundations, and private investors who want to increase their social impact by allocating their public or private funds more efficiently.

Sometimes these funders are called impact investors. In some contexts, the label signals a commitment to grant making that incorporates the tools and techniques of financial investors. In others, it signals a commitment by private investors to a double bottom line—a social return on their investment for others and a financial return for themselves.

These funders want to know if people are better off in ways that they and other stakeholders believe are important. Moreover, they want to know whether those impacts are large enough and important enough to warrant the funds being spent to produce them. In other words, did the program add value?

Impact measurement may engage a wide range of stakeholders to define the outcomes of interest, but the overarching definition of success—that the program adds value—is typically driven by funders. Value may be assessed with quantitative, qualitative, or mixed methods, but almost all of the impact measurement work that I have seen has framed value in quantitative terms.

Is impact measurement the same as evaluation?

I consider impact measurement a specialized practice within evaluation. Others do not. Geographic and disciplinary boundaries have tended to isolate those who identify themselves as evaluators from those who conduct impact measurement—often referred to as impact analysts. These two groups are beginning to connect, like evaluators of every kind around the world.

I like to think of impact analysts and evaluators as twins who were separated at birth and then, as adults, accidentally bump into each other at the local coffee shop. They are delighted and confused, but mostly delighted. They have a great deal to talk about.

How is impact measurement different from impact evaluation?

There is more than one approach to impact evaluation. There is what we might call traditional impact evaluation—randomized control trials and quasi-experiments as described by Shadish, Cook, and Campbell. There are also many recently developed alternatives—contribution analysis, evaluation of collective impact, and others.

Impact measurement differs from traditional and alternative impact evaluation in a number of ways, among them:

  1. how impacts are estimated and
  2. a strong emphasis on valuation.

I discuss both in more detail below. Briefly, impacts are frequently estimated by adjusting outcomes for a pre-established set of potential biases, usually without reference to a comparison or control group. Valuation estimates the importance of impacts to stakeholders—the domain of human values—and expresses it in monetary units.

These two features are woven into the European standard and have the potential to become standard practices elsewhere, including the US. If they were to be incorporated into US practice, it would represent a substantial change in how we conduct evaluations.

What is the new European standard?

The standard creates a common process for conducting impact measurement, not a common set of impacts or indicators. The five-step process presented in the report is surprisingly similar to Tyler’s seven-step evaluation procedure, which he developed in the 1930s as he directed the evaluation of the Eight-Year Study across 30 schools. For its time, Tyler’s work was novel and the scale impressive.


Tyler’s evaluation procedure developed in the 1930s and the new European standard process: déjà vu all over again?

Tyler’s first two steps were formulating and classifying objectives (what do programs hope to achieve and which objectives can be shared across sites to facilitate comparability and learning). Deeply rooted in the philosophy of progressive education, he and his team identified the most important stakeholders—students, parents, educators, and the larger community—and conducted much of their work collaboratively (most often with teachers and school staff).

Similarly, the first two steps of the European standard process are identifying objectives and stakeholders (what does the program hope to achieve, who benefits, and who pays). They are to be implemented collaboratively with stakeholders (funders and program staff chief among them) with an explicit commitment to serving the interests of society more broadly.

Tyler’s third and fourth steps were defining outcomes in terms of behavior and identifying how and where the behaviors could be observed. The word behavior was trendy in Tyler’s day. What he meant was developing a way to observe or quantify outcomes. This is precisely setting relevant measures, the third step of the new European standard process.

Tyler’s fifth and sixth steps were selecting, trying, proving, and improving measures as they function in the evaluation. Today we would call this piloting, validation, and implementation. The corresponding step in the standard is measure, validate and value, only the last of these falling outside the scope of Tyler’s procedure.

Tyler concluded his procedure with interpreting results, which for him included analysis, reporting, and working with stakeholders to facilitate the effective use of results. The new European standard process concludes in much the same way, with reporting results, learning from them, and using them to improve the program.

How are impacts estimated?

Traditional impact evaluation defines an impact as the difference in potential outcomes—the outcomes participants realized with the program compared to the outcomes they would have realized without the program.

It is impossible to observe both of these mutually exclusive conditions at the same time. Thus, all research designs can be thought of as hacks, some more elegant than others, that allow us to approximate one condition while observing the other.
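
The definition above is often written in potential-outcomes notation. The symbols here are an illustrative assumption, not notation taken from the European standard:

```latex
\tau_i = Y_i(1) - Y_i(0), \qquad
\text{ATE} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \,\right]
```

Here Y_i(1) is the outcome participant i would realize with the program and Y_i(0) the outcome without it. Only one of the two is ever observed for a given participant, which is why every research design has to approximate the other.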

The European standard takes a similar view of impacts and describes a good research design as one that takes the following into account:

  • attribution, the extent to which the program, as opposed to other programs or factors, caused the outcomes;
  • deadweight, outcomes that, in the absence of the program, would have been realized anyway;
  • drop-off, the tendency of impacts to diminish over time; and
  • displacement, the extent to which outcomes realized by program participants prevent others from realizing those outcomes (for example, when participants of a job training program find employment, it reduces the number of open jobs and as a result may make it more difficult for non-participants to find employment).

For any given evaluation, many research designs may meet the above criteria, some with the potential to provide more credible findings than others.

However, impact analysts may not be free to choose the research design with the potential to provide the most credible results. According to the standard, the cost and complexity of the design must be proportionate to the size, scope, cost, potential risks, and potential benefits of the program being evaluated. In other words, impact analysts must make a difficult tradeoff between credibility and feasibility.

How well are analysts making the tradeoff between credibility and feasibility?

At the recent Canadian Evaluation Society Conference, my colleagues Cristina Tangonan, Anna Fagergren (not pictured), and I addressed this question. We described the potential weaknesses of research designs used in impact measurement generally and Social Return on Investment (SROI) analyses specifically. Our work is based on a review of publicly available SROI reports (to date, 107 of 156 identified reports) and theoretical work on the statistical properties of the estimates produced.

At the CES 2014 conference.

What we have found so far leads us to question whether the credibility-feasibility tradeoffs are being made in ways that adequately support the purposes of SROI analyses and other forms of impact measurement.

One design that we discussed starts with measuring the outcome realized by program participants: for example, how many participants of a job training program found employment, or the test scores of students who were enrolled in a new education program. Sometimes impact analysts measure the outcome as a pre-program/post-program difference; often they measure the post-program outcome level on its own.

Once the outcome measure is in hand, impact analysts adjust it for attribution, deadweight, drop-off, and displacement by subtracting some amount or percentage for each potential bias. The adjustments may be based on interviews with past participants, prior academic or policy research, or sensitivity analysis. Rarely are they based on comparison or control groups constructed for the evaluation. The resulting adjusted outcome measure is taken as the impact estimate.

This is an example of a high-feasibility, low-credibility design. Is it good enough for the purposes that impact analysts have in mind? Perhaps, but I’m skeptical. There is a century of systematic research on estimating impacts. Why didn’t this method, which is much more feasible than many alternatives, become a standard part of evaluation practice decades ago? I believe it is because the credibility of the design (or more accurately, the results it can produce) is considered too low for most purposes.

From what I understand, this design–and others that are similar–would meet the European standard. That leads me to question whether the new standard has set the bar too low, unduly favoring feasibility over credibility.
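
The adjustment design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure prescribed by the standard: the function name, the choice to apply each adjustment as a percentage reduction in sequence, and all of the numbers are hypothetical.

```python
def adjusted_impacts(outcome, deadweight, displacement,
                     attribution_to_others, drop_off, years):
    """Return one adjusted impact estimate per year.

    Assumption: each bias is expressed as a fraction between 0 and 1
    chosen by the analyst and applied as a percentage reduction.
    """
    rates = {
        "deadweight": deadweight,
        "displacement": displacement,
        "attribution_to_others": attribution_to_others,
        "drop_off": drop_off,
    }
    for name, rate in rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Reduce the measured outcome for what would have happened anyway,
    # what was displaced, and what other programs or factors caused.
    first_year = (outcome
                  * (1 - deadweight)
                  * (1 - displacement)
                  * (1 - attribution_to_others))

    # Let the remaining impact shrink by the drop-off rate each year.
    return [first_year * (1 - drop_off) ** year for year in range(years)]


# Hypothetical example: 100 participants found employment.
estimates = adjusted_impacts(outcome=100, deadweight=0.2, displacement=0.1,
                             attribution_to_others=0.25, drop_off=0.5, years=3)
print([round(e, 1) for e in estimates])
```

Note that nothing in the calculation checks whether the chosen rates are right. The estimate is only as credible as the adjustments fed into it, which is the concern raised above.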

What is valuation?

In the US, I believe we do far less valuation than is currently being done in Europe and Canada. Valuation expresses the value (importance) of impacts in monetary units (a measure of importance).

If the outcome, for example, were earned income, then valuation would entail estimating an impact as we usually would. If the outcome were health, happiness, or well-being, valuation would be more complicated. In this case, we would need to translate non-monetary units to monetary units in a way that accurately reflects the relative value of impacts to stakeholders. No easy feat.

In some cases, valuation may help us gauge whether the monetized value of a program’s impact is large enough to matter. It is difficult to defend spending $2,000 per participant of a job training program that, on average, results in additional earned income of $1,000 per participant. Participants would be better off if we gave $2,000 to each.
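
The arithmetic behind that comparison is simple enough to write down. The sketch below uses a hypothetical helper name and the figures from the job training example:

```python
def value_for_money(cost_per_participant, monetized_impact_per_participant):
    """Return (net value, impact-to-cost ratio) per participant."""
    if cost_per_participant <= 0:
        raise ValueError("cost_per_participant must be positive")
    net_value = monetized_impact_per_participant - cost_per_participant
    ratio = monetized_impact_per_participant / cost_per_participant
    return net_value, ratio


# Job training example: $2,000 spent, $1,000 of additional earned income.
net, ratio = value_for_money(2000, 1000)
print(net, ratio)  # prints: -1000 0.5
```

A negative net value, or a ratio below 1, means each participant receives less in monetized impact than was spent on them.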

At other times, valuation may not be useful. For example, if one health program saves more lives than another, I don’t believe we need to value lives in dollars to judge their relative effectiveness.

Another concern is that valuation reduces the certainty of the final estimate (in monetary units) as compared to an impact estimate on its own (in its original units). That is a topic that I discussed at the CES conference, and will again at the conferences of the European Evaluation Society, Social Impact Analysts Association, and the American Evaluation Association.

There is more to this than I can hope to address here. In brief—the credibility of a valuation can never be greater than the credibility of the impact estimate upon which it is based. Call that Gargani’s Law.

If ensuring the feasibility of an evaluation results in impact estimates with low credibility (see above), we should think carefully before reducing credibility further by expressing the impact in monetary units.

Where do we go from here?

The European standard sets out to solve a problem that is intrinsic to our profession–stakeholders with different perspectives are constantly struggling to come to agreement about what makes an evaluation good enough for the purposes they have in mind. In the case of the new standard, I fear the bar may be set too low, tipping the balance in favor of feasibility over credibility.

That is, of course, speculation. But so too is believing the balance is right or that it is tipped in the other direction. What is needed is a program of research—research on evaluation—that helps us understand whether the tradeoffs we make bear the fruit we expect.

The lack of research on evaluation is a weak link in the chain of reasoning that makes our work matter in Europe, the US, and around the world. My colleagues and I are hoping to strengthen that link a little, but we need others to join us. I hope you will.


Filed under AEA Conference, Conference Blog, Evaluation, Evaluation Quality, Program Evaluation, Research

AfrEA Conference 2014 #2: Commitment, Community, and Change


The 2014 Conference of the African Evaluation Association (AfrEA) has just opened. Organizers delayed the start of the opening ceremony, however, as they waited for the arrival of officials from the government of Cameroon. Fifteen minutes. Thirty minutes. An hour. More.

This may sound like a problem, but it wasn’t—the unofficial conference had already begun. Participants from around the world were mixing, laughing, and learning. I met evaluators from Kenya, South Africa, Sri Lanka, Europe, and America. I learned about health programs, education systems, evaluation use in government, and the development of evaluation as a profession across the continent. It was a truly delightful delay.

And it reflects the mindset I am finding here—a strong belief that commitment and community can overcome circumstance.

: : : : : : : : : : : :

During the opening ceremony, the former president of AGRA, Dr. Namanga Ngongi, stated that one of the greatest challenges facing development programs is finding enough qualified evaluators: those who have not only technical skills but also the ability to help organizations increase their impact.

Where will these much-needed evaluators come from?

Historically, many evaluators have come from outside of Africa. The current push for made-in-Africa evaluations promises to change that by training more African evaluators.

Evaluators are trained in many ways, chief among them university programs, professional mentoring, practical experience, and ongoing professional development. The CLEAR initiative—Centers for Learning on Evaluation and Results—is a new approach. With centers in Anglophone and Francophone Africa, CLEAR has set out to strengthen monitoring, evaluation, performance management, and evaluation use at the country level.

While much of CLEAR’s work is face-to-face, a great many organizations have made training material available on the web. One can now piece together free resources online—webinars, documents, videos, correspondence, and even one-on-one meetings with experts—that can result in highly contextualized learning. This is what many of the African evaluators I have met are telling me they are doing.

The US, Canada, Australia, and New Zealand appear to be leading exporters of evaluation content to Africa. Claremont Graduate University, Western Michigan University, the American Evaluation Association, the Canadian Government, and BetterEvaluation are some of the better-known sources.

What’s next? Perhaps consolidators who organize online and in-person content into high-quality curricula that are convenient, coherent, and comprehensive.

: : : : : : : : : : : :

Although the supply of evaluators may be limited in many parts of Africa, the demand for evaluation continues to increase. The history of evaluation in the US, Canada, and Europe suggests that demand grows when evaluation is required as a condition of funding or by law. From what I have seen, it appears that history is repeating itself in Africa. In large part this is due to the tremendous influence that funders from outside of Africa have.

An important exception is South Africa, where the government and evaluators work cooperatively to produce and use evaluations. I hope to learn more about this in the days to come.


Filed under Conference Blog, Evaluation, Gargani News, Program Evaluation

AfrEA Conference 2014 #1: What a Difference 32 Hours Makes


“Tell me again why you are going to Cameroon?” my wife asked. I paused, searching for an answer. New business? Not really, although that is always welcome. Old connections? I have very few among those currently working in Africa. What should I say? How could I explain?

I decided to confess.

“Because I am curious. There is something exciting going on across Africa. The African Evaluation Association—AfrEA—is playing a critical role. I want to learn more about it. Support it. Maybe be a part of it.”

She found that perfectly reasonable. I suppose that is why I married her.

Then she asked more questions about the conference and how my work might be useful to practitioners in that part of the world. As it turns out, she was curious, too. I believe many are, especially evaluation practitioners.

It takes a certain irrational obsessiveness, however, to fly 32 hours because you are curious.

For those not yet prepared to follow their curiosity to such lengths, I will be blogging about the AfrEA Conference over the next week.

You can find guest posts about the previous AfrEA conference in Ghana two years ago here, here, here, and here.

Check back here for the latest conference news from Yaoundé, Cameroon.


Filed under Conference Blog, Evaluation, Gargani News, Program Evaluation

On the Ground at AEA #2: What Participants Had to Say

Are you suffering from “post-parting depression” now that the conference of the American Evaluation Association has ended? Maybe this will help–a sampling of the professionals who attended the conference, along with their thoughts on the experience.  Special thanks to Anna Fagergren who collected most of these photos and quotes.


Stefany Tobel Ramos, City Year

This is my first time here and I really enjoyed the professional development workshop Evaluation-Specific Methodology. I learned a lot and have new ideas about how to get a sense of students as a whole.


Jonathan Karanja, Independent Consultant with Nielsen, Kenya

This is my first time here and Nielsen is trying to get into the evaluation space, because that is what our clients want. The conference is a little overwhelming but I have a strategy – go to the not technically demanding, easy-to-digest sessions. Baby steps. I want to ensure that our company learns to not just apply market research techniques but to actually do evaluation.

George Julnes, University of Baltimore

When I attend AEA, I get to present to enthusiastic groups of evaluation professionals. It makes me feel like a rock star for a week. Then I go home and do the dishes.

Linda Pursley, Lesley University

I’m returning to the conference after some years away—it’s great to renew contact with acquaintances and colleagues. I am struck by the conference’s growth and the huge diversity of TIGs (topical interest groups), and I’m finding a lot of sessions of interest.


Pieta Blakely, Commonwealth Corporation

It’s my first time here and it’s a little overwhelming. I’m getting to know what I don’t know. But it’s also really exciting to see people working on youth engagement because I’m really interested in that.

Linda Stern, National Democratic Institute

I’ve been coming for many years, and I really like the two professional development workshops I took—Sampling and Empowerment Evaluation Strategies—and how they helped guide my way through the greater conference program.

Carsten Strømbæk Pedersen, National Board of Social Services, Denmark

John, I really like your blog. You have…how do you say it in English?…a twisted mind. I really like that.

Aske Graulund, National Board of Social Services, Denmark

Nina Middelboe, Oxford Research AS, Denmark

[nods of agreement]

No greater compliment, Carsten!  And my compliments to all 3,500 professionals who participated in the conference.


Filed under AEA Conference, Conference Blog, Evaluation, Program Evaluation

On the Recursive Nature of Recursion: Reflections on the AEA 2013 Conference

John Gargani Not Blogging

Recursion is when your local bookstore opens a café inside the store in order to attract more readers, and then the café opens a bookstore inside itself to attract more coffee drinkers.

Chris Lysy noticed, laughed at, and illustrated (above) the same phenomenon as it relates to my blogging (or rather lack of it) during the American Evaluation Association Conference last week.

I intended to harness the power of recursion by blogging about blogging at the conference. I reckoned that would nudge a few others to blog at the conference, which in turn would nudge me to do the same.

I ended up blogging very little during those hectic days, and none of it was about blogging at the conference. Giving up on that idea, I landed on blogging about not blogging, then not blogging about not blogging, then blogging about not blogging about not blogging, and so on.

Once Chris opened my eyes to the recursive nature of recursion, I noticed it all around me at the conference.

For example, the Research on Evaluation TIG (Topical Interest Group) discussed using evaluation methods to evaluate how we evaluate. Is that merely academic navel gazing? It isn’t. I would argue that it may be the most important area of evaluation today.

As practitioners, we conduct evaluations because we believe they can make a positive impact in the world, and we choose how to evaluate in ways we believe produce the greatest impact. Ironically, we have little evidence upon which to base our choices. We rarely measure our own impact or study how we can best achieve it.

ROE (research on evaluation, for those in the know) is setting that right. And the growing community of ROE researchers and practitioners is attempting to do so in an organized fashion. I find it quite inspiring.

A great example of ROE and the power of recursion is the work of Tom Cook and his colleagues (chief among them Will Shadish). I must confess that Tom is a hero of mine. A wonderful person who finds tremendous joy in his work and shares that joy with others. So I can’t help but smile every time I think of him using experimental and quasi-experimental methods to evaluate experimental and quasi-experimental methods.

Experiments and quasi-experiments follow the same general logic. Create two (or more) comparable groups of people (or whatever may be of interest). Provide one experience to one group and a different experience to the other. Measure outcomes of interest for the two groups at the end of their experiences. Given that, differences in outcomes between the groups are attributable to differences in the experiences of the groups.

If one group received a program and the other did not, you have a very strong method for estimating program impacts. If one group received a program designed one way and the other a program designed another way, you have a strong basis for choosing between program designs.

Experiments and quasi-experiments differ principally in how they create comparable groups. Experiments assign people to groups at random. In essence, names are pulled from a hat (in reality, computers select names at random from a list). This yields two highly comparable but artificially constructed groups.

Quasi-experiments typically operate by allowing people to choose experiences as they do in everyday life. This yields naturally constructed groups that are less comparable. Why are they less comparable? The groups are composed of people who made different choices, and these choices may be associated with other factors that affect outcomes. The good news is that the groups can be made more comparable–to some degree–by using a variety of statistical methods.

Is one approach better than another? At the AEA Conference, Tom described his involvement with efforts to answer that question. One way that is done is by randomly assigning people to two groups–one group that will be part of an experiment and another group that will be part of a quasi-experiment (referred to as an observational study in the picture above). Within the experimental group, participants are randomly assigned to either a treatment group (e.g., math training) or control group (vocabulary training). Within the quasi-experimental group, participants choose between the same two experiences, forming treatment and comparison groups according to their preference.

Program impact estimates are compared for the experimental and quasi-experimental groups. Differences at this level are attributable to the evaluation method and can indicate whether one method is biased with respect to the other. So far, there seems to be pretty good agreement between the methods (when implemented well–no small achievement), but much work remains to be done.
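
The logic of the four-arm design described above can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reconstruction of Tom's studies: the function name, the effect size, the way preference drives both choice and outcome, and the use of a naive, unadjusted comparison in the observational arm are all hypothetical.

```python
import random
from statistics import mean


def simulate_four_arm(n=20000, true_effect=2.0, selection=1.0, seed=0):
    """Return (experimental estimate, naive observational estimate)."""
    rng = random.Random(seed)
    arms = {(study, treated): []
            for study in ("experiment", "observational")
            for treated in (True, False)}

    for _ in range(n):
        # Unobserved trait that influences both choice and outcome.
        preference = rng.gauss(0, 1)

        # First randomization: which kind of study a person joins.
        study = "experiment" if rng.random() < 0.5 else "observational"

        if study == "experiment":
            # Second randomization: treatment assigned by coin flip.
            treated = rng.random() < 0.5
        else:
            # Self-selection: people who prefer the treatment tend to pick it.
            treated = preference + rng.gauss(0, 1) > 0

        outcome = (true_effect * treated
                   + selection * preference
                   + rng.gauss(0, 1))
        arms[(study, treated)].append(outcome)

    def difference(study):
        return mean(arms[(study, True)]) - mean(arms[(study, False)])

    return difference("experiment"), difference("observational")


experimental, observational = simulate_four_arm()
print(round(experimental, 2), round(observational, 2))
```

Because people were randomly assigned to the two kinds of study, any gap between the two estimates reflects the method rather than the people. In this sketch the observational estimate is deliberately left unadjusted, so it overstates the effect; the statistical adjustments mentioned above are what would be expected to narrow that gap.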


Perhaps the most important form of recursion at the AEA Conference is membership. AEA is composed of members who manage themselves by forming groups of members who manage themselves by forming groups of members who manage themselves. The board of AEA, TIGs, local affiliates, task forces, working groups, volunteer committees, and conference sessions are all organized by and composed of groups of members who manage themselves. That is the power of recursion–3,500 strangers coming together to create a community dedicated to making the world a better place. And what a joy to watch them pull it off.


Filed under AEA Conference, Conference Blog, Evaluation, Program Evaluation, Research

On the Ground at AEA #1: Tina and Rodney


Rodney Hopson, Professor, George Mason University (Past President of AEA)

I’m plotting.  I’m always plotting. That’s how you make change in the world. You find the opportunities, great people to work with, and make things happen.

Tina Christie, Professor, UCLA

I’ve just finished three years on the AEA board with Rodney. The chance to connect with colleagues like Rodney–work with them, debate with them, laugh with them–is something I look forward to each year. It quickly starts to feel like family.

Filed under AEA Conference, Conference Blog, Evaluation, Program Evaluation

Confessions of a Conference Junkie

It’s true: I am addicted to conferences. While I read about evaluation, write about evaluation, and do evaluations in my day-to-day professional life, it’s not enough. To truly connect to the field and its swelling ranks of practitioners, researchers, and supporters, I need to attend conferences. Compulsively. Enthusiastically. Constantly.

Over the past few months, I was honored to be the keynote speaker at the Canadian Evaluation Society conference in Toronto and the Danish Evaluation Society conference in Kolding. Over the past two years I have traveled from Helsinki to Honolulu to speak, present, and give workshops. The figure below shows some of that travel (conferences indicated with darker circles, upcoming travel with dashed lines).


But today is special—it’s the first day of the American Evaluation Association conference in Washington, DC. If conferences were cities, this one would be New York—big, vibrant, and international.


And this year, in addition to my presentations, receptions, and workshops, I will attempt to do something I have never done before: blog from the conference.

EvalBlog has been quiet this summer. Time to make a little digital noise.

Filed under AEA Conference, Conference Blog, Design, Evaluation, Program Design, Program Evaluation

Evaluation Across Boundaries—Literally and Metaphorically


Cross-posted at the Canadian Evaluation Society Conference website.


A few weeks ago, the New York Times reported that the United States Department of Homeland Security was in the midst of an evaluation failure. Since 2010, the Department has been struggling to develop a measure of border security that would help Congress evaluate and improve immigration policies. Senior officials reported to Congress that the Department “had not completed the new measurements and were not likely to in coming months.” This could delay comprehensive immigration reform legislation, which would have vast political, legal, economic, and social consequences.

The State of Modern Evaluation Practice

This is a cautionary tale about the state of modern evaluation practice. It represents a situation in which stakeholders believe that evaluation can improve social change efforts (here, immigration policies that almost all stakeholders consider flawed), yet evaluation has not. The reasons are complex, touching on long-discussed themes of use, politics, stakeholder inclusion, and methods. However, an important consequence has not been widely discussed: whether, in the face of evaluation failure, stakeholders will continue to believe that evaluation can improve society.

A Shared Belief

If there is one thing that holds evaluators together as a community it is our shared belief that our work matters. This is more than a belief in the importance of evaluation use. It is a belief about impact. Our impact. We are willing to believe in the impact of our work in the absence of evidence. Should we expect others to do the same? We should not. Nor should we stop believing. We should respond by adapting our practice in ways that are more likely to achieve impact and demonstrate that we have. I call this the new practice of evaluation, and it is emerging in exciting ways in unexpected places.

Shaping Evaluation for the Future

When I give my keynote at the Canadian Evaluation Society Conference (June 9-12), I will be discussing the new practice of evaluation. The conference theme is Evaluation Across Boundaries, a metaphorical hook that is literally what the new practice of evaluation is advancing—evaluators crossing boundaries to become change makers, program designers, and market engineers. I will describe:

  • how this new practice is taking form
  • how it is disrupting evaluation practice today, and
  • how it may shape evaluation practice in the future.

These are principally undirected efforts. Should we, collectively as a profession and individually as practitioners, attempt to influence them? If so, how? To what end?

Be Part of the Discussion

I cannot claim to have the answers to these questions. But I want you to be a part of the discussion. Join us in Toronto, let your voice be heard, and help define what evaluation practice will be.

What, Where & When

My keynote address, “The New Practice of Evaluation: Crossing Boundaries, Creating Change,” is on Wednesday, June 12, 2013 at 8:30 am, directly following the Thematic Breakfast.

Filed under Conference Blog, Evaluation, Program Evaluation

Conference Blog: Catapult Labs 2012

Did you miss the Catapult Labs conference on May 19?  Then you missed something extraordinary.

But don’t worry, you can get the recap here.

The event was sponsored by Catapult Design, a nonprofit firm in San Francisco that uses the process and products of design to alleviate poverty in marginalized communities.  Their work spans the worlds of development, mechanical engineering, ethnography, product design, and evaluation.

That is really, really cool.

I find them remarkable and their approach refreshing.  Even more so because they are not alone.  The conference was very well attended by diverse professionals—from government, the nonprofit sector, the for-profit sector, and design—all doing similar work.

The day was divided into three sets of three concurrent sessions, each presented as hands-on labs.  So, sadly, I could attend only one third of what was on offer.  My apologies to those who presented and are not included here.

I started the day by attending Democratizing Design: Co-creating With Your Users presented by Catapult’s Heather Fleming.  It provided an overview of techniques designers use to include stakeholders in the design process.

Evaluators go to great lengths to include stakeholders.  We have broad, well-established approaches such as empowerment evaluation and participatory evaluation.  But the techniques designers use are largely unknown to evaluators.  I believe there is a great deal we can learn from designers in this area.

An example is games.  Heather organized a game in which we used beans as money.  Players chose which crops to plant, each with its own associated cost, risk profile, and potential return.  The expected payoff varied by gender, which was arbitrarily assigned to players.  After a few rounds the problem was clear—higher costs, lower returns, and greater risks for women increased their chances of financial ruin, and this had negative consequences for communities.

I believe that evaluators could put games to good use.  Describing a social problem as a game requires stakeholders to express their cause-and-effect assumptions about the problem.  Playing with a group allows others to understand those assumptions intimately, comment upon them, and offer suggestions about how to solve the problem within the rules of the game (or perhaps change the rules to make the problem solvable).
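To show what I mean by expressing cause-and-effect assumptions as rules, here is a rough sketch of a bean game in code. These are not the rules Heather used. The costs, payoffs, failure chances, and the definition of ruin (not having enough beans to plant) are all hypothetical values I chose so that the assumptions are out in the open where a group could argue about them.

```python
import random

# Hypothetical rules: every number here is an assumption, not Heather's game.
RULES = {
    "women": {"cost": 4, "payoff": 6, "failure_chance": 0.4},
    "men": {"cost": 3, "payoff": 7, "failure_chance": 0.3},
}


def ruin_rate(group, players=5000, rounds=5, starting_beans=10, seed=1):
    """Share of simulated players who cannot afford to plant at some round."""
    rng = random.Random(seed)
    rule = RULES[group]
    ruined = 0
    for _ in range(players):
        beans = starting_beans
        for _ in range(rounds):
            if beans < rule["cost"]:
                # Too few beans to plant: count this player as ruined.
                ruined += 1
                break
            beans -= rule["cost"]  # pay to plant the crop
            if rng.random() >= rule["failure_chance"]:
                beans += rule["payoff"]  # the crop succeeded
    return ruined / players


for group in RULES:
    print(group, f"{ruin_rate(group):.1%} ruined")
```

Changing a single number in `RULES` and rerunning is the code equivalent of changing the rules of the game to see whether the problem becomes solvable.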

I have never met a group of people who were more sincere in their pursuit of positive change.  And honest in their struggle to evaluate their impact.  I believe that impact evaluation is an area where evaluators have something valuable to share with designers.

That was the purpose of my workshop Measuring Social Impact: How to Integrate Evaluation & Design.  I presented a number of techniques and tools we use at Gargani + Company to design and evaluate programs.  They are part of a more comprehensive program design approach that Stewart Donaldson and I will be sharing this summer and fall in workshops and publications (details to follow).

The hands-on format of the lab made for a great experience.  I was able to watch participants work through the real-world design problems that I posed.  And I was encouraged by how quickly they were able to use the tools and techniques I presented to find creative solutions.

That made my task of providing feedback on their designs a joy.  We shared a common conceptual framework and were able to speak a common language.  Given the abstract nature of social impact, I was very impressed with that—and their designs—after less than 90 minutes of interaction.

I wrapped up the conference by attending Three Cups, Rosa Parks, and the Polar Bear: Telling Stories that Work presented by Melanie Moore Kubo and Michaela Leslie-Rule from See Change.  They use stories as a vehicle for conducting (primarily) qualitative evaluations.  They call it story science.  A nifty idea.

I liked this session for two reasons.  First, Melanie and Michaela are expressive storytellers, so it was great fun listening to them speak.  Second, they posed a simple question—Is this story true?—that turns out to be amazingly complex.

We summarize, simplify, and translate meaning all the time.  Those of us who undertake (primarily) quantitative evaluations agonize over this because our standards for interpreting evidence are relatively clear but our standards for judging the quality of evidence are not.

For example, imagine that we perform a t-test to estimate a program’s impact.  The t-test indicates that the impact is positive, meaningfully large, and statistically significant.  We know how to interpret this result and what story we should tell—there is strong evidence that the program is effective.

But what if the outcome measure was not well aligned with the program’s activities? Or there were many cases with missing data?  Would our story still be true?  There is little consensus on where to draw the line between truth and fiction when quantitative evidence is flawed.
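For readers who want to see the mechanics, here is a minimal sketch of one common version of the test, Welch's t-test, written with only the Python standard library. The scores are invented for illustration, `None` stands in for a missing outcome, and dropping those cases is simply the naive default, not a recommendation.

```python
import math
import statistics


def drop_missing(scores):
    """Keep only the cases with an observed outcome (None marks missing)."""
    return [s for s in scores if s is not None]


def welch_t(treatment, comparison):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(treatment), len(comparison)
    v1, v2 = statistics.variance(treatment), statistics.variance(comparison)
    se_squared = v1 / n1 + v2 / n2
    t = (statistics.mean(treatment) - statistics.mean(comparison)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se_squared ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical outcome scores; None marks a participant with missing data.
treatment = [78, 85, 90, None, 72, 88, 95, 81, None, 79]
comparison = [70, 75, 80, 68, 74, None, 82, 71, 69]

t, df = welch_t(drop_missing(treatment), drop_missing(comparison))
print(f"t = {t:.2f}, df = {df:.1f}")
```

The arithmetic is the easy part. The test has no way of knowing why the missing cases are missing or whether the outcome measure fits the program, and that is exactly where the truth of the story is decided.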

As Melanie and Michaela pointed out, it is critical that we strive to tell stories that are true, but equally important to understand and communicate our standards for truth.  Amen to that.

The icing on the cake was the conference evaluation.  Perhaps the best conference evaluation I have come across.

Everyone received four post-it notes, each a different color.  As a group, we were given a question to answer on a post-it of a particular color, and only a minute to answer the question.  Immediately afterward, the post-its were collected and displayed for all to view, as one would view art in a gallery.

Evaluation as art—I like that.  Immediate.  Intimate.  Transparent.

Gosh, I like designers.


Filed under Conference Blog, Design, Evaluation, Program Design, Program Evaluation